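In Query 3, you are basically executing the subquery against mybigtable once for every row of mybigtable itself.
To avoid this, you need to make two major changes:
MAJOR CHANGE #1 : Refactor the Query
Here is your original query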
Select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)
You could try
select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;
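If you want to convince yourself that the rewrite returns the same number as the IN form before running it against the real table, here is a minimal sketch on a throwaway table. The table name mybigtable_demo, its column types, and the sample rows are my assumptions for illustration, not your actual schema.

```sql
-- Hypothetical scratch table: name, column types, and rows are assumed for illustration.
CREATE TABLE mybigtable_demo (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    account_id INT NOT NULL,
    email VARCHAR(100) NOT NULL,
    INDEX account_id_email_ndx (account_id, email)
);

-- Account 123 and account 345 each contain a duplicate email on purpose.
INSERT INTO mybigtable_demo (account_id, email) VALUES
    (123, 'a@example.com'),
    (123, 'b@example.com'),
    (123, 'b@example.com'),
    (123, 'c@example.com'),
    (345, 'b@example.com'),
    (345, 'b@example.com'),
    (345, 'c@example.com'),
    (345, 'd@example.com');

-- Original IN form
SELECT COUNT(*) AS total FROM mybigtable_demo
WHERE account_id = 123 AND email IN
    (SELECT DISTINCT email FROM mybigtable_demo WHERE account_id = 345);

-- Refactored JOIN form
SELECT COUNT(*) EmailCount FROM
(
    SELECT tbl123.email FROM
    (SELECT email FROM mybigtable_demo WHERE account_id = 123) tbl123
    INNER JOIN
    (SELECT DISTINCT email FROM mybigtable_demo WHERE account_id = 345) tbl345
    USING (email)
) A;

-- Clean up the scratch table.
DROP TABLE mybigtable_demo;
```

With these rows both queries return 3. The DISTINCT on the account 345 side is what keeps the join from multiplying rows when that account holds the same email more than once.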
or the count per email
select email,count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A group by email;
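MAJOR CHANGE #2 : Proper Indexing
I think you have this already, since Query 1 and Query 2 run fast. Make sure you have a compound index on (account_id,email). Run SHOW CREATE TABLE mybigtable\G and make sure you have one. If you do not have it or you are not sure, then create the index anyway: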
ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);
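Once the index exists, one way to check that the optimizer is actually picking it up is to put EXPLAIN in front of the refactored query. This is only a sketch: the exact plan you see depends on your MySQL version and on how the data is distributed.

```sql
-- In the output, look at the `key` column for the rows that read mybigtable.
-- Ideally it names account_id_email_ndx rather than NULL.
EXPLAIN
select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;
```

If the Extra column also shows "Using index" for those rows, the subqueries are being answered from the index alone without touching the table rows, which is the point of putting email in the index.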
UPDATE 2012-03-07 13:26 EST
If you want to do a NOT IN(), change the INNER JOIN to a LEFT JOIN and check for the right side being NULL, like this:
select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;
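For comparison, this is the NOT IN form I am assuming you had in mind. I am guessing at it from your Query 3, so adjust as needed:

```sql
-- Assumed NOT IN counterpart of Query 3 (not taken from the question itself).
select count(*) as total from mybigtable
where account_id=123 and email NOT IN
(select distinct email from mybigtable where account_id=345);
```

One caveat: the two forms are only guaranteed to give the same count when email is never NULL. If any account 345 row has a NULL email, the NOT IN version matches nothing at all, while the LEFT JOIN version still counts the unmatched rows. If email is declared NOT NULL, you can ignore this.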
UPDATE 2012-03-07 14:13 EST
Please read these two links on doing JOINs
Here is a great YouTube video where I learned to refactor queries, and the book it was based on