8

Query 1:

select distinct email from mybigtable where account_id=345

Takes 0.1 seconds.

Query 2:

Select count(*) as total from mybigtable where account_id=123 and email IN (<include all from above result>)

Takes 0.2 seconds.

Query 3:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)

Takes 22 minutes, with 90% of that time spent in the "preparing" state. Why does it take so long?

The table is InnoDB with 3.2 million rows, on MySQL 5.0.

— w

8

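In Query 3, you are essentially running the subquery against mybigtable once for every row of mybigtable itself.

To avoid this, you need to make two major changes:

MAJOR CHANGE #1: Refactor the Query

Here is your original query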

Select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)

You could try

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;

or, for a count per email

select email,count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A group by email;

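MAJOR CHANGE #2: Proper Indexing

I think you already have this, since Query 1 and Query 2 run fast. Make sure you have a compound index on (account_id,email). Run SHOW CREATE TABLE mybigtable\G and check that one exists. If you do not have it, or you are not sure, create the index anyway: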

ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);
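To confirm that the index exists, the index list can be inspected directly. This is a minimal sketch that assumes the index name account_id_email_ndx from the ALTER TABLE statement above:

```sql
-- List every index on the table. For the compound index, expect two rows
-- sharing Key_name = 'account_id_email_ndx':
--   Seq_in_index = 1 for account_id, Seq_in_index = 2 for email.
SHOW INDEX FROM mybigtable;
```

The column order matters here: with account_id first, the index can serve both the account_id=... filter and the lookup on email within that account.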

UPDATE 2012-03-07 13:26 EST

If you want to do a NOT IN(), change the INNER JOIN to a LEFT JOIN and check for the right side being NULL, like this:

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;
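One way to sanity-check the two rewrites against each other, offered as a sketch and not as part of the original answer: every account 123 row either matches an email under account 345 or does not, so the INNER JOIN count and the LEFT JOIN ... IS NULL count should add up to the plain row count for account 123. The alias total_123 is made up for illustration.

```sql
-- Total rows for account 123. This should equal
--   (EmailCount from the INNER JOIN query)
-- + (EmailCount from the LEFT JOIN ... IS NULL query),
-- because tbl345 is DISTINCT and so each tbl123 row matches at most once.
SELECT COUNT(*) AS total_123
FROM mybigtable
WHERE account_id = 123;
```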

UPDATE 2012-03-07 14:13 EST

Please read these two links on doing JOINs

Here is a great YouTube video where I learned how to refactor queries, and the book it is based on

— RolandoMySQLDBA

9

In MySQL, a subselect inside an IN clause is re-executed for every row of the outer query, which makes the work O(n^2). In short, don't use IN (SELECT).

— Aaron Brown
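The per-row re-execution can be checked with EXPLAIN. The sketch below reuses Query 3 from the question; the aliases outer_t and inner_t are made up for illustration, and the exact plan output depends on the MySQL version and the available indexes.

```sql
-- Inspect the plan for Query 3.
EXPLAIN
SELECT COUNT(*) AS total
FROM mybigtable
WHERE account_id = 123
  AND email IN (SELECT DISTINCT email FROM mybigtable WHERE account_id = 345);
-- On older versions such as 5.0, the inner SELECT is typically reported with
-- select_type = DEPENDENT SUBQUERY, meaning it is evaluated per outer row.

-- Roughly the correlated form that such a plan corresponds to:
SELECT COUNT(*) AS total
FROM mybigtable AS outer_t
WHERE outer_t.account_id = 123
  AND EXISTS (SELECT 1
              FROM mybigtable AS inner_t
              WHERE inner_t.account_id = 345
                AND inner_t.email = outer_t.email);
```

If the plan shows DEPENDENT SUBQUERY, each inner evaluation is only cheap when an index covering (account_id, email) can be used for the lookup.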

1

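Do you have an index on account_id?
A second problem may be that nested subqueries perform poorly in 5.0.
GROUP BY with a HAVING clause is faster than DISTINCT.
What are you trying to do? Apart from item 3, could it be done better with a join?

— Stephen Senkomago Musoke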
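For reference, the DISTINCT subquery from the question can be written with GROUP BY as sketched below. Whether this is actually faster depends on the MySQL version and the indexes in place, so it is worth comparing the EXPLAIN output of both forms.

```sql
-- Same set of emails as
--   SELECT DISTINCT email FROM mybigtable WHERE account_id = 345
-- expressed with GROUP BY instead of DISTINCT.
SELECT email
FROM mybigtable
WHERE account_id = 345
GROUP BY email;
```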

1

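A lot of processing is involved when handling IN() subqueries such as yours. You can read more about it here.

My first suggestion would be to try rewriting the subquery as a JOIN. Something like (not tested):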

SELECT COUNT(*) AS total FROM mybigtable AS t1
 INNER JOIN 
   (SELECT DISTINCT email FROM mybigtable WHERE account_id=345) AS t2 
   ON t2.email=t1.email
WHERE account_id=123

— Derek Downey

MySQL subquery slows down drastically, but the queries run fine independently
