MySQL subquery slows down drastically, but the queries run fine independently


8

Query 1:

select distinct email from mybigtable where account_id=345

takes 0.1 seconds.

Query 2:

Select count(*) as total from mybigtable where account_id=123 and email IN (<include all from above result>)

takes 0.2 seconds.

Query 3:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)

takes 22 minutes, and 90% of that time is spent in the "preparing" state. Why does it take so long?

The table is InnoDB with 3.2 million rows, on MySQL 5.0.

Answers:


8

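In query 3, you are basically executing the subquery once for every row of mybigtable, against the table itself.

To avoid this, you need to make two major changes:

Major change #1: Refactor the query

Here is your original query: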

Select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)

You could try this instead:

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;
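Before swapping the refactored query in, it may be worth checking on a small sample that it returns the same count as the original. The sketch below does that with Python's built-in sqlite3 module as a stand-in; the toy rows and the helper names (build_toy_table, counts) are assumptions made for illustration, and SQLite says nothing about how MySQL 5.0 will plan or time either query.

```python
import sqlite3

def build_toy_table():
    """Create a tiny in-memory stand-in for mybigtable (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mybigtable (account_id INTEGER, email TEXT)")
    rows = [
        (123, "a@example.com"),
        (123, "a@example.com"),  # duplicate row on the 123 side
        (123, "b@example.com"),
        (123, "c@example.com"),
        (345, "a@example.com"),
        (345, "a@example.com"),  # duplicate row on the 345 side
        (345, "c@example.com"),
        (345, "d@example.com"),
    ]
    conn.executemany("INSERT INTO mybigtable VALUES (?, ?)", rows)
    return conn

# The original IN (SELECT ...) form from the question.
IN_QUERY = """
select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)
"""

# The refactored form that joins two derived tables.
JOIN_QUERY = """
select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A
"""

def counts(conn):
    """Return (count from the IN form, count from the JOIN form)."""
    in_total = conn.execute(IN_QUERY).fetchone()[0]
    join_total = conn.execute(JOIN_QUERY).fetchone()[0]
    return in_total, join_total

if __name__ == "__main__":
    print(counts(build_toy_table()))
```

The DISTINCT on the 345 side is what keeps the two forms equivalent: without it, duplicate emails under account 345 would multiply the joined rows and inflate the count.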

or, for a count per email:

select email,count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A group by email;

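Major change #2: Proper indexing

I think you already have this, since query 1 and query 2 run fast. Make sure you have a compound index on (account_id,email). Run SHOW CREATE TABLE mybigtable\G and check that one is there. If you don't have it, or you are not sure, create the index anyway: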

ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);
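As a rough illustration of why the column order (account_id,email) helps, the sketch below builds the same compound index in SQLite and asks its planner how it would run query 1. This is only an analogy: SQLite uses CREATE INDEX rather than the MySQL ALTER TABLE ... ADD INDEX statement above, its EXPLAIN QUERY PLAN output is not MySQL's EXPLAIN, and the function name plan_with_composite_index is made up for this example.

```python
import sqlite3

def plan_with_composite_index():
    """Return the planner's description of query 1 once the index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mybigtable (account_id INTEGER, email TEXT)")
    # SQLite equivalent of the compound index suggested in the answer.
    conn.execute(
        "CREATE INDEX account_id_email_ndx ON mybigtable (account_id, email)"
    )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "select distinct email from mybigtable where account_id=345"
    ).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return [row[-1] for row in plan]

if __name__ == "__main__":
    for detail in plan_with_composite_index():
        print(detail)
```

Because the index leads with account_id and also carries email, the lookup can be answered from the index alone. On the real server, running EXPLAIN on each query is the way to confirm that MySQL picks the index.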

UPDATE 2012-03-07 13:26 EST

If you want to do a NOT IN(), change the INNER JOIN to a LEFT JOIN and check for the right side being NULL, like this:

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;
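The same kind of sanity check can be applied to the NOT IN() variant. The sketch below again uses Python's sqlite3 module as a stand-in for MySQL, with hypothetical rows and helper names (build_toy_table, anti_counts), and compares a NOT IN (SELECT ...) count with the LEFT JOIN ... IS NULL count.

```python
import sqlite3

def build_toy_table(extra_rows=()):
    """Create a tiny in-memory stand-in for mybigtable (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mybigtable (account_id INTEGER, email TEXT)")
    rows = [
        (123, "a@example.com"),
        (123, "a@example.com"),
        (123, "b@example.com"),  # the only 123 email missing under 345
        (123, "c@example.com"),
        (345, "a@example.com"),
        (345, "c@example.com"),
        (345, "d@example.com"),
    ]
    conn.executemany("INSERT INTO mybigtable VALUES (?, ?)", rows + list(extra_rows))
    return conn

# Subquery form of the anti-join.
NOT_IN_QUERY = """
select count(*) as total from mybigtable
where account_id=123 and email NOT IN
(select distinct email from mybigtable where account_id=345)
"""

# LEFT JOIN form from the answer: keep left rows with no match on the right.
LEFT_JOIN_QUERY = """
select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A
"""

def anti_counts(conn):
    """Return (count from NOT IN, count from LEFT JOIN ... IS NULL)."""
    not_in_total = conn.execute(NOT_IN_QUERY).fetchone()[0]
    left_join_total = conn.execute(LEFT_JOIN_QUERY).fetchone()[0]
    return not_in_total, left_join_total

if __name__ == "__main__":
    print(anti_counts(build_toy_table()))
```

One caveat: the two forms agree only while the email column holds no NULLs on the 345 side. If the subquery returns a NULL, NOT IN matches no rows at all under SQL's three-valued logic, while the LEFT JOIN form still counts the unmatched emails.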

UPDATE 2012-03-07 14:13 EST

Please read these two links on doing JOINs

Here is a great YouTube video where I learned how to refactor queries, along with the book it is based on


9

In MySQL, a subselect inside an IN clause is re-executed for every row of the outer query, which makes the whole thing O(n^2). In short, do not use IN (SELECT).
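To make the O(n^2) claim concrete, here is a simplified model in Python. It is not how MySQL is implemented: count_rescanning mimics re-scanning the subquery result for every outer row, count_with_lookup mimics resolving the inner set once, and both function names and the step counting are assumptions made for this illustration.

```python
def count_rescanning(outer_emails, inner_emails):
    """Walk the inner list again for every outer row."""
    matches = 0
    comparisons = 0
    for email in outer_emails:
        for candidate in inner_emails:
            comparisons += 1
            if candidate == email:
                matches += 1
                break  # stop at the first hit, like an IN test
    return matches, comparisons

def count_with_lookup(outer_emails, inner_emails):
    """Build the inner set once, then probe it once per outer row."""
    lookup = set(inner_emails)
    matches = sum(1 for email in outer_emails if email in lookup)
    # One pass to build the set plus one probe per outer row.
    steps = len(inner_emails) + len(outer_emails)
    return matches, steps

if __name__ == "__main__":
    outer = ["user%d@example.com" % i for i in range(1000)]
    inner = ["user%d@example.com" % i for i in range(500, 1500)]
    print(count_rescanning(outer, inner))
    print(count_with_lookup(outer, inner))
```

Both functions return the same number of matches, but the rescanning version's work grows with the product of the two list sizes, while the lookup version's work grows with their sum.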


1
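  1. Do you have an index on account_id?

  2. The second problem may be that nested subqueries perform poorly in 5.0.

  3. GROUP BY with a HAVING clause is faster than DISTINCT.

  4. What are you trying to do? In addition to item 3, might it be better done with a join?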


1

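There is a lot of processing involved in handling IN() subqueries such as yours. You can read more about it here.

My first suggestion would be to try rewriting the subquery as a JOIN. Something like this (untested):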

SELECT COUNT(*) AS total FROM mybigtable AS t1
 INNER JOIN 
   (SELECT DISTINCT email FROM mybigtable WHERE account_id=345) AS t2 
   ON t2.email=t1.email
WHERE account_id=123