How can I optimize this MySQL query further?


9

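I have a query that takes a particularly long time to run (15+ seconds), and it only gets worse as my dataset grows. I have optimized it in the past by adding indexes, code-level sorting and other optimizations, but it needs further refinement.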

SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM `sounds` 
INNER JOIN ratings ON sounds.id = ratings.rateable_id 
WHERE (ratings.rateable_type = 'Sound' 
   AND sounds.blacklisted = false 
   AND sounds.ready_for_deployment = true 
   AND sounds.deployed = true 
   AND sounds.type = "Sound" 
   AND sounds.created_at > "2011-03-26 21:25:49") 
GROUP BY ratings.rateable_id

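The goal of the query is to get the sound ids and the average rating of the most recent, released sounds. There are about 1,500 sounds and 2 million ratings.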

I have several indexes on sounds

mysql> show index from sounds;
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table  | Non_unique | Key_name                                 | Seq_in_index | Column_name          | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+
| sounds |          0 | PRIMARY                                  |            1 | id                   | A         |        1388 |     NULL | NULL   |      | BTREE      |         | 
| sounds |          1 | sounds_ready_for_deployment_and_deployed |            1 | deployed             | A         |           5 |     NULL | NULL   | YES  | BTREE      |         | 
| sounds |          1 | sounds_ready_for_deployment_and_deployed |            2 | ready_for_deployment | A         |          12 |     NULL | NULL   | YES  | BTREE      |         | 
| sounds |          1 | sounds_name                              |            1 | name                 | A         |        1388 |     NULL | NULL   |      | BTREE      |         | 
| sounds |          1 | sounds_description                       |            1 | description          | A         |        1388 |      128 | NULL   | YES  | BTREE      |         | 
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+

and several on ratings

mysql> show index from ratings;
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name                                | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| ratings |          0 | PRIMARY                                 |            1 | id          | A         |     2008251 |     NULL | NULL   |      | BTREE      |         | 
| ratings |          1 | index_ratings_on_rateable_id_and_rating |            1 | rateable_id | A         |          18 |     NULL | NULL   |      | BTREE      |         | 
| ratings |          1 | index_ratings_on_rateable_id_and_rating |            2 | rating      | A         |        9297 |     NULL | NULL   | YES  | BTREE      |         | 
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Here is the EXPLAIN

mysql> EXPLAIN SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM sounds INNER JOIN ratings ON sounds.id = ratings.rateable_id WHERE (ratings.rateable_type = 'Sound' AND sounds.blacklisted = false AND sounds.ready_for_deployment = true AND sounds.deployed = true AND sounds.type = "Sound" AND sounds.created_at > "2011-03-26 21:25:49") GROUP BY ratings.rateable_id;
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+-------------+
| id | select_type | table   | type   | possible_keys                                    | key                                     | key_len | ref                                     | rows    | Extra       |
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+-------------+
|  1 | SIMPLE      | ratings | index  | index_ratings_on_rateable_id_and_rating          | index_ratings_on_rateable_id_and_rating | 9       | NULL                                    | 2008306 | Using where | 
|  1 | SIMPLE      | sounds  | eq_ref | PRIMARY,sounds_ready_for_deployment_and_deployed | PRIMARY                                 | 4       | redacted_production.ratings.rateable_id |       1 | Using where | 
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+-------------+

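I do cache the results once they are obtained, so site performance is not much of an issue, but my cache warmers take longer and longer to run because this call takes so long, and that is starting to become a problem. This doesn't seem like a lot of numbers to crunch in one query...

What more can I do to make this perform better?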


Can you show the EXPLAIN output? EXPLAIN SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM sounds INNER JOIN ratings ON sounds.id = ratings.rateable_id WHERE (ratings.rateable_type = 'Sound' AND sounds.blacklisted = false AND sounds.ready_for_deployment = true AND sounds.deployed = true AND sounds.type = "Sound" AND sounds.created_at > "2011-03-26 21:25:49") GROUP BY ratings.rateable_id
Derek Downey

@coneybeare This was a very interesting challenge for me today! +1 for your question. I hope more questions like this show up in the near future.
RolandoMySQLDBA 2011

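@coneybeare It looks like the new EXPLAIN reads only 21,540 rows (359 x 60) instead of 2,008,306. Please run EXPLAIN on the query I originally suggested in my answer. I would like to see the number of rows that comes from it.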
RolandoMySQLDBA

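@RolandoMySQLDBA The new EXPLAIN does indeed show fewer rows with the index; however, the query still takes about 15 seconds to execute, so there is no improvement.
coneybeare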

@coneybeare I fine-tuned the query. Please run EXPLAIN on my new query. I have appended it to my answer.
RolandoMySQLDBA

Answers:


7

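After looking over the query, the tables, and the WHERE and GROUP BY clauses, I recommend the following:

Recommendation #1) Refactor the query

I reorganized the query to do three (3) things:

  1. Create smaller temp tables
  2. Process the WHERE clause on those temp tables
  3. Delay the join until the end

Here is my proposed query: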

SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.rating,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);

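Recommendation #2) Index the sounds table with an index that accommodates the WHERE clause

The index contains every column used in the WHERE clause, with the columns compared to static values first and the moving target (created_at) last.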

ALTER TABLE sounds ADD INDEX support_index
(blacklisted,ready_for_deployment,deployed,type,created_at);
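To illustrate the column ordering (equality columns first, the range column last), here is a minimal sketch using Python's standard-library sqlite3 module as a stand-in for MySQL. That substitution is an assumption made only so the example is self-contained: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and the table below contains only the columns the WHERE clause touches.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sounds (
    id INTEGER PRIMARY KEY,
    blacklisted INTEGER,
    ready_for_deployment INTEGER,
    deployed INTEGER,
    type TEXT,
    created_at TEXT
);
-- Same column order as the suggested support_index:
-- equality-tested columns first, the range-tested column last.
CREATE INDEX support_index ON sounds
    (blacklisted, ready_for_deployment, deployed, type, created_at);
""")

rows = conn.execute("""
EXPLAIN QUERY PLAN
SELECT id FROM sounds
WHERE blacklisted = 0
  AND ready_for_deployment = 1
  AND deployed = 1
  AND type = 'Sound'
  AND created_at > '2011-03-26 21:25:49'
""").fetchall()

# The last field of each plan row is the human-readable detail text.
plan = " ".join(row[3] for row in rows)
print(plan)
```

On MySQL itself, the equivalent check is to run EXPLAIN on the inner SELECT and look for support_index in the key column.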

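I sincerely believe you will be pleasantly surprised. Give it a try!!!

UPDATE 2011-05-21 19:04

I just saw the cardinality. Ouch! A cardinality of 1 for rateable_id. Boy, I feel stupid!

UPDATE 2011-05-21 19:20

Maybe creating the index will be enough to improve things.

UPDATE 2011-05-21 22:56

Please run the following: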

EXPLAIN SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.rating,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);

UPDATE 2011-05-21 23:34

I refactored it again. Please try this one:

EXPLAIN
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.rating,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
;

UPDATE 2011-05-21 23:55

I refactored it yet again. Please try this one (last time):

EXPLAIN
  SELECT A.id,avg(B.rating) AS avg_rating, count(B.rating) AS votes FROM
  (
    SELECT BB.* FROM
    (
      SELECT id FROM sounds
      WHERE blacklisted = false 
      AND   ready_for_deployment = true 
      AND   deployed = true 
      AND   type = "Sound" 
      AND   created_at > '2011-03-26 21:25:49'
    ) AA INNER JOIN sounds BB USING (id)
  ) A INNER JOIN
  (
    SELECT AAA.rating,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) B
  ON A.id = B.rateable_id
  GROUP BY B.rateable_id;

UPDATE 2011-05-22 00:12

I hate giving up!

EXPLAIN
  SELECT A.*,avg(B.rating) AS avg_rating, count(B.rating) AS votes FROM
  (
    SELECT BB.* FROM
    (
      SELECT id FROM sounds
      WHERE blacklisted = false 
      AND   ready_for_deployment = true 
      AND   deployed = true 
      AND   type = "Sound" 
      AND   created_at > '2011-03-26 21:25:49'
    ) AA INNER JOIN sounds BB USING (id)
  ) A,
  (
    SELECT AAA.rating,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
    AND AAA.rateable_id = A.id
  ) B
  GROUP BY B.rateable_id;

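UPDATE 2011-05-22 07:51

It bothers me that ratings comes back with 2 million rows in the EXPLAIN. Then it hit me. You may need another index on the ratings table, one that starts with rateable_type: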

ALTER TABLE ratings ADD INDEX
rateable_type_rateable_id_ndx (rateable_type,rateable_id);

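The goal of this index is to shrink the temp table that manipulates ratings so that it holds fewer than 2 million rows. If we can make that temp table significantly smaller (by at least half), there is better hope that both your query and mine will run faster.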
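As a rough illustration of an index that leads with rateable_type, the sketch below again uses Python's standard-library sqlite3 module in place of MySQL (an assumption for the sake of a runnable example; the planner, the plan output and the four sample rows are not from the original thread). It shows the aggregate over ratings being driven by the index instead of by a scan of every row.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    rateable_id INTEGER,
    rateable_type TEXT,
    rating INTEGER
);
CREATE INDEX rateable_type_rateable_id_ndx
    ON ratings (rateable_type, rateable_id);
""")

# Hypothetical sample data: three ratings for sounds, one for another type.
conn.executemany(
    "INSERT INTO ratings (rateable_id, rateable_type, rating) VALUES (?, ?, ?)",
    [(1, "Sound", 4), (1, "Sound", 2), (2, "Sound", 5), (1, "Comment", 1)],
)

query = """
SELECT rateable_id, AVG(rating) AS avg_rating, COUNT(rating) AS votes
FROM ratings
WHERE rateable_type = 'Sound'
GROUP BY rateable_id
ORDER BY rateable_id
"""

# The last field of each plan row is the human-readable detail text.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
result = conn.execute(query).fetchall()
print(plan)
print(result)
```

Whether MySQL chooses the index on a 2-million-row table depends on how selective rateable_type is; if nearly every row has the value 'Sound', the optimizer may still read most of the table.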

After creating that index, please retry the query I originally proposed, repeated here:

SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.rating,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);

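UPDATE 2011-05-22 18:39 : FINAL WORDS

I had refactored a query in a stored procedure and added an index to help answer a question about speeding things up. I got 6 upvotes, had the answer accepted, and picked up a 200-point bounty.

I had also refactored another query (marginal results) and added an index (dramatic results). I got 2 upvotes and had the answer accepted.

I added an index for yet another query challenge and was upvoted once.

And now your question.

My desire to answer all questions like these (including yours) was inspired by a YouTube video I watched on refactoring queries.

Thank you again, @coneybeare! I wanted to answer this question as fully as possible, not just collect points or accolades. Now I can feel that I earned the points!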


I added the index, but there was no improvement in time. Here is the new EXPLAIN: cloud.coneybeare.net/6y7c
coneybeare

EXPLAIN for the Recommendation #1 query: cloud.coneybeare.net/6xZ2 It took about 30 seconds to run this query
coneybeare 2011

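For some reason I did have to modify your syntax a bit (I added a FROM before the first subquery, and had to get rid of the AAA alias). Here is the EXPLAIN: cloud.coneybeare.net/6xlq The actual query took about 30 seconds to run
coneybeare 2011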

@RolandoMySQLDBA: EXPLAIN for your 23:55 update: cloud.coneybeare.net/6wrN The actual query ran for over a minute, so I killed the process
coneybeare

The second inner SELECT cannot access the A derived table, so A.id throws an error.
coneybeare 2011

3

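Thanks for the EXPLAIN output. As you can tell from it, the reason the query takes so long is the full scan on the ratings table. Nothing in the WHERE clause filters down the 2 million rows.

You could add an index on ratings.rateable_type, but my guess is that the cardinality will be really low and you will still scan quite a few rows in ratings.

Alternatively, you can try using index hints to force MySQL to use the sounds indexes.

Updated:

If it were me, I would add an index on sounds.created_at, since it has the best chance of filtering the rows and will probably push the MySQL query optimizer to use the sounds table indexes. Just beware of queries that use long created time frames (1 year, 3 months; it depends on the size of the sounds table).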


It seems your suggestion was useful to @coneybeare. +1 from me as well.
RolandoMySQLDBA 2011

The index on created_at did not shave off any time. Here is the updated EXPLAIN: cloud.coneybeare.net/6xvc
coneybeare

2

If this has to be an "on the fly" query, that limits your options a bit.

I am going to suggest dividing and conquering this problem.

--
-- Create a temporary table (add ENGINE=MEMORY if it must live in memory)
CREATE TEMPORARY TABLE rating_aggregates (
rateable_id INT,
avg_rating DECIMAL(10,4), -- plain NUMERIC means DECIMAL(10,0) in MySQL and would round the average
votes INT
);
--
-- For now, just aggregate. 
INSERT INTO rating_aggregates
SELECT ratings.rateable_id, 
avg(ratings.rating) AS avg_rating, 
count(ratings.rating) AS votes FROM ratings  
WHERE ratings.rateable_type = 'Sound' 
GROUP BY ratings.rateable_id;
--
-- Now get your final product --
SELECT 
sounds.*, 
rating_aggregates.avg_rating, 
rating_aggregates.votes AS votes,
rating_aggregates.rateable_id 
FROM rating_aggregates 
INNER JOIN sounds ON (sounds.id = rating_aggregates.rateable_id) 
WHERE 
   sounds.blacklisted = false 
   AND sounds.ready_for_deployment = true 
   AND sounds.deployed = true 
   AND sounds.type = "Sound" 
   AND sounds.created_at > "2011-03-26 21:25:49";

It seems @coneybeare saw something in your suggestion. +1 from me!
RolandoMySQLDBA 2011

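I actually could not get this to work. I was getting SQL errors that I was unsure how to approach. I have never really used temporary tables.
coneybeare 2011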

I did get it to run eventually (I had to add FROM sounds, ratings to the middle query), but it locked up my SQL box and I had to kill the process.
coneybeare 2011

0

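Use JOINs, not subqueries. Did any of your subquery attempts help?

SHOW CREATE TABLE sounds \G

SHOW CREATE TABLE ratings \G

It is often beneficial to have "compound" indexes rather than single-column ones. Perhaps INDEX(type, created_at)

You are filtering on both tables in a JOIN; that is likely to be a performance problem.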

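There are about 1,500 sounds and 2 million ratings.

I recommend that you have an auto_increment id on ratings, build a summary table, and use the auto_increment id to keep track of where you "left off". However, do not store averages in a summary table: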

avg(ratings.rating) AS avg_rating,

Instead, keep SUM(ratings.rating). An average of averages is mathematically incorrect for computing an average; (sum of sums) / (sum of counts) is correct.
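The summary-table idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard-library sqlite3 module instead of MySQL, and the names rating_summary, summary_state, last_id and refresh_summary are hypothetical, not from the original thread. Each refresh folds in only the ratings added since the last processed id and stores sums and counts, never averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    rateable_id INTEGER NOT NULL,
    rating INTEGER NOT NULL
);
-- Hypothetical summary table: one row per sound, holding sum and count.
CREATE TABLE rating_summary (
    rateable_id INTEGER PRIMARY KEY,
    rating_sum INTEGER NOT NULL,
    votes INTEGER NOT NULL
);
-- Hypothetical bookmark: the highest ratings.id already summarized.
CREATE TABLE summary_state (last_id INTEGER NOT NULL);
INSERT INTO summary_state VALUES (0);
""")

def refresh_summary(conn):
    """Fold ratings added since the last refresh into rating_summary."""
    last_id = conn.execute("SELECT last_id FROM summary_state").fetchone()[0]
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM ratings").fetchone()[0]
    batch = conn.execute(
        "SELECT rateable_id, SUM(rating), COUNT(rating) FROM ratings "
        "WHERE id > ? AND id <= ? GROUP BY rateable_id",
        (last_id, max_id),
    ).fetchall()
    for rateable_id, batch_sum, batch_votes in batch:
        cur = conn.execute(
            "UPDATE rating_summary SET rating_sum = rating_sum + ?, "
            "votes = votes + ? WHERE rateable_id = ?",
            (batch_sum, batch_votes, rateable_id),
        )
        if cur.rowcount == 0:  # first ratings seen for this sound
            conn.execute(
                "INSERT INTO rating_summary VALUES (?, ?, ?)",
                (rateable_id, batch_sum, batch_votes),
            )
    conn.execute("UPDATE summary_state SET last_id = ?", (max_id,))
    conn.commit()

insert = "INSERT INTO ratings (rateable_id, rating) VALUES (?, ?)"
conn.executemany(insert, [(1, 5), (1, 5), (1, 5)])  # first batch, average 5
refresh_summary(conn)
conn.executemany(insert, [(1, 1)])                  # second batch, average 1
refresh_summary(conn)

summary_avg = conn.execute(
    "SELECT rating_sum * 1.0 / votes FROM rating_summary WHERE rateable_id = 1"
).fetchone()[0]
direct_avg = conn.execute(
    "SELECT AVG(rating) FROM ratings WHERE rateable_id = 1"
).fetchone()[0]
print(summary_avg, direct_avg)
```

In this sample the two batch averages are 5 and 1, so averaging them would give 3, while the true average of the four ratings is 16 / 4 = 4, which is what the stored sum and count reproduce.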

Licensed under cc by-sa 3.0 with attribution required.