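Clearly there are many different ways to get the same result; your question seems to be what an efficient way is to get the last row of each group in MySQL. If you are working with huge amounts of data, and assuming you use InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.
We sometimes need to do this on tables with more than 60 million rows.
For these examples I will use data with only about 1.5 million rows, where the queries need to find results for all groups in the data. In our actual cases we would often need to return data from about 2,000 groups (which, hypothetically, should not require examining very much of the data).
I will use the following tables: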
CREATE TABLE temperature(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
groupID INT UNSIGNED NOT NULL,
recordedTimestamp TIMESTAMP NOT NULL,
recordedValue INT NOT NULL,
INDEX groupIndex(groupID, recordedTimestamp),
PRIMARY KEY (id)
);
CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));
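The temperature table is populated with about 1.5 million random records spread over 100 different groups. The selected_group table is populated with those 100 groups (in our real cases this would normally be less than 20% of all groups).
Because the data is random, multiple rows can share the same recordedTimestamp. What we want is a list of all the selected groups, ordered by groupID, with the row holding the last recordedTimestamp for each group; if a group has more than one row with that timestamp, we want the one with the highest id.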
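To make the requirement concrete, here is a minimal Python sketch of the intended semantics on a handful of rows. The sample rows and the helper name `last_row_per_group` are made up for illustration; they are not part of the benchmark data.

```python
from datetime import datetime

# Hypothetical toy rows: (id, groupID, recordedTimestamp, recordedValue).
rows = [
    (1, 1, datetime(2018, 1, 1), 10),
    (2, 1, datetime(2018, 1, 2), 11),
    (3, 1, datetime(2018, 1, 2), 12),  # same timestamp as id 2; higher id wins
    (4, 2, datetime(2018, 1, 3), 20),
    (5, 2, datetime(2018, 1, 1), 21),
    (6, 3, datetime(2018, 1, 5), 30),  # group 3 is not selected
]
selected_groups = {1, 2}


def last_row_per_group(rows, selected_groups):
    """Return one row per selected group: latest timestamp, ties broken by highest id."""
    best = {}
    for row in rows:
        row_id, group_id, timestamp, _value = row
        if group_id not in selected_groups:
            continue
        current = best.get(group_id)
        # Compare (timestamp, id) pairs so that id only matters on a timestamp tie.
        if current is None or (timestamp, row_id) > (current[2], current[0]):
            best[group_id] = row
    # Results are ordered by groupID, as the requirement asks.
    return [best[group_id] for group_id in sorted(best)]


result = last_row_per_group(rows, selected_groups)
for row in result:
    print(row)
```

Every query in this answer is an attempt to get this same result out of MySQL; they differ only in how much work the server does to find it.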
If, hypothetically, MySQL had a last() function that returned values from the last row according to a special ORDER BY clause, we could simply do:
SELECT
last(t1.id) AS id,
t1.groupID,
last(t1.recordedTimestamp) AS recordedTimestamp,
last(t1.recordedValue) AS recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
ORDER BY t1.recordedTimestamp, t1.id
GROUP BY t1.groupID;
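This would only need to examine a few hundred rows in this case, because it does not use any of the normal GROUP BY functions. It would execute in roughly 0 seconds and hence be highly efficient. Note that in MySQL the ORDER BY clause normally follows the GROUP BY clause, but here the ORDER BY clause determines the order used by the last() function; if it came after the GROUP BY, it would order the groups instead. If no GROUP BY clause were present, the last values would be the same in all of the returned rows.
However, MySQL does not have this function, so let's look at different approaches using what it does have and show that none of them is efficient.
Example 1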
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
SELECT t2.id
FROM temperature t2
WHERE t2.groupID = g.id
ORDER BY t2.recordedTimestamp DESC, t2.id DESC
LIMIT 1
);
This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21, and longer on 8.0.4-rc.
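As a quick sanity check that the correlated-subquery form returns the right rows, including the tie-break on id, here is a small sketch that runs the same query shape on a toy dataset. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are invented, so it says nothing about performance; it only illustrates what the query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the schema above (types simplified; this is not MySQL).
conn.executescript("""
CREATE TABLE temperature(
    id INTEGER PRIMARY KEY,
    groupID INTEGER NOT NULL,
    recordedTimestamp TEXT NOT NULL,
    recordedValue INTEGER NOT NULL
);
CREATE INDEX groupIndex ON temperature(groupID, recordedTimestamp);
CREATE TABLE selected_group(id INTEGER PRIMARY KEY);
""")

# Hypothetical sample rows; ids 2 and 3 share a timestamp within group 1.
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01 00:00:00", 10),
        (2, 1, "2018-01-02 00:00:00", 11),
        (3, 1, "2018-01-02 00:00:00", 12),
        (4, 2, "2018-01-03 00:00:00", 20),
        (5, 2, "2018-01-01 00:00:00", 21),
        (6, 3, "2018-01-05 00:00:00", 30),  # group 3 is not selected
    ],
)
conn.executemany("INSERT INTO selected_group VALUES (?)", [(1,), (2,)])

# Same shape as Example 1, with an outer ORDER BY added for a stable result order.
query = """
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
    SELECT t2.id
    FROM temperature t2
    WHERE t2.groupID = g.id
    ORDER BY t2.recordedTimestamp DESC, t2.id DESC
    LIMIT 1
)
ORDER BY t1.groupID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Group 1 resolves to id 3 rather than id 2 because the inner ORDER BY falls back to id when the timestamps are equal.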
Example 2
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
INNER JOIN (
SELECT max(t2.id) AS id
FROM temperature t2
INNER JOIN (
SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
FROM selected_group g
INNER JOIN temperature t3 ON t3.groupID = g.id
GROUP BY t3.groupID
) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
GROUP BY t2.groupID
) t5 ON t5.id = t1.id;
This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21, and longer on 8.0.4-rc.
Example 3
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
WHERE t1.id IN (
SELECT max(t2.id) AS id
FROM temperature t2
INNER JOIN (
SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
FROM selected_group g
INNER JOIN temperature t3 ON t3.groupID = g.id
GROUP BY t3.groupID
) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
GROUP BY t2.groupID
)
ORDER BY t1.groupID;
This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21, and longer on 8.0.4-rc.
Example 4
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
SELECT max(t2.id)
FROM temperature t2
WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
SELECT max(t3.recordedTimestamp)
FROM temperature t3
WHERE t3.groupID = g.id
)
);
This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21, and longer on 8.0.4-rc.
Example 5
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
SELECT
t2.id,
t2.groupID,
t2.recordedTimestamp,
t2.recordedValue,
row_number() OVER (
PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
) AS rowNumber
FROM selected_group g
INNER JOIN temperature t2 ON t2.groupID = g.id
) t1 WHERE t1.rowNumber = 1;
This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc.
Example 6
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
SELECT
last_value(t2.id) OVER w AS id,
t2.groupID,
last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp,
last_value(t2.recordedValue) OVER w AS recordedValue
FROM selected_group g
INNER JOIN temperature t2 ON t2.groupID = g.id
WINDOW w AS (
PARTITION BY t2.groupID
ORDER BY t2.recordedTimestamp, t2.id
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
) t1
GROUP BY t1.groupID;
This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc.
Example 7
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
LEFT JOIN temperature t2
ON t2.groupID = g.id
AND (
t2.recordedTimestamp > t1.recordedTimestamp
OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
)
WHERE t2.id IS NULL
ORDER BY t1.groupID;
This took forever, so I had to kill it.
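A rough back-of-the-envelope estimate suggests why the self-join is so slow. The sketch below assumes the 1.5 million rows are spread evenly over the 100 groups and that the join ends up considering every pair of rows within a group; both are simplifying assumptions, and the optimizer may avoid some of that work, so treat the figure as an upper bound rather than a measurement.

```python
# Figures taken from the test setup described above.
total_rows = 1_500_000
group_count = 100

# Assumption: rows are evenly distributed across groups.
rows_per_group = total_rows // group_count

# Assumption: each row in a group is compared against every row in the same group.
pairs_per_group = rows_per_group ** 2
total_pairs = pairs_per_group * group_count

print(f"rows per group:  {rows_per_group:,}")
print(f"pairs per group: {pairs_per_group:,}")
print(f"total pairs:     {total_pairs:,}")
```

Under these assumptions the join has on the order of 22.5 billion candidate row pairs, compared with the roughly 6 million rows examined by the window-function queries.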