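Clearly there are many different ways of getting the same results; your question seems to be what an efficient way of getting the last result in each group in MySQL is. If you are working with huge amounts of data, and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.
We sometimes need to do this with tables that have more than 60 million rows.
For these examples I will use data with only about 1.5 million rows, where the queries need to find results for all groups in the data. In our actual cases we would often need to return data from about 2,000 groups (which hypothetically would not require examining very much of the data).
I will use the following tables: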
CREATE TABLE temperature(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT, 
  groupID INT UNSIGNED NOT NULL, 
  recordedTimestamp TIMESTAMP NOT NULL, 
  recordedValue INT NOT NULL,
  INDEX groupIndex(groupID, recordedTimestamp), 
  PRIMARY KEY (id)
);
CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id)); 
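The temperature table is populated with about 1.5 million random records across 100 different groups. The selected_group table is populated with those 100 groups (in our real cases this would normally be less than 20% of all groups).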
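The answer does not show how the test data was generated, so the following is only a sketch of one way to produce a comparable data set. The helper table `digit`, the value ranges, and the one-year timestamp window are all assumptions made for illustration, not the setup used for the timings below.

```sql
-- Hypothetical helper table holding the digits 0..9 (not part of the original answer).
-- It is a regular table because MySQL cannot reference a TEMPORARY table
-- more than once in the same query.
CREATE TABLE digit (d TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO digit (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Cross-joining five digit tables gives 100,000 combinations; the WHERE clause
-- keeps 15 (f, g) pairs, so 1,500,000 rows are inserted in total.
INSERT INTO temperature (groupID, recordedTimestamp, recordedValue)
SELECT
  FLOOR(1 + RAND() * 100),                                -- random group 1..100
  FROM_UNIXTIME(1483228800 + FLOOR(RAND() * 31536000)),   -- random second within one year (assumed window)
  FLOOR(RAND() * 100)                                     -- arbitrary reading 0..99
FROM digit a
CROSS JOIN digit b
CROSS JOIN digit c
CROSS JOIN digit d
CROSS JOIN digit e
CROSS JOIN digit f
CROSS JOIN digit g
WHERE g.d = 0 OR (g.d = 1 AND f.d < 5);

-- Select every group that actually received rows.
INSERT INTO selected_group (id)
SELECT DISTINCT groupID FROM temperature;
```

Because the timestamps are drawn at random with one-second resolution, some rows within a group will probably share a timestamp, which is the tie case the queries below have to handle.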
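Because this data is random, multiple rows can have the same recordedTimestamp. What we want is a list of all the selected groups, in order of groupID, with the row holding the last recordedTimestamp for each group; if a group has more than one row with that timestamp, we want the one with the highest id.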
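To make the tie-breaking rule concrete, here is a small sketch for a single group. The sample rows and the group number 7 are hypothetical and are not taken from the data set used in this answer.

```sql
-- Hypothetical rows for one group:
--   id | groupID | recordedTimestamp   | recordedValue
--    1 |       7 | 2018-01-01 10:00:00 | 20
--    2 |       7 | 2018-01-01 12:00:00 | 21
--    3 |       7 | 2018-01-01 12:00:00 | 22
-- Rows 2 and 3 tie on the latest timestamp, so the row we want is id = 3.
SELECT id, groupID, recordedTimestamp, recordedValue
FROM temperature
WHERE groupID = 7
ORDER BY recordedTimestamp DESC, id DESC   -- latest timestamp first, highest id breaks ties
LIMIT 1;
```

The goal of the rest of this answer is to get that same row for every selected group in one query.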
If, hypothetically, MySQL had a last() function that returned values from the last row according to a special ORDER BY clause, then we could simply do:
SELECT 
  last(t1.id) AS id, 
  t1.groupID, 
  last(t1.recordedTimestamp) AS recordedTimestamp, 
  last(t1.recordedValue) AS recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
ORDER BY t1.recordedTimestamp, t1.id
GROUP BY t1.groupID;
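In this case it would only need to examine a few hundred rows, because it uses none of the normal GROUP BY functions. It would execute in effectively 0 seconds and so be highly efficient. Note that in MySQL an ORDER BY clause normally follows the GROUP BY clause; here, however, the ORDER BY clause determines the order seen by the last() function. If it came after the GROUP BY, it would be ordering the groups instead. If no GROUP BY clause were present, the last values would be the same in every returned row.
However, MySQL does not have this function, so let's look at different approaches using what it does have and show that none of them is efficient.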
Example 1
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
  SELECT t2.id
  FROM temperature t2 
  WHERE t2.groupID = g.id
  ORDER BY t2.recordedTimestamp DESC, t2.id DESC
  LIMIT 1
);
This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21, and longer on 8.0.4-rc.
Example 2
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM temperature t1
INNER JOIN ( 
  SELECT max(t2.id) AS id   
  FROM temperature t2
  INNER JOIN (
    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
    FROM selected_group g
    INNER JOIN temperature t3 ON t3.groupID = g.id
    GROUP BY t3.groupID
  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
  GROUP BY t2.groupID
) t5 ON t5.id = t1.id;
This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21, and longer on 8.0.4-rc.
Example 3
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM temperature t1
WHERE t1.id IN ( 
  SELECT max(t2.id) AS id   
  FROM temperature t2
  INNER JOIN (
    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
    FROM selected_group g
    INNER JOIN temperature t3 ON t3.groupID = g.id
    GROUP BY t3.groupID
  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
  GROUP BY t2.groupID
)
ORDER BY t1.groupID;
This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21, and longer on 8.0.4-rc.
Example 4
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
  SELECT max(t2.id)
  FROM temperature t2 
  WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
      SELECT max(t3.recordedTimestamp)
      FROM temperature t3 
      WHERE t3.groupID = g.id
    )
);
This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21, and longer on 8.0.4-rc.
Example 5
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
  SELECT 
    t2.id, 
    t2.groupID, 
    t2.recordedTimestamp, 
    t2.recordedValue, 
    row_number() OVER (
      PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
    ) AS rowNumber
  FROM selected_group g 
  INNER JOIN temperature t2 ON t2.groupID = g.id
) t1 WHERE t1.rowNumber = 1;
This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc.
Example 6
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM (
  SELECT 
    last_value(t2.id) OVER w AS id, 
    t2.groupID, 
    last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp, 
    last_value(t2.recordedValue) OVER w AS recordedValue
  FROM selected_group g
  INNER JOIN temperature t2 ON t2.groupID = g.id
  WINDOW w AS (
    PARTITION BY t2.groupID 
    ORDER BY t2.recordedTimestamp, t2.id 
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
) t1
GROUP BY t1.groupID;
This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc.
Example 7
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
LEFT JOIN temperature t2 
  ON t2.groupID = g.id 
  AND (
    t2.recordedTimestamp > t1.recordedTimestamp 
    OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
  )
WHERE t2.id IS NULL
ORDER BY t1.groupID;
This one was taking forever, so I had to kill it.