Benchmark
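I tested the most interesting candidates with Postgres 9.4 and 9.5 on a halfway realistic table of 200k rows in purchases and 10k distinct customer_id (avg. 20 rows per customer).
For Postgres 9.5 I ran a second test with effectively 86446 distinct customers (avg. 2.3 rows per customer); see below.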
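The distinct counts and averages above are roughly what uniform random draws predict. As a rough check, assume customer_id is drawn uniformly from about M integers (M ≈ 10,000 in the first setup; M ≈ 100,000 in the second, where customer_id comes from random() * 100000) for n ≈ 200,000 surviving rows. The approximation ignores that rounding with ::int makes the two endpoint values half as likely. Each value is missing from all n draws with probability (1 - 1/M)^n, so the expected number of distinct customers D is:

$$
E[D] = M\left(1 - \left(1 - \frac{1}{M}\right)^{n}\right) \approx M\left(1 - e^{-n/M}\right)
$$

$$
M = 10^5:\quad E[D] \approx 10^5\,(1 - e^{-2}) \approx 86\,466, \qquad \frac{2 \times 10^5}{86\,446} \approx 2.3 \text{ rows per customer}
$$

$$
M = 10^4:\quad E[D] \approx 10^4\,(1 - e^{-20}) \approx 10^4, \qquad \frac{2 \times 10^5}{10^4} = 20 \text{ rows per customer}
$$

Both estimates are close to the reported 86446 distinct customers and the averages of 2.3 and 20 rows per customer.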
Setup
Main table
CREATE TABLE purchases (
id serial
, customer_id int -- REFERENCES customer
, total int -- could be amount of money in Cent
, some_column text -- to make the row bigger, more realistic
);
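I use a serial (PK constraint added below) and an integer customer_id, since that's the more typical setup. I also added some_column to make up for the typically larger number of columns.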
Dummy data, PK, index (a typical table also has some dead tuples):
INSERT INTO purchases (customer_id, total, some_column) -- insert 200k rows
SELECT (random() * 10000)::int AS customer_id -- 10k customers
, (random() * random() * 100000)::int AS total
, 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM generate_series(1,200000) g;
ALTER TABLE purchases ADD CONSTRAINT purchases_id_pkey PRIMARY KEY (id);
DELETE FROM purchases WHERE random() > 0.9; -- some dead rows
INSERT INTO purchases (customer_id, total, some_column)
SELECT (random() * 10000)::int AS customer_id -- 10k customers
, (random() * random() * 100000)::int AS total
, 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM generate_series(1,20000) g; -- add 20k to make it ~ 200k
CREATE INDEX purchases_3c_idx ON purchases (customer_id, total DESC, id);
VACUUM ANALYZE purchases;
customer table, used for the superior query:
CREATE TABLE customer AS
SELECT customer_id, 'customer_' || customer_id AS customer
FROM purchases
GROUP BY 1
ORDER BY 1;
ALTER TABLE customer ADD CONSTRAINT customer_customer_id_pkey PRIMARY KEY (customer_id);
VACUUM ANALYZE customer;
In my second test for 9.5 I used the same setup, but generated customer_id with random() * 100000 to get only a few rows per customer_id.
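A sketch of that second setup, assuming (as the sentence above says) that it differs from the first only in the multiplier used for customer_id. It drops and recreates both tables, so run it in a scratch database.

```sql
-- Sketch of the second 9.5 setup (assumption: identical to the setup above
-- except that customer_id is drawn from random() * 100000).
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS purchases;

CREATE TABLE purchases (
  id          serial  -- PK constraint added below
, customer_id int
, total       int
, some_column text
);

INSERT INTO purchases (customer_id, total, some_column)       -- 200k rows
SELECT (random() * 100000)::int            AS customer_id     -- ~100k possible ids
     , (random() * random() * 100000)::int AS total
     , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,200000) g;

ALTER TABLE purchases ADD CONSTRAINT purchases_id_pkey PRIMARY KEY (id);

DELETE FROM purchases WHERE random() > 0.9;                   -- some dead rows

INSERT INTO purchases (customer_id, total, some_column)       -- add ~20k back
SELECT (random() * 100000)::int
     , (random() * random() * 100000)::int
     , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,20000) g;

CREATE INDEX purchases_3c_idx ON purchases (customer_id, total DESC, id);
VACUUM ANALYZE purchases;

-- customer table rebuilt from the new data, one row per distinct customer_id
CREATE TABLE customer AS
SELECT customer_id, 'customer_' || customer_id AS customer
FROM   purchases
GROUP  BY 1
ORDER  BY 1;

ALTER TABLE customer ADD CONSTRAINT customer_customer_id_pkey PRIMARY KEY (customer_id);
VACUUM ANALYZE customer;
```

With about 200k rows spread over ~100k possible ids, this yields roughly 86k distinct customers and about 2.3 rows per customer.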
Object sizes for table purchases
Generated with this query:
what | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
core_relation_size | 20496384 | 20 MB | 102
visibility_map | 0 | 0 bytes | 0
free_space_map | 24576 | 24 kB | 0
table_size_incl_toast | 20529152 | 20 MB | 102
indexes_size | 10977280 | 10 MB | 54
total_size_incl_toast_and_indexes | 31506432 | 30 MB | 157
live_rows_in_text_representation | 13729802 | 13 MB | 68
------------------------------ | | |
row_count | 200045 | |
live_tuples | 200045 | |
dead_tuples | 19955 | |
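The size query itself is not reproduced here. As a sketch only (not necessarily the query used for the table above), the same metrics can be collected with PostgreSQL's built-in size and statistics functions; it assumes the purchases table from the setup.

```sql
-- Sketch (assumption): collects the metrics shown above with standard
-- admin functions; not necessarily the query the author ran.
SELECT l.what
     , l.nr                                                AS "bytes/ct"
     , CASE WHEN l.is_size THEN pg_size_pretty(l.nr) END   AS bytes_pretty
     , CASE WHEN l.is_size THEN l.nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM  (
   SELECT 'purchases'::regclass AS tbl      -- table to inspect
        , count(*)              AS ct       -- visible (live) rows
        , sum(length(t::text))  AS txt_len  -- whole rows as text, in characters
   FROM   purchases t
   ) x
CROSS  JOIN LATERAL (
   VALUES
     (true , 'core_relation_size'               , pg_relation_size(x.tbl))
   , (true , 'visibility_map'                   , pg_relation_size(x.tbl, 'vm'))
   , (true , 'free_space_map'                   , pg_relation_size(x.tbl, 'fsm'))
   , (true , 'table_size_incl_toast'            , pg_table_size(x.tbl))
   , (true , 'indexes_size'                     , pg_indexes_size(x.tbl))
   , (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(x.tbl))
   , (true , 'live_rows_in_text_representation' , x.txt_len)
   , (false, 'row_count'                        , x.ct)
   , (false, 'live_tuples'                      , pg_stat_get_live_tuples(x.tbl))
   , (false, 'dead_tuples'                      , pg_stat_get_dead_tuples(x.tbl))
   ) l(is_size, what, nr);
```

pg_total_relation_size() equals pg_table_size() plus pg_indexes_size(), which matches the numbers above: 20529152 + 10977280 = 31506432.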
Queries
1. row_number() in CTE (see other answer)
WITH cte AS (
SELECT id, customer_id, total
, row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
FROM purchases
)
SELECT id, customer_id, total
FROM cte
WHERE rn = 1;
2. row_number() in subquery (my optimization)
SELECT id, customer_id, total
FROM (
SELECT id, customer_id, total
, row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
FROM purchases
) sub
WHERE rn = 1;
3. DISTINCT ON (see other answer)
SELECT DISTINCT ON (customer_id)
id, customer_id, total
FROM purchases
ORDER BY customer_id, total DESC, id;
4. rCTE with LATERAL subquery (see here)
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT id, customer_id, total
FROM purchases
ORDER BY customer_id, total DESC
LIMIT 1
)
UNION ALL
SELECT u.*
FROM cte c
, LATERAL (
SELECT id, customer_id, total
FROM purchases
WHERE customer_id > c.customer_id -- lateral reference
ORDER BY customer_id, total DESC
LIMIT 1
) u
)
SELECT id, customer_id, total
FROM cte
ORDER BY customer_id;
5. customer table with LATERAL (see here)
SELECT l.*
FROM customer c
, LATERAL (
SELECT id, customer_id, total
FROM purchases
WHERE customer_id = c.customer_id -- lateral reference
ORDER BY total DESC
LIMIT 1
) l;
6. array_agg() with ORDER BY (see other answer)
SELECT (array_agg(id ORDER BY total DESC))[1] AS id
, customer_id
, max(total) AS total
FROM purchases
GROUP BY customer_id;
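Before comparing timings, it can help to confirm that the candidates return equivalent results. A minimal sketch, assuming the benchmark table above: it compares (customer_id, total) from DISTINCT ON with a plain max() per customer. ids are left out on purpose, because queries that order by total DESC alone may pick a different row when several rows tie for the top total.

```sql
-- Sanity check (assumption: run against the benchmark table above).
-- Both EXCEPT ALL directions should return no rows.
WITH q3 AS (                          -- query 3, reduced to comparable columns
   SELECT DISTINCT ON (customer_id)
          customer_id, total
   FROM   purchases
   ORDER  BY customer_id, total DESC, id
   )
, q_max AS (                          -- top total per customer via GROUP BY
   SELECT customer_id, max(total) AS total
   FROM   purchases
   GROUP  BY customer_id
   )
SELECT 'only_in_q3' AS side, d.*
FROM  (SELECT * FROM q3 EXCEPT ALL SELECT * FROM q_max) d
UNION ALL
SELECT 'only_in_q_max', d.*
FROM  (SELECT * FROM q_max EXCEPT ALL SELECT * FROM q3) d;
```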
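Results
Execution time for the above queries with EXPLAIN ANALYZE (all other options off), best of 5 runs.
All queries used an Index Only Scan on purchases_3c_idx (among other steps). Some of them benefit merely from the smaller size of the index; others use it more effectively.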
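For reproduction, a sketch of how one candidate (query 3) might be timed. It assumes "all other options off" means disabling cost estimates, per-node timing, buffer statistics and verbose output, as below; the query is repeated about 5 times and the best execution time kept.

```sql
-- Timing sketch (assumption about which EXPLAIN options were switched off).
-- Run several times; keep the lowest execution time reported at the bottom.
EXPLAIN (ANALYZE, VERBOSE OFF, COSTS OFF, BUFFERS OFF, TIMING OFF)
SELECT DISTINCT ON (customer_id)
       id, customer_id, total
FROM   purchases
ORDER  BY customer_id, total DESC, id;
```

TIMING OFF only drops per-node timing, which reduces measurement overhead; the total execution time is still reported.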
A. Postgres 9.4 with 200k rows and ~20 rows per customer_id
1. 273.274 ms
2. 194.572 ms
3. 111.067 ms
4. 92.922 ms
5. 37.679 ms -- winner
6. 189.495 ms
B. Same with Postgres 9.5
1. 288.006 ms
2. 223.032 ms
3. 107.074 ms
4. 78.032 ms
5. 33.944 ms -- winner
6. 211.540 ms
C. Same as B., but with ~2.3 rows per customer_id
1. 381.573 ms
2. 311.976 ms
3. 124.074 ms -- winner
4. 710.631 ms
5. 311.976 ms
6. 421.679 ms
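Related benchmarks
Here is a new one by "ogr", testing with 10M rows and 60k unique "customers" on Postgres 11.5 (current as of Sep. 2019). The results are still in line with what we have seen so far.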
Original (outdated) benchmark from 2011
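I ran three tests with PostgreSQL 9.1 on a real-life table of 65579 rows, with single-column btree indexes on each of the three columns involved, and took the best execution time of 5 runs.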
Comparing @OMGPonies' first query (A) with the above DISTINCT ON solution (B):
1. Select the whole table, resulting in 5958 rows in this case.
A: 567.218 ms
B: 386.673 ms
2. Use the condition WHERE customer BETWEEN x AND y, resulting in 1000 rows.
A: 249.136 ms
B: 55.111 ms
3. Select a single customer with WHERE customer = x.
A: 0.143 ms
B: 0.072 ms
Same tests repeated with the index described in the other answer:
CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);
1A: 277.953 ms
1B: 193.547 ms
2A: 249.796 ms -- special index not used
2B: 28.679 ms
3A: 0.120 ms
3B: 0.048 ms