Very slow simple JOIN query

12

Simple database structure (for an online forum):

CREATE TABLE users (
    id integer NOT NULL PRIMARY KEY,
    username text
);
CREATE INDEX ON users (username);

CREATE TABLE posts (
    id integer NOT NULL PRIMARY KEY,
    thread_id integer NOT NULL REFERENCES threads (id),
    user_id integer NOT NULL REFERENCES users (id),
    date timestamp without time zone NOT NULL,
    content text
);
CREATE INDEX ON posts (thread_id);
CREATE INDEX ON posts (user_id);

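There are around 80k entries in the users table and 2.6 million entries in the posts table. This simple query to get the top 100 users by post count takes 2.4 seconds: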

EXPLAIN ANALYZE SELECT u.id, u.username, COUNT(p.id) AS PostCount FROM users u
                    INNER JOIN posts p on p.user_id = u.id
                    WHERE u.username IS NOT NULL
                    GROUP BY u.id
ORDER BY PostCount DESC LIMIT 100;

Limit  (cost=316926.14..316926.39 rows=100 width=20) (actual time=2326.812..2326.830 rows=100 loops=1)
  ->  Sort  (cost=316926.14..317014.83 rows=35476 width=20) (actual time=2326.809..2326.820 rows=100 loops=1)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 32kB
        ->  HashAggregate  (cost=315215.51..315570.27 rows=35476 width=20) (actual time=2311.296..2321.739 rows=34608 loops=1)
              Group Key: u.id
              ->  Hash Join  (cost=1176.89..308201.88 rows=1402727 width=16) (actual time=16.538..1784.546 rows=1910831 loops=1)
                    Hash Cond: (p.user_id = u.id)
                    ->  Seq Scan on posts p  (cost=0.00..286185.34 rows=1816634 width=8) (actual time=0.103..1144.681 rows=2173916 loops=1)
                    ->  Hash  (cost=733.44..733.44 rows=35476 width=12) (actual time=15.763..15.763 rows=34609 loops=1)
                          Buckets: 65536  Batches: 1  Memory Usage: 2021kB
                          ->  Seq Scan on users u  (cost=0.00..733.44 rows=35476 width=12) (actual time=0.033..6.521 rows=34609 loops=1)
                                Filter: (username IS NOT NULL)
                                Rows Removed by Filter: 11335

Execution time: 2301.357 ms

With set enable_seqscan = false it is even worse:

Limit  (cost=1160881.74..1160881.99 rows=100 width=20) (actual time=2758.086..2758.107 rows=100 loops=1)
  ->  Sort  (cost=1160881.74..1160970.43 rows=35476 width=20) (actual time=2758.084..2758.098 rows=100 loops=1)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 32kB
        ->  GroupAggregate  (cost=0.79..1159525.87 rows=35476 width=20) (actual time=0.095..2749.859 rows=34608 loops=1)
              Group Key: u.id
              ->  Merge Join  (cost=0.79..1152157.48 rows=1402727 width=16) (actual time=0.036..2537.064 rows=1910831 loops=1)
                    Merge Cond: (u.id = p.user_id)
                    ->  Index Scan using users_pkey on users u  (cost=0.29..2404.83 rows=35476 width=12) (actual time=0.016..41.163 rows=34609 loops=1)
                          Filter: (username IS NOT NULL)
                          Rows Removed by Filter: 11335
                    ->  Index Scan using posts_user_id_index on posts p  (cost=0.43..1131472.19 rows=1816634 width=8) (actual time=0.012..2191.856 rows=2173916 loops=1)
Planning time: 1.281 ms
Execution time: 2758.187 ms

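username is left out of the GROUP BY in Postgres because it is not required when grouping by the primary key (SQL Server says I have to group by username if I want to select username). Grouping by username as well adds a few ms to the execution time on Postgres, or does nothing.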

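For science, I installed Microsoft SQL Server on the same server (running archlinux, 8-core xeon, 24 GB RAM, SSD) and migrated all the data from Postgres: same table structure, same indexes, same data. The same query to get the top 100 posters runs in 0.3 seconds: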

SELECT TOP 100 u.id, u.username, COUNT(p.id) AS PostCount FROM dbo.users u
                    INNER JOIN dbo.posts p on p.user_id = u.id
                    WHERE u.username IS NOT NULL
                    GROUP BY u.id, u.username
ORDER BY PostCount DESC

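It produces the same results from the same data, but 8 times faster. And this is a beta version of MS SQL on Linux; I guess that on its "home" OS, Windows Server, it could be faster still.

Is my PostgreSQL query totally wrong, or is PostgreSQL just slow?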

Additional info

The version is almost the newest (9.6.1; the current newest is 9.6.2, but ArchLinux has outdated packages and is very slow to update). Config:

max_connections = 75
shared_buffers = 3584MB       
effective_cache_size = 10752MB
work_mem = 24466kB         
maintenance_work_mem = 896MB   
dynamic_shared_memory_type = posix  
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100

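EXPLAIN ANALYZE output: https://pastebin.com/HxucRgnk

I tried every kind of index, even GIN and GiST; the fastest way for PostgreSQL (and Googling confirms this for queries touching many rows) is a sequential scan.

MS SQL Server 14.0.405.200-1, default configuration.

I use this in an API (a plain SELECT, without EXPLAIN ANALYZE), and when I call this API endpoint from Chrome it reports about 2500 ms; allowing for 50 ms of HTTP and web-server overhead (the API and SQL run on the same server), it is the same. I don't care about 100 ms here or there; what I care about is two whole seconds.

explain analyze SELECT user_id, count(9) FROM posts group by user_id; takes 700 ms. The size of the posts table is 2154 MB.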

postgresql query-performance postgresql-9.6

— Lars

2

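Sounds like your users write nice long posts (approx. 1 kB on average). It may make sense to separate them from the rest of the posts table, using a table like CREATE TABLE post_content (post_id integer PRIMARY KEY REFERENCES posts (id), content text); This way most of the I/O that is "wasted" on this type of query can be spared. If the posts are smaller than this, a VACUUM FULL on posts could help.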
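A minimal sketch of how such a split might be carried out, assuming post_id is an integer like posts.id and that a maintenance window is available (VACUUM FULL takes an exclusive lock while it rewrites the table). The table name post_content comes from the comment; the rest of the migration is an illustration, not a tested procedure for this schema.

```sql
-- Sketch only: run against a copy of the data first.
BEGIN;

CREATE TABLE post_content (
    post_id integer PRIMARY KEY REFERENCES posts (id),
    content text
);

-- Copy the wide column out of posts.
INSERT INTO post_content (post_id, content)
SELECT id, content
FROM posts
WHERE content IS NOT NULL;

-- DROP COLUMN only marks the column as dropped; the old data stays
-- in the heap until the table is rewritten.
ALTER TABLE posts DROP COLUMN content;

COMMIT;

-- Rewrite posts compactly (cannot run inside a transaction block).
VACUUM FULL posts;
```

After this, a count over posts only has to read the narrow rows, while queries that display a post join post_content on post_id.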

— dezso

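Yes, the content column of posts holds all the HTML of a post. Thanks for your suggestion, I will try it tomorrow. The question is: the MSSQL posts table also weighs over 1.5 GB with the same content, yet it manages to be faster. Why?

— Lars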

2

You could also post the actual execution plan from SQL Server. It could be really interesting, even for Postgres people like me.

— dezso

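Hmm, quick guess: could you change GROUP BY u.id to GROUP BY p.user_id and try that? My guess is that Postgres does the join first and the grouping second because you are grouping by the users table identifier, even though you only need posts.user_id to get the top N rows.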
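For reference, the suggested change might look like the sketch below. One assumption beyond the comment: once the grouping key is p.user_id rather than the users primary key, Postgres no longer treats u.username as functionally dependent on it, so u.username is added to the GROUP BY here.

```sql
EXPLAIN ANALYZE
SELECT p.user_id, u.username, COUNT(p.id) AS PostCount
FROM users u
INNER JOIN posts p ON p.user_id = u.id
WHERE u.username IS NOT NULL
GROUP BY p.user_id, u.username   -- group by the posts column, not u.id
ORDER BY PostCount DESC
LIMIT 100;
```

Whether this changes the plan at all is exactly what the EXPLAIN ANALYZE output would have to show.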

— UldisK

1

Another good query variant is:

SELECT p.user_id, p.cnt AS PostCount
FROM users u
INNER JOIN (
    select user_id, count(id) as cnt from posts group by user_id
) as p on p.user_id = u.id
WHERE u.username IS NOT NULL          
ORDER BY PostCount DESC LIMIT 100;

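It does not rely on a CTE and gives the correct answer (the CTE example could in theory return fewer than 100 rows, because it applies the limit first and only then joins with users).

I suppose MSSQL is able to perform this kind of transformation in its query optimizer, while PostgreSQL cannot push the aggregation down below the join. Or MSSQL simply has a faster hash join implementation.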
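The difference between limiting before and after the join can be seen on toy data. The sketch below uses hypothetical temporary tables (demo_users, demo_posts) and a limit of 2 instead of 100, and assumes the username IS NOT NULL filter from the question is applied after the limited CTE.

```sql
-- Toy data: user 1 has the most posts but no username.
CREATE TEMP TABLE demo_users (id integer PRIMARY KEY, username text);
CREATE TEMP TABLE demo_posts (id integer PRIMARY KEY, user_id integer NOT NULL);

INSERT INTO demo_users VALUES (1, NULL), (2, 'alice'), (3, 'bob');
INSERT INTO demo_posts VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3);

-- Limit first, then join and filter: the CTE keeps users 1 and 2,
-- user 1 is then filtered out, so only one row comes back.
WITH top_posters AS (
    SELECT user_id, count(*) AS cnt
    FROM demo_posts
    GROUP BY user_id
    ORDER BY cnt DESC
    LIMIT 2
)
SELECT u.id, t.cnt
FROM demo_users u
INNER JOIN top_posters t ON t.user_id = u.id
WHERE u.username IS NOT NULL
ORDER BY t.cnt DESC;

-- Aggregate, join, filter, then limit: users 2 and 3 come back,
-- which is the full two rows that were asked for.
SELECT u.id, p.cnt
FROM demo_users u
INNER JOIN (
    SELECT user_id, count(*) AS cnt
    FROM demo_posts
    GROUP BY user_id
) AS p ON p.user_id = u.id
WHERE u.username IS NOT NULL
ORDER BY p.cnt DESC
LIMIT 2;
```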

— funny_falcon

8

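This may or may not work; I am going on a gut feeling that it is joining your tables before the group and filter. I suggest trying the following: filter and group using a CTE before attempting the join: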

with
    __posts as(
        select
            user_id,
            count(1) as num_posts
        from
            posts
        group by
            user_id
        order by
            num_posts desc
        limit 100
    )
select
    users.username,
    __posts.num_posts
from
    users
    inner join __posts on(
        __posts.user_id = users.id
    )
order by
    num_posts desc

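The query planner just needs a little guidance sometimes. This solution works well here, but CTEs can be terrible in some circumstances. CTEs are stored exclusively in memory. As a result, a large result set can exceed the memory allocated to Postgres and start swapping (paging, in MS terms). CTEs also cannot be indexed, so a sufficiently large query can still slow down significantly when it queries the CTE.

The best advice you can really take away is to try it several ways and check the query plans.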
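One way to compare variants, sketched under the assumption that the tables are the users and posts tables from the question: run each candidate under EXPLAIN (ANALYZE, BUFFERS) and compare the actual times and buffer counts node by node. The second query is one possible aggregate-first rewrite, not the only one.

```sql
-- Variant 1: join first, then aggregate (the query from the question).
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.username, COUNT(p.id) AS PostCount
FROM users u
INNER JOIN posts p ON p.user_id = u.id
WHERE u.username IS NOT NULL
GROUP BY u.id
ORDER BY PostCount DESC
LIMIT 100;

-- Variant 2: aggregate posts first, then join the much smaller result.
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.username, p.cnt AS PostCount
FROM users u
INNER JOIN (
    SELECT user_id, count(*) AS cnt
    FROM posts
    GROUP BY user_id
) AS p ON p.user_id = u.id
WHERE u.username IS NOT NULL
ORDER BY PostCount DESC
LIMIT 100;
```

Running each statement more than once helps separate a cold cache from a genuinely slower plan.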

— 踏板车

-1

Have you tried increasing work_mem? 24 MB seems too small, so the Hash Join has to use multiple batches (writing to temporary files).

— 康斯坦丁·尼兹尼克

It is not too small. Increasing it to 240 MB does nothing. What did help was enabling parallel query by adding these two lines to postgresql.conf: max_parallel_workers_per_gather = 4 and max_worker_processes = 16
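Both settings can be tried without editing postgresql.conf for a quick experiment. A hedged sketch for a single session on 9.6: work_mem and max_parallel_workers_per_gather can be changed with SET, while max_worker_processes can only be inspected here, since changing it requires a server restart.

```sql
-- Session-level experiment; the values are the ones from the comments.
SET work_mem = '240MB';
SET max_parallel_workers_per_gather = 4;

-- Read-only here: this one is set in postgresql.conf and needs a restart.
SHOW max_worker_processes;

-- Look for a Gather node and "Workers Launched" in the plan, and for
-- "Batches: 1" on any Hash node (more than one batch means spilling).
EXPLAIN ANALYZE
SELECT user_id, count(*)
FROM posts
GROUP BY user_id;

-- Return to the configured defaults for this session.
RESET work_mem;
RESET max_parallel_workers_per_gather;
```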

— Lars