Slow ORDER BY with LIMIT

11

I have this query:

SELECT * 
FROM location 
WHERE to_tsvector('simple',unaccent2("city"))
   @@ to_tsquery('simple',unaccent2('wroclaw')) 
order by displaycount

I'm happy with how it performs:

"Sort  (cost=3842.56..3847.12 rows=1826 width=123) (actual time=1.915..2.084 rows=1307 loops=1)"
"  Sort Key: displaycount"
"  Sort Method: quicksort  Memory: 206kB"
"  ->  Bitmap Heap Scan on location  (cost=34.40..3743.64 rows=1826 width=123) (actual time=0.788..1.208 rows=1307 loops=1)"
"        Recheck Cond: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)"
"        ->  Bitmap Index Scan on location_lower_idx  (cost=0.00..33.95 rows=1826 width=0) (actual time=0.760..0.760 rows=1307 loops=1)"
"              Index Cond: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)"
"Total runtime: 2.412 ms"

But when I add LIMIT, execution takes more than 2 seconds:

SELECT * 
FROM location 
WHERE to_tsvector('simple',unaccent2("city"))
   @@ to_tsquery('simple',unaccent2('wroclaw')) 
order by displaycount 
limit 20

Explain:

"Limit  (cost=0.00..1167.59 rows=20 width=123) (actual time=2775.452..2775.643 rows=20 loops=1)"
"  ->  Index Scan using location_displaycount_index on location  (cost=0.00..106601.25 rows=1826 width=123) (actual time=2775.448..2775.637 rows=20 loops=1)"
"        Filter: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)"
"Total runtime: 2775.693 ms"

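I think the problem is the combination of ORDER BY and LIMIT. How can I force PostgreSQL to use the full-text index and do the sorting at the end?

A subquery doesn't help: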

SELECT * 
FROM (
    SELECT * 
    FROM location 
    WHERE to_tsvector('simple',unaccent2("city"))
       @@ to_tsquery('simple',unaccent2('wroclaw')) 
    order by displaycount
) t 
LIMIT 20;

Or:

SELECT * 
FROM (
    SELECT * 
    FROM location 
    WHERE to_tsvector('simple',unaccent2("city"))
       @@ to_tsquery('simple',unaccent2('wroclaw'))
) t 
order by displaycount 
LIMIT 20;

— ziri

12

My guess is that this would fix your query:

SELECT * 
FROM   location 
WHERE     to_tsvector('simple',unaccent2(city))
       @@ to_tsquery('simple',unaccent2('wroclaw')) 
ORDER  BY to_tsvector('simple',unaccent2(city))
       @@ to_tsquery('simple',unaccent2('wroclaw')) DESC
         ,displaycount 
LIMIT  20;

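I repeat the WHERE condition as the first element of the ORDER BY clause. This is logically redundant, but it should keep the query planner from assuming that it would be better to process rows in the order of the index location_displaycount_index, which turns out to be much more expensive.

The underlying problem is that the query planner evidently misjudges the selectivity and/or the cost of your WHERE condition by a wide margin. I can only speculate why.

Do you have autovacuum running, which should also take care of running ANALYZE on your tables? In other words, are your table statistics up to date? Is there any effect if you run: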

ANALYZE location;

and then try again?
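One way to check whether the statistics are current, before and after running ANALYZE, is to look at the statistics views. This is a minimal sketch, assuming your role can read `pg_stat_user_tables` and that only one table named `location` exists in the database:

```sql
-- Is the autovacuum daemon enabled at all?
SHOW autovacuum;

-- When was the table last analyzed, manually or by autovacuum?
SELECT relname,
       last_analyze,       -- last manual ANALYZE
       last_autoanalyze,   -- last ANALYZE triggered by autovacuum
       n_live_tup          -- estimated number of live rows
FROM   pg_stat_user_tables
WHERE  relname = 'location';
```

If both timestamps are NULL or far in the past, stale statistics are a plausible cause. If they are recent, the misestimate more likely comes from how the condition itself is estimated.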

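It could also be that the selectivity of the @@ operator is misjudged. I would imagine that it is very hard to estimate, for logical reasons.

If my query does not solve the problem, or generally to verify the underlying theory, do one of these two things:

temporarily drop the index location_displaycount_index
temporarily disable plain index scans by running: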
```
SET enable_indexscan = OFF;
```

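The latter is less intrusive and only affects the current session. It leaves bitmap heap scan and bitmap index scan available, which are the methods used by the faster plan.
Then rerun the query.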
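The experiment can also be confined to a single transaction, so that the setting cannot leak into later statements in the same session. A sketch, assuming the query from the question; `SET LOCAL` reverts automatically when the transaction ends:

```sql
BEGIN;

-- Only affects the current transaction
SET LOCAL enable_indexscan = off;

EXPLAIN ANALYZE
SELECT *
FROM   location
WHERE  to_tsvector('simple', unaccent2(city))
       @@ to_tsquery('simple', unaccent2('wroclaw'))
ORDER  BY displaycount
LIMIT  20;

ROLLBACK;  -- the setting is discarded together with the transaction
```

Keep in mind that the enable_* settings discourage a plan type rather than forbid it, so the plan output is still worth reading to confirm that the bitmap scan was actually chosen.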

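By the way: if the theory is sound, your query (as you have it now) will be much faster with a less selective search term in the FTS condition, contrary to what you might expect. Try it.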
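One way to try this is a prefix search, which by construction matches at least as many rows as the full word. A sketch, assuming that the hypothetical prefix `w` is reasonably common among your city names:

```sql
EXPLAIN ANALYZE
SELECT *
FROM   location
WHERE  to_tsvector('simple', unaccent2(city))
       @@ to_tsquery('simple', 'w:*')   -- prefix match: any lexeme starting with "w"
ORDER  BY displaycount
LIMIT  20;
```

If the theory holds, the scan over location_displaycount_index should reach 20 matching rows sooner, simply because matches are more frequent along the way.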

— Erwin Brandstetter

1

The query works. Turning off indexscan works too. ANALYZE had no effect. Thank you very much for the thorough answer.

— ziri, 2012

0

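When you use LIMIT, PostgreSQL adjusts its plan to be optimal for retrieving only a subset of the rows. Unfortunately, it makes the wrong choice in your case. This might be because the statistics for the table are too old. Try updating the statistics by issuing VACUUM ANALYZE location;

Forcing the use of an index is usually done by disallowing sequential scans (set enable_seqscan = false). In your case, however, it is not doing a sequential scan; it just switches to a different index for the query with LIMIT.

If ANALYZE doesn't help, can you tell us which version of PostgreSQL you are using? Also, how many rows are in the table?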

— 埃尔克

ANALYZE didn't help. The table has about 36,000 rows, and I'm using PostgreSQL 9.1.

— ziri, 2012