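I have a large table, entities, with about 15 million records. I want to find the top 5 rows whose name matches 'hockey'.
I have a full-text index on name, and it is being used: gin_ix_entity_full_text_search_name
Query: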
SELECT "entities".*,
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'hockey'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE "entities"."place" = 'f'
AND (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'hockey'::text))
ORDER BY "rank0.48661998202865475" DESC LIMIT 5
Duration: 25,623 ms
Explain plan:
1 Limit  (cost=12666.89..12666.89 rows=5 width=3116)
2   ->  Sort  (cost=12666.89..12670.18 rows=6571 width=3116)
3         Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''hockey'''::tsquery))
4         ->  Bitmap Heap Scan on entities  (cost=124.06..12645.06 rows=6571 width=3116)
5               Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
6               Filter: (NOT place)
7               ->  Bitmap Index Scan on gin_ix_entity_full_text_search_name  (cost=0.00..123.74 rows=6625 width=0)
8                     Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
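I do not understand why it checks the index condition twice (query plan steps 4 and 7). Is it because of my boolean condition (not place)? If so, should I add it to the index to get a fast query? Or is it the sort condition that makes it slow?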
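For reference, "adding the boolean condition to the index" could be sketched as a partial GIN index. This is only an illustration under assumptions: the index name below is hypothetical, and it assumes most searches filter on place = 'f'. Note that the Recheck Cond line appears in every Bitmap Heap Scan regardless of the filter, so a partial index would at most remove the Filter: (NOT place) step.

```sql
-- Hypothetical index name; assumes searches almost always restrict to place = 'f'.
-- The indexed expression must match the one used in the query's WHERE clause.
CREATE INDEX gin_ix_entity_name_fts_not_place
    ON entities
    USING gin (to_tsvector('english', name::text))
    WHERE NOT place;

-- The planner can only consider the partial index when the query
-- repeats the predicate, e.g. "place" = 'f' or NOT "place".
SELECT "entities"."name"
FROM "entities"
WHERE NOT "entities"."place"
  AND to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'hockey')
LIMIT 5;
```

Whether this helps depends on how many matching rows have place set; in the plans shown here the filter removes only a small fraction of the rows.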
EXPLAIN ANALYZE output:
Limit  (cost=4447.28..4447.29 rows=5 width=3116) (actual time=18509.274..18509.282 rows=5 loops=1)
  ->  Sort  (cost=4447.28..4448.41 rows=2248 width=3116) (actual time=18509.271..18509.273 rows=5 loops=1)
        Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''test'''::tsquery))
        Sort Method:  top-N heapsort  Memory: 19kB
        ->  Bitmap Heap Scan on entities  (cost=43.31..4439.82 rows=2248 width=3116) (actual time=119.003..18491.408 rows=2533 loops=1)
              Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''test'''::tsquery)
              Filter: (NOT place)
              ->  Bitmap Index Scan on gin_ix_entity_full_text_search_name  (cost=0.00..43.20 rows=2266 width=0) (actual time=74.093..74.093 rows=2593 loops=1)
                    Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''test'''::tsquery)
Total runtime: 18509.381 ms
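Here are my DB parameters. The database is hosted by Heroku on Amazon's services. They describe it as having 1.7GB of RAM, 1 processing unit, and a maximum DB size of 1TB.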
             name             | current_setting
------------------------------+---------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.0.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
 archive_command              | test -f /etc/postgresql/9.0/main/wal-ed/ARCHIVING_OFF || envdir /etc/postgresql/9.0/resource29857_heroku_com/wal-ed/env wal-e wal-push %p
 archive_mode                 | on
 archive_timeout              | 1min
 checkpoint_completion_target | 0.7
 checkpoint_segments          | 40
 client_min_messages          | notice
 cpu_index_tuple_cost         | 0.001
 cpu_operator_cost            | 0.0005
 cpu_tuple_cost               | 0.003
 effective_cache_size         | 1530000kB
 hot_standby                  | on
 lc_collate                   | zh_CN.UTF-8
 lc_ctype                     | zh_CN.UTF-8
 listen_addresses             | *
 log_checkpoints              | on
 log_destination              | syslog
 log_line_prefix              | %u [YELLOW]
 log_min_duration_statement   | 50ms
 log_min_messages             | notice
 logging_collector            | on
 maintenance_work_mem         | 64MB
 max_connections              | 500
 max_prepared_transactions    | 500
 max_stack_depth              | 2MB
 max_standby_archive_delay    | -1
 max_standby_streaming_delay  | -1
 max_wal_senders              | 10
 port                         |
 random_page_cost             | 2
 server_encoding              | UTF8
 shared_buffers               | 415MB
 ssl                          | on
 syslog_ident                 | resource29857_heroku_com
 TimeZone                     | UTC
 wal_buffers                  | 8MB
 wal_keep_segments            | 127
 wal_level                    | hot_standby
 work_mem                     | 100MB
(39 rows)
EDIT
It looks like the ORDER BY is the slow part:
d6ifslbf0ugpu=> EXPLAIN ANALYZE SELECT "entities"."name",
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'banana'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'banana'::text))
LIMIT 5;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43.31..53.07 rows=5 width=24) (actual time=76.583..103.623 rows=5 loops=1)
-> Bitmap Heap Scan on entities (cost=43.31..4467.60 rows=2266 width=24) (actual time=76.581..103.613 rows=5 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
-> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..43.20 rows=2266 width=0) (actual time=53.592..53.592 rows=1495 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
Total runtime: 103.680 ms
vs. with ORDER BY:
d6ifslbf0ugpu=> EXPLAIN ANALYZE SELECT "entities"."name",
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'banana'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'banana'::text))
ORDER BY "rank0.48661998202865475" DESC
LIMIT 5;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4475.12..4475.13 rows=5 width=24) (actual time=15013.735..15013.741 rows=5 loops=1)
-> Sort (cost=4475.12..4476.26 rows=2266 width=24) (actual time=15013.732..15013.735 rows=5 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''banana'''::tsquery))
Sort Method: top-N heapsort Memory: 17kB
-> Bitmap Heap Scan on entities (cost=43.31..4467.60 rows=2266 width=24) (actual time=0.872..15006.763 rows=1495 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
-> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..43.20 rows=2266 width=0) (actual time=0.549..0.549 rows=1495 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
Total runtime: 15013.805 ms
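I still do not understand why this is so much slower. It looks like it is fetching the same number of rows from the Bitmap Heap Scan, but it takes far longer?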
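One detail in the two plans above may be relevant: the estimated row count of the Bitmap Heap Scan is the same (rows=2266), but the actual counts differ. Without ORDER BY the scan stops after rows=5, while with ORDER BY it has to fetch all rows=1495 matches before the top 5 can be chosen. To see whether that extra time is spent reading heap pages from disk, the BUFFERS option of EXPLAIN (available since PostgreSQL 9.0) could be added. The following is a sketch of that check, not a fix; the alias rank is shortened here for readability.

```sql
-- Same query as above, with buffer statistics added to the plan output.
-- "shared read" counts pages fetched from outside shared_buffers,
-- "shared hit" counts pages already cached in shared_buffers.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "entities"."name",
       ts_rank(to_tsvector('english', "entities"."name"::text),
               to_tsquery('english', 'banana'::text)) AS rank
FROM "entities"
WHERE to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'banana'::text)
ORDER BY rank DESC
LIMIT 5;
```

If the Bitmap Heap Scan node reports mostly read rather than hit buffers, the time is likely dominated by random I/O for the matching rows rather than by the sort itself, which the plan shows using only 17kB of memory.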