我有一张桌子progresses
(当前包含数十万条记录):
Column | Type | Modifiers
---------------+-----------------------------+---------------------------------------------------------
id | integer | not null default nextval('progresses_id_seq'::regclass)
lesson_id | integer |
user_id | integer |
created_at | timestamp without time zone |
deleted_at | timestamp without time zone |
Indexes:
"progresses_pkey" PRIMARY KEY, btree (id)
"index_progresses_on_deleted_at" btree (deleted_at)
"index_progresses_on_lesson_id" btree (lesson_id)
"index_progresses_on_user_id" btree (user_id)
和视图v_latest_progresses
,其将查询最近progress
的user_id
和lesson_id
:
SELECT DISTINCT ON (progresses.user_id, progresses.lesson_id)
progresses.id AS progress_id,
progresses.lesson_id,
progresses.user_id,
progresses.created_at,
progresses.deleted_at
FROM progresses
WHERE progresses.deleted_at IS NULL
ORDER BY progresses.user_id, progresses.lesson_id, progresses.created_at DESC;
用户可以在任何给定课程上获得许多进度,但是我们经常想查询一组给定用户或课程(或两者的组合)中最近创建的进度。
v_latest_progresses
当我指定一组user_id
s 时,该视图可以很好地做到这一点,甚至表现出色:
# EXPLAIN SELECT "v_latest_progresses".* FROM "v_latest_progresses" WHERE "v_latest_progresses"."user_id" IN ([the same list of ids given by the subquery in the second example below]);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=526.68..528.66 rows=36 width=57)
-> Sort (cost=526.68..527.34 rows=265 width=57)
Sort Key: progresses.user_id, progresses.lesson_id, progresses.created_at
-> Index Scan using index_progresses_on_user_id on progresses (cost=0.47..516.01 rows=265 width=57)
Index Cond: (user_id = ANY ('{ [the above list of user ids] }'::integer[]))
Filter: (deleted_at IS NULL)
(6 rows)
但是,如果我尝试执行相同的查询以user_id
子查询替换s 集,它将变得非常无效率:
# EXPLAIN SELECT "v_latest_progresses".* FROM "v_latest_progresses" WHERE "v_latest_progresses"."user_id" IN (SELECT "users"."id" FROM "users" WHERE "users"."company_id"=44);
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Merge Semi Join (cost=69879.08..72636.12 rows=19984 width=57)
Merge Cond: (progresses.user_id = users.id)
-> Unique (cost=69843.45..72100.80 rows=39969 width=57)
-> Sort (cost=69843.45..70595.90 rows=300980 width=57)
Sort Key: progresses.user_id, progresses.lesson_id, progresses.created_at
-> Seq Scan on progresses (cost=0.00..31136.31 rows=300980 width=57)
Filter: (deleted_at IS NULL)
-> Sort (cost=35.63..35.66 rows=10 width=4)
Sort Key: users.id
-> Index Scan using index_users_on_company_id on users (cost=0.42..35.46 rows=10 width=4)
Index Cond: (company_id = 44)
(11 rows)
我要弄清楚的是PostgreSQL为什么要执行 DISTINCT
progresses
在第二个示例中的子查询过滤之前在整个表上查询。
有人会对如何改进此查询有任何建议吗?
144.07..144.6
我得到的70,000还要低!非常感谢你。