I have a partitioned table structure like this:
CREATE TABLE measurements (
sensor_id bigint,
tx timestamp,
measurement int
);
CREATE TABLE measurements_201201(
CHECK (tx >= '2012-01-01 00:00:00'::timestamp without time zone
AND tx < ('2012-01-01 00:00:00'::timestamp without time zone + '1 mon'::interval))
)INHERITS (measurements);
CREATE INDEX ON measurements_201201(sensor_id);
CREATE INDEX ON measurements_201201(tx);
CREATE INDEX ON measurements_201201(sensor_id, tx);
....
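and so on. Each table has approximately 20 million rows.
If I query for a sample of sensors and a sample of timestamps in the WHERE clause, the query plan shows the correct tables being selected and the indexes being used, e.g.: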
SELECT *
FROM measurements
INNER JOIN sensors TABLESAMPLE BERNOULLI (0.01) USING (sensor_id)
WHERE tx BETWEEN '2015-01-04 05:00' AND '2015-01-04 06:00'
OR tx BETWEEN '2015-02-04 05:00' AND '2015-02-04 06:00'
OR tx BETWEEN '2014-03-05 05:00' AND '2014-04-07 06:00' ;
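However, this no longer holds if I use a CTE, or put the timestamp values into a table (not shown here; it happens even with indexes on the temporary table). For example, given this CTE: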
WITH sensor_sample AS(
SELECT sensor_id, start_ts, end_ts
FROM sensors TABLESAMPLE BERNOULLI (0.01)
CROSS JOIN (VALUES (TIMESTAMP '2015-01-04 05:00', TIMESTAMP '2015-01-04 06:00'),
(TIMESTAMP '2015-02-04 05:00', TIMESTAMP '2015-02-04 06:00'),
(TIMESTAMP '2014-03-05 05:00', '2014-04-07 06:00') ) tstamps(start_ts, end_ts)
)
a query like the following
SET constraint_exclusion = on;
SELECT * FROM measurements
INNER JOIN sensor_sample USING (sensor_id)
WHERE tx BETWEEN start_ts AND end_ts
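performs an index scan on every table. This is still relatively fast, but as the queries grow more complex it can turn into sequential scans, which will end up being very slow for retrieving ~40K rows from a limited subset of the partitioned tables (4-5 of the 50).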
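To check which partitions a given query actually touches, the plan can be inspected directly. This is a minimal sketch, assuming the measurements and sensors tables above exist and PostgreSQL 9.5 or later (needed for TABLESAMPLE). Note that EXPLAIN with ANALYZE really executes the query.

```sql
-- 'partition' (the default) already applies constraint exclusion to
-- inheritance child tables; 'on' extends the check to all tables.
SET constraint_exclusion = on;

-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
WITH sensor_sample AS (
    SELECT sensor_id, start_ts, end_ts
    FROM sensors TABLESAMPLE BERNOULLI (0.01)
    CROSS JOIN (VALUES
        (TIMESTAMP '2015-01-04 05:00', TIMESTAMP '2015-01-04 06:00'),
        (TIMESTAMP '2015-02-04 05:00', TIMESTAMP '2015-02-04 06:00'),
        (TIMESTAMP '2014-03-05 05:00', TIMESTAMP '2014-04-07 06:00')
    ) tstamps(start_ts, end_ts)
)
SELECT *
FROM measurements
INNER JOIN sensor_sample USING (sensor_id)
WHERE tx BETWEEN start_ts AND end_ts;
```

Every child table that shows up in the output (for example measurements_201201) was not excluded by the planner.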
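I am concerned that something like this is the issue:
"For non-trivial expressions you have to repeat the more or less verbatim condition in queries to make the Postgres query planner understand it can rely on the CHECK constraint. Even if it seems redundant!"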
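Applied to the CTE query above, that advice could look like the following sketch. It repeats the three time ranges as literals next to the join condition, on the assumption that the planner can compare literal bounds against each partition's CHECK constraint at plan time, whereas the values coming from sensor_sample are only known at execution time. Whether partitions are then actually excluded should be confirmed with EXPLAIN.

```sql
WITH sensor_sample AS (
    SELECT sensor_id, start_ts, end_ts
    FROM sensors TABLESAMPLE BERNOULLI (0.01)
    CROSS JOIN (VALUES
        (TIMESTAMP '2015-01-04 05:00', TIMESTAMP '2015-01-04 06:00'),
        (TIMESTAMP '2015-02-04 05:00', TIMESTAMP '2015-02-04 06:00'),
        (TIMESTAMP '2014-03-05 05:00', TIMESTAMP '2014-04-07 06:00')
    ) tstamps(start_ts, end_ts)
)
SELECT *
FROM measurements
INNER JOIN sensor_sample USING (sensor_id)
WHERE tx BETWEEN start_ts AND end_ts
  -- Redundant literal copy of the same ranges: it does not change the
  -- result, but gives the planner constants to test against the CHECKs.
  AND (   tx BETWEEN TIMESTAMP '2015-01-04 05:00' AND TIMESTAMP '2015-01-04 06:00'
       OR tx BETWEEN TIMESTAMP '2015-02-04 05:00' AND TIMESTAMP '2015-02-04 06:00'
       OR tx BETWEEN TIMESTAMP '2014-03-05 05:00' AND TIMESTAMP '2014-04-07 06:00');
```

The drawback is that the ranges now live in two places and must be kept in sync by whatever generates the query.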
How can I improve the partitioning and query structure to reduce the likelihood of running sequential scans on all my data?
A good question, but if you paste EXPLAIN (ANALYZE, BUFFERS) ...