PostgreSQL:在同一查询中重用复杂的中间结果


8

使用PostgreSQL(8.4),我创建总结了几桌各种结果的视图(例如创建列abc在视图中),然后我需要一些结果结合在同一个查询(例如a+ba-b(a+b)/c,...),以产生最终结果。我要注意的是,即使在同一查询中完成操作,每次使用时,中间结果都会被完全计算。

有没有一种方法可以对此进行优化,从而避免每次都计算相同的结果?

这是重现该问题的简化示例。

CREATE TABLE test1 (
    id SERIAL PRIMARY KEY,
    log_timestamp TIMESTAMP NOT NULL
);
CREATE TABLE test2 (
    test1_id INTEGER NOT NULL REFERENCES test1(id),
    category VARCHAR(10) NOT NULL,
    col1 INTEGER,
    col2 INTEGER
);
CREATE INDEX test_category_idx ON test2(category);

-- Added after edit to this question
CREATE INDEX test_id_idx ON test2(test1_id);

-- Populating with test data.
INSERT INTO test1(log_timestamp)
    SELECT * FROM generate_series('2011-01-01'::timestamp, '2012-01-01'::timestamp, '1 hour');
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (20000*random()-10000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (2000*random()-1000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (2000*random()-40)::int, (3000*random()-200)::int FROM test1;

这是执行最耗时的操作的视图:

CREATE VIEW testview1 AS
    SELECT
       t1.id,
       t1.log_timestamp,
       (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
       (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
       (SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
    FROM test1 t1;
  • SELECT a FROM testview1产生这个计划(通过EXPLAIN ANALYZE):

     Seq Scan on test1 t1  (cost=0.00..1787086.55 rows=8761 width=4) (actual time=12.877..10517.575 rows=8761 loops=1)
       SubPlan 1
         ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.193..1.193 rows=1 loops=8761)
               ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.109..1.177 rows=0 loops=8761)
                     Recheck Cond: ((category)::text = 'A'::text)
                     Filter: (test1_id = $0)
                     ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.414..0.414 rows=1631 loops=8761)
                           Index Cond: ((category)::text = 'A'::text)
     Total runtime: 10522.346 ms
    

  • SELECT a, a FROM testview1产生这个计划

     Seq Scan on test1 t1  (cost=0.00..3574037.50 rows=8761 width=4) (actual time=3.343..20550.817 rows=8761 loops=1)
       SubPlan 1
         ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.183..1.183 rows=1 loops=8761)
               ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.100..1.166 rows=0 loops=8761)
                     Recheck Cond: ((category)::text = 'A'::text)
                     Filter: (test1_id = $0)
                     ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.418..0.418 rows=1631 loops=8761)
                           Index Cond: ((category)::text = 'A'::text)
       SubPlan 2
         ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.154..1.154 rows=1 loops=8761)
               ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.083..1.143 rows=0 loops=8761)
                     Recheck Cond: ((category)::text = 'A'::text)
                     Filter: (test1_id = $0)
                     ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.426..0.426 rows=1631 loops=8761)
                           Index Cond: ((category)::text = 'A'::text)
     Total runtime: 20557.581 ms
    

在这里,选择a, a花费的时间是select的两倍a,而实际上它们只能被计算一次。例如,使用SELECT a, a+b, a-b FROM testview1,它将通过子计划a3次并通过b两次,而执行时间可以减少为总时间的2/5(假设+和-在此处可以忽略)。

很好的是,当不需要时,不计算未使用的列(bc),但是有没有办法使它仅从视图中计算一次相同的已使用列?

编辑: @Frank Heikens正确建议使用索引,上面的示例中缺少该索引。尽管它确实提高了每个子计划的速度,但并不能防止多次计算同一子查询。抱歉,我应该将其放在最初的问题中以使其清楚。

Answers:


6

(为回答我自己的问题而道歉,但是在阅读了这个无关的问题和答案之后,我想到应该改为使用CTE。它可以工作。)

这是另一个视图,类似于testview1问题中的视图,但是使用了Common Table Expression

CREATE VIEW testview2 AS
    WITH testcte AS (SELECT
       t1.id,
       t1.log_timestamp,
       (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
       (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
       (SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
      FROM test1 t1)
    SELECT * FROM testcte;

(这只是一个例子,我并不是在建议将视图和CTE结合起来一定是一个好主意:CTE可能就足够了。)

与之前testview1的查询计划不同,SELECT a FROM testview2现在的查询计划还计算bc,由于未在中使用,因此将忽略它​​们testview1

Subquery Scan testview2  (cost=395272.42..395535.25 rows=8761 width=8) (actual time=0.256..607.941 rows=8761 loops=1)
   ->  CTE Scan on testcte  (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.255..604.106 rows=8761 loops=1)
         CTE testcte
           ->  Seq Scan on test1 t1  (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.252..589.358 rows=8761 loops=1)
                 SubPlan 1
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'A'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.009..0.009 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 2
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'B'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 3
                   ->  Aggregate  (cost=15.02..15.04 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=8) (actual time=0.013..0.014 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'C'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)

但是,它不会重新计算在同一查询中多次使用的结果(这是目标)。

不同于testview1SELECT a, a, a, a, a花更长的5倍以上SELECT a,在这里SELECT a, a, a, a, a, b, c, a+b, a+c, b+c FROM testview2就像长镜头作为SELECT a FROM testview2SELECT a, b, c FROM testview2。它仅通过ab并且c一次通过:

 Subquery Scan testview2  (cost=395272.42..395600.96 rows=8761 width=24) (actual time=0.147..562.790 rows=8761 loops=1)
   ->  CTE Scan on testcte  (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.144..554.194 rows=8761 loops=1)
         CTE testcte
           ->  Seq Scan on test1 t1  (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.140..542.657 rows=8761 loops=1)
                 SubPlan 1
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'A'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 2
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'B'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.006..0.006 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 3
                   ->  Aggregate  (cost=15.02..15.04 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'C'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)

1
不用道歉!:-)实际上,有一个徽章自助学习器可以回答您自己的问题。
Gaius

3

您需要在表test2中的test1_id上建立索引,这将改变事情。

Seq Scan on test1 t1  (cost=0.00..301450.63 rows=8761 width=12) (actual time=0.108..229.859 rows=8761 loops=1)
  SubPlan 1
    ->  Aggregate  (cost=11.45..11.46 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=8761)
          ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=8761)
                Recheck Cond: (test1_id = t1.id)
                Filter: ((category)::text = 'A'::text)
                ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                      Index Cond: (test1_id = t1.id)
  SubPlan 2
    ->  Aggregate  (cost=11.45..11.46 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=8761)
          ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=8761)
                Recheck Cond: (test1_id = t1.id)
                Filter: ((category)::text = 'B'::text)
                ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                      Index Cond: (test1_id = t1.id)
  SubPlan 3
    ->  Aggregate  (cost=11.46..11.47 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=8761)
          ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=8761)
                Recheck Cond: (test1_id = t1.id)
                Filter: ((category)::text = 'C'::text)
                ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                      Index Cond: (test1_id = t1.id)
Total runtime: 232.419 ms

谢谢,这确实确实提高了性能,但是实际上我的实际表已经具有该相应的索引。最初的问题仍然存在:SELECT a, a, a, a, a FROM testview1仍然比花费5倍的时间SELECT a FROM testview1
布鲁诺,
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.