Slow postgres_fdw performance

The following query against a foreign table takes about 5 seconds to run over 3.2 million rows:

SELECT x."IncidentTypeCode", COUNT(x."IncidentTypeCode") 
FROM "IntterraNearRealTimeUnitReflexes300sForeign" x 
WHERE x."IncidentDateTime" >= '05/01/2016' 
GROUP BY x."IncidentTypeCode" 
ORDER BY 1;

When I run the same query against the normal table, it returns in 0.6 seconds. The execution plans are quite different:

Normal table

Sort  (cost=226861.20..226861.21 rows=4 width=4) (actual time=646.447..646.448 rows=7 loops=1) 
  Sort Key: "IncidentTypeCode" 
  Sort Method: quicksort  Memory: 25kB 
  -> HashAggregate (cost=226861.12..226861.16 rows=4 width=4) (actual  time=646.433..646.434 rows=7 loops=1)
     Group Key: "IncidentTypeCode"
     -> Bitmap Heap Scan on "IntterraNearRealTimeUnitReflexes300s" x  (cost=10597.63..223318.41 rows=708542 width=4) (actual time=74.593..342.110 rows=709376 loops=1) 
        Recheck Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone) 
        Rows Removed by Index Recheck: 12259 
        Heap Blocks: exact=27052 lossy=26888
        -> Bitmap Index Scan on idx_incident_date_time_300  (cost=0.00..10420.49 rows=708542 width=0) (actual time=69.722..69.722 rows=709376 loops=1) 
           Index Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone) 

Planning time: 0.165 ms 
Execution time: 646.512 ms

Foreign table

Sort  (cost=241132.04..241132.05 rows=4 width=4) (actual time=4782.110..4782.112 rows=7 loops=1)   
  Sort Key: "IncidentTypeCode" 
  Sort Method: quicksort  Memory: 25kB
  -> HashAggregate  (cost=241131.96..241132.00 rows=4 width=4) (actual time=4782.097..4782.100 rows=7 loops=1)
     Group Key: "IncidentTypeCode"
     -> Foreign Scan on "IntterraNearRealTimeUnitReflexes300sForeign" x  (cost=10697.63..237589.25 rows=708542 width=4) (actual time=1.916..4476.946 rows=709376 loops=1) 

Planning time: 1.413 ms 
Execution time: 4782.660 ms

I think I am paying a high price for the GROUP BY clause, which is not passed to the foreign server. EXPLAIN VERBOSE shows this as the remote query:

SELECT
    "IncidentTypeCode"
FROM
    PUBLIC ."IntterraNearRealTimeUnitReflexes300s"
WHERE
    (
        (
            "IncidentDateTime" >= '2016-05-01 00:00:00' :: TIMESTAMP WITHOUT TIME ZONE
        )
    )

This returns 700k rows. Is there a way around this?
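
For reference, the statement shipped to the foreign server appears on the Remote SQL line under the Foreign Scan node. A minimal sketch of how to display it, reusing the query from the top of this question (ANALYZE is added here as an assumption so that actual timings are shown; note that it really executes the query):

```sql
-- VERBOSE adds the "Remote SQL:" line under the Foreign Scan node;
-- ANALYZE runs the query and reports actual row counts and timings.
EXPLAIN (ANALYZE, VERBOSE)
SELECT x."IncidentTypeCode", COUNT(x."IncidentTypeCode")
FROM "IntterraNearRealTimeUnitReflexes300sForeign" x
WHERE x."IncidentDateTime" >= '05/01/2016'
GROUP BY x."IncidentTypeCode"
ORDER BY 1;
```

If the Remote SQL contains no GROUP BY or count(...), the aggregation is happening locally, after every matching row has been transferred.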

I spent a lot of time yesterday reading this documentation page and thought I had found the answer in setting use_remote_estimate to true, but it had no effect.
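
For completeness, a sketch of how that option can be set. The server name remote_pg is hypothetical (the real name is not given in this question); the foreign table name is the one used above. ADD applies when the option has not been set before; SET would be used to change an existing value.

```sql
-- Server-wide setting; "remote_pg" is a placeholder server name.
ALTER SERVER remote_pg OPTIONS (ADD use_remote_estimate 'true');

-- Or only for this foreign table; a table-level value overrides the server-level one.
ALTER FOREIGN TABLE "IntterraNearRealTimeUnitReflexes300sForeign"
    OPTIONS (ADD use_remote_estimate 'true');
```

This option controls how costs are estimated; it does not change which operations postgres_fdw is able to push down.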

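I do have access to the foreign server and can create objects there if necessary. The timestamp value in the WHERE clause can be anything; it does not come from a predefined list of values.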

— Doug

There are some pushdown improvements in 9.6 that may be worth a look: wiki.postgresql.org/wiki/NewIn96#postgres_fdw

— Jack says try topanswers.xyz

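When you say normal table versus foreign table, are you running against the same table (local versus remote), or against genuinely different tables (which is what it looks like)? If they are different, check the indexes on the remote server to make sure they match, because you appear to be reading from two different sources: IntterraNearRealTimeUnitReflexes300sForeign and IntterraNearRealTimeUnitReflexes300s, with idx_incident_date_time_300. I assume the two 300s tables hold the same data, but it may be worth checking that the idx_incident_date_time_300 index exists on the foreign server.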

— Ste Bov

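As far as I understand, the aggregate (COUNT) is not pushed down to the remote server, which would explain the long request time. It looks like this feature will arrive in PostgreSQL 10 - depesz.com/2016/10/25/...

— Jerome WAGNER

@JeromeWAGNER - awesome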

— J-DawG

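If you use use_remote_estimate, be sure to run ANALYZE on the foreign table (the estimates are very close to the actual row counts, so you have probably done that already). Also, the pushdown improvements are not available in versions below 9.5. I am also assuming that you have the same table structure on the remote server, including indexes. If a bitmap scan is needed because of low cardinality, the index will not be used, due to limitations of the pushdown mechanism. You may want to reduce the number of rows returned (narrower timestamp ranges) to force a B-tree index scan. Unfortunately, there is no clean way to avoid a SeqScan on the remote server if the filter returns more than about 10% of the table's rows (the percentage can vary, depending on whether the planner thinks scanning the whole table is cheaper than random reads). If you are using SSDs, you may find it helpful to tune random_page_cost.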
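
The checks mentioned above can be sketched as follows. The value 1.1 for random_page_cost is only a commonly used starting point for SSDs and is an assumption here, not a measured recommendation; the schema name public is taken from the remote query shown in the question.

```sql
-- On the local server: collect statistics for the foreign table.
ANALYZE "IntterraNearRealTimeUnitReflexes300sForeign";

-- On the remote server: list the indexes that exist on the underlying table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'IntterraNearRealTimeUnitReflexes300s';

-- On the remote server: lower random_page_cost for the current session only,
-- then re-run EXPLAIN on the remote query to see whether the plan changes.
SET random_page_cost = 1.1;
```

Note that SET only affects the session in which it is issued. Queries arriving through postgres_fdw run in their own sessions, so to influence them the setting would have to be applied more broadly, for example with ALTER ROLE ... SET for the role the foreign server connects as.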

You can use a CTE to isolate the GROUP BY behavior:

WITH atable AS (
    SELECT "IncidentTypeCode"
    FROM PUBLIC ."IntterraNearRealTimeUnitReflexes300s"
    WHERE 
       ("IncidentDateTime" 
              BETWEEN '2016-05-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE 
                  AND '2016-05-02 00:00:00'::TIMESTAMP WITHOUT TIME ZONE)
)
SELECT atable."IncidentTypeCode", COUNT(atable."IncidentTypeCode") 
FROM atable
GROUP BY atable."IncidentTypeCode" 
ORDER BY atable."IncidentTypeCode";

— 3个

Performance is the same with the CTE. I may try the random_page_cost setting, though. Thanks!

— J-DawG '16