Slow ST_Intersection query

11

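I am trying to perform an intersection between two layers:

A polyline layer representing some roads (~5,500 rows)
A polygon layer representing irregularly shaped buffers around various points of interest (~47,000 rows)

Ultimately, what I want to do is clip the polylines to these many (sometimes overlapping) buffers, and then sum the total length of road contained within each buffer.

The problem is that things are running slowly. I'm not sure how long this should take, but I aborted my query after more than 34 hours. I'm hoping someone can either point out where my SQL query goes wrong or point me to a better way of doing this.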

CREATE TABLE clip_roads AS

SELECT 
  ST_Intersection(b.the_geom, z.the_geom) AS clip_geom,
  b.*

FROM 
  public."roads" b, 
  public."buffer1KM" z

WHERE ST_Intersects(b.the_geom, z.the_geom);


CREATE INDEX "clip_roads_clip_geom_gist"
  ON "clip_roads"
  USING gist
  (clip_geom);



CREATE TABLE buffer1km_join AS

SELECT
  z.name, z.the_geom,
  sum(ST_Length(b.clip_geom)) AS sum_length_m

FROM
  public."clip_roads" b,
  public."buffer1KM" z

WHERE
  ST_Contains(z.the_geom, b.the_geom)

GROUP BY z.name, z.the_geom;

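I did create a GiST index on the original roads table, and (just to be safe?) I create an index on the clipped table before building the second table.

The query plan from PGAdmin III looks like this, although I'm afraid I don't have much skill in interpreting it: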

"Nested Loop  (cost=0.00..29169.98 rows=35129 width=49364)"
"  Output: st_intersection(b.the_geom, z.the_geom), b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  Join Filter: _st_intersects(b.the_geom, z.the_geom)"
"  ->  Seq Scan on public."roads" b  (cost=0.00..306.72 rows=5472 width=918)"
"        Output: b.gid, b.geo_id, b.address_l, b.address_r, b.lf_name, b.lfn_id, b.lfn_name, b.lfn_type_c, b.lfn_type_d, b.lfn_dir_co, b.lfn_dir_de, b.lfn_desc, b.oe_flag_l, b.oe_flag_r, b.fcode_desc, b.fcode, b.fnode, b.tnode, b.metrd_num, b.lo_num_l, b.lo_n_suf_l, b.hi_num_l, b.hi_n_suf_l, b.lo_num_r, b.lo_n_suf_r, b.hi_num_r, b.hi_n_suf_r, b.juris_code, b.dir_code, b.dir_code_d, b.cp_type, b.length, b.the_geom"
"  ->  Index Scan using "buffer1KM_index_the_geom" on public."buffer1KM" z  (cost=0.00..3.41 rows=1 width=48446)"
"        Output: z.gid, z.objectid, z.facilityid, z.name, z.frombreak, z.tobreak, z.postal_cod, z.pc_area, z.ct_id, z.da_id, z.taz_id, z.edge_poly, z.cchs_0708, z.tts_06, z.the_geom"
"        Index Cond: (b.the_geom && z.the_geom)"

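Is this operation just doomed to run for several days? I'm currently running this on PostGIS for Windows, but I could in theory throw more hardware at the problem by moving it to Amazon EC2. However, I see that the query uses only one core at a time (is there a way to make it use more?).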

postgis sql optimization

— Peter

What is PostGIS running on? The OS and processor may be a factor.

— Mapperz

Hi Mapperz: the OS is Windows 7, the CPU is a Core 2 Duo, and there is 4 GB of RAM (Windows, running 32-bit PGSQL / PostGIS).

— Peter

6

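Peter,

Which versions of PostGIS, GEOS, and PostgreSQL are you using?
Do a

SELECT postgis_full_version(), version();

A lot of improvements have been made between 1.4 and 1.5 and in GEOS 3.2+.

Also, how many vertices do your polygons have?

Do a

SELECT Max(ST_NPoints(the_geom)) AS maxp FROM sometable;

to get a sense of your worst-case scenario. Slow speeds like this are often caused by geometries that are too finely grained. In that case you might want to simplify first.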
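A minimal sketch of what "simplify first" could look like, assuming the buffer table from the question. The output table name `buffer1km_simple`, its index name, and the tolerance of 10 are placeholders chosen for illustration; the tolerance is expressed in the units of the layer's spatial reference system and would need tuning against how much shape detail the length totals can tolerate.

```sql
-- Sketch only: build a simplified copy of the buffers, then index it.
-- buffer1km_simple and the tolerance (10) are hypothetical placeholders.
CREATE TABLE buffer1km_simple AS
SELECT
  gid,
  name,
  ST_SimplifyPreserveTopology(the_geom, 10) AS the_geom
FROM public."buffer1KM";

CREATE INDEX buffer1km_simple_the_geom_gist
  ON buffer1km_simple
  USING gist (the_geom);

-- Compare the worst case against the unsimplified table.
SELECT Max(ST_NPoints(the_geom)) AS maxp FROM buffer1km_simple;
```

ST_SimplifyPreserveTopology is used here rather than ST_Simplify so that polygons are less likely to collapse or become invalid, at the cost of somewhat less aggressive simplification.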

Also, have you tuned your postgresql.conf file?

— LR1234567

Hi LR1234567: "POSTGIS="1.5.2" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.6.1, 21 August 2008" LIBXML="2.7.6" USE_STATS"; "PostgreSQL 9.0.3, compiled by Visual C++ build 1500, 32-bit" (running the other query now)

— Peter

The max query ran faster than I expected: maxp = 2030. I suspect that is not too finely grained?

— Peter

1

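2,030 is actually not bad. It could be that you simply have a lot of intersecting polygons. Generally the intersection is the slowest part. Try counting how many records actually intersect; it might be huge.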
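Because the intersection is the expensive step, one common way to reduce it is to call ST_Intersection only for roads that actually cross a buffer boundary, and to use the road geometry unchanged when it lies entirely inside the buffer. The sketch below also aggregates in the same pass instead of building an intermediate table. It assumes `z.gid` uniquely identifies a buffer (the column appears in the query plan), and `buffer1km_road_length` is a hypothetical table name.

```sql
-- Sketch only: total clipped road length per buffer, in one query.
-- Assumes z.gid is a unique buffer id; the output table name is hypothetical.
CREATE TABLE buffer1km_road_length AS
SELECT
  z.gid,
  z.name,
  sum(
    CASE
      -- Road lies entirely inside the buffer: no clipping needed.
      WHEN ST_Within(b.the_geom, z.the_geom)
        THEN ST_Length(b.the_geom)
      -- Road crosses the buffer edge: clip, then measure.
      ELSE ST_Length(ST_Intersection(b.the_geom, z.the_geom))
    END
  ) AS sum_length_m
FROM public."roads" b
JOIN public."buffer1KM" z
  ON ST_Intersects(b.the_geom, z.the_geom)
GROUP BY z.gid, z.name;
```

Buffers that no road touches do not appear in the result, since this is an inner join. The lengths are in the units of the layer's spatial reference system, so the `_m` suffix only holds for a metre-based projection.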

— LR1234567

SELECT count(*) FROM public."roads" b, public."buffer1KM" z WHERE ST_Intersects(b.the_geom, z.the_geom);

— LR1234567

1

Is 910,978 huge? That's the nice thing about starting with a new technology: I have no normative expectations :-)

— Peter

1

Useful Stack Overflow answer: https://stackoverflow.com/questions/1162206/why-is-postgresql-so-slow-on-windows

Tuning Postgres: http://wiki.postgresql.org/wiki/Performance_Optimization

From experience, I also recommend running VACUUM ANALYZE.
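As a small sketch, assuming the two tables named in the question, that could be run per table so the planner has fresh statistics before the join:

```sql
-- Reclaim dead rows and refresh planner statistics for both input tables.
VACUUM ANALYZE public."roads";
VACUUM ANALYZE public."buffer1KM";
```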

— Mapperz

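Thanks, that sounds like good advice. Some of the Windows issues (such as the fork() penalty) shouldn't be a problem here, since I'm running a single connection, right? Also, I ran VACUUM ANALYZE. I haven't done any performance tuning yet.

— Peter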

1

Generally, shared_buffers and work_mem have the biggest impact. For shared_buffers, compared with Linux, what you can use on Windows …

— 缓冲区

shared_buffers had already been set, but work_mem had not. I have now given work_mem 1 GB.
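For reference, a sketch of how these two settings can be inspected and how work_mem can be changed for a single session. The '256MB' value is only an illustrative placeholder, not a recommendation; shared_buffers cannot be changed with SET and has to be edited in postgresql.conf, followed by a server restart.

```sql
-- Inspect the current values.
SHOW shared_buffers;
SHOW work_mem;

-- Raise work_mem for the current session only (placeholder value).
SET work_mem = '256MB';

-- Return to the server default when done.
RESET work_mem;
```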

— Peter

1

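Shameless plug :) It might help to read chapters 8 and 9 of our book. It's just hot off the presses. We cover a lot of these kinds of questions in those chapters.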

http://www.postgis.us/chapter_08

http://www.postgis.us/chapter_09

— LR1234567

The links are broken. Does this refer to PostGIS in Action or the PostGIS Cookbook?

— HeikkiVesanto, 2014

1

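Ah, you are right. Those were links to the first edition of PostGIS in Action, and they were valid at the time. When we introduced the second edition, we had to change the link structure. The old links are now here: postgis.us/chapters_edition_1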

— LR1234567

0

See these two tips for optimizing spatial queries. They worked well for me. http://kb.zillionics.com/optimize-spatial-query/

— gle

2

This answer would be better with more detail, such as how to apply the tips in a particular situation.

— BradHards, 2013