Understanding the use of spatial indexes with RTree?

I'm having trouble understanding how to use spatial indexes with RTree.

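Example: I have 300 buffered points, and I need to know each buffer's area of intersection with a polygon shapefile. The polygon shapefile has > 20,000 polygons. It was suggested that I use a spatial index to speed up the process.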

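So... if I create a spatial index for my polygon shapefile, will it be "attached" to the file in some way, or will the index stand alone? That is, after creating it, can I run my intersection function on the polygon file and get faster results? Will the intersection "see" that there is a spatial index and know what to do? Or do I need to run it on the index, then relate those results back to my original polygon file via FIDs or something similar?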

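The RTree documentation isn't helping me much (probably because I'm only just learning to program). It shows how to create an index by reading in manually created points and then querying it against other manually created points, which returns the ids contained within the window. That makes sense. But it doesn't explain how this relates back to the original file the index would have come from.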

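I'm thinking it must go something like this:

1. Pull the bbox for each polygon feature from my polygon shapefile and place it in a spatial index, giving it an id that is the same as its id in the shapefile.
2. Query that index to get the ids that intersect.
3. Then re-run the intersection on only those features in my original shapefile that were identified by querying the index (not sure how I would do this last part).

Do I have the right idea? Am I missing anything?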

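Right now, I'm trying to get this code to work on one point shapefile that contains only one point feature and one polygon shapefile that contains > 20,000 polygon features.

I'm importing the shapefiles with Fiona, adding the spatial index with RTree, and trying to do the intersection with Shapely.

My test code looks like this: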

#point shapefile representing location of desired focal statistic
traps = fiona.open('single_pt_speed_test.shp', 'r') 

#polygon shapefile representing land cover of interest 
gl = MultiPolygon([shape(pol['geometry']) for pol in fiona.open('class3_aa.shp', 'r')]) 

#search area
areaKM2 = 20

#create empty spatial index
idx = index.Index()

#set initial search radius for buffer
areaM2 = areaKM2 * 1000000
r = (math.sqrt(areaM2/math.pi))

#create spatial index from gl
for i, shape in enumerate(gl):
    idx.insert(i, shape.bounds)

#query index for ids that intersect with buffer (will eventually have multiple points)
for point in traps:
        pt_buffer = shape(point['geometry']).buffer(r)
        intersect_ids = pt_buffer.intersection(idx)

But I keep getting TypeError: 'Polygon' object is not callable

— CSB

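A spatial index is integral to the dataset and transparent (from the user's point of view it is contained in the dataset, not a separate entity). Software performing the intersection will be aware of it and will use the spatial index to build a short list for the real intersection, by quickly telling the software which features need closer inspection and which are clearly nowhere near intersecting. How you create one depends on your software and data type... please provide more information on these points for more specific help. For a shapefile it's the .shx file.

— Michael Stimson, 2014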

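The .shx is not a spatial index. It is just the offset file that allows dynamic access to the variable-width records. The .sbn/.sbx files are the ArcGIS shapefile spatial index pair, but the specification for these has not been published.

— Vince, 2014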

There is also .qix, the MapServer/GDAL/OGR/SpatiaLite quadtree index.

— Mike T

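Your idea is spot on for Spatialite, which does not have an actual spatial index. Most other formats, if they support spatial indexes at all, do it transparently.

— user30184, 2014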

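You keep getting TypeError: 'Polygon' object is not callable with your updated example because you overwrite the shape function that you imported from shapely with a Polygon object created by this line: for i, shape in enumerate(gl):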
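
The shadowing problem described in this comment can be reproduced without any GIS libraries. The following is only a minimal sketch: the `shape` function and the `records` list here are hypothetical stand-ins (not shapely's `shape` or Fiona features), chosen so the same name collision occurs at module level.

```python
def shape(record):
    """Hypothetical stand-in for shapely's shape(): returns a plain tuple."""
    return tuple(record["coordinates"])


records = [{"coordinates": (0.0, 0.0)}, {"coordinates": (1.0, 1.0)}]
geoms = [shape(rec) for rec in records]

shape_function = shape  # keep a handle so the name can be restored below

# Buggy pattern: the loop variable rebinds the name `shape` to a geometry.
for i, shape in enumerate(geoms):
    pass

try:
    shape(records[0])  # `shape` is now a tuple, not a function
    error = None
except TypeError as exc:
    error = str(exc)

# Fix: restore the function and use a different loop variable name.
shape = shape_function
for i, geom in enumerate(geoms):
    pass

fixed = shape(records[0])
print(error)
print(fixed)
```

In the question's code, the equivalent fix would be to rename the loop variable in `for i, shape in enumerate(gl):` to something like `poly`.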

— user2856

Answers:

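That's the gist of it. The R-tree lets you make a very fast first pass and gives you a set of results that will contain "false positives" (bounding boxes may intersect when the geometries themselves do not intersect at all). You then go through the set of candidates (fetching them from the shapefile by their index) and do a mathematically precise intersection test using, for example, Shapely. This is exactly the same strategy that is used in spatial databases such as PostGIS.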
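
To make the "false positive" idea concrete, here is a small sketch of the two passes in plain Python. It is not the RTree or Shapely API: the helper names, the circular buffer, and the rectangular `parcels` are all assumptions chosen so that the exact test stays a few lines long.

```python
import math


def bbox_overlap(a, b):
    """Coarse test: do two (minx, miny, maxx, maxy) boxes overlap or touch?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circle_hits_rect(cx, cy, r, rect):
    """Exact test: does a circle intersect an axis-aligned rectangle?"""
    # Closest point of the rectangle to the circle centre.
    nearest_x = min(max(cx, rect[0]), rect[2])
    nearest_y = min(max(cy, rect[1]), rect[3])
    return math.hypot(cx - nearest_x, cy - nearest_y) <= r


# A buffered point (circle) and three hypothetical rectangular parcels.
cx, cy, r = 0.0, 0.0, 10.0
buffer_bbox = (cx - r, cy - r, cx + r, cy + r)
parcels = {
    0: (-2.0, -2.0, 2.0, 2.0),    # really intersects the circle
    1: (8.0, 8.0, 9.0, 9.0),      # inside the bbox corner, outside the circle
    2: (20.0, 20.0, 25.0, 25.0),  # far away
}

# First pass: cheap bounding-box filter (this is what the index speeds up).
candidates = [fid for fid, box in parcels.items() if bbox_overlap(buffer_bbox, box)]

# Second pass: precise test on the candidates only.
matches = [fid for fid in candidates if circle_hits_rect(cx, cy, r, parcels[fid])]

print(candidates)
print(matches)
```

Parcel 1 survives the first pass because it sits in a corner of the buffer's bounding box, and is only rejected by the precise test.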

— gil

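Nicely punned with "GiST"! GiST is usually described as a variant of the B-tree, but PostgreSQL has a GiST implementation of the R-tree. Although the wiki is not necessarily the best reference to cite, it does have a nice diagram explaining bounding box searches.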

— MappaGnosis

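It's worth learning the manual way of using an R-tree index, as in your steps 2 and 3. This blog post about OGC GeoPackage, which also supports R-tree as separate database tables, shows some SQL and screenshots: openjump.blogspot.fi/2014/02/…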

— user30184

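You've almost got it, but you've made a small mistake. You need to use the intersection method on the spatial index, rather than passing the index to the intersection method on the buffered point. Once you've found a list of features whose bounding boxes overlap, you need to check whether your buffered point actually intersects the geometries.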

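import math

import fiona
import rtree
from shapely.geometry import shape

# search area in square kilometres (value taken from the question)
areaKM2 = 20

# radius in metres of a circle with that area
areaM2 = areaKM2 * 1000000
r = math.sqrt(areaM2 / math.pi)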

# open both layers
with fiona.open('single_pt_speed_test.shp', 'r') as layer_pnt:
    with fiona.open('class3_aa.shp', 'r') as layer_land:

        # create an empty spatial index object
        index = rtree.index.Index()

        # populate the spatial index
        for fid, feature in layer_land.items():
            geometry = shape(feature['geometry'])
            index.insert(fid, geometry.bounds)

        for feature in layer_pnt:
            # buffer the point
            geometry = shape(feature['geometry'])
            geometry_buffered = geometry.buffer(r)

            # get list of fids where bounding boxes intersect
            fids = [int(i) for i in index.intersection(geometry_buffered.bounds)]

            # access the features that those fids reference
            for fid in fids:
                feature_land = layer_land[fid]
                geometry_land = shape(feature_land['geometry'])

                # check the geometries intersect, not just their bboxs
                if geometry_buffered.intersects(geometry_land):
                    print('Found an intersection!')  # do something useful here

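If you're interested in finding points that are within a minimum distance of your land class, you can use the distance method instead (swap out the appropriate section from the previous example).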

for feature in layer_pnt:
    geometry = shape(feature['geometry'])

    # expand bounds by r in all directions
    bounds = [a+b*r for a,b in zip(geometry.bounds, [-1, -1, 1, 1])]

    # get list of fids where bounding boxes intersect
    fids = [int(i) for i in index.intersection(bounds)]

    for fid in fids:
        feature_land = layer_land[fid]
        geometry_land = shape(feature_land['geometry'])

        # check the geometries are within r metres
        if geometry.distance(geometry_land) <= r:
            print('Found a match!')
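
As a quick check of the numbers involved, the sketch below works through the radius formula and the bounds expansion from the snippet above. The point coordinates are hypothetical, and it assumes a projected coordinate system with units in metres.

```python
import math

# Search area from the question: 20 square kilometres.
area_km2 = 20
area_m2 = area_km2 * 1_000_000

# Radius of a circle with that area: A = pi * r^2, so r = sqrt(A / pi).
r = math.sqrt(area_m2 / math.pi)

# A point's bounds are degenerate: (x, y, x, y). Hypothetical coordinates.
point_bounds = (500000.0, 4100000.0, 500000.0, 4100000.0)

# Subtract r from the minimums and add r to the maximums.
search_bounds = [a + b * r for a, b in zip(point_bounds, [-1, -1, 1, 1])]
width = search_bounds[2] - search_bounds[0]

print(round(r, 2))
print(search_bounds)
```

The expanded window is a square of side 2r, so it still over-selects near its corners, which is why the `distance` check on each candidate remains necessary.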

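If it takes a long time to build your spatial index and you're going to do this more than a few times, you should look into serializing the index to a file. The documentation describes how to do this: http://toblerity.org/rtree/tutorial.html#serializing-your-index-to-a-file

You could also look at bulk loading the bounding boxes into the rtree using a generator, like this: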
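
A minimal sketch of what that might look like, assuming the `rtree` package is installed: passing a file path to `rtree.index.Index` stores the index on disk, and opening the same path later reloads it. The bounding boxes, the temporary directory, and the `land_index` base name are made up for illustration.

```python
import os
import tempfile

from rtree import index

# Hypothetical bounding boxes keyed by feature id.
boxes = {
    0: (0.0, 0.0, 1.0, 1.0),
    1: (5.0, 5.0, 6.0, 6.0),
    2: (0.5, 0.5, 2.0, 2.0),
}

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "land_index")  # rtree adds .dat and .idx

# Build the index once and flush it to disk.
idx = index.Index(path)
for fid, box in boxes.items():
    idx.insert(fid, box)
idx.close()

# Later (for example in another run): reopen the same path and query it.
reloaded = index.Index(path)
hits = sorted(reloaded.intersection((0.75, 0.75, 0.9, 0.9)))
reloaded.close()

print(hits)
```

The ids stored in the index only stay meaningful while the shapefile is unchanged, so the index would need rebuilding if the features are edited or reordered.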

def gen(collection):
    for fid, feature in collection.items():
        geometry = shape(feature['geometry'])
        yield((fid, geometry.bounds, None))
index = rtree.index.Index(gen(layer_land))

— Snorfalorpagus

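Yes, that's the idea. Here's an excerpt from this tutorial on using r-tree spatial indexes in Python with shapely, Fiona, and geopandas:

An r-tree represents individual objects and their bounding boxes (the "r" is for "rectangle") as the lowest level of the spatial index. It then aggregates nearby objects and represents them with their aggregate bounding box in the next higher level of the index. At yet higher levels, the r-tree aggregates bounding boxes and represents them by their bounding box, iteratively, until everything is nested into one top-level bounding box. To search, the r-tree takes a query box and, starting at the top level, sees which (if any) bounding boxes intersect it. It then expands each intersecting bounding box and sees which of the child bounding boxes inside it intersect the query box. This proceeds recursively until all intersecting boxes are searched down to the lowest level, and returns the matching objects from the lowest level.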
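
The nesting and the top-down search described in this excerpt can be sketched with a toy tree in plain Python. This is not how the RTree library is implemented: the dictionary nodes, the `fanout` parameter, and grouping by sorted `minx` are simplifying assumptions used only to show the recursion.

```python
def overlaps(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes):
    """Smallest box that encloses every box in the list."""
    minxs, minys, maxxs, maxys = zip(*boxes)
    return (min(minxs), min(minys), max(maxxs), max(maxys))


def build(items, fanout=2):
    """Build a nested tree from a non-empty list of (id, bbox) pairs."""
    # Lowest level: one leaf per object. Sorting by minx is a crude,
    # assumed stand-in for "aggregate nearby objects".
    nodes = [
        {"bbox": box, "id": fid, "children": None}
        for fid, box in sorted(items, key=lambda item: item[1][0])
    ]
    # Each pass groups up to `fanout` nodes under a parent whose bbox is
    # the union of its children's boxes, until one top-level node remains.
    while len(nodes) > 1:
        groups = [nodes[k:k + fanout] for k in range(0, len(nodes), fanout)]
        nodes = [
            {"bbox": union([n["bbox"] for n in group]), "id": None, "children": group}
            for group in groups
        ]
    return nodes[0]


def search(node, query):
    """Return ids of leaves whose bbox intersects the query box."""
    if not overlaps(node["bbox"], query):
        return []  # prune this whole branch
    if node["children"] is None:
        return [node["id"]]
    hits = []
    for child in node["children"]:
        hits.extend(search(child, query))
    return hits


items = [
    (0, (0, 0, 1, 1)),
    (1, (2, 0, 3, 1)),
    (2, (10, 10, 11, 11)),
    (3, (12, 10, 13, 11)),
]
tree = build(items)
hits = search(tree, (2.5, 0.5, 10.5, 10.5))
print(tree["bbox"])
print(hits)
```

Branches whose aggregate box misses the query are skipped without looking at their children, which is where the speed-up over testing every feature comes from.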

— eos