Fastest way to count the number of features in a feature class?


35

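With the introduction of the data access module in arcpy (search cursors are up to 30x faster), I would like to know whether counting the features that match a SQL criterion is faster than the traditional MakeTableView + GetCount approach?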


12
How silly is it that the feature count isn't simply a property of the arcpy.Describe object.
Grant Humphries

This is very easy using ogrinfo with some OGR SQL. The dataset has about 170,000 records, and a wildcard search on a non-indexed VARCHAR field returned after only a few seconds: ogrinfo "C:\xGIS\Vector\parcels\parcels_20140829_pmerc.ovf" -sql "SELECT count(*) FROM parcels_20140829_pmerc WHERE tms like 'R39200-02-%'"
elrobis 2014
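
For anyone who wants to drive the ogrinfo count from a script, here is a minimal sketch. It assumes the GDAL/OGR command-line tools are on PATH; `build_ogrinfo_count` and `run_count` are hypothetical helper names, and the only ogrinfo option used is `-sql`, as in the comment above. Passing the arguments as a list avoids the shell-quoting problems that a hand-typed one-liner invites.

```python
import subprocess


def build_ogrinfo_count(datasource, layer, where):
    """Build the argument list for an ogrinfo count(*) query.

    The SQL text is assembled with plain string formatting, so only pass
    layer names and WHERE clauses you trust.
    """
    sql = "SELECT count(*) FROM {} WHERE {}".format(layer, where)
    # A list of arguments needs no extra shell quoting around the path or SQL.
    return ["ogrinfo", datasource, "-sql", sql]


def run_count(datasource, layer, where):
    """Run ogrinfo and return its raw output (requires GDAL/OGR on PATH)."""
    return subprocess.check_output(build_ogrinfo_count(datasource, layer, where))
```

`run_count` returns ogrinfo's text report as-is; pulling the number out of that report is left to the caller.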

Answers:


2

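I tested the solution from the answer above, and on my real-world data the difference is negligible. Contrary to the results in the other answer, my timings for arcpy.MakeTableView_management and arcpy.da.SearchCursor inside ArcMap are about the same.

I tested variants with and without a query; the code for the query version and the final measurements are below: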

@staticmethod
def query_features(feature_class, query):

    # Method 1
    time.sleep(5)  # Let the cpu/ram calm before proceeding!
    start_time = time.clock()
    count = len(list(i for i in arcpy.da.SearchCursor(feature_class, ["OBJECTID"], query)))
    end_time = time.clock()
    arcpy.AddMessage("Method 1 finished in {} seconds".format((end_time - start_time)))
    arcpy.AddMessage("{} features".format(count))

    # Method 2
    time.sleep(5)  # Let the cpu/ram calm before proceeding!
    start_time = time.clock()
    arcpy.MakeTableView_management(feature_class, "myTableView", query)
    count = int(arcpy.GetCount_management("myTableView").getOutput(0))

    end_time = time.clock()
    arcpy.AddMessage("Method 2 in {} seconds".format((end_time - start_time)))
    arcpy.AddMessage("{} features".format(count))

The results are as follows:

    No query:
    Method 1 finished in 5.3616442 seconds
    804140 features
    Method 2 in 4.2843138 seconds
    804140 features

    Many results query:
    Method 1 finished in 12.7124766 seconds
    518852 features
    Method 2 in 12.1396602 seconds
    518852 features

    Few results query:
    Method 1 finished in 11.1421476 seconds
    8 features
    Method 2 in 11.2232503 seconds
    8 features
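
A note for anyone re-running this kind of comparison on a newer Python: `time.clock()` was removed in Python 3.8, so the timing code needs a different clock there. The sketch below is one possible replacement, not the code that produced the numbers in this thread. `timed` is a hypothetical helper name, and `timeit.default_timer` is used because it exists on both Python 2.7 and 3.x.

```python
import timeit


def timed(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    # default_timer picks a suitable clock for the running Python version.
    start = timeit.default_timer()
    result = func(*args, **kwargs)
    elapsed = timeit.default_timer() - start
    return result, elapsed


# Stand-in workload so the sketch runs without ArcGIS installed.
total, seconds = timed(sum, range(1000))
print("{} computed in {} seconds".format(total, seconds))

# With arcpy available, the same helper could wrap a geoprocessing call, e.g.:
#   result, seconds = timed(arcpy.GetCount_management, "myTableView")
```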

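Well, it has been 7 years since the question was answered, so I hope they have made improvements to their SDK!!! =) Thanks for testing it yourself, Miro.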
Michael Markieta

47

I am using an example with one million randomly generated points inside a file geodatabase. Attached here.

Here is some code to get us started:

import time
import arcpy

arcpy.env.workspace = r"C:\CountTest.gdb"  # raw string keeps the backslash literal

time.sleep(5) # Let the cpu/ram calm before proceeding!

"""Method 1"""
StartTime = time.clock()
with arcpy.da.SearchCursor("RandomPoints", ["OBJECTID"]) as cursor:
    rows = {row[0] for row in cursor}

count = 0
for row in rows:
    count += 1

EndTime = time.clock()
print "Finished in %s seconds" % (EndTime - StartTime)
print "%s features" % count

time.sleep(5) # Let the cpu/ram calm before proceeding!

"""Method 2"""
StartTime2 = time.clock()
arcpy.MakeTableView_management("RandomPoints", "myTableView")
count = int(arcpy.GetCount_management("myTableView").getOutput(0))

EndTime2 = time.clock()
print "Finished in %s seconds" % (EndTime2 - StartTime2)
print "%s features" % count

And some preliminary results:

>>> 
Finished in 6.75540050237 seconds
1000000 features
Finished in 0.801474780332 seconds
1000000 features
>>> =============================== RESTART ===============================
>>> 
Finished in 6.56968596918 seconds
1000000 features
Finished in 0.812731769756 seconds
1000000 features
>>> =============================== RESTART ===============================
>>> 
Finished in 6.58207512487 seconds
1000000 features
Finished in 0.841122157314 seconds
1000000 features

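Imagine larger, more complex datasets. The SearchCursor would crawl indefinitely.

I am not at all happy with the results; however, the DataAccess module is used extensively in our GIS development circle. I am looking to rebuild some of our function definitions with this module because it is more flexible than the MakeTableView + GetCount approach.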
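
Method 1 above first collects one million OBJECTIDs into a set and then loops over it to count. A leaner variant, sketched below, counts rows as they stream out of the cursor and keeps nothing in memory. It has not been timed against the numbers above, so treat it as an illustration rather than a faster method; `count_rows` and `count_features` are hypothetical names, and `"OID@"` is the arcpy.da token for the object ID field.

```python
def count_rows(rows):
    """Count the items of any iterable without storing them."""
    return sum(1 for _ in rows)


def count_features(feature_class, where_clause=None):
    """Count features through arcpy.da.SearchCursor (needs ArcGIS's arcpy)."""
    # Imported here so count_rows stays usable on machines without ArcGIS.
    import arcpy

    with arcpy.da.SearchCursor(feature_class, ["OID@"], where_clause) as cursor:
        return count_rows(cursor)


# Stand-in data so the sketch runs anywhere; a cursor is consumed the same way.
print(count_rows(iter(range(10))))
```

With arcpy available, `count_features("RandomPoints")` would take the place of the set-building loop in Method 1.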


Nice roundup. For completeness, I would like to add the method that IMO should be the fastest but is in fact the slowest (10x slower): arcpy.Statistics_analysis("RandomPoints", r"in_memory\count", [["OBJECTID", "COUNT"]]); cursor = arcpy.da.SearchCursor(r"in_memory\count", ["COUNT_OBJECTID"]); row = cursor.next(); del cursor; count = row[0]
Berend 2015
Licensed under cc by-sa 3.0 with attribution required.