Speeding up Python timestamp field calculation in ArcGIS Desktop?


9

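I'm new to Python and have started creating scripts for ArcGIS workflows. I'm wondering how I can speed up my code that generates an "Hours" double numeric field from a timestamp field. I start with a track point log (breadcrumb trail) shapefile generated by DNR Garmin, which has an LTIME timestamp field (a text field, length 20) recording when each track point was taken. The script calculates the difference in hours between each successive timestamp ("LTIME") and puts it into a new field ("Hours").

That way I can go back and summarize how much time I spent in a particular area/polygon. The main part comes after the line print "Executing getnextLTIME.py script..." Here is the code: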

# ---------------------------------------------------------------------------
# 
# Created on: Sept 9, 2010
# Created by: The Nature Conservancy
# Calculates delta time (hours) between successive rows based on timestamp field
#
# Credit should go to Richard Crissup, ESRI DTC, Washington DC for his
# 6-27-2008 date_diff.py posted as an ArcScript
'''
    This script assumes the format "month/day/year hours:minutes:seconds".
    The hour needs to be in military time. 
    If you are using another format please alter the script accordingly. 
    I do a little checking to see if the input string is in the format
    "month/day/year hours:minutes:seconds" as this is a common date time
    format. Also the hours:minute:seconds is included, otherwise we could 
    be off by almost a day.

    I am not sure if the time functions do any conversion to GMT, 
    so if the times passed in are in another time zone than the computer
    running the script, you will need to pad the time given back in 
    seconds by the difference in time from where the computer is in relation
    to where they were collected.

'''
# ---------------------------------------------------------------------------
#       FUNCTIONS
#----------------------------------------------------------------------------        
import arcgisscripting, sys, os, re
import time, calendar, string, decimal
def func_check_format(time_string):
    # Report problems with the expected "date time" layout; this only prints
    # a message, it does not raise.
    if time_string.find("/") == -1:
        print "Error: time string doesn't contain any '/' expected format \
            is month/day/year hour:minutes:seconds"
    elif time_string.find(":") == -1:
        print "Error: time string doesn't contain any ':' expected format \
            is month/day/year hour:minutes:seconds"

    # Always check for exactly two space-separated parts (date and time).
    parts = time_string.split()
    if len(parts) != 2:
        print "Error: time string doesn't contain a date and time separated \
            by a space. Expected format is 'month/day/year hour:minutes:seconds'"


def func_parse_time(time_string):
    '''
    take the time value and make it into a tuple with 9 values
    example = "2004/03/01 23:50:00". If the date values don't look like this
    then the script will fail. 
    '''
    year=0;month=0;day=0;hour=0;minute=0;sec=0;
    time_string = str(time_string)
    l=time_string.split()
    if not len(l) == 2:
        gp.AddError("Error: func_parse_time, expected 2 items in list l got " + \
            str(len(l)) + ", time field value = " + time_string)
        raise Exception 
    cal=l[0];cal=cal.split("/")
    if not len(cal) == 3:
        gp.AddError("Error: func_parse_time, expected 3 items in list cal got " + \
            str(len(cal)) + ", time field value = " + time_string)
        raise Exception
    ti=l[1];ti=ti.split(":")
    if not len(ti) == 3:
        gp.AddError("Error: func_parse_time, expected 3 items in list ti got " + \
            str(len(ti)) + ", time field value = " + time_string)
        raise Exception
    if int(len(cal[0]))== 4:
        year=int(cal[0])
        month=int(cal[1])
        day=int(cal[2])
    else:
        year=int(cal[2])
        month=int(cal[0])
        day=int(cal[1])       
    hour=int(ti[0])
    minute=int(ti[1])
    sec=int(ti[2])
    # formatted tuple to match input for time functions
    result=(year,month,day,hour,minute,sec,0,0,0)
    return result


#----------------------------------------------------------------------------

def func_time_diff(start_t,end_t):
    '''
    Take the two numbers that represent seconds
    since Jan 1 1970 and return the difference of
    those two numbers in hours. There are 3600 seconds
    in an hour. 60 secs * 60 min   '''

    start_secs = calendar.timegm(start_t)
    end_secs = calendar.timegm(end_t)

    x=abs(end_secs - start_secs)
    #diff = number hours difference
    #as ((x/60)/60)
    diff = float(x)/float(3600)   
    return diff

#----------------------------------------------------------------------------

print "Executing getnextLTIME.py script..."

try:
    gp = arcgisscripting.create(9.3)

    # set parameter to what user drags in
    fcdrag = gp.GetParameterAsText(0)
    psplit = os.path.split(fcdrag)

    folder = str(psplit[0]) #containing folder
    fc = str(psplit[1]) #feature class
    fullpath = str(fcdrag)

    gp.Workspace = folder

    fldA = gp.GetParameterAsText(1) # Timestamp field
    fldDiff = gp.GetParameterAsText(2) # Hours field

    # set the toolbox for adding the field to data management
    gp.Toolbox = "management"
    # add the user named hours field to the feature class
    gp.addfield (fc,fldDiff,"double")
    #gp.addindex(fc,fldA,"indA","NON_UNIQUE", "ASCENDING")

    desc = gp.describe(fullpath)
    updateCursor = gp.UpdateCursor(fullpath, "", desc.SpatialReference, \
        fldA+"; "+ fldDiff, fldA)
    row = updateCursor.Next()
    count = 0
    oldtime = str(row.GetValue(fldA))
    #check datetime to see if parseable
    func_check_format(oldtime)
    gp.addmessage("Calculating " + fldDiff + " field...")

    while row is not None:
        if count == 0:
            row.SetValue(fldDiff, 0)
        else:
            start_t = func_parse_time(oldtime)
            b = str(row.GetValue(fldA))
            end_t = func_parse_time(b)
            diff_hrs = func_time_diff(start_t, end_t)
            row.SetValue(fldDiff, diff_hrs)
            oldtime = b

        count += 1
        updateCursor.UpdateRow(row)
        row = updateCursor.Next()

    # count already equals the number of rows processed
    gp.addmessage("Updated " + str(count) + " rows.")
    #gp.removeindex(fc,"indA")
    del updateCursor
    del row

except Exception, ErrDesc:
    import traceback;traceback.print_exc()

print "Script complete."

1
Nice program! I haven't seen anything that speeds up the calculation. The field calculator takes forever!!
Brad Nesom

Answers:


12

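Cursors are always really slow in the geoprocessing environment. The easiest way around this is to pass a Python code block into the CalculateField geoprocessing tool.

Something like this should work: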

import arcgisscripting
gp = arcgisscripting.create(9.3)

# Create a code block to be executed for each row in the table
# The code block is necessary for anything over a one-liner.
codeblock = """
import datetime
class CalcDiff(object):
    # Class attributes are static, that is, only one exists for all 
    # instances, kind of like a global variable for classes.
    Last = None
    def calcDiff(self,timestring):
        # parse the time string according to our format.
        t = datetime.datetime.strptime(timestring, '%m/%d/%Y %H:%M:%S')
        # return the difference from the last date/time
        if CalcDiff.Last:
            diff =  t - CalcDiff.Last
        else:
            diff = datetime.timedelta()
        CalcDiff.Last = t
        # diff.seconds alone drops whole days, so include diff.days as well
        return (diff.days * 86400 + diff.seconds) / 3600.0
"""

expression = """CalcDiff().calcDiff(!timelabel!)"""

gp.CalculateField_management(r'c:\workspace\test.gdb\test','timediff',expression,   "PYTHON", codeblock)

Obviously you'd have to modify it to take the field names and so on as parameters, but it should be pretty fast.
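
As a rough illustration of that parameterization, the expression string can be built from a field name instead of being hard-coded. The helper name `build_expression` is hypothetical, and the variable names in the trailing comment (`fullpath`, `fldA`, `fldDiff`) are borrowed from the question's script; this is a sketch, not a tested ArcGIS tool:

```python
def build_expression(time_field):
    """Return a CalculateField expression that passes time_field to CalcDiff."""
    # Field names are wrapped in exclamation marks in PYTHON expressions.
    return "CalcDiff().calcDiff(!%s!)" % time_field


if __name__ == "__main__":
    print(build_expression("LTIME"))
    # In the question's script this might be used as (assumption):
    # gp.CalculateField_management(fullpath, fldDiff,
    #                              build_expression(fldA), "PYTHON", codeblock)
```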

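Note that while your date/time parsing function might actually be a hair faster than the strptime() function, the standard library is almost always more bug-free.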
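
To make the parse-and-difference step concrete, here is a minimal standalone sketch in plain Python (no ArcGIS needed). The function name `hours_between` and the sample timestamps are illustrative assumptions, and the format string assumes "month/day/year hours:minutes:seconds" as described in the question. It combines the `days` and `seconds` parts of the timedelta, since `seconds` alone would drop whole days when two track points are more than 24 hours apart:

```python
import datetime

# Assumed format: month/day/year hours:minutes:seconds (24-hour clock)
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def hours_between(start_string, end_string, fmt=TIME_FORMAT):
    """Return the absolute difference between two timestamp strings, in hours."""
    start = datetime.datetime.strptime(start_string, fmt)
    end = datetime.datetime.strptime(end_string, fmt)
    diff = abs(end - start)
    # timedelta stores days and seconds separately; combine both.
    return (diff.days * 86400 + diff.seconds) / 3600.0


if __name__ == "__main__":
    print(hours_between("09/09/2010 23:50:00", "09/10/2010 00:20:00"))
    print(hours_between("09/09/2010 12:00:00", "09/11/2010 13:30:00"))
```

Taking the absolute value mirrors the `abs()` in the question's func_time_diff; drop it if a negative result should flag out-of-order rows.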


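Thanks David. I didn't realize CalculateField was faster. I'll try testing this. The only problem I can foresee is that the dataset may be out of order, which happens on occasion. Is there a way to sort ascending on the LTIME field first and then apply CalculateField, or to tell CalculateField to execute in a certain order?
Russell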

Just a note: calling the pre-canned gp functions will be faster most of the time. I explained why in a previous post, gis.stackexchange.com/questions/8186/…
Ragi Yaser Burhum, 2011

+1 for using the built-in datetime package, as it offers powerful functionality and almost replaces the time/calendar packages
Mike T

1
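Incredible! I tried your code and integrated it with @OptimizePrime's "in memory" suggestion, and the script's average run time dropped from 55 seconds to 2 seconds (810 records). This is exactly the sort of thing I was looking for. Thank you so much. I learned a lot.
Russell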

3

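@David has given you a very clean solution. +1 for leveraging the strengths of the arcgisscripting code base.

Another option is to copy the dataset into memory using:

  • gp.CopyFeatures("path to your source", "in_memory\copied feature name") - for a geodatabase feature class, shapefile, or
  • gp.CopyRows("path to your source", ) - for a geodatabase table, dbf, etc.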

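This removes the overhead incurred when you request a cursor from the ESRI COM code base.

The overhead comes from converting Python data types to C data types and from accessing the ESRI COM code base.

When your data is in memory, you reduce the need to access the disk (a high-cost process). You also reduce the need for the Python and C/C++ libraries to transfer data when you use arcgisscripting.

Hope this helps.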


1

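Starting with ArcGIS 10.1 for Desktop, an excellent alternative to the old-style arcgisscripting UpdateCursor is arcpy.da.UpdateCursor.

I have found these to be typically about 10 times faster.

These may or may not have been an option when this question was written, but they should not be overlooked by anyone reading this Q&A now.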
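
For readers on 10.1 or later, the approach might look roughly like the sketch below. The function names (`hours_by_oid`, `write_hours`, `calculate_hours`) are hypothetical, and the timestamp format is assumed to match the question's. Rows are sorted by parsed time in Python before differencing, which also covers the out-of-order case raised in the comments without relying on what the data source supports. The arcpy import is kept inside `calculate_hours` so the two helpers can be tried without ArcGIS:

```python
import datetime

# Assumed format: month/day/year hours:minutes:seconds (24-hour clock)
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def hours_by_oid(records, fmt=TIME_FORMAT):
    """Map each object ID to the hours elapsed since the previous timestamp.

    records is an iterable of (oid, time_string) pairs in any order; they
    are sorted by parsed time first, so out-of-order rows are handled.
    """
    parsed = sorted(
        (datetime.datetime.strptime(str(text), fmt), oid) for oid, text in records
    )
    hours = {}
    previous = None
    for stamp, oid in parsed:
        if previous is None:
            hours[oid] = 0.0  # earliest point has no predecessor
        else:
            hours[oid] = (stamp - previous).total_seconds() / 3600.0
        previous = stamp
    return hours


def write_hours(cursor, hours):
    """Write hours into rows shaped [oid, hours] via a da-style update cursor."""
    count = 0
    for row in cursor:
        row[1] = hours[row[0]]
        cursor.updateRow(row)
        count += 1
    return count


def calculate_hours(table, time_field, hours_field):
    """Read, sort, and update a table; requires ArcGIS 10.1+ (arcpy)."""
    import arcpy  # imported here so the helpers above work without ArcGIS

    with arcpy.da.SearchCursor(table, ["OID@", time_field]) as rows:
        hours = hours_by_oid(rows)
    with arcpy.da.UpdateCursor(table, ["OID@", hours_field]) as cursor:
        return write_hours(cursor, hours)
```

A call such as calculate_hours(fullpath, "LTIME", "Hours") assumes the Hours field has already been added to the table.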

Licensed under cc by-sa 3.0 with attribution required.