看来现在的性能差异要小得多(0.21.1-我忘记了原始示例中的Pandas版本)。词典的访问和之间不仅性能差距.loc
减小(从约335倍到126倍速度较慢), (loc
)iloc
小于2倍慢于at
(iat
)现在。
In [1]: import numpy, pandas
...: ...: df = pandas.DataFrame(numpy.zeros(shape=[10, 10]))
...: ...: dictionary = df.to_dict()
...:
In [2]: %timeit value = dictionary[5][5]
85.5 ns ± 0.336 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %timeit value = df.loc[5, 5]
10.8 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [4]: %timeit value = df.at[5, 5]
6.87 µs ± 64.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %timeit value = df.iloc[5, 5]
14.9 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: %timeit value = df.iat[5, 5]
9.89 µs ± 54.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: print(pandas.__version__)
0.21.1
----以下原始答案----
+1用于使用at
或iat
用于标量运算。基准示例:
In [1]: import numpy, pandas
...: df = pandas.DataFrame(numpy.zeros(shape=[10, 10]))
...: dictionary = df.to_dict()
In [2]: %timeit value = dictionary[5][5]
The slowest run took 34.06 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 310 ns per loop
In [4]: %timeit value = df.loc[5, 5]
10000 loops, best of 3: 104 µs per loop
In [5]: %timeit value = df.at[5, 5]
The slowest run took 6.59 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.26 µs per loop
In [6]: %timeit value = df.iloc[5, 5]
10000 loops, best of 3: 98.8 µs per loop
In [7]: %timeit value = df.iat[5, 5]
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.58 µs per loop
似乎使用at
(iat
)比loc
(iloc
)快10倍。