Find nearest value in numpy array

336

Is there a numpy-thonic way, e.g. a function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

python search numpy

— 福茶
source

515

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

value = 0.5

print(find_nearest(array, value))
# 0.568743859261

— Unutbu
source

52

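@EOL: return np.abs(array-value).min() gives the wrong answer. It gives you the minimum of the absolute distances, whereas we need to return the actual array value. We could add value back and come close, but the absolute value throws a wrench into things.

— unutbu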
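To make the difference between min() and argmin() concrete, here is a small sketch. The array contents are made up for illustration and are not taken from the answer above.

```python
import numpy as np

array = np.array([0.2, 0.9, 0.57, 0.31])  # illustrative data
value = 0.5

distances = np.abs(array - value)
smallest_distance = distances.min()      # how far away the nearest element is
nearest = array[distances.argmin()]      # the nearest element itself

print(smallest_distance)
print(nearest)
```

The first number is a distance, the second is the array element that the question asks for.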

9

@~unutbu You're right, my bad. I can't think of anything better than your solution!

— Eric O Lebigot

24

Seems crazy that there isn't a numpy built-in that does this.

— dbliss 2015

3

@jsmedmar The bisection method (see my answer below) is O(log(n)).

— Josh Albert

4

FutureWarning: 'argmin' is deprecated. Use 'idxmin' instead. The behavior of 'argmin' will be corrected to return the positional minimum in the future. Use 'series.values.argmin' to get the position of the minimum now.

Using idxmin instead of argmin works for me with the solution above. (v3.6.4)

— jorijnsmit

78

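If your array is sorted and is very large, this is a much faster solution:

import math
import numpy as np

def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

This scales to very large arrays. If you can't assume that the array is already sorted, you can easily modify the above to sort inside the function. It's overkill for small arrays, but once they get large this is much faster.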

— Demitri
source

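This sounds like the most reasonable solution. I wonder why it is so slow anyway. Plain np.searchsorted takes about 2 µs for my test set, the whole function about 10 µs. Using np.abs makes it even worse. No clue what python is doing there.

— Michael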

2

@Michael For single values, the Numpy math routines will be slower than the math routines, see this answer.

— Demitri

3

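This is the best solution if you have multiple values you want to look up at once (with a few adjustments). The whole if/else needs to be replaced with idx = idx - (np.abs(value - array[idx-1]) < np.abs(value - array[idx])); return array[idx]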

— coderforlife

3

Great, but it doesn't work if value is bigger than array's biggest element. I changed the if statement to if idx == len(array) or math.fabs(value - array[idx - 1]) < math.fabs(value - array[idx]) to make it work for me!

— nicoco '16

3

This doesn't work when idx is 0. The if should read: if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
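The two edge cases raised in these comments can be checked directly. The sketch below restates the sorted-array function under the hypothetical name find_nearest_sorted and calls it with a value below the minimum, a value above the maximum, and a value inside the range; the test data are made up.

```python
import math
import numpy as np

def find_nearest_sorted(array, value):
    # `array` must be sorted in ascending order
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or
                    math.fabs(value - array[idx - 1]) < math.fabs(value - array[idx])):
        return array[idx - 1]
    return array[idx]

arr = np.array([1.0, 2.0, 4.0, 8.0])
below = find_nearest_sorted(arr, -5.0)   # idx == 0: only array[0] is a candidate
above = find_nearest_sorted(arr, 100.0)  # idx == len(array): only array[-1] is a candidate
inside = find_nearest_sorted(arr, 2.9)   # picks the closer of 2.0 and 4.0
print(below, above, inside)
```

The idx > 0 test guards the lower end, and the short-circuiting idx == len(array) test keeps array[idx] from being evaluated past the upper end.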

52

With a slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, ...):

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]

— kwgoodman
source

6

No need for the "flat" bit. a[np.abs(a-a0).argmin)] works fine.

— Max Shron 2013

2

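Actually, that still only works for one dimension, since argmin() gives multiple results per column/dimension. Also, I had a typo. This works, at least for 2 dimensions: a[np.sum(np.square(np.abs(a-a0)),1).argmin()].

— Max Shron 2013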

3

So it doesn't work for higher dimensions, and the answer should be deleted (or modified to reflect this).

— Hugues Fontenelle 2014

11

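Please provide an example where the proposed answer does not work. If you find one, I'll modify my answer. If you can't find one, could you remove your comments?

— kwgoodman 2015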
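For readers following this exchange, here is a small check of the a.flat version on a 3-d array. The test array and target value are made up. argmin() without an axis argument returns a single index into the flattened array, which is why a.flat[idx] is the matching lookup; a[idx] would instead treat idx as an index along the first axis.

```python
import numpy as np

def find_nearest(a, a0):
    # argmin() without an axis argument returns an index into the flattened array
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

a = np.arange(24).reshape(2, 3, 4) * 1.5    # 3-d test array: 0.0, 1.5, ..., 34.5
flat_idx = np.abs(a - 20.2).argmin()
nearest = find_nearest(a, 20.2)
position = np.unravel_index(flat_idx, a.shape)  # the same element as an n-d index
print(nearest, flat_idx, position)
```

np.unravel_index is only used here to show where the flat index points in the original shape.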

18

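Summary of answer: if you have a sorted array, then the bisection code (given below) performs the fastest: ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array, then if array is large you should consider first using an O(n log n) sort and then bisection, and if array is small then method 2 seems the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array = [0,0.7,2.1], value = 1.95, and the answer would be idx = 1. This is the case that I suspect you need (otherwise the following can be modified very easily with a follow-up conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first; note that it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the other methods presented here by other users.

Bisection: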

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0  # Initialize lower
    ju = n-1  # and upper limits.
    while (ju-jl > 1):  # If we are not yet done,
        jm=(ju+jl) >> 1  # compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm  # and replace either the lower limit
        else:
            ju=jm  # or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):  # edge cases at bottom
        return 0
    elif (value == array[n-1]):  # and top
        return n-1
    else:
        return jl

Now I'll define the code from the other answers; each of them returns an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

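Now I'll time the codes. Note that methods 1, 2, 4 and 5 don't give the interval. Methods 1, 2 and 4 round to the nearest point in the array (e.g. >= 1.5 -> 2). Method 5, as defined above, compares the distances to both neighbours, so it also returns the nearest point rather than the interval. Only methods 3 and 6, and of course bisection, give the interval properly.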

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

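For a large array, bisection takes about 4 µs, compared with 259 µs for the next best (method 4) and 1.21 ms for the slowest numpy-based method (method 5), i.e. on the order of 100 times faster; the pure-Python method 1 takes 180 ms. For smaller arrays it is ~2-100 times faster.

— Josh Albert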
source

2

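You're assuming that the array is sorted. There are many reasons why someone would not want to sort the array: for example, if the array represents the data points on a line graph.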

— user1917407

7

The python standard library already contains an implementation of the bisection algorithm: docs.python.org/3.6/library/bisect.html

— Felix
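As a rough illustration of that suggestion, the standard-library bisect module can be turned into a nearest-value lookup by comparing the two neighbours of the insertion point. This is only a sketch: the function name find_nearest_bisect and the tie-breaking rule are assumptions made here, not part of the answer above, and the input must be sorted in ascending order.

```python
import bisect

def find_nearest_bisect(sorted_values, value):
    """Return the element of the ascending sequence `sorted_values` closest to `value`."""
    pos = bisect.bisect_left(sorted_values, value)
    if pos == 0:
        return sorted_values[0]          # value is at or below the first element
    if pos == len(sorted_values):
        return sorted_values[-1]         # value is above the last element
    before, after = sorted_values[pos - 1], sorted_values[pos]
    # on a tie, prefer the smaller neighbour (an arbitrary choice for this sketch)
    return before if value - before <= after - value else after

data = [0, 0.7, 2.1]
print(find_nearest_bisect(data, 1.95))
print(find_nearest_bisect(data, -3))
print(find_nearest_bisect(data, 9))
```

Because bisect works on any sorted sequence, this version needs no numpy at all.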

When you say "if array is small then method 2 seems the fastest", how small do you mean, @JoshAlbert?

— 宙斯先生

2

This doesn't find the nearest value, it finds the next-lowest value.

— endolith '18

@endolith That's the case for bisect only.

— Homero Esmeraldo
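The distinction discussed here, interval index versus nearest element, can be seen on the example values from the answer (array = [0,0.7,2.1], value = 1.95). The sketch below uses the standard-library bisect module rather than the bisection function above.

```python
import bisect

array = [0, 0.7, 2.1]
value = 1.95

# Interval index j such that array[j] <= value < array[j + 1]
interval_idx = bisect.bisect_right(array, value) - 1

# Index of the element with the smallest absolute distance to value
nearest_idx = min(range(len(array)), key=lambda i: abs(array[i] - value))

print(interval_idx, nearest_idx)
```

The interval lookup returns 1 because 1.95 lies between 0.7 and 2.1, while the nearest element is 2.1 at index 2.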

17

Here's an extension to find the nearest vector in an array of vectors.

import numpy as np

def find_nearest_vector(array, value):
  idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
  return array[idx]

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print(find_nearest_vector(A,pt))
# array([  6.76626539,  30.50661753])

— Onasafari
source

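I think norm(..., axis=-1) should be faster than extracting the x,y values through Python iteration. Also, x,y are scalars here? Then norm(x+y) is a bug since, e.g., the distance (+1, -1) will be treated as 0.

— cfh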
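A sketch of the variant suggested in this comment: compute the Euclidean norm of each row difference with axis=-1 instead of iterating in Python. The sample points are made up so that the first row has offset (-1, +1) from pt, the case in which summing the components before taking the norm gives 0.

```python
import numpy as np

def find_nearest_vector(array, value):
    # Euclidean distance from every row of `array` to `value`, computed in one call
    distances = np.linalg.norm(np.asarray(array) - np.asarray(value), axis=-1)
    return array[distances.argmin()]

A = np.array([[5.0, 31.0],    # offset from pt is (-1, +1): distance sqrt(2)
              [9.0, 30.0],    # offset (3, 0): distance 3
              [6.5, 30.5]])   # offset (0.5, 0.5): distance about 0.71
pt = [6, 30]

nearest = find_nearest_vector(A, pt)
# what norm(x + y) effectively computes per row: |dx + dy|
summed_idx = np.abs((A - pt).sum(axis=1)).argmin()
print(nearest, summed_idx)
```

With these points the axis=-1 version picks the third row, whereas the summed version picks the first row because its components cancel.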

This worked for me: idx = np.array([np.linalg.norm(x+y) for (x,y) in abs(array-value)]).argmin()

9

If you don't want to use numpy, this will do it:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]

— Nick Crawford
source

9

Here is a version that will handle a non-scalar "values" array:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

— 雷吉尔
source

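Nice answer. I've never used the outer method of a ufunc before; I think I'll be using it more in the future. By the way, the first function should return array[indices].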

— Widjet

1

This solution does not scale. np.subtract.outer generates the entire outer-product matrix, which is really slow and memory-intensive if array and/or values is very large.

— anthonybell
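One way to limit the memory cost mentioned here, while keeping the same outer-difference idea, is to process values in chunks. This is only a sketch: the function name find_nearest_chunked and the chunk_size parameter are hypothetical, values is assumed to be 1-d, and the total work is still proportional to len(array) * len(values).

```python
import numpy as np

def find_nearest_chunked(array, values, chunk_size=1024):
    """Nearest element of `array` for each entry of 1-d `values`, chunk by chunk."""
    array = np.asarray(array)
    values = np.atleast_1d(values)
    out = np.empty(values.shape, dtype=array.dtype)
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # temporary matrix is len(array) x len(chunk), not len(array) x len(values)
        indices = np.abs(np.subtract.outer(array, chunk)).argmin(0)
        out[start:start + chunk_size] = array[indices]
    return out

array = np.array([10.0, 20.0, 30.0])
values = np.array([12.0, 26.0, 31.0, 4.0, 19.0])
result = find_nearest_chunked(array, values, chunk_size=2)
print(result)
```

For a sorted array, a searchsorted-based lookup avoids the pairwise matrix altogether.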

8

Here is a scipy version of @Ari Onasafari's answer to "find the nearest vector in an array of vectors":

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])

— 头孢氨苄
source

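Building a KDTree is quite an overhead for a problem like this. I would not recommend such a solution unless you have to make multiple queries on a big array. And in that case it is better to build the tree once and reuse it than to create it on the fly for each query.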

— 奔
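A sketch of the build-once, query-many pattern described in this comment, using scipy.spatial.KDTree as in the answer. The points and query locations are made up for illustration.

```python
import numpy as np
from scipy import spatial

points = np.array([[68.8, 38.1], [17.2, 20.7], [8.5, 30.2], [43.9, 21.5]])
tree = spatial.KDTree(points)               # built once

queries = np.array([[6, 30], [40, 20], [70, 40]])
distances, indices = tree.query(queries)    # one call answers all queries
nearest_points = points[indices]
print(indices)
print(nearest_points)
```

The tree construction cost is paid a single time, and tree can be kept around for later queries.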

8

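If you have many values to search for, this is a fast vectorized version of @Demitri's solution (values can be a multi-dimensional array):

# `array` must be sorted in ascending order (`values` need not be)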
def get_closest(array, values):
    #make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")

    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1

    return array[idxs]

Benchmarks

More than 100 times faster than using a for loop with @Demitri's solution:

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds

— 蒽铃
source

If you have constant sampling in the array, it becomes even simpler: idx = np.searchsorted(array, values), then idx[array[idx] - values>np.diff(array).mean()*0.5]-=1, and finally return array[idx]

— Sergey Antopolskiy
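The three steps in this comment can be assembled into a function roughly as follows. The name get_closest_uniform is hypothetical, and two guards are added here as assumptions beyond the comment: the insertion index is clipped so that values above the last element stay in range, and the step back is skipped at index 0 so that values below the first element do not wrap around to the end.

```python
import numpy as np

def get_closest_uniform(array, values):
    """Nearest element of the evenly spaced, ascending `array` for each of `values`."""
    array = np.asarray(array)
    values = np.atleast_1d(values)
    half_step = np.diff(array).mean() * 0.5
    # clip so that values beyond the last element map to the last index
    idx = np.clip(np.searchsorted(array, values), 0, len(array) - 1)
    # step back one position where the element to the left is closer
    idx[(idx > 0) & (array[idx] - values > half_step)] -= 1
    return array[idx]

grid = np.linspace(0.0, 10.0, 11)   # 0, 1, ..., 10
result = get_closest_uniform(grid, np.array([-3.0, 0.2, 4.6, 4.4, 12.0]))
print(result)
```

The half-step test relies on the spacing being constant, which is the assumption stated in the comment.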

7

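For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I've adapted his exact algorithm in the following two ways:

1. The function below works whether or not the input array is sorted.
2. The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.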

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

— aph
source

1

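It's worth pointing out that this is a good example of how optimizing code tends to make it uglier and harder to read. The answer given by @unutbu should be (much) preferred in cases where speed is not a major concern, as it is far more transparent.

— 2015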

I don't see the answer given by @Michael. Is this an error or am I blind?

— Fookatchu 2015

No, you're not blind, I'm just illiterate ;-) It was @Demitri's answer that I was responding to. My bad. I just fixed my post. Thanks!

— 2015

I get different answers with Demitri's function and yours. Any ideas? x = np.array([2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787, 2287, 1940, 2311, 2054, 2406, 1471, 1460]). With find_nearest(x, 1739.5) (the closest value to the first quantile), I get 1637 (reasonable) and 1 (bug?).

— PatrickT
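A likely explanation, sketched below: find_idx_nearest_val returns an index rather than a value, so the 1 refers to x[1] = 1758, which is 18.5 away from 1739.5 (1721 is equally close). Demitri's function assumes a sorted array, and x is not sorted, so its result of 1637 (102.5 away) is not the nearest value. The function body is a condensed restatement of the one in the answer, and the brute-force line is only there as a cross-check.

```python
import numpy as np

def find_idx_nearest_val(array, value):
    # condensed restatement of the answer's function: returns an index, not a value
    idx_sorted = np.argsort(array)
    sorted_array = array[idx_sorted]
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        return idx_sorted[-1]
    if idx == 0:
        return idx_sorted[0]
    if abs(value - sorted_array[idx - 1]) < abs(value - sorted_array[idx]):
        return idx_sorted[idx - 1]
    return idx_sorted[idx]

x = np.array([2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787,
              2287, 1940, 2311, 2054, 2406, 1471, 1460])
i = find_idx_nearest_val(x, 1739.5)
nearest = x[i]
brute_force = x[np.abs(x - 1739.5).argmin()]   # linear scan over the unsorted data
print(i, nearest, brute_force)
```

Both approaches agree on 1758 once the index is used to look up the value.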

3

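Here is a vectorized version of unutbu's answer:

import numpy as np
import matplotlib.pyplot as plt

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

    # caution: with unsigned integer inputs (e.g. uint8) the subtraction wraps around;
    # cast both inputs to a signed or floating-point dtype first.
    indices = np.abs(array - values).argmin(axis=-1)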

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles.astype(np.int16), image.astype(np.int16))  # signed cast avoids uint8 wrap-around

print(quantiled_image.shape) # should be (nrows, ncols, 3)

— 陈占文
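One thing to watch for when applying this function to image data: if both inputs are uint8, array - values is computed in unsigned arithmetic and wraps around, so the "nearest" quantile can be wrong. The sketch below uses a made-up 2x2 array as a stand-in for an image and compares the result with and without a cast to a signed dtype.

```python
import numpy as np

def find_nearest(array, values):
    array = np.asarray(array)
    values = np.expand_dims(values, axis=-1)
    indices = np.abs(array - values).argmin(axis=-1)
    return array[indices]

quantiles = np.array([0, 85, 170, 255], dtype=np.uint8)
pixels = np.array([[10, 100], [200, 250]], dtype=np.uint8)   # stand-in for an image

wrapped = find_nearest(quantiles, pixels)                    # uint8 - uint8 wraps around
safe = find_nearest(quantiles.astype(np.int16), pixels.astype(np.int16))
print(wrapped)
print(safe)
```

Casting to int16 is one option; any signed or floating-point dtype wide enough for the differences would serve the same purpose.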
source

2

I think the most pythonic way would be:

import numpy as np

num = 65 # Input number
array = np.random.random((10))*100 # Given array
nearest_idx = np.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the given number (num)
nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

This is the basic code. You can wrap it in a function if you want.

— Ishan Tomar
source

2

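All the answers are useful for gathering the information needed to write efficient code. However, I have written a small Python script to optimize for various cases. The best case is when the provided array is sorted. If you search for the index of the nearest point to a single specified value, the bisect module is the most time-efficient. If you search for the indices corresponding to an array of values, numpy's searchsorted is the most efficient.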

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side="left")
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

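%time np.searchsorted(xar, randpts, side="left")
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 1.2 ms

If we follow the multiplicative rule, numpy should take ~100 ms for the 1000 lookups (1000 × 98.9 µs), so the single vectorized call at 1.2 ms is ~83X faster.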

— 苏门
source

1

For a 2d array, to determine the i, j position of the nearest element:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j

— Eduardo Pereira
source

0

import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))

— 凯里姆·穆罕默德
source

1

Hi, welcome to Stack Overflow. Check out how to write a good answer. Try giving a short description of what you did in the context of the question!

— Tristo

0

Probably helpful for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

— Gusev Slava
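If the position is needed as well as the value, the same np.unravel_index call can be returned alongside it. This is a small sketch: the function name find_nearest_with_index is hypothetical, and the test array is the 2d example from the earlier answer.

```python
import numpy as np

def find_nearest_with_index(X, value):
    """Return the element of `X` nearest to `value` together with its n-d index."""
    index = np.unravel_index(np.argmin(np.abs(X - value)), X.shape)
    return X[index], index

X = np.array([[60, 200, 30], [3, 30, 50], [20, 1, -50], [20, -500, 11]])
nearest, index = find_nearest_with_index(X, 0)
print(nearest, index)
```

np.unravel_index converts the flat position returned by argmin into one index per dimension, so the same code works for any number of dimensions.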
source