How does Python's numpy.where() work?

94

I am playing with numpy and digging through the documentation, and I have come across some magic. Namely, I am talking about numpy.where():

>>> x = np.arange(9.).reshape(3, 3)
>>> np.where( x > 5 )
(array([2, 2, 2]), array([0, 1, 2]))

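How do they achieve internally that you are able to pass something like x > 5 into a method? I guess it has something to do with __gt__, but I am looking for a detailed explanation.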

python numpy magic-methods

— 佩顿

75

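How do they achieve internally that you are able to pass something like x > 5 into a method?

The short answer is that they don't.

Any sort of logical operation on a numpy array returns a boolean array (i.e. __gt__, __lt__, etc. all return boolean arrays marking where the given condition is true).

E.g.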

import numpy as np

x = np.arange(9).reshape(3, 3)
print(x > 5)

This yields:

[[False False False]
 [False False False]
 [ True  True  True]]

This is the same reason why something like if x > 5: raises a ValueError if x is a numpy array. The result is an array of True/False values, not a single value.
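
To make the ValueError concrete, here is a minimal sketch (the variable names are only for illustration). It shows that the boolean array cannot be used directly as a condition, and that reducing it with .any() or .all() gives a single truth value:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
mask = x > 5  # boolean array with the same shape as x

# Using the whole array as a condition is ambiguous, so NumPy raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Reduce the array to a single truth value explicitly instead.
any_above = mask.any()  # True if at least one element is > 5
all_above = mask.all()  # True only if every element is > 5

print(raised, any_above, all_above)
```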

Furthermore, numpy arrays can be indexed by boolean arrays. E.g. x[x>5] yields [6 7 8] in this case.

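Honestly, it's fairly rare that you actually need numpy.where, but it just returns the indices where a boolean array is True. Usually you can do what you need with simple boolean indexing.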
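
As a small illustration of that point (a sketch, with variable names chosen here for clarity), indexing with the boolean array and indexing with the tuple returned by np.where select the same elements:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

by_mask = x[x > 5]             # plain boolean indexing
by_where = x[np.where(x > 5)]  # index with the tuple of index arrays

print(by_mask)
print(by_where)
```

Both lines print the elements 6, 7 and 8, so the np.where call adds nothing here unless the indices themselves are needed.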

— 乔·金顿

10

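Just to point out that numpy.where does have 2 "operational modes". The first one returns the indices where the condition is True. If the optional arguments x and y are present (same shape as condition, or broadcastable to that shape!), it instead returns values from x where the condition is True and from y otherwise. This makes where more versatile and lets it be used more often. Thanks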

— 吃
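
A minimal sketch of the second mode described in the comment above, including broadcasting. The arrays cond, x_vals and y_row are made up for illustration:

```python
import numpy as np

cond = np.array([[True, False, True],
                 [False, True, False]])
x_vals = np.array([[1, 2, 3],
                   [4, 5, 6]])
y_row = np.array([10, 20, 30])  # shape (3,), broadcast across both rows

# Take from x_vals where cond is True, otherwise from the broadcast y_row.
picked = np.where(cond, x_vals, y_row)
print(picked)
```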

1

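There can also be overhead in some cases from using the __getitem__ syntax of [] over either numpy.where or numpy.take. Since __getitem__ has to also support slicing, there is some overhead. I have seen noticeable speed differences when working with the Python Pandas data structures and logically indexing very large columns. In those cases, if you don't need slicing, then take and where are actually better.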

— 2012年
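
For reference, a minimal sketch of the alternatives mentioned in the comment above, assuming a small 1-D array. It only shows that the spellings select the same elements and makes no claim about speed; timing with timeit on your own data is the way to check that:

```python
import numpy as np

values = np.arange(10) * 10
mask = values > 50

idx = np.where(mask)[0]          # integer positions where the mask is True
via_getitem = values[mask]       # boolean indexing through __getitem__
via_take = np.take(values, idx)  # explicit integer gather

print(via_getitem, via_take)
```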

24

Old answer: it is kind of confusing. It gives you the LOCATIONS (all of them) where your statement is true.

So:

>>> a = np.arange(100)
>>> np.where(a > 30)
(array([31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
       48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
       65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
       82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,
       99]),)
>>> np.where(a == 90)
(array([90]),)

>>> a = a*40
>>> np.where(a > 1000)
(array([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
       43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
       60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
       77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
       94, 95, 96, 97, 98, 99]),)
>>> a[25]
1000
>>> a[26]
1040

I use it as an alternative to list.index(), but it has many other uses as well. I have never used it with 2D arrays.
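
A small sketch of that list.index() comparison, using made-up data. Note the differences: list.index() returns only the first match and raises ValueError when there is none, while np.where returns every match and an empty array when there is none:

```python
import numpy as np

items = [5, 3, 9, 3, 7]
arr = np.array(items)

first_in_list = items.index(3)        # first match only
all_in_array = np.where(arr == 3)[0]  # positions of every match
no_match = np.where(arr == 100)[0]    # empty array, no exception

print(first_in_list, all_in_array, no_match.size)
```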

http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

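New answer: it seems that the person was asking something more fundamental.

The question was how YOU could implement something that allows a function (such as where) to know what was requested.

First note that calling any of the comparison operators does an interesting thing.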

>>> a > 1000
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

This is done by overloading the "__gt__" method. For instance:

>>> class demo(object):
...     def __gt__(self, item):
...         print(item)
...
>>> a = demo()
>>> a > 4
4

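As you can see, "a > 4" was valid code.

You can get a full list and documentation of all the overloadable methods here: http://docs.python.org/reference/datamodel.html

What is incredible is how simple this is to do. Operators in Python are implemented through special methods like this one. Saying a > b is essentially equivalent to a.__gt__(b)!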
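
Putting the two pieces together, here is a toy sketch of the idea. The Vec class and the where function below are hypothetical, pure-Python stand-ins written for this illustration; they are not how NumPy is implemented. The point is only that the comparison runs first and produces a list of booleans, and the function never sees the expression itself:

```python
class Vec:
    """Hypothetical toy container, not NumPy's implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, other):
        # Return one bool per element instead of a single bool.
        return [v > other for v in self.values]


def where(mask):
    """Toy 1-D stand-in for where: positions at which mask is truthy."""
    return [i for i, flag in enumerate(mask) if flag]


v = Vec([10, 40, 20, 50])
mask = v > 30            # calls Vec.__gt__(v, 30)
positions = where(mask)  # where only receives the list of bools

print(mask, positions)
```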

— 加勒特·伯格

3

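This comparison operator overloading doesn't seem to work well with more complex logical expressions though. For example, I can't do np.where(a > 30 and a < 50) or np.where(30 < a < 50), because it ends up trying to evaluate the logical AND of two arrays of booleans, which is rather meaningless. Is there a way to write such a condition with np.where?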

— davidA

@meowsqueak np.where((a > 30) & (a < 50))

— tibalt
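
A short sketch expanding on the reply above. The element-wise operators & and | (or the functions np.logical_and and np.logical_or) combine boolean arrays, and the parentheses are needed because & binds more tightly than the comparisons:

```python
import numpy as np

a = np.arange(100)

both = np.where((a > 30) & (a < 50))[0]             # element-wise AND
same = np.where(np.logical_and(a > 30, a < 50))[0]  # function form
either = np.where((a < 3) | (a > 97))[0]            # element-wise OR

print(both[0], both[-1], either)
```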

Why does np.where() return a list in your example?

— Andreas Yankopolus

0

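np.where returns a tuple whose length equals the number of dimensions of the numpy ndarray it is called on (in other words, ndim). Each item of the tuple is a numpy ndarray holding the indices, along one dimension, of all values in the initial ndarray for which the condition is True. (Please don't confuse dimension with shape.)

For example: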

x = np.arange(9).reshape(3, 3)
print(x)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
y = np.where(x > 4)
print(y)
(array([1, 2, 2, 2], dtype=int64), array([2, 0, 1, 2], dtype=int64))

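y is a tuple of length 2 because x.ndim is 2. The first item of the tuple contains the row numbers of all elements greater than 4, and the second item contains their column numbers. As you can see, [1,2,2,2] corresponds to the row numbers of 5,6,7,8 and [2,0,1,2] corresponds to the column numbers of 5,6,7,8. Note that the ndarray is traversed along the first dimension (row-wise).

Similarly,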

x=np.arange(27).reshape(3,3,3)
np.where(x>4)

will return a tuple of length 3 because x has 3 dimensions.
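
A minimal sketch of the 3-D case, using the threshold x > 23 (chosen here only to keep the output short). It also shows two ways to turn the tuple of index arrays into per-element coordinates: zip and np.argwhere:

```python
import numpy as np

x = np.arange(27).reshape(3, 3, 3)
idx = np.where(x > 23)

n_arrays = len(idx)          # one index array per dimension
coords = list(zip(*idx))     # (i, j, k) triples, one per matching element
rows = np.argwhere(x > 23)   # same coordinates as a 2-D array

print(n_arrays)
print(rows)
```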

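But wait, there's more to np.where!

When two additional arguments are passed to np.where, it performs a replace operation at all the row-column positions given by the tuple above: the result takes the first extra argument where the condition is True and the second one everywhere else.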

x = np.arange(9).reshape(3, 3)
y = np.where(x > 4, 1, 0)
print(y)
[[0 0 0]
 [0 0 1]
 [1 1 1]]
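
One detail worth a sketch: the three-argument form builds a new array and leaves x untouched, and the extra arguments can be arrays as well as scalars. The fill value -1 below is an arbitrary choice for illustration:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# Keep the original values where x > 4, fill everything else with -1.
y = np.where(x > 4, x, -1)

# Equivalent result using mask assignment on a copy.
z = x.copy()
z[~(x > 4)] = -1

print(y)
```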

— Piyush Singh