Best way to find the intersection of multiple sets?

265

I have a list of sets:

setlist = [s1, s2, s3, ...]

I want s1 ∩ s2 ∩ s3 ...

I could write a function that does it with a series of pairwise s1.intersection(s2) calls, etc.

Is there a recommended, better, or built-in way?

python set set-intersection

— username

453

As of Python 2.6 you can pass multiple arguments to set.intersection(), like

u = set.intersection(s1, s2, s3)

If the sets are in a list, this becomes:

u = set.intersection(*setlist)

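Here *a_list is list expansion (argument unpacking): each element of the list is passed as a separate argument.

Note that set.intersection is not a static method; this uses the functional notation to apply the intersection of the first set with the rest of the list. So if the argument list is empty, this will fail.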
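The empty-list failure can be guarded against explicitly. A minimal sketch, assuming you want an empty set back when there is nothing to intersect (that return value is a design choice, not something Python defines); the helper name intersect_all is hypothetical:

```python
def intersect_all(setlist):
    """Intersect every set in setlist; return an empty set for an empty list.

    intersect_all is a hypothetical helper, not part of the standard library.
    """
    if not setlist:
        # set.intersection() with no arguments raises TypeError,
        # so the empty case is handled here (returning set() is an assumption).
        return set()
    # set.intersection is called through the class, so the first element
    # acts as "self"; it must be a set (or subclass), not a frozenset or list.
    return set.intersection(*setlist)


print(intersect_all([{1, 2, 3}, {2, 3, 4}, {2, 4, 6}]))  # {2}
print(intersect_all([]))  # set()
```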

— sth

64

As of 2.6, set.intersection takes arbitrarily many iterables.

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 4, 6])
>>> s1 & s2 & s3
set([2])
>>> s1.intersection(s2, s3)
set([2])
>>> sets = [s1, s2, s3]
>>> set.intersection(*sets)
set([2])
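The session above is Python 2; Python 3 displays the same result as {2}. To illustrate the "arbitrarily many iterables" point, a small sketch with made-up values: only the set the method is called on has to be a set, while the & operator requires sets on both sides.

```python
s1 = {1, 2, 3}

# Only the receiver has to be a set; the other arguments may be any iterable.
result = s1.intersection([2, 3, 4], (2, 4, 6), range(0, 10, 2))
print(result)  # {2}

# The & operator, by contrast, requires both operands to be sets.
try:
    s1 & [2, 3, 4]
except TypeError as exc:
    print("TypeError:", exc)
```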

— Mike Graham

24

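Clearly set.intersection is what you want here, but if you ever need a generalization of "take the sum of all these", "take the product of all these", or "take the xor of all these", what you are looking for is the reduce function: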

from operator import and_
from functools import reduce
print(reduce(and_, [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

or

print(reduce((lambda x,y: x&y), [{1,2,3},{2,3,4},{3,4,5}])) # = {3}
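Following the same pattern, a sketch of the other folds mentioned above, using the standard operator functions (the input lists are only illustrative):

```python
from functools import reduce
from operator import add, mul, or_, xor

numbers = [1, 2, 3, 4]
sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

print(reduce(add, numbers))  # sum: 10
print(reduce(mul, numbers))  # product: 24
print(reduce(or_, sets))     # union: {1, 2, 3, 4, 5}
print(reduce(xor, sets))     # symmetric difference: {1, 3, 5}
```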

— Thomas Ahle

12

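If you don't have Python 2.6 or higher, the alternative is to write an explicit for loop:

def set_list_intersection(set_list):
    if not set_list:
        return set()
    # Copy the first set so that &= below does not modify the caller's set
    result = set(set_list[0])
    for s in set_list[1:]:
        result &= s
    return result

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print(set_list_intersection(set_list))
# Output: set([1])  (Python 3 prints {1})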

You can also use reduce:

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
# reduce is a builtin in Python 2; in Python 3 add: from functools import reduce
print(reduce(lambda s1, s2: s1 & s2, set_list))
# Output: set([1])

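However, many Python programmers dislike it, including Guido himself:

About 12 years ago, Python acquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite the PR value, I think these features should be cut from Python 3000.

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.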

— Ayman Hourieh

8

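Note that Guido says reduce's usefulness is "limited to associative operators", which applies in this case. reduce is very often hard to figure out, but with & it isn't so bad.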

— Mike Graham

set_list and reduce(set.intersection, set_list)

— jfs, 2012

See python.org/doc/essays/list2str for a useful optimization involving reduce. In general it can be used quite nicely to build lists, sets, strings, etc. Also worth a look: github.com/EntilZha/PyFunctional

— Andreas

Note that you can optimize this by breaking out of the loop as soon as result is empty.

— bfontaine
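A minimal sketch of that early-exit idea applied to the explicit loop: it copies the first set so the caller's set is not modified, and stops as soon as the running result is empty, because intersecting an empty set with anything stays empty. The function name is hypothetical.

```python
def set_list_intersection_early_exit(set_list):
    """Intersect a list of sets, stopping early once the result is empty.

    Hypothetical name; a variant of the explicit loop from the answer.
    """
    if not set_list:
        return set()
    result = set(set_list[0])  # copy, so set_list[0] is not mutated by &=
    for s in set_list[1:]:
        result &= s
        if not result:
            break  # empty set: no later intersection can add elements back
    return result


print(set_list_intersection_early_exit([{1, 2}, {3}, {1, 2, 3}]))  # set()
```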

1

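Here I'm offering a generic function for multiple set intersection that tries to take advantage of the best method available:

try:
    from functools import reduce  # Python 2.6+ and Python 3
except ImportError:
    pass  # Python < 2.6: reduce is a builtin

def multiple_set_intersection(*sets):
    """Return multiple set intersection."""
    try:
        return set.intersection(*sets)
    except TypeError:  # Python < 2.6, no arguments, or first argument not a set
        pass

    try:
        a_set = sets[0]
    except IndexError:  # no arguments
        return set()  # return empty set

    # Use a_set as reduce's initial value so one or two sets are handled
    # correctly, then fold in the remaining sets one at a time.
    return reduce(lambda acc, s: acc.intersection(s), sets[1:], a_set)

Guido may dislike reduce, but I'm kind of fond of it :)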

— tzot

You should check the length of sets instead of trying to access sets[0] and catching the IndexError.

— bfontaine

It's not a plain check; a_set is used in the final return.

— tzot

Couldn't you do return reduce(sets[0], sets[1:]) if sets else set()?

— bfontaine

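Ha, yes, thanks. The code should be changed then, because relying on try/except should be avoided when you can. It's a code smell, it's inefficient, and it can hide other problems.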

— bfontaine
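Along the lines of this comment thread, a sketch that checks for an empty argument list up front instead of catching exceptions. It assumes Python 2.6+ or 3, where intersection accepts several arguments, so reduce is not needed at all; the function name is hypothetical.

```python
def multiple_set_intersection_no_try(*sets):
    """Intersect any number of sets (or iterables) without try/except.

    Hypothetical variant for Python 2.6+ / 3.
    """
    if not sets:
        return set()  # no arguments: return an empty set, like the original
    # Copy the first argument into a set, then intersect with all the others
    # in one call; this also accepts a frozenset or list as the first argument.
    return set(sets[0]).intersection(*sets[1:])


print(multiple_set_intersection_no_try({1, 2}, {2, 3}))  # {2}
print(multiple_set_intersection_no_try(frozenset({1, 2})))  # {1, 2}
print(multiple_set_intersection_no_try())  # set()
```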

0

Jean-François Fabre's set.intersection(*list_of_sets) answer is definitely the most Pythonic and is rightly the accepted answer.

For those who want to use reduce, the following also works:

reduce(set.intersection, list_of_sets)
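In Python 3, reduce has to be imported from functools, and like the unpacking version it cannot handle an empty list without help. A short sketch with illustrative values:

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

list_of_sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(reduce(set.intersection, list_of_sets))  # {3}

# With an empty list there is no first element to start from, so reduce
# raises TypeError unless an initial value is supplied.
try:
    reduce(set.intersection, [])
except TypeError as exc:
    print("TypeError:", exc)
```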

— Minas