Zipped Python generators with the second one being shorter: how to retrieve the element that is silently consumed


50

I want to parse 2 generators of (potentially) different length with zip:

for el1, el2 in zip(gen1, gen2):
    print(el1, el2)

However, if gen2 has fewer elements, one extra element of gen1 is "consumed".

For example,

def my_gen(n:int):
    for i in range(n):
        yield i

gen1 = my_gen(10)
gen2 = my_gen(8)

list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen1))  # printed value is "9" => 8 is missing

gen1 = my_gen(8)
gen2 = my_gen(10)

list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen2))  # printed value is "8" => OK

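Apparently a value is missing (8 in my previous example) because gen1 is read (thus generating the value 8) before zip realizes that gen2 has no more elements. But this value disappears into the universe. When gen2 is "longer", there is no such "problem".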

QUESTION: Is there a way to retrieve this missing value (i.e. 8 in my previous example)? ... ideally with a variable number of arguments (like zip does).

NOTE: I have currently implemented this another way by using itertools.zip_longest, but I really wonder how to get this missing value using zip or an equivalent.

NOTE 2: In case you want to submit and try a new implementation, I have created some tests of the different implementations in this REPL :) https://repl.it/@jfthuong/MadPhysicistChester


19
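The docs do note: "zip() should only be used with unequal length inputs when you don't care about trailing, unmatched values from the longer iterables. If those values are important, use itertools.zip_longest() instead."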
Carcigenicate

2
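@Ch3steR. But the question has nothing to do with "why". It literally asks "Is there a way to retrieve this missing value...?" It seems that all the answers but mine forgot to read that part.
Mad Physicist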

@MadPhysicist Strange indeed. I have rephrased the question to be more explicit on that aspect.
Jean-Francois T.

1
The basic problem is that there is no way to peek into or push back onto a generator. So once zip() has read 8 from gen1, it is gone.
Barmar

1
@Barmar Definitely, we all agree on that. The question is more about how to store it somewhere so that it can be used.
Jean-Francois T.

Answers:


28

One way would be to implement an iterator wrapper that lets you cache the last value:

import collections.abc

class cache_last(collections.abc.Iterator):
    """
    Wraps an iterable in an iterator that can retrieve the last value.

    .. attribute:: obj

       A reference to the wrapped iterable. Provided for convenience
       of one-line initializations.
    """
    def __init__(self, iterable):
        self.obj = iterable
        self._iter = iter(iterable)
        self._sentinel = object()

    @property
    def last(self):
        """
        The last object yielded by the wrapped iterator.

        Uninitialized iterators raise a `ValueError`. Exhausted
        iterators raise a `StopIteration`.
        """
        if self.exhausted:
            raise StopIteration
        return self._last

    @property
    def exhausted(self):
        """
        `True` if there are no more elements in the iterator.
        Violates EAFP, but convenient way to check if `last` is valid.
        Raise a `ValueError` if the iterator is not yet started.
        """
        if not hasattr(self, '_last'):
            raise ValueError('Not started!')
        return self._last is self._sentinel

    def __next__(self):
        """
        Retrieve, record, and return the next value of the iteration.
        """
        try:
            self._last = next(self._iter)
        except StopIteration:
            self._last = self._sentinel
            raise
        # An alternative that has fewer lines of code, but checks
        # for the return value one extra time, and loses the underlying
        # StopIteration:
        #self._last = next(self._iter, self._sentinel)
        #if self._last is self._sentinel:
        #    raise StopIteration
        return self._last

    def __iter__(self):
        """
        This object is already an iterator.
        """
        return self

To use this, wrap the input(s) to zip:

gen1 = cache_last(range(10))
gen2 = iter(range(8))
list(zip(gen1, gen2))
print(gen1.last)
print(next(gen1)) 

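It is important to make gen2 an iterator rather than an iterable, so you can know which one was exhausted. If gen2 is exhausted, you don't need to check gen1.last.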
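As a rough illustration of how the caching idea could be extended to any number of inputs, here is a minimal sketch. The names `LastSeen` and `zip_with_stranded` are hypothetical and not part of the answer; `LastSeen` is a compact stand-in for cache_last. It relies on zip pulling from its inputs left to right, so every wrapper before the first exhausted one holds exactly one stranded value. Note that it collects the zipped tuples eagerly into a list.

```python
class LastSeen:
    """Compact stand-in for cache_last; remembers the last value produced."""
    _UNSET = object()

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self.last = self._UNSET
        self.exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            self.last = next(self._iter)
        except StopIteration:
            self.exhausted = True
            raise
        return self.last


def zip_with_stranded(*iterables):
    """Return (pairs, stranded, wrapped) for any number of iterables."""
    wrapped = [LastSeen(it) for it in iterables]
    pairs = list(zip(*wrapped))
    # zip advances its inputs left to right and stops at the first
    # exhausted one, so each wrapper before it holds one stranded value.
    stranded = []
    for w in wrapped:
        if w.exhausted:
            break
        stranded.append(w.last)
    return pairs, stranded, wrapped


pairs, stranded, (g1, g2) = zip_with_stranded(range(10), range(8))
print(pairs[-1])   # (7, 7)
print(stranded)    # [8]
print(next(g1))    # 9
```

With the arguments swapped, `stranded` comes back empty, which makes it possible to tell the two cases apart without caring about argument order.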

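Another way would be to rewrite zip to accept a mutable sequence of iterables instead of separate iterables. That would allow you to replace the iterables with a chained version that includes your "peeked" item: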

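import itertools

def myzip(iterables):
    # `iterables` must be a mutable sequence: consumed items are pushed
    # back by replacing entries with a chain that starts with the item.
    iterators = [iter(it) for it in iterables]
    while True:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                for i, peeked in enumerate(items):
                    iterables[i] = itertools.chain([peeked], iterators[i])
                return
        # Only reached when every iterator produced a value this round.
        yield tuple(items)

gens = [range(10), range(8)]
list(myzip(gens))
print(next(gens[0]))  # 8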

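This approach is problematic for many reasons. Not only does it lose the original iterable, it also loses any useful properties the original object may have had, because the object is replaced with a chain object.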


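@MadPhysicist. Love your answer with cache_last, and the fact that it does not alter the behavior of next... too bad it is not symmetric (switching gen1 and gen2 in the zip leads to different results). Cheers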
Jean-Francois T.

1
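@Jean-Francois. I've updated the iterator so that it responds properly to last calls after it is exhausted. That should help with figuring out whether the last value is needed or not. It also makes it more production-ready.
Mad Physicist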

@MadPhysicist I ran the code, and the output of print(gen1.last) and print(next(gen1)) is None and 9.
Ch3steR

@MadPhysicist With some docstrings and all. Nice ;) I will check later. Thanks for the time spent.
Jean-Francois T.

@Ch3steR. Thanks for the catch. I got too excited and deleted the return statement from last.
Mad Physicist

17

This is the equivalent zip implementation given in the docs:

def zip(*iterables):
    # zip('ABCD', 'xy') --> Ax By
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                return
            result.append(elem)
        yield tuple(result)

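In your first example, gen1 = my_gen(10) and gen2 = my_gen(8). Both generators are consumed in step through the value 7. On the next iteration, elem = next(it, sentinel) on gen1 returns 8, but the same call on gen2 returns sentinel (because gen2 is exhausted at this point). The check if elem is sentinel is then satisfied, so the function executes return and stops, discarding the 8. Now next(gen1) returns 9.

In your second example, gen1 = my_gen(8) and gen2 = my_gen(10). Both generators are consumed in step through the value 7. On the next iteration, elem = next(it, sentinel) on gen1 returns sentinel (because gen1 is exhausted at this point), the check if elem is sentinel is satisfied, and the function executes return and stops before gen2 is asked for another value. Now next(gen2) returns 8.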
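The order of the next calls described above can be observed directly. The sketch below uses a hypothetical logging generator, `noisy`, which records every value it produces, so the value that zip reads from gen1 and then throws away shows up as the last log entry.

```python
def noisy(name, n, log):
    """Yield 0..n-1, recording each value in `log` as it is produced."""
    for i in range(n):
        log.append((name, i))
        yield i


log = []
gen1 = noisy("gen1", 3, log)
gen2 = noisy("gen2", 2, log)

pairs = list(zip(gen1, gen2))
print(pairs)    # [(0, 0), (1, 1)]
print(log[-1])  # ('gen1', 2): produced by gen1, then discarded by zip
```

Swapping the two arguments makes the last log entry `('gen1', 1)`, because the shorter generator is asked first and nothing is read from the longer one in the final round.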

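Inspired by Mad Physicist's answer, you could use this Gen wrapper to work around the problem:

Edit: to handle the cases pointed out by Jean-Francois T.

Once a value has been consumed from an iterator, it is gone from the iterator for good, and iterators have no in-place mutating method to add it back. One workaround is to store the last consumed value.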

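class Gen:
    def __init__(self, iterable):
        self.d = iter(iterable)
        self.sentinel = object()   # marks "nothing consumed yet"
        self.exhausted = object()  # marks "underlying iterator ran out"
        self.prev = self.sentinel

    def __iter__(self):
        return self

    @property
    def last_val_consumed(self):
        if self.prev is self.exhausted:
            raise StopIteration
        if self.prev is self.sentinel:
            raise ValueError('Nothing has been consumed')
        return self.prev

    def __next__(self):
        # A private marker is used instead of None so that a legitimate
        # None value in the data is not mistaken for exhaustion.
        self.prev = next(self.d, self.exhausted)
        if self.prev is self.exhausted:
            raise StopIteration
        return self.prev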

Examples:

# 1. When `gen1` is longer than `gen2`
gen1 = Gen(range(10))
gen2 = Gen(range(8))
list(zip(gen1,gen2))
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]
gen1.last_val_consumed
# 8, as it was the last value consumed
next(gen1)
# 9
gen1.last_val_consumed
# 9

# 2. When `gen1` or `gen2` is empty
gen1 = Gen(range(0))
gen2 = Gen(range(5))
list(zip(gen1,gen2))
gen1.last_val_consumed
# StopIteration error is raised
gen2.last_val_consumed
# ValueError is raised saying `ValueError: Nothing has been consumed`

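Thanks @Ch3steR for the time you spent on this problem. Your modification of MadPhysicist's solution has several limitations: #1. If gen1 = cache_last(range(0)) and gen2 = cache_last(range(2)), then after doing list(zip(gen1, gen2)), a call to next(gen2) raises an AttributeError: 'cache_last' object has no attribute 'prev'. #2. If gen1 is longer than gen2, then after all elements have been consumed, next(gen2) keeps returning the last value instead of raising StopIteration. I will mark MadPhysicist's answer as THE answer. Thanks!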
Jean-Francois T.

@Jean-FrancoisT. Yes, agreed. You should mark his answer as the answer. This one has limitations. I will try to improve this answer to handle all the cases. ;)
Ch3steR

@Ch3steR I can help you shake it out, if you want. I am a professional in the field of software validation :)
Jean-Francois T.

@Jean-FrancoisT. I'd love to. It would mean a lot. I am a third-year undergraduate student.
Ch3steR

2
Good job, it passes all the tests I have written here: repl.it/@jfthuong/MadPhysicistChester You can run them online, which is pretty convenient :)
Jean-Francois T.

6

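I can see you have already found this answer and that it was brought up in the comments, but I figured I would make an answer out of it. You want to use itertools.zip_longest(), which fills in the missing values of the shorter generator with None: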

import itertools

def my_gen(n:int):
    for i in range(n):
        yield i

gen1 = my_gen(10)
gen2 = my_gen(8)

for i, j in itertools.zip_longest(gen1, gen2):
    print(i, j)

Prints:

0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 None
9 None

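You can also supply a fillvalue argument when calling zip_longest to replace the None with a default value. Basically, for your problem, once you hit a None (in either i or j) in the for loop, the other variable holds your 8.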


Thanks. I had indeed already come up with zip_longest, and it was actually in my question. :)
Jean-Francois T.

6

Inspired by @GrandPhuba's elucidation of zip, let's create a "safe" variant (unit-tested here):

from itertools import chain

def safe_zip(*args):
    """
    Safe zip that restores the last consumed element in each generator
    if not able to consume an element in all of them.

    Returns:
        * generators in tuple
        * generator for zipped generators
    """
    continue_ = True
    n = len(args)
    result = (_ for _ in [])
    while continue_:
        addend = []
        for i, gen in enumerate(args):
            try:
                value = next(gen)
                addend.append(value)
            except StopIteration:
                genlist = list(args)
                # Push the values already taken this round back onto their generators.
                args = tuple([chain([v], g) for v, g in zip(addend, genlist[:i])] + genlist[i:])
                continue_ = False
                break
        if len(addend) == n:
            result = chain(result, [tuple(addend)])
    return args, result

Here is a basic test:

g1, g2 = (i for i in range(10)), (i for i in range(4))
# Create (g1, g2), g3 first, then loop over g3 as one would with zip
(g1, g2), g3 = safe_zip(g1, g2)
for a, b in g3:
    print(a, b)  # 0 0 through 3 3
for x in g1:
    print(x)  # 4 through 9

4

You can use itertools.tee and itertools.islice:

from itertools import islice, tee

def zipped(gen1, gen2, pred=list):
    g11, g12 = tee(gen1)
    z = pred(zip(g11, gen2))

    return (islice(g12, len(z), None), gen2), z

gen1 = iter(range(10))
gen2 = iter(range(5))

(gen1, gen2), output = zipped(gen1, gen2)

print(output)
print(next(gen1))
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
# 5
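The same tee and islice idea can be stretched to a variable number of inputs. This is only a sketch under the same assumptions as the two-argument version: the helper name `zipped_n` is hypothetical, the zipped result is built eagerly as a list, and tee keeps the zipped elements buffered until the returned iterators skip past them.

```python
from itertools import islice, tee


def zipped_n(*iterables):
    """Return (rest, z): iterators over what each input has left, and
    the list of zipped tuples."""
    firsts, seconds = [], []
    for it in iterables:
        a, b = tee(it)  # `a` is fed to zip, `b` is kept as a backup copy
        firsts.append(a)
        seconds.append(b)
    z = list(zip(*firsts))
    # Skip the len(z) items that were zipped; whatever follows is untouched,
    # including the item zip consumed and dropped from its own copy.
    rest = tuple(islice(b, len(z), None) for b in seconds)
    return rest, z


(r1, r2, r3), z = zipped_n(range(6), range(3), range(4))
first_left = next(r1)
print(z)           # [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
print(first_left)  # 3
print(list(r2))    # []
print(list(r3))    # [3]
```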

3

If you want to reuse existing code, the simplest solution is:

from more_itertools import peekable

a = peekable(a)
b = peekable(b)

while True:
    try:
        a.peek()
        b.peek()
    except StopIteration:
        break
    x = next(a)
    y = next(b)
    print(x, y)


print(list(a), list(b))  # Misses nothing.

You can test this code with your setup:

def my_gen(n: int):
    yield from range(n)

a = my_gen(10)
b = my_gen(8)

It will print:

0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
[8, 9] []
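For readers who would rather not depend on more_itertools, the peek-before-consume idea can be sketched with the standard library only. The `Peekable` class and `peek_zip` function below are hypothetical names for this illustration and are not the more_itertools API. A value is only taken from an input once every input is known to have one, so nothing is dropped, and it works for any number of arguments.

```python
class Peekable:
    """Iterator wrapper with one item of lookahead."""
    _EMPTY = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._head = self._EMPTY  # buffered lookahead item, if any

    def __iter__(self):
        return self

    def has_next(self):
        # Fetch one item ahead, if not already buffered.
        if self._head is self._EMPTY:
            self._head = next(self._it, self._EMPTY)
        return self._head is not self._EMPTY

    def __next__(self):
        if not self.has_next():
            raise StopIteration
        value, self._head = self._head, self._EMPTY
        return value


def peek_zip(*peekables):
    """Like zip, but only consumes a round when all inputs have an item."""
    while peekables and all(p.has_next() for p in peekables):
        yield tuple(next(p) for p in peekables)


a = Peekable(range(10))
b = Peekable(range(8))
pairs = list(peek_zip(a, b))
rest_a, rest_b = list(a), list(b)

print(pairs[-1])       # (7, 7)
print(rest_a, rest_b)  # [8, 9] []
```

The peeked item stays buffered inside the wrapper, so the wrappers (not the original generators) must be used afterwards.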

2

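I don't think you can retrieve the dropped value with a basic for loop, because the exhausted iterator, obtained from zip(..., ...).__iter__, is discarded once it is exhausted and you cannot access it.

You would have to keep a reference to your zip object; then you can get the position of the dropped item with some hacky code: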

z = zip(range(10), range(8))
for _ in iter(z.__next__, None):
    ...
_, (one, other) = z.__reduce__()
_, (i_one,), p_one = one.__reduce__() # p_one == current pos, 1 based
import itertools
val = next(itertools.islice(iter(i_one), p_one - 1, p_one))
Licensed under cc by-sa 3.0 with attribution required.