How to count the items in a generator consumed by other code

Date: 2021-03-02 15:22:11

I'm creating a generator that gets consumed by another function, but I'd still like to know how many items were generated:

lines = (line.rstrip('\n') for line in sys.stdin)
process(lines)
print("Processed {} lines.".format( ? ))

The best I can come up with is to wrap the generator with a class that keeps a count, or maybe turn it inside out and send() things in. Is there an elegant and efficient way to see how many items a generator produced when you're not the one consuming it in Python 2?

Edit: Here's what I ended up with:

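import itertools
import operator
from collections import Iterable  # Python 2 location of the Iterable ABC


class Count(Iterable):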
    """Wrap an iterable (typically a generator) and provide a ``count``
    field counting the number of items.

    Accessing the ``count`` field before iteration is finished will
    invalidate the count.
    """
    def __init__(self, iterable):
        self._iterable = iterable
        self._counter = itertools.count()

    def __iter__(self):
        return itertools.imap(operator.itemgetter(0), itertools.izip(self._iterable, self._counter))

    @property
    def count(self):
        self._counter = itertools.repeat(self._counter.next())
        return self._counter.next()
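
For readers on Python 3, here is a rough port of the wrapper above, offered as a sketch rather than the asker's actual code. It assumes `zip` in place of `itertools.izip` and the built-in `next()` in place of the `.next()` method; the `process` function and the sample input are hypothetical stand-ins.

```python
import itertools


class Count:
    """Python 3 port (an assumption, not the asker's code) of the wrapper above."""

    def __init__(self, iterable):
        self._iterable = iterable
        self._counter = itertools.count()

    def __iter__(self):
        # zip() pulls from the wrapped iterable first, so the counter only
        # advances for items that were actually produced.
        return (item for item, _ in zip(self._iterable, self._counter))

    @property
    def count(self):
        # Reading the counter consumes one value, so freeze it with repeat()
        # to make later reads return the same number.
        value = next(self._counter)
        self._counter = itertools.repeat(value)
        return value


def process(items):
    # Hypothetical consumer: it just drains whatever it is given.
    for _ in items:
        pass


lines = Count(line.rstrip("\n") for line in ["a\n", "b\n", "c\n"])
process(lines)
print("Processed {} lines.".format(lines.count))
```

As the docstring in the original warns, reading `count` while iteration is still in progress throws the count off, because the read itself advances the shared counter.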

6 answers

#1


9  

Here is another way, using itertools.count():

import itertools

def generator():
    for i in range(10):
        yield i

def process(l):
    for i in l:
        if i == 5:
            break

def counter_value(counter):
    # Relies on the repr of itertools.count, which looks like "count(6)".
    import re
    return int(re.search(r'\d+', repr(counter)).group(0))

counter = itertools.count()
process(i for i, v in itertools.izip(generator(), counter))

print "Elements consumed by process: %d" % counter_value(counter)
# output: Elements consumed by process: 6

Hope this was helpful.
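
If parsing the repr feels fragile, one alternative (a sketch, assuming Python 3 and that the count only needs to be read once) is to ask the counter for its next value after processing. The names `generator` and `process` mirror the example above; `consumed` is a name introduced here.

```python
import itertools


def generator():
    for i in range(10):
        yield i


def process(items):
    # Stops early, so only part of the generator is consumed.
    for i in items:
        if i == 5:
            break


counter = itertools.count()
# zip() advances generator() first and the counter second, so the counter
# has handed out exactly one number per item that process() received.
process(value for value, _ in zip(generator(), counter))

# The next number the counter would hand out equals the items consumed.
# Note that this read advances the counter, so do it only once.
consumed = next(counter)
print("Elements consumed by process: %d" % consumed)
```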

#2


13  

If you don't care that you are consuming the generator, you can just do:

sum(1 for x in gen)
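
A small illustration of what "consuming" means here, using a made-up three-line input (the sample data and the names `total` and `leftover` are assumptions for this sketch): once `sum` has run, nothing is left for any other consumer.

```python
# Hypothetical stand-in for the lines generator in the question.
gen = (line.rstrip("\n") for line in ["a\n", "b\n", "c\n"])

total = sum(1 for _ in gen)
print(total)  # 3

# The generator is exhausted now, so a later consumer sees nothing.
leftover = list(gen)
print(leftover)  # []
```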

#3


8  

Usually, I'd just turn the generator into a list and take its length. If you have reasons to assume that this will consume too much memory, your best bet indeed seems to be the wrapper class you suggested yourself. It's not too bad, though:

class CountingIterator(object):
    def __init__(self, it):
        self.it = it
        self.count = 0
    def __iter__(self):
        return self
    def next(self):
        nxt = next(self.it)
        self.count += 1
        return nxt
    __next__ = next

(The last line is for forward compatibility with Python 3.x.)
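
A usage sketch for the class above, written for Python 3. Two things here are assumptions rather than part of the answer: the `iter()` call in `__init__`, added so that plain lists work too, and the hypothetical `process` function that stops at a sentinel value.

```python
class CountingIterator:
    def __init__(self, it):
        self.it = iter(it)  # assumption: accept any iterable, not only iterators
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.it)  # StopIteration propagates before the count changes
        self.count += 1
        return item

    next = __next__  # Python 2 spelling of the same method


def process(items):
    # Hypothetical consumer that gives up when it sees "stop".
    for item in items:
        if item == "stop":
            break


lines = CountingIterator(["a", "b", "stop", "c"])
process(lines)
print("Processed {} lines.".format(lines.count))
```

Because the count is bumped before each item is handed over, it includes the item on which the consumer stopped.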

#4


1  

Here's another approach. The use of a list for the count output is a bit ugly, but it's pretty compact:

def counter(seq, count_output_list):
    for x in seq:
        count_output_list[0] += 1
        yield x

Used like so:

count = [0]
process(counter(lines, count))
print count[0]

One could alternatively make counter() take a dict in which it might add a "count" key, or an object on which it could set a count member.
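
The object variant mentioned above might look like this in Python 3. It is only a sketch: `types.SimpleNamespace` is one convenient choice of object, and the names `stats` and `consumed` are made up for the example.

```python
from types import SimpleNamespace


def counter(seq, stats):
    # Same generator as above, but the tally lives on an attribute
    # instead of in a one-element list.
    for x in seq:
        stats.count += 1
        yield x


stats = SimpleNamespace(count=0)
consumed = list(counter(["a", "b", "c"], stats))
print(stats.count)
```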

#5


1  

If you don't need to return the count and just want to log it, you can use a finally block:

def generator():
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        print '{} iterations'.format(i)

[ n for n in generator() ]

Which produces:

10 iterations
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
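
The finally block also runs when the consumer stops early and the generator is closed, which is worth seeing once. The sketch below assumes Python 3 and swaps the print for a hypothetical `report` callback that appends to a list, so the message can be inspected afterwards.

```python
messages = []  # hypothetical sink standing in for a real logger


def generator(report=messages.append):
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        # Runs on normal exhaustion and also when the generator is closed.
        report("{} iterations".format(i))


gen = generator()
for n in gen:
    if n == 3:
        break  # the consumer walks away after four items

gen.close()  # closing the abandoned generator triggers the finally block
print(messages)
```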

#6


0  

This is another solution similar to that of @sven-marnach:

class IterCounter(object):
  def __init__(self, it):
    self._iter = it
    self.count = 0

  def _counterWrapper(self, it):
    for i in it:
      yield i
      self.count += 1

  def __iter__(self):
    return self._counterWrapper(self._iter)

I wrapped the iterator in a generator function and avoided redefining next. The result is an iterable (not an iterator, because it lacks a next method), but if that is enough, it is faster. In my tests it was 10% faster.
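
One detail to be aware of with the wrapper above: the count is incremented after the yield, so if the consumer stops early, the last item it received is not counted. A Python 3 sketch of both cases (the names `full`, `partial` and `received` are made up here):

```python
class IterCounter:
    def __init__(self, it):
        self._iter = it
        self.count = 0

    def _counterWrapper(self, it):
        for i in it:
            yield i
            # Only reached when the consumer asks for the next item.
            self.count += 1

    def __iter__(self):
        return self._counterWrapper(self._iter)


# Full consumption: every item is counted.
full = IterCounter(range(10))
list(full)
print(full.count)

# Early exit: six items are received, but the last one is never counted.
partial = IterCounter(range(10))
received = []
for i in partial:
    received.append(i)
    if i == 5:
        break
print(len(received), partial.count)
```

Moving the increment in front of the yield would make the early-exit count match the number of items handed out, at the cost of counting an item slightly before the consumer sees it.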

