
Time: 2021-03-02 15:22:11

I'm creating a generator that gets consumed by another function, but I'd still like to know how many items were generated:


import sys

lines = (line.rstrip('\n') for line in sys.stdin)
print("Processed {} lines.".format( ? ))

The best I can come up with is to wrap the generator with a class that keeps a count, or maybe turn it inside out and send() things in. Is there an elegant and efficient way to see how many items a generator produced when you're not the one consuming it in Python 2?

Edit: Here's what I ended up with:


import itertools
import operator
from collections import Iterable


class Count(Iterable):
    """Wrap an iterable (typically a generator) and provide a ``count``
    method returning the number of items.

    Calling ``count()`` before iteration is finished will
    invalidate the count.
    """

    def __init__(self, iterable):
        self._iterable = iterable
        self._counter = itertools.count()

    def __iter__(self):
        # izip pulls from the iterable first, so the counter advances
        # exactly once per item that is handed out.
        return itertools.imap(operator.itemgetter(0), itertools.izip(self._iterable, self._counter))

    def count(self):
        # Freeze the counter at its current value so repeated calls agree.
        self._counter = itertools.repeat(self._counter.next())
        return self._counter.next()

6 answers



Here is another way, using itertools.count():


import itertools
import re

def generator():
    for i in range(10):
        yield i

def process(l):
    for i in l:
        if i == 5:
            break

def counter_value(counter):
    # Read the current value out of the counter's repr, e.g. "count(6)".
    return int(re.search(r'\d+', repr(counter)).group(0))

counter = itertools.count()
process(i for i, v in itertools.izip(generator(), counter))

print "Element consumed by process is : %d " % counter_value(counter)
# output: Element consumed by process is : 6

Hope this was helpful.
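
Parsing repr(counter) depends on how itertools.count happens to print itself. Here is a minimal sketch of the same idea that reads the counter with next() instead, assuming you only need the total once (reading it advances the counter). It is written with zip so that it runs on Python 3; on Python 2, itertools.izip would have to replace zip to keep the pairing lazy. The name consumed is illustrative:

```python
import itertools

def generator():
    for i in range(10):
        yield i

def process(items):
    # Stand-in consumer that stops early, as in the example above.
    for i in items:
        if i == 5:
            break

counter = itertools.count()
# zip pulls from generator() first and from counter second, so the counter
# advances once per item actually handed to the consumer.
process(value for value, _ in zip(generator(), counter))

# The next value of the counter equals the number of items consumed so far.
consumed = next(counter)
print("Elements consumed by process: %d" % consumed)
```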




If you don't care that you are consuming the generator, you can just do:


sum(1 for x in gen)
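
Keep in mind that this exhausts the generator, so nothing is left for another consumer afterwards. A small illustration, where gen is a stand-in generator expression:

```python
gen = (n * n for n in range(4))

total = sum(1 for _ in gen)   # counts the items by consuming every one of them
leftover = list(gen)          # the generator is exhausted, so this is empty

print(total)
print(leftover)
```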



Usually, I'd just turn the generator into a list and take its length. If you have reasons to assume that this will consume too much memory, your best bet indeed seems to be the wrapper class you suggested yourself. It's not too bad, though:


class CountingIterator(object):
    def __init__(self, it):
        self.it = it
        self.count = 0
    def __iter__(self):
        return self
    def next(self):
        nxt = next(self.it)
        self.count += 1
        return nxt
    __next__ = next

(The last line is for forward compatibility to Python 3.x.)
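
A usage sketch, assuming a stand-in consumer named process and a small list in place of sys.stdin (neither is from the question). The class is repeated so the snippet runs on its own; the only change is iter(it), an added assumption so that plain iterables such as lists are accepted too:

```python
class CountingIterator(object):
    def __init__(self, it):
        self.it = iter(it)   # assumption: accept any iterable, not only iterators
        self.count = 0
    def __iter__(self):
        return self
    def next(self):
        nxt = next(self.it)  # StopIteration propagates before count is bumped
        self.count += 1
        return nxt
    __next__ = next          # Python 3 spelling of the same method

def process(items):
    # Hypothetical consumer: it just drains whatever it is given.
    for _ in items:
        pass

lines = CountingIterator(line.rstrip('\n') for line in ["a\n", "b\n", "c\n"])
process(lines)
print("Processed {} lines.".format(lines.count))
```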



Here's another approach. The use of a list for the count output is a bit ugly, but it's pretty compact:


def counter(seq, count_output_list):
    for x in seq:
        count_output_list[0] += 1
        yield x

Used like so:


count = [0]
process(counter(lines, count))
print count[0]

One could alternatively make counter() take a dict in which it might add a "count" key, or an object on which it could set a count member.
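
A minimal sketch of the object variant, with a hypothetical Tally holder class and a counting() generator (both names are illustrative, not part of the answer):

```python
class Tally(object):
    """Hypothetical holder for the running count."""
    def __init__(self):
        self.count = 0

def counting(seq, tally):
    # Same idea as counter(), but the count lives on an attribute
    # instead of in a one-element list.
    for x in seq:
        tally.count += 1
        yield x

def process(items):
    # Stand-in consumer that drains its input.
    for _ in items:
        pass

tally = Tally()
process(counting(iter(["a", "b", "c"]), tally))
print(tally.count)
```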




If you don't need to return the count and just want to log it, you can use a finally block:


def generator():
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        print '{} iterations'.format(i)

[ n for n in generator() ]

Which produces:


10 iterations
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
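
The benefit of finally is that the message is also emitted when the consumer stops early and the generator is closed. Here is a sketch of that case, assuming the generator is closed explicitly with close() (relying on garbage collection to do it is implementation-dependent). The log list is an illustrative stand-in for print so the result can be inspected:

```python
log = []

def generator():
    i = 0
    try:
        for x in range(10):
            i += 1
            yield x
    finally:
        # Runs on normal exhaustion and also when the generator is closed.
        log.append('{} iterations'.format(i))

gen = generator()
for n in gen:
    if n == 3:
        break        # stop early; the generator stays suspended at its yield
gen.close()          # raises GeneratorExit inside it, which triggers finally

print(log)
```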



This is another solution similar to that of @sven-marnach:

class IterCounter(object):
  def __init__(self, it):
    self._iter = it
    self.count = 0

  def _counterWrapper(self, it):
    for i in it:
      yield i
      self.count += 1

  def __iter__(self):
    return self._counterWrapper(self._iter)

I wrapped the iterator with a generator function and avoided redefining next. The result is an iterable (not an iterator, because it lacks a next method), but if that is enough, it is faster: in my tests, about 10% faster.
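
One thing to be aware of with this variant: because self.count is incremented after the yield, the count trails by one while the consumer is still holding an item, for example if it stops early. A sketch illustrating this, with the class repeated so the snippet is self-contained:

```python
class IterCounter(object):
    def __init__(self, it):
        self._iter = it
        self.count = 0

    def _counterWrapper(self, it):
        for i in it:
            yield i
            self.count += 1   # only runs when the consumer asks for the next item

    def __iter__(self):
        return self._counterWrapper(self._iter)

full = IterCounter(range(10))
list(full)                    # drain completely
print(full.count)

partial = IterCounter(range(10))
for n in partial:
    if n == 5:
        break                 # six items were received, but the sixth is not counted
print(partial.count)
```

If the exact count matters for consumers that may stop early, moving the increment before the yield avoids the lag.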




