What can you do with Python generator functions?

Time: 2021-02-10 21:22:28

I'm starting to learn Python and I've come across generator functions, those that have a yield statement in them. I want to know what types of problems these functions are really good at solving.

16 Answers

#1


213  

Generators give you lazy evaluation. You use them by iterating over them, either explicitly with 'for' or implicitly by passing them to any function or construct that iterates. You can think of generators as returning multiple items, as if they return a list, but instead of returning them all at once they return them one by one, and the generator function is paused until the next item is requested.

Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don't know if you are going to need all results, or where you don't want to allocate the memory for all results at the same time. Or for situations where the generator uses another generator, or consumes some other resource, and it's more convenient if that happens as late as possible.

Another use for generators (that is really the same) is to replace callbacks with iteration. In some situations you want a function to do a lot of work and occasionally report back to the caller. Traditionally you'd use a callback function for this. You pass this callback to the work-function and it would periodically call this callback. The generator approach is that the work-function (now a generator) knows nothing about the callback, and merely yields whenever it wants to report something. The caller, instead of writing a separate callback and passing that to the work-function, does all the reporting work in a little 'for' loop around the generator.


For example, say you wrote a 'filesystem search' program. You could perform the search in its entirety, collect the results and then display them one at a time. All of the results would have to be collected before you showed the first, and all of the results would be in memory at the same time. Or you could display the results while you find them, which would be more memory efficient and much friendlier towards the user. The latter could be done by passing the result-printing function to the filesystem-search function, or it could be done by just making the search function a generator and iterating over the result.


If you want to see an example of the latter two approaches, see os.path.walk() (the old filesystem-walking function with callback) and os.walk() (the new filesystem-walking generator.) Of course, if you really wanted to collect all results in a list, the generator approach is trivial to convert to the big-list approach:


big_list = list(the_generator)
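
A rough sketch of the generator-style search described above (find_files, the root directory and the suffix filter are illustrative names, not part of the original answer) can be built on top of os.walk():

import os

def find_files(root, suffix):
    # os.walk() is itself a generator, so the tree is traversed lazily
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)

# results are printed as they are found, without collecting them all first
for path in find_files('.', '.py'):
    print(path)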

#2


83  

One of the reasons to use a generator is to make the solution clearer for some kinds of problems.

The other is to treat results one at a time, avoiding building huge lists of results that you would process separately anyway.

If you have a fibonacci-up-to-n function like this:


# function version
def fibon(n):
    a = b = 1
    result = []
    for i in xrange(n):
        result.append(a)
        a, b = b, a + b
    return result

You can write the function more easily like this:

# generator version
def fibon(n):
    a = b = 1
    for i in xrange(n):
        yield a
        a, b = b, a + b

The function is clearer. And if you use the function like this:


for x in fibon(1000000):
    print x,

In this example, if you use the generator version, the whole 1000000-item list won't be created at all; just one value is produced at a time. That would not be the case with the list version, where the whole list would be created first.

#3


36  

See the "Motivation" section in PEP 255.


A non-obvious use of generators is creating interruptible functions, which lets you do things like update UI or run several jobs "simultaneously" (interleaved, actually) while not using threads.

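A minimal sketch of that interleaving idea, with made-up job() and run_interleaved() helpers (not taken from PEP 255 itself):

def job(name, steps):
    # each next() call does one small unit of work, then pauses at the yield
    for i in range(steps):
        print("%s: step %d" % (name, i))
        yield

def run_interleaved(*gens):
    # a tiny round-robin scheduler: resume each generator in turn, no threads
    pending = list(gens)
    while pending:
        for g in list(pending):
            try:
                next(g)
            except StopIteration:
                pending.remove(g)

run_interleaved(job("download", 3), job("render", 2))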

#4


32  

I found this explanation, which cleared up my doubts, because someone who doesn't know about generators may not know about yield either.

Return


The return statement is where all the local variables are destroyed and the resulting value is given back (returned) to the caller. Should the same function be called some time later, the function will get a fresh new set of variables.


Yield


But what if the local variables aren't thrown away when we exit a function? That implies we can resume the function where we left off. This is where the concept of generators is introduced, and the yield statement resumes where the function left off.

def generate_integers(N):
    for i in xrange(N):
        yield i

In [1]: gen = generate_integers(3)
In [2]: gen
<generator object at 0x8117f90>
In [3]: gen.next()
0
In [4]: gen.next()
1
In [5]: gen.next()
2

So that's the difference between return and yield statements in Python.


The yield statement is what makes a function a generator function.

So generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed).

#5


27  

Buffering. When it is efficient to fetch data in large chunks but process it in small chunks, a generator might help:

def bufferedFetch():
    while True:
        buffer = getBigChunkOfData()
        # insert some code to break on 'end of data'
        for i in buffer:
            yield i

The above lets you easily separate buffering from processing. The consumer function can now just get the values one by one without worrying about buffering.


#6


26  

Real World Example

Let's say you have 100 million domains in your MySQL table and you would like to update the Alexa rank for each domain.

The first thing you need is to select your domain names from the database.

Let's say your table name is domains and the column name is domain.

If you use SELECT domain FROM domains, it's going to return 100 million rows, which is going to consume a lot of memory, so your server might crash.

So you decide to run the program in batches. Let's say our batch size is 1000.

In our first batch we will query the first 1000 rows, check the Alexa rank for each domain and update the database row.

In our second batch we will work on the next 1000 rows. In our third batch it will be from 2001 to 3000 and so on.


Now we need a generator function which generates our batches.


Here is our generator function:

def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result

As you can see, our function keeps yielding results. If you used the keyword return instead of yield, the whole function would end as soon as it reached return.

return - returns only once
yield - returns multiple times

If a function uses the keyword yield, then it's a generator.

Now you can iterate like this:

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")
for result in ResultGenerator(cursor):
    doSomethingWith(result)
db.close()

#7


20  

I have found that generators are very helpful in cleaning up your code and in giving you a very unique way to encapsulate and modularize code. In a situation where you need something to constantly spit out values based on its own internal processing, and when that something needs to be called from anywhere in your code (and not just within a loop or a block, for example), generators are the feature to use.

An abstract example would be a fibonacci number generator that does not live within a loop and when it is called from anywhere will always return the next number in sequence:


def fib():
    first=0
    second=1
    yield first
    yield second

    while 1:
        next=first+second
        yield next
        first=second
        second=next

fibgen1=fib()
fibgen2=fib()

Now you have two fibonacci number generator objects which you can call from anywhere in your code and they will always return ever larger fibonacci numbers in sequence as follows:


>>> fibgen1.next(); fibgen1.next(); fibgen1.next(); fibgen1.next()
0
1
1
2
>>> fibgen2.next(); fibgen2.next()
0
1
>>> fibgen1.next(); fibgen1.next()
3
5

The lovely thing about generators is that they encapsulate state without having to go through the hoops of creating objects. One way of thinking about them is as "functions" which remember their internal state.


I got the fibonacci example from http://www.neotitans.com/resources/python/python-generators-tutorial.html and with a little imagination, you can come up with a lot of other situations where generators make for a great alternative to for-loops and other traditional iteration constructs.


#8


18  

The simple explanation: Consider a for statement


for item in iterable:
   do_stuff()

A lot of the time, all the items in iterable don't need to be there from the start, but can be generated on the fly as they're required. This can be a lot more efficient in both

  • space (you never need to store all the items simultaneously) and
  • time (the iteration may finish before all the items are needed).

Other times, you don't even know all the items ahead of time. For example:


for command in user_input():
   do_stuff_with(command)

You have no way of knowing all the user's commands beforehand, but you can use a nice loop like this if you have a generator handing you commands:


def user_input():
    while True:
        wait_for_command()
        cmd = get_command()
        yield cmd

With generators you can also have iteration over infinite sequences, which is of course not possible when iterating over containers.

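For instance, here is a minimal sketch of an infinite generator (naturals() is just an illustrative name), with itertools.islice used so the loop still terminates:

import itertools

def naturals():
    # an infinite sequence: no container could ever hold all of it
    n = 0
    while True:
        yield n
        n += 1

# only the first five values are ever produced
for n in itertools.islice(naturals(), 5):
    print(n)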

#9


12  

My favorite uses are "filter" and "reduce" operations.


Let's say we're reading a file, and only want the lines which begin with "##".


def filter2sharps( aSequence ):
    for l in aSequence:
        if l.startswith("##"):
            yield l

We can then use the generator function in a proper loop:

source= file( ... )
for line in filter2sharps( source.readlines() ):
    print line
source.close()

The reduce example is similar. Let's say we have a file where we need to locate blocks of <Location>...</Location> lines. [Not HTML tags, but lines that happen to look tag-like.]


def reduceLocation( aSequence ):
    keep= False
    block= None
    for line in aSequence:
        if line.startswith("</Location"):
            block.append( line )
            yield block
            block= None
            keep= False
        elif line.startswith("<Location"):
            block= [ line ]
            keep= True
        elif keep:
            block.append( line )
        else:
            pass
    if block is not None:
        yield block # A partial block, icky

Again, we can use this generator in a proper for loop.


source = file( ... )
for b in reduceLocation( source.readlines() ):
    print b
source.close()

The idea is that a generator function allows us to filter or reduce a sequence, producing another sequence one value at a time.

#10


8  

A practical example where you could make use of a generator is when you have some kind of shape and you want to iterate over its corners, edges or whatever. For my own project (source code here) I had a rectangle:

class Rect():

    def __init__(self, x, y, width, height):
        self.l_top  = (x, y)
        self.r_top  = (x+width, y)
        self.r_bot  = (x+width, y+height)
        self.l_bot  = (x, y+height)

    def __iter__(self):
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot

Now I can create a rectangle and loop over its corners:


myrect=Rect(50, 50, 100, 100)
for corner in myrect:
    print(corner)

Instead of __iter__ you could have a method iter_corners and call that with for corner in myrect.iter_corners(). It's just more elegant to use __iter__ since then we can use the class instance name directly in the for expression.


#11


6  

Basically, they let you avoid callback functions when iterating over input while maintaining state.

See here and here for an overview of what can be done using generators.

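A small sketch of that contrast, using made-up parsing helpers (nothing here comes from the linked overviews): the callback version pushes values at you, while the generator version lets the caller pull them in a plain for loop and keep its state in ordinary local variables.

# callback style: the caller's state has to live in globals or a closure
def parse_with_callback(lines, on_record):
    for line in lines:
        on_record(line.strip())

# generator style: the caller keeps its own state while iterating
def parse(lines):
    for line in lines:
        yield line.strip()

count = 0
for record in parse(["alpha\n", "beta\n"]):
    count += 1
    print(count, record)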

#12


4  

There are some good answers here; however, I'd also recommend a complete read of the Python Functional Programming tutorial, which helps explain some of the more potent use-cases of generators.

  • Particularly interesting is that it is now possible to update the yield variable from outside the generator function, hence making it possible to create dynamic and interwoven coroutines with relatively little effort (see the sketch after this list).
  • Also see PEP 342: Coroutines via Enhanced Generators for more information.
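
A minimal sketch of that "update the yield variable from outside" idea, using a hypothetical running-average coroutine (not an example from the tutorial or the PEP):

def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # the value passed to send() becomes the result of this yield expression
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the coroutine up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0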

#13


2  

I use generators when our web server is acting as a proxy:


  1. The client requests a proxied url from the server
  2. The server begins to load the target url
  3. The server yields to return the results to the client as soon as it gets them (see the sketch below)

#14


2  

Since the send method of a generator has not been mentioned, here is an example:

def test():
    for i in xrange(5):
        val = yield
        print(val)

t = test()
# proceed to yield statement
next(t)
# send value to yield
t.send(1)
t.send('2')
t.send([3])

It shows the possibility of sending a value to a running generator. There is a more advanced course on generators in the video below (including an explanation of yield from, generators for parallel processing, escaping the recursion limit, etc.).

David Beazley on generators at PyCon 2014


#15


1  

Piles of stuff. Any time you want to generate a sequence of items, but don't want to have to 'materialize' them all into a list at once. For example, you could have a simple generator that returns prime numbers:


import itertools

def primes():
    primes_found = set()
    primes_found.add(2)
    yield 2
    for i in itertools.count(1):
        candidate = i * 2 + 1
        # yield the candidate only if no previously found prime divides it
        if all(candidate % prime for prime in primes_found):
            primes_found.add(candidate)
            yield candidate

You could then use that to generate the products of subsequent primes:


def prime_products():
    primeiter = primes()
    prev = primeiter.next()
    for prime in primeiter:
        yield prime * prev
        prev = prime

These are fairly trivial examples, but you can see how it can be useful for processing large (potentially infinite!) datasets without generating them in advance, which is only one of the more obvious uses.


#16


0  

Also good for printing the prime numbers up to n:


def genprime(n=10):
    for num in range(2, n+1):
        for factor in range(2, num):
            if num%factor == 0:
                break
        else:
            yield(num)

for prime_num in genprime(100):
    print(prime_num)
