Where is the best place to use yield in Python?

Time: 2022-02-28 23:31:29

I know how yield works. I know the permutations example, but I see that as just a mathematical nicety.

But what is yield's real power? When should I use it? A simple, good example would be best.

4 solutions

#1


53  

yield is best used when you have a function that returns a sequence and you want to iterate over that sequence, but you do not need to have every value in memory at once.


For example, I have a Python script that parses a large list of CSV files, and I want to hand each line to another function for processing. I don't want to store megabytes of data in memory all at once, so I yield each line as a Python data structure. The function that gets lines from the files might look something like this:

def get_lines(files):
    """Yield lines one at a time from an iterable of open file objects."""
    for f in files:
        for line in f:
            # preprocess line here if needed
            yield line

I can then use the same syntax as with lists to access the output of this function:


for line in get_lines(files):
    pass  # process line

but I use far less memory.
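Here is a minimal sketch of the same idea using the standard csv module. It assumes the input is a list of file paths rather than open file objects; the name iter_csv_rows and the two demo files are made up for illustration and are not from the script described above.

```python
import csv
import os
import tempfile


def iter_csv_rows(paths):
    """Yield one parsed row at a time from each CSV file in `paths`."""
    for path in paths:
        # newline="" is what the csv module documentation recommends for open().
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                yield row  # only the current row is held in memory


# Build two tiny CSV files (hypothetical data) so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
demo_paths = []
for name, text in [("a.csv", "1,2\n3,4\n"), ("b.csv", "5,6\n")]:
    path = os.path.join(demo_dir, name)
    with open(path, "w", newline="") as handle:
        handle.write(text)
    demo_paths.append(path)

# The caller iterates exactly as it would over a list of rows.
for row in iter_csv_rows(demo_paths):
    print(row)
```

Each file is opened only when the loop reaches it and closed as soon as its rows are consumed, so the memory cost stays at roughly one row regardless of how many files are listed.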

#2


15  

Simply put, yield gives you a generator. You'd use it where you would normally use a return in a function. Here is a really contrived example, cut and pasted from an interactive prompt:

>>> def get_odd_numbers(i):
...     return list(range(1, i, 2))
...
>>> def yield_odd_numbers(i):
...     for x in range(1, i, 2):
...         yield x
...
>>> foo = get_odd_numbers(10)
>>> bar = yield_odd_numbers(10)
>>> foo
[1, 3, 5, 7, 9]
>>> bar
<generator object yield_odd_numbers at 0x1029c6f50>
>>> next(bar)
1
>>> next(bar)
3
>>> next(bar)
5

As you can see, in the first case foo holds the entire list in memory at once. That's not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only does that eat a lot of memory, it also takes a long time to build, and all of that time is spent when the function is called.

In the second case, bar just gives you a generator. A generator is an iterable, which means you can use it in a for loop and so on, but each value can only be accessed once. The values are also never all stored in memory at the same time; the generator object "remembers" where it was in the loop the last time you asked it for a value. This way, if you're using an iterable to (say) count to 50 billion, you don't have to count to 50 billion up front and store all 50 billion numbers to count through. Again, this is a pretty contrived example; you would probably use itertools if you really wanted to count to 50 billion. :)
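A small sketch to make those three points concrete: the size difference, the single-pass behaviour, and the itertools alternative. The variable names are only illustrative, and the exact byte sizes depend on the interpreter, so the sketch compares them rather than printing them.

```python
import itertools
import sys


def yield_odd_numbers(limit):
    for x in range(1, limit, 2):
        yield x


# In Python 3, range() is itself lazy, so list() is used to force a real list.
odd_list = list(range(1, 1_000_000, 2))
odd_gen = yield_odd_numbers(1_000_000)

# The list object grows with its length; the generator object does not.
print(sys.getsizeof(odd_list) > sys.getsizeof(odd_gen))

# A generator is single-pass: once consumed, it has nothing left to give.
first_pass = sum(odd_gen)
second_pass = sum(odd_gen)
print(first_pass, second_pass)

# itertools.count is an unbounded lazy counter; islice takes a finite prefix.
first_five = list(itertools.islice(itertools.count(1, 2), 5))
print(first_five)
```

If you need to walk the values a second time, call the generator function again to get a fresh generator, or build a list when the data is small enough to keep.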

This is the simplest use case for generators. As you said, yield can be used to write efficient permutations, pushing each result up through the call stack instead of collecting results in some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.
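As a rough sketch of both ideas, here is a recursive permutation generator and an in-order tree walk. Neither comes from the answer above: the function names and the (left, value, right) tuple layout for the tree are assumptions chosen to keep the example short. For real work, itertools.permutations in the standard library already covers the first case.

```python
def permutations(items):
    """Yield every ordering of the list `items`, one list at a time."""
    if len(items) <= 1:
        yield list(items)
        return
    for i, head in enumerate(items):
        rest = items[:i] + items[i + 1:]
        # Each inner result is passed up to the caller as soon as it exists,
        # so no accumulator list is threaded through the recursion.
        for tail in permutations(rest):
            yield [head] + tail


def in_order(node):
    """Yield the values of a binary tree stored as (left, value, right) tuples."""
    if node is None:
        return
    left, value, right = node
    yield from in_order(left)
    yield value
    yield from in_order(right)


print(list(permutations([1, 2, 3])))

tree = ((None, 1, None), 2, (None, 3, None))
print(list(in_order(tree)))
```

Because both functions are lazy, a caller that only wants the first few permutations or the first few nodes can stop early and the rest of the work is never done.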


#3


3  

Another use is in a network client. Use 'yield' in a generator function to round-robin through multiple sockets without the complexity of threads.


For example, I had a hardware test client that needed to send the R, G, and B planes of an image to firmware. The data needed to be sent in lockstep: red, green, blue, red, green, blue. Rather than spawn three threads, I had a generator that read from the file and encoded the buffer. Each buffer was a 'yield buf'. At end of file the function returned, and that gave me end-of-iteration.

My client code looped through the three generator functions, getting buffers until end-of-iteration.

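The original client code isn't shown, so the following is only a guess at its shape. It assumes each plane comes from a file-like object, uses in-memory io.BytesIO streams as stand-ins for the real files, skips the encoding step, and collects buffers in a list where a real client would write to a socket. The names plane_buffers and sent are hypothetical.

```python
import io


def plane_buffers(stream, chunk_size):
    """Yield fixed-size buffers from a file-like object until it is exhausted."""
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            return  # end of file: returning ends the iteration
        yield buf


# Hypothetical stand-ins for the three colour-plane files.
red = plane_buffers(io.BytesIO(b"RRRR"), 2)
green = plane_buffers(io.BytesIO(b"GGGG"), 2)
blue = plane_buffers(io.BytesIO(b"BBBB"), 2)

sent = []
# zip advances the three generators in lockstep: red, green, blue, red, ...
for r_buf, g_buf, b_buf in zip(red, green, blue):
    for buf in (r_buf, g_buf, b_buf):
        sent.append(buf)  # a real client would send buf over a socket here

print(sent)
```

Note that zip stops as soon as the shortest generator runs out, which matches planes of equal size; planes of different lengths would need itertools.zip_longest or an explicit loop.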

#4


1  

I'm reading Data Structures and Algorithms in Python.

It has a Fibonacci function that uses yield. I think it's the best case for using yield.

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b

You can use it like this:

fib = fibonacci()
for i, value in enumerate(fib):
    print(i, value)
    if i >= 100:
        break

So I think that when the next element depends on previous elements, it's a good time to use yield.
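As an alternative to counting with enumerate and break, itertools.islice can take a fixed number of values from an endless generator. This is just a sketch; first_ten is an illustrative name, and the generator is repeated here so the block runs on its own.

```python
from itertools import islice


def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice stops after 10 values, so the infinite loop inside is never a problem.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The generator keeps only a and b between calls, which is exactly the "next depends on previous" state described above.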
