multiprocessing.Pool: What's the difference between map_async and imap?

Date: 2021-11-29 21:02:41

I'm trying to learn how to use Python's multiprocessing package, but I don't understand the difference between map_async and imap. I noticed that both map_async and imap are executed asynchronously. So when should I use one over the other? And how should I retrieve the result returned by map_async?

Should I use something like this?

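import multiprocessing

def square(x):  # placeholder worker; the original call passed no arguments
    return x * x

def test():
    pool = multiprocessing.Pool()
    result = pool.map_async(square, range(5))  # placeholder input
    pool.close()
    pool.join()
    return result.get()

if __name__ == "__main__":
    result = test()
    for i in result:
        print(i)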

1 Answer

#1


292  

There are two key differences between imap/imap_unordered and map/map_async:

  1. The way they consume the iterable you pass to them.
  2. The way they return the result back to you.

map consumes your iterable by converting the iterable to a list (assuming it isn't a list already), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing each item in the iterable between processes one item at a time - particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list will need to be kept in memory.
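To make the chunking idea concrete, here is a small conceptual sketch of splitting an iterable into fixed-size chunks. The helper name `chunked` is made up for illustration, and this is not the Pool's actual internal code; it only shows the kind of batching described above.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive tuples of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk

# Each tuple would travel to a worker as one unit instead of item by item.
print(list(chunked(range(7), 3)))  # [(0, 1, 2), (3, 4, 5), (6,)]
```

Fewer, larger messages between processes mean less per-item overhead, which is where the speedup for large inputs comes from.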

imap doesn't turn the iterable you give it into a list, nor does it break it into chunks (by default). It iterates over the iterable one element at a time and sends each element to a worker process. This means you don't take the memory hit of converting the whole iterable to a list, but it also means performance is slower for large iterables because of the lack of chunking. This can be mitigated, however, by passing a chunksize argument larger than the default of 1.
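As a rough sketch of that mitigation, the snippet below feeds a generator to imap with an explicit chunksize. The names `numbers` and `sum_of_roots`, the choice of `math.isqrt` as the worker, and the chunksize of 100 are all assumptions picked for illustration.

```python
import math
import multiprocessing

def numbers(limit):
    """Hypothetical input source: yields integers without building a list."""
    for n in range(limit):
        yield n

def sum_of_roots(limit, chunksize=100):
    with multiprocessing.Pool(processes=2) as pool:
        # imap iterates over the generator instead of converting it to a
        # list first; chunksize groups items so each task carries a batch.
        results = pool.imap(math.isqrt, numbers(limit), chunksize=chunksize)
        return sum(results)

if __name__ == "__main__":
    print(sum_of_roots(1000))
```

A suitable chunksize depends on how expensive each item is, so it is worth measuring rather than guessing.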

The other major difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered, you can start receiving results from workers as soon as they're ready, rather than having to wait for all of them to be finished. With map_async, an AsyncResult is returned right away, but you can't actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does (map is actually implemented internally as map_async(...).get()). There's no way to get partial results; you either have the entire result or nothing.
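For the retrieval part of the question, here is a minimal sketch of working with the AsyncResult that map_async returns. The function name `collect`, the use of the built-in `abs` as the worker, and the 10-second timeout are illustrative assumptions.

```python
import multiprocessing

def collect(values):
    with multiprocessing.Pool(processes=2) as pool:
        async_result = pool.map_async(abs, values)  # returns immediately
        # The parent process is free to do other work here.
        async_result.wait()  # block until every item has been processed
        # get() returns the complete list in input order; the timeout only
        # guards against hanging forever.
        return async_result.get(timeout=10)

if __name__ == "__main__":
    print(collect([-3, 2, -1]))  # [3, 2, 1]
```

Calling get() directly also blocks until everything is done, so the explicit wait() is optional; it is shown here only to make the all-or-nothing behavior visible.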

imap and imap_unordered both return iterables right away. With imap, the results will be yielded from the iterable as soon as they're ready, while still preserving the ordering of the input iterable. With imap_unordered, results will be yielded as soon as they're ready, regardless of the order of the input iterable. So, say you have this:

import multiprocessing
import time

def func(x):
    time.sleep(x)
    return x + 2

if __name__ == "__main__":
    p = multiprocessing.Pool()
    start = time.time()
    for x in p.imap(func, [1,5,3]):
        print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))

This will output:

3 (Time elapsed: 1s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)

If you use p.imap_unordered instead of p.imap, you'll see:

3 (Time elapsed: 1s)
5 (Time elapsed: 3s)
7 (Time elapsed: 5s)

If you use p.map or p.map_async().get(), you'll see:

3 (Time elapsed: 5s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)

So, the primary reasons to use imap/imap_unordered over map_async are:

  1. Your iterable is large enough that converting it to a list would cause you to run out of/use too much memory.
  2. You want to be able to start processing the results before all of them are completed.
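
As one possible illustration of the second reason, the sketch below folds each result into a running total as soon as it arrives, which suits imap_unordered because the aggregation doesn't depend on order. The function name `total_length` and the built-in `len` as the worker are assumptions chosen for the example.

```python
import multiprocessing

def total_length(words):
    total = 0
    with multiprocessing.Pool(processes=2) as pool:
        # Results arrive in completion order, not input order.
        for length in pool.imap_unordered(len, words):
            total += length  # handled as soon as a worker finishes it
    return total

if __name__ == "__main__":
    print(total_length(["a", "bb", "ccc"]))  # 6
```

If the order of results matters, imap gives the same incremental behavior while preserving input order.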
