How to do parallel programming in Python

Date: 2021-12-03 02:57:45

For C++, we can use OpenMP to do parallel programming; however, OpenMP will not work for Python. What should I do if I want to parallel some parts of my python program?


The structure of the code may be considered as:


 solve1(A)
 solve2(B)

Where solve1 and solve2 are two independent functions. How can I run this kind of code in parallel instead of in sequence, in order to reduce the running time? Hope someone can help me. Thanks very much in advance. The code is:


def solve(Q, G, n):
    i = 0
    tol = 10 ** -4

    while i < 1000:
        inneropt, partition, x = setinner(Q, G, n)
        outeropt = setouter(Q, G, n)

        if (outeropt - inneropt) / (1 + abs(outeropt) + abs(inneropt)) < tol:
            break

        node1 = partition[0]
        node2 = partition[1]

        G = updateGraph(G, node1, node2)

        if i == 999:
            print("Maximum iteration reached")
        i += 1
    print(inneropt)

Where setinner and setouter are two independent functions. That's where I want to parallelize...

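Since the two calls inside the loop body are independent, they can be submitted to a process pool together and collected afterwards. A minimal sketch of that idea follows; the real setinner/setouter are not shown in the question, so these stubs just return fixed values (the `fork` start method keeps the sketch runnable top-to-bottom without an `if __name__ == "__main__"` guard; it is POSIX-only, so on Windows/macOS wrap the pool code in that guard instead).

```python
import multiprocessing as mp

def setinner(Q, G, n):
    # Hypothetical stand-in for the real setinner.
    return 1.0, (0, 1), None   # inneropt, partition, x

def setouter(Q, G, n):
    # Hypothetical stand-in for the real setouter.
    return 2.0                 # outeropt

ctx = mp.get_context("fork")
with ctx.Pool(processes=2) as pool:
    # Submit both calls at once; each runs in its own worker process.
    inner_res = pool.apply_async(setinner, (None, None, 4))
    outer_res = pool.apply_async(setouter, (None, None, 4))
    # .get() blocks until the corresponding result is ready.
    inneropt, partition, x = inner_res.get()
    outeropt = outer_res.get()
```

Note that this only pays off if each call does enough work to outweigh the cost of shipping its arguments and results between processes.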

4 Answers

#1


102  

You can use the multiprocessing module. For this case I might use a processing pool:


from multiprocessing import Pool
pool = Pool()
result1 = pool.apply_async(solve1, [A])    # evaluate "solve1(A)" asynchronously
result2 = pool.apply_async(solve2, [B])    # evaluate "solve2(B)" asynchronously
answer1 = result1.get(timeout=10)
answer2 = result2.get(timeout=10)

This will spawn worker processes that can do generic work for you. Since we did not pass a `processes` argument, Pool() spawns one process per CPU core on your machine, and each core can run one process at a time.

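To make the default sizing concrete, here is a small sketch that sizes the pool explicitly instead of taking the os.cpu_count() default (again using the POSIX-only `fork` start method so it runs without a `__main__` guard; the `square` helper is just an illustration):

```python
import os
import multiprocessing as mp

def square(x):
    # Trivial stand-in for real per-item work.
    return x * x

default_size = os.cpu_count()  # what Pool() with no argument would use

# Pass `processes` to control the pool size yourself, e.g. to leave
# cores free for other work.
ctx = mp.get_context("fork")
with ctx.Pool(processes=2) as pool:
    squares = pool.map(square, range(5))
```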

If you want to map a list to a single function you would do this:


args = [A, B]
results = pool.map(solve1, args)

Don't use threads, because the GIL serializes operations on Python objects.

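As a hedged aside, the stdlib `concurrent.futures` module offers the same process-based parallelism behind a slightly higher-level API: ProcessPoolExecutor sidesteps the GIL by using separate processes, while ThreadPoolExecutor shares one interpreter (and its GIL), so it does not speed up CPU-bound Python code. A minimal sketch, with stand-ins for solve1/solve2:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def solve1(a):
    return a + 1   # hypothetical stand-in for the real solve1

def solve2(b):
    return b * 2   # hypothetical stand-in for the real solve2

# "fork" keeps this runnable without a __main__ guard (POSIX only).
ctx = mp.get_context("fork")
with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as ex:
    f1 = ex.submit(solve1, 10)   # runs in parallel with f2
    f2 = ex.submit(solve2, 10)
    answer1 = f1.result()        # blocks until solve1(10) is done
    answer2 = f2.result()        # blocks until solve2(10) is done
```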

#2


5  

If you mean concurrent programming, you can use multiprocessing in Python.


If you want to do some hardcore large-scale parallel data analytics, then try Anaconda, which is now free.


#3


4  

This can be done very elegantly with Ray.


To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote.


import ray

ray.init()

# Define the functions.

@ray.remote
def solve1(a):
    return 1

@ray.remote
def solve2(b):
    return 2

# Start two tasks in the background.
x_id = solve1.remote(0)
y_id = solve2.remote(1)

# Block until the tasks are done and get the results.
x, y = ray.get([x_id, y_id])

There are a number of advantages of this over the multiprocessing module.


  1. The same code will run on a multicore machine as well as a cluster of machines.
  2. Processes share data efficiently through shared memory and zero-copy serialization.
  3. Error messages are propagated nicely.
  4. These function calls can be composed together, e.g.,

    @ray.remote
    def f(x):
        return x + 1
    
    x_id = f.remote(1)
    y_id = f.remote(x_id)
    z_id = f.remote(y_id)
    ray.get(z_id)  # returns 4
    
  5. In addition to invoking functions remotely, classes can be instantiated remotely as actors.

Note that Ray is a framework I've been helping develop.


#4


3  

CPython uses a Global Interpreter Lock, which makes parallel programming a bit more interesting than in C++.


This topic has several useful examples and descriptions of the challenge:


Python Global Interpreter Lock (GIL) workaround on multi-core systems using taskset on Linux?

