如何在python子进程之间传递大型numpy数组而不保存到磁盘?

时间:2022-10-23 21:43:14

Is there a good way to pass a large chunk of data between two python subprocesses without using the disk? Here's a cartoon example of what I'm hoping to accomplish:

有没有一种很好的方法可以在不使用磁盘的情况下在两个python子进程之间传递大量数据?这是我希望完成的动画示例:

import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, get data from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        data.dump('data.pkl')
        sys.stdout.write('data.pkl' + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen( #python vs. pythonw on Windows?
    [sys.executable, '-c %s'%cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

for i in range(3):
    proc.stdin.write('data\n')
    print proc.stdout.readline().rstrip()
    a = numpy.load('data.pkl')
    print a.shape

proc.stdin.write('done\n')

This creates a subprocess which generates a numpy array and saves the array to disk. The parent process then loads the array from disk. It works!

这将创建一个子进程,该子进程生成numpy数组并将数组保存到磁盘。然后父进程从磁盘加载数组。有用!

The problem is, our hardware can generate data 10x faster than the disk can read/write. Is there a way to transfer data from one python process to another purely in-memory, maybe even without making a copy of the data? Can I do something like passing-by-reference?

问题是,我们的硬件可以生成比磁盘可读/写快10倍的数据。有没有办法将数据从一个python进程传输到另一个纯内存中,甚至可能没有复制数据?我可以做一些像传递参考的东西吗?

My first attempt at transferring data purely in-memory is pretty lousy:

我第一次尝试纯粹在内存中传输数据非常糟糕:

import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, get data from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        ##Note that this is NFG if there's a '10' in the array:
        sys.stdout.write(data.tostring() + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen( #python vs. pythonw on Windows?
    [sys.executable, '-c %s'%cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

for i in range(3):
    proc.stdin.write('data\n')
    a = numpy.fromstring(proc.stdout.readline().rstrip(), dtype=numpy.uint8)
    print a.shape

proc.stdin.write('done\n')

This is extremely slow (much slower than saving to disk) and very, very fragile. There's got to be a better way!

这非常慢(比保存到磁盘慢得多)而且非常非常脆弱。必须有更好的方法!

I'm not married to the 'subprocess' module, as long as the data-taking process doesn't block the parent application. I briefly tried 'multiprocessing', but without success so far.

我没有与'subprocess'模块结合,只要数据获取过程不会阻止父应用程序。我简单地尝试了“多处理”,但到目前为止还没有成功。

Background: We have a piece of hardware that generates up to ~2 GB/s of data in a series of ctypes buffers. The python code to handle these buffers has its hands full just dealing with the flood of information. I want to coordinate this flow of information with several other pieces of hardware running simultaneously in a 'master' program, without the subprocesses blocking each other. My current approach is to boil the data down a little bit in the subprocess before saving to disk, but it'd be nice to pass the full monty to the 'master' process.

背景:我们有一块硬件可以在一系列ctypes缓冲区中生成高达~2 GB / s的数据。处理这些缓冲区的python代码完全处理大量信息。我希望将这些信息流与在“主”程序中同时运行的其他几个硬件进行协调,而不会使子进程相互阻塞。我目前的做法是在保存到磁盘之前将数据在子进程中稍微降低一点,但将完整的monty传递给'master'进程会很好。

6 个解决方案

#1


26  

While googling around for more information about the code Joe Kington posted, I found the numpy-sharedmem package. Judging from this numpy/multiprocessing tutorial it seems to share the same intellectual heritage (maybe largely the same authors? -- I'm not sure).

在谷歌搜索有关Joe Kington发布的代码的更多信息时,我找到了numpy-sharedmem包。从这个numpy /多处理教程来看,它似乎拥有相同的知识遗产(可能主要是同一作者? - 我不确定)。

Using the sharedmem module, you can create a shared-memory numpy array (awesome!), and use it with multiprocessing like this:

使用sharedmem模块,您可以创建一个共享内存numpy数组(太棒了!),并将它与多处理一起使用,如下所示:

import sharedmem as shm
import numpy as np
import multiprocessing as mp

def worker(q,arr):
    done = False
    while not done:
        cmd = q.get()
        if cmd == 'done':
            done = True
        elif cmd == 'data':
            ##Fake data. In real life, get data from hardware.
            rnd=np.random.randint(100)
            print('rnd={0}'.format(rnd))
            arr[:]=rnd
        q.task_done()

if __name__=='__main__':
    N=10
    arr=shm.zeros(N,dtype=np.uint8)
    q=mp.JoinableQueue()    
    proc = mp.Process(target=worker, args=[q,arr])
    proc.daemon=True
    proc.start()

    for i in range(3):
        q.put('data')
        # Wait for the computation to finish
        q.join()   
        print arr.shape
        print(arr)
    q.put('done')
    proc.join()

Running yields

rnd=53
(10,)
[53 53 53 53 53 53 53 53 53 53]
rnd=15
(10,)
[15 15 15 15 15 15 15 15 15 15]
rnd=87
(10,)
[87 87 87 87 87 87 87 87 87 87]

#2


9  

Basically, you just want to share a block of memory between processes and view it as a numpy array, right?

基本上,你只想在进程之间共享一块内存并将其视为一个numpy数组,对吧?

In that case, have a look at this (Posted to numpy-discussion by Nadav Horesh awhile back, not my work). There are a couple of similar implementations (some more flexible), but they all essentially use this principle.

在这种情况下,看看这个(由Nadav Horesh发布到Numpy讨论了一段时间,而不是我的工作)。有几个类似的实现(一些更灵活),但它们都基本上使用这个原则。

#    "Using Python, multiprocessing and NumPy/SciPy for parallel numerical computing"
# Modified and corrected by Nadav Horesh, Mar 2010
# No rights reserved


import numpy as N
import ctypes
import multiprocessing as MP

_ctypes_to_numpy = {
    ctypes.c_char   : N.dtype(N.uint8),
    ctypes.c_wchar  : N.dtype(N.int16),
    ctypes.c_byte   : N.dtype(N.int8),
    ctypes.c_ubyte  : N.dtype(N.uint8),
    ctypes.c_short  : N.dtype(N.int16),
    ctypes.c_ushort : N.dtype(N.uint16),
    ctypes.c_int    : N.dtype(N.int32),
    ctypes.c_uint   : N.dtype(N.uint32),
    ctypes.c_long   : N.dtype(N.int64),
    ctypes.c_ulong  : N.dtype(N.uint64),
    ctypes.c_float  : N.dtype(N.float32),
    ctypes.c_double : N.dtype(N.float64)}

_numpy_to_ctypes = dict(zip(_ctypes_to_numpy.values(), _ctypes_to_numpy.keys()))


def shmem_as_ndarray(raw_array, shape=None ):

    address = raw_array._obj._wrapper.get_address()
    size = len(raw_array)
    if (shape is None) or (N.asarray(shape).prod() != size):
        shape = (size,)
    elif type(shape) is int:
        shape = (shape,)
    else:
        shape = tuple(shape)

    dtype = _ctypes_to_numpy[raw_array._obj._type_]
    class Dummy(object): pass
    d = Dummy()
    d.__array_interface__ = {
        'data' : (address, False),
        'typestr' : dtype.str,
        'descr' :   dtype.descr,
        'shape' : shape,
        'strides' : None,
        'version' : 3}
    return N.asarray(d)

def empty_shared_array(shape, dtype, lock=True):
    '''
    Generate an empty MP shared array given ndarray parameters
    '''

    if type(shape) is not int:
        shape = N.asarray(shape).prod()
    try:
        c_type = _numpy_to_ctypes[dtype]
    except KeyError:
        c_type = _numpy_to_ctypes[N.dtype(dtype)]
    return MP.Array(c_type, shape, lock=lock)

def emptylike_shared_array(ndarray, lock=True):
    'Generate a empty shared array with size and dtype of a  given array'
    return empty_shared_array(ndarray.size, ndarray.dtype, lock)

#3


5  

From the other answers, it seems that numpy-sharedmem is the way to go.

从其他答案来看,似乎numpy-sharedmem是要走的路。

However, if you need a pure python solution, or installing extensions, cython or the like is a (big) hassle, you might want to use the following code which is a simplified version of Nadav's code:

但是,如果你需要一个纯python解决方案,或安装扩展,cython等是一个(大)麻烦,你可能想使用下面的代码,这是Nadav的代码的简化版本:

import numpy, ctypes, multiprocessing

_ctypes_to_numpy = {
    ctypes.c_char   : numpy.dtype(numpy.uint8),
    ctypes.c_wchar  : numpy.dtype(numpy.int16),
    ctypes.c_byte   : numpy.dtype(numpy.int8),
    ctypes.c_ubyte  : numpy.dtype(numpy.uint8),
    ctypes.c_short  : numpy.dtype(numpy.int16),
    ctypes.c_ushort : numpy.dtype(numpy.uint16),
    ctypes.c_int    : numpy.dtype(numpy.int32),
    ctypes.c_uint   : numpy.dtype(numpy.uint32),
    ctypes.c_long   : numpy.dtype(numpy.int64),
    ctypes.c_ulong  : numpy.dtype(numpy.uint64),
    ctypes.c_float  : numpy.dtype(numpy.float32),
    ctypes.c_double : numpy.dtype(numpy.float64)}

_numpy_to_ctypes = dict(zip(_ctypes_to_numpy.values(),
                            _ctypes_to_numpy.keys()))


def shm_as_ndarray(mp_array, shape = None):
    '''Given a multiprocessing.Array, returns an ndarray pointing to
    the same data.'''

    # support SynchronizedArray:
    if not hasattr(mp_array, '_type_'):
        mp_array = mp_array.get_obj()

    dtype = _ctypes_to_numpy[mp_array._type_]
    result = numpy.frombuffer(mp_array, dtype)

    if shape is not None:
        result = result.reshape(shape)

    return numpy.asarray(result)


def ndarray_to_shm(array, lock = False):
    '''Generate an 1D multiprocessing.Array containing the data from
    the passed ndarray.  The data will be *copied* into shared
    memory.'''

    array1d = array.ravel(order = 'A')

    try:
        c_type = _numpy_to_ctypes[array1d.dtype]
    except KeyError:
        c_type = _numpy_to_ctypes[numpy.dtype(array1d.dtype)]

    result = multiprocessing.Array(c_type, array1d.size, lock = lock)
    shm_as_ndarray(result)[:] = array1d
    return result

You would use it like this:

你会像这样使用它:

  1. Use sa = ndarray_to_shm(a) to convert the ndarray a into a shared multiprocessing.Array.
  2. 使用sa = ndarray_to_shm(a)将ndarray a转换为共享的multiprocessing.Array。

  3. Use multiprocessing.Process(target = somefunc, args = (sa, ) (and start, maybe join) to call somefunc in a separate process, passing the shared array.
  4. 使用multiprocessing.Process(target = somefunc,args =(sa,)(并启动,可能是join)在单独的进程中调用somefunc,传递共享数组。

  5. In somefunc, use a = shm_as_ndarray(sa) to get an ndarray pointing to the shared data. (Actually, you may want to do the same in the original process, immediately after creating sa, in order to have two ndarrays referencing the same data.)
  6. 在somefunc中,使用= shm_as_ndarray(sa)来获取指向共享数据的ndarray。 (实际上,您可能希望在创建sa之后立即在原始进程中执行相同操作,以便有两个ndarray引用相同的数据。)

AFAICS, you don't need to set lock to True, since shm_as_ndarray will not use the locking anyhow. If you need locking, you would set lock to True and call acquire/release on sa.

AFAICS,您不需要将lock设置为True,因为shm_as_ndarray无论如何都不会使用锁定。如果需要锁定,可以将lock设置为True并在sa上调用acquire / release。

Also, if your array is not 1-dimensional, you might want to transfer the shape along with sa (e.g. use args = (sa, a.shape)).

此外,如果您的数组不是一维的,您可能希望将形状与sa一起传输(例如,使用args =(sa,a.shape))。

This solution has the advantage that it does not need additional packages or extension modules, except multiprocessing (which is in the standard library).

该解决方案的优点是除了多处理(在标准库中)之外,它不需要额外的包或扩展模块。

#4


3  

Use threads. But I guess you are going to get problems with the GIL.

使用线程。但我想你会遇到GIL的问题。

Instead: Choose your poison.

相反:选择你的毒药。

I know from the MPI implementations I work with, that they use shared memory for on-node-communications. You will have to code your own synchronization in that case.

我从我使用的MPI实现中了解到,他们使用共享内存进行节点间通信。在这种情况下,您必须编写自己的同步代码。

2 GB/s sounds like you will get problems with most "easy" methods, depending on your real-time constraints and available main memory.

2 GB / s听起来像你会遇到大多数“简单”方法的问题,这取决于你的实时约束和可用的主存储器。

#5


1  

Use threads. You probably won't have problems with the GIL.

使用线程。您可能不会遇到GIL问题。

The GIL only affects Python code, not C/Fortran/Cython backed libraries. Most numpy operations and a good chunk of the C-backed Scientific Python stack release the GIL and can operate just fine on multiple cores. This blogpost discusses the GIL and scientific Python in more depth.

GIL只影响Python代码,而不影响C / Fortran / Cython支持的库。大多数numpy操作和C-backed Scientific Python堆栈的很大一部分都会释放GIL,并且可以在多个内核上运行良好。这篇博文更深入地讨论了GIL和科学Python。

Edit

Simple ways to use threads include the threading module and multiprocessing.pool.ThreadPool.

使用线程的简单方法包括线程模块和multiprocessing.pool.ThreadPool。

#6


1  

One possibility to consider is to use a RAM drive for the temporary storage of files to be shared between processes. A RAM drive is where a portion of RAM is treated as a logical hard drive, to which files can be written/read as you would with a regular drive, but at RAM read/write speeds.

考虑的一种可能性是使用RAM驱动器临时存储要在进程之间共享的文件。 RAM驱动器将RAM的一部分视为逻辑硬盘驱动器,可以像使用常规驱动器一样写入/读取文件,但RAM读/写速度。

This article describes using the ImDisk software (for MS Win) to create such disk and obtains file read/write speeds of 6-10 Gigabytes/second: https://www.tekrevue.com/tip/create-10-gbs-ram-disk-windows/

本文介绍如何使用ImDisk软件(用于MS Win)创建此类磁盘并获得6-10千兆字节/秒的文件读/写速度:https://www.tekrevue.com/tip/create-10-gbs-ram -disk窗口/

An example in Ubuntu: https://askubuntu.com/questions/152868/how-do-i-make-a-ram-disk#152871

Ubuntu中的一个例子:https://askubuntu.com/questions/152868/how-do-i-make-a-ram-disk#152871

Another noted benefit is that files with arbitrary formats can be passed around with such method: e.g. Picke, JSON, XML, CSV, HDF5, etc...

另一个值得注意的好处是具有任意格式的文件可以通过这种方法传递:例如, Picke,JSON,XML,CSV,HDF5等......

Keep in mind that anything stored on the RAM disk is wiped on reboot.

请记住,重新启动时会擦除存储在RAM磁盘上的所有内容。

#1


26  

While googling around for more information about the code Joe Kington posted, I found the numpy-sharedmem package. Judging from this numpy/multiprocessing tutorial it seems to share the same intellectual heritage (maybe largely the same authors? -- I'm not sure).

在谷歌搜索有关Joe Kington发布的代码的更多信息时,我找到了numpy-sharedmem包。从这个numpy /多处理教程来看,它似乎拥有相同的知识遗产(可能主要是同一作者? - 我不确定)。

Using the sharedmem module, you can create a shared-memory numpy array (awesome!), and use it with multiprocessing like this:

使用sharedmem模块,您可以创建一个共享内存numpy数组(太棒了!),并将它与多处理一起使用,如下所示:

import sharedmem as shm
import numpy as np
import multiprocessing as mp

def worker(q,arr):
    done = False
    while not done:
        cmd = q.get()
        if cmd == 'done':
            done = True
        elif cmd == 'data':
            ##Fake data. In real life, get data from hardware.
            rnd=np.random.randint(100)
            print('rnd={0}'.format(rnd))
            arr[:]=rnd
        q.task_done()

if __name__=='__main__':
    N=10
    arr=shm.zeros(N,dtype=np.uint8)
    q=mp.JoinableQueue()    
    proc = mp.Process(target=worker, args=[q,arr])
    proc.daemon=True
    proc.start()

    for i in range(3):
        q.put('data')
        # Wait for the computation to finish
        q.join()   
        print arr.shape
        print(arr)
    q.put('done')
    proc.join()

Running yields

rnd=53
(10,)
[53 53 53 53 53 53 53 53 53 53]
rnd=15
(10,)
[15 15 15 15 15 15 15 15 15 15]
rnd=87
(10,)
[87 87 87 87 87 87 87 87 87 87]

#2


9  

Basically, you just want to share a block of memory between processes and view it as a numpy array, right?

基本上,你只想在进程之间共享一块内存并将其视为一个numpy数组,对吧?

In that case, have a look at this (Posted to numpy-discussion by Nadav Horesh awhile back, not my work). There are a couple of similar implementations (some more flexible), but they all essentially use this principle.

在这种情况下,看看这个(由Nadav Horesh发布到Numpy讨论了一段时间,而不是我的工作)。有几个类似的实现(一些更灵活),但它们都基本上使用这个原则。

#    "Using Python, multiprocessing and NumPy/SciPy for parallel numerical computing"
# Modified and corrected by Nadav Horesh, Mar 2010
# No rights reserved


import numpy as N
import ctypes
import multiprocessing as MP

_ctypes_to_numpy = {
    ctypes.c_char   : N.dtype(N.uint8),
    ctypes.c_wchar  : N.dtype(N.int16),
    ctypes.c_byte   : N.dtype(N.int8),
    ctypes.c_ubyte  : N.dtype(N.uint8),
    ctypes.c_short  : N.dtype(N.int16),
    ctypes.c_ushort : N.dtype(N.uint16),
    ctypes.c_int    : N.dtype(N.int32),
    ctypes.c_uint   : N.dtype(N.uint32),
    ctypes.c_long   : N.dtype(N.int64),
    ctypes.c_ulong  : N.dtype(N.uint64),
    ctypes.c_float  : N.dtype(N.float32),
    ctypes.c_double : N.dtype(N.float64)}

_numpy_to_ctypes = dict(zip(_ctypes_to_numpy.values(), _ctypes_to_numpy.keys()))


def shmem_as_ndarray(raw_array, shape=None ):

    address = raw_array._obj._wrapper.get_address()
    size = len(raw_array)
    if (shape is None) or (N.asarray(shape).prod() != size):
        shape = (size,)
    elif type(shape) is int:
        shape = (shape,)
    else:
        shape = tuple(shape)

    dtype = _ctypes_to_numpy[raw_array._obj._type_]
    class Dummy(object): pass
    d = Dummy()
    d.__array_interface__ = {
        'data' : (address, False),
        'typestr' : dtype.str,
        'descr' :   dtype.descr,
        'shape' : shape,
        'strides' : None,
        'version' : 3}
    return N.asarray(d)

def empty_shared_array(shape, dtype, lock=True):
    '''
    Generate an empty MP shared array given ndarray parameters
    '''

    if type(shape) is not int:
        shape = N.asarray(shape).prod()
    try:
        c_type = _numpy_to_ctypes[dtype]
    except KeyError:
        c_type = _numpy_to_ctypes[N.dtype(dtype)]
    return MP.Array(c_type, shape, lock=lock)

def emptylike_shared_array(ndarray, lock=True):
    'Generate a empty shared array with size and dtype of a  given array'
    return empty_shared_array(ndarray.size, ndarray.dtype, lock)

#3


5  

From the other answers, it seems that numpy-sharedmem is the way to go.

从其他答案来看,似乎numpy-sharedmem是要走的路。

However, if you need a pure python solution, or installing extensions, cython or the like is a (big) hassle, you might want to use the following code which is a simplified version of Nadav's code:

但是,如果你需要一个纯python解决方案,或安装扩展,cython等是一个(大)麻烦,你可能想使用下面的代码,这是Nadav的代码的简化版本:

import numpy, ctypes, multiprocessing

_ctypes_to_numpy = {
    ctypes.c_char   : numpy.dtype(numpy.uint8),
    ctypes.c_wchar  : numpy.dtype(numpy.int16),
    ctypes.c_byte   : numpy.dtype(numpy.int8),
    ctypes.c_ubyte  : numpy.dtype(numpy.uint8),
    ctypes.c_short  : numpy.dtype(numpy.int16),
    ctypes.c_ushort : numpy.dtype(numpy.uint16),
    ctypes.c_int    : numpy.dtype(numpy.int32),
    ctypes.c_uint   : numpy.dtype(numpy.uint32),
    ctypes.c_long   : numpy.dtype(numpy.int64),
    ctypes.c_ulong  : numpy.dtype(numpy.uint64),
    ctypes.c_float  : numpy.dtype(numpy.float32),
    ctypes.c_double : numpy.dtype(numpy.float64)}

_numpy_to_ctypes = dict(zip(_ctypes_to_numpy.values(),
                            _ctypes_to_numpy.keys()))


def shm_as_ndarray(mp_array, shape = None):
    '''Given a multiprocessing.Array, returns an ndarray pointing to
    the same data.'''

    # support SynchronizedArray:
    if not hasattr(mp_array, '_type_'):
        mp_array = mp_array.get_obj()

    dtype = _ctypes_to_numpy[mp_array._type_]
    result = numpy.frombuffer(mp_array, dtype)

    if shape is not None:
        result = result.reshape(shape)

    return numpy.asarray(result)


def ndarray_to_shm(array, lock = False):
    '''Generate an 1D multiprocessing.Array containing the data from
    the passed ndarray.  The data will be *copied* into shared
    memory.'''

    array1d = array.ravel(order = 'A')

    try:
        c_type = _numpy_to_ctypes[array1d.dtype]
    except KeyError:
        c_type = _numpy_to_ctypes[numpy.dtype(array1d.dtype)]

    result = multiprocessing.Array(c_type, array1d.size, lock = lock)
    shm_as_ndarray(result)[:] = array1d
    return result

You would use it like this:

你会像这样使用它:

  1. Use sa = ndarray_to_shm(a) to convert the ndarray a into a shared multiprocessing.Array.
  2. 使用sa = ndarray_to_shm(a)将ndarray a转换为共享的multiprocessing.Array。

  3. Use multiprocessing.Process(target = somefunc, args = (sa, ) (and start, maybe join) to call somefunc in a separate process, passing the shared array.
  4. 使用multiprocessing.Process(target = somefunc,args =(sa,)(并启动,可能是join)在单独的进程中调用somefunc,传递共享数组。

  5. In somefunc, use a = shm_as_ndarray(sa) to get an ndarray pointing to the shared data. (Actually, you may want to do the same in the original process, immediately after creating sa, in order to have two ndarrays referencing the same data.)
  6. 在somefunc中,使用= shm_as_ndarray(sa)来获取指向共享数据的ndarray。 (实际上,您可能希望在创建sa之后立即在原始进程中执行相同操作,以便有两个ndarray引用相同的数据。)

AFAICS, you don't need to set lock to True, since shm_as_ndarray will not use the locking anyhow. If you need locking, you would set lock to True and call acquire/release on sa.

AFAICS,您不需要将lock设置为True,因为shm_as_ndarray无论如何都不会使用锁定。如果需要锁定,可以将lock设置为True并在sa上调用acquire / release。

Also, if your array is not 1-dimensional, you might want to transfer the shape along with sa (e.g. use args = (sa, a.shape)).

此外,如果您的数组不是一维的,您可能希望将形状与sa一起传输(例如,使用args =(sa,a.shape))。

This solution has the advantage that it does not need additional packages or extension modules, except multiprocessing (which is in the standard library).

该解决方案的优点是除了多处理(在标准库中)之外,它不需要额外的包或扩展模块。

#4


3  

Use threads. But I guess you are going to get problems with the GIL.

使用线程。但我想你会遇到GIL的问题。

Instead: Choose your poison.

相反:选择你的毒药。

I know from the MPI implementations I work with, that they use shared memory for on-node-communications. You will have to code your own synchronization in that case.

我从我使用的MPI实现中了解到,他们使用共享内存进行节点间通信。在这种情况下,您必须编写自己的同步代码。

2 GB/s sounds like you will get problems with most "easy" methods, depending on your real-time constraints and available main memory.

2 GB / s听起来像你会遇到大多数“简单”方法的问题,这取决于你的实时约束和可用的主存储器。

#5


1  

Use threads. You probably won't have problems with the GIL.

使用线程。您可能不会遇到GIL问题。

The GIL only affects Python code, not C/Fortran/Cython backed libraries. Most numpy operations and a good chunk of the C-backed Scientific Python stack release the GIL and can operate just fine on multiple cores. This blogpost discusses the GIL and scientific Python in more depth.

GIL只影响Python代码,而不影响C / Fortran / Cython支持的库。大多数numpy操作和C-backed Scientific Python堆栈的很大一部分都会释放GIL,并且可以在多个内核上运行良好。这篇博文更深入地讨论了GIL和科学Python。

Edit

Simple ways to use threads include the threading module and multiprocessing.pool.ThreadPool.

使用线程的简单方法包括线程模块和multiprocessing.pool.ThreadPool。

#6


1  

One possibility to consider is to use a RAM drive for the temporary storage of files to be shared between processes. A RAM drive is where a portion of RAM is treated as a logical hard drive, to which files can be written/read as you would with a regular drive, but at RAM read/write speeds.

考虑的一种可能性是使用RAM驱动器临时存储要在进程之间共享的文件。 RAM驱动器将RAM的一部分视为逻辑硬盘驱动器,可以像使用常规驱动器一样写入/读取文件,但RAM读/写速度。

This article describes using the ImDisk software (for MS Win) to create such disk and obtains file read/write speeds of 6-10 Gigabytes/second: https://www.tekrevue.com/tip/create-10-gbs-ram-disk-windows/

本文介绍如何使用ImDisk软件(用于MS Win)创建此类磁盘并获得6-10千兆字节/秒的文件读/写速度:https://www.tekrevue.com/tip/create-10-gbs-ram -disk窗口/

An example in Ubuntu: https://askubuntu.com/questions/152868/how-do-i-make-a-ram-disk#152871

Ubuntu中的一个例子:https://askubuntu.com/questions/152868/how-do-i-make-a-ram-disk#152871

Another noted benefit is that files with arbitrary formats can be passed around with such method: e.g. Picke, JSON, XML, CSV, HDF5, etc...

另一个值得注意的好处是具有任意格式的文件可以通过这种方法传递:例如, Picke,JSON,XML,CSV,HDF5等......

Keep in mind that anything stored on the RAM disk is wiped on reboot.

请记住,重新启动时会擦除存储在RAM磁盘上的所有内容。