How to efficiently work with large complex numpy arrays?

Time: 2022-07-02 01:37:42

For my research I am working with large numpy arrays consisting of complex data.

import numpy as np

arr = np.empty((15000, 25400), dtype='complex128')
np.save('array.npy', arr)

When stored, they are about 3 GB each. Loading these arrays is a time-consuming process, which made me wonder if there are ways to speed it up.

One of the things I was thinking of was splitting the array into its real and imaginary parts:

arr_real = arr.real 
arr_im = arr.imag

and saving each part separately. However, this didn't seem to improve processing speed significantly. There is some documentation about working with large arrays, but I haven't found much information on working with complex data. Are there smart(er) ways to work with large complex arrays?

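For context, here is a minimal sketch of what I mean by saving each part separately (the file names are placeholders); note that the two float64 files together hold the same number of bytes as the original complex128 file:

import numpy as np

arr = np.empty((15000, 25400), dtype='complex128')

# .real and .imag are float64 views of the complex data
arr_real = arr.real
arr_im = arr.imag

# save each part to its own file (placeholder names)
np.save('arr_real.npy', arr_real)
np.save('arr_im.npy', arr_im)

# reassemble on load
arr = np.load('arr_real.npy') + 1j * np.load('arr_im.npy')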

1 solution

#1


If you only need parts of the array in memory, you can load it using memory mapping:

arr = np.load('array.npy', mmap_mode='r')

From the docs:

A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.

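For example (the slice bounds here are arbitrary), only the rows and columns you actually access are read from disk:

import numpy as np

# opening with mmap_mode reads only the file header, not the data
arr = np.load('array.npy', mmap_mode='r')

# slicing returns another memory-mapped view; only the
# accessed block is ever read from disk
block = arr[:1000, :1000]

# force the block into an ordinary in-memory array if needed
block = np.array(block)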
