Efficient Python array of 100 million zeros?

Date: 2022-10-07 21:28:36

What is an efficient way to initialize and access elements of a large array in Python?

I want to create an array in Python with 100 million entries, unsigned 4-byte integers, initialized to zero. I want fast array access, preferably with contiguous memory.

Strangely, NumPy arrays seem to be performing very slowly. Are there alternatives I can try?

There is the array.array module, but I don't see a method to efficiently allocate a block of 100 million entries.

Responses to comments:

  • I cannot use a sparse array. It will be too slow for this algorithm because the array becomes dense very quickly.
  • I know Python is interpreted, but surely there is a way to do fast array operations?
  • I did some profiling, and I get about 160K array accesses (looking up or updating an element by index) per second with NumPy. This seems very slow.

10 solutions

#1


29  

I have done some profiling, and the results are completely counterintuitive. For simple array access operations, numpy and array.array are 10x slower than native Python lists.

Note that for array access, I am doing operations of the form:

a[i] += 1

Profiles:

  • [0] * 20000000
    • Access: 2.3M/sec
    • Initialization: 0.8s
  • numpy.zeros(shape=(20000000,), dtype=numpy.int32)
    • Access: 160K/sec
    • Initialization: 0.2s
  • array.array('L', [0] * 20000000)
    • Access: 175K/sec
    • Initialization: 2.0s
  • array.array('L', (0 for i in range(20000000)))
    • Access: 175K/sec, presumably, based upon the profile for the other array.array
    • Initialization: 6.7s
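
As a rough illustration of how such numbers can be measured, here is a minimal sketch of a per-element access benchmark. It is my own reconstruction of the methodology, not the original script; the loop bounds and timing harness are assumptions.

import array
import time

import numpy

N = 20000000          # entries, matching the profiles above
STEPS = 1000000       # number of single-element updates to time

def bench(a, label):
    # Time STEPS individual "a[i] += 1" operations and report accesses/sec.
    start = time.perf_counter()
    for i in range(STEPS):
        a[i % N] += 1
    elapsed = time.perf_counter() - start
    print(f"{label}: {STEPS / elapsed:,.0f} accesses/sec")

bench([0] * N, "list")
bench(numpy.zeros(shape=(N,), dtype=numpy.int32), "numpy")
bench(array.array('L', [0] * N), "array.array")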

#2


13  

Just a reminder of how Python's integers work: if you allocate a list by saying

a = [0] * K

you need the memory for the list (sizeof(PyListObject) + K * sizeof(PyObject*)) and the memory for the single integer object 0. As long as the numbers in the list stay below the magic number V that Python uses for caching, you are fine because those are shared, i.e. any name that points to a number n < V points to the exact same object. You can find this value by using the following snippet:

>>> i = 0
>>> j = 0
>>> while i is j:
...    i += 1
...    j += 1
>>> i # on my system!
257 

This means that as soon as the counts go above this number, the memory you need is sizeof(PyListObject) + K * sizeof(PyObject*) + d * sizeof(PyIntObject), where d < K is the number of entries whose value exceeds 256 (each of which needs its own integer object). On a 64-bit system, sizeof(PyIntObject) == 24 and sizeof(PyObject*) == 8, i.e. the worst-case memory consumption is 3,200,000,000 bytes.
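
As a quick sanity check of that worst case, for the 100-million-entry array from the question and the per-object sizes quoted above (an illustrative calculation, not a measurement):

K = 100000000        # list entries
ptr = 8              # sizeof(PyObject*) on a 64-bit system
int_obj = 24         # sizeof(PyIntObject), as quoted above

# One pointer slot per entry plus one distinct integer object per entry,
# ignoring the fixed sizeof(PyListObject) header, which is negligible here.
print(K * ptr + K * int_obj)   # 3200000000 bytes, i.e. about 3.2 GB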

With numpy.ndarray or array.array, memory consumption is constant after initialization, but you pay for the wrapper objects that are created transparently, as Thomas Wouters said. Probably, you should think about converting the update code (which accesses and increases the positions in the array) to C code, either by using Cython or scipy.weave.

#3


7  

Try this:

x = [0] * 100000000

It takes just a few seconds to execute on my machine, and access is close to instant.

#4


5  

If you are not able to vectorize your calculations, Python/NumPy will be slow. NumPy is fast because vectorized calculations occur at a lower level than Python. The core numpy functions are all written in C or Fortran. Hence sum(a) is not a Python loop with many accesses; it's a single low-level C call.

Numpy's Performance Python demo page has a good example with different options. You can easily get a 100x speedup by using a lower-level compiled language, Cython, or vectorized functions where feasible. This blog post shows a 43-fold increase using Cython for a NumPy use case.
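
To make the contrast concrete, here is a small sketch (my own illustration, not from the linked pages) of replacing per-element a[i] += 1 updates with a single vectorized call:

import numpy as np

a = np.zeros(20000000, dtype=np.int32)
indices = np.random.randint(0, a.size, size=1000000)

# Slow: one interpreted Python operation per element.
# for i in indices:
#     a[i] += 1

# Fast: a single call that performs all the updates in C.
# np.add.at handles repeated indices correctly (unlike a[indices] += 1).
np.add.at(a, indices, 1)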

#5


3  

It's unlikely you'll find anything faster than numpy's array. The implementation of the array itself is as efficient as it would be in, say, C (and basically the same as array.array, just with more usefulness.)

If you want to speed up your code, you'll have to do it by doing just that. Even though the array is implemented efficiently, accessing it from Python code has certain overhead; for example, indexing the array produces integer objects, which have to be created on the fly. numpy offers a number of operations implemented efficiently in C, but without seeing the actual code that isn't performing as well as you want, it's hard to make any specific suggestions.
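
A quick way to see that overhead (a small illustration of the point, not part of the original answer): every index into a NumPy array builds a fresh scalar wrapper object.

import numpy as np

a = np.zeros(10, dtype=np.int32)

print(type(a[3]))     # <class 'numpy.int32'> - a wrapper object, not a raw C int
print(a[3] is a[3])   # False: each indexing operation creates a new object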

#6


3  

For fast creation, use the array module.

Using the array module is ~5 times faster for creation, but about twice as slow for accessing elements compared to a normal list:

# Create array
python -m timeit -s "from array import array" "a = array('I', '\x00' * 100000000)"
10 loops, best of 3: 204 msec per loop

# Access array
python -m timeit -s "from array import array; a = array('I', '\x00' * 100000000)" "a[4975563]"
10000000 loops, best of 3: 0.0902 usec per loop

# Create list
python -m timeit "a = [0] * 100000000"
10 loops, best of 3: 949 msec per loop

# Access list
python -m timeit -s "a = [0] * 100000000" "a[4975563]"
10000000 loops, best of 3: 0.0417 usec per loop
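
Note that these timings are from Python 2, where array('I', some_string) interprets the string as raw machine bytes, so '\x00' * 100000000 actually yields 25 million 4-byte entries rather than 100 million (on builds where 'I' is 4 bytes). On Python 3, a roughly equivalent construction (sketched here as an assumption, not a re-run benchmark) passes a zero-filled bytes object instead:

from array import array

ITEMSIZE = array('I').itemsize               # typically 4 bytes for 'I'
a = array('I', bytes(ITEMSIZE * 100000000))  # 100 million zeroed entries
print(len(a))                                # 100000000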

#7


1  

In addition to the other excellent solutions, another way is to use a dict instead of an array (elements which exist are non-zero, otherwise they're zero). Lookup time is O(1).
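
A minimal sketch of that idea (my illustration; the answer does not prescribe a particular structure), using a plain dict where unset indices read back as zero:

counts = {}                       # only indices that were written take memory

def get(i):
    return counts.get(i, 0)       # unset indices read as zero

def inc(i, delta=1):
    counts[i] = counts.get(i, 0) + delta

inc(4975563)
print(get(4975563))   # 1
print(get(12345))     # 0 (never written)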

You might also check whether your application is resident in RAM, rather than swapping out. It's only 381 MB, but the system may not be giving you all of it for whatever reason.

However there are also some really fast sparse matrices (SciPy and ndsparse). They are done in low-level C, and might also be good.

#8


0  

I would simply create your own data type that doesn't initialize ANY values.

If you want to read an index position that has NOT been initialized, you return zeroes. Still, do not initialize any storage.

If you want to read an index position that HAS been initialized, simply return the value.

If you want to write to an index position that has NOT been initialized, initialize it, and store the input.
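
A minimal sketch of such a type, assuming a dict as the backing store (the answer doesn't specify one):

class LazyZeroArray:
    """Array-like object that reports 0 for positions never written to."""

    def __init__(self, size):
        self.size = size
        self._data = {}              # only written positions take memory

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self._data.get(i, 0)  # uninitialized reads return zero

    def __setitem__(self, i, value):
        if not 0 <= i < self.size:
            raise IndexError(i)
        self._data[i] = value        # storage is initialized only on write

a = LazyZeroArray(100000000)
a[4975563] += 1                      # reads 0, adds 1, writes it back
print(a[4975563], a[12345])          # 1 0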

#9


0  

NumPy is the appropriate tool for a large, fixed-size, homogeneous array. Accessing individual elements of anything in Python isn't going to be all that fast, though whole-array operations can often be conducted at speeds similar to C or Fortran. If you need to do operations on millions and millions of elements individually quickly, there is only so much you can get out of Python.

What sort of algorithm are you implementing? How do you know that using sparse arrays is too slow if you haven't tried it? What do you mean by "efficient"? Do you want quick initialization? Is that the bottleneck of your code?

#10


0  

If

  • access speed of array.array is acceptable for your application
  • compact storage is most important
  • you want to use standard modules (no NumPy dependency)
  • you are on platforms that have /dev/zero

the following may be of interest to you. It initialises array.array about 27 times faster than array.array('L', [0]*size):

import array

size = 100000000              # number of zero entries to allocate
myarray = array.array('L')
f = open('/dev/zero', 'rb')
myarray.fromfile(f, size)
f.close()

On How to initialise an integer array.array object with zeros in Python I'm looking for an even better way.
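
If /dev/zero isn't available, one alternative worth considering (my suggestion, not part of the original answer) is to build the array by repeating a one-element array, which also avoids constructing a 100-million-element Python list first:

import array

size = 100000000
myarray = array.array('L', [0]) * size   # sequence repetition happens in C
print(len(myarray))                      # 100000000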
