How to extend an array in place in NumPy?

Time: 2021-09-30 15:59:57

Currently, I have some code like this:

import numpy as np
ret = np.array([])
for i in range(100000):
  tmp = get_input(i)
  ret = np.append(ret, np.zeros(len(tmp)))
  ret = np.append(ret, np.ones(fixed_length))

I think this code is inefficient, because np.append returns a copy of the array instead of modifying ret in place.

I was wondering whether I could use something like extend for a NumPy array, like this:

import numpy as np
from somewhere import np_extend
ret = np.array([])
for i in range(100000):
  tmp = get_input(i)
  np_extend(ret, np.zeros(len(tmp)))
  np_extend(ret, np.ones(fixed_length))

That way, extending the array would be much more efficient. Does anyone have any ideas about this? Thanks!

3 solutions

#1


24  

Imagine a numpy array as occupying one contiguous block of memory. Now imagine other objects, say other numpy arrays, which are occupying the memory just to the left and right of our numpy array. There would be no room to append to or extend our numpy array. The underlying data in a numpy array always occupies a contiguous block of memory.

So any request to append to or extend our numpy array can only be satisfied by allocating a whole new larger block of memory, copying the old data into the new block and then appending or extending.

So:

  1. It will not occur in-place.
  2. It will not be efficient.
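
As a small illustration of the points above, the following sketch checks that np.append leaves the original array untouched and returns a new array with its own memory. The variable names are only for illustration.

```python
import numpy as np

a = np.arange(3)
b = np.append(a, [3, 4])   # allocates a new, larger array and copies a into it

# a keeps its original size; b is a separate array with its own buffer
print(a.size, b.size)
print(np.shares_memory(a, b))
```

Because every call copies all existing data, calling np.append in a loop does work proportional to the current array length on each iteration.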

#2


10  

You can use the .resize() method of ndarrays. It requires that the memory is not referred to by other arrays/variables.

import numpy as np
ret = np.array([])
for i in range(100):
    tmp = np.random.rand(np.random.randint(1, 100))
    ret.resize(len(ret) + len(tmp)) # <- ret is not referred to by anything else,
                                    #    so this works
    ret[-len(tmp):] = tmp
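
The requirement about references can be seen in a minimal sketch like the one below, which assumes the default reference check of .resize() is left on. Here a second array (a view) shares the memory, so the in-place resize is refused with a ValueError. The names a, view and resized are only for illustration.

```python
import numpy as np

a = np.zeros(3)
view = a[:2]          # a second array that shares a's memory

try:
    a.resize(5)       # refused: the buffer is also referenced by `view`
    resized = True
except ValueError:
    resized = False

print(resized, a.shape)
```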

The efficiency can be improved by using the usual array memory overallocation schemes.
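
One possible overallocation scheme, as a rough sketch rather than a definitive implementation: keep a buffer that is larger than the data and double its capacity whenever it fills up, so that copies happen only occasionally. The class GrowableArray and its methods are hypothetical names, not part of NumPy, and the sketch assumes 1-D float data.

```python
import numpy as np

class GrowableArray:
    """Hypothetical helper: a 1-D float buffer that doubles its capacity when full."""

    def __init__(self, capacity=16):
        self._buf = np.empty(capacity)   # allocated storage, partly unused
        self._size = 0                   # number of valid elements

    def extend(self, values):
        values = np.asarray(values, dtype=self._buf.dtype).ravel()
        needed = self._size + values.size
        if needed > self._buf.size:
            # Grow geometrically so that copies become rare
            new_buf = np.empty(max(needed, 2 * self._buf.size), dtype=self._buf.dtype)
            new_buf[:self._size] = self._buf[:self._size]
            self._buf = new_buf
        self._buf[self._size:needed] = values
        self._size = needed

    def view(self):
        # A view of the valid part of the buffer, no copy
        return self._buf[:self._size]

buf = GrowableArray()
buf.extend(np.zeros(3))
buf.extend(np.ones(20))    # exceeds the initial capacity, triggers one reallocation
result = buf.view()
```

A copy still happens each time the buffer grows, but with doubling the total amount copied stays proportional to the final size instead of growing quadratically.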

#3


7  

The usual way to handle this is something like this:

import numpy as np
ret = []
for i in range(100000):
  tmp = get_input(i)
  ret.append(np.zeros(len(tmp)))
  ret.append(np.ones(fixed_length))  # ones, matching the question's code
ret = np.concatenate(ret)

For the reasons given in the other answers, it is generally impossible to extend an array without copying the data.
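
A self-contained variant of the list-then-concatenate pattern is sketched below. The original post does not define get_input or fixed_length, so the versions here are hypothetical stand-ins chosen only to make the example runnable. The sketch also rebuilds the same array with repeated np.append calls to check that both approaches give the same result.

```python
import numpy as np

fixed_length = 2  # assumed value, not given in the original post

def get_input(i):
    # Hypothetical stand-in: returns a sequence whose length varies with i
    return list(range(i % 3 + 1))

# Collect the pieces in a Python list, then build the array once
chunks = []
for i in range(5):
    tmp = get_input(i)
    chunks.append(np.zeros(len(tmp)))
    chunks.append(np.ones(fixed_length))
ret = np.concatenate(chunks)   # the final array is allocated and filled once

# The repeated np.append loop from the question, for comparison
expected = np.array([])
for i in range(5):
    expected = np.append(expected, np.zeros(len(get_input(i))))
    expected = np.append(expected, np.ones(fixed_length))

print(np.array_equal(ret, expected))
```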
