Size of numpy strided array/broadcast array in memory?

I'm trying to create efficient broadcast arrays in numpy, e.g. a set of shape=[1000,1000,1000] arrays that have only 1000 elements, but repeated 1e6 times. This can be achieved both through np.lib.stride_tricks.as_strided and np.broadcast_arrays.

However, I am having trouble verifying that there is no duplication in memory, and this is critical since tests that actually duplicate the arrays in memory tend to crash my machine leaving no traceback.

I've tried examining the size of the arrays using .nbytes, but that doesn't seem to correspond to the actual memory usage:

>>> import numpy as np
>>> import resource
>>> initial_memuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> pagesize = resource.getpagesize()
>>>
>>> x = np.arange(1000)
>>> memuse_x = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of x = {0} MB".format(x.nbytes/1e6))
Size of x = 0.008 MB
>>> print("Memory used = {0} MB".format((memuse_x-initial_memuse)*resource.getpagesize()/1e6))
Memory used = 150.994944 MB
>>>
>>> y = np.lib.stride_tricks.as_strided(x, [1000,10,10], strides=x.strides + (0, 0))
>>> memuse_y = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of y = {0} MB".format(y.nbytes/1e6))
Size of y = 0.8 MB
>>> print("Memory used = {0} MB".format((memuse_y-memuse_x)*resource.getpagesize()/1e6))
Memory used = 201.326592 MB
>>>
>>> z = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
>>> memuse_z = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of z = {0} MB".format(z.nbytes/1e6))
Size of z = 80.0 MB
>>> print("Memory used = {0} MB".format((memuse_z-memuse_y)*resource.getpagesize()/1e6))
Memory used = 0.0 MB

So .nbytes reports the "theoretical" size of the array, but apparently not the actual size. The resource checking is a little awkward, as it looks like there are some things being loaded & cached (perhaps?) that result in the first striding taking up some amount of memory, but future strides take none.
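The gap between .nbytes and real memory use follows from how .nbytes is defined: it is the number of elements times the item size, computed from the shape alone, so zero strides make no difference to it. A minimal sketch, assuming an explicit int64 dtype (chosen here only so the byte counts are the same on every platform):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(1000, dtype=np.int64)          # 1000 * 8 bytes = 8000 bytes of real data
z = as_strided(x, (1000, 100, 100), strides=x.strides + (0, 0))

# .nbytes is derived from the shape, not from the memory actually allocated
print(z.nbytes == z.size * z.itemsize)       # True
print(z.nbytes)                              # 80000000

# The zero strides show that the last two axes revisit the same bytes
print(z.strides)                             # (8, 0, 0)
```
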

tl;dr: How do you determine the actual size of a numpy array or array view in memory?

1 Answer

#1

One way would be to examine the .base attribute of the array, which references the object from which an array "borrows" its memory. For example:

x = np.arange(1000)
print(x.flags.owndata)      # x "owns" its data
# True
print(x.base is None)       # its base is therefore 'None'
# True

a = x.reshape(100, 10)      # a is a reshaped view onto x
print(a.flags.owndata)      # it therefore "borrows" its data
# False
print(a.base is x)          # its .base is x
# True

Things are slightly more complicated with np.lib.stride_tricks:

b = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))

print(b.flags.owndata)
# False
print(b.base)   
# <numpy.lib.stride_tricks.DummyArray object at 0x7fb40c02b0f0>

Here, b.base is a numpy.lib.stride_tricks.DummyArray instance, which looks like this:

class DummyArray(object):
    """Dummy object that just exists to hang __array_interface__ dictionaries
    and possibly keep alive a reference to a base array.
    """

    def __init__(self, interface, base=None):
        self.__array_interface__ = interface
        self.base = base

We can therefore examine b.base.base:

print(b.base.base is x)
# True

Once you have the base array, its .nbytes attribute should accurately reflect the amount of memory it occupies. This is the size of the whole underlying buffer, even if your view only touches part of it.
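A complementary check, not part of the approach above, is np.shares_memory, which reports whether two arrays have any bytes in common. The sketch below assumes a small example shape to keep the check cheap, and the variable names are only illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(1000, dtype=np.int64)
b = as_strided(x, (1000, 10, 10), strides=x.strides + (0, 0))

# b is a view onto the buffer owned by x, so the two overlap
print(np.shares_memory(x, b))

# An explicit copy owns a fresh buffer, so it shares nothing with x
c = b.copy()
print(np.shares_memory(x, c))
print(c.flags.owndata, c.nbytes)
```

If the first check is True and the array does not own its data, no duplication has taken place; the copy, by contrast, really does allocate the full size reported by .nbytes.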

In principle it's possible to have a view of a view of an array, or to create a strided array from another strided array. Assuming that your view or strided array is ultimately backed by another numpy array, you could recursively reference its .base attribute. Once you find an object whose .base is None, you have found the underlying object from which your array is borrowing its memory:

def find_base_nbytes(obj):
    if obj.base is not None:
        return find_base_nbytes(obj.base)
    return obj.nbytes

As expected,

print(find_base_nbytes(x))
# 8000

print(find_base_nbytes(y))
# 8000

print(find_base_nbytes(z))
# 8000
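find_base_nbytes assumes that every object in the chain has a .base attribute and that the final owner is a numpy array. A slightly more defensive sketch follows; the name backing_nbytes and the None fallback are assumptions made for illustration and are not part of numpy:

```python
import numpy as np

def backing_nbytes(arr):
    """Follow .base links to the object that owns the memory; return its size in bytes.

    Returns None when the owner does not expose a size (hypothetical fallback).
    """
    owner = arr
    # Walk the chain iteratively; stop at the first object without a usable .base
    while getattr(owner, "base", None) is not None:
        owner = owner.base
    if hasattr(owner, "nbytes"):
        return owner.nbytes
    try:
        # Objects such as bytes or bytearray expose their size via the buffer protocol
        return memoryview(owner).nbytes
    except TypeError:
        return None

x = np.arange(1000, dtype=np.int64)
view_of_view = x[10:110].reshape(10, 10).T   # a chain of views, all backed by x
print(backing_nbytes(view_of_view))          # 8000

buf = bytearray(64)
from_buffer = np.frombuffer(buf, dtype=np.uint8)  # memory owned by a non-numpy object
print(backing_nbytes(from_buffer))           # 64
```

The loop avoids deep recursion on long chains, and the buffer-protocol fallback covers arrays created with np.frombuffer, whose memory is not owned by a numpy array.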
