Size of numpy strided array/broadcast array in memory?

Time: 2022-10-04 21:21:19

I'm trying to create efficient broadcast arrays in numpy, e.g. a set of shape=[1000,1000,1000] arrays that have only 1000 elements, but repeated 1e6 times. This can be achieved both through np.lib.stride_tricks.as_strided and np.broadcast_arrays.

However, I am having trouble verifying that there is no duplication in memory, and this is critical since tests that actually duplicate the arrays in memory tend to crash my machine leaving no traceback.


I've tried examining the size of the arrays using .nbytes, but that doesn't seem to correspond to the actual memory usage:


>>> import numpy as np
>>> import resource
>>> initial_memuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> pagesize = resource.getpagesize()
>>> x = np.arange(1000)
>>> memuse_x = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of x = {0} MB".format(x.nbytes/1e6))
Size of x = 0.008 MB
>>> print("Memory used = {0} MB".format((memuse_x-initial_memuse)*resource.getpagesize()/1e6))
Memory used = 150.994944 MB
>>> y = np.lib.stride_tricks.as_strided(x, [1000,10,10], strides=x.strides + (0, 0))
>>> memuse_y = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of y = {0} MB".format(y.nbytes/1e6))
Size of y = 0.8 MB
>>> print("Memory used = {0} MB".format((memuse_y-memuse_x)*resource.getpagesize()/1e6))
Memory used = 201.326592 MB
>>> z = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
>>> memuse_z = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of z = {0} MB".format(z.nbytes/1e6))
Size of z = 80.0 MB
>>> print("Memory used = {0} MB".format((memuse_z-memuse_y)*resource.getpagesize()/1e6))
Memory used = 0.0 MB

So .nbytes reports the "theoretical" size of the array, but apparently not the actual size. The resource checking is a little awkward, as it looks like there are some things being loaded & cached (perhaps?) that result in the first striding taking up some amount of memory, but future strides take none.


tl;dr: How do you determine the actual size of a numpy array or array view in memory?


1 Answer



One way would be to examine the .base attribute of the array, which references the object from which an array "borrows" its memory. For example:


x = np.arange(1000)
print(x.flags.owndata)      # x "owns" its data
# True
print(x.base is None)       # its base is therefore 'None'
# True

a = x.reshape(100, 10)      # a is a reshaped view onto x
print(a.flags.owndata)      # it therefore "borrows" its data
# False
print(a.base is x)          # its .base is x
# True

Things are slightly more complicated with np.lib.stride_tricks:


b = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
print(b.flags.owndata)      # b does not own its data either
# False
print(b.base)               # but its .base is not x
# <numpy.lib.stride_tricks.DummyArray object at 0x7fb40c02b0f0>

Here, b.base is a numpy.lib.stride_tricks.DummyArray instance, which looks like this:


class DummyArray(object):
    """Dummy object that just exists to hang __array_interface__ dictionaries
    and possibly keep alive a reference to a base array.
    """

    def __init__(self, interface, base=None):
        self.__array_interface__ = interface
        self.base = base

We can therefore examine b.base.base:


print(b.base.base is x)
# True

Once you have the base array then its .nbytes attribute should accurately reflect the amount of memory it occupies.
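The question also mentions np.broadcast_arrays. As a minimal sketch of a check that does not depend on .base at all, you can look at the strides and ask numpy whether the two arrays overlap in memory. This uses np.broadcast_to and np.shares_memory, neither of which appears in the original answer, so treat it as a supplementary illustration:

```python
import numpy as np

x = np.arange(1000)

# Broadcast x along two new trailing axes; no data is copied.
b = np.broadcast_to(x[:, None, None], (1000, 100, 100))

# A stride of 0 means every step along that axis revisits the same bytes.
print(b.strides == (x.itemsize, 0, 0))

# Exact overlap check: b reads from the buffer that x owns.
print(np.shares_memory(b, x))

# Ratio of the "theoretical" size to the size of the real backing buffer.
print(b.nbytes // x.nbytes)
```

Zero strides along the repeated axes, together with a positive np.shares_memory result, indicate that the large array is only a view onto the 1000-element buffer.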


In principle it's possible to have a view of a view of an array, or to create a strided array from another strided array. Assuming that your view or strided array is ultimately backed by another numpy array, you could recursively reference its .base attribute. Once you find an object whose .base is None, you have found the underlying object from which your array is borrowing its memory:


def find_base_nbytes(obj):
    if obj.base is not None:
        return find_base_nbytes(obj.base)
    return obj.nbytes

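As expected,


# y and z are the strided arrays from the question
print(find_base_nbytes(x))
# 8000
print(find_base_nbytes(y))
# 8000
print(find_base_nbytes(z))
# 8000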
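The recursive function above assumes the chain of .base references ends at a numpy array. A slightly more defensive sketch follows, using the hypothetical helper names find_root and backing_nbytes (they are not part of numpy). It walks the chain iteratively and falls back to the buffer protocol when the final owner is not an ndarray, for example the bytearray behind np.frombuffer:

```python
import numpy as np

def find_root(obj):
    """Follow .base references until an object without a base is reached."""
    while getattr(obj, "base", None) is not None:
        obj = obj.base
    return obj

def backing_nbytes(arr):
    """Size in bytes of the buffer that ultimately backs arr (a sketch)."""
    root = find_root(arr)
    if isinstance(root, np.ndarray):
        return root.nbytes
    # Assumption: a non-ndarray owner supports the buffer protocol.
    return memoryview(root).nbytes

x = np.arange(1000)
z = np.lib.stride_tricks.as_strided(x, [1000, 100, 100], strides=x.strides + (0, 0))

print(z.nbytes)            # "theoretical" size of the strided view
print(backing_nbytes(z))   # size of the buffer actually allocated for x
```

Note that this reports the size of the whole backing buffer, so a small slice of a large array still reports the large array's size, since the slice keeps all of it alive.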



