Why does a tuple take up less memory than a list?

Time: 2022-02-11 16:05:40

A tuple takes less memory space in Python:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

whereas a list takes more memory space:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

What happens internally in Python's memory management?

4 Answers

#1


121  

I assume you're using 64-bit CPython (I got the same results on my 64-bit CPython 2.7). The numbers can differ in other Python implementations or on a 32-bit Python.

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

So tuples can store their elements directly inside the struct, while lists need an extra layer of indirection (they store a pointer to a separate array of elements). That layer of indirection is a pointer, which on 64-bit systems is 64 bits, hence 8 bytes.
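
A quick way to see that a tuple really stores one pointer-sized slot per element directly in the object (numbers assume a 64-bit CPython build like the one in the question; base sizes can differ slightly between versions):

>>> ().__sizeof__()         # just the fixed header, no element slots
24
>>> (1,).__sizeof__()       # one 8-byte slot
32
>>> (1, 2, 3).__sizeof__()  # three slots - the 48 bytes from the question
48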

But there's another thing that lists do: they over-allocate. Otherwise list.append would always be an O(n) operation - to make it amortized O(1) (much faster!) it over-allocates. But then it has to keep track of both the allocated size and the filled size (tuples only need to store one size, because the allocated and filled sizes are always identical). That means each list has to store another "size", which on 64-bit systems is a 64-bit integer, again 8 bytes.

So lists need at least 16 bytes more memory than tuples. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

Images

I decided to create some images to accompany the explanation above. Maybe they're helpful.

This is how it is (schematically) stored in memory in your example. I highlighted the differences with red (free-hand) circles:

[Image: schematic memory layout of the list and tuple from the example]

That's actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:

[Image: more detailed memory layout, showing that the small int objects are shared]

Note that __sizeof__ doesn't really return the "correct" size! It only returns the size of the stored values. However, when you use sys.getsizeof the result is different:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

There are 24 "extra" bytes. These are real: that's the garbage-collector overhead that isn't accounted for in the __sizeof__ method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case sys.getsizeof (which simply adds the GC overhead to the value returned by __sizeof__).
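
You can check that relationship directly. A small sketch, assuming the same build as above (the GC header is 24 bytes there; newer CPython versions use a smaller header, so the first number may differ):

>>> import sys
>>> l = [1, 2, 3]
>>> sys.getsizeof(l) - l.__sizeof__()  # GC header added for GC-tracked containers
24
>>> n = 1
>>> sys.getsizeof(n) - n.__sizeof__()  # ints aren't GC-tracked, so no extra header
0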

#2


30  

I'll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won't touch on that.

I'm going to use 64-bit values here, as you are.

The size for lists is calculated from the following function, list_sizeof:

static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type) while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject) where PyListObject is the instance struct.

The PyListObject structure has three fields:

PyObject_VAR_HEAD     # 24 bytes 
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

These fields have comments (which I trimmed) explaining what they are; follow the link above to read them. PyObject_VAR_HEAD expands into three 8-byte fields (ob_refcount, ob_type and ob_size), so a 24-byte contribution.
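
Those struct sizes can be cross-checked from Python, since CPython exposes tp_basicsize on every type as __basicsize__ (values below assume the same 64-bit build):

>>> object.__basicsize__  # ob_refcount + ob_type only: 2 * 8 bytes
16
>>> list.__basicsize__    # 24-byte PyObject_VAR_HEAD + ob_item + allocated
40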

So for now res is:

sizeof(PyListObject) + self->allocated * sizeof(void*)

or:

40 + self->allocated * sizeof(void*)

If the list instance has allocated elements, the second part calculates their contribution. self->allocated, as its name implies, holds the number of allocated elements.

Without any elements, the size of lists is calculated to be:

>>> [].__sizeof__()
40

i.e. the size of the instance struct.
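
As a quick sanity check of that formula from Python (same 64-bit build; a list literal allocates exactly as many slots as it has elements, so allocated equals the length here):

>>> [1, 2, 3].__sizeof__()        # 40 + 3 * 8
64
>>> [1, 2, 3, 4, 5].__sizeof__()  # 40 + 5 * 8
80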


tuple objects don't define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyInt_FromSsize_t(res);
}

This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), multiplies the number of items in the tuple (which it gets via Py_SIZE) by tp_itemsize.

tp_basicsize again uses sizeof(PyTupleObject) where the PyTupleObject struct contains:

PyObject_VAR_HEAD       # 24 bytes 
PyObject *ob_item[1];   # 8  bytes

So, without any elements (that is, Py_SIZE returns 0) the size of empty tuples is equal to sizeof(PyTupleObject):

>>> ().__sizeof__()
24

Huh? Well, here's an oddity I haven't found an explanation for: the tp_basicsize of tuples is actually calculated as follows:

sizeof(PyTupleObject) - sizeof(PyObject *)

Why an additional 8 bytes is removed from tp_basicsize is something I haven't been able to find out (see MSeifert's comment for a possible explanation).
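
The same calculation can be mirrored from Python, because CPython exposes tp_basicsize and tp_itemsize on every type as __basicsize__ and __itemsize__. A quick cross-check, assuming the same 64-bit build:

>>> tuple.__basicsize__, tuple.__itemsize__
(24, 8)
>>> t = (1, 2, 3)
>>> type(t).__basicsize__ + len(t) * type(t).__itemsize__  # same formula as object_sizeof
48
>>> t.__sizeof__()
48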


But this is basically the difference in your specific example. Lists also keep a count of allocated elements, which helps determine when to over-allocate again.

Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in larger sizes, as MSeifert covers nicely in his answer.

#3


28  

MSeifert's answer covers it broadly; to keep it simple, you can think of it this way:

A tuple is immutable. Once it is set, you can't change it. So you know in advance how much memory you need to allocate for that object.

A list is mutable. You can add items to it or remove items from it. It has to keep track of its size (for the internal implementation), and it resizes as needed.

There are no free meals - these capabilities come with a cost. Hence the memory overhead for lists.

#4


3  

The size of a tuple is fixed up front: at tuple initialization the interpreter allocates enough space for the contained data, and that's the end of it, since a tuple is immutable (it can't be modified). A list, on the other hand, is a mutable object, which implies dynamic memory allocation. To avoid allocating space every time you append to or modify the list (allocating enough space to hold the changed data and copying the data into it), it allocates additional space for future appends and modifications. That pretty much sums it up.
