在Python 3.6+中是否订购了字典?

时间:2021-11-10 18:09:53

Dictionaries are ordered in Python 3.6 (under the CPython implementation at least) unlike in previous incarnations. This seems like a substantial change, but it's only a short paragraph in the documentation. It is described as a CPython implementation detail rather than a language feature, but also implies this may become standard in the future.

在Python 3.6(至少在CPython实现下)中,字典是有序的,与之前的版本不同。这看起来是一个很大的变化,但它只是文档中的一小段。它被描述为CPython实现的细节,而不是语言特性,但也意味着这可能在未来成为标准。

How does the new dictionary implementation perform better than the older one while preserving element order?

在保持元素顺序的同时,新字典实现如何比旧字典执行得更好?

Here is the text from the documentation:

以下是文件文本:

dict() now uses a “compact” representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)

dict()现在使用PyPy首创的“紧凑”表示。与Python 3.5相比,新dict()的内存使用减少了20%到25%。PEP 468(在函数中保持**kwargs的顺序)就是这样实现的。这个新实现的保序方面被认为是一个实现细节,不应该依赖(这在未来可能会改变,但它需要这个新dict实现的语言改变了前几个版本的语言规范,要求所有当前和未来的Python实现保序语义;这也有助于在随机迭代顺序仍然有效的情况下保持与旧版本的向后兼容性,例如Python 3.5)。(INADA Naoki在第27350期提供的资料。这个想法最初是由雷蒙德·赫廷格提出的。

Update December 2017: dicts retaining insertion order is guaranteed for Python 3.7

2017年12月更新:Python 3.7保证命令保持插入顺序

3 个解决方案

#1


199  

Are dictionaries ordered in Python 3.6+?

在Python 3.6+中是否订购了字典?

They are insertion ordered[1]. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

他们是插入命令[1]。在Python 3.6中,对于Python的CPython实现,字典记住插入条目的顺序。这在Python 3.6中被视为实现细节;如果希望在Python的其他实现(以及其他有序行为[1])中保证插入顺序,则需要使用OrderedDict。

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature. From a python-dev message by GvR:

在Python 3.7中,这不再是实现细节,而是成为语言特性。来自GvR的python-dev消息:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

让它如此。“命令保持插入顺序”是规则。谢谢!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.

这仅仅意味着你可以依靠它。如果Python的其他实现希望成为Python 3.7的符合标准的实现,则还必须提供插入有序字典。


How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

Python 3.6字典实现在保持元素顺序的同时,如何比旧的实现更好地执行[2]?

Essentially, by keeping two arrays.

本质上,通过保持两个数组。

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

    第一个数组dk_entries按插入字典的顺序保存字典的条目(类型为PyDictKeyEntry)。保持顺序是通过只添加一个数组来实现的,在这个数组中,总是在末尾插入新项(插入顺序)。

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

    第二个是dk_indices,它保存dk_entries数组的索引(即表示dk_entries中对应条目位置的值)。这个数组充当哈希表。当一个键被散列时,它会导致一个存储在dk_index中的索引,并通过索引dk_entries获取相应的条目。由于只保留索引,所以这个数组的类型取决于字典的总体大小(从类型int8_t(1字节)到类型为32/64位的int32_t/int64_t(4/8字节)))

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

在以前的实现中,必须分配一个PyDictKeyEntry类型和大小为dk_size的稀疏数组;不幸的是,由于性能原因,该数组不允许超过2/3 * dk_size,因此也导致了大量的空空间。(这个空空间仍然有PyDictKeyEntry大小!)

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

现在不是这样了,因为只存储了必需的条目(已插入的条目)和一个类型为intX_t的稀疏数组(根据字典大小X) 2/3 * dk_size full。空白的空间从类型PyDictKeyEntry更改为intX_t。

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

因此,显然,创建一个PyDictKeyEntry类型的稀疏数组比存储ints的稀疏数组需要更多的内存。

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.

您可以在Python-Dev上看到关于这个特性的完整对话,如果感兴趣的话,它是一个很好的读物。


In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

在Raymond Hettinger最初的提议中,可以看到所使用的数据结构的可视化,从而抓住了这个想法的要点。

For example, the dictionary:

例如,字典:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

目前存储为:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

相反,数据应该组织如下:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.

正如您现在可以看到的,在最初的提议中,很多空间基本上是空的,以减少碰撞,使查找速度更快。使用这种新方法,您可以通过在索引中移动真正需要的稀疏性来减少所需的内存。


[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the dict object doesn't provide. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (==, !=). dicts currently don't offer any of those behaviors/methods.

[1]:我说的是“有序插入”而不是“有序”,因为有了“有序指令”,“有序”意味着“有序”对象不提供的进一步行为。OrderedDicts是可逆的,提供对订单敏感的方法,主要是提供一个订单感知平等测试(==,!=)。dicts目前不提供任何这些行为/方法。


[2]: The new dictionary implementations performs better memory wise by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.

[2]:新的字典实现通过更紧凑的设计实现更好的内存;这是主要的好处。就速度而言,差异并不大,在某些地方,新法令可能会引入轻微的回归(例如键查找),而在其他地方(迭代和调整大小),性能应该得到提升。

Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.

总的来说,词典的性能,尤其是在现实生活中,由于紧凑性的引入而提高。

#2


53  

Below is answering the original first question:

下面是回答最初的第一个问题:

Should I use dict or OrderedDict in Python 3.6?

我应该使用Python 3.6中的dict或OrderedDict吗?

I think this sentence from the documentation is actually enough to answer your question

我认为文档中的这句话实际上足以回答你的问题

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

这个新实现的订单保存方面被认为是一个实现细节,不应该被依赖。

dict is not explicitly meant to be an ordered collection, so if you want to stay consistent and not rely on a side effect of the new implementation you should stick with OrderedDict.

dict类型并不是显式地表示为有序集合,因此,如果您希望保持一致,而不依赖于新实现的副作用,那么应该坚持OrderedDict类型。

Make your code future proof :)

使您的代码未来证明:)

There's a debate about that here.

这里有一个关于这个的辩论。

EDIT: Python 3.7 will keep this as a feature see

编辑:Python 3.7将保留这一特性

#3


14  

Update: Guido van Rossum announced on the mailing list that as of Python 3.7 dicts in all Python implementations must preserve insertion order.

更新:Guido van Rossum在邮件列表中宣布,在Python 3.7中,所有Python实现中都必须保留插入顺序。

#1


199  

Are dictionaries ordered in Python 3.6+?

在Python 3.6+中是否订购了字典?

They are insertion ordered[1]. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

他们是插入命令[1]。在Python 3.6中,对于Python的CPython实现,字典记住插入条目的顺序。这在Python 3.6中被视为实现细节;如果希望在Python的其他实现(以及其他有序行为[1])中保证插入顺序,则需要使用OrderedDict。

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature. From a python-dev message by GvR:

在Python 3.7中,这不再是实现细节,而是成为语言特性。来自GvR的python-dev消息:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

让它如此。“命令保持插入顺序”是规则。谢谢!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.

这仅仅意味着你可以依靠它。如果Python的其他实现希望成为Python 3.7的符合标准的实现,则还必须提供插入有序字典。


How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

Python 3.6字典实现在保持元素顺序的同时,如何比旧的实现更好地执行[2]?

Essentially, by keeping two arrays.

本质上,通过保持两个数组。

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

    第一个数组dk_entries按插入字典的顺序保存字典的条目(类型为PyDictKeyEntry)。保持顺序是通过只添加一个数组来实现的,在这个数组中,总是在末尾插入新项(插入顺序)。

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

    第二个是dk_indices,它保存dk_entries数组的索引(即表示dk_entries中对应条目位置的值)。这个数组充当哈希表。当一个键被散列时,它会导致一个存储在dk_index中的索引,并通过索引dk_entries获取相应的条目。由于只保留索引,所以这个数组的类型取决于字典的总体大小(从类型int8_t(1字节)到类型为32/64位的int32_t/int64_t(4/8字节)))

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

在以前的实现中,必须分配一个PyDictKeyEntry类型和大小为dk_size的稀疏数组;不幸的是,由于性能原因,该数组不允许超过2/3 * dk_size,因此也导致了大量的空空间。(这个空空间仍然有PyDictKeyEntry大小!)

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

现在不是这样了,因为只存储了必需的条目(已插入的条目)和一个类型为intX_t的稀疏数组(根据字典大小X) 2/3 * dk_size full。空白的空间从类型PyDictKeyEntry更改为intX_t。

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

因此,显然,创建一个PyDictKeyEntry类型的稀疏数组比存储ints的稀疏数组需要更多的内存。

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.

您可以在Python-Dev上看到关于这个特性的完整对话,如果感兴趣的话,它是一个很好的读物。


In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

在Raymond Hettinger最初的提议中,可以看到所使用的数据结构的可视化,从而抓住了这个想法的要点。

For example, the dictionary:

例如,字典:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

目前存储为:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

相反,数据应该组织如下:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.

正如您现在可以看到的,在最初的提议中,很多空间基本上是空的,以减少碰撞,使查找速度更快。使用这种新方法,您可以通过在索引中移动真正需要的稀疏性来减少所需的内存。


[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the dict object doesn't provide. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (==, !=). dicts currently don't offer any of those behaviors/methods.

[1]:我说的是“有序插入”而不是“有序”,因为有了“有序指令”,“有序”意味着“有序”对象不提供的进一步行为。OrderedDicts是可逆的,提供对订单敏感的方法,主要是提供一个订单感知平等测试(==,!=)。dicts目前不提供任何这些行为/方法。


[2]: The new dictionary implementations performs better memory wise by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.

[2]:新的字典实现通过更紧凑的设计实现更好的内存;这是主要的好处。就速度而言,差异并不大,在某些地方,新法令可能会引入轻微的回归(例如键查找),而在其他地方(迭代和调整大小),性能应该得到提升。

Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.

总的来说,词典的性能,尤其是在现实生活中,由于紧凑性的引入而提高。

#2


53  

Below is answering the original first question:

下面是回答最初的第一个问题:

Should I use dict or OrderedDict in Python 3.6?

我应该使用Python 3.6中的dict或OrderedDict吗?

I think this sentence from the documentation is actually enough to answer your question

我认为文档中的这句话实际上足以回答你的问题

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

这个新实现的订单保存方面被认为是一个实现细节,不应该被依赖。

dict is not explicitly meant to be an ordered collection, so if you want to stay consistent and not rely on a side effect of the new implementation you should stick with OrderedDict.

dict类型并不是显式地表示为有序集合,因此,如果您希望保持一致,而不依赖于新实现的副作用,那么应该坚持OrderedDict类型。

Make your code future proof :)

使您的代码未来证明:)

There's a debate about that here.

这里有一个关于这个的辩论。

EDIT: Python 3.7 will keep this as a feature see

编辑:Python 3.7将保留这一特性

#3


14  

Update: Guido van Rossum announced on the mailing list that as of Python 3.7 dicts in all Python implementations must preserve insertion order.

更新:Guido van Rossum在邮件列表中宣布,在Python 3.7中,所有Python实现中都必须保留插入顺序。