python如何计算元组的哈希值

时间:2021-10-14 00:30:32

In python, if I have a tuple with many elements, is its hash calculated from its elements' ids or its elements' content?

在python中,如果我有一个包含许多元素的元组,那么它的哈希值是根据元素的ids还是元素的内容来计算的?

In this example,

在这个例子中,

a = (1, [1,2])
hash(a)

It errors out saying list is unhashable. So I guess it's not computed by id, or probably there is a check on whether the element is mutable.

它错误地说列表是不可用的。所以我猜它不是由id计算的,或者可能是检查元素是否可变。

Now see this example

现在看这个例子

class A: pass
a0 = A()
ta = (1, a0)
hash(ta)  # -1122968024
a0.x = 20
hash(ta)  # -1122968024

Here it turns out the hash of ta does not change with the modification of its element, i.e., a0. So maybe a0's id is used for the hash calculation? Is a0 somehow considered as immutable? How does python know if a type is mutable?

在这里,事实证明ta的散列不随其元素的修改而改变,即a0。那么也许a0的id用于哈希计算? a0在某种程度上被认为是不可变的吗? python如何知道类型是否可变?

Now consider this case

现在考虑这个案例

b = (1, 2)
id(b)  # 3980742764
c = (1, 2)
id(c)  # 3980732588
tb = (1, b)
tc = (1, c) 
hash(tb)  # -1383040070
hash(tc)  # -1383040070

It seems the content of b and c are used for the hash calculation.

似乎b和c的内容用于哈希计算。

How should I understand these examples?

我该如何理解这些例子?

4 个解决方案

#1


23  

Neither. It is calculated on the basis of the hashes of these elements, not the contents (values).

都不是。它是根据这些元素的散列计算的,而不是内容(值)。

Take a look at this paragraph in python's documentation glossary.

在python的文档词汇表中查看此段落。

Whether something is hashable or not, and how it is hashed, depends on the implementation of its .__hash__() method. Python itself has no idea about mutability of an object.

是否某些东西是可以清除的,以及它是如何散列的,取决于它的.__ hash __()方法的实现。 Python本身并不知道对象的可变性。

In your first example, tuple happens to hash itself on the basis of its elements, while a list doesn't have a hash at all - the .__hash__() method is not implemented for it (and for a good reason). That's why a tuple with a list object inside of it is not hashable.

在你的第一个例子中,元组恰好在其元素的基础上散列自身,而列表根本没有散列 - .__散列__()方法没有为它实现(并且有充分的理由)。这就是为什么在其中包含列表对象的元组不可清除的原因。

Now, having that in mind, let's have a look at python data model documentation, and what it has to say on the topic:

现在,考虑到这一点,让我们看看python数据模型文档,以及它对该主题的看法:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

用户定义的类默认具有__eq __()和__hash __()方法;与它们相比,所有对象都比较不相等(除了它们自己)和x .__ hash __()返回一个合适的值,使得x == y意味着x是y而hash(x)== hash(y)。

That's why you don't have to define .__hash__() for your classes - python does it for you in this case. The default implementation doesn't take instance fields into the account though. That's why you can change the values inside your object without changing its hash.

这就是为什么你不必为你的类定义.__ hash __() - 在这种情况下python为你做了。默认实现不会将实例字段带入帐户。这就是为什么您可以更改对象内部的值而不更改其哈希值的原因。

In this regard you're right - the default (CPython's) implementation of the hashing function for custom classes relies on the id() of an object, and not on the values inside of it. It is an implementation detail, and it differs between Python versions though. In more recent versions of Python the relation between hash() and id() involves some randomization.

在这方面你是对的 - 自定义类的哈希函数的默认(CPython)实现依赖于对象的id(),而不依赖于它内部的值。它是一个实现细节,但它在Python版本之间有所不同。在更新的Python版本中,hash()和id()之间的关系涉及一些随机化。


But how does it actually hash itself?

While the details are quite complicated and probably involve some advanced math, the implementation of the hash function for tuple objects is written in C, and can be seen here (see static Py_hash_t tuplehash(PyTupleObject *v).

虽然细节非常复杂并且可能涉及一些高级数学,但是元组对象的散列函数的实现是用C语言编写的,可以在这里看到(参见静态Py_hash_t tuplehash(PyTupleObject * v)。

The calculation involves XORing a constant with the hashes of each of the tuple's elements. The line responsible for hashing of the elements is this one:

计算涉及使用每个元组元素的哈希对一​​个常量进行异或运算。负责元素散列的行是这样的:

y = PyObject_Hash(*p++);

So, to answer your original question: it does a bunch of XOR hokus-pocus with the hashes of each of its elements. Whether or not the contents of these elements are used depends on their specific hash functions.

所以,回答你原来的问题:它有一堆XOR hokus-pocus和它的每个元素的哈希。是否使用这些元素的内容取决于它们的特定散列函数。

#2


7  

The core contract of hashing is that equal objects have equal hashes. In particular, hashing does not directly care about mutability or mutation; it only cares about mutation that affects equality comparisons.

哈希的核心契约是等对象具有相等的哈希值。特别是,散列并不直接关注可变性或突变;它只关心影响平等比较的突变。


Your first tuple is unhashable because mutating the nested list would change how the tuple behaves in equality comparisons.

你的第一个元组是不可取的,因为改变嵌套列表会改变元组在相等比较中的行为方式。

Mutating a0 in your second example doesn't affect the hash of the tuple because it doesn't affect equality comparisons. a0 is still only equal to itself, and its hash is unchanged.

在第二个示例中对a0进行变换不会影响元组的散列,因为它不会影响相等比较。 a0仍然只等于它自己,并且它的哈希值不变。

tb and tc in your third example have equal hashes because they are equal tuples, regardless of whether their elements are the same objects.

第三个示例中的tb和tc具有相等的哈希值,因为它们是相等的元组,无论它们的元素是否是相同的对象。


This all means that tuples cannot (directly) use id for hashes. If they did, equal tuples with distinct but equal elements could hash differently, violating the contract of hashing. Without special-casing element types, the only things tuples can use to compute their own hashes are their elements' hashes, so tuples base their hashes on their elements' hashes.

这意味着元组不能(直接)使用id作为哈希。如果他们这样做,具有不同但相同元素的相等元组可能会以不同的方式散列,违反哈希合约。如果没有特殊的外壳元素类型,元组可以用来计算它们自己的哈希的唯一东西是它们的元素的哈希,所以元组将它们的哈希基于它们的元素的哈希。

#3


3  

The answer to the question "Is the tuple's hash calculated based on the identity or the value?" is: Neither.

问题的答案“是否根据身份或价值计算元组的哈希?”是:都不是。

The correct answer is that the tuple's hash is calculated from the elements' hashes. How those hashes are calculated is (more or less) irrelevant.

正确的答案是元组的哈希值是从元素的哈希值计算出来的。如何计算这些哈希值(或多或少)无关紧要。

An easy way to prove this is to see what happens when you put a list into a tuple:

证明这一点的一个简单方法是查看将列表放入元组时会发生什么:

>>> hash( (1, 2) )
3713081631934410656
>>> hash( (1, []) )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Because lists aren't hashable, a tuple containing a list isn't hashable either.

由于列表不可清除,因此包含列表的元组也不可清除。


Let's take a closer look at this example you brought:

让我们仔细看看你带来的这个例子:

class A: pass
a0 = A()
ta = (1, a0)
hash(ta)  # -1122968024
a0.x = 20
hash(ta)  # -1122968024

Why doesn't setting a0.x = 20 affect the tuple's hash? Well, if we modify this code to output the hash of a0, you'll see that setting a0.x = 20 has no effect on a0's hash value:

为什么不设置a0.x = 20会影响元组的哈希?好吧,如果我们修改此代码以输出a0的散列,您将看到设置a0.x = 20对a0的散列值没有影响:

a0 = A()
print(hash(a0))  # -9223363274645980307
a0.x = 20
print(hash(a0))  # -9223363274645980307

The reason for this is that python implements a default hash function for you. From the docs:

原因是python为你实现了一个默认的哈希函数。来自文档:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

用户定义的类默认具有__eq __()和__hash __()方法;与它们相比,所有对象都比较不相等(除了它们自己)和x .__ hash __()返回一个合适的值,使得x == y意味着x是y而hash(x)== hash(y)。

The default hash function ignores the object's attributes and calculates the hash based on the object's id. No matter what changes you make to a0, its hash will always stay the same. (Though it is possible to define a custom hash function for instances of your A class by implementing a custom __hash__ method.)

默认的哈希函数忽略对象的属性,并根据对象的id计算哈希值。无论你对a0做了什么改变,它的散列总是保持不变。 (尽管可以通过实现自定义__hash__方法为A类的实例定义自定义哈希函数。)


Addendum: The reason why lists aren't hashable is because they're mutable. From the docs:

附录:列表不可清除的原因是因为它们是可变的。来自文档:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

如果一个类定义了可变对象并实现了一个__eq __()方法,那么它不应该实现__hash __(),因为hashable集合的实现要求一个键的哈希值是不可变的(如果对象的哈希值发生变化,那么它将是错误的哈希桶)。

Lists fall into this category.

列表属于此类别。

#4


2  

the hash of a tuple is based on the contents, not on the _id_s of the tuples. And the hashes are computed recursively: if one element isn't hashable (like a list element), then the tuple itself isn't hashable.

元组的哈希基于内容,而不是基于元组的_id_s。并且以递归方式计算哈希:如果一个元素不可哈希(如列表元素),则元组本身不可哈希。

That's perfectly normal that if a and b are tuples and a == b, then hash(a) == hash(b) (if hashes can be computed of course), even if a is not b.

如果a和b是元组并且a == b,则哈希(a)==哈希(b)(如果哈希可以当然计算),即使a不是b,这是完全正常的。

(on the contrary hash(a) == hash(b) doesn't mean that a == b)

(相反,hash(a)== hash(b)并不代表a == b)

The information conveyed by is is often not very useful, because of python object interning for example.

由于例如python对象实习,由is传达的信息通常不是很有用。

#1


23  

Neither. It is calculated on the basis of the hashes of these elements, not the contents (values).

都不是。它是根据这些元素的散列计算的,而不是内容(值)。

Take a look at this paragraph in python's documentation glossary.

在python的文档词汇表中查看此段落。

Whether something is hashable or not, and how it is hashed, depends on the implementation of its .__hash__() method. Python itself has no idea about mutability of an object.

是否某些东西是可以清除的,以及它是如何散列的,取决于它的.__ hash __()方法的实现。 Python本身并不知道对象的可变性。

In your first example, tuple happens to hash itself on the basis of its elements, while a list doesn't have a hash at all - the .__hash__() method is not implemented for it (and for a good reason). That's why a tuple with a list object inside of it is not hashable.

在你的第一个例子中,元组恰好在其元素的基础上散列自身,而列表根本没有散列 - .__散列__()方法没有为它实现(并且有充分的理由)。这就是为什么在其中包含列表对象的元组不可清除的原因。

Now, having that in mind, let's have a look at python data model documentation, and what it has to say on the topic:

现在,考虑到这一点,让我们看看python数据模型文档,以及它对该主题的看法:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

用户定义的类默认具有__eq __()和__hash __()方法;与它们相比,所有对象都比较不相等(除了它们自己)和x .__ hash __()返回一个合适的值,使得x == y意味着x是y而hash(x)== hash(y)。

That's why you don't have to define .__hash__() for your classes - python does it for you in this case. The default implementation doesn't take instance fields into the account though. That's why you can change the values inside your object without changing its hash.

这就是为什么你不必为你的类定义.__ hash __() - 在这种情况下python为你做了。默认实现不会将实例字段带入帐户。这就是为什么您可以更改对象内部的值而不更改其哈希值的原因。

In this regard you're right - the default (CPython's) implementation of the hashing function for custom classes relies on the id() of an object, and not on the values inside of it. It is an implementation detail, and it differs between Python versions though. In more recent versions of Python the relation between hash() and id() involves some randomization.

在这方面你是对的 - 自定义类的哈希函数的默认(CPython)实现依赖于对象的id(),而不依赖于它内部的值。它是一个实现细节,但它在Python版本之间有所不同。在更新的Python版本中,hash()和id()之间的关系涉及一些随机化。


But how does it actually hash itself?

While the details are quite complicated and probably involve some advanced math, the implementation of the hash function for tuple objects is written in C, and can be seen here (see static Py_hash_t tuplehash(PyTupleObject *v).

虽然细节非常复杂并且可能涉及一些高级数学,但是元组对象的散列函数的实现是用C语言编写的,可以在这里看到(参见静态Py_hash_t tuplehash(PyTupleObject * v)。

The calculation involves XORing a constant with the hashes of each of the tuple's elements. The line responsible for hashing of the elements is this one:

计算涉及使用每个元组元素的哈希对一​​个常量进行异或运算。负责元素散列的行是这样的:

y = PyObject_Hash(*p++);

So, to answer your original question: it does a bunch of XOR hokus-pocus with the hashes of each of its elements. Whether or not the contents of these elements are used depends on their specific hash functions.

所以,回答你原来的问题:它有一堆XOR hokus-pocus和它的每个元素的哈希。是否使用这些元素的内容取决于它们的特定散列函数。

#2


7  

The core contract of hashing is that equal objects have equal hashes. In particular, hashing does not directly care about mutability or mutation; it only cares about mutation that affects equality comparisons.

哈希的核心契约是等对象具有相等的哈希值。特别是,散列并不直接关注可变性或突变;它只关心影响平等比较的突变。


Your first tuple is unhashable because mutating the nested list would change how the tuple behaves in equality comparisons.

你的第一个元组是不可取的,因为改变嵌套列表会改变元组在相等比较中的行为方式。

Mutating a0 in your second example doesn't affect the hash of the tuple because it doesn't affect equality comparisons. a0 is still only equal to itself, and its hash is unchanged.

在第二个示例中对a0进行变换不会影响元组的散列,因为它不会影响相等比较。 a0仍然只等于它自己,并且它的哈希值不变。

tb and tc in your third example have equal hashes because they are equal tuples, regardless of whether their elements are the same objects.

第三个示例中的tb和tc具有相等的哈希值,因为它们是相等的元组,无论它们的元素是否是相同的对象。


This all means that tuples cannot (directly) use id for hashes. If they did, equal tuples with distinct but equal elements could hash differently, violating the contract of hashing. Without special-casing element types, the only things tuples can use to compute their own hashes are their elements' hashes, so tuples base their hashes on their elements' hashes.

这意味着元组不能(直接)使用id作为哈希。如果他们这样做,具有不同但相同元素的相等元组可能会以不同的方式散列,违反哈希合约。如果没有特殊的外壳元素类型,元组可以用来计算它们自己的哈希的唯一东西是它们的元素的哈希,所以元组将它们的哈希基于它们的元素的哈希。

#3


3  

The answer to the question "Is the tuple's hash calculated based on the identity or the value?" is: Neither.

问题的答案“是否根据身份或价值计算元组的哈希?”是:都不是。

The correct answer is that the tuple's hash is calculated from the elements' hashes. How those hashes are calculated is (more or less) irrelevant.

正确的答案是元组的哈希值是从元素的哈希值计算出来的。如何计算这些哈希值(或多或少)无关紧要。

An easy way to prove this is to see what happens when you put a list into a tuple:

证明这一点的一个简单方法是查看将列表放入元组时会发生什么:

>>> hash( (1, 2) )
3713081631934410656
>>> hash( (1, []) )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Because lists aren't hashable, a tuple containing a list isn't hashable either.

由于列表不可清除,因此包含列表的元组也不可清除。


Let's take a closer look at this example you brought:

让我们仔细看看你带来的这个例子:

class A: pass
a0 = A()
ta = (1, a0)
hash(ta)  # -1122968024
a0.x = 20
hash(ta)  # -1122968024

Why doesn't setting a0.x = 20 affect the tuple's hash? Well, if we modify this code to output the hash of a0, you'll see that setting a0.x = 20 has no effect on a0's hash value:

为什么不设置a0.x = 20会影响元组的哈希?好吧,如果我们修改此代码以输出a0的散列,您将看到设置a0.x = 20对a0的散列值没有影响:

a0 = A()
print(hash(a0))  # -9223363274645980307
a0.x = 20
print(hash(a0))  # -9223363274645980307

The reason for this is that python implements a default hash function for you. From the docs:

原因是python为你实现了一个默认的哈希函数。来自文档:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

用户定义的类默认具有__eq __()和__hash __()方法;与它们相比,所有对象都比较不相等(除了它们自己)和x .__ hash __()返回一个合适的值,使得x == y意味着x是y而hash(x)== hash(y)。

The default hash function ignores the object's attributes and calculates the hash based on the object's id. No matter what changes you make to a0, its hash will always stay the same. (Though it is possible to define a custom hash function for instances of your A class by implementing a custom __hash__ method.)

默认的哈希函数忽略对象的属性,并根据对象的id计算哈希值。无论你对a0做了什么改变,它的散列总是保持不变。 (尽管可以通过实现自定义__hash__方法为A类的实例定义自定义哈希函数。)


Addendum: The reason why lists aren't hashable is because they're mutable. From the docs:

附录:列表不可清除的原因是因为它们是可变的。来自文档:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

如果一个类定义了可变对象并实现了一个__eq __()方法,那么它不应该实现__hash __(),因为hashable集合的实现要求一个键的哈希值是不可变的(如果对象的哈希值发生变化,那么它将是错误的哈希桶)。

Lists fall into this category.

列表属于此类别。

#4


2  

the hash of a tuple is based on the contents, not on the _id_s of the tuples. And the hashes are computed recursively: if one element isn't hashable (like a list element), then the tuple itself isn't hashable.

元组的哈希基于内容,而不是基于元组的_id_s。并且以递归方式计算哈希:如果一个元素不可哈希(如列表元素),则元组本身不可哈希。

That's perfectly normal that if a and b are tuples and a == b, then hash(a) == hash(b) (if hashes can be computed of course), even if a is not b.

如果a和b是元组并且a == b,则哈希(a)==哈希(b)(如果哈希可以当然计算),即使a不是b,这是完全正常的。

(on the contrary hash(a) == hash(b) doesn't mean that a == b)

(相反,hash(a)== hash(b)并不代表a == b)

The information conveyed by is is often not very useful, because of python object interning for example.

由于例如python对象实习,由is传达的信息通常不是很有用。