What hash algorithm does Python's dictionary mapping use?

Time: 2021-09-17 23:45:56

I was messing around with making a command line parser and was wondering what kind of hash algorithm Python dicts use.

The way I have it set up, a pattern-matching algorithm matches tokenized input sequences against dictionary keys. Some of the keys are relatively long (tuples of length 5 or 6, each element a 6-7 character string). I was wondering whether there is a point at which long dictionary keys significantly reduce the efficiency of key retrieval.

1 Answer

#1


29  

The hash that it uses depends on the object being used as a key -- each class can define its own __hash__() method, and the value that it returns for a particular instance is what is used for the dictionary.
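
As a minimal sketch of that point, the hypothetical `Token` class below (not part of the original question) defines `__hash__` and `__eq__` so that its instances can serve as dictionary keys. The dictionary only ever works with the integer that `__hash__` returns, plus `__eq__` to confirm a match.

```python
class Token:
    """Hypothetical key type, used only for illustration."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text

    def __eq__(self, other):
        # Keys that compare equal must also hash equal.
        if not isinstance(other, Token):
            return NotImplemented
        return (self.kind, self.text) == (other.kind, other.text)

    def __hash__(self):
        # Delegate to the built-in tuple hash of the identifying fields.
        return hash((self.kind, self.text))


commands = {Token("verb", "list"): "show all entries"}

# A distinct but equal object finds the same entry.
lookup = commands[Token("verb", "list")]
```

Delegating to the hash of a tuple of fields is one common choice, not the only one; any integer works as long as equal objects return equal hashes.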

Python itself provides the hash implementation for str and tuple types. A quick look at the source should reveal the exact algorithm for those.

The hash of a tuple is based on the hashes of its contents. The algorithm is essentially this (simplified slightly):

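def tuple_hash(items):
    # Simplified sketch; the mask emulates 32-bit C integer overflow.
    mask = (1 << 32) - 1  # keeps the low 32 bits (not the single bit 1<<32)
    mult = 1000003
    x = 0x345678
    for index, item in enumerate(items):
        # Mix in the hash of each element in turn.
        x = ((x ^ hash(item)) * mult) & mask
        mult += 82520 + (len(items) - index) * 2
    return (x + 97531) & mask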
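
To see that hashing a tuple really does hash each of its elements, here is a small sketch using a hypothetical `CountingKey` class (an assumption for illustration, not something from the question) that counts calls to its own `__hash__`.

```python
class CountingKey:
    """Hypothetical item type that records how often it is hashed."""

    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


# Building the tuple does not hash anything yet.
key = tuple(CountingKey(word) for word in ("open", "file", "now"))

CountingKey.hash_calls = 0
hash(key)  # hashing the tuple hashes each element
calls_for_one_tuple_hash = CountingKey.hash_calls
print(calls_for_one_tuple_hash)
```

So the cost of hashing a tuple key grows with the number of elements and with the cost of hashing each one. For tuples of five or six short strings that work is small, but it is worth measuring in your own parser rather than taking this sketch as a benchmark.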

For strings, the interpreter also iterates over every character, combining them with this (again, slightly simplified) algorithm:

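def string_hash(string):
    # Simplified sketch; the mask emulates 32-bit C integer overflow.
    if not string:
        return 0
    mask = (1 << 32) - 1
    x = ord(string[0]) << 7
    for char in string[1:]:
        # Multiply, then XOR in the next character's code point.
        x = ((1000003 * x) ^ ord(char)) & mask
    return x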
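
The string algorithm is not the same in every interpreter version, so rather than relying on a simplified sketch you can ask the running interpreter. A small example, assuming Python 3.4 or later, where `sys.hash_info` has an `algorithm` field:

```python
import sys

info = sys.hash_info

# Name of the algorithm used for str, bytes and memoryview hashes
# (field available since Python 3.4).
algorithm = info.algorithm

# Width in bits of hash values on this build.
width = info.width

print(f"string hash algorithm: {algorithm}, hash width: {width} bits")
```

The printed name depends on how the interpreter was built, so no particular output is shown here.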

A bigger issue to worry about is avoiding hash collisions. Colliding hash keys will cause a linear search as the dictionary tries to find a place to store the new object (this is now being recognized as a security issue, and that behavior may be changing in upcoming Python versions).
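
The effect of collisions can be made visible with a deliberately bad hash. This is only a sketch: the `Key` class and the `comparisons_for_last_lookup` helper are hypothetical names, and the counts reflect how the interpreter you run it on probes its table.

```python
class Key:
    """Hypothetical key whose hash can be forced to collide."""

    eq_calls = 0

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # With constant_hash=True every key gets the same hash value.
        return 1 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def comparisons_for_last_lookup(n, constant_hash):
    """Count __eq__ calls made while looking up the last of n inserted keys."""
    keys = [Key(i, constant_hash) for i in range(n)]
    table = {k: i for i, k in enumerate(keys)}
    Key.eq_calls = 0  # ignore comparisons made during insertion
    assert table[keys[-1]] == n - 1
    return Key.eq_calls


colliding = comparisons_for_last_lookup(200, constant_hash=True)
distinct = comparisons_for_last_lookup(200, constant_hash=False)
print(colliding, distinct)
```

With a constant hash, the lookup has to compare against many of the other keys before it reaches the right one. With well-distributed hashes it needs few or no equality checks.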
