How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code? I read that keys should implement Comparable to define an ordering. How does HashMap combine hashing and natural ordering to implement the trees? What about classes that do not implement Comparable, or when multiple, non-mutually-comparable Comparable implementations are keys in the same map?
3 solutions
#1
14
The implementation notes comment in HashMap is a better description of HashMap's operation than I could write myself. The relevant parts for understanding the tree nodes and their ordering are:
This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. [...] Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. [...]
Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable" type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)
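To make the "class C implements Comparable" check concrete, here is a minimal sketch modelled on my reading of comparableClassFor. It is an approximation for illustration, not the JDK source; the class name ComparableCheckSketch and the key types Base and Derived are hypothetical.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class ComparableCheckSketch {
    /** Returns x's class if it is declared as "class C implements Comparable<C>", else null. */
    static Class<?> comparableClassFor(Object x) {
        if (!(x instanceof Comparable)) {
            return null;
        }
        Class<?> c = x.getClass();
        if (c == String.class) {
            return c; // shortcut for a very common key type
        }
        // Only interfaces declared directly on c are inspected. That is what
        // makes the check conservative: a subclass that merely inherits
        // Comparable from a parent is rejected.
        for (Type t : c.getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) t;
                Type[] typeArgs = p.getActualTypeArguments();
                if (p.getRawType() == Comparable.class
                        && typeArgs.length == 1 && typeArgs[0] == c) {
                    return c;
                }
            }
        }
        return null;
    }

    // Hypothetical key types used only for this illustration.
    static class Base implements Comparable<Base> {
        @Override public int compareTo(Base other) { return 0; }
    }

    static class Derived extends Base { }

    public static void main(String[] args) {
        System.out.println(comparableClassFor("key"));         // String qualifies
        System.out.println(comparableClassFor(new Base()));    // declared Comparable<Base>
        System.out.println(comparableClassFor(new Derived())); // inherits only, so null
    }
}
```

Two keys with equal hashes are ordered with compareTo only when both pass this check with the same class; otherwise the tie-breaking described next applies.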
When two objects have equal hash codes but are not mutually comparable, method tieBreakOrder is invoked to break the tie, first by string comparison on getClass().getName() (!), then by comparing System.identityHashCode.
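The tie-breaking rule can be sketched as follows. This is written from the description above rather than copied from the JDK, so treat the exact shape as an assumption; TieBreakSketch is a hypothetical wrapper class.

```java
class TieBreakSketch {
    /**
     * Orders two keys that have equal hashes and cannot be compared with
     * compareTo. Never returns 0, so insertion always picks a side.
     */
    static int tieBreakOrder(Object a, Object b) {
        if (a != null && b != null) {
            // First attempt: compare the class names as strings.
            int byName = a.getClass().getName().compareTo(b.getClass().getName());
            if (byName != 0) {
                return byName;
            }
        }
        // Same class (or a null key): fall back to identity hash codes.
        return System.identityHashCode(a) <= System.identityHashCode(b) ? -1 : 1;
    }

    public static void main(String[] args) {
        // Different classes: decided by "java.lang.String" vs "java.lang.Integer".
        System.out.println(tieBreakOrder("a", Integer.valueOf(1)) > 0);
        // Same class: decided by identity hash, so the sign may vary between runs.
        System.out.println(tieBreakOrder(new Object(), new Object()));
    }
}
```

Because the identity hash says nothing about equals, this ordering is only good enough to keep the tree consistent during insertion. Lookups among such keys may still have to search both subtrees.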
The actual tree building starts in treeifyBin, beginning when a bin reaches TREEIFY_THRESHOLD (currently 8), assuming the hash table has a capacity of at least MIN_TREEIFY_CAPACITY (currently 64). It's a mostly-normal red-black tree implementation (crediting CLR), with some complications to support traversal in the same way as hash bins (e.g., removeTreeNode).
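As a rough model of that decision, the sketch below uses the two constants named above. The helper onCrowdedBin and the Action enum are hypothetical, and the resize branch reflects my understanding that a crowded bin in a table smaller than MIN_TREEIFY_CAPACITY triggers a resize instead of treeification. The exact node count at which the real check fires may differ by one from this simplification.

```java
class TreeifyDecisionSketch {
    static final int TREEIFY_THRESHOLD = 8;     // value quoted above
    static final int MIN_TREEIFY_CAPACITY = 64; // value quoted above

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    /** Simplified model of what happens to a bin after an insertion. */
    static Action onCrowdedBin(int tableLength, int binSize) {
        if (binSize < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;    // short bins stay plain linked lists
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE_TABLE; // small table: spread entries out instead
        }
        return Action.TREEIFY;          // large table, crowded bin: build the tree
    }

    public static void main(String[] args) {
        System.out.println(onCrowdedBin(16, 8)); // RESIZE_TABLE
        System.out.println(onCrowdedBin(64, 8)); // TREEIFY
        System.out.println(onCrowdedBin(64, 3)); // KEEP_LIST
    }
}
```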
#2
2
Read the code. It is mostly a red-black tree.
It does not actually require the implementation of Comparable, but can use it if available (see for instance the find method).
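A quick way to convince yourself that Comparable is optional is to use a key type that always collides and defines no ordering. The BadKey class below is a hypothetical example written for this purpose; the map stays correct, only the lookup cost within the bin suffers.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    /** Hypothetical key: every instance has the same hash and it is not Comparable. */
    static final class BadKey {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; } // deliberately terrible

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    /** Fills a HashMap with n keys that all land in the same bin. */
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        System.out.println(map.size());               // 1000
        System.out.println(map.get(new BadKey(500))); // 500
    }
}
```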
#3
0
HashMap has its own hash method that applies a supplemental hash to each key's hash code in order to avoid these problems:
Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits. Note: Null keys always map to hash 0, thus index 0.
If you want to see how it's done, take a look inside the source of the HashMap class.
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
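Note that the method above is the form used before Java 8. As far as I know, Java 8 replaced it with a single XOR of the upper 16 bits into the lower 16 and relies on tree bins for the remaining collisions. The sketch below shows why some spreading matters for power-of-two tables; the names spread and indexFor are illustrative and not taken from the Java 8 source.

```java
class HashSpreadSketch {
    /** Java 8 style spreading: fold the high 16 bits into the low 16. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Bin index for a power-of-two table length: only the low bits are used. */
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without spreading, hash codes that differ only in high bits collide.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // 0 0
        // With spreading, the high bits influence the index.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```

Spreading only helps with hash codes that differ somewhere. Keys whose hash codes are fully equal still share a bin, which is the case the tree bins are designed for.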