为什么Java的TreeMap不允许初始尺寸呢?

时间:2021-08-26 22:01:59

Loading 1 000 000 numbers takes 2 seconds to load into a treemap (binary search tree), but takes milliseconds to load into a hashmap (in java).
The only difference between the two is that I can see is I can set a hashmap's initial size so it does not constantly need to re-size.

加载100 000个数字需要2秒的时间加载到treemap(二叉搜索树)中,但是要在java中加载到hashmap中需要几毫秒的时间。两者之间唯一的区别是,我可以看到,我可以设置hashmap的初始大小,这样它就不会经常需要重新调整大小。

Am I wrong to assume a TreeMap's array's initial size should be able to be set? Is there a different reason that it is so slow?
Is there a logical reason for why one cannot set TreeMap's, or any generic binary search tree's, size or is this wrong?

假设TreeMap数组的初始大小应该可以设置,我错了吗?是否有不同的原因导致它如此缓慢?为什么一个人不能设置TreeMap,或者任何通用的二叉搜索树的大小或者是这个错误,有一个合乎逻辑的原因吗?

4 个解决方案

#1


10  

Unlike HashMap that re-allocates its internals as new ones get inserted, the TreeMap does not generally reallocate its nodes on adding new ones. The difference can be very loosely illustrated as that between an ArrayList and a LinkedList: the first re-allocates to resize, while the second one does not. That is why setting the initial size of a TreeMap is roughly as meaningless as trying to set the initial size of a LinkedList.

不像HashMap在插入新元素时重新分配它的内部,TreeMap通常不会重新分配它的节点来添加新的节点。差异可以很松散地说明为ArrayList和LinkedList之间的区别:第一个重新分配大小,而第二个则没有。这就是为什么设置TreeMap的初始大小与尝试设置LinkedList的初始大小一样毫无意义。

The speed difference is due to the different time complexity of the two containers: inserting N nodes into a HashMap is O(n), while for the TreeMap it's O(N*LogN), which for 1000000 nodes is roughly 20 times asymptotic difference. Although the difference in asymptotic complexity does not translate directly into the timing difference because of different constants dictated by the individual algorithms, it serves as a good way to decide which algorithm is going to be faster on very large inputs.

速度差异是由于两个容器的时间复杂度不同:在HashMap中插入N个节点是O(N),而TreeMap是O(N*LogN),对于1000000个节点,大约是20倍的渐近差异。虽然渐近复杂性的差异并没有直接转化为时间差,因为不同的算法决定了不同的常数,但它可以作为一种很好的方法来决定在非常大的输入中哪种算法会更快。

#2


4  

Am I wrong to assume a TreeMap's array's initial size should be able to be set?

假设TreeMap数组的初始大小应该可以设置,我错了吗?

Yes. A TreeMap doesn't have an array. A TreeMap uses binary nodes with 2 children.

是的。TreeMap没有数组。TreeMap使用二进制节点和两个子节点。

If you are suggesting that the number of children in a tree node should be a parameter, then you need to figure out how that impacts on search time. And I think that it turns the search time from O(log2N) to O(log2M * log2(N/M)) where N is the number elements and M is the average number of node children. (And I'm making some optimistic assumptions ...) That's not a "win".

如果您认为树节点中的子节点数量应该是一个参数,那么您需要弄清楚这对搜索时间的影响。我认为它把搜索时间从O(log2N)转到O(log2M * log2(N/M)),其中N是number元素,M是节点子节点的平均数量。(我做了一些乐观的假设……)这不是一个“赢”。

Is there a different reason that it is so slow?

是否有不同的原因导致它如此缓慢?

Yes. The reason that a (large) TreeMap is slow relative to a (large) HashMap under optimal circumstances is that lookup using a balanced binary tree requires looking at log2N tree nodes. By constrast, in an optimal HashMap (good load factor and no collision hotspots) a lookup involves 1 hashcode calculation and looking at O(1) hashchain nodes.

是的。在最优情况下,(大型)TreeMap相对于(大型)HashMap的速度较慢的原因是,使用平衡的二叉树的查找需要查看log2N树节点。在一个最优的HashMap(良好的负载因子和无冲突热点)中,查找包含1个hashcode计算并查看O(1) hashchain节点。

Notes:

注:

  1. TreeMap uses a binary tree organization that gives balanced trees, so O(log2N) is the worst case lookup time.
  2. TreeMap使用一个提供平衡树的二叉树组织,所以O(log2N)是最坏的情况查找时间。
  3. HashMap performance depends on the collision rate of the hash function and key space. In the worst case where all keys end up on the same hash chain, a HashMap has O(N) lookup.
  4. HashMap的性能取决于散列函数和键空间的冲突率。在最坏的情况下,所有键都位于相同的散列链上,HashMap有O(N)查找。

#3


3  

A Treemap is always balanced. Every time you add a node to the tree, it must make sure the nodes are all in order by the provided comparator. You don't have a specified size because the treemap is designed for a smooth sorted group of nodes and to traverse through the nodes easily.

Treemap始终是平衡的。每当向树中添加一个节点时,它必须确保所有节点都是由提供的比较器来排序的。您没有指定的大小,因为treemap是为有序的节点组设计的,并且可以轻松地遍历节点。

A Hashmap needs to have a size-able amount of free space for the things that you store in it. My professor has always told me that it needs 5 times the amount of space that the objects or whatever you are storing in that hashmap. So specifying the size from the initial creation of the Hashmap improves the speed of your hashmap. Otherwise, if you have more objects going into a hashmap than you planned for, the hashmap has to "size up".

Hashmap需要为存储在其中的东西提供可度量的*空间。我的教授一直告诉我,它需要5倍的空间,对象或任何你在这个hashmap中存储的东西。因此,从Hashmap的初始创建指定大小可以提高Hashmap的速度。否则,如果您有更多的对象进入hashmap,那么hashmap就必须“增大”。

(edited for spelling)

拼写(编辑)

#4


2  

Am I wrong to assume a TreeMap's array's initial size should be able to be set?

假设TreeMap数组的初始大小应该可以设置,我错了吗?

Yes. It doesn't have an array. It has a tree.

是的。它没有数组。它有一个树。

#1


10  

Unlike HashMap that re-allocates its internals as new ones get inserted, the TreeMap does not generally reallocate its nodes on adding new ones. The difference can be very loosely illustrated as that between an ArrayList and a LinkedList: the first re-allocates to resize, while the second one does not. That is why setting the initial size of a TreeMap is roughly as meaningless as trying to set the initial size of a LinkedList.

不像HashMap在插入新元素时重新分配它的内部,TreeMap通常不会重新分配它的节点来添加新的节点。差异可以很松散地说明为ArrayList和LinkedList之间的区别:第一个重新分配大小,而第二个则没有。这就是为什么设置TreeMap的初始大小与尝试设置LinkedList的初始大小一样毫无意义。

The speed difference is due to the different time complexity of the two containers: inserting N nodes into a HashMap is O(n), while for the TreeMap it's O(N*LogN), which for 1000000 nodes is roughly 20 times asymptotic difference. Although the difference in asymptotic complexity does not translate directly into the timing difference because of different constants dictated by the individual algorithms, it serves as a good way to decide which algorithm is going to be faster on very large inputs.

速度差异是由于两个容器的时间复杂度不同:在HashMap中插入N个节点是O(N),而TreeMap是O(N*LogN),对于1000000个节点,大约是20倍的渐近差异。虽然渐近复杂性的差异并没有直接转化为时间差,因为不同的算法决定了不同的常数,但它可以作为一种很好的方法来决定在非常大的输入中哪种算法会更快。

#2


4  

Am I wrong to assume a TreeMap's array's initial size should be able to be set?

假设TreeMap数组的初始大小应该可以设置,我错了吗?

Yes. A TreeMap doesn't have an array. A TreeMap uses binary nodes with 2 children.

是的。TreeMap没有数组。TreeMap使用二进制节点和两个子节点。

If you are suggesting that the number of children in a tree node should be a parameter, then you need to figure out how that impacts on search time. And I think that it turns the search time from O(log2N) to O(log2M * log2(N/M)) where N is the number elements and M is the average number of node children. (And I'm making some optimistic assumptions ...) That's not a "win".

如果您认为树节点中的子节点数量应该是一个参数,那么您需要弄清楚这对搜索时间的影响。我认为它把搜索时间从O(log2N)转到O(log2M * log2(N/M)),其中N是number元素,M是节点子节点的平均数量。(我做了一些乐观的假设……)这不是一个“赢”。

Is there a different reason that it is so slow?

是否有不同的原因导致它如此缓慢?

Yes. The reason that a (large) TreeMap is slow relative to a (large) HashMap under optimal circumstances is that lookup using a balanced binary tree requires looking at log2N tree nodes. By constrast, in an optimal HashMap (good load factor and no collision hotspots) a lookup involves 1 hashcode calculation and looking at O(1) hashchain nodes.

是的。在最优情况下,(大型)TreeMap相对于(大型)HashMap的速度较慢的原因是,使用平衡的二叉树的查找需要查看log2N树节点。在一个最优的HashMap(良好的负载因子和无冲突热点)中,查找包含1个hashcode计算并查看O(1) hashchain节点。

Notes:

注:

  1. TreeMap uses a binary tree organization that gives balanced trees, so O(log2N) is the worst case lookup time.
  2. TreeMap使用一个提供平衡树的二叉树组织,所以O(log2N)是最坏的情况查找时间。
  3. HashMap performance depends on the collision rate of the hash function and key space. In the worst case where all keys end up on the same hash chain, a HashMap has O(N) lookup.
  4. HashMap的性能取决于散列函数和键空间的冲突率。在最坏的情况下,所有键都位于相同的散列链上,HashMap有O(N)查找。

#3


3  

A Treemap is always balanced. Every time you add a node to the tree, it must make sure the nodes are all in order by the provided comparator. You don't have a specified size because the treemap is designed for a smooth sorted group of nodes and to traverse through the nodes easily.

Treemap始终是平衡的。每当向树中添加一个节点时,它必须确保所有节点都是由提供的比较器来排序的。您没有指定的大小,因为treemap是为有序的节点组设计的,并且可以轻松地遍历节点。

A Hashmap needs to have a size-able amount of free space for the things that you store in it. My professor has always told me that it needs 5 times the amount of space that the objects or whatever you are storing in that hashmap. So specifying the size from the initial creation of the Hashmap improves the speed of your hashmap. Otherwise, if you have more objects going into a hashmap than you planned for, the hashmap has to "size up".

Hashmap需要为存储在其中的东西提供可度量的*空间。我的教授一直告诉我,它需要5倍的空间,对象或任何你在这个hashmap中存储的东西。因此,从Hashmap的初始创建指定大小可以提高Hashmap的速度。否则,如果您有更多的对象进入hashmap,那么hashmap就必须“增大”。

(edited for spelling)

拼写(编辑)

#4


2  

Am I wrong to assume a TreeMap's array's initial size should be able to be set?

假设TreeMap数组的初始大小应该可以设置,我错了吗?

Yes. It doesn't have an array. It has a tree.

是的。它没有数组。它有一个树。