HashSet vs. Dictionary w.r.t. searching time to find if an item exists

Time: 2021-07-13 04:11:44
HashSet<T> t = new HashSet<T>();
// add 10 million items


Dictionary<K, V> t = new Dictionary<K, V>();
// add 10 million items.

Whose .Contains method will return quicker?

Just to clarify, my requirement is I have 10 million objects (well, strings really) that I need to check if they exist in the data structure. I will NEVER iterate.

4 Answers

#1


124  

HashSet vs List vs Dictionary performance test, taken from here.

Add 1000000 objects (without checking duplicates)

Contains check for half the objects of a collection of 10000

Remove half the objects of a collection of 10000

#2


59  

I assume you mean Dictionary<TKey, TValue> in the second case? HashTable is a non-generic class.

You should choose the right collection for the job based on your actual requirements. Do you actually want to map each key to a value? If so, use Dictionary<,>. If you only care about it as a set, use HashSet<>.
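
A minimal sketch of that distinction, assuming string keys as in the question (the class name `CollectionChoice`, the helper names, and the sample strings are made up for illustration): a pure membership test only needs a `HashSet<string>`, while a `Dictionary<string, int>` makes sense when each key carries a value.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionChoice
{
    // Set semantics: we only care whether the string is present.
    public static bool IsKnown(HashSet<string> known, string candidate)
    {
        return known.Contains(candidate);
    }

    // Map semantics: we want the value stored for the key, or 0 if absent.
    public static int CountOrZero(Dictionary<string, int> counts, string key)
    {
        return counts.TryGetValue(key, out int count) ? count : 0;
    }

    public static void Main()
    {
        var known = new HashSet<string> { "alpha", "beta" };
        var counts = new Dictionary<string, int> { ["alpha"] = 3 };

        Console.WriteLine(IsKnown(known, "alpha"));      // True
        Console.WriteLine(IsKnown(known, "gamma"));      // False
        Console.WriteLine(CountOrZero(counts, "alpha")); // 3
        Console.WriteLine(CountOrZero(counts, "beta"));  // 0
    }
}
```

If you find yourself storing a dummy value such as `true` for every key just to get `ContainsKey`, that is usually a sign the set is the better fit.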

I would expect HashSet<T>.Contains and Dictionary<TKey, TValue>.ContainsKey (which are the comparable operations, assuming you're using your dictionary sensibly) to basically perform the same - they're using the same algorithm, fundamentally. I guess with the entries in Dictionary<,> being larger you end up with a greater likelihood of blowing the cache with Dictionary<,> than with HashSet<>, but I'd expect that to be insignificant compared with the pain of choosing the wrong data type simply in terms of what you're trying to achieve.
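
If you want to check that expectation on your own machine, something along these lines could serve as a rough starting point. It is only a sketch: the `LookupTiming` class, the `"item"` key prefix, and the one-million key count are arbitrary choices for illustration, and a single `Stopwatch` pass is not a rigorous benchmark (no warm-up, no repeated runs). Both lookups go through the same delegate, so that overhead is shared equally.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupTiming
{
    // Builds n distinct string keys; the "item" prefix is arbitrary.
    public static string[] MakeKeys(int n)
    {
        var keys = new string[n];
        for (int i = 0; i < n; i++)
        {
            keys[i] = "item" + i;
        }
        return keys;
    }

    // Runs the supplied lookup for every probe, returning the number of hits
    // and reporting the elapsed wall-clock time through the out parameter.
    public static int CountHits(string[] probes, Func<string, bool> lookup, out TimeSpan elapsed)
    {
        var watch = Stopwatch.StartNew();
        int hits = 0;
        foreach (string probe in probes)
        {
            if (lookup(probe)) hits++;
        }
        watch.Stop();
        elapsed = watch.Elapsed;
        return hits;
    }

    public static void Main()
    {
        string[] keys = MakeKeys(1_000_000);

        var set = new HashSet<string>(keys);
        var map = new Dictionary<string, bool>(keys.Length);
        foreach (string key in keys) map[key] = true;

        int setHits = CountHits(keys, set.Contains, out TimeSpan setTime);
        int mapHits = CountHits(keys, map.ContainsKey, out TimeSpan mapTime);

        Console.WriteLine($"HashSet.Contains:       {setHits} hits in {setTime.TotalMilliseconds} ms");
        Console.WriteLine($"Dictionary.ContainsKey: {mapHits} hits in {mapTime.TotalMilliseconds} ms");
    }
}
```

For numbers you intend to rely on, a dedicated benchmarking harness would be a better choice than a hand-rolled stopwatch loop.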

#3


4  

These are different data structures. Also there is no generic version of HashTable.

HashSet contains values of type T, while HashTable (or Dictionary) contains key-value pairs. So you should choose a collection based on what data you need to store.

#4


3  

From MSDN documentation for Dictionary<TKey,TValue>

"Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table."

With a note:

"The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey"
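
To illustrate what that note means in practice, here is a small sketch using a deliberately bad comparer. `ConstantHashComparer` and `HashQualityDemo` are hypothetical names invented for this example, not part of the framework. Because every key gets the same hash code, all entries collide, and each lookup has to compare against a long chain instead of jumping almost directly to the entry.

```csharp
using System;
using System.Collections.Generic;

// Deliberately poor comparer: equality is correct, but every string gets
// the same hash code, so all entries collide with one another.
public sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y, StringComparison.Ordinal);
    public int GetHashCode(string obj) => 0;
}

public static class HashQualityDemo
{
    // Fills a set with n distinct keys using the supplied comparer.
    public static HashSet<string> Build(int n, IEqualityComparer<string> comparer)
    {
        var set = new HashSet<string>(comparer);
        for (int i = 0; i < n; i++)
        {
            set.Add("key" + i);
        }
        return set;
    }

    public static void Main()
    {
        HashSet<string> good = Build(5_000, StringComparer.Ordinal);
        HashSet<string> bad = Build(5_000, new ConstantHashComparer());

        // Both sets give the same answers; only the cost per lookup differs.
        Console.WriteLine(good.Contains("key4999")); // True
        Console.WriteLine(bad.Contains("key4999"));  // True
        Console.WriteLine(bad.Contains("missing"));  // False
    }
}
```

The same applies to `Dictionary<TKey, TValue>`, which also accepts an `IEqualityComparer<TKey>` in its constructor.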

I know your question/post is old - but while looking for an answer to a similar question I stumbled across this.

Hope this helps. Scroll down to the Remarks section for more details. https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.110).aspx
