How does a .NET Dictionary resolve collisions?

Date: 2021-06-17 06:59:56

I have a problem with a custom object that needs to be keyed for a table. I need to generate a unique numeric key. I'm having collision problems and I'm wondering if I can leverage a dictionary to help me. Assume I have an object like this:

class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;
}

and so on with more fields. Let's say Foo and Bar are my key fields: if they're equal between two Thingys, then the two objects should be considered equal (one may represent an update to the other, with the Others fields being updated). So I have these:

public override bool Equals(object obj)
{
    // Type check first: null or a non-Thingy is never equal.
    if (!(obj is Thingy thing)) return false;
    return this.Foo == thing.Foo && this.Bar == thing.Bar;
}

public override int GetHashCode()
{
    return (this.Foo + this.Bar).GetHashCode(); // using default string impl
}

So this works for the most part, but there are rare occasions where two Thingys that are actually different have the same hash code.

My question is this: could I use a Dictionary<Thingy, int> where I put in my Thingys, and use a sequential value coming out of the dictionary as my actual key? I'm wondering if the Dictionary, when detecting a rare hash code collision, will call my Equals method, determine that the objects are actually different, and store them separately. I imagine that when looking one up, it would then find the bucket for that hash and search it for the correct Thingy, again using Equals for comparison.

Is this the case with dictionary, or does it only resolve collisions where the hash code is different, but (hash % size) is the same? If this won't work, what might?

3 answers

#1


25  

Hash collisions only affect performance, not integrity.

A simple test is to change GetHashCode() so that its body is simply return 1;. You'll note that the dictionary still behaves correctly, but with any reasonably large dataset it will perform terribly.
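
To make that test concrete, here is a minimal sketch. The names ConstantHashThingy and ConstantHashDemo are made up for this illustration and are not from the question; the only Dictionary members used are Add, TryGetValue and Count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type for this sketch: equality uses Foo and Bar,
// but every instance reports the same hash code.
public sealed class ConstantHashThingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj)
    {
        return obj is ConstantHashThingy other && Foo == other.Foo && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        return 1; // worst case: every key collides with every other key
    }
}

public static class ConstantHashDemo
{
    // Adds `count` distinct keys, then checks that an equal (but separately
    // constructed) key maps back to the value stored for it.
    public static bool AllLookupsCorrect(int count)
    {
        var map = new Dictionary<ConstantHashThingy, int>();
        for (int i = 0; i < count; i++)
        {
            map.Add(new ConstantHashThingy { Foo = "foo" + i, Bar = "bar" }, i);
        }

        for (int i = 0; i < count; i++)
        {
            var probe = new ConstantHashThingy { Foo = "foo" + i, Bar = "bar" };
            if (!map.TryGetValue(probe, out int value) || value != i)
            {
                return false;
            }
        }
        return map.Count == count;
    }
}
```

Calling ConstantHashDemo.AllLookupsCorrect(200) from your own Main should return true: the results stay correct because Equals settles every collision. With a constant hash, though, each lookup has to walk through all colliding entries, so the cost grows with the size of the dictionary.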

#2


18  

Hash collisions will primarily affect performance, not correctness, so long as Equals() behaves correctly.

Dictionary uses the hash code as a way to organize items into separate "buckets". If too many items share the same hash code, you can run into performance problems. However, as long as Equals() can correctly distinguish between instances, you should get correct results.
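
As a rough sketch of what the question asks for, the following uses a Dictionary<Thingy, int> to hand out sequential ids. ThingyRegistry and GetOrAddId are hypothetical names invented for this example; Thingy follows the question's Equals and GetHashCode. It assumes entries are never removed (so Count + 1 is always a fresh value) and that Foo and Bar are not changed after a Thingy has been registered.

```csharp
using System;
using System.Collections.Generic;

public sealed class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;

    public override bool Equals(object obj)
    {
        return obj is Thingy other && Foo == other.Foo && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        return (Foo + Bar).GetHashCode(); // same scheme as in the question
    }
}

// Hypothetical helper: one sequential id per distinct (Foo, Bar) pair.
public sealed class ThingyRegistry
{
    private readonly Dictionary<Thingy, int> _ids = new Dictionary<Thingy, int>();

    public int GetOrAddId(Thingy thing)
    {
        if (!_ids.TryGetValue(thing, out int id))
        {
            id = _ids.Count + 1; // next sequential value; assumes nothing is ever removed
            _ids.Add(thing, id);
        }
        return id;
    }
}
```

For example, a Thingy with Foo = "ab", Bar = "c" and another with Foo = "a", Bar = "bc" have the same hash code, because both concatenate to "abc". Equals tells them apart, so they still receive different ids, while a third Thingy with Foo = "ab", Bar = "c" and a different Others value gets the first one's id back.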

Where hash codes can result in problems is with mutable objects. If your Thingy class allows Foo or Bar to change for an item in the dictionary, you may then fail to find it in a subsequent access attempt. This is because the hash code produced now differs from the one used to store the value in the dictionary.
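
Here is a small sketch of that failure mode. MutableThingy and MutationDemo are made-up names, and the hash function is deliberately simplistic (based on string lengths) so the example behaves the same on every run; it is not the implementation from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key type for this sketch.
public sealed class MutableThingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj)
    {
        return obj is MutableThingy other && Foo == other.Foo && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        // Simplified, deterministic hash (an assumption made for the demo).
        return Foo.Length * 31 + Bar.Length;
    }
}

public static class MutationDemo
{
    // Returns true if the entry can still be reached after its key is mutated.
    public static bool EntryStillReachable()
    {
        var key = new MutableThingy { Foo = "foo", Bar = "bar" };
        var map = new Dictionary<MutableThingy, int> { { key, 42 } };

        key.Foo = "changed"; // the hash code is now 220 instead of 96

        // The mutated key hashes differently from the value recorded at insertion.
        bool foundByMutatedKey = map.ContainsKey(key);

        // A fresh key with the old values hashes like the stored entry,
        // but Equals fails because the stored key object has changed.
        bool foundByOldValues = map.ContainsKey(new MutableThingy { Foo = "foo", Bar = "bar" });

        return foundByMutatedKey || foundByOldValues;
    }
}
```

In this sketch neither lookup finds the entry, even though the dictionary still holds it. The usual way around this is to keep key fields immutable while the object is in the dictionary, or to remove the entry before changing them and add it back afterwards.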

#3


1  

GetHashCode is designed for use in hash tables, where collisions need to be minimized but not eliminated. If you need to generate a truly unique key, GetHashCode is a reasonable starting point (and not as excessively long as a GUID), but you will need to store the key as part of the object and maintain a list of used keys separately.

While you may be able to retrieve something that looks usable from the internals of Dictionary, it probably won't work reliably - for example if you add more items than the dictionary was initially allocated to handle, the underlying data structure will get rebuilt and individual items could end up in a completely different part of the dictionary.
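
One possible reading of "start from the hash code and track the used keys" is sketched below. UniqueKeyAllocator and Allocate are hypothetical names; the sketch assumes a simple "step forward until free" policy and that the caller stores the returned key on the object, so each object is allocated a key only once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical allocator: starts from a candidate value (for example a hash
// code) and steps forward until it finds one that has not been handed out.
public sealed class UniqueKeyAllocator
{
    private readonly HashSet<int> _used = new HashSet<int>();

    public int Allocate(int candidate)
    {
        // HashSet<T>.Add returns false when the value is already present.
        while (!_used.Add(candidate))
        {
            candidate = unchecked(candidate + 1); // wraps from int.MaxValue to int.MinValue
        }
        return candidate;
    }
}
```

Allocating with the same candidate twice, say 7, would give 7 and then 8. The result no longer depends on how the Dictionary lays out its internals, but the used-key set has to live as long as the keys do, and it would need to be persisted if the keys outlive the process.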
