How should I implement Object.GetHashCode() for complex equality?

Time: 2023-01-24 16:11:21

Basically, I have the following so far:

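class Foo : IEquatable<Foo> {
    public override bool Equals(object obj)
    {
        // "as" yields null when obj is null or is not a Foo.
        Foo d = obj as Foo;
        if (d == null)
            return false;

        return this.Equals(d);
    }

    #region IEquatable<Foo> Members

    public bool Equals(Foo other)
    {
        // Guard against null so that other.Guid cannot throw.
        if (other == null)
            return false;

        // A non-empty Guid decides equality on its own.
        if (this.Guid != String.Empty && this.Guid == other.Guid)
            return true;
        else if (this.Guid != String.Empty || other.Guid != String.Empty)
            return false;

        // Neither object has a Guid: fall back to the descriptive fields.
        if (this.Title == other.Title &&
            this.PublishDate == other.PublishDate &&
            this.Description == other.Description)
            return true;

        return false;
    }

    #endregion
}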

So, the problem is this: I have a non-required field Guid, which is a unique identifier. If it isn't set, I need to fall back on less accurate metrics to decide whether two objects are equal. This works fine, but it makes GetHashCode() messy... How should I go about it? A naive implementation would be something like:

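public override int GetHashCode() {
    // Objects with a Guid hash on the Guid alone, mirroring Equals.
    if (this.Guid != String.Empty)
        return this.Guid.GetHashCode();

    // unchecked lets the arithmetic wrap even if overflow checking is enabled.
    unchecked
    {
        int hash = 37;
        hash = hash * 23 + this.Title.GetHashCode();
        hash = hash * 23 + this.PublishDate.GetHashCode();
        hash = hash * 23 + this.Description.GetHashCode();
        return hash;
    }
}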

But what are the chances of the two types of hash colliding? Certainly, I wouldn't expect it to be 1 in 2 ** 32. Is this a bad idea, and if so, how should I be doing it?

2 solutions

#1


I don't think there is a problem with the approach you have chosen to use. Worrying 'too much' about hash collisions is almost always an indication of over-thinking the problem; as long as the hash is highly likely to be different you should be fine.

Ultimately you may even want to consider leaving out the Description from your hash anyway if it is reasonable to expect that most of the time objects can be distinguished based on their title and publication date (books?).
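
A minimal sketch of that suggestion, not a definitive implementation: the Guid branch stays as in the question, and only the fallback hash drops Description. The class name FooLite and the property types (string for Guid, Title and Description, DateTime for PublishDate) are assumptions, since the question does not show the field declarations.

```csharp
using System;

// Hypothetical stand-in for the question's Foo; the property types are assumed.
public sealed class FooLite : IEquatable<FooLite>
{
    public string Guid { get; set; } = string.Empty;
    public string Title { get; set; } = string.Empty;
    public DateTime PublishDate { get; set; }
    public string Description { get; set; } = string.Empty;

    public bool Equals(FooLite other)
    {
        if (other is null) return false;

        // Same rules as the question: a non-empty Guid decides on its own.
        if (this.Guid != string.Empty && this.Guid == other.Guid) return true;
        if (this.Guid != string.Empty || other.Guid != string.Empty) return false;

        return this.Title == other.Title
            && this.PublishDate == other.PublishDate
            && this.Description == other.Description;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as FooLite);
    }

    public override int GetHashCode()
    {
        if (this.Guid != string.Empty) return this.Guid.GetHashCode();

        unchecked
        {
            int hash = 37;
            hash = hash * 23 + this.Title.GetHashCode();
            hash = hash * 23 + this.PublishDate.GetHashCode();
            return hash; // Description is deliberately left out of the hash.
        }
    }
}
```

Two Guid-less objects that differ only in Description now share a hash code but still compare unequal. That is an ordinary collision, which Equals resolves; equal objects continue to return equal hash codes.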

You could even consider disregarding the GUID in your hash function altogether, and only use it in the Equals implementation to disambiguate the unlikely(?) case of hash clashes.
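
One caveat worth checking before dropping the GUID from the hash: with the Equals shown in the question, two objects that share a non-empty Guid are equal even when their other fields differ, so a hash built only from those fields can return different values for equal objects. The sketch below illustrates this under assumptions: GuidlessHashFoo and HashContract are hypothetical names, the property types are guessed, and Description is omitted for brevity.

```csharp
using System;

// Hypothetical variant of Foo whose hash ignores Guid entirely; types are assumed.
public sealed class GuidlessHashFoo
{
    public string Guid { get; set; } = string.Empty;
    public string Title { get; set; } = string.Empty;
    public DateTime PublishDate { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as GuidlessHashFoo;
        if (other is null) return false;

        // Guid decides first, as in the question's Equals.
        if (this.Guid != string.Empty && this.Guid == other.Guid) return true;
        if (this.Guid != string.Empty || other.Guid != string.Empty) return false;

        return this.Title == other.Title && this.PublishDate == other.PublishDate;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Only the descriptive fields take part; Guid is ignored.
            int hash = 37;
            hash = hash * 23 + this.Title.GetHashCode();
            hash = hash * 23 + this.PublishDate.GetHashCode();
            return hash;
        }
    }
}

public static class HashContract
{
    // True when a pair is equal but hashes differently, which breaks
    // the rule that equal objects must return equal hash codes.
    public static bool IsViolatedBy(object a, object b)
    {
        return a.Equals(b) && a.GetHashCode() != b.GetHashCode();
    }
}
```

If objects with the same Guid are guaranteed to carry the same Title and PublishDate, the check never fires and the Guid-less hash is safe. Without that guarantee, hash-based collections such as Dictionary or HashSet may fail to find an equal object.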

#2


A very easy hash code method for custom classes is to bitwise XOR each of the fields' hash codes together. It can be as simple as this:

int hash = 0;
hash ^= this.Title.GetHashCode();
hash ^= this.PublishDate.GetHashCode();
hash ^= this.Description.GetHashCode();
return hash;

From the link above:

XOR has the following nice properties:

  • It does not depend on order of computation.

  • It does not “waste” bits. If you change even one bit in one of the components, the final value will change.

  • It is quick, a single cycle on even the most primitive computer.

  • It preserves uniform distribution. If the two pieces you combine are uniformly distributed so will the combination be. In other words, it does not tend to collapse the range of the digest into a narrower band.

XOR doesn't work well if you expect to have duplicate values in your fields, as duplicate values will cancel each other out when XORed. Since you're hashing together three unrelated fields, that should not be a problem in this case.
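
The cancellation and the order-independence can be seen with plain integers standing in for field hash codes. This is a small illustrative sketch; HashCombiners is a hypothetical helper, and its MultiplyAdd method simply reuses the 37 and 23 constants from the question for comparison.

```csharp
public static class HashCombiners
{
    // XOR-combine: the result ignores order, and equal inputs cancel out.
    public static int Xor(params int[] parts)
    {
        int hash = 0;
        foreach (int p in parts) hash ^= p;
        return hash;
    }

    // Multiply-and-add combine, as in the question's GetHashCode.
    public static int MultiplyAdd(params int[] parts)
    {
        unchecked
        {
            int hash = 37;
            foreach (int p in parts) hash = hash * 23 + p;
            return hash;
        }
    }
}
```

Here Xor(5, 9, 5) returns 9 because the two 5s cancel, and Xor(1, 2) equals Xor(2, 1), whereas MultiplyAdd(1, 2) is 19598 and MultiplyAdd(2, 1) is 19620. Applied to the question, a Title and a Description holding the same string would cancel under XOR and leave only the PublishDate hash.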
