Can I rely on the value of GetHashCode() being consistent?

Time: 2022-08-15 16:46:27

Is the return value of GetHashCode() guaranteed to be consistent assuming the same string value is being used? (C#/ASP.NET)

I uploaded my code to a server today and, to my surprise, had to reindex some data because my server (Windows Server 2008, 64-bit) was returning different values from my desktop computer.

9 answers

#1


31  

If I'm not mistaken, GetHashCode is consistent given the same value, but it is NOT guaranteed to be consistent across different versions of the framework.

From the MSDN docs on String.GetHashCode():

The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode.
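Because of this, a value that will be stored (in a database, an index, and so on) is safer when it comes from a hash function whose output is fixed by a specification rather than by the runtime. On .NET Core and .NET 5 and later, string hash codes are additionally randomized per process, so they differ even between two runs of the same program. The sketch below is one possible approach, not something from the MSDN page: `StableHash.Compute` is a hypothetical name, and it simply takes the first four bytes of the SHA-256 digest of the string's UTF-8 encoding.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // Hypothetical helper: SHA-256 and UTF-8 are defined by their specifications,
    // so the result does not depend on the CLR version, the process bitness,
    // or per-process hash randomization.
    public static int Compute(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));

        byte[] digest;
        using (var sha = SHA256.Create())
        {
            digest = sha.ComputeHash(Encoding.UTF8.GetBytes(s));
        }

        // Assemble the int in little-endian order explicitly; BitConverter.ToInt32
        // would follow the machine's endianness instead.
        return digest[0] | (digest[1] << 8) | (digest[2] << 16) | (digest[3] << 24);
    }
}
```

This is far slower than `String.GetHashCode()`, and 32 bits can still collide, so it suits persisted keys rather than in-memory hash tables.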

#2


10  

I had a similar problem: I filled a database table with information that depended on String.GetHashCode (not the best idea), and when I upgraded the server I was working on to x64, I noticed that the values I was getting from String.GetHashCode were inconsistent with what was already in the table. My solution was to use my own version of GetHashCode, which returns the same value as String.GetHashCode does on an x86 framework.

Here's the code; don't forget to compile with "Allow unsafe code" enabled (the /unsafe compiler option):

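    /// <summary>
    /// Similar to String.GetHashCode, but returns the same value as the x86 version of String.GetHashCode on both x64 and x86 frameworks.
    /// </summary>
    /// <param name="s">The string to hash; must not be null.</param>
    /// <returns>The hash code the x86 framework would return for <paramref name="s"/>.</returns>
    public static unsafe int GetHashCode32(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        // Pin the string itself instead of a ToCharArray() copy: the loop reads two
        // chars at a time and, for odd lengths, relies on the string's terminating
        // '\0'. A char[] copy has no terminator, so it would be read past its end.
        fixed (char* str = s)
        {
            int num = 0x15051505;
            int num2 = num;
            int* numPtr = (int*)str;
            unchecked // overflow is intended; this keeps it working under /checked
            {
                for (int i = s.Length; i > 0; i -= 4)
                {
                    num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
                    if (i <= 2)
                    {
                        break;
                    }
                    num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
                    numPtr += 2;
                }
                return (num + (num2 * 0x5d588b65));
            }
        }
    }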
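If enabling unsafe code is not an option, the same loop can be written with ordinary indexing. The sketch below is an editorial port, not part of the original answer, and `LegacyStringHash`, `GetHashCode32Managed` and `ReadPair` are hypothetical names. It packs each pair of chars into an int the way a little-endian `int*` read does (first char in the low 16 bits) and treats the position just past the end as the string's '\0' terminator, so on x86/x64 it is intended to agree with the unsafe version above.

```csharp
using System;

public static class LegacyStringHash
{
    // Hypothetical managed port of the x86 string-hash loop, no unsafe code needed.
    public static int GetHashCode32Managed(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        unchecked
        {
            int num = 0x15051505;
            int num2 = num;
            int pos = 0; // index of the first char of the current 4-char block
            for (int i = s.Length; i > 0; i -= 4)
            {
                num = ((num << 5) + num + (num >> 0x1b)) ^ ReadPair(s, pos);
                if (i <= 2)
                {
                    break;
                }
                num2 = ((num2 << 5) + num2 + (num2 >> 0x1b)) ^ ReadPair(s, pos + 2);
                pos += 4;
            }
            return num + num2 * 0x5d588b65;
        }
    }

    // Packs s[index] (low 16 bits) and s[index + 1] (high 16 bits) into one int,
    // as an int* read over little-endian UTF-16 data would. Index s.Length stands
    // in for the string's '\0' terminator.
    private static int ReadPair(string s, int index)
    {
        int lo = s[index];
        int hi = index + 1 < s.Length ? s[index + 1] : 0;
        return lo | (hi << 16);
    }
}
```

Because it never reinterprets memory, this version also gives the same result on big-endian hardware, where the unsafe pointer read would not.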

#3


5  

The implementation depends on the version of the framework, but it also depends on the architecture. The implementation of string.GetHashCode() is different in the x86 and x64 versions of the framework, even if they have the same version number.

#4


1  

    /// <summary>
    /// Default implementation of string.GetHashCode is not consistent across platforms (x86/x64, which is our case) and frameworks.
    /// FNV-1a (Fowler/Noll/Vo) is a fast, consistent, non-cryptographic hash algorithm with good dispersion. (see http://isthe.com/chongo/tech/comp/fnv/#FNV-1a)
    /// </summary>
    private static int GetFNV1aHashCode(string str)
    {
        if (str == null)
            return 0;
        var length = str.Length;
        // original FNV-1a has 32 bit offset_basis = 2166136261 but length gives a bit better dispersion (2%) for our case where all the strings are equal length, for example: "3EC0FFFF01ECD9C4001B01E2A707"
        int hash = length;
        for (int i = 0; i != length; ++i)
            hash = unchecked((hash ^ str[i]) * 16777619); // FNV prime; overflow is intended
        return hash;
    }

This implementation may be slower than the unsafe one posted earlier, but it is much simpler and safe.
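For comparison, the reference 32-bit FNV-1a mentioned in the code comment starts from the offset basis 2166136261 (0x811C9DC5) and is defined over bytes rather than UTF-16 chars. The sketch below assumes the string is first encoded as UTF-8 (`Fnv1a.Hash32` is a hypothetical name) and returns a `uint` so that its output can be checked against the published FNV test vectors.

```csharp
using System.Text;

public static class Fnv1a
{
    private const uint OffsetBasis = 2166136261; // 0x811C9DC5
    private const uint Prime = 16777619;         // 0x01000193

    // Reference FNV-1a (32-bit) over the UTF-8 bytes of the string.
    public static uint Hash32(string s)
    {
        uint hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            hash ^= b;                      // XOR the byte in first...
            hash = unchecked(hash * Prime); // ...then multiply: the "1a" ordering
        }
        return hash;
    }
}
```

Hashing UTF-8 bytes keeps the result identical for ASCII input to what other languages' FNV-1a implementations produce; the answer's version hashes chars directly and seeds with the length, which is fine as long as every reader and writer of the stored values uses the same variant.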

#5


0  

I wonder whether there are differences between 32-bit and 64-bit operating systems, because I am certain that my server and my home computer are both running the same version of .NET.

I was always wary of using GetHashCode(); it might be a good idea for me to simply roll my own hash algorithm. Well, at least I ended up writing a quick re-index .aspx page because of it.

#6


0  

Are you running Windows Server 2008 x86 on your desktop? I ask because Windows Server 2008 includes .NET version 2.0.50727.1434, which is an updated build compared with the 2.0 runtime included in Vista RTM.
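Rather than guessing, one way to compare the two machines is to log the runtime version and the process bitness on each. A minimal sketch, with `RuntimeInfo.Describe` as a hypothetical helper; it uses `IntPtr.Size` instead of `Environment.Is64BitProcess` because the latter only exists from .NET Framework 4.0 onwards.

```csharp
using System;

public static class RuntimeInfo
{
    // Reports the two things that can change String.GetHashCode results between
    // machines: the runtime version and whether the process is 32-bit or 64-bit.
    public static string Describe()
    {
        int bits = IntPtr.Size * 8; // 4-byte pointers => 32-bit, 8-byte => 64-bit
        return "CLR " + Environment.Version + ", " + bits + "-bit process";
    }
}
```

What matters is the bitness of the process, not of the operating system: a 32-bit process on 64-bit Windows runs on the x86 framework.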

#7


0  

What we did notice, however, is that when objects are in a hashed collection (a Hashtable, a Dictionary, etc.) and two distinct objects share the same hash code, the hash code is only used for the first-pass lookup; when hash codes are not unique, the equality operator is always used as a fallback to determine equality.

This is the way hash lookups work, right? Each bucket contains a list of items having the same hash code.

So to find the correct item under these conditions, a linear search using value-equality comparison takes place.

And if your hashing implementation achieves good distribution, this search is not required, i.e., one item per bucket.

Is my understanding correct?
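The lookup described above can be sketched as a separate-chaining table. One refinement: a bucket holds the entries whose hash codes map to it (typically the hash code modulo the bucket count), so entries with different hash codes can share a bucket. `ChainedMap` is a hypothetical type for illustration only; it is not how `Dictionary<TKey, TValue>` is actually laid out internally.

```csharp
using System.Collections.Generic;

// Hypothetical separate-chaining map, for illustration only: the hash code
// chooses a bucket, and Equals finds the matching entry inside that bucket.
public sealed class ChainedMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] _buckets;
    private readonly IEqualityComparer<TKey> _comparer;

    public ChainedMap(int bucketCount, IEqualityComparer<TKey> comparer)
    {
        _buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
        _comparer = comparer ?? EqualityComparer<TKey>.Default;
    }

    // Clear the sign bit so the index is non-negative, then reduce it modulo
    // the bucket count; this is why different hash codes can share a bucket.
    private int BucketIndex(TKey key) =>
        (_comparer.GetHashCode(key) & 0x7FFFFFFF) % _buckets.Length;

    public void Set(TKey key, TValue value)
    {
        int index = BucketIndex(key);
        var bucket = _buckets[index] ?? (_buckets[index] = new List<KeyValuePair<TKey, TValue>>());
        for (int i = 0; i < bucket.Count; i++)
        {
            if (_comparer.Equals(bucket[i].Key, key))
            {
                bucket[i] = new KeyValuePair<TKey, TValue>(key, value); // overwrite existing key
                return;
            }
        }
        bucket.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        var bucket = _buckets[BucketIndex(key)];
        if (bucket != null)
        {
            // Linear search within the bucket, using value equality.
            foreach (var pair in bucket)
            {
                if (_comparer.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

With a single bucket (`bucketCount` of 1) every lookup degenerates into the linear `Equals` scan; with many buckets and a well-distributed hash, most buckets hold at most one entry and the scan is trivial.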

#8


0  

This is not a direct answer to your question, which Jonas has answered well, but it may help if you are worried about equality testing in hashes.

From our tests: depending on what you need hash codes for, hash codes in C# do not need to be unique for equality operations. As an example, consider the following:

We had a requirement to overload the equals operator, and therefore also the GetHashCode function, of our objects, because they had become volatile and stateless and sourced themselves directly from data. In one place in the application we needed to ensure that an object would be viewed as equal to another object if it was sourced from the same data, not just if it was the same reference. Our unique data identifiers are Guids.

The equals operator was easy to implement: we just compared the Guids of the records (after checking for null).

Unfortunately, the hash code is an int, which in C# is always 32 bits (System.Int32) regardless of the operating system, while a Guid is 128 bits. Mathematically, when we override the GetHashCode function, it is therefore impossible to generate a unique hash code for every Guid (look at it from the converse: how would you translate a 32-bit integer back into a Guid?).
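The approach described above can be sketched as follows; `DataRecord` and its `Id` property are hypothetical stand-ins for the poster's objects. `Guid.GetHashCode()` folds the 128-bit value into 32 bits, so two different Guids can share a hash code, which is acceptable because `Equals` makes the final decision.

```csharp
using System;

public sealed class DataRecord : IEquatable<DataRecord>
{
    public DataRecord(Guid id) { Id = id; }

    public Guid Id { get; }

    // Equal when sourced from the same data (same Id), even if the two
    // objects are different references.
    public bool Equals(DataRecord other)
    {
        // ReferenceEquals avoids recursing into the overloaded == operator below.
        return !ReferenceEquals(other, null) && Id == other.Id;
    }

    public override bool Equals(object obj) => Equals(obj as DataRecord);

    // Must agree with Equals: equal Ids give equal hash codes. Distinct Ids may
    // still collide, because 128 bits cannot be packed into 32 without loss.
    public override int GetHashCode() => Id.GetHashCode();

    public static bool operator ==(DataRecord left, DataRecord right) =>
        ReferenceEquals(left, right) || (!ReferenceEquals(left, null) && left.Equals(right));

    public static bool operator !=(DataRecord left, DataRecord right) => !(left == right);
}
```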

We then ran some tests where we took the Guid as a string and returned the hash code of that string; in our tests this almost always produced a unique value, but not always.
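"Almost always unique, but not always" is what the birthday bound predicts. Assuming, as an idealization, that the hash codes behave like independent, uniformly distributed 32-bit values, the probability of at least one collision among n keys is approximately:

```latex
P(\text{at least one collision}) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{32}}\right)
```

For n = 10,000 keys this gives about 1.2%, and it reaches roughly 50% at n ≈ 77,000, so with a realistic number of records some collisions should be expected even from a well-distributed hash.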

What we did notice, however, is that when objects are in a hashed collection (a Hashtable, a Dictionary, etc.) and two distinct objects share the same hash code, the hash code is only used for the first-pass lookup; when hash codes are not unique, the equality operator is always used as a fallback to determine equality.

As I said, this may or may not be relevant to your situation, but if it is, it's a handy tip.

UPDATE

To demonstrate, we have a Hashtable:

Key: Object A (hash code 1), value: Object A1

Key: Object B (hash code 1), value: Object B1

Key: Object C (hash code 1), value: Object C1

Key: Object D (hash code 2), value: Object D1

Key: Object E (hash code 3), value: Object E1

When I look up the key Object A in the hashtable, A1 is returned after two steps: a lookup by hash code 1, then an equality check on the key objects, because hash code 1 is shared by several keys (A, B and C).

When I look up the key Object D, D1 is returned effectively in one step: hash code 2 leads straight to a single candidate key, so only one equality check is needed to confirm the match.
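The same behavior can be reproduced with `Dictionary<TKey, TValue>`. In this sketch, `CollidingKey` and `CollisionDemo` are hypothetical types; the key's hash code is set by hand so that A, B and C all share hash code 1, as in the table above, while equality is decided by the key's name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: equality is by Name, but the hash code is chosen by
// hand so that several distinct keys deliberately collide.
public sealed class CollidingKey : IEquatable<CollidingKey>
{
    private readonly int _hash;

    public CollidingKey(string name, int hash)
    {
        Name = name;
        _hash = hash;
    }

    public string Name { get; }

    public bool Equals(CollidingKey other) => other != null && Name == other.Name;
    public override bool Equals(object obj) => Equals(obj as CollidingKey);
    public override int GetHashCode() => _hash;
}

public static class CollisionDemo
{
    // Builds the table from the example: A, B and C share hash code 1.
    public static Dictionary<CollidingKey, string> Build()
    {
        return new Dictionary<CollidingKey, string>
        {
            { new CollidingKey("A", 1), "A1" },
            { new CollidingKey("B", 1), "B1" },
            { new CollidingKey("C", 1), "C1" },
            { new CollidingKey("D", 2), "D1" },
            { new CollidingKey("E", 3), "E1" },
        };
    }
}
```

Lookups with fresh key instances such as `new CollidingKey("A", 1)` still find `"A1"`, because the dictionary falls back to `Equals` among the colliding keys. The hash code must still be consistent with `Equals`, though: a `new CollidingKey("A", 2)` would not be found, since the hash code decides where to look before `Equals` is consulted.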

#9


-1  

I would have to say... you cannot rely on it. For example, if I run file1 through C#'s MD5 hash code and then copy and paste the same file to a new directory, the hash code comes out different even though it is the same file. Obviously it's the same .NET version, same everything. The only thing that changed was the path.
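For reference, a sketch that hashes only a file's contents (the bytes read from the stream) is below; the path just locates the bytes and is not part of the hash input, so two byte-identical copies in different directories are expected to produce the same digest. If they do not, something other than the contents (such as the path string) is likely being hashed. `FileDigest.Md5Hex` is a hypothetical helper name, and MD5 is used only because the answer mentions it; it is not suitable for security purposes.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileDigest
{
    // MD5 of the file's contents only, returned as lowercase hex.
    public static string Md5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] digest = md5.ComputeHash(stream);
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```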
