What alternatives to the Hash-Key function are there for a DB table key?

Date: 2022-10-08 18:23:26

We are using a Hash-Key function on one of our source tables to create a unique identifier key, but the Hash-Key function has some limitations with respect to 32-bit integers. We tried MD5 instead, but we don't want to use a char-based key for char-based data.
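
The question doesn't spell out what the 32-bit limitation is. Assuming it is the size of the output range (a 32-bit hash has at most 2^32 distinct values), a birthday-bound estimate shows how quickly collisions become likely compared with a 128-bit hash such as MD5. This is a back-of-the-envelope sketch that is not part of the original thread; `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(n_rows: int, bits: int) -> float:
    """Approximate chance that at least two of n_rows keys collide, assuming
    each key is a uniformly distributed `bits`-bit hash (birthday bound)."""
    space = 2.0 ** bits
    # P(no collision) ~ exp(-n(n-1) / (2 * space)); expm1 keeps tiny values precise.
    return -math.expm1(-n_rows * (n_rows - 1) / (2.0 * space))


for rows in (10_000, 100_000, 1_000_000):
    p32 = collision_probability(rows, 32)
    p128 = collision_probability(rows, 128)
    print(f"{rows:>9} rows: 32-bit {p32:.4f}, 128-bit {p128:.2e}")
```

With a 32-bit key the chance of at least one collision reaches about 50% at roughly 77,000 rows, while a 128-bit digest stays negligible at any realistic table size.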

1 answer

#1


You might find this question I asked interesting for further reading. One of its answers links to this MySQL documentation page, which suggests using a VARBINARY column for strings of arbitrary byte values. You haven't tagged your question with a database, so I will phrase the rest of this answer in terms of MySQL; hopefully it isn't too hard to translate to your RDBMS of choice.

Many encryption and compression functions return strings for which the result might contain arbitrary byte values. If you want to store these results, use a column with a VARBINARY or BLOB binary string data type. This will avoid potential problems with trailing space removal or character set conversion that would change data values, such as may occur if you use a nonbinary string data type (CHAR, VARCHAR, TEXT).
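
To see why the documentation steers you away from nonbinary types, here is a simplified Python model (not MySQL itself) of two things a UTF-8 text column can do to a raw digest: fail to hold bytes that are not valid in its character set, and strip trailing spaces, as MySQL does by default when it retrieves CHAR values. The helper name and the sample byte strings are illustrative assumptions.

```python
import hashlib


def survives_text_column(digest: bytes) -> bool:
    """Simplified model of storing raw bytes in a UTF-8 CHAR column.

    Returns False if the bytes are not valid UTF-8 (the column could not hold
    them unchanged) or if stripping trailing spaces, as CHAR retrieval does,
    would alter the value.
    """
    try:
        text = digest.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return text.rstrip(" ").encode("utf-8") == digest


print(survives_text_column(b"\xff\x00\x10"))  # False: 0xFF is never valid UTF-8
print(survives_text_column(b"abc "))          # False: trailing space is lost
print(survives_text_column(b"abc"))           # True

# How many of 1,000 real MD5 digests would this model damage?
damaged = sum(
    not survives_text_column(hashlib.md5(str(i).encode()).digest())
    for i in range(1000)
)
print(damaged, "of 1000")
```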

A hash function's output is basically a very long number. You frequently see it as a string because many code libraries display it in some encoded format (hexadecimal or Base32). As your question says, putting these values into a nonbinary string field is a bad idea and wastes both space and lookup time. So have your application convert the hash output to binary data (most frequently a byte[]) and store it in a VARBINARY column.
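
A minimal sketch of that advice in Python, with the standard-library sqlite3 module standing in for MySQL: SQLite's BLOB type plays the role of VARBINARY here, and the table, column, and helper names are hypothetical. The point is that hashlib's `digest()` already returns the raw bytes, so there is no need to go through the hex string at all.

```python
import hashlib
import sqlite3


def hash_key(natural_key: str) -> bytes:
    """Return the raw 16-byte MD5 digest of a natural key (binary, not hex text)."""
    return hashlib.md5(natural_key.encode("utf-8")).digest()


# SQLite's BLOB stands in for MySQL's VARBINARY; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_rows (row_key BLOB PRIMARY KEY, payload TEXT)")

for name in ("alice", "bob", "carol"):
    conn.execute("INSERT INTO source_rows VALUES (?, ?)", (hash_key(name), name))

key = hash_key("bob")
print(len(key), len(key.hex()))  # 16 32: the binary key is half the size of hex text

# Look the row up by its binary key; bytes parameters bind as BLOBs.
row = conn.execute(
    "SELECT payload FROM source_rows WHERE row_key = ?", (key,)
).fetchone()
print(row[0])  # bob
```

In MySQL itself the same 16 bytes can be produced server-side with UNHEX(MD5(...)), and because an MD5 digest always has the same length, a BINARY(16) column should work as well as VARBINARY.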

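Another option is to leave the hash as a string but encode it in Base32, which carries 5 bits per character instead of hexadecimal's 4. That is 25% more information per character, so the encoded value is 20% shorter: a 128-bit MD5 digest takes 26 Base32 characters (with the trailing = padding dropped) instead of 32 hex characters. The chief advantage of this is that the strings remain human readable and can be transmitted over common protocols without further encoding. This makes it easier to match your database to the web-visible data, which can save a great deal of development and debugging time. Then set the column to use a _bin (binary) collation, which speeds up comparisons because values are compared byte by byte. The cost is that comparisons become case sensitive, which is harmless here because standard Base32 output uses only upper-case letters and the digits 2-7.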
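
The length arithmetic can be checked with Python's standard base64 module. One detail worth knowing: `b32encode` pads its output with = to a multiple of 8 characters, so a 16-byte digest still comes out as 32 characters unless the padding is stripped before storage (and re-added before decoding). The helper names below are illustrative, not from the original answer.

```python
import base64
import hashlib


def digest_to_b32(digest: bytes) -> str:
    """Encode a digest as upper-case Base32 without the trailing '=' padding."""
    return base64.b32encode(digest).decode("ascii").rstrip("=")


def b32_to_digest(text: str) -> bytes:
    """Re-add the stripped padding (up to a multiple of 8 chars) and decode."""
    return base64.b32decode(text + "=" * (-len(text) % 8))


digest = hashlib.md5(b"example natural key").digest()  # 16 bytes = 128 bits
hex_text = digest.hex()            # 4 bits per character
b32_text = digest_to_b32(digest)   # 5 bits per character

print(len(hex_text), len(b32_text))  # 32 26
assert b32_to_digest(b32_text) == digest  # round trip is lossless
```

A matching MySQL column could then be declared along the lines of CHAR(26) with the ascii_bin collation; treat that exact declaration as a suggestion rather than something stated in the original answer.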

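Note that you can't rely on this with Base64 encoding (6 bits per character), because Base64 output is itself case sensitive: two different hashes can encode to strings that differ only in letter case, so any case-insensitive comparison or case-folding along the way would make them collide.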
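
A short demonstration of the difference, again with Python's base64 module. A fixed 16-byte value stands in for a hash digest so that the printed output is predictable; `b32decode`'s `casefold=True` flag accepts lower-case input, while Base64 has no equivalent because letter case carries information.

```python
import base64

# A fixed 16-byte value stands in for a hash digest so the output is predictable.
digest = bytes(range(16))

b32 = base64.b32encode(digest).decode("ascii")
b64 = base64.b64encode(digest).decode("ascii")
print(b64)  # AAECAwQFBgcICQoLDA0ODw==

# Base32 uses a single letter case, so case-folding loses nothing.
assert base64.b32decode(b32.lower(), casefold=True) == digest

# Base64 does not: an upper-cased copy decodes to a *different* value, yet a
# case-insensitive comparison would consider the two strings equal.
other = base64.b64decode(b64.upper())
print(other != digest, b64.lower() == b64.upper().lower())  # True True
```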
