MySQL: Loading SHA1 hashes into a BINARY(20) column

Time: 2022-10-16 00:31:10

I'm going to be loading a billion rows into a MySQL table, one column of which, a BINARY(20), is the SHA1 hash of several other columns, concatenated. Offhand I don't see how to use the LOAD command to load binary values, because it seems to rely on delimiters.

Obviously, speed is important here, which is why I want to use LOAD. Does anyone know how to load a fixed-length binary value with LOAD? Is this perhaps a job for a trigger? (I've never used triggers before.) Or can I invoke a function (e.g. UNHEX) in the LOAD command?

(Since it seems to be a common question: no, I don't want to store it in base64 or hex notation. BINARY(20) is a requirement.)

2 answers

#1


Binary data and LOAD DATA INFILE are not friends. The file format specifiers need a delimiter, and arbitrary binary data is length delimited, not field delimited.

Your best bet is to use large multi-row INSERT statements and tough it out. These can take hex-encoded strings and decode them into BINARY columns, either written as hex literals (X'...') or passed through UNHEX().
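
As a rough sketch of what such a statement might look like (the table hash_demo and its columns are hypothetical names invented for this illustration, not part of the question), a multi-row INSERT can decode hex either with UNHEX() or with a hex literal:

```sql
-- Hypothetical table: one BINARY(20) hash column plus the text it was computed from.
CREATE TABLE hash_demo (
  digest  BINARY(20)  NOT NULL,
  payload VARCHAR(60) NOT NULL
);

-- Multi-row INSERT: UNHEX() turns each 40-character hex string from SHA1() into 20 raw bytes.
INSERT INTO hash_demo (digest, payload) VALUES
  (UNHEX(SHA1('abel|baker|charlie')), 'abel|baker|charlie'),
  (UNHEX(SHA1('dog|easy|fox')),       'dog|easy|fox');

-- A hash computed outside MySQL can be supplied as a hex literal.
-- This particular value is the SHA1 of the empty string, so the payload is ''.
INSERT INTO hash_demo (digest, payload) VALUES
  (X'da39a3ee5e6b4b0d3255bfef95601890afd80709', '');

-- Every stored value should be exactly 20 bytes long.
SELECT payload, HEX(digest) AS digest_hex, LENGTH(digest) AS bytes
FROM hash_demo;
```

How many rows to pack into each statement is a tuning question; the server's max_allowed_packet setting puts an upper bound on the size of a single statement.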

I'm not sure why anyone would wish this misery upon themselves, though. Saving twenty bytes a row versus standard hex notation is not worth the trouble.

If you really need to load in kajillions of rows, maybe MySQL is not the best platform to do it on. What you should be doing is either sharding that data into multiple tables or databases, or using a NoSQL store to split it up more effectively.

#2


This seems to be a reasonable approach: use the SET form of LOAD, reading fields into variables and invoking functions such as UNHEX and CONCAT_WS.

For example:

Suppose mytable has four columns:

mysha1  BINARY(20)
a       VARCHAR(20)
b       VARCHAR(20)
c       VARCHAR(20)

Column mysha1 is the SHA1 hash of a, b, and c, concatenated with '|' as a separator.

And suppose the input file is tab-delimited text lines of three fields apiece:

abel\tbaker\tcharlie\n
dog\teasy\tfox\n
etc\tetc\tetc\n

Here's how I'd load the table:

LOAD DATA INFILE '/foo/bar/input.txt' INTO TABLE mytable 
FIELDS TERMINATED BY '\t' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
(@f1, @f2, @f3) SET mysha1 = UNHEX(SHA1(CONCAT_WS('|', @f1, @f2, @f3))), 
a=@f1, b=@f2, c=@f3;
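
After a load like this, it may be worth checking that the stored hashes match what the columns would produce. The queries below are only a sketch of one way to do that, reusing the table and column names from this answer; the aliases are made up for the example:

```sql
-- Count rows whose stored hash differs from one recomputed from a, b and c.
-- A result of 0 suggests the load wrote what was intended.
SELECT COUNT(*) AS mismatches
FROM mytable
WHERE mysha1 <> UNHEX(SHA1(CONCAT_WS('|', a, b, c)));

-- Spot-check a few rows in readable form; bytes should be 20 for every row.
SELECT HEX(mysha1) AS sha1_hex, LENGTH(mysha1) AS bytes, a, b, c
FROM mytable
LIMIT 5;
```

A nonzero count does not necessarily mean the load failed. Values truncated to fit VARCHAR(20), character set conversion between the file and the columns, or NULL fields (which CONCAT_WS skips) could each make the recomputed hash differ from the one taken over the raw input.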

UPDATE: In the general case, where the binary value can't be computed with a built-in function such as SHA1, it must appear in the input file as a printable hex string, be read into a user variable (@variable), and then be converted to binary with the UNHEX function. E.g.:

mytable:

mybin8    BINARY(8)
a         VARCHAR(20)
b         VARCHAR(20)
c         VARCHAR(20)

input file:

abel\tbaker\tcharlie\t0123456789abcdef\n
dog\teasy\tfox\t02468ace13579bdf\n
etc\tetc\tetc\t0000000000000000\n

load command:

LOAD DATA INFILE '/foo/bar/input.txt' INTO TABLE mytable 
FIELDS TERMINATED BY '\t' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
(a, b, c, @myhex) SET mybin8 = UNHEX(@myhex);
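
Malformed hex in the input file is the main thing that can go wrong with this form. The statements below are a small sketch of how UNHEX behaves and how bad rows might be found afterwards; the last query assumes mybin8 is declared nullable, which the table description above does not state:

```sql
-- Well-formed input: 16 hex digits decode to 8 bytes.
SELECT LENGTH(UNHEX('0123456789abcdef')) AS decoded_bytes;

-- Malformed input: UNHEX() returns NULL when its argument contains
-- a non-hexadecimal character.
SELECT UNHEX('xyz') IS NULL AS is_null;

-- After the load, list rows whose hex field failed to decode
-- (assumption: mybin8 allows NULL).
SELECT a, b, c
FROM mytable
WHERE mybin8 IS NULL;
```

A hex field that is valid but too short is harder to spot after the fact, because BINARY(8) right-pads short values with 0x00 bytes; checking that every hex field is exactly 16 characters before loading is the safer route.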
