I have a website where users can upload files; the files are stored on the server and their metadata is recorded in a database. I'm implementing some simple integrity checks, i.e. "is the content of this file still byte-for-byte identical to what was uploaded?"
An example: for the content of userfile.jpg, the MD5 hash is 39f9031a154dc7ba105eb4f76f1a0fd4 and the SHA-1 hash is 878d8d667721e356bf6646bd2ec21fff50cdd4a9. If this file's content changes, but has the same MD5 hash before and after, is it probable that the SHA-1 hash will also stay the same? (With hashing, sometimes you can get a hash collision - could this happen with two different hashing algorithms at once?)
Or is computing two different hashes for a file pointless (and I should try some other mechanism for verifying integrity)?
Edit: I'm not really worried about accidental corruption, but I'm supposed to prevent users changing the file unnoticed (birthday attack and friends).
I'll probably go with one hash, SHA-512. The checks don't happen often enough to be a performance bottleneck, and anyway, as @MichaelGG put it in the comments: "As Bruce Schneier says, there's enough fast, insecure systems out there already."
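A minimal sketch of the single-hash check described above, using Python's standard hashlib. It assumes the hex digest was computed at upload time and stored with the file's metadata; the function names and the chunk size are illustrative, not part of the original question.

```python
import hashlib
import hmac


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large uploads never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, stored_hex_digest):
    """Compare the file's current digest with the one recorded at upload time."""
    return hmac.compare_digest(sha512_of_file(path), stored_hex_digest)


# Hypothetical usage: `stored` would come from the metadata database.
# if not is_unchanged("uploads/userfile.jpg", stored):
#     print("file content differs from what was uploaded")
```

Note that the check is only as trustworthy as the stored digest: if whoever can modify the file can also rewrite the recorded hash, no choice of algorithm helps.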
6 Answers
#1
MD5 is probably safe for what you're doing, but there's no reason to continue to use a hash with known flaws. In fact, there's no reason not to use SHA256 or SHA512 unless you have a known, major performance bottleneck.
Edit: To clarify, there's no reason to use two algorithms; just use one that fits what you need. If you're worried about people doing MD5 collisions on you (as in, is this a security threat?), then use an algorithm that isn't as weak, such as SHA256.
Edit 2: To address an apparently still common misunderstanding: the effort needed to find a random collision in an n-bit hash is not on the order of 2^n attempts. It's closer to 2^(n/2). So a collision in a 128-bit hash can probably be found with about 2^64 attempts. See birthday attack for details.
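To make the 2^(n/2) figure concrete, here is a small sketch of the standard birthday approximation, p ≈ 1 - exp(-k(k-1)/(2N)), for k random inputs and N = 2^n possible digests. It assumes an ideal hash with uniformly distributed output, which is a modelling assumption and not a property of MD5 specifically; the function name is made up for illustration.

```python
import math


def collision_probability(hash_bits, attempts):
    """Approximate chance that `attempts` random inputs contain at least one
    colliding pair for an ideal `hash_bits`-bit hash (birthday approximation)."""
    space = 2 ** hash_bits
    exponent = -attempts * (attempts - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


print(collision_probability(128, 2 ** 64))  # roughly 0.39
print(collision_probability(128, 2 ** 32))  # tiny, on the order of 1e-20
```

So around 2^64 attempts already give a substantial chance of some collision in a 128-bit hash, far fewer than the 2^128 one might naively expect.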
#2
Checking the MD5 hash by itself is sufficient for most purposes, although if you must, there is no harm in checking the SHA1 as well. Keep in mind that the chance of catching something you would miss with just the MD5 check is extremely remote.
Note that in terms of scalability, the additional check adds unnecessary load on your server.
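If both digests are wanted anyway, one way to limit the extra load is to read the file once and feed each chunk to both hash objects. This is only a sketch with an illustrative function name; it assumes hashlib.md5 is available, which may not be the case on builds restricted to FIPS-approved algorithms.

```python
import hashlib


def md5_and_sha1(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path` using a single read pass."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Each chunk is read from disk once and hashed twice.
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

This saves the second disk read, but the CPU cost of the second hash remains.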
#3
For file integrity (e.g. accidental/random corruption), one hash should suffice. 128 bits = 2^-128 probability of an undetected error, which is for all practical purposes small enough.
For file cryptographic integrity (e.g. assurance that someone hasn't maliciously substituted an alternate file), I think you're talking about a belt-and-suspenders approach.
MD5 is considered "weak" in the sense that it is possible to construct two documents with the same hash with a much lower amount of CPU time needed than it would take for a brute-force search ("collision resistance" of MD5 has been broken).
But it's not (as far as I know) "weak" in the other sense: if you have an arbitrary document X, someone else cannot create a document Y with the same hash much more easily than by brute-force search (MD5 still has "preimage resistance"). (The distinction is like the difference between going to a party and finding two people with the same birthday, vs. finding another person with the same birthday as yours.)
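The party analogy can be put into numbers. The sketch below computes both probabilities exactly, assuming 365 equally likely birthdays; the function names are made up for illustration. Strictly speaking, "someone with the same birthday as yours" corresponds to a second-preimage search, which is the property relevant to swapping out an existing file.

```python
def any_pair_shares_birthday(people, days=365):
    """Chance that at least two of `people` share a birthday (a collision)."""
    all_distinct = 1.0
    for i in range(people):
        all_distinct *= (days - i) / days
    return 1.0 - all_distinct


def someone_shares_my_birthday(others, days=365):
    """Chance that at least one of `others` matches one fixed birthday."""
    return 1.0 - ((days - 1) / days) ** others


# A party of 23 people: you plus 22 others.
print(round(any_pair_shares_birthday(23), 3))    # about 0.507
print(round(someone_shares_my_birthday(22), 3))  # about 0.059
```

Finding any matching pair is far easier than matching one fixed target, which is why broken collision resistance does not by itself imply broken preimage resistance.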
Even if MD5 is broken in that regard, it's improbable that someone can come up with an algorithm to create documents that match both an arbitrary MD5 hash and an arbitrary SHA1 hash.
This sounds kind of like the tension between the two maxims "don't put all your eggs in one basket" vs. "put all your eggs in one basket, and watch the basket". Or like spending money on two deadbolt locks vs. one deadbolt lock which is twice as good and costs twice as much. Ideally it would be best to spend CPU time calculating one secure 256-bit hash instead of two less secure 128-bit hashes using different algorithms. (Yes, I know SHA1 is 160 bits; this is just an illustration.) You're more likely to get better performance this way for a desired level of security, provided the 256-bit hash isn't broken. If it is broken, you may be better off with the two-algorithm approach just to hedge your bets.
But again if this is just integrity to protect against errors, one MD5 hash is fine.
edit: to cite some useful sources: 1 2 3, "MD5 considered harmful today", RFC4270, NIST's latest update on the SHA-3 competition, and "The SHA-3 Zoo".
#4
In general, if the MD5 hashes don't match, the SHA1 (or any other similar hash) won't match either. I'm not going to say it's impossible for a change to slip past both (we all know collisions exist in both algorithms), but I would say it will probably never happen in your situation.
My thoughts are that providing one hash is probably sufficient; more than one hash becomes arduous to verify (having to verify one is bad enough, depending upon available utilities for the platform), and I seriously doubt you're going to see such amazing corruption of a file as to lead to a perfect collision.
Note: Ignore the stuff about verification being a pain; upon re-reading the question, I revised this - I took the original meaning to be hash verification for users downloading the file. If, of course, that is what was meant, then what I said still applies, I think.
#5
Because the two hashes are calculated differently, two files with the same MD5 hash are no more likely to have the same SHA-1 hash than two random files. If your chance of a random collision with either hash is (ballpark) 1 in 2^128, your chance of a random collision in both will be 1 in 2^256.
In effect, you go from extremely low to extremely, extremely low.
It's the equivalent of going from 128-bit to 256-bit encryption in order to avoid having someone randomly guess your 128-bit key.
#6
As a rough estimate, the chance of an MD5 false positive is 1/(2^128) and the chance of a SHA-1 false positive is 1/(2^160), so the chance of a false positive for both algorithms is between 1/(2^128) and 1/(2^288), but you can be pretty sure that it is near 1/(2^288), as both algorithms have been thoroughly tested statistically.
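The 1/(2^288) figure is just the product of the two individual probabilities, which holds only if the two hashes behave independently. A short sketch with exact fractions (the function name is illustrative) confirms the arithmetic for accidental matches; it says nothing about deliberately constructed collisions.

```python
from fractions import Fraction


def both_match_probability(bits_a, bits_b):
    """Chance that two independent ideal hashes of the given sizes both
    report a match for one pair of different files."""
    return Fraction(1, 2 ** bits_a) * Fraction(1, 2 ** bits_b)


# MD5 (128 bits) and SHA-1 (160 bits) under the independence assumption.
print(both_match_probability(128, 160) == Fraction(1, 2 ** 288))  # True
```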
At least, when using two different hashes, you are protected very well against intentional attacks at one of the algorithms.
EDIT: After some research, I stumbled upon a Wikipedia note that MD5 birthday attacks can be done in under 1 minute, so it seems better to use an algorithm other than MD5 together with SHA-1 here. Birthday attacks on SHA-1 take 2^69 operations at the moment.