How do you protect file data from disk corruption?

Date: 2021-07-04 07:26:04

Recently, I read an article entitled "SATA vs. SCSI reliability". It mostly discusses the very high rate of bit flipping in consumer SATA drives and concludes, "A 56% chance that you can't read all the data from a particular disk now". Even RAID-5 can't save us, since the array must be constantly scanned for problems, and if a disk does die you are pretty much guaranteed to end up with some flipped bits in your rebuilt file system.

Considerations:

I've heard great things about Sun's ZFS with RAID-Z, but the Linux and BSD implementations are still experimental. I'm not sure it's ready for prime time yet.

I've also read quite a bit about the Par2 file format. It seems like storing a few percent of extra parity data alongside each file would let you recover from most problems. However, I'm not aware of a file system that does this internally, and it seems like it could be hard to manage the separate files.

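The Par2 idea can be sketched in a few lines: keep a checksum per block to *locate* damage, plus parity to *repair* it. The toy below is my own illustration (real Par2 uses Reed-Solomon coding, not plain XOR); it stores one XOR parity block, which is enough to rebuild any single corrupted block once a checksum pinpoints it:

```python
import functools
import hashlib

BLOCK = 4  # toy block size; real parity tools use much larger blocks

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def protect(data: bytes):
    """Split data into zero-padded blocks; return blocks, per-block hashes, and one XOR parity block."""
    blocks = [data[i:i + BLOCK].ljust(BLOCK, b"\0") for i in range(0, len(data), BLOCK)]
    hashes = [hashlib.sha256(b).digest() for b in blocks]
    parity = functools.reduce(xor, blocks)
    return blocks, hashes, parity

def repair(blocks, hashes, parity):
    """Locate at most one corrupted block via its checksum and rebuild it from the rest."""
    bad = [i for i, b in enumerate(blocks) if hashlib.sha256(b).digest() != hashes[i]]
    if not bad:
        return blocks
    assert len(bad) == 1, "a single parity block can only repair one damaged block"
    i = bad[0]
    others = [b for j, b in enumerate(blocks) if j != i]
    blocks[i] = functools.reduce(xor, others + [parity])
    return blocks

blocks, hashes, parity = protect(b"hello, disk corruption!")
blocks[1] = b"\xff" * BLOCK  # simulate flipped bits in one block
fixed = b"".join(repair(blocks, hashes, parity)).rstrip(b"\0")
print(fixed)  # b'hello, disk corruption!'
```

Par2 generalizes this so that N parity blocks can repair any N damaged blocks, which is why a few percent of parity covers most realistic corruption.
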
Backups (Edit):

I understand that backups are paramount. However, without some kind of check in place, you could easily be sending bad data to people without even knowing it. Also, figuring out which backup has a good copy of that data could be difficult.

For instance, say you have a RAID-5 array that has been running for a year and you find a corrupted file. Now you have to go back through your backups until you find a good copy. Ideally you would go to the first backup that included the file, but that may be difficult to figure out, especially if the file has been edited many times. Even worse, consider what happens if the file was appended to or edited after the corruption occurred. That alone is reason enough for block-level parity such as Par2.

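One way to take the guesswork out of "which backup is good" is to keep a manifest of per-file hashes recorded while the data was known-good, and audit backups against it. A minimal sketch (function names are my own, not any existing tool's):

```python
import hashlib
from pathlib import Path

def manifest(root: Path) -> dict:
    """Record each file's SHA-256 so copies can be audited later."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def find_good_copy(relpath, expected_hash, backups):
    """Return the first backup directory holding an intact copy, or None."""
    for backup in backups:
        candidate = Path(backup) / relpath
        if candidate.is_file():
            if hashlib.sha256(candidate.read_bytes()).hexdigest() == expected_hash:
                return backup
    return None
```

Checking newest-first then stops at the most recent backup whose copy still matches the recorded hash, rather than forcing you to eyeball each restore.
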
3 Answers

#1

ZFS is a start. Many storage vendors also offer 520-byte-sector drives, which reserve 8 extra bytes per sector for data-protection metadata. However, this only protects your data once it enters the storage fabric. If it was corrupted at the host level, then you are hosed anyway.

On the horizon are some promising standards-based solutions to this very problem: end-to-end data protection.

Consider T10 DIF (Data Integrity Field). This is an emerging standard (it was drafted 5 years ago) and a new technology, but it has the lofty goal of solving the problem of data corruption.

#2

That article significantly exaggerates the problem by misunderstanding its source. It assumes that data-loss events are independent, i.e., that if I take a thousand disks and get five hundred errors, that's likely to be one error each on five hundred of the disks. But in fact, as anyone who has had disk trouble knows, it's probably five hundred errors on one disk (still a tiny fraction of that disk's total capacity), while the other nine hundred and ninety-nine disks were fine. So in practice there isn't a 56% chance that you can't read your whole disk; it's probably more like 1% or less, but most of the people in that 1% will find they've lost dozens or hundreds of sectors, even though the disk as a whole hasn't failed.

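To see how a figure like 56% falls out of the independence assumption, here is a back-of-envelope calculation. All the numbers are hypothetical, chosen only to land in the same ballpark as the article:

```python
# Hypothetical inputs, for illustration only.
p = 1e-9      # assumed per-sector unrecoverable-error probability over the period
n = 8e8       # sectors on a ~400 GB disk at 512 bytes/sector

# If sector errors were independent, the chance a given disk has at least
# one bad sector would be 1 - (1 - p)^n -- the article's kind of number:
p_any = 1 - (1 - p) ** n
print(round(p_any, 2))  # 0.55

# But the expected number of errors per disk is only n * p = 0.8. If those
# errors cluster on, say, 1% of disks, each bad disk carries dozens of
# errors and the other 99% of disks are completely clean:
affected_fraction = 0.01
errors_per_bad_disk = (n * p) / affected_fraction
print(round(errors_per_bad_disk))  # 80
```

Same total error count, radically different conclusions: the headline probability depends entirely on whether errors are spread out or clumped.
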
Sure enough, practical experiments reflect this understanding, not the one offered in the article.

Basically this is an example of "Chinese whispers". The article linked here refers to another article, which in turn refers indirectly to a published paper. The paper says that, of course, these events are not independent, but that vital fact disappears in the transition to an easily digested blog format.

#3

A 56% chance I can't read something? I doubt it. I run a mix of RAID 5 and other goodies along with good backup practices, and with RAID 5 and a hot spare I haven't ever had data loss, so I'm not sure what all the fuss is about. If you're storing parity information... well, you're creating a RAID system in software; a disk failure in RAID 5 triggers a parity-based reconstruction to recover the lost disk's data, so that capability is already there.

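For what it's worth, the reconstruction RAID 5 performs can be sketched per stripe as a plain XOR (toy sizes, my own illustration):

```python
import functools

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# A toy RAID-5 stripe: three data "disks" plus one parity "disk".
data_disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = functools.reduce(xor_blocks, data_disks)

# Disk 1 dies. Rebuild its contents from the surviving disks plus parity:
rebuilt = functools.reduce(xor_blocks, [data_disks[0], data_disks[2], parity])
print(rebuilt)  # b'BBBB'
```

Note the question's caveat, though: XOR parity can rebuild a block you *know* is lost, but on its own it can't tell you which block silently flipped; detecting that requires per-block checksums.
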
Run RAID, back up your data, and you'll be fine :)
