Skip to the question if you're not interested in the story.
After the unfortunate loss of 2 disks in a 4-disk RAID 5 array, I resorted to some voodoo to get back as much data as I could.
The first drive showing signs of weakness was replaced, and during the rebuild (~80% through) the second drive failed on some dead sectors.
Long story short, I now have two drives (2 and 4) in a consistent state. One (3) is synced but has bad sectors in the middle. The last (1) is only partly synced due to the interrupted rebuild. The disks are 1.5 TB each, for a 4.1 TB array.
After trying all the read-only voodoo on the 234, 124, and 1234 configurations, I've been able to get a good portion of the important data. (Which is only 100 GB out of the 4.1 TB total.)
Now the next step is to try some filesystem (reiserfs) rebuilds to see whether I can recover more data. These operations are destructive, and I have three working disk setups in different states to try.
So, to the question:
Is there any way to make a snapshot of some kind of the md block device and work on that without altering the md device itself?
Some points:
- Performance does not matter; very slow is acceptable.
- I have 2 × 2 TB of storage on external temporary drives, usable for the "changelog" of the snapshot.
- I do not have enough storage to copy the whole device (md) somewhere else and build an LVM volume from it. (Nor the space to image each drive separately.)
- The snapshot doesn't need to be durable; it will be deleted after data recovery (if any).

Well, I think it is clear: I only need to do stuff on the read-only md and then throw the changes away.
Any ideas?
Thanks!
2 Answers
#1
11
Use the device-mapper snapshot target. Just be warned that it will not mask I/O errors from the underlying bad disk(s) so this is best suited to good disks with corrupted filesystems.
tl;dr - Skip the following three paragraphs of my backstory.
The most recent incident I dealt with also involved a RAID5 with 4 disks, but in a USB enclosure. It was formatted with NTFS and, ironically, held a 640GB disk image that had been recovered from a failing laptop disk using gddrescue, during which the box reported a disk failure 300GB through. I did not perform the ddrescue, so the bad laptop disk was sent in for a replacement before I was asked to help.
I arrived and had to find a way to retrieve as much of the image file as possible in the limited time that I had access to the RAID box. (It was borrowed, and I was visiting from out of town.) The enclosure had a flaw where, upon a power cycle, it forgot about the disk failure, so the RAID probably operated out of sync for days, silently corrupting the NTFS; thus ntfs-3g refused to mount it. I managed to recover 300GB and no more; however, that was sufficient to recover many otherwise lost files contained in the image. (I ran testdisk, scrounge-ntfs, and ntfsundelete, but I chose not to use photorec.) I ended up using testdisk to read the image file out of the NTFS, but I also tried things like using testdisk to repair the NTFS enough to make ntfs-3g cooperate, and even running chkdsk in VirtualBox, which only managed to truncate the image to zero bytes.
I found it extremely valuable to try several mutually exclusive destructive methods in order to find the best solution.
The device-mapper snapshot target makes use of the dm-snapshot kernel module which performs copy-on-write on a block level. In my steps, I will operate on the failing disk /dev/failing. You will need to supply a block device large enough to store your changes that I will call /dev/cow. It is important that you do not reuse the snapshot exception store for other copy-on-write devices that you create.
# Make it much harder to accidentally overwrite anything
# Run on all partition sub-devices as well, if applicable
1. blockdev --setro /dev/failing
# Create /dev/mapper/top
2. echo 0 `blockdev --getsz /dev/failing` snapshot /dev/failing /dev/cow p 4 | dmsetup create top
# Manipulate /dev/mapper/top as you wish
# Tear-down
3. dmsetup remove top
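Before tearing down, it's worth keeping an eye on how full the COW store is, since the snapshot becomes invalid once it overflows. `dmsetup status top` reports, for a snapshot target, the allocated/total sectors as its fourth field; a sketch of computing the usage from that (the status line below is a fabricated example, not output from this array):

```shell
# Parse a `dmsetup status` line for a snapshot target; after the word
# "snapshot" come <allocated>/<total> and <metadata> sector counts.
# This status line is a made-up example for illustration only.
status="0 2930277168 snapshot 16/4194304 16"
usage=$(echo "$status" | awk '{split($4, a, "/"); printf "%.4f", a[1] / a[2] * 100}')
echo "COW store ${usage}% full"
```

With the example line above this prints a usage just above zero; on a real snapshot, re-run it periodically while your recovery tools write to /dev/mapper/top.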
I provide two alternatives to creating /dev/cow:
A. Using a sparse file
# Create a sparse file
1. dd if=/dev/zero bs=1048576 count=0 seek=size_in_MB of=tempfile
# Print name of next unused loop device
2. losetup -f
# Associate the file with a loop device
3. losetup -f tempfile
# Use as /dev/cow
# Use the name from #2 here
4. losetup -d /dev/loopX
5. rm tempfile
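The dd invocation in step 1 writes nothing (count=0) and only seeks to set the length, so the file's apparent size equals size_in_MB while its on-disk usage starts near zero and grows only as the snapshot copies blocks in. A quick sanity check, using /tmp/sparse-demo as a throwaway name:

```shell
# Create a 1 GiB sparse file: count=0 writes no data, seek=1024 sets the length
dd if=/dev/zero bs=1048576 count=0 seek=1024 of=/tmp/sparse-demo 2>/dev/null
stat -c %s /tmp/sparse-demo   # apparent size in bytes: 1073741824
du -k /tmp/sparse-demo        # actual disk usage: roughly 0
rm /tmp/sparse-demo
```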
B. Using the zram kernel module (see documentation if adapting to ramzswap or compcache!)
# Create 4 of them - zram0-3 (you may run into a need for more than one)
1. modprobe zram num_devices=4
# Set size
2. echo $((1048576*size_in_MB)) > /sys/block/zram0/disksize
# Associate with a loop device (dmsetup will fail with zramX but not loopX!)
3. losetup -f
4. losetup -f /dev/zram0
# Use as /dev/cow
# Use the name from #3 here
5. losetup -d /dev/loopX
6. echo 1 > /sys/block/zram0/reset
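As a worked example of the sizing arithmetic in step 2 (2048 MB here is an arbitrary illustrative size), disksize takes a byte count:

```shell
# Convert megabytes to the byte value expected by /sys/block/zram0/disksize
size_in_MB=2048
echo $((1048576 * size_in_MB))   # 2147483648 bytes, i.e. 2 GiB
```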
In my time-limited situation I needed to copy the 300GB image somewhere but I did not have the space for it, so I compressed it (to 25GB).
If you ever need to store a compressed read-only copy of a block device for later use without creating intermediate files, I suggest using squashfs. Break up the device into 4GB chunks using (un)chunkfs (requires FUSE) and run mksquashfs on each chunk individually. That way it can be stored on FAT32 volumes, or on NTFS without high CPU usage from ntfs-3g creating large files. I recommend checksumming the resulting files, and maybe try par2 if you want to add redundancy.
In order to reassemble the device content, you will most likely need more than the default 8 loop devices. To get more, run modprobe loop max_loop=2048, or if the loop driver is compiled into your kernel, add max_loop=2048 to your kernel command line. Mount each squashfs and associate the files within to loop devices. Finally, use dmsetup to concatenate them using the linear target. (Read man dmsetup and preferably remember the -r switch; otherwise writes will be dropped silently instead of failing immediately.)
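The table that dmsetup expects for the linear target is one line per piece, of the form `<start> <length> linear <device> <offset>`, with the start offsets accumulating. A sketch that generates such a table for four hypothetical chunk loop devices (the sector counts are illustrative: three 4 GiB chunks plus a smaller tail; in a real run you would get each length with `blockdev --getsz`):

```shell
# Build a device-mapper "linear" table concatenating chunk devices.
# Sizes are in 512-byte sectors (4 GiB = 8388608 sectors); the loop
# device names and sizes here are assumptions for illustration.
sizes="8388608 8388608 8388608 2097152"
offset=0; i=0
for len in $sizes; do
  echo "${offset} ${len} linear /dev/loop${i} 0"
  offset=$((offset + len)); i=$((i + 1))
done
# Feed the resulting table to: dmsetup create -r reassembled
```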
#2
0
If you have enough space on some other storage, I'd just image the drives with dd.