Recovery of corrupt XFS partition.
Posted on 2016-05-10 10:59:53
by Geert Vandeweyer
We had a large XFS partition that became corrupt after a memory hardware failure. It contained 45 TB of data and we had no backups, so we were eager to retrieve the data in some way.
The situation could be summarized as follows:
- hardware RAID5 was fine, all disks were fine ( tw_cli /c0 show all )
- LVM2 diagnostics showed no problems
- mounting possible, but all commands on the partition gave I/O errors
- XFS filesystem was located at : /dev/mapper/vg_data-lv_data0
1. xfs_check
[root@fs01]# xfs_check /dev/mapper/vg_data-lv_data0
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
2. xfs_repair
[root@fs01]# xfs_repair /dev/mapper/vg_data-lv_data0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
So both xfs_check and xfs_repair refuse to run because of the possibly corrupt log of the XFS partition. Because using the -L flag is risky, a bit of simulation would be nice.
The default simulation option "xfs_repair -L -n /dev/mapper/vg_data-lv_data0" lists a lot of inode numbers that would be cleared, but gives no information on the actual files that would be lost. That's when we turned to the #XFS channel on irc.freenode.org for some advice. The result is a way to identify the files you'll lose by running this command.
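Since the dry run mainly prints inode numbers, it can help to save its output and reduce it to a unique list for later comparison. The following is only a minimal sketch: it assumes the output was saved to a file and that the relevant lines contain the word "inode" followed by a number. The function name and the log path are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the unique inode numbers mentioned in a saved
# xfs_repair dry-run log. Assumes lines of the form "... inode <number> ...".
list_inodes_in_log() {
    grep -oE 'inode [0-9]+' "$1" | awk '{ print $2 }' | sort -n -u
}

# Example (the log path is a placeholder):
#   xfs_repair -L -n /dev/mapper/vg_data-lv_data0 > /root/repair.dryrun.log 2>&1
#   list_inodes_in_log /root/repair.dryrun.log | wc -l
```

The count gives a first impression of the damage, but it still says nothing about which files are behind those inodes.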
3. Making a sparse dump of the XFS partition
I was advised to generate a sparse dump of the partition to actually test the "xfs_repair -L" command and evaluate the effect. The following command dumps all the metadata from the XFS partition and recreates a disk image that you can mount and play around with. The image looks identical to the original filesystem, but it is a lot smaller and the files are empty. When creating it, take into account that it *might* grow to tens of GB in size, so make sure your target filesystem can handle that (no EXT3/EXT4/VFAT). In my case, the image was just 700 MB for a 45 TB partition, and I put it on a network share.
[root@fs01]# xfs_metadump -owg /dev/mapper/vg_data-lv_data0 - | xfs_mdrestore - /media/gluster/gvandeweyer/xfs.meta.img
the parameters mean:
- -o : disable obfuscation of file names and attributes, i.e. do not garble filenames, so you can identify them.
- -w : print warning of bad metadata.
- -g : show progress.
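Because the restored image is sparse, a plain "ls -l" can be misleading about how much room it really takes. A minimal sketch for comparing the apparent size with the space actually allocated, assuming GNU stat is available; the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the apparent size and the allocated size (in
# bytes) of a file. Relies on GNU stat: %s = apparent size, %b = number of
# allocated blocks, %B = size in bytes of each block reported by %b.
# The body runs in a subshell, so the variables do not leak to the caller.
sparse_report() (
    apparent=$(stat -c '%s' "$1")
    blocks=$(stat -c '%b' "$1")
    blocksize=$(stat -c '%B' "$1")
    echo "apparent=${apparent} allocated=$((blocks * blocksize))"
)

# Example:
#   sparse_report /media/gluster/gvandeweyer/xfs.meta.img
#   df -h /media/gluster/gvandeweyer    # free space left on the target
```

For a sparse file, the allocated figure is the one that counts against the free space of the target.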
4. Execute the xfs_repair
Now we can safely try out "xfs_repair -L" on the image to see what happens.
[root@fs01]# xfs_repair -L /media/gluster/gvandeweyer/xfs.meta.img
This will hopefully run the repair and place some/a lot of files into the lost+found folder in the image.
5. Identify the lost files
We will now mount the repaired metadata image and the original broken partition, and start listing the missing files.
# mount the meta_partition
mount -t xfs -o loop /media/gluster/gvandeweyer/xfs.meta.img /media/rescued_meta
# mount the original partition read-only and don't replay the log
mount -t xfs -o ro,norecovery /dev/mapper/vg_data-lv_data0 /media/original_partition
If you look into the /media/rescued_meta/lost+found folder, you see the items that could not be rescued. They are named by their inode number, which is sort of the physical location on the disk.
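An inode number can be mapped back to a path with the "-inum" test of find. A minimal sketch of a single lookup, with a hypothetical function name and a placeholder inode number in the example:

```shell
#!/bin/bash
# Hypothetical helper: print every path below a directory that has the given
# inode number. -xdev keeps find on one filesystem, because inode numbers are
# only unique within a single filesystem.
paths_for_inode() (
    cd "$1" && find . -xdev -inum "$2" -print
)

# Example (1234567 is a placeholder inode number):
#   ls /media/rescued_meta/lost+found | head
#   paths_for_inode /media/original_partition 1234567
```

Each lookup walks the whole directory tree, so on a large partition a single inode can take a while.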
The following perl snippet then extracts the original file paths from the broken partition for each inode number listed in the recovered partition and writes them to a file. The final result is the complete list of files you'll be missing after you run the "xfs_repair -L" on the physical partition.
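#!/usr/bin/perl
use strict;
use warnings;

my $rescued  = '/media/rescued_meta';
my $original = '/media/original_partition';

# Entries in lost+found are named after their inode number.
opendir(my $dh, "$rescued/lost+found") or die "Cannot open lost+found: $!";
my @inodes = grep { /^\d+$/ } readdir($dh);
closedir($dh);

my $nr = scalar(@inodes);
my $inode_idx = 0;
open(my $out, '>', '/root/inode.scan.results.txt') or die "Cannot write results: $!";
foreach my $inode (@inodes) {
    $inode_idx++;
    print "$inode_idx/$nr : inode $inode\n";
    # Look up the original path(s) of this inode on the broken partition.
    my @items = `cd '$original' && find . -inum $inode -print`;
    chomp(@items);
    foreach my $item (@items) {
        $item =~ s{^\./}{};    # strip the leading "./" that find prints
        # Report whether the path still exists in the repaired image.
        if (-e "$rescued/$item") {
            print $out "$inode\tEXISTS\t$item\n";
        }
        else {
            print $out "$inode\tMISSING\t$item\n";
        }
    }
}
close($out);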
Now review the list in "/root/inode.scan.results.txt" and decide whether or not you can live with the loss.
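For a long list, a quick tally makes the review easier. A minimal sketch, assuming the tab-separated layout written by the Perl script (inode, status, path); the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: count EXISTS and MISSING entries in the tab-separated
# results file (columns: inode, status, path).
summarize_results() {
    awk -F '\t' '
        { count[$2]++ }
        END {
            printf "EXISTS=%d MISSING=%d\n", count["EXISTS"], count["MISSING"]
        }' "$1"
}

# Example:
#   summarize_results /root/inode.scan.results.txt
#   awk -F '\t' '$2 == "MISSING" { print $3 }' /root/inode.scan.results.txt | sort
```

The second example prints only the paths marked as MISSING, which is the part of the list that matters for the decision.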
BASH, Perl, Recovery, XFS