Friday, March 20, 2009

LVM badblock recovery

Symptoms: high server load, swap usage skyrocketing, smartd errors.

What worked for me:

root@freedom [/home/santolo]# sfdisk -luS /dev/sda

Disk /dev/sda: 60801 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/sda1 * 63 208844 208782 83 Linux
/dev/sda2 208845 976768064 976559220 8e Linux LVM
/dev/sda3 0 - 0 0 Empty
/dev/sda4 0 - 0 0 Empty
root@freedom [/home/santolo]#

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 3596 7658887
# 2 Extended offline Completed without error 00% 3003 -
# 3 Extended offline Completed without error 00% 1003 -


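Offset of the bad sector from the start of /dev/sda2 (Start column above):
(7658887 - 208845) = 7450042

PE size in KB:
# part=/dev/sda2 ; pvdisplay -c $part | awk -F: '{print $8}'
32768

PE size in 512-byte sectors:
(32768 * 2) = 65536

PE that holds the bad sector:
7450042 / 65536 = 113.678619384765625

# lvdisplay --maps |egrep 'Physical|LV Name|Type'
LV Name /dev/VolGroup00/LogVol00
Type linear
Physical volume /dev/sda2
Physical extents 0 to 14806
LV Name /dev/VolGroup00/LogVol01
Type linear
Physical volume /dev/sda2
Physical extents 14807 to 14900
root@freedom [/home/santolo]#

PE 113 is in LogVol00, which starts at PE 0, so only pe_start (384 sectors) is subtracted:
7450042 - 384 = 7449658

With a 4096-byte file system block (8 sectors):
7449658 / 8 = 931207.25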

dd if=/dev/VolGroup00/LogVol00 of=block931207 bs=4096 count=1 skip=931207


debugfs 1.39 (29-May-2006)
debugfs: open /dev/VolGroup00/LogVol00
debugfs: icheck 931207
Block Inode number
931207 917883

root@freedom [/home/cpbackuptmp/cpbackup/weekly]# debugfs
debugfs 1.39 (29-May-2006)
debugfs: open /dev/VolGroup00/LogVol00
debugfs: ncheck 917883
Inode Pathname
917883 /var/lib/mysql/monjas_evoblue/evo_sessions.MYD
debugfs:


dd if=/dev/zero of=/dev/VolGroup00/LogVol00 count=1 bs=4096 seek=931207


SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 3596 7658887
# 2 Extended offline Completed without error 00% 3003 -
# 3 Extended offline Completed without error 00% 1003 -


I used this guide: http://smartmontools.sourceforge.net/badblockhowto.html#lvm

LVM repairs

This section was written by Frederic BOITEUX. It was titled: "HOW TO LOCATE AND REPAIR BAD BLOCKS ON AN LVM VOLUME".

Smartd reports an error in a short test:
# smartctl -a /dev/hdb
...
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 66 37383668

So the disk has a bad block at LBA 37383668.

In which physical partition is the bad block?
# sfdisk -luS /dev/hdb # or 'fdisk -ul /dev/hdb'

Disk /dev/hdb: 9729 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/hdb1 63 996029 995967 82 Linux swap / Solaris
/dev/hdb2 * 996030 1188809 192780 83 Linux
/dev/hdb3 1188810 156296384 155107575 8e Linux LVM
/dev/hdb4 0 - 0 0 Empty

It's in the /dev/hdb3 partition, an LVM2 partition. Counting from the start of that partition, the bad block has an offset of
(37383668 - 1188810) = 36194858
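
The partition lookup can also be scripted. What follows is a minimal sketch, not part of the original guide: find_partition is a hypothetical helper, and it assumes the older 'sfdisk -luS' column layout shown above (device, optional boot flag, start, end). It prints the partition that contains an LBA, followed by the offset inside that partition.

```shell
# Hypothetical helper: read `sfdisk -luS`-style lines on stdin and print the
# partition that contains the given LBA, followed by the offset inside it.
find_partition() {
    awk -v lba="$1" '
        /^\/dev\// {
            sub(/[ \t]\*[ \t]/, " ")     # drop the boot flag column if present
            first = $2; last = $3
            # empty partitions have "-" in the End column and are skipped
            if (last != "-" && lba + 0 >= first + 0 && lba + 0 <= last + 0)
                print $1, lba - first
        }'
}
```

On a system where sfdisk still prints this layout, it would be used as 'sfdisk -luS /dev/hdb | find_partition 37383668'.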


We have to find which LVM2 logical partition the block belongs to.

In which logical partition is the bad block?

IMPORTANT: LVM2 can use different schemes to divide its physical partitions into logical ones: linear, striped, contiguous or not... The following example assumes that allocation is linear!

The physical partition used by LVM2 is divided into PE (Physical Extent) units of the same size, starting 'pe_start' 512-byte blocks from the beginning of the physical partition.

The 'pvdisplay' command gives the size of the PE (in KB) of the LVM partition:
# part=/dev/hdb3 ; pvdisplay -c $part | awk -F: '{print $8}'
4096


To get its size in LBA blocks (512 bytes, or 0.5 KB), we multiply this number by 2: 4096 * 2 = 8192 blocks for each PE.

Finding pe_start, the offset of the first PE from the beginning of the physical partition, is a bit more difficult. If you have a recent LVM2 version, try:
# pvs -o+pe_start $part


Otherwise, you can look in /etc/lvm/backup:
# grep pe_start $(grep -l $part /etc/lvm/backup/*)
pe_start = 384


Then we find which PE holds the bad block by computing its PE rank: physical partition's bad block number / sizeof(PE) =
36194858 / 8192 = 4418.3176


So we have to find which LVM2 logical partition uses PE number 4418 (count starts from 0):
# lvdisplay --maps |egrep 'Physical|LV Name|Type'
LV Name /dev/WDC80Go/racine
Type linear
Physical volume /dev/hdb3
Physical extents 0 to 127
LV Name /dev/WDC80Go/usr
Type linear
Physical volume /dev/hdb3
Physical extents 128 to 1407
LV Name /dev/WDC80Go/var
Type linear
Physical volume /dev/hdb3
Physical extents 1408 to 1663
LV Name /dev/WDC80Go/tmp
Type linear
Physical volume /dev/hdb3
Physical extents 1664 to 1791
LV Name /dev/WDC80Go/home
Type linear
Physical volume /dev/hdb3
Physical extents 1792 to 3071
LV Name /dev/WDC80Go/ext1
Type linear
Physical volume /dev/hdb3
Physical extents 3072 to 10751
LV Name /dev/WDC80Go/ext2
Type linear
Physical volume /dev/hdb3
Physical extents 10752 to 18932


So PE #4418 is in the /dev/WDC80Go/ext1 LVM logical partition.
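
The PE arithmetic and the lookup in the extent map can be sketched in shell. This is only an illustration under stated assumptions: pe_index and lv_for_pe are hypothetical helper names, pe_index follows the text's formula (offset divided by sectors per PE), and lv_for_pe assumes the older 'LV Name' / 'Physical extents A to B' output layout shown above and a linear allocation.

```shell
# Hypothetical helper: PE rank of a sector offset inside the physical partition.
# The PE size is given in KB, as printed by `pvdisplay -c` (field 8).
pe_index() {
    offset=$1 pe_kb=$2
    echo $(( offset / (pe_kb * 2) ))    # 2 sectors of 512 bytes per KB
}

# Hypothetical helper: read filtered `lvdisplay --maps` lines on stdin and print
# the LV whose extent range contains the given PE, plus the first PE of that range.
lv_for_pe() {
    awk -v pe="$1" '
        $1 == "LV" && $2 == "Name"          { lv = $3 }
        $1 == "Physical" && $2 == "extents" {
            if (pe + 0 >= $3 + 0 && pe + 0 <= $5 + 0) print lv, $3
        }'
}
```

For example, 'pe_index 36194858 4096' gives the PE rank, and the lvdisplay output above can be piped into 'lv_for_pe 4418'.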

Size of the logical block of the file system on /dev/WDC80Go/ext1:

It's an ext3 fs, so I get it like this:
# dumpe2fs /dev/WDC80Go/ext1 | grep 'Block size'
dumpe2fs 1.37 (21-Mar-2005)
Block size: 4096


Bad block number for the file system:

The logical partition begins on PE 3072, so its first 512-byte block in the physical partition is:
(# PE's start of partition * sizeof(PE)) + partition offset[pe_start] =
(3072 * 8192) + 384 = 25166208

The bad block number for the file system is therefore:
(36194858 - 25166208) / (sizeof(fs block) / 512)
= 11028650 / (4096 / 512) = 1378581.25
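
The same calculation as a small shell function, as a sketch only: fs_block and its parameter names are hypothetical, and it is valid only for a linear logical partition. Shell arithmetic is integer-only, so the fractional part (.25 here, meaning the bad sector is the third of the eight sectors in that block) is dropped.

```shell
# Hypothetical helper: file system block that holds a given 512-byte sector
# of the physical partition, for a linearly allocated logical volume.
fs_block() {
    offset=$1        # bad sector, counted from the start of the partition
    lv_first_pe=$2   # first PE of the logical volume
    pe_sectors=$3    # PE size in 512-byte sectors
    pe_start=$4      # sectors between the partition start and PE 0
    fs_bs=$5         # file system block size in bytes
    lv_start=$(( lv_first_pe * pe_sectors + pe_start ))
    echo $(( (offset - lv_start) / (fs_bs / 512) ))
}
```

'fs_block 36194858 3072 8192 384 4096' prints 1378581, the block number used in the dd commands.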


Test of the fs bad block:
dd if=/dev/WDC80Go/ext1 of=block1378581 bs=4096 count=1 skip=1378581


If this dd command succeeds, without any error message on the console or in syslog, then the block number calculation is probably wrong! *Don't* go any further: re-check it, and if you can't find the error, please give up!
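
The read test can be wrapped so that the outcome is explicit. A minimal sketch: check_block is a hypothetical name, and it only looks at dd's exit status, so the console and syslog still have to be checked by hand.

```shell
# Hypothetical helper: try to read one file system block from a device.
# Prints "readable" if dd succeeds, "unreadable" if dd reports an error.
check_block() {
    dev=$1 blk=$2 bs=${3:-4096}
    if dd if="$dev" of=/dev/null bs="$bs" count=1 skip="$blk" 2>/dev/null; then
        echo "readable"      # per the text: the block number is probably wrong
    else
        echo "unreadable"    # consistent with a bad block at this position
    fi
}
```

With the example above this would be 'check_block /dev/WDC80Go/ext1 1378581'; only an "unreadable" result justifies going on to the write step.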

Search / correction follows the same scheme as for simple partitions:

find possibly impacted files with debugfs (icheck, then ncheck).

reallocate the bad block by writing zeros to it, *using the fs block size*:


dd if=/dev/zero of=/dev/WDC80Go/ext1 count=1 bs=4096 seek=1378581


Et voilà !

Today I am learning a useful utility program named 'debugfs'. It is part of the e2fsprogs package, an essential package containing auxiliary programs for the ext2 and ext3 file systems under Linux.

For a regular file, 'debugfs' can help you find an inode from any data block the file or dir entry is using. Then you can turn around and ask for the name of the inode. This can be handy when some mysterious files cause df and du to disagree about whether the file system is full, or when the file system is corrupted or can't be mounted and accessed as usual. More advanced file system features are available too.

# to find what inode is claiming a given data block
# debugfs -R "icheck 12345" /dev/hda1
debugfs 1.35 (28-Feb-2004)
Block Inode number
12345 340

# to find the file name given the inode number
# debugfs -R "ncheck 49153" /dev/hda1
debugfs 1.35 (28-Feb-2004)
Inode Pathname
49153 /usr/share/locale/ar/LC_MESSAGES/libbonobo-2.0.mo

# Print the location of the inode data structure
# debugfs -R "imap /boot/vmlinuz-2.6.9-42.0.2.EL" /dev/hda1
debugfs 1.35 (28-Feb-2004)
Inode 557516 is part of block group 34
located at block 1114128, offset 0x0580

# to dump the contents of a file (filespec, per man page) to an output file
# debugfs -R "dump -p /boot/vmlinuz-2.6.9-42.0.2.EL /tmp/vmlinuz_dumped" /dev/hda1
# md5sum /boot/vmlinuz-2.6.9-42.0.2.EL /tmp/vmlinuz_dumped
e5c536b539b5ffcaa03b22bd7fcc164a /boot/vmlinuz-2.6.9-42.0.2.EL
e5c536b539b5ffcaa03b22bd7fcc164a /tmp/vmlinuz_dumped

# to get the contents of a file, assuming the fs can't be mounted and accessed the usual way
# debugfs -R "cat /etc/redhat-release" /dev/hda1
debugfs 1.35 (28-Feb-2004)
CentOS release 4.4 (Final)

One thing to note: 'icheck' may report an inode for which 'ncheck' finds no path. Below, block 4567 belongs to inode 8, and 'find' reports /selinux/relabel for that inode number. But /selinux is a pseudo fs with its own inode numbers, so that is a different inode 8; on ext3, inode 8 of /dev/hda1 is normally the journal, which has no path name.
# debugfs -R "ncheck 8" /dev/hda1
debugfs 1.35 (28-Feb-2004)
Inode Pathname 8
# find / -inum 8
/selinux/relabel
# ls -id /selinux/relabel
8 /selinux/relabel
# debugfs -R "icheck 4567" /dev/hda1
debugfs 1.35 (28-Feb-2004)
Block Inode number
4567 8
# / is on /dev/hda1
/dev/hda1 8127400 6738524 1306308 84% /

There are a lot of powerful (and dangerous) features, such as:
feature: set or clear various file system features in the superblock
freeb: mark data blocks as unallocated (setb does the opposite)
freei: free the inode specified
clri: clear the contents of the inode
chroot: chroot to the directory
find_free_block
find_free_inode
init_filesys: create an ext2 file system
kill_file: deallocate the file and its blocks. It doesn't remove any direntry pointing to this inode, so it is not the same as 'rm' or 'unlink'.
logdump: dump the ext3 journal
modify_inode: modify the contents of the inode structure
ls/mkdir/mknod/rm/rmdir

'debugfs' starts interactively by default, unless you pass '-R' to run a single request. A session looks like this:
# debugfs
debugfs 1.35 (28-Feb-2004)
debugfs: open /dev/hda1
debugfs: icheck 12345
Block Inode number
12345 340
debugfs: ncheck 340
Inode Pathname
340 /usr/X11R6/lib/xscreensaver/mountain
debugfs: close
debugfs: quit
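
The icheck / ncheck pair can also be chained non-interactively with '-R'. A minimal sketch: inode_for_block and files_for_block are hypothetical helper names, and the parsing assumes the two-column "Block / Inode number" output shown above. It needs root and only reads from the device.

```shell
# Hypothetical helper: pick the inode number out of `icheck` output on stdin.
# Prints nothing when the block is not owned by any inode.
inode_for_block() {
    awk -v blk="$1" '$1 == blk && $2 ~ /^[0-9]+$/ { print $2; exit }'
}

# Hypothetical wrapper: print the path(s) of the file that owns a block.
files_for_block() {
    dev=$1 blk=$2
    inode=$(debugfs -R "icheck $blk" "$dev" 2>/dev/null | inode_for_block "$blk")
    if [ -z "$inode" ]; then
        echo "block $blk is not owned by any inode" >&2
        return 1
    fi
    debugfs -R "ncheck $inode" "$dev" 2>/dev/null
}
```

With the session above, 'files_for_block /dev/hda1 12345' would run icheck, take the inode number from its output, and pass it to ncheck.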
