When dealing with LVM storage in production environments, disk failures can sometimes leave behind orphaned volume groups (VGs) and logical volumes (LVs) after replacement. The typical scenario looks like this:
# lvscan
/dev/vg04/swap: read failed after 0 of 4096 at 4294901760: Input/output error
/dev/vg04/vz: read failed after 0 of 4096 at 995903864832: Input/output error
Standard LVM cleanup commands such as vgreduce --removemissing often fail because:
- The metadata area on the failed disk is completely inaccessible
- LVM still tries to scan the missing physical volume before processing commands
- The VG/LV metadata is left inconsistent by the disk's abrupt disappearance
Here are three proven approaches to clean up this situation:
Method 1: Using Device Filtering
# Add a temporary filter inside the devices { } section of /etc/lvm/lvm.conf
# to exclude the missing device (do not overwrite the whole file):
#     filter = [ "a|/dev/sd[abc]|", "r|.*|" ]
# Then proceed with removal
vgreduce --removemissing --force vg04
lvremove /dev/vg04/swap
lvremove /dev/vg04/vz
vgremove vg04
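Once the filter is active and the removals above have run, the standard reporting commands give a quick, non-destructive check that no trace of vg04 remains:
# Confirm the orphaned VG and LVs are gone
pvs
vgs
lvs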
Method 2: Direct Metadata Editing
# Backup current LVM metadata
vgcfgbackup -f /root/vgbackup.txt vg04
# Edit the backup file to remove references to the missing PV
vim /root/vgbackup.txt
# Restore the cleaned configuration
vgcfgrestore -f /root/vgbackup.txt vg04
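For orientation while editing, the backup is plain text and the entry to delete lives in the physical_volumes block. The excerpt below is illustrative only, with placeholder UUIDs and device names:
vg04 {
	id = "AAAAAA-AAAA-AAAA-AAAA-AAAA-AAAA-AAAAAA"
	...
	physical_volumes {
		pv0 {
			id = "BBBBBB-BBBB-BBBB-BBBB-BBBB-BBBB-BBBBBB"	# the failed disk's UUID
			device = "/dev/sdd"	# the missing device
			...
		}
	}
	logical_volumes {
		...	# any segment that maps onto pv0 must also be removed
	}
}
Note that vgcfgrestore rejects the file if a remaining logical_volumes segment still references the deleted pvN entry, so those segments have to be cleaned out as well.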
Method 3: Emergency Metadata Reconstruction
For severely corrupted cases, recreate the missing PV in place from LVM's
automatic metadata backups (kept under /etc/lvm/backup and /etc/lvm/archive):
# Recreate the PV on the replacement disk, reusing the old PV's UUID
# (copy the UUID from the physical_volumes section of the backup file;
# /dev/sdX is a placeholder for the replacement device)
pvcreate --uuid "<missing-PV-UUID>" --restorefile /etc/lvm/backup/vg04 /dev/sdX
vgcfgrestore vg04
# Then verify and clean up as in Method 1
vgscan
vgreduce --removemissing --force vg04
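Assuming the restore went through, vgck (part of standard lvm2) can confirm the metadata is consistent again before anything destructive runs:
# Check VG metadata consistency
vgck vg04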
Best practices for disk replacement in LVM environments:
- Always run vgreduce before physical removal
- Maintain regular vgcfgbackup copies of the metadata
- Consider using RAID underneath LVM for redundancy
- Implement monitoring for PV/VG health status (a sketch follows below)
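As a starting point for that monitoring, here is a minimal shell sketch that flags volume groups whose attribute string carries the "partial" bit (the fourth character of vg_attr); the alerting action is a placeholder to adapt:
#!/bin/sh
# Flag any VG with missing PVs: the 4th character of vg_attr is "p"
# when the VG is partial (i.e. one or more PVs are missing).
vgs --noheadings -o vg_name,vg_attr | while read -r vg attr; do
    if [ "$(printf '%s' "$attr" | cut -c4)" = "p" ]; then
        # Placeholder action: swap in mail, SNMP, or your monitoring agent
        echo "WARNING: volume group $vg has missing physical volumes"
    fi
done
Run from cron, this stays silent while everything is healthy, which makes it easy to wire into mail-based alerting.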
Remember that these operations should be performed during maintenance windows as they may temporarily affect other LVM operations.
When a physical disk in an LVM configuration fails before it has been properly removed from the volume group, you're left with orphaned VGs and LVs that still appear in LVM metadata but have no underlying physical storage. This causes persistent I/O errors whenever any LVM command scans devices.
The key indicators of this issue are:
1. Repeated "read failed" errors mentioning specific LV paths
2. "Volume group not found" errors despite the VG being listed
3. Failed attempts to deactivate or remove the VG/LVs
Here's the step-by-step solution to completely remove these orphaned entries:
1. First, verify the current VG status:
# vgs --foreign
# pvs --foreign
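Recent lvm2 builds also expose a dedicated report field for missing PVs; the field name below should be verified against the local version (see lvm fullreport or the vgs man page) before scripting around it:
# vgs -o vg_name,vg_missing_pv_count vg04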
2. Force removal of missing PVs from the VG:
# vgreduce --removemissing --force --config 'devices { filter = [ "a|.*|" ] }' vg04
3. If the VG still won't reduce cleanly, try exporting and re-importing it, which forces LVM to rewrite the on-disk metadata:
# vgexport vg04
# vgimport vg04
4. For stubborn cases, back up the metadata (in case it is needed later) and force-remove the VG; repeating --force suppresses the prompts about the missing PV:
# vgcfgbackup -f vg04_backup vg04
# vgremove --force --force vg04
5. Final cleanup of leftover device-mapper maps (dm map names join VG and LV with a dash):
# dmsetup remove vg04-swap
# dmsetup remove vg04-vz
# rm -rf /dev/vg04
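To confirm nothing is left behind, list the maps device-mapper still holds; an empty result means the cleanup is complete:
# dmsetup ls | grep vg04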
To avoid a repeat of this situation:
- Always properly remove disks from LVM before physical removal
- Consider using the --test flag before destructive operations
- Maintain regular vgcfgbackup copies of your LVM configuration
If you still encounter issues after these steps, check:
# ls -l /dev/mapper/
# grep vg04 /proc/mounts
# lvmdiskscan
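If stale devices still show up after these checks, refreshing LVM's device scan cache sometimes clears them; on systems of the lvmetad era this is:
# pvscan --cache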