When dealing with LVM storage in production environments, disk failures can sometimes leave behind orphaned volume groups (VGs) and logical volumes (LVs) after replacement. The typical scenario looks like this:
# lvscan
/dev/vg04/swap: read failed after 0 of 4096 at 4294901760: Input/output error
/dev/vg04/vz: read failed after 0 of 4096 at 995903864832: Input/output error
Standard LVM cleanup commands such as vgreduce --removemissing often fail because:
- The metadata area on the failed disk is completely inaccessible
- LVM still tries to scan the missing physical volume before processing commands
- The VG/LV metadata is left inconsistent by the disk's abrupt disappearance
Here are three proven approaches to clean up this situation:
Method 1: Using Device Filtering
# Add a temporary filter inside the devices { } section of /etc/lvm/lvm.conf
# to exclude the missing device (do not overwrite the whole file):
#     filter = [ "a|/dev/sd[abc]|", "r|.*|" ]
# Then proceed with removal
vgreduce --removemissing --force vg04
lvremove /dev/vg04/swap
lvremove /dev/vg04/vz
vgremove vg04
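Once the filter is active and the removals above have run, the standard reporting commands give a quick, non-destructive check that no trace of vg04 remains:
# Confirm the orphaned VG and LVs are gone
pvs
vgs
lvs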
Method 2: Direct Metadata Editing
# Backup current LVM metadata
vgcfgbackup -f /root/vgbackup.txt vg04
# Edit the backup file to remove references to the missing PV
vim /root/vgbackup.txt
# Restore the cleaned configuration
vgcfgrestore -f /root/vgbackup.txt vg04
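For orientation while editing, the backup is plain text and the entry to delete lives in the physical_volumes block. The excerpt below is illustrative only, with placeholder UUIDs and device names:
vg04 {
	id = "AAAAAA-AAAA-AAAA-AAAA-AAAA-AAAA-AAAAAA"
	...
	physical_volumes {
		pv0 {
			id = "BBBBBB-BBBB-BBBB-BBBB-BBBB-BBBB-BBBBBB"	# the failed disk's UUID
			device = "/dev/sdd"	# the missing device
			...
		}
	}
	logical_volumes {
		...	# any segment that maps onto pv0 must also be removed
	}
}
Note that vgcfgrestore rejects the file if a remaining logical_volumes segment still references the deleted pvN entry, so those segments have to be cleaned out as well.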
Method 3: Emergency Metadata Reconstruction
For severely corrupted cases, recreate the missing PV in place from LVM's
automatic metadata backups (kept under /etc/lvm/backup and /etc/lvm/archive):
# Recreate the PV on the replacement disk, reusing the old PV's UUID
# (copy the UUID from the physical_volumes section of the backup file;
# /dev/sdX is a placeholder for the replacement device)
pvcreate --uuid "<missing-PV-UUID>" --restorefile /etc/lvm/backup/vg04 /dev/sdX
vgcfgrestore vg04
# Then verify and clean up as in Method 1
vgscan
vgreduce --removemissing --force vg04
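Assuming the restore went through, vgck (part of standard lvm2) can confirm the metadata is consistent again before anything destructive runs:
# Check VG metadata consistency
vgck vg04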
Best practices for disk replacement in LVM environments:
- Always run vgreduce before physical removal
- Maintain regular vgcfgbackup copies of the metadata
- Consider using RAID underneath LVM for redundancy
- Implement monitoring for PV/VG health status (a sketch follows below)
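As a starting point for that monitoring, here is a minimal shell sketch that flags volume groups whose attribute string carries the "partial" bit (the fourth character of vg_attr); the alerting action is a placeholder to adapt:
#!/bin/sh
# Flag any VG with missing PVs: the 4th character of vg_attr is "p"
# when the VG is partial (i.e. one or more PVs are missing).
vgs --noheadings -o vg_name,vg_attr | while read -r vg attr; do
    if [ "$(printf '%s' "$attr" | cut -c4)" = "p" ]; then
        # Placeholder action: swap in mail, SNMP, or your monitoring agent
        echo "WARNING: volume group $vg has missing physical volumes"
    fi
done
Run from cron, this stays silent while everything is healthy, which makes it easy to wire into mail-based alerting.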
Remember that these operations should be performed during maintenance windows as they may temporarily affect other LVM operations.
When a physical disk in an LVM configuration fails before it has been properly removed from the volume group, you're left with orphaned VGs and LVs that still appear in LVM metadata but have no underlying physical storage. This causes persistent I/O errors whenever any LVM command scans devices.
The key indicators of this issue are:
1. Repeated "read failed" errors mentioning specific LV paths
2. "Volume group not found" errors despite the VG being listed
3. Failed attempts to deactivate or remove the VG/LVs
Here's the step-by-step solution to completely remove these orphaned entries:
1. First, verify the current VG status:
# vgs --foreign
# pvs --foreign
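Recent lvm2 builds also expose a dedicated report field for missing PVs; the field name below should be verified against the local version (see lvm fullreport or the vgs man page) before scripting around it:
# vgs -o vg_name,vg_missing_pv_count vg04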
2. Force removal of missing PVs from the VG:
# vgreduce --removemissing --force --config 'devices { filter = [ "a|.*|" ] }' vg04
3. If the VG still won't reduce cleanly, try exporting and re-importing it, which forces LVM to rewrite the on-disk metadata:
# vgexport vg04
# vgimport vg04
4. For stubborn cases, back up the metadata (in case it is needed later) and force-remove the VG; repeating --force suppresses the prompts about the missing PV:
# vgcfgbackup -f vg04_backup vg04
# vgremove --force --force vg04
5. Final cleanup of leftover device-mapper maps (dm map names join VG and LV with a dash):
# dmsetup remove vg04-swap
# dmsetup remove vg04-vz
# rm -rf /dev/vg04
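To confirm nothing is left behind, list the maps device-mapper still holds; an empty result means the cleanup is complete:
# dmsetup ls | grep vg04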
To avoid a repeat of this situation:
- Always properly remove disks from LVM before physical removal
- Consider using the --test flag before destructive operations
- Maintain regular vgcfgbackup copies of your LVM configuration
If you still encounter issues after these steps, check:
# ls -l /dev/mapper/
# grep vg04 /proc/mounts
# lvmdiskscan
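If stale devices still show up after these checks, refreshing LVM's device scan cache sometimes clears them; on systems of the lvmetad era this is:
# pvscan --cache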