How to Safely Remove a Failed Physical Volume from LVM Without Data Loss on Surviving Disks



When dealing with LVM configurations where a physical volume (PV) has failed completely and is no longer detectable by the system (showing as "unknown device" in pvdisplay), we face a critical recovery situation. In your case:

Couldn't find device with uuid WWeM0m-MLX2-o0da-tf7q-fJJu-eiGl-e7UmM3.

The system cannot access the 1.82TB disk, while the secondary 931GB disk (/dev/sdb1) remains operational but shows no free extents.
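
Before doing anything destructive, confirm exactly which PV is missing and how the volume group currently looks (the VG name media comes from your output; adjust if yours differs):

# List all PVs; the failed one shows up as "unknown device"
pvs
# Volume group summary, including warnings about the missing PV
vgs media
# Detailed view with physical extent totals
vgdisplay media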

Before attempting any removal, ensure you've tried basic recovery:

# Attempt to reactivate the volume group
vgchange -ay media

# Rescan all devices and refresh LVM's view of the available PVs
pvscan --cache

When the disk is unrecoverable and you need to preserve data on the remaining disk:

# First, make sure the VG is active
vgchange -ay media

# Force the removal of the missing PV
vgreduce --removemissing --force media

# Verify the remaining structure
vgs
pvs

Critical Note: This operation permanently discards everything that was stored exclusively on the failed PV; any logical volume with extents on that disk will be removed or left incomplete. Only data that was mirrored or resided entirely on the surviving PV will remain.
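
Before running the forced removal, it is worth checking which logical volumes actually have extents on the failed disk, so you know in advance what will be affected (media is the VG from the output above):

# Show each LV together with the devices backing its segments
lvs -a -o +devices media
# Show the physical-extent map of every PV
pvdisplay -m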

For your ext4 filesystem on the remaining disk:

# Check filesystem consistency
e2fsck -f /dev/media/your_logical_volume

# Grow the filesystem to fill the LV if needed (when shrinking, run resize2fs *before* lvreduce)
resize2fs /dev/media/your_logical_volume
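
If you ever need to shrink the logical volume itself (for example to make it fit entirely on the 931GB disk), the order matters: shrink the filesystem before the LV. A minimal sketch, assuming a hypothetical LV named your_logical_volume and a target size of 900G:

# 1. The filesystem must be unmounted and clean
umount /your/mount/point
e2fsck -f /dev/media/your_logical_volume
# 2. Shrink the filesystem slightly below the target LV size
resize2fs /dev/media/your_logical_volume 890G
# 3. Shrink the LV to the target size
lvreduce -L 900G media/your_logical_volume
# 4. Grow the filesystem back out to fill the LV exactly
resize2fs /dev/media/your_logical_volume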

If you suspect some data might be recoverable from the failed disk:

# Try to recreate the PV signature (if disk is physically accessible but logically damaged)
pvcreate --uuid WWeM0m-MLX2-o0da-tf7q-fJJu-eiGl-e7UmM3 --restorefile /etc/lvm/archive/media_xxxx.vg /dev/sdX

# Then attempt vgcfgrestore
vgcfgrestore -f /etc/lvm/archive/media_xxxx.vg media
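
If the restore succeeds, bring the VG up and check the filesystem read-only before mounting anything writable (the LV name below is a placeholder):

vgscan
vgchange -ay media
# Non-destructive check: -n answers "no" to every repair prompt
e2fsck -fn /dev/media/your_logical_volume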

To avoid complete data loss in such scenarios:

# Always maintain LVM metadata backups
vgcfgbackup media

# Consider implementing mirroring for critical data
lvcreate -L 100G -m1 -n important_data media /dev/sdb1 /dev/sdc1
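
Existing logical volumes can also be converted to RAID1 after the fact, provided the VG has a second PV with enough free extents (the LV name below is a placeholder):

# Add one mirror copy to an existing LV
lvconvert -m1 media/important_data
# Watch synchronisation progress (Cpy%Sync column)
lvs -a media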

When dealing with LVM (Logical Volume Manager) configurations, encountering a failed physical volume (PV) while other PVs in the same volume group (VG) remain functional is a critical situation. In this case, we have:

  • A corrupted 1.82TB disk (UUID: WWeM0m-MLX2...) showing as "unknown device"
  • A healthy 931GB disk (/dev/sdb1) still operational
  • Both PVs are 100% allocated with no free extents
  • The filesystem is ext4
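
These assumptions are easy to confirm before proceeding:

# "Free PE" should read 0 on both PVs
pvdisplay
# Free extents at the VG level
vgdisplay media | grep -i free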

Before attempting any recovery operations:

# Create a backup of LVM metadata
vgcfgbackup media
# Verify backup exists
ls /etc/lvm/backup/media

Since pvmove isn't possible (the disk is completely inaccessible), we'll need to force the removal:

# LVM already flags the inaccessible PV as missing (the UUID in the error
# message identifies it), so no manual marking is needed
# Force the removal of the missing PV from the VG
vgreduce --removemissing --force media

After removing the failed PV, you'll need to:

  1. Activate the remaining logical volumes in partial mode:
     vgchange -ay --partial media
  2. Check filesystem integrity (only if absolutely necessary):
     e2fsck -f /dev/media/your_logical_volume
  3. Remount the filesystem read-only first (see the sketch after this list for returning to read-write):
     mount -o ro,remount /your/mount/point
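
Once e2fsck reports a clean filesystem, the mount can be switched back to read-write in place:

# Only after the filesystem has been verified
mount -o rw,remount /your/mount/point
# Confirm the new mount options
mount | grep /your/mount/point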

If the above method doesn't work due to metadata issues, try reconstructing the VG with only the good PV:

# Create a new VG with the good PV
vgcreate media_new /dev/sdb1
# Or, if you have a good metadata backup (and have recreated the missing PV's
# UUID with pvcreate --restorefile), restore the original VG instead
vgcfgrestore -f /etc/lvm/backup/media media
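
If the metadata restore works, activate the VG and inspect the data read-only before trusting it (the mount point /mnt and LV name are placeholders):

vgchange -ay media
# Mount read-only to look at the data without risking further changes
mount -o ro /dev/media/your_logical_volume /mnt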

Best practices for LVM configurations:

  • Always maintain LVM metadata backups (vgcfgbackup)
  • Consider RAID for critical storage instead of simple LVM
  • Monitor disk and PV health with smartctl and LVM's own reporting tools (pvs, vgs, lvs), as sketched below
  • Keep some free space in your VG for emergencies
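
A minimal monitoring sketch, assuming the device names from this setup (adjust for your disks):

# Quick SMART health summary for the physical disk
smartctl -H /dev/sdb
# LVM's own view: missing or failed devices show up in these reports
pvs
vgs
lvs -a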