How to Forcefully Deactivate an LVM Volume Group with Underlying MD RAID Device Errors


When dealing with storage subsystem failures involving both LVM and MD RAID, you might encounter situations where vgchange -an fails to properly deactivate a volume group due to underlying device errors. This typically happens when:

  • MD RAID arrays drop out of the kernel due to disk failures
  • I/O errors prevent clean LVM metadata operations
  • The kernel maintains references to the PV devices (see the open-count check below)
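
A quick way to confirm that last point is to look at the open counts that device-mapper reports; the dm-0 name below is only an example:

# Show open counts for every device-mapper device ("Open" > 0 means something still holds it)
dmsetup info -c

# List which kernel devices sit on top of a given mapping
ls /sys/block/dm-0/holders/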

Before attempting any forceful operations, it's crucial to check the current state:


# Check VG status
vgs --options vg_name,vg_attr

# Check PV status
pvs --options pv_name,vg_name,pv_attr,dev_size

# Check MD status
mdadm --detail /dev/mdX
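
It is also worth glancing at /proc/mdstat, which summarizes every MD array the kernel currently knows about and marks missing members:

# Kernel-level view of all MD arrays (an underscore in the [UU] status means a missing member)
cat /proc/mdstat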

When standard methods fail, use this sequence of commands:


# First attempt clean deactivation
vgchange -an vg_name

# If that fails, try with partial mode
vgchange -an --partial vg_name

# Force removal of ALL device-mapper devices on the system
# (this also tears down other VGs, dm-crypt and multipath maps; prefer targeted removal where possible)
dmsetup remove_all --force

# Verify no remaining references
ls /dev/mapper/ | grep vg_name

For persistent device mapper entries that won't release:


# List all device mapper devices
dmsetup ls

# Remove specific device
dmsetup remove /dev/mapper/vg_name-lv_name

# Alternative removal methods for a busy device
dmsetup remove --retry vg_name-lv_name
dmsetup remove --force vg_name-lv_name
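
Before forcing anything, check what still holds the logical volume open; the device path is a placeholder, and fuser/lsof may need to be installed separately:

# Check for a lingering mount
grep vg_name /proc/mounts

# Identify processes that still have the device open
fuser -vm /dev/mapper/vg_name-lv_name
lsof /dev/mapper/vg_name-lv_name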

To make your system more resilient:


# Add these to lvm.conf:
devices {
    filter = [ "a|/dev/md.*|", "r|.*|" ]
    global_filter = [ "a|/dev/md.*|", "r|.*|" ]
    md_component_detection = 1
}
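
Two caveats about this snippet: the filters accept only /dev/md* devices as PVs, so loosen them if you also keep PVs on plain partitions, and most distributions embed lvm.conf in the initramfs, so regenerate it for the filter to take effect at early boot (the exact command depends on your distribution):

# Debian/Ubuntu
update-initramfs -u

# RHEL/CentOS/Fedora
dracut -f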

If the quick sequence above still leaves the volume group active, work through the stack one layer at a time: LVM itself, the device-mapper entries beneath it, and finally the failed MD RAID array.
LVM and device-mapper keep several layers of state (active LVs, open counts, held mappings) that can block clean deactivation. Start by confirming what is still registered:

# Check active volume groups
vgs --units b

# View physical volume status
pvs -v
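
To see those layers at a glance, lsblk prints the whole disk, MD and LVM stack in one view; the column selection here is just one reasonable choice:

# Show how disks, MD arrays and logical volumes stack on top of each other
lsblk -o NAME,TYPE,SIZE,STATE,MOUNTPOINT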

When vgchange -an fails, try this comprehensive approach:

# 1. Attempt clean deactivation first
vgchange -an vg_name

# 2. If that fails, retry with verbose debugging to see which device is blocking
vgchange -an -vvvv vg_name 2>&1 | grep -i 'device'

# 3. Check for any active logical volumes
lvs -o+lv_active

# 4. Force deactivation of individual logical volumes
lvchange -an /dev/vg_name/lv_name

# 5. Use dmsetup to remove device mapper entries
#    (remove_all affects every dm device on the system, not just this VG)
dmsetup remove_all --force

# 6. Flush multipath if used
multipath -F

# 7. Finally try vgchange again with force
vgchange -an --force --force vg_name
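
Whichever step finally succeeds, confirm that nothing from the volume group is left mapped before touching the MD layer (vg_name is a placeholder, as above):

# dmsetup should list no entries for this VG once deactivation succeeds
dmsetup ls | grep vg_name

# lv_active should be empty for every LV once the group is deactivated
lvs -o lv_name,lv_active vg_name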

For the underlying MD RAID issue, consider these steps before LVM operations:

# Stop the array completely
mdadm --stop /dev/mdX

# Examine array details
mdadm --examine /dev/sd[abc]1

# Reassemble with missing devices
mdadm --assemble --force /dev/mdX /dev/sda1 /dev/sdb1 --run
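
If a member disk is flapping rather than completely gone, it can also help to explicitly fail and remove it from the array before returning to the LVM layer; the device names below are examples:

# Mark the bad member as failed, then remove it from the array
mdadm --manage /dev/mdX --fail /dev/sdc1
mdadm --manage /dev/mdX --remove /dev/sdc1

# After replacing the disk, add the new member back and let it resync
mdadm --manage /dev/mdX --add /dev/sdc1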

When all else fails, directly remove the device from kernel memory:

# Find the dm-N name and major:minor numbers behind the stuck LV
ls -l /dev/vg_name/lv_name
dmsetup info vg_name-lv_name

# Replace its table with an error target, then remove the mapping
dmsetup wipe_table vg_name-lv_name
dmsetup remove vg_name-lv_name

# Or drop a failed underlying SCSI disk from the kernel entirely
echo 1 > /sys/block/sdb/device/delete
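
If you later swap in a replacement disk, the kernel will not necessarily notice it until the SCSI host is rescanned; the host number here is a placeholder:

# Rescan one SCSI host for new devices
echo "- - -" > /sys/class/scsi_host/host0/scan

# Or rescan every host
for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$h"; done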

To avoid similar situations in the future:

  • Monitor RAID health with mdadm --detail --scan (see the monitoring sketch after this list)
  • Configure proper LVM autoactivation settings
  • Implement monitoring for sector errors
  • Consider RAID 6 instead of RAID 5 for large arrays
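
As a starting point for the monitoring items above, a minimal sketch might combine mdadm's built-in monitor with periodic SMART checks; the mail address, config path and device name are examples:

# /etc/mdadm/mdadm.conf (Debian) or /etc/mdadm.conf (RHEL): mail alerts on array events
MAILADDR admin@example.com

# Run the mdadm monitor as a daemon (most distributions ship a service that does this)
mdadm --monitor --scan --daemonise

# Spot-check a member disk for pending or reallocated sectors (smartmontools package)
smartctl -H -A /dev/sda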

When troubleshooting, gather these diagnostics:

# Collect LVM debug info
vgcfgbackup -v vg_name
vgs -vvvv

# Check kernel messages
dmesg | grep -E 'md|device-mapper|sd|lvm'

# Examine device mapper status
dmsetup ls --tree
dmsetup status
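
If you need to hand all of this to a support team or a bug tracker, the lvmdump utility that ships with lvm2 bundles most of the same information into a single archive:

# Collect LVM and device-mapper state into one debug tarball (output path is printed on completion)
lvmdump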