How to Forcefully Deactivate an LVM Volume Group with Underlying MD RAID Device Errors


When dealing with storage subsystem failures involving both LVM and MD RAID, you might encounter situations where vgchange -an fails to properly deactivate a volume group due to underlying device errors. This typically happens when:

  • MD RAID arrays drop out of the kernel due to disk failures
  • I/O errors prevent clean LVM metadata operations
  • The kernel maintains references to the PV devices
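
A quick way to confirm these symptoms before touching anything (vg_name and /dev/mdX are placeholders, as throughout this article):

# Current MD state and any recent I/O errors in the kernel log
cat /proc/mdstat
dmesg | tail -n 50 | grep -iE 'md|i/o error'

# Which LVs in the affected VG are still active
lvs -o lv_name,lv_active vg_name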

Before attempting any forceful operations, it's crucial to check the current state:


# Check VG status
vgs --options vg_name,vg_attr

# Check PV status
pvs --options pv_name,vg_name,pv_attr,dev_size

# Check MD status
mdadm --detail /dev/mdX

When standard methods fail, use this sequence of commands:


# First attempt clean deactivation
vgchange -an vg_name

# If that fails, try with partial mode
vgchange -an --partial vg_name

# Force removal of device mapper entries from the kernel
# (warning: remove_all affects every DM device on the system, not just this VG)
dmsetup remove_all --force

# Verify no remaining references
ls /dev/mapper/ | grep vg_name

For persistent device mapper entries that won't release:


# List all device mapper devices
dmsetup ls

# Remove specific device
dmsetup remove /dev/mapper/vg_name-lv_name

# Alternative: force removal if the device stays busy
dmsetup remove --force /dev/mapper/vg_name-lv_name
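
If several LVs from the same VG are stuck, a targeted loop avoids the collateral damage of remove_all (a sketch; adjust the vg_name- prefix to your VG, remembering that dashes inside LV/VG names are doubled in device-mapper names):

# Remove only the stuck mappings belonging to vg_name
for dev in $(dmsetup ls | awk '{print $1}' | grep '^vg_name-'); do
    dmsetup remove --retry "$dev"
done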

To make your system more resilient:


# Add these to lvm.conf:
devices {
    filter = [ "a|/dev/md.*|", "r|.*|" ]
    global_filter = [ "a|/dev/md.*|", "r|.*|" ]
    md_component_detection = 1
}
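
After editing /etc/lvm/lvm.conf, verify that the filter is actually in effect (a quick sketch using standard LVM tooling):

# Show the filter values LVM is actually using
lvmconfig devices/filter devices/global_filter

# Rescan devices and confirm only the intended PVs are listed
pvscan --cache
pvs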

In more severe failure scenarios the volume group may refuse to deactivate at all. This is typically the case when:

  • Multiple disks in a RAID array fail
  • I/O errors prevent clean device deactivation
  • MD RAID marks the array as failed

LVM and device-mapper hold open references at several layers, and any remaining reference can block clean deactivation. Start by checking what is still active:

# Check active volume groups
vgs --units b

# View physical volume status
pvs -v
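
To see what is actually holding a device open, the open count and holders are more informative than the output above (a sketch; vg_name-lv_name stands in for your mapped LV):

# An open count greater than 0 means something still references the device
dmsetup info -c -o name,open,attr

# Find processes holding the mapped device (unmount filesystems first)
fuser -vm /dev/mapper/vg_name-lv_name

# Show what is stacked on top of the LV
lsblk /dev/mapper/vg_name-lv_name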

When vgchange -an fails, try this comprehensive approach:

# 1. Attempt clean deactivation first
vgchange -an vg_name

# 2. If that fails, try with verbose debugging
vgchange -an -vvvv vg_name 2>&1 | grep -i 'device'

# 3. Check for any active logical volumes
lvs -o+lv_active

# 4. Force deactivation of individual logical volumes
lvchange -an /dev/vg_name/lv_name

# 5. Use dmsetup to remove device mapper entries
dmsetup remove_all --force

# 6. Flush multipath if used
multipath -F

# 7. Finally try vgchange again with force
vgchange -an --force --force vg_name

For the underlying MD RAID issue, consider these steps before LVM operations:

# Stop the array completely
mdadm --stop /dev/mdX

# Examine array details
mdadm --examine /dev/sd[abc]1

# Force reassembly with the remaining members and start the array degraded
mdadm --assemble --force --run /dev/mdX /dev/sda1 /dev/sdb1
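
If one member is still attached but misbehaving, it can help to explicitly fail and remove it before stopping or reassembling the array (a sketch; /dev/sdc1 stands in for the bad member):

# Mark the bad member (placeholder /dev/sdc1) as failed, then remove it
mdadm --manage /dev/mdX --fail /dev/sdc1
mdadm --manage /dev/mdX --remove /dev/sdc1

# Check the resulting array state
cat /proc/mdstat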

When all else fails, remove the failed physical disk from the kernel so that nothing can keep queueing I/O to it. Note that the sysfs delete attribute exists for SCSI disks (sdX), not for device-mapper nodes; stuck dm devices should be removed with dmsetup as shown above:

# Identify which disks back the stuck logical volume
ls -l /dev/vg_name/lv_name
lsblk -s /dev/mapper/vg_name-lv_name

# Delete the failed SCSI disk from the kernel (example: /dev/sdb)
echo 1 > /sys/block/sdb/device/delete
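
After deleting the disk, confirm that the kernel no longer lists it; LVM will then report the PV as missing and flag the VG as partial (the 'p' character in vg_attr):

# The deleted disk should disappear from the block device list
lsblk
grep sdb /proc/partitions

# LVM now reports the missing PV and the partial VG
pvs
vgs -o vg_name,vg_attr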

To avoid similar situations in the future:

  • Monitor RAID health with mdadm --detail --scan (see the monitoring sketch after this list)
  • Configure proper LVM autoactivation settings
  • Implement monitoring for sector errors
  • Consider RAID 6 instead of RAID 5 for large arrays
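
For the monitoring points above, a minimal setup might look like this (assuming mdadm's monitor mode and smartmontools are available; many distributions already run an equivalent mdmonitor service):

# Run the mdadm monitor in the background and mail alerts to root
mdadm --monitor --scan --daemonise --mail=root

# Spot-check SMART health on each array member (example: /dev/sda)
smartctl -H /dev/sda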

When troubleshooting, gather these diagnostics:

# Collect LVM debug info
vgcfgbackup -v vg_name
vgs -vvvv

# Check kernel messages
dmesg | grep -E 'md|lvm|sd'

# Examine device mapper status
dmsetup ls --tree
dmsetup status
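
If you need to hand these diagnostics to someone else, lvmdump (shipped with LVM2) collects most of the above into a single archive:

# Create a tarball of LVM state, including on-disk metadata
lvmdump -m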