When dealing with storage subsystem failures involving both LVM and MD RAID, you might encounter situations where vgchange -an fails to properly deactivate a volume group due to underlying device errors. This typically happens when:
- MD RAID arrays drop out of the kernel due to disk failures
- I/O errors prevent clean LVM metadata operations
- The kernel maintains references to the PV devices
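To confirm which of these applies, a quick look at the MD state and recent kernel messages usually tells you (array and device names here are just examples):
# Current state of all MD arrays
cat /proc/mdstat
# Recent kernel messages about the array and its member disks
dmesg | grep -iE 'md[0-9]|i/o error' | tail -n 20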
Before attempting any forceful operations, it's crucial to check the current state:
# Check VG status
vgs --options vg_name,vg_attr
# Check PV status
pvs --options pv_name,vg_name,pv_attr,dev_size
# Check MD status
mdadm --detail /dev/mdX
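It also helps to see which device-mapper devices are still open before forcing anything; a non-zero open count usually means a filesystem is still mounted or a process holds the device:
# Show dm devices with their open reference counts
dmsetup info -c -o name,open,attr
# Show how the block devices stack (disks, MD arrays, LVs)
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT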
When standard methods fail, use this sequence of commands:
# First attempt clean deactivation
vgchange -an vg_name
# If that fails, try with partial mode
vgchange -an --partial vg_name
# Force device removal from kernel
dmsetup remove_all --force
# Verify no remaining references
ls /dev/mapper/ | grep vg_name
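If the names disappear from /dev/mapper but the MD array still cannot be stopped, the holders directory in sysfs shows what is still stacked on top of it (md0 is a placeholder):
# List devices that still hold a reference to the array
ls /sys/block/md0/holders/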
For persistent device mapper entries that won't release:
# List all device mapper devices
dmsetup ls
# Remove specific device
dmsetup remove /dev/mapper/vg_name-lv_name
# If the device is busy, force removal (dmsetup replaces the table with one that errors all I/O)
dmsetup remove --force /dev/mapper/vg_name-lv_name
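To confirm the entry is actually gone, query it again; dmsetup reports that the device does not exist once it has been removed:
# Verify removal
dmsetup info vg_name-lv_name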
To make your system more resilient, configure LVM to scan only the assembled MD devices and ignore their raw component disks (appropriate when all of your PVs sit on MD arrays):
# Add these to /etc/lvm/lvm.conf:
devices {
    filter = [ "a|/dev/md.*|", "r|.*|" ]
    global_filter = [ "a|/dev/md.*|", "r|.*|" ]
    md_component_detection = 1
}
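You can confirm the filter is in effect with lvmconfig (older releases use lvm dumpconfig instead); pvs should then list PVs only on /dev/mdX devices, never on their member disks:
# Show the filter settings LVM is actually using
lvmconfig devices/filter devices/global_filter
# PVs should appear only on MD devices now
pvs -o pv_name,vg_name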
The same problem appears on any Linux system combining LVM with MD RAID: a volume group refuses to deactivate when multiple disks in an array fail, when I/O errors prevent clean device deactivation, or when MD marks the array itself as failed. LVM cannot deactivate the group while device-mapper still holds open references to it, so check the current state first:
# List volume groups and their attributes
vgs --units b
# View physical volume status
pvs -v
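A very common blocker at this stage is simply a filesystem that is still mounted, or a process holding the LV open (the LV path below is a placeholder):
# Check whether the LV is still mounted anywhere
findmnt --source /dev/vg_name/lv_name
# Show processes using the filesystem on that device
fuser -vm /dev/vg_name/lv_name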
When vgchange -an fails, try this comprehensive approach:
# 1. Attempt clean deactivation first
vgchange -an vg_name
# 2. If that fails, try with verbose debugging
vgchange -an -vvvv vg_name 2>&1 | grep -i device
# 3. Check for any active logical volumes
lvs -o+lv_active
# 4. Force deactivation of individual logical volumes
lvchange -an /dev/vg_name/lv_name
# 5. Use dmsetup to remove device mapper entries
dmsetup remove_all --force
# 6. Flush multipath if used
multipath -F
# 7. Finally try vgchange again with force
vgchange -an --force --force vg_name
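The sequence above can be wrapped into a small helper script; this is only a sketch of the same steps with a placeholder VG name, not a substitute for understanding why the group is stuck:
#!/bin/bash
# Sketch: increasingly forceful deactivation of one VG ("vg_name" is a placeholder)
VG="${1:-vg_name}"
# 1. Clean VG-level deactivation
vgchange -an "$VG" && exit 0
# 2. Deactivate each LV individually
for lv in $(lvs --noheadings -o lv_name "$VG" 2>/dev/null); do
    lvchange -an "$VG/$lv" || echo "could not deactivate $VG/$lv" >&2
done
# 3. Force-remove any device-mapper entries that are still present
for dm in /dev/mapper/"$VG"-*; do
    [ -e "$dm" ] && dmsetup remove --force "$dm"
done
# 4. Final VG-level attempt
vgchange -an "$VG"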
For the underlying MD RAID issue, consider these steps before LVM operations:
# Stop the array completely
mdadm --stop /dev/mdX
# Examine array details
mdadm --examine /dev/sd[abc]1
# Reassemble with missing devices
mdadm --assemble --force /dev/mdX /dev/sda1 /dev/sdb1 --run
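After a forced assemble it is worth confirming that the array actually started and in what state (md0 is a placeholder):
# Array state, active and failed device counts
mdadm --detail /dev/md0 | grep -E 'State|Active Devices|Failed Devices'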
When all else fails, remove the failed device from the kernel entirely:
# Find the dm-N node behind the LV (the symlink target reveals it)
ls -l /dev/vg_name/lv_name
# Note: dm devices have no /sys/block/dm-N/device/delete hook; force-remove them through dmsetup
dmsetup remove --force /dev/mapper/vg_name-lv_name
# For a failed physical SCSI disk, drop it from the kernel (example: /dev/sdb)
echo 1 > /sys/block/sdb/device/delete
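If the disk is later replaced or comes back healthy, the kernel can rediscover it by rescanning the SCSI host (host0 is a placeholder; check /sys/class/scsi_host/ for the right one):
# Rescan a SCSI host for new or re-added devices
echo "- - -" > /sys/class/scsi_host/host0/scan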
To avoid similar situations in the future:
- Monitor RAID health with mdadm --detail --scan
- Configure proper LVM autoactivation settings
- Implement monitoring for sector errors
- Consider RAID 6 instead of RAID 5 for large arrays
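One way to act on these points is to run mdadm's monitor mode and periodic SMART checks; the mail address, config path, and device below are placeholders, and the mdadm.conf location varies by distribution:
# Mail on array events, then run the monitor daemon
echo "MAILADDR admin@example.com" >> /etc/mdadm/mdadm.conf
mdadm --monitor --scan --daemonise
# Spot-check a member disk for failing or pending sectors
smartctl -H -A /dev/sdb | grep -Ei 'health|reallocated|pending'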
When troubleshooting, gather these diagnostics:
# Collect LVM debug info
vgcfgbackup -v vg_name
vgs --all -v -vvvv
# Check kernel messages
dmesg | grep -E 'md|lvm|sd'
# Examine device mapper status
dmsetup ls --tree
dmsetup status
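If you need to hand the diagnostics to someone else, lvmdump bundles most of the LVM side (config, logs, dmsetup state, and with -m the on-disk metadata) into a single archive:
# Collect LVM diagnostics into a tarball
lvmdump -m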