Looking at your /proc/mdstat and mdadm --detail output, this is a classic RAID-5 degraded state where one device is marked as "removed". This typically occurs when:
- The disk's superblock UUID no longer matches the array (often after a cable/controller glitch or an interrupted superblock write)
- The superblock became out of sync with the array
- The device was temporarily unavailable during an array operation
Before proceeding, confirm the current state with these essential commands:
# Check array status
cat /proc/mdstat
# Detailed array information
mdadm --detail /dev/md0
# Examine disk superblocks
mdadm --examine /dev/sd[bcde]1
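If a UUID mismatch is the suspect, compare what each member's superblock reports; a different UUID or a lagging event counter points at the dropped device:
# Device names follow the examples above -- adjust to your layout
mdadm --examine /dev/sd[bcde]1 | grep -E 'UUID|Events'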
Since the device is physically present but logically removed, the fix is to re-add it to the array:
# Try to fail the device first; a slot already shown as "removed" may
# make mdadm complain the device isn't listed -- then skip to the re-add
mdadm /dev/md0 --fail /dev/sdX
# Then remove it from the array
mdadm /dev/md0 --remove /dev/sdX
# Re-add the device; its superblock is rewritten to rejoin the array
mdadm /dev/md0 --re-add /dev/sdX
# If --re-add is refused, add it back as a fresh member (full rebuild)
mdadm /dev/md0 --add /dev/sdX
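Whether --re-add can skip the full rebuild depends on the array having a write-intent bitmap. You can check for one and, if you like, add an internal bitmap while the array is running (a common hardening step, not something your output confirms):
# See whether a bitmap is configured
mdadm --detail /dev/md0 | grep -i bitmap
# Add an internal write-intent bitmap on the live array
mdadm --grow /dev/md0 --bitmap=internal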
If the disk is healthy but its superblock UUID no longer matches the array, you can rewrite the UUID at assembly time instead of rebuilding:
# Save the current scan output for reference first
mdadm --examine --scan > /etc/mdadm/mdadm.conf.bak
# Stop the array, then reassemble while writing the given UUID to
# every member's superblock (--update=uuid only works in assemble mode)
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --update=uuid --uuid=5a4d2b61:9c5c6ad5:aea414d0:5f8dbc13 /dev/sd[bcde]1
After re-adding the disk, monitor the rebuild:
watch -n 5 cat /proc/mdstat
# Or for more detail:
watch -n 5 mdadm --detail /dev/md0
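If the rebuild is crawling, the kernel's resync speed limits may be throttling it; you can raise the floor while it runs (values are in KiB/s, and the number below is illustrative):
# Show the current limits
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
# Raise the minimum so the resync isn't starved by other I/O
sysctl -w dev.raid.speed_limit_min=50000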
Consider these best practices:
- Update your mdadm.conf after changes, then review the file so repeated appends don't leave duplicate ARRAY lines:
mdadm --examine --scan >> /etc/mdadm/mdadm.conf
- Implement regular array checks:
echo check > /sys/block/md0/md/sync_action
- Set up email alerts in mdadm.conf
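For the email alerts, a MAILADDR line plus mdadm's monitor mode is the usual arrangement (the address below is a placeholder):
# In /etc/mdadm/mdadm.conf -- replace with your real address
MAILADDR admin@example.com
# Run the monitor as a daemon; it mails on Fail/Degraded events
mdadm --monitor --scan --daemonise
# Send a test alert per array to confirm mail delivery works
mdadm --monitor --scan --test --oneshot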
If the rebuild fails or stalls:
# Check kernel messages
dmesg | grep md
# Verify disk health
smartctl -a /dev/sdX
# Start a repair pass (rewrites mismatched parity/blocks) if needed
echo repair > /sys/block/md0/md/sync_action
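After a check or repair pass finishes, the mismatch counter tells you whether inconsistencies were actually found:
# Non-zero after a "check" means mismatched blocks were detected
cat /sys/block/md0/md/mismatch_cnt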
When working with Linux software RAID (mdadm), you might encounter a situation where a perfectly healthy disk gets marked as "removed" in your RAID-5 array. This typically happens when the system fails to recognize the disk's superblock UUID, even though the physical device is functioning properly.
First, let's examine the current state of the array. From your /proc/mdstat and mdadm --detail output, we can see:
# Current array status
$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : inactive sdd1[0] sdc[3] sde1[1]
3907034368 blocks
# Detailed array information
$ sudo mdadm --detail /dev/md0
[...]
Number   Major   Minor   RaidDevice State
   0       8       49        0      active sync   /dev/sdd1
   1       8       65        1      active sync   /dev/sde1
   2       0        0        2      removed
   3       8       32        3      active sync   /dev/sdc
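Before picking an approach, examine the superblock on the disk that dropped out and compare it with the active members; a lagging Events value explains why the kernel kicked it out:
# /dev/sdb1 is assumed here -- use whatever device is missing above
$ sudo mdadm --examine /dev/sdb1 | grep -E 'UUID|Events'
$ sudo mdadm --examine /dev/sdc /dev/sdd1 /dev/sde1 | grep -E 'UUID|Events'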
Since the disk is physically present but marked as removed, we have two approaches:
Option 1: Re-add the existing disk
If you believe the disk is healthy and just needs to be reconnected to the array:
# First, stop the array
$ sudo mdadm --stop /dev/md0
# Then reassemble with every member, forcing the out-of-date one back in
# (your output mixes whole-disk and partition members, so list them explicitly)
$ sudo mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc /dev/sdd1 /dev/sde1
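If mdadm assembles the array but declines to start it because a member is still missing, --run starts it degraded so you can re-add the straggler afterwards:
# Start the array even though it is degraded
$ sudo mdadm --run /dev/md0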
Option 2: Replace with a new disk
If you prefer to replace the disk (even though it might be healthy):
# Mark the old device as failed (mdadm may refuse if the slot is
# already "removed" -- in that case go straight to --remove)
$ sudo mdadm /dev/md0 --fail /dev/sdb1
# Then remove it from the array
$ sudo mdadm /dev/md0 --remove /dev/sdb1
# Finally, add the new disk
$ sudo mdadm /dev/md0 --add /dev/sdb1
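If the replacement disk previously belonged to another md array, wipe its stale metadata before the --add so the kernel isn't confused by the old superblock (double-check the device name; this erases md metadata on it):
# Clear any old md superblock from the new member
$ sudo mdadm --zero-superblock /dev/sdb1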
After performing either operation, monitor the rebuild progress:
# Watch the rebuild progress
$ watch cat /proc/mdstat
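If you are scripting the recovery, mdadm can also block until the resync completes instead of polling:
# Returns only once any resync/recovery on md0 has finished
$ sudo mdadm --wait /dev/md0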
To avoid similar issues in the future, consider these best practices:
- Regularly check array status with mdadm --detail --scan
- Implement monitoring for your RAID arrays
- Keep spare disks available for quick replacement
- Document your RAID configuration thoroughly
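A periodic cron job covers the first two points with little effort (the schedule and log path are illustrative; Debian-based systems already ship a similar checkarray script):
# /etc/cron.d/raid-check -- monthly scrub of md0 plus a status log
# (adjust the schedule and paths to taste)
0 3 1 * * root echo check > /sys/block/md0/md/sync_action
0 8 1 * * root /usr/sbin/mdadm --detail --scan >> /var/log/raid-status.log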