How to Replace a “Removed” Disk in Linux MD RAID-5 Array and Force Rebuild



Examining the /proc/mdstat and mdadm --detail output shows a classic RAID-5 degraded state, with one device marked as "removed". This typically occurs when:

  • The disk's UUID changed unexpectedly (often due to filesystem corruption or hardware issues)
  • The superblock became out of sync with the array
  • The device was temporarily unavailable during an array operation

Before proceeding, confirm the current state with these essential commands:

# Check array status
cat /proc/mdstat

# Detailed array information
mdadm --detail /dev/md0

# Examine disk superblocks
mdadm --examine /dev/sd[bcde]1

Since the device is physically present but logically removed, it is usually no longer listed as an array member, so there is nothing to --fail; try re-adding it directly:

# If the device is still listed as faulty, remove it first
mdadm /dev/md0 --remove /dev/sdX

# Try to re-add it in place (with a write-intent bitmap this can
# resync only the stale regions instead of rebuilding everything)
mdadm /dev/md0 --re-add /dev/sdX

# If --re-add is refused, add it as a fresh member (full rebuild)
mdadm /dev/md0 --add /dev/sdX

If the disk is healthy but its superblock no longer carries the array's UUID, the UUID can be rewritten at assembly time; there is no per-device UUID-update command, so the array must be stopped first:

# Back up the current array configuration first!
mdadm --examine --scan > /etc/mdadm/mdadm.conf.bak

# Stop the array, then reassemble while rewriting the UUID on all members
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --update=uuid --uuid=5a4d2b61:9c5c6ad5:aea414d0:5f8dbc13 /dev/sd[bcde]1

After re-adding the disk, monitor the rebuild:

watch -n 5 cat /proc/mdstat

# Or for more detail:
watch -n 5 mdadm --detail /dev/md0

Consider these best practices:

  • Update your mdadm.conf after changes: mdadm --examine --scan >> /etc/mdadm/mdadm.conf
  • Implement regular array checks: echo check > /sys/block/md0/md/sync_action
  • Set up email alerts in mdadm.conf
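For the email-alert bullet, a minimal /etc/mdadm/mdadm.conf fragment might look like this (the address is a placeholder; delivery assumes a working local MTA):

```
# /etc/mdadm/mdadm.conf -- recipient for alerts raised by `mdadm --monitor`
# (placeholder address; replace with your own)
MAILADDR admin@example.com
```

Delivery can be verified by generating a one-off TestMessage event for each array with `mdadm --monitor --scan --test --oneshot`.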

If the rebuild fails or stalls:

# Check kernel messages
dmesg | grep md

# Verify disk health
smartctl -a /dev/sdX

# Force a resync if needed
echo repair > /sys/block/md0/md/sync_action
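For logging or scripting around a stalled rebuild, it can help to pull just the progress figure out of /proc/mdstat. A small sketch (the function name is mine; the input is the standard mdstat recovery/resync line):

```shell
#!/bin/sh
# rebuild_progress: print the "recovery = N%" / "resync = N%" fragment
# from mdstat-formatted input (empty output if no rebuild is running).
rebuild_progress() {
    grep -Eo '(recovery|resync) *= *[0-9.]+%' || true
}

# Typical usage:
#   rebuild_progress < /proc/mdstat
```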

When working with Linux software RAID (mdadm), you might encounter a situation where a perfectly healthy disk gets marked as "removed" in your RAID-5 array. This typically happens when the system fails to recognize the disk's superblock UUID, even though the physical device is functioning properly.

First, let's examine the current state of the array. From your /proc/mdstat and mdadm --detail output, we can see:

# Current array status
$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : inactive sdd1[0] sdc[3] sde1[1]
      3907034368 blocks

# Detailed array information
$ sudo mdadm --detail /dev/md0
[...]
    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1
       2       0        0        2      removed
       3       8       32        3      active sync   /dev/sdc

Since the disk is physically present but marked as removed, we have two approaches:

Option 1: Re-add the existing disk

If you believe the disk is healthy and just needs to be reconnected to the array:

# First, stop the array
$ sudo mdadm --stop /dev/md0

# Then reassemble with every member device, including the "removed" one
# (note that /dev/sdc has no partition suffix in this array)
$ sudo mdadm --assemble /dev/md0 --force /dev/sdb1 /dev/sdc /dev/sdd1 /dev/sde1
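Before forcing assembly, it is worth comparing the per-member event counters: --force adopts the member whose counter lags, and recovery then rewrites its stale data. A small helper (the function name is mine) to pull the counter out of mdadm --examine output:

```shell
#!/bin/sh
# extract_events: print the value of the "Events" line from
# `mdadm --examine` output fed on stdin.
extract_events() {
    awk '/^[[:space:]]*Events/ { print $NF }'
}

# Typical usage (as root, with the member devices from the array above):
#   for d in /dev/sdb1 /dev/sdc /dev/sdd1 /dev/sde1; do
#       printf '%s events: ' "$d"
#       mdadm --examine "$d" | extract_events
#   done
```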

Option 2: Replace with a new disk

If you prefer to replace the disk (even though it might be healthy):

# If the old device is still listed (e.g. as faulty), clear it out first;
# for a slot already shown as "removed", these two steps report
# "no such device" and can be skipped
$ sudo mdadm /dev/md0 --fail /dev/sdb1
$ sudo mdadm /dev/md0 --remove /dev/sdb1

# Finally, add the new disk
$ sudo mdadm /dev/md0 --add /dev/sdb1

After performing either operation, monitor the rebuild progress:

# Watch the rebuild progress
$ watch cat /proc/mdstat

To avoid similar issues in the future, consider these best practices:

  • Regularly check array status with mdadm --detail --scan
  • Implement monitoring for your RAID arrays
  • Keep spare disks available for quick replacement
  • Document your RAID configuration thoroughly
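The monitoring bullet above can be sketched as a tiny check: in /proc/mdstat each healthy member shows as U and each missing member as _ in the status bitmap (e.g. [UU_U]), so flagging degraded arrays is a one-line filter (the function name is mine):

```shell
#!/bin/sh
# degraded_arrays: print mdstat status lines whose [UUUU]-style bitmap
# contains an underscore, i.e. arrays running with a missing member.
degraded_arrays() {
    grep -E '\[[U_]*_[U_]*\]' || true
}

# Typical usage (e.g. from cron, mailing any output to the admin):
#   degraded_arrays < /proc/mdstat
```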