Advanced RAID5 Recovery: Handling Double Disk Failure with Mismatched Device Order in mdadm



When dealing with mdadm RAID5 arrays, double disk failures combined with device order mismatches create one of the most challenging recovery scenarios. Here's what happened in this specific case:

Original array composition (from /proc/mdstat backup):
md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

With non-sequential device numbering in an mdadm RAID5 array, these technical points become crucial:

  • Parity placement depends on each member's role (its position in the array), together with the chunk size and layout algorithm
  • Recreating with --assume-clean skips the initial resync: the old superblocks are overwritten, but the data blocks are left untouched
  • The bracketed numbers in /proc/mdstat ([0],[1],[2],[4]) are persistent descriptor numbers, not roles; a gap like this usually means a drive was replaced at some point (each member's true role is recorded in its superblock, as shown below)
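With 1.2 metadata, mdadm --examine reports a "Device Role" field for each member, which is the authoritative slot assignment regardless of descriptor numbers:

# Show the role each surviving member actually holds
mdadm --examine /dev/sdb1 /dev/sdd1 /dev/sde1 | grep -E '^/dev|Device Role'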

Based on the available information, here's how to attempt reconstruction:

# First, examine all remaining devices for RAID components
mdadm --examine /dev/sdb1 /dev/sdd1 /dev/sde1

# Attempt assembly with original device order
mdadm --assemble --force --run /dev/md0 /dev/sdb1 /dev/sde1 /dev/sdd1

# If unsuccessful, recreate the metadata as a last resort (this overwrites
# the superblocks; double-check order, chunk and layout before running)
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 \
  --layout=left-symmetric --assume-clean \
  /dev/sdb1 missing /dev/sdd1 /dev/sde1
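Before writing anything through the array, verify that the recreated geometry is plausible:

# Read-only sanity checks after (re)creation
mdadm --detail /dev/md0    # confirm level, chunk size and device order
blkid /dev/md0             # a filesystem signature should reappear
fsck.ext4 -n /dev/md0      # non-destructive consistency check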

The missing [3] and presence of [4] simply indicate a historical drive replacement: mdadm assigns a fresh descriptor number to each replacement device. These numbers are cosmetic; they do not affect the parity layout, and mdadm provides no way (nor any need) to set them at creation time. What matters is the positional order on the --create command line, which maps each device to its role:

# Role order, not descriptor numbers, is what --create must reproduce
# (sde1 holds role 3 even though its descriptor number is 4)
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 \
  --layout=left-symmetric --assume-clean \
  /dev/sdb1 missing /dev/sdd1 /dev/sde1

If the array assembles but filesystem remains unrecognized:

# First attempt forced ext4 mount
mount -t ext4 -o ro,noload /dev/md0 /mnt/recovery

# If the mount fails, check the filesystem (dry run first, then repair)
fsck.ext4 -v -f -n /dev/md0  # Dry run first
fsck.ext4 -v -f -y /dev/md0  # Actual repair

To avoid similar situations, implement these safeguards:

  • Refresh the ARRAY lines after any change: mdadm --detail --scan >> /etc/mdadm.conf (prune stale duplicates afterwards)
  • Periodically dump superblocks: mdadm --examine --scan > /root/mdadm_examine.txt
  • Prefer RAID6 for arrays built from large drives; rebuild windows are long enough that a second failure is a realistic risk
  • Implement proper backup rotation (3-2-1 rule) and scrub the array regularly (sketch below)
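A minimal scrub schedule, assuming a single array named md0 (adjust the device name and timing to taste):

# Kick off a scrub (read every block and verify parity) via sysfs
echo check > /sys/block/md0/md/sync_action

# Or run it monthly from cron, e.g. in /etc/cron.d/mdadm-scrub:
0 3 1 * * root echo check > /sys/block/md0/md/sync_action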

When standard mdadm commands fail, consider these specialized tools:

# Using ddrescue to clone failing drives
ddrescue -d -r3 /dev/sde /mnt/backup/sde.img /mnt/backup/sde.log
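Once the failing members are cloned, risky experiments can target the images instead of the original drives. A sketch, assuming whole-disk images for each survivor exist under /mnt/backup:

# Attach each clone as a loop device; -P exposes its partitions
losetup -fP --show /mnt/backup/sdb.img
losetup -fP --show /mnt/backup/sdd.img
losetup -fP --show /mnt/backup/sde.img

# Then assemble or recreate using /dev/loopXp1 in place of /dev/sdX1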

# raidreconf (from the legacy raidtools suite) reshapes arrays described
# by raidtab files; it is unmaintained, so run it only against clones
raidreconf -o /etc/raidtab.old -n /etc/raidtab.new -m /dev/md0

When dealing with dual disk failures in a 4-drive RAID5 array, the situation becomes particularly precarious when superblocks get corrupted and device order becomes ambiguous. Let me walk through the technical specifics of this recovery scenario.

From the /proc/mdstat backup, we see the original layout:

md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

Key observations about this configuration:

  • Non-sequential device indexes: [0], [1], [2], [4]
  • Superblock version 1.2 (which places the data at an offset from the start of each member; see below)
  • 512k chunk size
  • Left-symmetric layout (algorithm 2, the mdadm default)
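One detail worth capturing before any recreate attempt: with 1.2 superblocks, the data offset chosen by a newer mdadm release can differ from the original one, silently shifting every block. If any surviving superblock is still readable, record the offset and pin it explicitly; the value below is purely illustrative (mdadm --examine reports the offset in sectors, while --data-offset expects kibibytes by default on recent releases):

# Record the original data offset from a surviving superblock
mdadm --examine /dev/sdb1 | grep 'Data Offset'

# Pin it when recreating (example value only; convert sectors to KiB)
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 \
  --layout=left-symmetric --assume-clean --data-offset=131072 \
  /dev/sdb1 missing /dev/sdd1 /dev/sde1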

The failed recovery path demonstrates several common pitfalls:

# Problematic commands executed:
mdadm --create /dev/md0 --assume-clean -l5 -n4 /dev/sdb1 missing /dev/sdd1 /dev/sde1

This approach didn't account for:

  1. The original device ordering (role order on the command line)
  2. The original chunk size and layout, which were left to the running mdadm's defaults
  3. Preservation of the original superblocks before overwriting them (see the sketch after this list)
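On the preservation point: the 1.2 superblock lives 4 KiB from the start of each member, so a raw copy can be stashed before any destructive command. A sketch, with output paths chosen purely for illustration:

# Save the raw 1.2 superblock region (4 KiB at offset 4096) of each member
for d in sdb1 sdd1 sde1; do
    dd if=/dev/$d of=/root/${d}_sb.bin bs=4096 skip=1 count=1
done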

To reconstruct the original layout, --assume-clean is combined with the correct role order on the command line; mdadm has no syntax for explicit slot assignment, so position is everything:

mdadm --create /dev/md0 --assume-clean -l5 -n4 \
    --layout=left-symmetric --chunk=512 \
    /dev/sdb1 missing /dev/sdd1 /dev/sde1

Role order is crucial for parity calculation in RAID5. The missing entry (role 1) corresponds to the failed /dev/sdc1.
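To see why order matters, this is how the left-symmetric layout (algorithm 2) distributes data (D) and parity (P) chunks across the four roles; swap any two columns and every stripe's parity is computed over the wrong blocks:

           role 0   role 1   role 2   role 3
stripe 0     D0       D1       D2       P
stripe 1     D4       D5       P        D3
stripe 2     D8       P        D6       D7
stripe 3     P        D9       D10      D11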

After array reconstruction, filesystem recovery may be needed:

# Check filesystem consistency
fsck -n /dev/md0

# If EXT4 superblock is corrupted
e2fsck -b 32768 /dev/md0  # Try backup superblock
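If block 32768 does not hold a valid backup, the candidate locations can be listed without touching the device, provided the original mke2fs parameters (block size in particular) are reproduced:

# Print superblock locations only; -n makes no changes to the device
mke2fs -n /dev/md0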

If standard reconstruction fails, consider these advanced approaches:

  1. Dump superblocks from all devices:
    mdadm --examine /dev/sd[bde]1 > superblock_backup.txt
  2. Verify parity consistency with the raid5check utility shipped in the mdadm source tree
  3. Run every further --create experiment against copy-on-write overlays of the members, so no attempt is destructive (see the sketch below); note that mdadm --build is not an option here, as it only supports superblock-less levels such as linear, RAID0 and RAID1
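A sketch of the overlay technique, with the overlay size and file locations chosen for illustration:

# Create a sparse file to absorb writes, and attach it as a loop device
truncate -s 1G /tmp/sdb1_cow
loop=$(losetup -f --show /tmp/sdb1_cow)

# Present /dev/sdb1 through a copy-on-write snapshot; writes go to the overlay
size=$(blockdev --getsz /dev/sdb1)
dmsetup create sdb1_overlay --table "0 $size snapshot /dev/sdb1 $loop P 8"

# Experiment against /dev/mapper/sdb1_overlay; tear down afterwards with:
# dmsetup remove sdb1_overlay && losetup -d $loop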

Essential documentation commands:

# Save critical mdadm configuration (note: '>' replaces the whole file)
mdadm --detail --scan > /etc/mdadm/mdadm.conf
mdadm --examine /dev/sd[b-e]1 > mdadm_examine_backup.txt
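A copy of /proc/mdstat belongs in the same kit; it is exactly what made the original layout recoverable in this case:

# Snapshot the kernel's current view of all arrays
cat /proc/mdstat > /root/mdstat_backup.txt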

Remember: RAID5 with large drives requires:

  • Regular scrubbing (mdadm --action=check /dev/md0)
  • Proper monitoring (sketch below)
  • Complete backup solution
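For the monitoring point, mdadm's own monitor mode can mail on failure events. A minimal sketch, assuming local mail delivery is configured:

# Watch all arrays and mail root when a device fails or an array degrades
mdadm --monitor --scan --daemonise --mail=root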