Advanced RAID5 Recovery: Handling Double Disk Failure with Mismatched Device Order in mdadm

When dealing with mdadm RAID5 arrays, double disk failures combined with device order mismatches create one of the most challenging recovery scenarios. Here's what happened in this specific case:

Original array composition (from /proc/mdstat backup):
md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
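
The bracketed numbers above are device numbers, which in this array happen to sort into the member role order, so a saved mdstat line can be turned back into an assembly order with ordinary text tools. A minimal sketch, assuming the line was saved verbatim:

```shell
# Sort the members of a saved mdstat line by their bracketed device
# numbers (here these happen to match the role order).
line='md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]'
printf '%s\n' "$line" | tr ' ' '\n' | grep '\[' | sort -t'[' -k2 -n
```

This prints sdb1[0], sdc1[1], sdd1[2], sde1[4], one per line: the order in which the members should be passed to --assemble.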

With non-sequential device numbering like this, three technical points become crucial:

  • Parity block calculation depends on device order, not slot numbers
  • --create --assume-clean skips the initial resync, leaving data blocks untouched, but --create still writes fresh superblocks over the old ones
  • Device numbering ([0],[1],[2],[4]) indicates a member was replaced at some point; mdadm does not reuse device numbers
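
The first point is worth making concrete. In the left-symmetric layout (algorithm 2, as in this array) the parity chunk rotates backwards one disk per stripe, and the illustrative arithmetic below shows why listing members to --create in the wrong order misplaces parity on every stripe:

```shell
# Illustrative only, no devices touched: in a left-symmetric RAID5
# layout, stripe s places its parity chunk on disk (n - 1 - s mod n).
NDISKS=4
for stripe in 0 1 2 3; do
    echo "stripe $stripe: parity on disk $(( NDISKS - 1 - stripe % NDISKS ))"
done
```

Swap any two members in the device list and every stripe's parity is expected on the wrong disk, which is why a wrong-order recreate makes the data unreadable even though --assume-clean leaves the blocks untouched.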

Based on the available information, here's how to attempt reconstruction:

# First, examine all remaining devices for RAID components
mdadm --examine /dev/sdb1 /dev/sdd1 /dev/sde1

# Attempt assembly with original device order
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sde1 /dev/sdd1 --run

# If unsuccessful, recreate in the original role order (destructive to
# superblocks -- image the drives first)
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 \
  --layout=left-symmetric --assume-clean \
  /dev/sdb1 missing /dev/sdd1 /dev/sde1
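
Before running any destructive --create, a quick sanity check is possible with arithmetic alone: a four-member RAID5 exposes three members' worth of space, so the array size saved in mdstat fixes what each member partition must report. A sketch using the figures above:

```shell
# RAID5 usable capacity is (n - 1) members, so the saved array size
# implies the size each member partition should report.
ARRAY_KIB=8790402048   # from the /proc/mdstat backup
NDEV=4
echo "expected per-member size: $(( ARRAY_KIB / (NDEV - 1) )) KiB"
```

That works out to 2930134016 KiB, roughly 2.7 TiB per member; if the actual partitions report a different size, the mdstat backup describes a different incarnation of the array.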

The missing [3] and presence of [4] suggest a drive was replaced at some point: /proc/mdstat shows device numbers, not role numbers, and a replacement member receives the next free device number. The on-disk device number has no effect on data layout, and mdadm provides no --create syntax for choosing it; what matters is listing the members in role order, with "missing" holding the failed slot, as in the command above.

If the array assembles but filesystem remains unrecognized:

# First try a read-only mount that skips journal replay
mount -t ext4 -o ro,noload /dev/md0 /mnt/recovery

# If failed, try filesystem repair
fsck.ext4 -v -f -n /dev/md0  # Dry run first
fsck.ext4 -v -f -y /dev/md0  # Actual repair
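
When scripting the dry run, note that fsck's exit status is a bit mask rather than a simple pass/fail value (bit meanings from the fsck.ext4 man page; the status value below is a made-up example):

```shell
# fsck exit codes form a bit mask: 1 = errors corrected,
# 4 = errors left uncorrected. status=5 is a hypothetical result.
status=5
[ $(( status & 1 )) -ne 0 ] && echo "errors were corrected"
[ $(( status & 4 )) -ne 0 ] && echo "errors remain uncorrected"
```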

To avoid similar situations, implement these safeguards:

  • Regular mdadm --detail --scan >> /etc/mdadm.conf
  • Periodic mdadm --examine --scan > /root/mdadm_examine.txt
  • Consider RAID6 for arrays with large drives
  • Implement proper backup rotation (3-2-1 rule)

When standard mdadm commands fail, consider these specialized tools:

# Using ddrescue to clone failing drives
ddrescue -d -r3 /dev/sde /mnt/backup/sde.img /mnt/backup/sde.log
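
The third column of a ddrescue mapfile records each region's status, with '+' meaning successfully rescued, so the clone's completeness can be totalled before attempting assembly. A sketch over an invented sample mapfile:

```shell
# Sum the finished ('+') regions of a ddrescue mapfile to gauge how
# complete the clone is. The sample mapfile below is invented.
cat > sample.map <<'EOF'
# Mapfile. Created by GNU ddrescue
# current_pos  current_status
0x40010000     +
#      pos        size  status
0x00000000  0x40000000  +
0x40000000  0x00010000  -
EOF
total=0
while read -r pos size status; do
    case $pos in '#'*) continue ;; esac   # skip comment lines
    [ "$status" = "+" ] && total=$(( total + size ))
done < sample.map
echo "$total bytes rescued"
```

Here 1073741824 bytes (1 GiB) were rescued and one 64 KiB region is still unread; assembling from an incomplete image risks silent corruption in the unrescued spans.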

# Attaching the cloned images as read-only loop devices, so any further
# experiments never touch the originals (repeat for each rescued member)
losetup --find --show --read-only /mnt/backup/sde.img

When dealing with dual disk failures in a 4-drive RAID5 array, the situation becomes particularly precarious when superblocks get corrupted and device order becomes ambiguous. Let me walk through the technical specifics of this recovery scenario.

From the /proc/mdstat backup, we see the original layout:

md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

Key observations about this configuration:

  • Non-sequential device indexes: [0], [1], [2], [4]
  • Superblock version 1.2
  • 512k chunk size
  • Left-symmetric algorithm (algorithm 2)
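
These figures also determine the full-stripe size, which matters when checking filesystem alignment (mkfs.ext4's stride and stripe-width would have been derived from it, assuming the filesystem was created aligned):

```shell
# Full-stripe size = chunk size * (n - 1) data chunks per stripe.
CHUNK_KIB=512
NDEV=4
echo "full stripe: $(( CHUNK_KIB * (NDEV - 1) )) KiB"
```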

The failed recovery path demonstrates several common pitfalls:

# Problematic commands executed:
mdadm --create /dev/md0 --assume-clean -l5 -n4 /dev/sdb1 missing /dev/sdd1 /dev/sde1

This approach didn't account for:

  1. An explicit --chunk=512 and --layout=left-symmetric (defaults can vary between mdadm versions)
  2. The data offset, which newer mdadm releases place differently from older ones
  3. Superblock preservation: --create overwrites the superblocks, so the drives should be imaged first

To reconstruct correctly, list the members in role order (mdadm assigns roles from the command-line order; there is no syntax for attaching slot numbers to --create arguments) and pin the layout and chunk size explicitly:

mdadm --create /dev/md0 --assume-clean -l5 -n4 \
    --layout=left-symmetric --chunk=512 \
    /dev/sdb1 missing /dev/sdd1 /dev/sde1

Member order is crucial for parity calculation in RAID5. The "missing" entry holds role 1, which belonged to the failed /dev/sdc1.

After array reconstruction, filesystem recovery may be needed:


# Check filesystem consistency
fsck -n /dev/md0

# If EXT4 superblock is corrupted
e2fsck -b 32768 /dev/md0  # Try backup superblock
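
The -b candidates follow from the filesystem geometry. Assuming 4 KiB blocks (32768 blocks per group), ext4 keeps backup superblocks in group 1 and in groups that are powers of 3, 5, and 7; the first few block numbers work out as:

```shell
# Candidate ext4 backup-superblock locations for 4 KiB blocks:
# 32768 blocks per group, backups in group 1 and powers of 3, 5, 7.
BPG=32768
for g in 1 3 5 7 9 25 27 49; do
    echo "group $g -> block $(( g * BPG ))"
done
```

The first candidate is the 32768 used above; mke2fs -n /dev/md0 (simulation only, writes nothing) prints the exact list for the filesystem's real block size.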

If standard reconstruction fails, consider these advanced approaches:

  1. Dump superblocks from all devices:
    mdadm --examine /dev/sd[bde]1 > superblock_backup.txt
  2. Check parity consistency with the raid5check tool shipped in the mdadm source tree
  3. Start the array read-only (echo 1 > /sys/module/md_mod/parameters/start_ro before assembling) so nothing is written until the data is verified

Essential documentation commands:


# Save critical mdadm configuration
mdadm --detail --scan > /etc/mdadm/mdadm.conf
mdadm --examine /dev/sd[b-e]1 > mdadm_examine_backup.txt

Remember: RAID5 with large drives requires:

  • Regular scrubbing (mdadm --action=check /dev/md0)
  • Proper monitoring
  • Complete backup solution