When two disks fail simultaneously in a RAID-5 array (especially with large 3TB SATA drives), we're dealing with one of the worst-case scenarios for storage administrators. The Dell PERC controller's behavior you're observing - where one disk shows as "missing" and another as "degraded" - suggests possible controller firmware issues combined with actual disk failures.
Possible failure sequences:
1. Disk 1 fails completely (electrical/mechanical)
2. During the rebuild attempt, Disk 3 hits a URE (Unrecoverable Read Error)
3. The controller marks Disk 3 as degraded due to timeout
4. The rebuild stalls at 1% because the blocks needed to reconstruct the missing data can no longer be read
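If the OS can still talk to the controller, it is worth cross-checking what the PERC itself reports before trusting the BIOS screen. This is a rough sketch assuming a MegaRAID-based PERC with the MegaCli utility installed at its usual path; adjust the path and adapter number for your system:

# List physical disks with their firmware state and error counters
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -iE "slot|firmware state|media error|predictive"
# Show the state of each virtual disk (Optimal, Degraded, Offline)
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL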
First, examine the array status without relying on the PERC BIOS:
# Install mdadm if not present
yum install mdadm -y
# Examine array components
mdadm --examine /dev/sd[b-f] | grep -A5 "Event"
# Force assemble in degraded mode
mdadm --assemble --force /dev/md0 /dev/sd[b-f] --verbose
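Before forcing assembly, compare the event counters across the members; if they are within a handful of events of each other, --force is relatively safe. After assembly, confirm the array actually came up (assuming it appears as /dev/md0):

# Event counters should be close together across members
mdadm --examine /dev/sd[b-f] | grep -E "^/dev/sd|Events"
# Verify the assembled array state
cat /proc/mdstat
mdadm --detail /dev/md0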
If standard rebuild fails, try reconstructing metadata:
# Recreate the array with the same parameters it was originally built with
# (device order and chunk size must match; --assume-clean avoids an immediate resync)
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=5 \
    --assume-clean --force /dev/sd[b-f]
# Then attempt a file system check (run fsck -n first if you want a read-only pass)
fsck -vy /dev/md0
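If fsck finds a mountable file system, get the important data off before experimenting further. A minimal example, assuming a separate recovery disk is mounted at /mnt/backup (hypothetical path):

# Mount read-only so nothing else is written to the damaged array
mkdir -p /mnt/md0
mount -o ro /dev/md0 /mnt/md0
# Copy critical data out first, preserving attributes
rsync -aHAX --progress /mnt/md0/ /mnt/backup/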
Check SMART data on remaining drives:
smartctl -a /dev/sdb | grep -i "reallocated\|pending\|uncorrectable"
for disk in b c d e f; do
    smartctl -H /dev/sd${disk} | grep "test result"
done
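Once critical data is copied off, a long SMART self-test on each surviving member gives a much better picture than the health summary alone; expect several hours per 3TB drive:

# Start long self-tests on the surviving members
for disk in b c d e f; do
    smartctl -t long /dev/sd${disk}
done
# Review the results later in the self-test log
smartctl -l selftest /dev/sdb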
If all DIY methods fail and data is critical, professional services can:
- Read platters directly in clean rooms
- Reconstruct sector-by-sector using specialized hardware
- Handle firmware corruption cases
For future setups, consider migrating to RAID-6 or ZFS with proper monitoring:
# Example ZFS pool creation
zpool create tank raidz2 sdb sdc sdd sde sdf
zpool status -v
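ZFS only surfaces latent sector errors if the pool is scrubbed regularly, so pair it with a scheduled scrub; a minimal cron entry, assuming the pool is named tank as above:

# Weekly scrub so latent read errors are found before a resilver depends on them
0 3 * * 0 /usr/sbin/zpool scrub tank
# Quick health check: prints nothing unless a pool has problems
zpool status -x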
When two disks fail simultaneously in a RAID-5 array (especially during rebuild operations), you're facing one of storage administration's worst-case scenarios. The mathematical probability isn't as low as you'd hope - research shows a 22% chance of second disk failure during rebuild on large SATA arrays (StorageReview, 2021).
Your scenario exhibits classic symptoms of what we call "cascading failure":
- Disk 1 physically fails (potentially bad sectors or controller issues)
- Disk 3 experiences read errors during rebuild stress
- The RAID controller marks Disk 3 as degraded due to timeout thresholds
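That timeout behaviour is frequently a mismatch between the drive's internal error recovery time and the controller's drop threshold. On drives that support SCT Error Recovery Control (ERC) you can check and cap it with smartctl; this is a sketch, and many consumer SATA drives ignore or do not persist the setting:

# Show current error recovery control timeouts, if the drive supports SCT ERC
sudo smartctl -l scterc /dev/sdb
# Cap read/write error recovery at 7 seconds (values are tenths of a second)
sudo smartctl -l scterc,70,70 /dev/sdb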
First, check the current array status:
sudo mdadm --detail /dev/mdX
cat /proc/mdstat
For forced array assembly (last resort):
sudo mdadm --assemble --force /dev/mdX /dev/sd[bcde]1 --verbose
sudo mdadm --manage /dev/mdX --add /dev/sdf1
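After re-adding the replacement member, watch the rebuild and, if it crawls, raise the md resync speed floor; a quick sketch:

# Watch rebuild progress
watch -n 5 cat /proc/mdstat
# Raise the minimum resync speed (in KB/s) if the rebuild is being starved
echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_min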
When standard rebuilds stall, try these advanced techniques:
# Check disk for bad blocks
sudo badblocks -sv /dev/sdX > badblocks.txt
# Create disk image skipping errors
sudo ddrescue -d -r3 /dev/sdX /mnt/recovery/sdX.img /mnt/recovery/sdX.log
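If the failing members end up being imaged with ddrescue, you can attempt the assembly against the images rather than the dying disks; a sketch assuming the images were written under /mnt/recovery:

# Expose each rescued image as a block device (losetup prints the loop device it used)
sudo losetup -f --show /mnt/recovery/sdb.img
sudo losetup -f --show /mnt/recovery/sdc.img
# ...repeat for the remaining images, then assemble read-only from the loop devices
sudo mdadm --assemble --force --readonly /dev/mdX /dev/loop[0-4]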
Modify your monitoring to catch early signs:
# SMART monitoring cron job
*/30 * * * * /usr/sbin/smartctl -a /dev/sdX | grep -i "Reallocated_Sector_Ct\|Current_Pending_Sector"
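A cron grep works, but smartd and mdadm's own monitor mode are built for exactly this; a minimal sketch, assuming mail to root is acceptable (adjust the schedule and address to taste):

# /etc/smartd.conf: monitor all drives, nightly short self-tests, Saturday long tests, mail on failure
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
# Let mdadm watch the array itself and mail when it degrades
sudo mdadm --monitor --scan --daemonise --mail=root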
Consider migrating to RAID-6 for arrays built on large (>2TB) drives, or implement staggered disk replacement cycles.
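For reference, an existing md RAID-5 can be reshaped to RAID-6 in place once an additional disk is available; a hedged sketch for a five-disk array growing to six (the reshape runs for many hours, the backup file must live outside the array, and /dev/sdg1 is a hypothetical new member):

# Add the new disk, then reshape RAID-5 -> RAID-6 keeping the same usable capacity
sudo mdadm --manage /dev/mdX --add /dev/sdg1
sudo mdadm --grow /dev/mdX --level=6 --raid-devices=6 --backup-file=/root/mdX-reshape.bak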