When dealing with enterprise storage systems, encountering multiple simultaneous disk failures in RAID 6 arrays can create complex recovery scenarios. Your situation involving:
- 16-drive SAS configuration
- Triple disk failure
- Degraded array state
- OS boot failure
requires careful handling to avoid permanent data loss. Unlike RAID 5, which tolerates only a single disk failure, RAID 6 is designed to survive two simultaneous failures - a third failure pushes the array beyond its design limits.
Step 1: Hardware Assessment
First verify physical disk health through controller utilities. Example SAS diagnostic command:
# Returns controller and disk status; look for "Ready" state and the correct WWN
sas2ircu 0 display
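If more than one HBA is present, a quick loop over all controllers saves time. This is a sketch that assumes sas2ircu LIST prints the controller index in the first column of each data row:
# Dump slot, state, serial number, and GUID for every controller's drives
for ctrl in $(sas2ircu LIST | awk '/^[[:space:]]*[0-9]+[[:space:]]/ {print $1}'); do
    echo "=== Controller $ctrl ==="
    sas2ircu "$ctrl" DISPLAY | grep -E 'Slot|State|Serial|GUID'
done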
Step 2: Create Sector-Level Images
Before any recovery attempts, create forensic copies of failed drives:
# Repeat for each failed drive
dd if=/dev/sdX of=/mnt/backup/sdX.img bs=1M conv=noerror,sync
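On drives with unstable media, GNU ddrescue usually copes with read errors better than plain dd because it records progress in a map file and can retry bad regions later. A minimal sketch (image and map paths are examples):
# First pass: copy everything readable, skip bad areas quickly
ddrescue -n /dev/sdX /mnt/backup/sdX.img /mnt/backup/sdX.map
# Second pass: retry the bad areas a few times with direct I/O
ddrescue -d -r3 /dev/sdX /mnt/backup/sdX.img /mnt/backup/sdX.map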
Using a live CD with proper SAS support is indeed the recommended approach:
# Recommended recovery distros:
# - SystemRescueCD (with mdadm support)
# - Knoppix STD
# - Ubuntu Server Live CD
# After booting, load the appropriate modules:
modprobe mpt3sas    # or mpt2sas for older 6Gb/s LSI controllers
modprobe raid456    # md RAID 4/5/6 personality
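Once the modules are loaded, confirm the kernel actually sees the member disks and their md superblocks before touching anything:
cat /proc/partitions        # every member disk should be listed
mdadm --examine --scan      # prints ARRAY lines for detected superblocks
cat /proc/mdstat            # shows anything the kernel has already assembled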
For advanced recovery scenarios, consider these approaches:
- Force assemble the degraded array (check the superblocks first - see the sketch after this list):
# sd[abcdefghijk] represents the remaining functional disks
mdadm --assemble --force /dev/md0 /dev/sd[abcdefghijk]
- Manual P/Q parity recalculation (requires deep technical knowledge):
# This requires custom scripting based on your stripe size
# Pseudo-code example (recalculate_parity is a placeholder, not a real command):
for stripe in $(seq 0 $total_stripes); do
    recalculate_parity --stripe $stripe --disks /dev/sd[a-p]
done
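Before forcing assembly (see the first item above), compare the md superblocks on the surviving members; a large gap in the event counters or update times shows which disks dropped out first and whether a forced assembly is likely to give a consistent result:
# Event counts, roles, and timestamps from each member's superblock
mdadm --examine /dev/sd[abcdefghijk] | grep -E '/dev/|Update Time|Events|Device Role|Array State'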
When DIY methods fail, consider professional tools:
| Tool | Best For |
| --- | --- |
| R-Studio | File-level recovery |
| UFS Explorer | RAID reconstruction |
| ReclaiMe | Automatic parameter detection |
For future configurations, consider improving resilience:
# Example: adding hot spares to an existing mdadm array
# (--spare-devices only applies at --create time; disks added to a healthy array become spares)
mdadm /dev/md0 --add /dev/sdX /dev/sdY
Remember that RAID 6 with 16 drives has significant rebuild times - consider alternatives like RAID 60 for large arrays.
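For illustration (device names are hypothetical), a 16-drive RAID 60 under mdadm is simply two 8-drive RAID 6 sets striped together, which keeps each rebuild confined to one 8-drive leg:
# Two 8-drive RAID 6 legs, each still tolerating two failures
mdadm --create /dev/md1 --level=6 --raid-devices=8 /dev/sd[a-h]
mdadm --create /dev/md2 --level=6 --raid-devices=8 /dev/sd[i-p]
# Stripe the legs as RAID 0 to form RAID 60
mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/md1 /dev/md2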
When three disks fail simultaneously in a RAID 6 array (which normally tolerates two-disk failures), the array becomes completely inaccessible. This is particularly problematic with SAS drives in enterprise environments where immediate data access is critical. The system's inability to boot confirms the storage subsystem failure has cascaded to the OS level.
Before attempting recovery:
- Physically label all failed drives
- Document the original disk order in the array (see the inventory sketch after this list)
- Check SMART status of remaining disks:
smartctl -a /dev/sdX
- Create sector-by-sector images of failed drives if possible:
dd if=/dev/sdX of=/mnt/backup/failed_disk1.img bs=1M conv=noerror,sync
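To make the documentation step easier, capture the identity and SMART health summary of every drive in one file. A minimal sketch, assuming all members appear as /dev/sd* devices and the output path is arbitrary:
# Record model, serial number, and SMART health for each drive
for d in /dev/sd?; do
    echo "=== $d ==="
    smartctl -i -H "$d"
done > /root/disk-inventory.txt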
Using a live CD can bypass the corrupted OS installation:
# Example SystemRescueCD boot commands:
boot: rescue64 dodisk=1
# Then mount remaining array components:
mdadm --assemble --force /dev/md0 /dev/sd[b-z] --verbose
# Check array status:
mdadm --detail /dev/md0
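If the array assembles, don't boot straight back into the installed OS; mount the filesystem read-only from the live environment and copy off the most critical data first. This is a sketch that assumes the filesystem sits directly on /dev/md0 (adjust for LVM or partitions):
# Read-only mount keeps recovery work from writing to a fragile array
mkdir -p /mnt/recovery
mount -o ro /dev/md0 /mnt/recovery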
When standard tools fail, a custom recovery script is sometimes the only option; the fragment below is a sketch rather than a drop-in solution:
#!/usr/bin/env python3
# RAID 6 recovery script fragment (illustrative: "raiddriver" is not a
# standard library module; it stands in for a custom recovery driver)
from raiddriver import RAID6Driver, RAIDDegradedError

def rebuild_parity(drives):
    # stripe_size must match the chunk size the array was created with
    raid = RAID6Driver(stripe_size=512)
    try:
        # Read the array layout (device order, level, chunk size) from mdadm's config
        raid.load_config('/etc/mdadm.conf')
        # Attempt a rebuild even though three members are missing
        return raid.rebuild(max_failures=3)
    except RAIDDegradedError:
        print("Insufficient disks for automatic rebuild")
        return False
For SAS environments:
- Use sas2ircu to check controller status
- SAS drives often report different failure modes than SATA (see the smartctl example after this list)
- Enterprise arrays may have vendor-specific recovery procedures
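For SAS disks, smartctl reports SCSI-style health data (grown defect list, corrected/uncorrected error counters) rather than SATA attribute tables; a quick per-drive check:
# -d scsi forces SCSI/SAS mode; watch the grown defect list and uncorrected error counts
smartctl -a -d scsi /dev/sdX | grep -iE 'defect|uncorrected|health'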
Post-recovery recommendations:
# Daily one-shot array check via cron (--oneshot exits after a single scan)
0 3 * * * /usr/sbin/mdadm --monitor --scan --oneshot --mail=admin@domain.com
# Better SMART monitoring: enable the smartd daemon (service name varies by distro),
# or poll drive health from cron with smartctl
*/30 * * * * for d in /dev/sd?; do /usr/sbin/smartctl -H -q errors "$d"; done
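It's also worth scheduling a periodic consistency check (scrub) so latent bad sectors are found before the next rebuild depends on them; the md device name below is an assumption:
# Monthly RAID scrub: md re-reads every stripe and verifies the P/Q parity
0 4 1 * * echo check > /sys/block/md0/md/sync_action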