When working with Linux servers using software RAID, one of the most frustrating scenarios is dealing with identical-looking failed drives. Unlike hardware RAID controllers with indicator lights, software RAID setups on commodity hardware provide no visual cues when a drive fails.
The professional approach begins before failure occurs. Create a physical-to-logical mapping of your drives:
# Get drive serial numbers and device mappings
for drive in /dev/sd[a-f]; do
echo -n "$drive: "
sudo smartctl -i $drive | grep -i serial
done
# Sample output:
/dev/sda: Serial Number: WD-WCC4N5PH6K45
/dev/sdb: Serial Number: WD-WCC4N5PH6K46
/dev/sdc: Serial Number: WD-WCC4N5PH6K47
[...]
Document these serial numbers physically near your server or in your documentation system.
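A low-effort way to keep that mapping current is to capture the same loop's output to a dated file (the path below is just an example):
# Save the serial-to-device mapping alongside other host documentation
for drive in /dev/sd[a-f]; do
    echo -n "$drive: "
    sudo smartctl -i $drive | grep -i serial
done | sudo tee /root/drive-map-$(date +%Y%m%d).txt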
When a failure occurs, use these Linux commands to identify the problematic drive:
Method 1: Check RAID Status
# Check software RAID status
cat /proc/mdstat
sudo mdadm --detail /dev/md0
This will show which device is marked as failed or removed from the array.
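If you just want the failed member's device path rather than the full table, a small awk filter over the detail output works; mdadm lists failed members with the state "faulty":
# Print the device path of any member mdadm reports as faulty
sudo mdadm --detail /dev/md0 | awk '/faulty/ {print $NF}'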
Method 2: Cross-Reference with SMART Data
# Check all drives for SMART errors
for drive in /dev/sd[a-f]; do
echo "Checking $drive:"
sudo smartctl -H $drive | grep -i "test result"
done
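A passing health check doesn't always catch a dying disk, so it helps to also scan the attributes that most often precede failure (Reallocated_Sector_Ct and Current_Pending_Sector):
# Non-zero values here usually point to the drive worth replacing
for drive in /dev/sd[a-f]; do
    echo "=== $drive ==="
    sudo smartctl -A $drive | grep -E "Reallocated_Sector_Ct|Current_Pending_Sector"
done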
Method 3: Locate by Device ID
Match the failed device from RAID to physical slot:
# Find physical port mapping
ls -l /dev/disk/by-path/
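The by-path names encode the controller and port, so once you know which device failed you can filter for it directly (using /dev/sdd from the example below):
# Show which PCI controller and port the failed drive hangs off
ls -l /dev/disk/by-path/ | grep 'sdd$'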
When you need to physically locate the drive:
# Force sustained reads so the drive's activity LED blinks (if the slot has one)
sudo dd if=/dev/sdX of=/dev/null bs=1M count=4096
If the failed drive no longer responds to reads, run the same command against each healthy drive instead and look for the one that stays dark.
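If the drives sit in a backplane or enclosure with SES-managed slot LEDs, the ledmon package is a cleaner option than raw I/O; this assumes ledctl is installed and the enclosure actually supports locate LEDs:
# Light the locate LED on the suspect drive, then turn it off after the swap
sudo ledctl locate=/dev/sdd
sudo ledctl locate_off=/dev/sdd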
For systems without LED support:
- Note the failed device (e.g., /dev/sdd)
- Check physical connections to match SATA port numbers
- Use the serial number mapping you created earlier
To avoid the guesswork next time:
- Label drives with their device IDs when installing
- Document drive-to-port mapping in your wiki
- Implement monitoring that includes drive serial numbers (see the sketch after this list)
- Consider using drive trays with individual LEDs
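Here is a minimal sketch of that kind of monitoring, assuming mdadm-managed arrays and smartctl; the script name and log path are placeholders, and it's meant to run from cron:
#!/bin/bash
# raid-serial-check.sh (hypothetical) - capture serial-number context for a degraded array
LOG=/var/log/raid-check.log

# Only log when the kernel reports a failed member
if grep -q '(F)' /proc/mdstat; then
    {
        echo "$(date): RAID degradation detected"
        cat /proc/mdstat
        # Record serials so the bad drive can be matched to a physical disk later
        for drive in /dev/sd[a-f]; do
            echo -n "$drive: "
            smartctl -i "$drive" | grep -i serial
        done
    } >> "$LOG"
fi
mdadm's own --monitor mode (with MAILADDR set in mdadm.conf) handles the alerting itself; the point of the script is simply to capture serial numbers alongside the failure event.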
When working with commodity servers using software RAID, you'll often encounter identical-looking drives. Unlike enterprise storage with dedicated LED indicators, consumer-grade hardware requires smarter identification methods. Let me share the techniques I've developed over years of Linux sysadmin work.
First, gather intelligence while the system is running. The lsblk command gives you the block device hierarchy along with each drive's model and serial number:
lsblk -o NAME,MODEL,SERIAL,SIZE,ROTA,MOUNTPOINT
Sample output:
sda     ST4000DM004  ZDH1A2K3  3.7T  1
└─sda1                         3.7T     /mnt/data
sdb     ST4000DM004  ZDH1A2K8  3.7T  1
└─sdb1                         3.7T
  └─md127                      3.7T     /mnt/array
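lsblk can also print the SCSI host:channel:target:lun address directly (the HCTL column), which saves a trip into /sys when you only need the port:
# -d limits output to whole disks; HCTL shows the host:channel:target:lun address
lsblk -d -o NAME,HCTL,MODEL,SERIAL,SIZE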
Check your RAID status to identify the failed device:
cat /proc/mdstat
md127 : active raid5 sdb1[0] sdc1[2] sdd1[3](F) sde1[4]
11721038848 blocks super 1.2 level 5, 512k chunk
The (F) flag marks the failed drive. Now match this to physical devices.
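If you just want the failed member's name without reading the whole table, a quick grep over /proc/mdstat pulls it out:
# Print just the member(s) flagged as failed
grep -oE '[a-z0-9]+\[[0-9]+\][(]F[)]' /proc/mdstat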
Modern Linux provides multiple ways to map logical devices to physical ports:
ls -l /sys/block/sd*/device
This shows the SATA port connections:
lrwxrwxrwx 1 root root 0 Aug 1 09:00 /sys/block/sda/device -> ../../../0:0:0:0
lrwxrwxrwx 1 root root 0 Aug 1 09:00 /sys/block/sdb/device -> ../../../0:0:1:0
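On libata/SATA systems, the fully resolved sysfs path also contains the ataN port name, which usually matches the numbering silkscreened on the motherboard; this assumes onboard SATA rather than a SAS HBA:
# Print the ATA port each disk is attached to
for d in /sys/block/sd*; do
    echo "$(basename $d): $(readlink -f $d | grep -o 'ata[0-9]*' | head -1)"
done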
For drives still responding but showing errors:
for drive in /dev/sd[a-f]; do
echo "=== $drive ==="
smartctl -i $drive | grep -E "Model|Serial"
smartctl -H $drive | grep "test result"
done
When you finally open the case:
- Use SATA port numbers (often printed on the motherboard)
- Create a physical diagram during initial setup
- Temporarily label drives with an erasable marker
- Use drive serial numbers (printed on the drive label)
Here's a script I use for quick identification:
#!/bin/bash
echo "Physical Drive Identification Report"
echo "Generated: $(date)"
echo "------------------------------------"
for device in /sys/block/sd*; do
    devname=$(basename "$device")
    # Model string is exposed directly in sysfs (trim the padding spaces)
    model=$(sed 's/[[:space:]]*$//' "$device/device/model")
    # sysfs has no serial file for SATA disks, so ask lsblk for it
    serial=$(lsblk -dno SERIAL "/dev/$devname")
    # The resolved device path ends in the SCSI address host:channel:target:lun;
    # on onboard AHCI the host number usually tracks the SATA port
    scsi_addr=$(basename "$(readlink -f "$device/device")")
    echo "[SCSI $scsi_addr] $devname: $model (SN: $serial)"
done
For future-proofing:
- Document drive positions during initial setup
- Consider drive caddies with LED indicators
- Implement monitoring that logs physical locations (a cron sketch follows)
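Assuming you save the identification script above as /usr/local/sbin/drive-report.sh (name and path are placeholders), a daily cron entry builds that history for you:
# /etc/cron.d/drive-report - log which serial sat on which port, once a day
0 6 * * * root /usr/local/sbin/drive-report.sh >> /var/log/drive-report.log 2>&1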