How to Identify Failed Physical Disks in FreeNAS/ZFS: Serial Number Mapping and LED Control Techniques


When working with FreeNAS and ZFS on a multi-bay Supermicro X6DHE-XB system, drive identification becomes critical during failure scenarios. Hardware RAID controllers usually light a fault LED on the failed bay. ZFS operates at a higher abstraction layer and reports failures by device name, so you need your own way to map a device name to a physical bay.

Before deploying your storage array, implement these essential steps:

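# Record drive serial numbers and bay positions (bash syntax).
# Assumes da0..da7 sit in bays 1..8; FreeBSD numbers devices from 0 in
# probe order, so verify the bay order by hand.
for i in {1..8}; do
  serial=$(smartctl -i /dev/da$((i - 1)) | grep "Serial Number" | awk '{print $3}')
  echo "Bay $i: $serial" >> /root/drive_mapping.txt
done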

ZFS provides several tools for drive health monitoring:

# Check overall pool status
zpool status

# Detailed SMART data for each drive
smartctl -a /dev/da1

# Continuous monitoring (run as cron job)
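# "zpool status -x" prints "all pools are healthy" unless a pool has a problem,
# so this also catches FAULTED and UNAVAIL pools, not only DEGRADED ones
zpool status -x | grep -q "all pools are healthy" || zpool status -x | mail -s "ZFS Alert" admin@example.com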
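
An alert is more useful when it names the affected devices. The following is a minimal sketch, assuming the usual zpool status config table layout (device name, state, then three error counters). failed_vdevs is a hypothetical helper name, not a ZFS tool.

```shell
#!/bin/sh
# failed_vdevs: read "zpool status" output on stdin and print "<name> <state>"
# for every config row whose state is FAULTED, UNAVAIL, REMOVED or OFFLINE.
# Rows need at least five fields (name, state, READ, WRITE, CKSUM), which
# skips the two-field "state: ..." header line.
failed_vdevs() {
  awk 'NF >= 5 && $2 ~ /^(FAULTED|UNAVAIL|REMOVED|OFFLINE)$/ { print $1, $2 }'
}

# Usage on a live system:
#   zpool status | failed_vdevs
```

Piping the result into the mail command above puts the failed device names directly in the alert body.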

While ZFS doesn't natively control enclosure LEDs, these methods can help:

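# Using sg_ses (from sg3_utils) for SAS/SATA enclosures.
# On FreeBSD the enclosure device is /dev/sesN (/dev/sgN is the Linux name);
# the index and device below are examples.
sg_ses --index=2 --set=ident /dev/ses0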

# Alternative for 3Ware controllers
tw_cli /c0/u0 set identify=on

When drives fail completely and can't report serial numbers:

  1. Check kernel logs: dmesg | grep -i error
  2. Verify drive presence: camcontrol devlist
  3. Physically inspect the enclosure (listen for unusual sounds)
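
When a dead drive can no longer report its serial, the mapping file recorded before deployment can still point to the bay: every serial that was recorded but is no longer visible marks a suspect bay. A minimal sketch, assuming the "Bay N: SERIAL" line format used for /root/drive_mapping.txt above; missing_bays and the temporary file path are hypothetical.

```shell
#!/bin/sh
# missing_bays MAPPING_FILE PRESENT_FILE
# Print each "Bay N: SERIAL" line of MAPPING_FILE whose serial does not
# appear as a whole line in PRESENT_FILE (one serial per line).
missing_bays() {
  while IFS= read -r line; do
    serial=${line#*: }              # strip the "Bay N: " prefix
    [ -n "$serial" ] || continue    # skip blank lines and empty serials
    grep -qxF -- "$serial" "$2" || printf '%s\n' "$line"
  done < "$1"
}

# Usage on a live system (hypothetical temp path):
#   for d in $(sysctl -n kern.disks); do
#     smartctl -i "/dev/$d" | awk '/Serial Number/ {print $3}'
#   done > /tmp/present_serials.txt
#   missing_bays /root/drive_mapping.txt /tmp/present_serials.txt
```

This only works if the mapping file was accurate before the failure, which is another reason to keep it current.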

Create a custom script to map physical locations:

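#!/bin/sh
# camcontrol inquiry does not report a bay number, so look each serial up
# in the mapping file recorded before deployment.
for device in $(sysctl -n kern.disks); do
  serial=$(smartctl -i /dev/${device} | grep "Serial" | awk '{print $3}')
  bay=""
  [ -n "$serial" ] && bay=$(grep -F -- "$serial" /root/drive_mapping.txt | cut -d: -f1)
  echo "${device} in ${bay:-unknown bay} has serial ${serial:-unknown}"
done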

When replacing a failed drive:

# Offline the failed drive
zpool offline tank da2

# Physically replace the drive

# Turn off the identify LED (if you set it earlier)
sg_ses --index=2 --clear=ident /dev/ses0

# Resilver onto the new drive in the same slot
zpool replace tank da2

When working with ZFS on FreeNAS, drive identification differs significantly from traditional hardware RAID systems. The operating system assigns device identifiers (like da0, da1 or ada0) dynamically during boot, so they may change between reboots or after hardware changes. This makes it hard to physically locate a failed drive in a 16-bay Supermicro enclosure from its device name alone.

Before putting your storage system into production:

# Record drive serial numbers and bay positions
# (assumes da0..da7 sit in bays 1..8; verify by hand before relying on it)
for i in {1..8}; do
  smartctl -i /dev/da$((i - 1)) | awk -v bay="$i" '/Serial Number/ {print "Bay " bay ": " $3}'
done > /root/drive_mapping.txt

This Bash script creates a mapping between physical bays and drive serial numbers. Store this information in multiple locations (onsite and offsite).
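
Matching on the exact text "Serial Number" can miss drives: smartctl prints "Serial Number:" for ATA drives but "Serial number:" for SCSI/SAS drives. A minimal sketch of a case-insensitive extractor; serial_from_smartctl is a hypothetical helper name.

```shell
#!/bin/sh
# serial_from_smartctl: read "smartctl -i" output on stdin and print the
# value of the first "Serial Number:" / "Serial number:" line.
# Fields are split on a colon followed by optional spaces, so the serial
# is the second field.
serial_from_smartctl() {
  awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

# Usage on a live system:
#   smartctl -i /dev/da0 | serial_from_smartctl
```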

ZFS provides several tools for disk health monitoring:

# Check overall pool status
zpool status -v

# Detailed SMART data for all drives
smartctl --scan | awk '{print $1}' | xargs -I {} smartctl -a {}

ZFS handles different failure scenarios differently:

Degraded but Responsive Drives

For drives that fail but remain detectable:

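# Identify the failed device (a failed disk may show as FAULTED, UNAVAIL or REMOVED)
zpool status | grep -E "FAULTED|UNAVAIL|REMOVED"

# Cross-reference with physical location (replace daX with the device name)
glabel status | grep daX
# -S prints only the drive's serial number
camcontrol inquiry daX -S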

Completely Failed Drives

For drives that don't respond at all:

  1. Note which bay's activity LED stays dark while the pool is busy
  2. Check kernel logs for recent detach events: dmesg | grep -Ei "detached|lost device|disconnect"
  3. Physically inspect drives (hot-swap bays make this easier)

Create a custom script to maintain real-time drive mapping:

#!/bin/sh
# Generate current drive mapping
echo "Current Drive Mapping - $(date)" > /var/log/drive_mapping.log
for disk in $(sysctl -n kern.disks); do
  serial=$(smartctl -i /dev/${disk} | grep "Serial Number" | awk '{print $3}')
  echo "/dev/${disk}: ${serial}" >> /var/log/drive_mapping.log
done

Schedule this with cron to run hourly.

For enterprise environments, consider:

  • Integrating with IPMI for physical LED control
  • Setting up email alerts for predictive failures using SMART
  • Implementing automated drive location scripts using SES (SCSI Enclosure Services)

Example SES command to blink a drive LED:

# Requires an SES-capable enclosure; replace daX with the device name
sesutil locate daX on

# Turn the locate LED off again afterwards
sesutil locate daX off

Before adding a replacement drive to your pool, always verify its health:

# Run extended SMART test
smartctl -t long /dev/daX

# Check replacement drive's statistics
smartctl -a /dev/daX | grep -E "Reallocated|Pending|Uncorrectable"
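
The grep above still leaves the numbers to be read by eye. The check can be sketched as a function that flags only non-zero counters. This assumes the ATA attribute table layout printed by smartctl -A, with the attribute name in column 2 and the raw value in column 10; SAS drives report errors in a different format. check_smart_counters is a hypothetical helper name.

```shell
#!/bin/sh
# check_smart_counters: read "smartctl -A" output on stdin, print
# "<attribute>=<raw value>" for each watched counter with a non-zero raw
# value, and return 1 if any was found (0 otherwise).
check_smart_counters() {
  awk '
    $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
      print $2 "=" $10
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage on a live system:
#   smartctl -A /dev/daX | check_smart_counters || echo "do not add this drive"
```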

Remember to update your drive mapping documentation after any hardware changes.