How to Identify a Failing Disk in Linux Software RAID by Blinking Its LED Locator Light


With a hardware RAID controller, blinking a disk's LED is usually straightforward with the vendor's tools. Linux software RAID (mdadm) has no controller utility of its own, so the LED has to be driven through the enclosure or backplane instead.

The ledctl tool from the ledmon package is specifically designed for this purpose:

sudo apt install ledmon  # Debian/Ubuntu
sudo yum install ledmon  # RHEL/CentOS

# Turn on locate LED
sudo ledctl locate=/dev/sdc

# Turn off locate LED
sudo ledctl locate_off=/dev/sdc
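
It is easy to leave a locate LED on after the trip to the rack, so the two ledctl calls above can be wrapped in a small helper. This is a minimal sketch and not part of ledmon: blink_for and the LEDCTL variable are hypothetical names, and LEDCTL exists only so the helper can be dry-run with echo.

```shell
#!/bin/sh
# LEDCTL is a hypothetical override hook; set LEDCTL=echo for a dry run.
LEDCTL=${LEDCTL:-sudo ledctl}

# blink_for DEVICE [SECONDS]: turn the locate LED on, wait, then turn it off.
# The body runs in a subshell so the traps do not leak into the caller.
blink_for() (
    dev=$1
    secs=${2:-60}
    trap 'exit 130' INT TERM                # route Ctrl-C through the EXIT trap
    trap '$LEDCTL "locate_off=$dev"' EXIT   # always switch the LED off again
    $LEDCTL "locate=$dev"
    sleep "$secs"
    exit 0
)
```

For example, blink_for /dev/sdc 120 would keep the LED blinking for two minutes and then clear it, including when the wait is interrupted.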

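For SAS/SATA disks in an SES-capable enclosure or backplane, the sg_ses tool can set the identify LED of an individual slot:

sudo apt install sg3-utils

# Find the enclosure's generic SCSI device (the enclosure is one of these, not the disk itself)
ls /dev/sg*

# Blink the LED of the slot at element index 0 (replace X with the enclosure's sg number)
sudo sg_ses --index=0 --set=ident /dev/sgX

# Stop blinking
sudo sg_ses --index=0 --clear=ident /dev/sgX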

As a last resort, you can generate disk activity to make the activity LED blink:

# Continuous read (caution: may stress failing disk)
sudo dd if=/dev/sdc of=/dev/null bs=1M count=1000 status=progress

# Alternative: generate SMART self-test activity
sudo smartctl -t short /dev/sdc

Before replacing any disk, double-check its identity:

# Check physical disk info
sudo hdparm -I /dev/sdc | grep -i serial

# Verify with smartctl
sudo smartctl -i /dev/sdc

# For NVMe disks (id-ctrl reports the serial number in the "sn" field)
sudo nvme list
sudo nvme id-ctrl /dev/nvme0n1 | grep -i '^sn'
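
The serial check above can be made less error-prone by comparing against the serial printed on the drive label instead of reading it by eye. The sketch below assumes the `smartctl -i` output contains a line labelled "Serial Number:" (the capitalisation may differ between ATA and SCSI devices, so the match is case-insensitive). The function names serial_of and serial_matches are hypothetical.

```shell
#!/bin/sh
# Read `smartctl -i` output on stdin and print the serial number.
serial_of() {
    awk -F': *' 'tolower($1) == "serial number" {
        sub(/[ \t]+$/, "", $2)   # drop trailing whitespace
        print $2
        exit
    }'
}

# serial_matches EXPECTED: succeed only if stdin reports exactly that serial.
serial_matches() {
    [ "$(serial_of)" = "$1" ]
}
```

A possible use, with a placeholder serial: sudo smartctl -i /dev/sdc | serial_matches WD-WCC4N1234567 && echo "serial matches".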

After identifying and replacing the disk:

# Add the new disk to the array (use the partition, e.g. /dev/sdc1, if the array is built on partitions)
sudo mdadm --manage /dev/mdX --add /dev/sdc

# Monitor rebuild progress
watch cat /proc/mdstat

In rack-mounted servers, physically identifying one specific disk in a RAID array can be difficult, especially when SMART attributes indicate that it is about to fail. Here are several methods for making the drive's LED blink so it can be identified.

The most direct approach is the sg_ses utility from the sg3_utils package, which sets the identify bit on an enclosure slot:

sudo apt-get install sg3-utils  # Debian/Ubuntu
sudo yum install sg3_utils      # RHEL/CentOS

# To start blinking (X is the enclosure's sg device, 0 the slot's element index):
sudo sg_ses --index=0 --set=ident /dev/sgX

# To stop blinking:
sudo sg_ses --index=0 --clear=ident /dev/sgX

For mdadm software RAID specifically:

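# First find which disk in the array is faulty (failed members are flagged with (F))
cat /proc/mdstat

# Then mark the member as failed. mdadm does not blink an LED itself, but on
# supported enclosures a running ledmon daemon can show a failure pattern for
# failed members. --fail and --set-faulty are synonyms, so one is enough.
sudo mdadm --manage /dev/md0 --fail /dev/sdc1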
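
Reading /proc/mdstat by eye works for one array but gets tedious with several. As a rough sketch, the failed members can be pulled out with awk, assuming the usual mdstat layout in which array lines start with the md name followed by " : " and failed members carry an (F) suffix. The function name failed_members is hypothetical, and the optional file argument exists only so the parser can be tried on saved output.

```shell
#!/bin/sh
# Print "ARRAY MEMBER" for every member flagged (F) in mdstat-formatted input.
# $1: file to read; defaults to /proc/mdstat.
failed_members() {
    awk '/^md/ && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)/) {
                dev = $i
                sub(/\[.*/, "", dev)   # strip the "[2](F)" suffix
                print $1, dev
            }
    }' "${1:-/proc/mdstat}"
}
```

Each output line names an array and one failed member, which can then be passed to whichever LED method the hardware supports.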

Some systems support the SGPIO protocol through ledmon:

sudo systemctl start ledmon
sudo ledctl locate=/dev/sdc

If LED control methods don't work, consider these alternatives:

# Generate disk activity (may cause light to blink)
sudo dd if=/dev/sdc of=/dev/null bs=1M count=1000 status=progress

# Check sysfs for enclosure services
ls /sys/class/enclosure/
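
If that directory is populated, the slot holding a given disk can be looked up directly. The sketch below assumes the layout used by the kernel's SES enclosure driver, where each slot directory contains a locate file and, when a disk is present, a device/block/<disk> entry. The function name find_slot and its second argument (an alternative sysfs root for trying it against a fake tree) are hypothetical.

```shell
#!/bin/sh
# find_slot DISK [ROOT]: print the enclosure slot directory that holds DISK
# (e.g. sdc). Returns non-zero if no slot references the disk.
find_slot() {
    disk=$1
    root=${2:-/sys/class/enclosure}
    for slot in "$root"/*/*/; do
        if [ -e "${slot}device/block/$disk" ]; then
            printf '%s\n' "${slot%/}"
            return 0
        fi
    done
    return 1
}
```

With the slot found, the locate LED could then be switched on with something like slot=$(find_slot sdc) && echo 1 | sudo tee "$slot/locate", and off again by writing 0.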

If the LED won't blink:

  • Verify that the enclosure or backplane supports LED control; this is not a property of the disk itself (for SES enclosures, check ls /sys/class/enclosure/)
  • Check if your HBA/RAID controller supports SGPIO/SES
  • Try different device nodes (/dev/sgX or /dev/bsg/X)

When working with rack servers:

  1. Blink one disk at a time
  2. Have a colleague verify the correct disk is blinking
  3. Consider temporarily labeling disks after identification