How to Monitor Hard Disk Health Behind Dell PERC H710 RAID Controller on CentOS 6 Using CLI Tools



Dell's PERC H710 RAID controller presents a monitoring challenge on CentOS 6 systems. The controller exposes only virtual disks to the operating system and hides the physical drives behind them, so a plain smartctl query against the block device returns nothing useful:

# smartctl -a /dev/sda
Device does not support SMART
Error Counter logging not supported
Device does not support Self Test logging
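
One thing worth trying before installing vendor tools: smartmontools has a MegaRAID passthrough mode, `-d megaraid,N`, where N is the controller's device ID for a physical drive (the DID column in the perccli output further down). Whether it works depends on the smartmontools build and the megaraid_sas driver, so treat this as a sketch to try, not a guaranteed path. The helper name `smart_cmds` and the device IDs 3 and 4 are assumptions for illustration.

```shell
#!/bin/bash
# Print one smartctl health query per MegaRAID device ID (dry run only).
# Usage: smart_cmds <block-device> <did> [<did> ...]
smart_cmds() {
    local dev="$1"
    shift
    local did
    for did in "$@"; do
        printf 'smartctl -H -d megaraid,%s %s\n' "$did" "$dev"
    done
}

# Device IDs 3 and 4 are placeholders; substitute the IDs your controller reports.
smart_cmds /dev/sda 3 4
```

The function only prints the commands. To run them, review the output first and then pipe it to `sh` as root.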

After extensive testing, I found that Dell's PERC CLI (perccli), Dell's branded build of LSI's StorCLI and the successor to MegaCLI, provides the necessary access. The stock LSI tools, by contrast, did not work with the H710 out of the box.

First, download the appropriate version for CentOS 6:

wget https://dl.dell.com/FOLDERXXXXXX/perccli-1.17.10-1.noarch.rpm
rpm -ivh perccli-1.17.10-1.noarch.rpm

The following command dumps the controller's full status, including the physical drive list:

# /opt/MegaRAID/perccli/perccli64 /c0 show all

Sample output showing healthy disks:

----------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
----------------------------------------------------------------------
252:0     3 Onln   0 1.818 TB SATA HDD N   N  512 WDC WD20EFRX-68A  U  
252:1     4 Onln   0 1.818 TB SATA HDD N   N  512 WDC WD20EFRX-68A  U  
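
The drive table is regular enough to summarize with awk. A minimal sketch, assuming drive rows begin with an `EID:Slt` pair such as `252:0` and that the state is the third column, as in the sample above. `pd_state_counts` is a hypothetical helper name, and the embedded rows are made-up sample data with one drive marked `Offln`.

```shell
#!/bin/bash
# Count physical drives per state from perccli/storcli drive-table text on stdin.
pd_state_counts() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ { n[$3]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample rows (assumed layout); in practice pipe the CLI output in instead.
pd_state_counts <<'EOF'
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp
252:0     3 Onln   0 1.818 TB SATA HDD N   N  512 WDC WD20EFRX-68A  U
252:1     4 Offln  0 1.818 TB SATA HDD N   N  512 WDC WD20EFRX-68A  U
252:2     5 Onln   0 1.818 TB SATA HDD N   N  512 WDC WD20EFRX-68A  U
EOF
```

On a live system the equivalent call would be `/opt/MegaRAID/perccli/perccli64 /c0 /eall /sall show | pd_state_counts`. Because only rows whose first column looks like `EID:Slt` are counted, header and legend lines are ignored.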

Here's a bash script to check disk status and send email alerts:

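#!/bin/bash
# disk_monitor.sh: mail an alert when any drive behind the PERC is not healthy.
PERCCLI="/opt/MegaRAID/perccli/perccli64"
LOG_FILE="/var/log/disk_monitor.log"
EMAIL="admin@example.com"

# Keep only drive rows (EID:Slt in column 1) whose State column is not a
# healthy one. Matching rows rather than raw text avoids false hits on the
# legend the CLI prints below the table (e.g. "Offln-Offline").
DISK_STATUS=$("$PERCCLI" /c0 /eall /sall show |
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && $3 !~ /^(Onln|GHS|DHS|UGood)$/')

if [ -n "$DISK_STATUS" ]; then
    echo "$(date) - Disk alert: $DISK_STATUS" >> "$LOG_FILE"
    echo "$DISK_STATUS" | mail -s "RAID Disk Alert on $(hostname)" "$EMAIL"
fi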

Add this to crontab for hourly checks:

0 * * * * /path/to/disk_monitor.sh

For newer systems, consider migrating to storcli:

wget https://downloads.dell.com/FOLDERYYYYYY/storcli-1.21.06-1.noarch.rpm
rpm -ivh storcli-1.21.06-1.noarch.rpm

The equivalent check command becomes:

# /opt/MegaRAID/storcli/storcli64 /c0 /eall /sall show

A few points to keep in mind:

  • Always verify the download URLs from Dell's official support site
  • Test email functionality before deploying in production
  • Consider logging to syslog for better integration
  • Monitor controller battery health separately
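
Logging to syslog can be done with the standard `logger` utility. A minimal sketch: the function name `log_raid_alert`, the `raid-monitor` tag, and the `LOGGER_CMD` override (present only so the function can be exercised without writing to the real log) are all assumptions, not part of the script above.

```shell
#!/bin/bash
# Send a RAID alert to syslog via logger(1).
# LOGGER_CMD is a hypothetical override used for dry runs and testing.
log_raid_alert() {
    local msg="$1"
    # -t sets the syslog tag, -p the facility.priority pair
    "${LOGGER_CMD:-logger}" -t raid-monitor -p daemon.err -- "Disk alert: $msg"
}
```

In the alert script this could be called as `log_raid_alert "$DISK_STATUS"` next to the `mail` line. With a default rsyslog configuration on CentOS 6 the message would typically end up in /var/log/messages.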

When working with Dell PowerEdge servers running CentOS 6 with PERC H710 RAID controllers, traditional monitoring tools hit significant limitations. The combination of:

  • LSI MegaRAID SAS 2208 chipset (Thunderbolt architecture)
  • CentOS 6's aging driver stack
  • Dell's limited Linux support for legacy systems

creates a perfect storm for administrators needing physical disk health monitoring.
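
Before choosing a tool, it helps to confirm which chipset the controller actually uses. A small sketch, assuming `lspci` lists the H710 as a MegaRAID SAS 2208 RAID bus controller. `raid_chipset` is a hypothetical helper, and the sample lines are illustrative rather than captured from a real host.

```shell
#!/bin/bash
# Extract the MegaRAID chipset name from lspci-style text on stdin.
raid_chipset() {
    grep -i 'RAID bus controller' | sed -n 's/.*\(MegaRAID SAS [0-9]*\).*/\1/p'
}

# Illustrative input; on a real host run: lspci | raid_chipset
raid_chipset <<'EOF'
00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset SATA controller
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
EOF
```

If the function prints nothing, the host either has no MegaRAID-based controller or `lspci` describes it differently, in which case the pattern needs adjusting.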

Standard approaches fail here:

# smartctl cannot access physical disks behind RAID
smartctl -a /dev/sda
# Output: "Device does not support SMART"

# The stock MegaCli package from LSI may not work cleanly on this stack
rpm -ivh MegaCli-8.07.14-1.noarch.rpm
# Can fail with driver conflicts

While not officially supported for CentOS 6, Dell OpenManage Server Administrator (OMSA) v7.4 can be forced to work:

# Add Dell repository
wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash

# Install specific components
yum install srvadmin-all libsmbios libcmpiCppImpl0 openwsman-client

# Start services
/opt/dell/srvadmin/sbin/srvadmin-services.sh start

Create a bash script to parse OMSA output:

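#!/bin/bash
# monitor_raid.sh: alert on non-online disks and on RAID battery problems.
ALERT_EMAIL="admin@example.com"

# Capture the report once so the state check and the mail body agree
PDISK_REPORT=$(omreport storage pdisk controller=0)

# Collect every distinct disk state other than "Online"
BAD_STATES=$(echo "$PDISK_REPORT" | awk '$1 == "State" && $3 != "Online" {print $3}' | sort -u)

for status in $BAD_STATES; do
  # Mail the surrounding report lines for each problem state
  echo "$PDISK_REPORT" | grep -B 5 -A 3 "^State.*$status" |
    mail -s "RAID Disk Alert on $(hostname)" "$ALERT_EMAIL"
done

# Check battery status (runs regardless of the disk result).
# omreport reports a healthy battery as "Ready"; states such as
# "Charging" or "Learning" are transient and will also trigger a mail.
BATTERY_HEALTH=$(omreport storage battery controller=0 | awk '$1 == "State" {print $3}')
if [[ "$BATTERY_HEALTH" != "Ready" ]]; then
  echo "RAID battery problem: ${BATTERY_HEALTH:-unknown}" |
    mail -s "RAID Battery Alert on $(hostname)" "$ALERT_EMAIL"
fi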

Add to crontab for regular checks:

# Add to /etc/crontab
0 */4 * * * root /usr/local/bin/monitor_raid.sh
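
The State field alone does not show drives that are still online but flagged by SMART. omreport also prints a `Failure Predicted` line per disk, which can be paired with the disk ID. A minimal sketch, assuming the usual `Key : Value` layout of `omreport storage pdisk` output with State listed before Failure Predicted. `pdisk_summary` is a hypothetical helper and the embedded report is made-up sample text.

```shell
#!/bin/bash
# Print "ID<TAB>State<TAB>Failure Predicted" for each disk in omreport text on stdin.
pdisk_summary() {
    awk '{
        i = index($0, ":")              # split on the first colon only,
        if (i == 0) next                # because IDs such as 0:1:0 contain colons
        key = substr($0, 1, i - 1); val = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", key); gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (key == "ID") id = val
        else if (key == "State") state = val
        else if (key == "Failure Predicted") print id "\t" state "\t" val
    }'
}

# Sample report (assumed layout); in practice pipe omreport output in instead.
pdisk_summary <<'EOF'
ID                        : 0:1:0
Status                    : Ok
State                     : Online
Failure Predicted         : No
ID                        : 0:1:1
Status                    : Non-Critical
State                     : Online
Failure Predicted         : Yes
EOF
```

To list only the flagged drives on a live system, something like `omreport storage pdisk controller=0 | pdisk_summary | awk -F'\t' '$3 == "Yes"'` should work under the same layout assumption.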

For systems where OMSA won't install, try this direct approach:

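#!/bin/bash
# Extract the slot number and firmware state of every physical disk
PDLIST=$(/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -E "Firmware state|Slot Number")

# Walk the list: each "Slot Number" line is followed by that disk's state
while read -r line; do
  if [[ $line == *"Slot"* ]]; then
    SLOT=${line#*: }   # strip the "Slot Number: " label
  elif [[ $line == *"Firmware"* ]]; then
    # Anything other than Online or Hotspare is reported
    if [[ $line != *"Online"* && $line != *"Hotspare"* ]]; then
      echo "Disk in slot $SLOT is not healthy: $line"
    fi
  fi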
done <<< "$PDLIST"

Compatibility notes:

  • OMSA 7.4 is the version used here on CentOS 6; check Dell's support matrix before choosing a different release
  • MegaCLI 8.07.14 may work but can require manual driver patching
  • Always test email alerts before production deployment
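
Beyond the firmware state, `MegaCli64 -PDList` output carries per-disk error counters that often move before a drive drops out of the array. A hedged sketch that flags non-zero counters: `pd_error_counters` is a hypothetical helper, the field names are assumed to match the usual `-PDList` labels, and the sample text is invented.

```shell
#!/bin/bash
# Report disks whose media, other, or predictive-failure counters are non-zero.
# Reads MegaCli -PDList style text on stdin.
pd_error_counters() {
    awk -F': *' '
        $1 == "Slot Number" { slot = $2 }
        $1 ~ /^(Media Error|Other Error|Predictive Failure) Count$/ && ($2 + 0) > 0 {
            print "slot " slot ": " $1 " = " $2
        }'
}

# Sample text (assumed layout); in practice pipe MegaCli64 -PDList -aALL in instead.
pd_error_counters <<'EOF'
Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Firmware state: Online, Spun Up
Slot Number: 1
Media Error Count: 12
Other Error Count: 0
Predictive Failure Count: 3
Firmware state: Online, Spun Up
EOF
```

A drive can stay in the Online state while these counters climb, so checking them alongside the firmware state gives earlier warning than the state check alone.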