Dell's PERC H710 RAID controller makes disk health monitoring difficult on CentOS 6 systems. The controller exposes only virtual disks to the operating system and hides the physical drives behind them, so a plain smartctl query against the block device returns nothing useful:
# smartctl -a /dev/sda
Device does not support SMART
Error Counter logging not supported
Device does not support Self Test logging
After extensive testing, I found that Dell's perccli utility (Dell's build of LSI's storcli, which replaced the older MegaCLI) provides the necessary access. However, the standard LSI tools don't support the H710 out of the box.
First, download the appropriate version for CentOS 6:
wget https://dl.dell.com/FOLDERXXXXXX/perccli-1.17.10-1.noarch.rpm
rpm -ivh perccli-1.17.10-1.noarch.rpm
Use the following command to view physical disk health:
# /opt/MegaRAID/perccli/perccli64 /c0 show all
Sample output showing healthy disks:
----------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp
----------------------------------------------------------------------
252:0     3 Onln   0 1.818 TB SATA HDD N   N   512 WDC WD20EFRX-68A U
252:1     4 Onln   0 1.818 TB SATA HDD N   N   512 WDC WD20EFRX-68A U
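To make the table easier to act on, here is a minimal sketch that pulls out only the drives whose state is not a known-good value. It assumes the column layout of the sample above (EID:Slt first, State third); the function name `list_unhealthy_drives` is my own and not part of perccli, and treating hot spares (GHS/DHS) as healthy is a choice you may want to change.

```bash
#!/bin/bash
# Sketch only: reads perccli-style drive rows on stdin and prints
# "<EID:Slt> <State>" for every drive that is not online or a hot spare.
# Assumes the State column is the third whitespace-separated field.
list_unhealthy_drives() {
    # Drive rows start with "<enclosure>:<slot>"; headers, rulers and
    # legend lines do not, so they are skipped.
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && $3 !~ /^(Onln|GHS|DHS)$/ { print $1, $3 }'
}

# Usage (assumes perccli64 lives at the path used in this article):
# /opt/MegaRAID/perccli/perccli64 /c0 /eall /sall show | list_unhealthy_drives
```

Note that unconfigured drives (for example a freshly inserted spare) would also be listed, which may or may not be what you want.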
Here's a bash script to check disk status and send email alerts:
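#!/bin/bash
LOG_FILE="/var/log/disk_monitor.log"
EMAIL="admin@example.com"
PERCCLI="/opt/MegaRAID/perccli/perccli64"
# Check disk status. Keep only drive rows (they start with "EID:Slt" numbers)
# so the state legend printed below the table cannot trigger false alerts.
# Offln = offline, UBad = unconfigured bad.
DISK_STATUS=$("$PERCCLI" /c0 /eall /sall show | grep -E '^ *[0-9]+:[0-9]+ ' | grep -Ew 'Offln|UBad|Failed')
if [ -n "$DISK_STATUS" ]; then
    echo "$(date) - Disk alert: $DISK_STATUS" >> "$LOG_FILE"
    echo "$DISK_STATUS" | mail -s "RAID Disk Alert" "$EMAIL"
fi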
Add this to crontab for hourly checks:
0 * * * * /path/to/disk_monitor.sh
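With an hourly cron job, a failed disk produces the same email every hour until it is replaced. One way to avoid that is to alert only when the status text changes between runs. The sketch below is an assumption on my part, not something the tools provide: the state file path and the `status_changed` helper are hypothetical names.

```bash
#!/bin/bash
# Sketch only: remember the last status text and report whether it changed.
# STATE_FILE is a hypothetical location; pick one that survives reboots.
STATE_FILE="${STATE_FILE:-/var/tmp/disk_monitor.state}"

status_changed() {
    local current="$1" previous=""
    # Load the status recorded by the previous run, if any.
    [ -f "$STATE_FILE" ] && previous=$(cat "$STATE_FILE")
    # Record the current status for the next run.
    printf '%s' "$current" > "$STATE_FILE"
    # Succeed (exit 0) only when the status differs from last time.
    [ "$current" != "$previous" ]
}

# Usage inside the monitoring script:
# if status_changed "$DISK_STATUS" && [ -n "$DISK_STATUS" ]; then
#     echo "$DISK_STATUS" | mail -s "RAID Disk Alert" "$EMAIL"
# fi
```

Calling `status_changed` before the emptiness test means the stored state is reset once the array is healthy again, so a later failure still triggers a new alert.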
For newer systems, consider migrating to storcli:
wget https://downloads.dell.com/FOLDERYYYYYY/storcli-1.21.06-1.noarch.rpm
rpm -ivh storcli-1.21.06-1.noarch.rpm
The equivalent check command becomes:
# /opt/MegaRAID/storcli/storcli64 /c0 /eall /sall show
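Since perccli and storcli take the same arguments here, the monitoring script does not need to hard-code one of them. As a rough sketch, a small helper can pick whichever binary is installed; `find_raid_cli` is a name I made up, and the candidate paths are simply the two install locations used in this article.

```bash
#!/bin/bash
# Sketch only: print the first candidate path that is an executable file.
find_raid_cli() {
    local candidate
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing usable found.
    return 1
}

# Usage:
# RAID_CLI=$(find_raid_cli /opt/MegaRAID/storcli/storcli64 \
#                          /opt/MegaRAID/perccli/perccli64) || exit 1
# "$RAID_CLI" /c0 /eall /sall show
```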
- Always verify the download URLs from Dell's official support site
- Test email functionality before deploying in production
- Consider logging to syslog for better integration
- Monitor controller battery health separately
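For the syslog suggestion, the standard logger command is probably the simplest route. This is only a sketch: the tag `disk_monitor`, the `daemon.err` priority, and the function name `log_disk_alert` are my own choices, so adjust them to match your syslog configuration.

```bash
#!/bin/bash
# Sketch only: forward a disk alert to syslog via logger(1).
log_disk_alert() {
    local message="$1"
    # -t sets the syslog tag, -p sets facility.priority.
    logger -t disk_monitor -p daemon.err "Disk alert: $message"
}

# Usage inside the monitoring script, next to (or instead of) the log file:
# log_disk_alert "$DISK_STATUS"
```

On a default CentOS 6 rsyslog setup the message should end up in /var/log/messages, where existing log shippers can pick it up.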
On Dell PowerEdge servers running CentOS 6 with a PERC H710 RAID controller, traditional monitoring tools cannot see the physical disks. The combination of:
- LSI MegaRAID SAS 2208 chipset (Thunderbolt architecture)
- CentOS 6's aging driver stack
- Dell's limited Linux support for legacy systems
makes physical disk health monitoring difficult for administrators.
Standard approaches fail here:
# smartctl cannot access physical disks behind RAID
smartctl -a /dev/sda
# Output: "Device does not support SMART"
# megacli tools from LSI aren't compatible
rpm -ivh MegaCli-8.07.14-1.noarch.rpm
# Fails with driver conflicts
Dell OpenManage Server Administrator (OMSA) v7.4 is not officially supported on CentOS 6, but it can be made to work:
# Add Dell repository
wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
# Install specific components
yum install srvadmin-all libsmbios libcmpiCppImpl0 openwsman-client
# Start services
/opt/dell/srvadmin/sbin/srvadmin-services.sh start
Create a bash script to parse OMSA output:
#!/bin/bash
# monitor_raid.sh
ALERT_EMAIL="admin@example.com"
EXIT_CODE=0
# Check physical disk states (query omreport once and reuse the output)
PDISK_REPORT=$(omreport storage pdisk controller=0)
DISK_STATUS=$(echo "$PDISK_REPORT" | awk '$1 == "State" {print $3}' | sort -u)
for status in $DISK_STATUS; do
    if [[ "$status" != "Online" ]]; then
        FAILED_DISKS=$(echo "$PDISK_REPORT" | grep -B 5 -A 3 "State.*$status")
        echo "$FAILED_DISKS" | mail -s "RAID Disk Alert on $(hostname)" "$ALERT_EMAIL"
        # Remember the failure but keep going so the battery is still checked
        EXIT_CODE=1
    fi
done
# Check battery status
# OMSA reports a healthy battery as State "Ready"; verify against your omreport output
BATTERY_HEALTH=$(omreport storage battery | awk '$1 == "State" {print $3}')
if [[ "$BATTERY_HEALTH" != "Ready" ]]; then
    echo "RAID battery problem: $BATTERY_HEALTH" | mail -s "RAID Battery Alert" "$ALERT_EMAIL"
    EXIT_CODE=1
fi
exit $EXIT_CODE
Add to crontab for regular checks:
# Add to /etc/crontab
0 */4 * * * root /usr/local/bin/monitor_raid.sh
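The grep with fixed context lines works, but the alert is easier to read if each disk ID is paired with its state. Here is a minimal sketch of that idea. It assumes omreport prints `Key : Value` lines with an `ID` line preceding each disk's `State` line, which matches what I have seen but may differ between OMSA versions; the function name `list_non_online_pdisks` is hypothetical.

```bash
#!/bin/bash
# Sketch only: reads "omreport storage pdisk" style output on stdin and
# prints "<disk ID> <State>" for every disk whose State is not Online.
list_non_online_pdisks() {
    # Split on " : " (colon surrounded by spaces) so that colons inside
    # the disk ID itself, e.g. 0:1:0, are left alone.
    awk -F ' +: +' '
        $1 == "ID"                      { id = $2 }
        $1 == "State" && $2 != "Online" { print id, $2 }
    '
}

# Usage:
# omreport storage pdisk controller=0 | list_non_online_pdisks
```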
For systems where OMSA won't install, try parsing MegaCli output directly, provided you can get MegaCli running:
# Extract physical disk info
PDLIST=$(/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -E "Firmware state|Slot Number")
# Parse output: remember the most recent slot, then test its firmware state
while read -r line; do
    if [[ $line == *"Slot"* ]]; then
        SLOT=${line#*:}
        SLOT=${SLOT# }   # drop the space after the colon
    elif [[ $line == *"Firmware"* ]]; then
        if [[ $line != *"Online"* && $line != *"Hotspare"* ]]; then
            echo "Disk in slot $SLOT failed: $line"
        fi
    fi
done <<< "$PDLIST"
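The firmware state only changes once a disk has actually dropped out. The same -PDList output also carries per-disk error counters, which can give earlier warning. The sketch below assumes the `Media Error Count` and `Predictive Failure Count` lines that MegaCli prints for each disk, and that `Slot Number` appears before them; the function name `list_error_counts` is my own.

```bash
#!/bin/bash
# Sketch only: reads "MegaCli64 -PDList -aALL" style output on stdin and
# prints one line per non-zero media-error or predictive-failure counter.
list_error_counts() {
    awk -F ': *' '
        $1 == "Slot Number" { slot = $2 }
        ($1 == "Media Error Count" || $1 == "Predictive Failure Count") && $2 + 0 > 0 {
            print "slot " slot ": " $1 " = " $2
        }
    '
}

# Usage:
# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | list_error_counts
```

Any output from this function is worth an alert even while the disk still reports Online.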
- OMSA 7.4 appears to be the last version with CentOS 6 compatibility; verify this against Dell's support site for your server model
- MegaCLI 8.07.14 may work but requires manual driver patching
- Always test email alerts before production deployment