While originally designed for HDDs, modern SSDs fully support SMART (Self-Monitoring, Analysis and Reporting Technology) with SSD-specific attributes. The technology has evolved to address flash memory characteristics like:
- Program/Erase cycle counts
- Wear leveling statistics
- Bad block management
- NAND endurance metrics
These parameters differ from traditional HDD SMART attributes. For example:
// Example SMART attribute for an SSD:
{
  "id": 177,
  "name": "Wear_Leveling_Count",
  "value": 94,
  "worst": 94,
  "threshold": 0,
  "raw_value": 6
}
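
For ATA-style attributes like the one above, the normalized `value` counts down toward `threshold`, and smartmontools treats an attribute as failed once the value is at or below it. A minimal sketch of that check, assuming attributes arrive as dictionaries shaped like the example; the function name `attribute_status` and the 10-point warning margin are hypothetical choices, not part of any tool:

```python
def attribute_status(attr, margin=10):
    """Classify one ATA SMART attribute by its normalized value.

    Assumes `attr` has integer "value" and "threshold" keys.
    """
    value, threshold = attr["value"], attr["threshold"]
    if threshold == 0:
        # A zero threshold is generally informational: the drive never
        # reports this attribute as failed, so only its trend is useful.
        return "informational"
    if value <= threshold:
        return "failed"
    if value - threshold <= margin:
        return "near_threshold"
    return "ok"


wear = {"id": 177, "name": "Wear_Leveling_Count",
        "value": 94, "worst": 94, "threshold": 0, "raw_value": 6}
print(attribute_status(wear))
```

Because the example attribute has a threshold of 0, it will never trip a failure flag by itself, which is why the raw value and the trend over time matter more for wear attributes.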
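Here's a Python example that calls smartctl from the smartmontools suite:
import subprocess

def get_ssd_smart(device='/dev/nvme0'):
    # Pass arguments as a list so the device path is never split by whitespace.
    cmd = ['smartctl', '-a', device]
    try:
        output = subprocess.check_output(cmd).decode()
        # parse_smart_output() is defined in the fuller example further down.
        return parse_smart_output(output)
    except subprocess.CalledProcessError as e:
        # smartctl can also exit non-zero when it flags a problem with the drive.
        print(f"Error reading SMART data: {e}")
        return None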
Attribute | NVMe | SATA SSD |
---|---|---|
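Media Errors | SMART/Health log (0x02), Media and Data Integrity Errors | SMART 187 (Reported Uncorrectable Errors; ID varies by vendor) |
Temperature | Composite Temperature | SMART 194 |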
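
The same two metrics can be read from smartctl's JSON output (`smartctl -a -j`, available since smartmontools 7.0) without caring which bus the drive uses. The sketch below is written under assumptions: the key names (`nvme_smart_health_information_log`, `ata_smart_attributes`, `temperature.current`) are what I believe smartctl 7.x emits and should be verified against your version, the function name is hypothetical, and the SATA attribute ID is passed in by the caller because it is vendor-dependent.

```python
def media_errors_and_temperature(report, ata_error_attr_id):
    """Extract media-error count and temperature (Celsius) from a parsed
    smartctl JSON report. Fields that are absent come back as None."""
    # Assumed: smartctl exposes a bus-independent temperature object.
    temperature = report.get("temperature", {}).get("current")

    nvme_log = report.get("nvme_smart_health_information_log")
    if nvme_log is not None:
        errors = nvme_log.get("media_errors")
    else:
        # SATA: look the attribute up by ID in the ATA attribute table.
        table = report.get("ata_smart_attributes", {}).get("table", [])
        raw_by_id = {row["id"]: row["raw"]["value"] for row in table}
        errors = raw_by_id.get(ata_error_attr_id)

    return {"media_errors": errors, "temperature_c": temperature}


# Hand-written samples that mimic the assumed layout, not real drive output.
nvme_sample = {
    "temperature": {"current": 38},
    "nvme_smart_health_information_log": {"media_errors": 0},
}
sata_sample = {
    "temperature": {"current": 33},
    "ata_smart_attributes": {"table": [
        {"id": 187, "name": "Reported_Uncorrect", "raw": {"value": 2}},
    ]},
}

print(media_errors_and_temperature(nvme_sample, ata_error_attr_id=187))
print(media_errors_and_temperature(sata_sample, ata_error_attr_id=187))
```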
Pay special attention to:
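- Percentage Used (an NVMe health-log field; some SATA SSDs expose a vendor-specific equivalent such as attribute 173, i.e. 0xAD) ≥ 90% indicates the drive is approaching its rated endurance limit
- Available Spare ≤ 10% (or at or below the drive's own reported Available Spare Threshold) requires immediate replacement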
- Uncorrectable Error Count > 0 suggests potential data corruption
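
The three rules above translate directly into code. This is a minimal sketch, assuming the values have already been read as integers; the function name `health_warnings` and its parameter names are hypothetical, and the default limits simply mirror the list:

```python
def health_warnings(percentage_used, available_spare, uncorrectable_errors,
                    used_limit=90, spare_limit=10):
    """Return a list of human-readable warnings; an empty list means no rule fired."""
    warnings = []
    if percentage_used >= used_limit:
        warnings.append(
            f"Percentage Used is {percentage_used}%: approaching endurance limit")
    if available_spare <= spare_limit:
        warnings.append(
            f"Available Spare is {available_spare}%: replace the drive")
    if uncorrectable_errors > 0:
        warnings.append(
            f"{uncorrectable_errors} uncorrectable error(s): possible data corruption")
    return warnings


for message in health_warnings(percentage_used=95, available_spare=8,
                               uncorrectable_errors=2):
    print(message)
```

Returning a list rather than a single status keeps every triggered rule visible, which helps when more than one indicator degrades at once.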
Sample Bash script for regular checks:
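#!/bin/bash
THRESHOLD=90
# "Percentage Used" is an NVMe field; its value is the third column, e.g. "3%".
CURRENT=$(smartctl -A /dev/nvme0 | awk '/^Percentage Used:/ {print $3}' | tr -d '%')
# Skip the comparison when no numeric value could be read.
if [[ "$CURRENT" =~ ^[0-9]+$ ]] && [ "$CURRENT" -ge "$THRESHOLD" ]; then
    echo "WARNING: SSD wear level at $CURRENT%" | mail -s "SSD Alert" admin@example.com
fi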
While originally designed for HDDs, SMART (Self-Monitoring, Analysis and Reporting Technology) has evolved to support SSDs with vendor-specific attributes. SATA and SAS SSDs expose SMART through standardized ATA/SCSI commands, while NVMe drives report equivalent data in a SMART/Health log page. In both cases the interpretation differs from HDDs because SSDs have different failure modes.
Critical SSD-specific SMART attributes include:
# Example SMART attributes seen on SATA SSDs (IDs and names are vendor-specific; a given drive reports only a subset)
Attribute 5: Reallocated_Sector_Count
Attribute 9: Power_On_Hours
Attribute 170: Available_Reserve_Space
Attribute 171: Program_Fail_Count
Attribute 172: Erase_Fail_Count
Attribute 174: Unexpected_Power_Loss_Count
Attribute 177: Wear_Leveling_Count
Attribute 179: Used_Rsvd_Blk_Cnt_Tot
Attribute 181: Program_Fail_Cnt_Total
Attribute 182: Erase_Fail_Count_Total
Attribute 187: Reported_Uncorrect_Errors
Attribute 194: Temperature_Celsius
Attribute 231: SSD_Life_Left
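
On SATA drives, `smartctl -A` prints these attributes as a whitespace-separated table. The sketch below parses that table into a dictionary keyed by attribute ID. It assumes the usual ten-column layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value); the sample text is hand-written for illustration and the function name `parse_ata_attributes` is hypothetical.

```python
def parse_ata_attributes(text):
    """Parse the attribute table from `smartctl -A` text output (SATA drives)."""
    attributes = {}
    for line in text.splitlines():
        # Split into at most 10 fields so a raw value containing spaces,
        # e.g. "33 (Min/Max 18/48)", stays in one piece.
        fields = line.split(None, 9)
        if len(fields) != 10 or not fields[0].isdigit():
            continue  # header, blank line, or unrelated output
        attributes[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "threshold": int(fields[5]),
            "raw": fields[9].strip(),
        }
    return attributes


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Count 0x0033  100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always       -       6
194 Temperature_Celsius     0x0022   067   052   000    Old_age   Always       -       33 (Min/Max 18/48)
"""

parsed = parse_ata_attributes(SAMPLE)
print(parsed[177])
```

Where smartmontools 7.0 or later is available, its JSON output (`-j`) is likely a sturdier input than this text format, since column spacing can change between versions.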
Here's Python code using smartmontools to read SSD SMART data:
import subprocess

def get_ssd_smart(device='/dev/nvme0'):
    cmd = ['sudo', 'smartctl', '-a', device]
    try:
        output = subprocess.check_output(cmd).decode()
        return parse_smart_output(output)
    except subprocess.CalledProcessError as e:
        print(f"Error reading SMART data: {e}")
        return None

def parse_smart_output(output):
    """Extract key NVMe health fields from smartctl's text output."""
    results = {}
    for line in output.splitlines():
        # Split "Label:   value" on the first colon only.
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if key == 'Critical Warning':
            results['critical_warning'] = value
        # Exact match, so 'Available Spare Threshold' does not overwrite this.
        elif key == 'Available Spare':
            results['available_spare'] = int(value.rstrip('%'))
        elif key == 'Percentage Used':
            results['percentage_used'] = int(value.rstrip('%'))
    return results
SSD wear indicators require different interpretation than HDD metrics:
// JavaScript example for SSD health calculation
function calculateSSDHealth(smartData) {
  // parseFloat accepts plain numbers as well as strings such as "3%".
  const remainingLife = 100 - parseFloat(smartData.percentage_used);
  const spareBlocks = parseFloat(smartData.available_spare);
  const critical = smartData.critical_warning !== '0x00';
  // The 0.7 weight and the 50% penalty are heuristics, not vendor guidance.
  let healthScore = remainingLife * 0.7;
  if (spareBlocks < 10) healthScore *= 0.5;
  if (critical) healthScore = 0;
  return Math.max(0, Math.min(100, healthScore));
}
For system administrators managing mixed environments:
# PowerShell script for Windows SSD monitoring
$diskDrives = Get-PhysicalDisk | Where-Object { $_.MediaType -eq 'SSD' }
foreach ($disk in $diskDrives) {
    $smart = Get-StorageReliabilityCounter -PhysicalDisk $disk
    [PSCustomObject]@{
        DeviceId    = $disk.DeviceId
        Model       = $disk.FriendlyName
        Temperature = $smart.Temperature
        Wear        = $smart.Wear
        ReadErrors  = $smart.ReadErrorsTotal
        WriteErrors = $smart.WriteErrorsTotal
    }
}
Critical thresholds for enterprise environments:
- Available spare blocks < 5%
- Total data written exceeding the manufacturer's TBW (terabytes written) rating
- Uncorrectable error count > 0
- Media errors or CRC errors increasing rapidly
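
Two of these thresholds need a little arithmetic before they can be checked. The sketch below shows one way to do it, under stated assumptions: NVMe drives report Data Units Written in units of 1000 × 512 bytes, the TBW rating is taken in decimal terabytes (10^12 bytes), and "increasing rapidly" is defined by a caller-chosen maximum increase between equally spaced samples. Both function names are hypothetical.

```python
NVME_DATA_UNIT_BYTES = 512 * 1000  # one NVMe "data unit" is 1000 blocks of 512 bytes


def fraction_of_tbw_used(data_units_written, rated_tbw):
    """Fraction of the rated endurance consumed (1.0 means the rating is reached)."""
    bytes_written = data_units_written * NVME_DATA_UNIT_BYTES
    return bytes_written / (rated_tbw * 10**12)


def is_increasing_rapidly(samples, max_per_interval):
    """True if an error counter grew by more than `max_per_interval`
    between any two consecutive readings (oldest first)."""
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    return any(delta > max_per_interval for delta in deltas)


# Illustrative numbers only, not measurements from a real drive.
print(fraction_of_tbw_used(585_937_500, rated_tbw=600))
print(is_increasing_rapidly([0, 0, 1, 1], max_per_interval=5))
print(is_increasing_rapidly([0, 2, 40], max_per_interval=10))
```

Comparing deltas rather than absolute counts avoids alerting forever on a drive that logged a few errors long ago and has been stable since.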
Example Nagios check for SSD health:
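#!/bin/bash
# Thresholds are percent of rated life remaining.
WARNING=10
CRITICAL=5
SMART=$(smartctl -a /dev/nvme0)
# NVMe lines look like "Percentage Used:   3%", so the value is field 3.
USED=$(echo "$SMART" | awk '/^Percentage Used:/ {print $3}' | tr -d '%')
# The trailing colon keeps "Available Spare Threshold" from matching.
AVAIL_SPARE=$(echo "$SMART" | awk '/^Available Spare:/ {print $3}' | tr -d '%')
if ! [[ "$USED" =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN: could not read Percentage Used"
    exit 3
fi
REMAINING=$((100 - USED))
if [ "$REMAINING" -le "$CRITICAL" ]; then
    echo "CRITICAL: SSD at ${REMAINING}% remaining life"
    exit 2
elif [ "$REMAINING" -le "$WARNING" ]; then
    echo "WARNING: SSD at ${REMAINING}% remaining life"
    exit 1
else
    echo "OK: SSD at ${REMAINING}% remaining life, ${AVAIL_SPARE}% spare"
    exit 0
fi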