Decoding mpt2sas Error Codes 0x31120303 and 0x31110d01: LSI SAS2008 HBA Troubleshooting Guide

When working with LSI SAS2008-based HBAs (like the Supermicro AOC-USAS2-L8I), you might encounter these recurring mpt2sas driver messages:

kernel: [timestamp] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
kernel: [timestamp] mpt2sas0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)

These messages typically appear in clusters, with 0x31120303 preceding 0x31110d01 by 5-30 seconds. The pattern suggests a hardware communication issue triggering both an abort (code 0x12) and reset (code 0x11).
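
To check that timing on your own system, the gap between each pair can be measured from the kernel timestamps. The following is a minimal sketch, assuming each log line carries the bracketed uptime stamp that dmesg and most syslog setups include; the function name `loginfo_gaps` is made up for illustration.

```shell
# loginfo_gaps: for each 0x31110d01 line, print the seconds elapsed since the
# most recent 0x31120303 line, based on the kernel's [uptime] timestamp.
# Reads the files given as arguments, or stdin when none are given.
loginfo_gaps() {
    awk '
        # Remember the timestamp of every line that has one
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        }
        /log_info\(0x31120303\)/ { abort_ts = ts; have_abort = 1 }
        /log_info\(0x31110d01\)/ && have_abort {
            printf "%.1f\n", ts - abort_ts
            have_abort = 0
        }
    ' "$@"
}

# Example (not run here): dmesg | loginfo_gaps
```

Gaps that stay roughly constant would point at a fixed firmware timeout rather than random link noise, though that reading is an inference and not something the driver reports.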

Examining the Linux kernel's mpi_log_sas.h reveals the structure of these codes:

/*
 * Bits 31-28 (0x3): bus type, SAS
 * Bits 27-24 (0x1): originator (0 = IOP, 1 = PL, 2 = IR)
 * Bits 23-16 (0x12 / 0x11): code
 * Bits 15-0: sub_code
 */

Breakdown of the observed errors:

- 0x31120303: SAS protocol layer (PL) abort (0x12) with undocumented subcode 0x0303
- 0x31110d01: SAS protocol layer (PL) reset (0x11) with subcode likely indicating link reset
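
The field split can be reproduced with plain shell arithmetic, which is handy for decoding other log_info values that turn up. This is a small sketch, assuming the bit layout the driver itself prints (4-bit bus type, 4-bit originator, 8-bit code, 16-bit sub_code); `decode_loginfo` is a hypothetical helper name, not part of any LSI tool.

```shell
# decode_loginfo: split a 32-bit log_info value into the fields the driver logs.
# The body runs in a subshell, so its variables do not leak into the caller.
decode_loginfo() (
    value=$(( $1 ))                          # accepts 0x-prefixed hex
    bus_type=$(( (value >> 28) & 0xF ))      # 0x3 means SAS
    originator=$(( (value >> 24) & 0xF ))
    code=$(( (value >> 16) & 0xFF ))
    sub_code=$(( value & 0xFFFF ))
    case $originator in
        0) origin_name=IOP ;;
        1) origin_name=PL ;;
        2) origin_name=IR ;;
        *) origin_name=unknown ;;
    esac
    printf 'bus_type(0x%x) originator(%s) code(0x%02x) sub_code(0x%04x)\n' \
        "$bus_type" "$origin_name" "$code" "$sub_code"
)

decode_loginfo 0x31120303
decode_loginfo 0x31110d01
```

The output mirrors the `originator(...), code(...), sub_code(...)` portion of the kernel message, so it can be compared directly against what mpt2sas logged.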

For systems with mixed drive models (WD30EZRX and ST3000DM001 in this case), verify:

# Check the SATA link speed each drive negotiated
smartctl -i /dev/sdX | grep -i "SATA Version"
# Check the negotiated link rate of each HBA phy
grep -H . /sys/class/sas_phy/phy-*/negotiated_linkrate

Common issues in such configurations include:

- Incompatible SATA/SAS negotiation between HBA and backplane
- Marginal signal integrity with certain drive combinations
- Power delivery issues when multiple drives spin up simultaneously

Gather detailed HBA information:

# Check firmware version as reported by the driver
cat /sys/class/scsi_host/host*/version_fw
# Inspect the attributes of each SCSI host
systool -c scsi_host -v | grep -A10 'Class Device = "host'

For real-time monitoring during array rebuilds:

watch -n 1 "dmesg | grep mpt2sas | tail -n 10"

If you have already updated to P12 firmware (2008IT12.FW), consider:

- Cross-flashing to LSI-branded firmware (often updated more frequently)
- Verifying that the firmware type matches your usage (IT mode for plain HBA passthrough to mdadm, IR mode for controller-managed RAID)
- Checking for known issues with your specific backplane model

When these errors occur during mdadm operations, monitor rebuild speed:

# Check current rebuild progress
cat /proc/mdstat
# Detailed I/O statistics per disk
iostat -x 1 /dev/sd[b-f]

Typical symptoms include:

- Periodic throughput drops corresponding to error clusters
- Increased I/O wait times during error events
- Possible command timeouts if the issues persist

If the errors persist but don't cause operational failures:

# Check the current SCSI command timeout (in seconds) on each member disk
grep -H . /sys/block/sd[b-f]/device/timeout
# Raise it to 60 s; a redirect cannot target a glob, so loop over the disks
for t in /sys/block/sd[b-f]/device/timeout; do echo 60 > "$t"; done

For physical layer issues:

- Try different SFF-8087 to SATA breakout cables
- Test with drives connected directly to the HBA (bypassing the backplane)
- Verify power supply capacity during concurrent drive operations

Set up comprehensive logging:

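# rsyslog rule (e.g. in a file under /etc/rsyslog.d/): copy mpt2sas messages to their own log
:msg, contains, "mpt2sas" /var/log/mpt2sas.log
# crontab entry: every 5 minutes, append the matching dmesg lines to a per-day file
# (each run re-appends lines still in the ring buffer, so deduplicate before counting)
*/5 * * * * dmesg | grep mpt2sas >> /var/log/mpt2sas_$(date +\%Y\%m\%d).log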

This creates a searchable history of error patterns for correlation with performance metrics or hardware changes.
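
To turn that history into something comparable against performance graphs, the entries can be bucketed by hour. A rough sketch follows, assuming the log uses the traditional syslog timestamp ("Jul 13 06:06:23 host ..."); `loginfo_per_hour` is an illustrative name, and a log with ISO timestamps would need different field handling.

```shell
# loginfo_per_hour: count log_info codes per month/day/hour.
# Reads the files given as arguments, or stdin when none are given.
loginfo_per_hour() {
    awk '
        match($0, /log_info\(0x[0-9a-fA-F]+\)/) {
            # Strip the leading "log_info(" (9 chars) and the trailing ")"
            code = substr($0, RSTART + 9, RLENGTH - 10)
            hour = substr($3, 1, 2)
            count[$1 " " $2 " " hour ":00 " code]++
        }
        END { for (k in count) print k, count[k] }
    ' "$@" | sort
}

# Example (not run here): loginfo_per_hour /var/log/mpt2sas.log
```

The sort is plain lexical, so output spanning several months will not be in calendar order.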

On these HBAs the entries tend to show up in syslog during RAID operations. Here's a typical sequence:

Jul 13 06:06:23 durandal kernel: [366918.435596] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Jul 13 06:06:28 durandal kernel: [366923.145524] mpt2sas0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)

After examining the Linux kernel source (particularly mpi_log_sas.h), we can break down the components:

0x31120303 structure:
- 0x3 (bus type: SAS) and 0x1 (originator: PL)
- 0x12 (PL_LOGINFO_CODE_ABORT)
- 0x0303 (unknown subcode)

0x31110d01 structure:
- 0x3 (bus type: SAS) and 0x1 (originator: PL)
- 0x11 (PL_LOGINFO_CODE_RESET)
- 0x0d01 (unknown subcode)

Check drive health (even if SMART looks clean):

for d in /dev/sd{a,b,c,d,e}; do
  echo "== $d"
  # CRC errors point at cabling or backplane rather than the platters
  smartctl -a "$d" | grep -E "Reallocated|Pending|Uncorrectable|CRC"
done
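
With several drives, it helps to print only the attributes whose raw value is non-zero. The filter below is a sketch that assumes the usual ten-column `smartctl -A` attribute table with the raw value in the last column; `smart_nonzero` is a hypothetical helper, and UDMA_CRC_Error_Count is included here as an addition because it tends to reflect link problems.

```shell
# smart_nonzero: read "smartctl -A" output on stdin and print the name and raw
# value of selected attributes when the raw value is greater than zero.
smart_nonzero() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ \
            && $10 + 0 > 0 {
            print $2, $10
        }
    '
}

# Example (not run here): smartctl -A /dev/sdX | smart_nonzero
```

No output means all four counters are zero for that drive.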

Monitor the per-phy link error counters exposed by the SAS transport class:

watch -n 1 "grep -H . /sys/class/sas_phy/phy-*/*_count"

Verify link speeds:
```
lsscsi -t
sas2ircu 0 display
```

Based on similar cases, these errors often appear when:

- Drive firmware is incompatible with the controller (especially with 3TB+ drives)
- Physical connections are faulty (bad cables or backplanes)
- Power supply problems cause voltage drops
- Controller firmware needs updating

1. Update all components:

# Check for latest mpt2sas driver
modinfo mpt2sas | grep version
# Compare with upstream kernel version

2. Test individual drives:

# Read-only surface scan; a -t pattern here would flag every block that does not already hold it
badblocks -sv -b 4096 -o badblocks.log /dev/sdX

3. Controller-specific diagnostics:

sas2ircu 0 status
sas2flash -list

When seeing unusually long rebuild times (10,000+ minutes):

# Monitor rebuild speed:
watch -n 60 "cat /proc/mdstat && echo && iostat -xmd 1 5"

# Temporarily raise the guaranteed minimum rebuild rate (KB/s per device, default 1000):
echo 50000 > /proc/sys/dev/raid/speed_limit_min
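
For logging the estimate over time, the finish and speed figures can be pulled out of /proc/mdstat. This is a minimal sketch, assuming the usual progress line containing `finish=` and `speed=`; `md_rebuild_eta` is an illustrative name, and the optional argument exists only so the function can be pointed at a saved copy of the file.

```shell
# md_rebuild_eta: print "<array> <finish estimate> <speed>" for every array
# that is currently resyncing or recovering.
md_rebuild_eta() {
    awk '
        # Array header lines look like "md0 : active raid6 ..."
        $2 == ":" && $1 ~ /^md/ { dev = $1 }
        /finish=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^finish=/) finish = substr($i, 8)
                if ($i ~ /^speed=/)  speed  = substr($i, 7)
            }
            print dev, finish, speed
        }
    ' "${1:-/proc/mdstat}"
}

# Example (not run here): md_rebuild_eta
```

Appending this output with a timestamp alongside the mpt2sas log makes it easier to see whether the estimate jumps after each error cluster.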
