Decoding mpt2sas Error Codes 0x31120303 and 0x31110d01: LSI SAS2008 HBA Troubleshooting Guide


When working with LSI SAS2008-based HBAs (like the Supermicro AOC-USAS2-L8I), you might encounter these recurring mpt2sas driver messages:

kernel: [timestamp] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
kernel: [timestamp] mpt2sas0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)

These messages typically appear in clusters, with 0x31120303 preceding 0x31110d01 by 5-30 seconds. The pattern suggests a hardware communication issue triggering both an abort (code 0x12) and reset (code 0x11).

Examining the Linux kernel's mpi_log_sas.h reveals the structure of these codes:

/*
 * Bits 31-28 (0x3): log info type, SAS
 * Bits 27-24 (0x1): originator, PL
 * Bits 23-16 (0x12/0x11): code
 * Bits 15-0: sub_code
 */

Breakdown of the observed errors:

  • 0x31120303: SAS protocol layer (PL) abort (0x12) with undocumented subcode 0x0303
  • 0x31110d01: SAS protocol layer reset (0x11) with subcode likely indicating link reset
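The same split can be reproduced with shell arithmetic, which helps when other log_info values turn up in the log. This is a minimal sketch: `decode_loginfo` is a hypothetical helper name, the bit layout (4-bit type, 4-bit originator, 8-bit code, 16-bit sub_code) is inferred from the fields the driver prints, and the originator numbering 0/1/2 for IOP/PL/IR is an assumption to check against your kernel source.

```shell
#!/usr/bin/env bash
# Split a 32-bit mpt2sas log_info value into the fields the driver prints.
# Assumed layout: bits 31-28 type, 27-24 originator, 23-16 code, 15-0 sub_code.
decode_loginfo() {
    local v=$(( $1 ))
    local type=$(( (v >> 28) & 0xF ))
    local orig=$(( (v >> 24) & 0xF ))
    local code=$(( (v >> 16) & 0xFF ))
    local sub=$(( v & 0xFFFF ))
    local orig_name
    # Originator numbering is an assumption (0=IOP, 1=PL, 2=IR).
    case $orig in
        0) orig_name=IOP ;;
        1) orig_name=PL ;;
        2) orig_name=IR ;;
        *) orig_name=unknown ;;
    esac
    printf 'type=0x%x originator=%s code=0x%02x sub_code=0x%04x\n' \
        "$type" "$orig_name" "$code" "$sub"
}

decode_loginfo 0x31120303   # type=0x3 originator=PL code=0x12 sub_code=0x0303
decode_loginfo 0x31110d01   # type=0x3 originator=PL code=0x11 sub_code=0x0d01
```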

For systems with mixed drive models (WD30EZRX and ST3000DM001 in this case), verify:

# Check drive negotiation speeds (reported on the "SATA Version is:" line)
smartctl -i /dev/sdX | grep -i "sata version"
# Verify the link rate negotiated on each HBA phy
grep . /sys/class/sas_phy/phy-*/negotiated_linkrate

Common issues in such configurations include:

  • Incompatible SATA/SAS negotiation between HBA and backplane
  • Marginal signal integrity with certain drive combinations
  • Power delivery issues when multiple drives spin up simultaneously

Gather detailed HBA information:

# Check firmware version (lspci does not report it; the mpt2sas driver exposes it in sysfs)
cat /sys/class/scsi_host/host*/version_fw
# Monitor port status
systool -c scsi_host -v | grep -A10 "Class Device = host"

For real-time monitoring during array rebuilds:

watch -n 1 "dmesg | grep mpt2sas | tail -n 10"

While you've already updated to P12 firmware (2008IT12.FW), consider:

  • Cross-flashing to LSI-branded firmware (often more frequently updated)
  • Verifying that the flashed firmware type matches your usage (IT mode for a plain HBA under mdadm, IR mode for the controller's integrated RAID)
  • Checking for known issues with your specific backplane model

When these errors occur during mdadm operations, monitor rebuild speed:

# Check current rebuild progress
cat /proc/mdstat
# Detailed I/O statistics per disk
iostat -x 1 /dev/sd[b-f]

Typical symptoms include:

  • Periodic throughput drops corresponding to error clusters
  • Increased I/O wait times during error events
  • Possible command timeouts if the issues persist

If the errors persist but don't cause operational failures:

# Raise the SCSI command timeout (in seconds) on each member disk;
# md has no timeout of its own and relies on the underlying devices
for t in /sys/block/sd[b-f]/device/timeout; do echo 60 > "$t"; done
# Confirm the new values
grep . /sys/block/sd[b-f]/device/timeout

For physical layer issues:

  1. Try different SFF-8087 to SATA breakout cables
  2. Test with drives connected directly to HBA (bypassing backplane)
  3. Verify power supply capacity during concurrent drive operations
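To judge whether a cable or backplane swap helped, it can be useful to compare the per-phy link error counters before and after the change. The sketch below assumes the counters exposed by the Linux SAS transport class under /sys/class/sas_phy; `phy_error_counters` is a hypothetical helper name, and the sysfs root is a parameter so the function can also be pointed at a saved copy.

```shell
# Print the SAS transport class error counters for every phy,
# one "phy attribute value" triple per line.
phy_error_counters() {
    local root=${1:-/sys/class/sas_phy}
    local phy attr
    for phy in "$root"/phy-*; do
        [ -d "$phy" ] || continue
        for attr in invalid_dword_count running_disparity_error_count \
                    loss_of_dword_sync_count phy_reset_problem_count; do
            # Skip counters this kernel or driver does not expose.
            [ -r "$phy/$attr" ] || continue
            printf '%s %s %s\n' "${phy##*/}" "$attr" "$(cat "$phy/$attr")"
        done
    done
}
```

A typical use is `phy_error_counters > before.txt`, then the hardware change and some load, then `phy_error_counters > after.txt` and `diff before.txt after.txt`. Counters that keep climbing on one phy point at that lane's cable, connector, or drive.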

Set up comprehensive logging:

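# rsyslog rule (for example in a file under /etc/rsyslog.d/): copy mpt2sas messages to their own log
:msg, contains, "mpt2sas" /var/log/mpt2sas.log
# crontab entry: every 5 minutes, append the mpt2sas lines from the kernel ring buffer to a per-day file
# (each snapshot repeats earlier lines, so deduplicate before counting events)
*/5 * * * * dmesg | grep mpt2sas >> /var/log/mpt2sas_$(date +\%Y\%m\%d).log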

This creates a searchable history of error patterns for correlation with performance metrics or hardware changes.


When working with LSI SAS2008-based HBAs (like the Supermicro AOC-USAS2-L8I), you might encounter cryptic syslog entries during RAID operations. Here's a typical sequence:

Jul 13 06:06:23 durandal kernel: [366918.435596] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Jul 13 06:06:28 durandal kernel: [366923.145524] mpt2sas0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
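The delay between the abort and the reset message can be measured from the kernel uptime stamps in square brackets. The following is a rough sketch rather than a polished tool: `loginfo_gaps` is a hypothetical helper name, and it assumes each line carries a `[seconds.microseconds]` stamp as in the sample above.

```shell
# Print the delay in seconds between each 0x31120303 (abort) message
# and the next 0x31110d01 (reset) message, read from stdin.
loginfo_gaps() {
    awk '
        /mpt2sas/ && /log_info\(0x31120303\)/ {
            if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
                # Strip the surrounding brackets and keep the uptime value.
                abort_ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
                have_abort = 1
            }
            next
        }
        /mpt2sas/ && /log_info\(0x31110d01\)/ && have_abort {
            if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
                reset_ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
                printf "%.3f\n", reset_ts - abort_ts
                have_abort = 0
            }
        }
    '
}
```

Run it as `dmesg | loginfo_gaps` or feed it a saved kernel log. For the two sample lines above it prints 4.710.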

After examining the Linux kernel source (particularly mpi_log_sas.h), we can break down the components:

0x31120303 structure:
- 0x3 (log info type: SAS), 0x1 (originator: PL)
- 0x12 (PL_LOGINFO_CODE_ABORT)
- 0x0303 (unknown subcode)

0x31110d01 structure:
- 0x3 (log info type: SAS), 0x1 (originator: PL)
- 0x11 (PL_LOGINFO_CODE_RESET)
- 0x0d01 (unknown subcode)

Initial diagnostic steps:

  1. Check drive health (even if SMART looks clean):
    for d in /dev/sd{a,b,c,d,e}; do
      smartctl -a "$d" | grep -E "Reallocated|Pending|Uncorrectable|CRC"
    done
  2. Monitor link error counters on the HBA phys:
    watch -n 1 "grep . /sys/class/sas_phy/phy-*/invalid_dword_count"
  3. Verify attached devices and link speeds:
    lsscsi -t
    sas2ircu 0 display
    grep . /sys/class/sas_phy/phy-*/negotiated_linkrate

Based on similar cases, these errors often appear when:

  • Drive firmware incompatibilities exist (especially with 3TB+ drives)
  • Physical connection issues (bad cables/backplanes)
  • Power supply problems causing voltage drops
  • Controller firmware needing updates

1. Update all components:

# Check for latest mpt2sas driver
modinfo mpt2sas | grep version
# Compare with upstream kernel version

2. Test individual drives:

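# Read-only scan; a -t pattern is only useful with -w or -n, otherwise every block that does not already hold the pattern is reported as bad
badblocks -sv -b 4096 -o badblocks.log /dev/sdX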

3. Controller-specific diagnostics:

sas2ircu 0 status
sas2flash -list

When seeing unusually long rebuild times (10,000+ minutes):

# Monitor rebuild speed:
watch -n 60 "cat /proc/mdstat && echo && iostat -xmd 1 5"

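# Temporary throttle adjustment: raise the minimum rebuild rate (KB/s per device);
# lowering speed_limit_max would only slow the rebuild further
echo 50000 > /proc/sys/dev/raid/speed_limit_min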
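
To correlate rebuild speed with the error clusters, the estimate and current rate can be pulled out of /proc/mdstat and logged with a timestamp. This is a sketch under the assumption that the progress line contains `finish=` and `speed=` fields, as current kernels print during recovery, resync, reshape, or check; `md_rebuild_status` is a hypothetical helper name.

```shell
# Print "array finish speed" for every md array that is currently syncing.
# The mdstat path is a parameter so a saved copy can be analysed as well.
md_rebuild_status() {
    awk '
        /^md[^ ]* *:/ { array = $1 }
        /(recovery|resync|reshape|check) *=/ {
            finish = "?"; speed = "?"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^finish=/) finish = substr($i, 8)
                if ($i ~ /^speed=/)  speed  = substr($i, 7)
            }
            print array, finish, speed
        }
    ' "${1:-/proc/mdstat}"
}
```

For example, `while sleep 60; do echo "$(date +%T) $(md_rebuild_status)"; done >> rebuild.log` keeps a once-a-minute record that can be lined up against the mpt2sas timestamps.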