When working with our Debian 7 production server (kernel 3.2.0-4-amd64) on Supermicro X10SLL-F hardware, we consistently encounter SATA link reset errors across all connected drives (2 SSDs and 2 HDDs). The kernel messages show:
ata1: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
ata1: irq_stat 0x00400040, connection status changed
ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }
ata1: hard resetting link
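To read the SErr value without relying on the kernel's own decoding, the bitmask can be expanded by hand. The following is a minimal sketch; the function name decode_serr is hypothetical, and the bit-to-name table is my reading of the names libata prints in its error handler, so treat it as an assumption to verify against your kernel source.

```bash
# Decode a SATA SError register value into the short flag names
# that libata prints (bit positions assumed from libata's error handler).
decode_serr() {
    local value out entry bit name
    value=$(( $1 ))
    out=""
    for entry in 0:RecovData 1:RecovComm 8:UnrecovData 9:Persist 10:Proto \
                 11:HostInt 16:PHYRdyChg 17:PHYInt 18:CommWake 19:10B8B \
                 20:Dispar 21:BadCRC 22:Handshk 23:LinkSeq 24:TrStaTrns \
                 25:UnrecFIS 26:DevExch; do
        bit=${entry%%:*}     # part before the colon: bit position
        name=${entry#*:}     # part after the colon: flag name
        if [ $(( value & (1 << bit) )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}
```

Calling decode_serr 0x4090800 yields the same four flags as the log above (HostInt, PHYRdyChg, 10B8B, DevExch), which confirms that the PHY dropped and re-established the link rather than a command failing.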
After reviewing multiple bug reports, we implemented several common fixes:
1. Disabling NCQ:
for dev in /sys/block/sd[a-d]; do
echo 1 > $dev/device/queue_depth
done
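To confirm that the queue depth change actually took effect on every drive, something like the following sketch can be used. The function name ncq_status and its optional root-directory argument are hypothetical, added only so the check can be pointed at a test directory.

```bash
# Report whether NCQ is effectively disabled (queue_depth of 1) per disk.
# $1 = sysfs block directory, defaults to /sys/block
ncq_status() {
    local root="${1:-/sys/block}" dev depth
    for dev in "$root"/sd*; do
        [ -r "$dev/device/queue_depth" ] || continue
        depth=$(cat "$dev/device/queue_depth")
        if [ "$depth" -le 1 ]; then
            echo "${dev##*/}: NCQ off (depth $depth)"
        else
            echo "${dev##*/}: NCQ on (depth $depth)"
        fi
    done
}
```

Note that a value written to queue_depth does not survive a reboot, so the check is worth repeating after each restart.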
2. Adjusting power management:
echo "medium_power" > /sys/class/scsi_host/host0/link_power_management_policy
The SATA link speed degrades from 6.0Gbps to 1.5Gbps after errors occur. Checking the controller details:
lspci -nn | grep SATA
00:1f.2 SATA controller [0106]: Intel Corporation Lynx Point 6-port SATA Controller 1 [AHCI mode] [8086:8c02] (rev 04)
1. Kernel Parameters:
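Try adding these to the kernel command line in the GRUB configuration. libata.force takes a comma-separated list, so combine its values into a single parameter; repeating the parameter does not combine the values:
libata.force=noncq,1.5Gbps
ahci.no_em_buffer=1
Confirm with modinfo -p ahci that your kernel's ahci module actually accepts no_em_buffer before relying on it.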
2. Firmware Updates:
Check for BIOS updates from Supermicro and firmware updates for both SSDs (Toshiba) and HDDs (Seagate).
3. Physical Connections:
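Although the errors affect all drives at once, which points away from a single bad cable, it is still worth reseating the cables and checking the negotiated speed of every link:
for link in /sys/class/ata_link/link*; do
echo "${link##*/}: $(cat "$link/sata_spd")"
done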
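Since the link speed drops from 6.0Gbps to 1.5Gbps after errors, it can help to list only the links that are running below full speed. This is a sketch under two assumptions: that sata_spd reports strings such as "6.0 Gbps" and "1.5 Gbps", and that unoccupied ports report "<unknown>". The function name degraded_links is hypothetical.

```bash
# Print links whose negotiated speed is below 6.0 Gbps.
# $1 = ata_link sysfs directory, defaults to /sys/class/ata_link
degraded_links() {
    local root="${1:-/sys/class/ata_link}" link spd
    for link in "$root"/link*; do
        [ -r "$link/sata_spd" ] || continue
        spd=$(cat "$link/sata_spd")
        case $spd in
            "6.0 Gbps"|"<unknown>") ;;          # full speed or empty port
            *) echo "${link##*/} $spd" ;;       # anything slower is reported
        esac
    done
}
```

Drives that only support 3.0 Gbps will also show up here, so compare the output against what each drive is expected to negotiate.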
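Create a monitoring script to log link resets (dmesg -w needs util-linux 2.22 or later, which is newer than what Debian 7 ships; on older systems, follow /var/log/kern.log with tail -F instead):
#!/bin/bash
# Append a timestamped line to the log for every hard link reset the kernel reports
LOG=/var/log/sata_resets.log
dmesg -w | grep --line-buffered "hard resetting link" | \
while IFS= read -r line; do
echo "$(date) - $line" >> "$LOG"
done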
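Once resets are being logged, counting them per port shows whether one port is worse than the others or whether all ports reset together. A minimal sketch, assuming the log lines keep the kernel's "ataN: hard resetting link" wording; the function name count_resets is hypothetical.

```bash
# Read kernel log lines on stdin and print "<port> <count>" for every
# port that logged a hard link reset, sorted by port name.
count_resets() {
    awk '/hard resetting link/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9.]+:$/) {
                port = $i
                sub(/:$/, "", port)   # strip the trailing colon
                n[port]++
                break
            }
    }
    END { for (p in n) print p, n[p] }' | sort
}
```

For example, count_resets < /var/log/sata_resets.log summarizes the log written by the script above. Roughly equal counts on all ports would point toward a shared cause such as the controller, backplane, or power supply.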
Enable detailed ATA debugging, if your libata build exposes a debug parameter (check that the file exists first):
echo 1 > /sys/module/libata/parameters/debug
dmesg -wH
This will show detailed state transitions and may reveal timing issues during link negotiation.
The kernel messages quoted at the top (PHYRdyChg and DevExch in SError, followed by a hard link reset) are the typical signature of SATA link errors on Linux systems with Intel Lynx Point controllers.
This issue frequently appears in production environments with:
- Supermicro X10 series motherboards
- Intel Lynx Point SATA controllers (AHCI mode)
- Mixed SSD/HDD configurations
- Debian/Ubuntu systems with kernel versions 3.x-5.x
From my troubleshooting experience, start by checking the negotiated link speed and the controller's capabilities:
# Check current link speed (example for /dev/sda)
cat /sys/class/ata_link/link1/sata_spd
# Verify controller capabilities
lspci -vvv -s $(lspci | grep SATA | cut -d' ' -f1)
Step 1: Verify Physical Connections
Before diving into software solutions:
- Reseat all SATA cables
- Try different SATA ports
- Check power supply connections
Step 2: Adjust Kernel Parameters
Try these tweaks in /etc/default/grub (libata.force takes a comma-separated list; repeating the parameter does not combine the values):
GRUB_CMDLINE_LINUX_DEFAULT="... libata.force=noncq,1.5Gbps pcie_aspm=off"
Then update GRUB and reboot:
update-grub && reboot
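After the reboot it is worth verifying that the parameters really reached the running kernel, since an edit to the wrong GRUB variable fails silently. A small sketch; the function name has_param is hypothetical, and it only matches whole command-line words.

```bash
# Return success if the given word appears on the kernel command line.
# $1 = exact parameter to look for
# $2 = command line to search, defaults to the running kernel's /proc/cmdline
has_param() {
    local want="$1" cmdline="${2:-$(cat /proc/cmdline)}" word
    for word in $cmdline; do
        if [ "$word" = "$want" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, has_param pcie_aspm=off && echo "ASPM disabled on the command line" prints the message only when the parameter is present.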
Step 3: Power Management Settings
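For each host controller, select max_performance so that link power management cannot put the link into a low-power state (min_power enables the most aggressive power saving, which is the opposite of what you want while chasing link resets):
for host in /sys/class/scsi_host/host*; do
echo "max_performance" > "${host}/link_power_management_policy"
done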
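Because the policy is a runtime setting that other tools (for example power-saving daemons) may change again, a quick report of the current value per host can be useful. A sketch; the function name lpm_report and its optional directory argument are hypothetical.

```bash
# Print the current link power management policy of every SCSI host
# that exposes one.
# $1 = scsi_host sysfs directory, defaults to /sys/class/scsi_host
lpm_report() {
    local root="${1:-/sys/class/scsi_host}" host file
    for host in "$root"/host*; do
        file="$host/link_power_management_policy"
        [ -r "$file" ] || continue      # skip hosts without the attribute
        echo "${host##*/}: $(cat "$file")"
    done
}
```

Running it before and after a link reset shows whether the policy stayed at the value you wrote.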
When basic fixes don't work, try these:
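# Enable ATA tracepoints (the event group is named libata; this needs a kernel that provides these tracepoints and a mounted debugfs)
echo 1 > /sys/kernel/debug/tracing/events/libata/enable
cat /sys/kernel/debug/tracing/trace_pipe
# List the parameters the ahci module accepts
modinfo -p ahci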
Consider these last-resort options:
- Upgrade to a newer kernel (for example 4.19 or later), since libata and the AHCI driver have received many fixes since 3.2
- Disable ASPM completely in BIOS
- Try a different SATA controller mode (RAID instead of AHCI)
- Replace the SATA controller with an LSI HBA
Create a simple monitoring script:
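#!/bin/bash
# Collect SATA errors currently in the kernel ring buffer and mail them if any exist
LOG=/var/log/sata_errors.log
# Overwrite rather than append, so earlier matches are not duplicated in the log on every run
dmesg | grep -E "exception Emask|hard resetting link" > "$LOG"
if [ -s "$LOG" ]; then
mail -s "SATA Errors Detected" admin@example.com < "$LOG"
fi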
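If the script runs from cron, the same ring-buffer lines will be matched on every run until they scroll out. One way to mail only new errors is to compare against the lines seen on the previous run. This is a sketch, assuming printk timestamps are enabled so that repeated resets produce distinct lines; the function name new_lines and both file arguments are hypothetical.

```bash
# Print the lines of the current snapshot that were not in the previous one.
# $1 = file with lines seen on the previous run (may be empty or missing)
# $2 = file with lines matched on this run
new_lines() {
    if [ -s "$1" ]; then
        # -F fixed strings, -x whole-line match, -v keep only unseen lines
        grep -Fvx -f "$1" "$2"
    else
        cat "$2"
    fi
}
```

A caller would mail the output of new_lines when it is non-empty and then copy the current snapshot over the previous one.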