Troubleshooting Intel e1000e Driver Hardware Hangs and PHY Reset Issues in Linux



The logs reveal two critical patterns in the e1000e driver behavior:

1. "PHY reset is blocked due to SOL/IDER session"
2. Repeated "Detected Hardware Unit Hang" messages with register dumps
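To see how often each pattern occurs, you can count the two messages in the kernel log. This is a minimal sketch that assumes the message wording quoted above; count_e1000e_events is an illustrative helper name, not part of any existing tool.

```shell
# count_e1000e_events: read kernel log text on stdin and report how many
# times each of the two e1000e messages appears.
# Assumption: the wording matches the messages quoted in this answer.
count_e1000e_events() {
    awk '
        /Detected Hardware Unit Hang/                   { hangs++ }
        /PHY reset is blocked due to SOL\/IDER session/ { blocked++ }
        END { printf "hangs=%d phy_blocked=%d\n", hangs, blocked }
    '
}
```

Typical use would be: dmesg | count_e1000e_events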

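SOL (Serial Over LAN) and IDER (IDE Redirection) are manageability features of Intel NICs (part of Intel AMT on platforms that have it). While the firmware reports a session, the driver is not allowed to reset the PHY, which interferes with normal link recovery. Check current status:

# Look for SOL in the firmware tables (often prints nothing; treat that as inconclusive)
sudo dmidecode | grep -i sol
# Disable SOL/IDER if possible in BIOS (or Intel MEBx) settings

# e1000e has no module parameter for SOL/IDER; after changing the firmware
# setting, reload the driver from a local console (the link drops briefly)
sudo modprobe -r e1000e
sudo modprobe e1000e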

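A hang message means the driver's watchdog found transmit descriptors that the hardware had not completed in time, so start with the transmit path. Try these tuning parameters:

# Set tx/rx descriptors (check the supported maximums first with: ethtool -g eth0)
sudo ethtool -G eth0 rx 2048 tx 2048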
# Disable TSO/GSO for troubleshooting
sudo ethtool -K eth0 tso off gso off
# Enable flow control
sudo ethtool -A eth0 rx on tx on
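Before raising the ring sizes, you may want to read the maximum the NIC reports so that the -G call does not ask for more than it supports. The helper below is a sketch; ring_max is a made-up name, and it assumes the usual section headings in the ethtool -g output.

```shell
# ring_max: read "ethtool -g IFACE" output on stdin and print the pre-set
# maximum for the requested ring ("RX" or "TX").
# Assumption: the output contains the "Pre-set maximums:" and
# "Current hardware settings:" section headings.
ring_max() {
    awk -v ring="$1:" '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && $1 == ring          { print $2 }
    '
}
```

For example: ethtool -g eth0 | ring_max TX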

ASPM (Active State Power Management) often causes issues:

# Check current ASPM status
lspci -vv | grep -i aspm
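# Disable ASPM at kernel level: the parameter must go inside the quotes of
# GRUB_CMDLINE_LINUX_DEFAULT (a bare line appended to the file has no effect)
sudoedit /etc/default/grub    # e.g. GRUB_CMDLINE_LINUX_DEFAULT="... pcie_aspm=off"
sudo update-grub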
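The raw grep output does not say which device each ASPM line belongs to. A small filter can pair every link-control line with its PCI address. This is a sketch that assumes the usual lspci -vv layout (unindented device header, indented LnkCtl line); aspm_by_device is an illustrative name.

```shell
# aspm_by_device: read "lspci -vv" output on stdin and print one line per
# device with its PCI address and the ASPM state from the LnkCtl line.
# Assumption: device headers are unindented and LnkCtl lines look like
# "LnkCtl: ASPM <state>; ...".
aspm_by_device() {
    awk '
        /^[0-9a-fA-F]/ { dev = $1 }        # unindented line starts a new device
        /LnkCtl:/ && /ASPM/ {
            state = $0
            sub(/.*ASPM /, "", state)      # drop everything up to "ASPM "
            sub(/;.*/, "", state)          # drop everything after the first ";"
            print dev, state
        }
    '
}
```

Run it as: sudo lspci -vv | aspm_by_device (root is usually needed for lspci to show the link capabilities).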

While you've tried multiple versions, note that:

  • v1.0.2-k4 is ancient (2009 era)
  • v1.2.20 shows better link negotiation
  • Consider a newer out-of-tree release built from Intel's source package

Deeper register analysis can reveal hardware issues:

# Capture full register dump
sudo ethtool -d eth0 > eth0_registers.txt
# Check EEPROM contents (hex dump as text; add "raw on" for a binary image)
sudo ethtool -e eth0 > eth0_eeprom.txt
# Monitor interrupts
watch -n 1 'grep eth0 /proc/interrupts'
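If you would rather log a single number than watch the table, the per-CPU counters can be summed. The function below is a sketch (irq_total is an illustrative name); it assumes the standard /proc/interrupts layout, where the numeric per-CPU columns follow the IRQ number.

```shell
# irq_total: read /proc/interrupts text on stdin and print the sum of the
# per-CPU counters on every line that mentions NAME (e.g. eth0).
# Assumption: counters are the consecutive numeric fields after the IRQ label.
irq_total() {
    awk -v name="$1" '
        index($0, name) {
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    '
}
```

Example: irq_total eth0 < /proc/interrupts. Running it twice a few seconds apart shows whether the counter still advances; a counter that stands still while traffic should be flowing is consistent with a hang, though not proof of one.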

Add these kernel parameters if issues persist:

# In /etc/default/grub
# (if MSI delivery is suspected, pci=nomsi forces legacy interrupts; there is no "msi=" parameter)
GRUB_CMDLINE_LINUX_DEFAULT="... pcie_aspm=off pci=nommconf"
# Then run:
sudo update-grub && sudo reboot

For the 82573L controller specifically:

  • Check for known hardware errata
  • Test with different Ethernet cables
  • Try different switch ports
  • Consider PCIe lane issues (try different slot)

After analyzing your dmesg logs and configuration details, we're dealing with multiple symptoms of the same underlying issue:

e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex
e1000e 0000:02:00.0: eth0: Detected Hardware Unit Hang
(unregistered net_device): PHY reset is blocked due to SOL/IDER session

Key observations:

  • Driver version progression from 1.0.2-k4 to 1.2.20
  • Consistent ASPM (Active State Power Management) disabling messages
  • SOL/IDER session blocking PHY reset
  • Hardware Unit Hang errors with transmission descriptor issues

1. Disable ASPM Completely

Add pcie_aspm=off to your kernel boot parameters in /etc/default/grub, keeping any parameters that are already there:

GRUB_CMDLINE_LINUX_DEFAULT="... pcie_aspm=off"

Then update grub and reboot:

sudo update-grub
sudo reboot
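After the reboot, it is worth confirming that the parameter actually reached the kernel. The check below is a minimal sketch; has_kernel_param is an illustrative name, and the optional second argument exists only so the function can be tried against a saved copy of the command line.

```shell
# has_kernel_param: succeed if PARAM appears as a whole word on the kernel
# command line. FILE defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # One word per line, then require an exact fixed-string match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

Example: has_kernel_param pcie_aspm=off && echo "ASPM is disabled on the command line"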

2. Adjust Ethernet Interface Parameters

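ethtool settings do not survive a reboot, so create a systemd service to reapply them at boot (adjust the interface name and the ethtool path to match your system):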

# /etc/systemd/system/e1000e-tune.service
[Unit]
Description=Intel e1000e Tuning
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -G eth0 rx 2048 tx 256
ExecStart=/usr/sbin/ethtool -K eth0 tso off gso off
ExecStart=/usr/sbin/ethtool -C eth0 rx-usecs 10

[Install]
WantedBy=multi-user.target
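On systems where the interface is not called eth0, or where ethtool lives elsewhere, the unit can be generated instead of edited by hand. This is a sketch under those assumptions: render_e1000e_unit is a made-up helper, and the ring, offload and coalescing values are simply the ones from the unit above.

```shell
# render_e1000e_unit: print the tuning unit for a given interface.
# Arguments: IFACE (e.g. eth0) and, optionally, the ethtool path.
# Assumption: /usr/sbin/ethtool is only a default; check yours with
# "command -v ethtool".
render_e1000e_unit() {
    iface=$1
    ethtool_bin=${2:-/usr/sbin/ethtool}
    cat <<EOF
[Unit]
Description=Intel e1000e Tuning ($iface)
After=network.target

[Service]
Type=oneshot
ExecStart=$ethtool_bin -G $iface rx 2048 tx 256
ExecStart=$ethtool_bin -K $iface tso off gso off
ExecStart=$ethtool_bin -C $iface rx-usecs 10

[Install]
WantedBy=multi-user.target
EOF
}
```

Example: render_e1000e_unit enp2s0 "$(command -v ethtool)" | sudo tee /etc/systemd/system/e1000e-tune.service, followed by sudo systemctl daemon-reload and sudo systemctl enable --now e1000e-tune.service.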

3. Apply PCI Power Management Tweaks

Create a udev rule:

# /etc/udev/rules.d/99-e1000e.rules
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="e1000e", ATTR{device/power/control}="on"
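To confirm that the rule took effect, you can read the attribute back from sysfs. A minimal sketch follows; power_control is an illustrative name, and the optional root argument is a hypothetical override that exists only so the function can be tried against a copy of the tree.

```shell
# power_control: print the runtime power-management setting of the device
# behind a network interface. "on" keeps the device powered at all times;
# "auto" lets the kernel suspend it when idle.
# Arguments: IFACE and, optionally, the sysfs root (defaults to /sys).
power_control() {
    iface=$1
    root=${2:-/sys}
    cat "$root/class/net/$iface/device/power/control"
}
```

Example: power_control eth0 should print on once the rule has been applied (reload rules with sudo udevadm control --reload and re-trigger, or reboot).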

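4. Alternative: Use Your Compiled Module

The in-tree driver and Intel's out-of-tree driver share the module name e1000e, so a blacklist entry would keep both from loading automatically. Install your compiled module instead, run sudo depmod -a, and confirm with modinfo e1000e which file and version will be loaded. Then set its options:

# /etc/modprobe.d/e1000e.conf
# IntMode and InterruptThrottleRate take one comma-separated value per port
options e1000e IntMode=1,1,1 InterruptThrottleRate=3000,3000,3000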

Use these to monitor the interface status:

watch -n 1 "ethtool eth0 | grep -e Speed -e Duplex"
journalctl -f -u NetworkManager
dmesg -wH
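Since the log shows a gigabit controller negotiating only 10 Mbps, a scripted check for a degraded link can be handy next to the watch command. The two helpers below are a sketch with made-up names; they assume the usual "Speed: <n>Mb/s" line in ethtool output.

```shell
# link_speed_mbps: read "ethtool IFACE" output on stdin and print the
# negotiated speed without the "Mb/s" suffix (e.g. "10").
link_speed_mbps() {
    awk '
        $1 == "Speed:" {
            speed = $2
            sub(/Mb\/s$/, "", speed)
            print speed
        }
    '
}

# link_below: succeed when the negotiated speed is a number below MIN_MBPS.
link_below() {
    speed=$(link_speed_mbps)
    case $speed in
        ''|*[!0-9]*) return 1 ;;   # unknown or non-numeric speed: no verdict
    esac
    [ "$speed" -lt "$1" ]
}
```

Example: ethtool eth0 | link_below 1000 && echo "link negotiated below 1 Gb/s"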

If problems persist, collect detailed diagnostics:

sudo ethtool -d eth0 > ethtool-dump.txt
sudo lspci -vvv -s 02:00.0 > lspci.txt
sudo cat /proc/interrupts > interrupts.txt

For persistent hardware hangs, consider:

# /etc/modprobe.d/e1000e.conf
options e1000e debug=16

This raises the driver's debug message level to its maximum, so the kernel log carries more detail around each hang. The e1000e driver has no virtual-function (SR-IOV) options to turn off.