Debugging RX_Missed_Errors: Comprehensive Guide for Network Packet Drops on Linux Servers


When examining packet drops on a NIC (Network Interface Card), you may find that ifconfig reports significant RX drops while /sys/class/net/ethX/statistics/rx_dropped shows zero. This usually means the drops are being counted as rx_missed_errors instead: older ifconfig versions take their figure from /proc/net/dev, whose drop column adds rx_missed_errors to rx_dropped.

# Typical output showing the discrepancy
$ ifconfig eth2 | grep 'RX.*drop'
          RX packets:2059646370 errors:0 dropped:7142467 overruns:0 frame:0
$ cat /sys/class/net/eth2/statistics/rx_dropped
0
$ cat /sys/class/net/eth2/statistics/rx_missed_errors
7142467
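To make the relationship between the three numbers explicit, the two sysfs counters can be read side by side and summed. This is a minimal sketch: the function name rx_drop_breakdown and its optional second argument (an alternative statistics root, useful only for testing) are illustrative assumptions, not part of any standard tool.

```shell
# rx_drop_breakdown IFACE [STATS_ROOT]
# Prints rx_dropped, rx_missed_errors and their sum for one interface.
# STATS_ROOT defaults to /sys/class/net (hypothetical parameter for testing).
rx_drop_breakdown() {
    local iface="$1" root="${2:-/sys/class/net}"
    local dir="$root/$iface/statistics"
    local dropped missed
    dropped=$(cat "$dir/rx_dropped") || return 1
    missed=$(cat "$dir/rx_missed_errors") || return 1
    # The sum is what the "dropped" field of older ifconfig builds should show
    echo "rx_dropped=$dropped rx_missed_errors=$missed combined=$((dropped + missed))"
}
```

Calling rx_drop_breakdown eth2 on the server above should report a combined value matching the ifconfig dropped field, give or take packets that arrive between the reads.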

For Intel 10GbE NICs (ixgbe driver), these are the most frequent causes:

  • Insufficient RX descriptor ring buffer size
  • CPU saturation preventing timely packet processing
  • IRQ (Interrupt Request) balancing issues
  • Network traffic bursts exceeding processing capacity
  • Hardware limitations or firmware bugs

Start with these essential diagnostics:

# Check driver settings and hardware info
$ ethtool -i eth2
driver: ixgbe
version: 3.15.1-k
firmware-version: 0x800003e1

# Examine current ring buffer sizes
$ ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512

# Check interrupt distribution
$ cat /proc/interrupts | grep eth2
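The raw /proc/interrupts output is hard to read on machines with many cores. The sketch below totals the interrupt counts per CPU for every IRQ line that mentions the interface, which makes a lopsided distribution easy to spot. The function name irq_per_cpu and its optional file argument are assumptions made for illustration.

```shell
# irq_per_cpu IFACE [FILE]
# Sums interrupt counts per CPU column for all IRQ rows mentioning IFACE.
# FILE defaults to /proc/interrupts (hypothetical parameter for testing).
irq_per_cpu() {
    local iface="$1" file="${2:-/proc/interrupts}"
    awk -v pat="$iface" '
        NR == 1 {                       # header row: one column per online CPU
            ncpu = NF
            for (i = 1; i <= NF; i++) name[i] = $i
            next
        }
        index($0, pat) {                # rows such as "eth2-TxRx-0"
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "%s %.0f\n", name[i], total[i]
        }
    ' "$file"
}
```

If irq_per_cpu eth2 shows nearly all interrupts landing on one CPU, IRQ balancing is a likely contributor to the missed packets.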

Increasing the RX ring buffer often resolves missed errors:

# Temporarily increase ring buffer (does not survive a reboot)
$ ethtool -G eth2 rx 2048

# Make permanent by adding to /etc/rc.local
ethtool -G eth2 rx 2048

For systems handling sustained 10 Gbps traffic, consider values between 2048 and 4096, up to the pre-set maximum reported by ethtool -g.
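Before raising the ring size it helps to confirm how much headroom the hardware allows. The following sketch parses ethtool -g output from standard input and compares the current RX ring with the pre-set maximum. The function name ring_headroom is an assumption, and the parser relies on the output layout shown earlier in this guide.

```shell
# ring_headroom
# Reads `ethtool -g` output on stdin and reports current vs. maximum RX ring size.
ring_headroom() {
    awk '
        /^Pre-set maximums/          { section = "max" }
        /^Current hardware settings/ { section = "cur" }
        $1 == "RX:"                  { rx[section] = $2 }   # skips "RX Mini:" and "RX Jumbo:"
        END {
            if (rx["max"] == "" || rx["cur"] == "") exit 1  # unexpected layout
            printf "current=%d max=%d\n", rx["cur"], rx["max"]
            if (rx["cur"] + 0 < rx["max"] + 0)
                print "RX ring can still be raised"
            else
                print "RX ring is already at the hardware maximum"
        }
    '
}
```

Typical use is ethtool -g eth2 | ring_headroom; with the settings shown above it would report current=512 max=4096.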

Add these to /etc/sysctl.conf to tune the kernel side of the receive path (apply with sysctl -p):

# Increase socket read buffers
net.core.rmem_max = 16777216
net.core.rmem_default = 16777216

# Increase the per-CPU queue of received packets waiting for the kernel to process them
net.core.netdev_max_backlog = 30000

# Not a sysctl: run this from the shell to set interrupt moderation (adjust usecs as needed)
$ ethtool -C eth2 rx-usecs 50
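Whether netdev_max_backlog is actually being exceeded can be checked in /proc/net/softnet_stat, which has one row per online CPU with hexadecimal columns; the second column counts packets dropped because the backlog queue was full, and the third counts time_squeeze events. The sketch below decodes those two columns. The function name softnet_drops and its optional file argument are illustrative assumptions.

```shell
# softnet_drops [FILE]
# Decodes the dropped and time_squeeze columns of softnet_stat (hex to decimal).
# FILE defaults to /proc/net/softnet_stat (hypothetical parameter for testing).
softnet_drops() {
    local file="${1:-/proc/net/softnet_stat}"
    local row=0 processed dropped squeezed rest
    while read -r processed dropped squeezed rest; do
        # 16#... tells bash arithmetic to interpret the value as base 16
        echo "row$row dropped=$((16#$dropped)) time_squeeze=$((16#$squeezed))"
        row=$((row + 1))
    done < "$file"
}
```

If the dropped values stay at zero, the backlog setting is probably not the bottleneck and the ring buffer or interrupt handling deserves more attention.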

After making changes, monitor improvements with:

# Watch error counters in real-time
$ watch -n1 "cat /sys/class/net/eth2/statistics/rx_missed_errors"
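The raw counter only ever grows, so the number that matters after a change is how fast it grows. A minimal sketch of a helper that reports the increase over one interval follows; the name missed_per_interval and the optional third argument (an alternative statistics root for testing) are assumptions.

```shell
# missed_per_interval IFACE [SECONDS] [STATS_ROOT]
# Prints how much rx_missed_errors grew during the interval (default 1 second).
missed_per_interval() {
    local iface="$1" interval="${2:-1}" root="${3:-/sys/class/net}"
    local file="$root/$iface/statistics/rx_missed_errors"
    local before after
    before=$(cat "$file") || return 1
    sleep "$interval"
    after=$(cat "$file") || return 1
    echo "$((after - before))"
}
```

Running missed_per_interval eth2 10 before and after a tuning change gives two directly comparable figures, provided the traffic load is similar.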

# Check driver-level counters (look for missed packets and buffer shortages)
$ ethtool -S eth2 | grep -E 'rx_pkts|rx_missed|rx_no_buffer'

For persistent issues:

  1. Update ixgbe driver and firmware
  2. Test with different MTU sizes (try 9000 for jumbo frames, provided every device on the path supports it)
  3. Disable power saving features:
    ethtool --set-eee eth2 eee off
  4. Experiment with RSS queues:
    ethtool -L eth2 combined 16

Remember to test changes methodically and monitor their impact.


A common scenario: after migrating services between servers, the interface statistics show a significant packet drop count:

$ ifconfig eth2 | grep 'RX.*drop'
      RX packets:2059646370 errors:0 dropped:7142467 overruns:0 frame:0

What's particularly interesting is the discrepancy between different measurement methods:

$ cat /sys/class/net/eth2/statistics/rx_dropped
0
$ cat /sys/class/net/eth2/statistics/rx_missed_errors
7142467

Your NIC uses the ixgbe driver (version 3.15.1-k), which handles Intel 10GbE network interfaces. The rx_missed_errors counter tracks packets that reached the NIC but were discarded by the hardware before they could be handed to the driver, typically because no receive buffers were free.

1. Check ring buffer sizes:

$ ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512

2. Inspect interrupt coalescing:

$ ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: on  TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

Based on the ixgbe driver behavior and your migration scenario, these are the most likely culprits:

  • Receive buffer exhaustion - The NIC's ring buffers are too small for the new traffic pattern
  • IRQ balancing issues - CPU cores aren't properly handling interrupts
  • DMA mapping problems - The new server's memory configuration differs
  • Flow control misconfiguration - Missing pause frames during traffic bursts

Increase ring buffers:

$ sudo ethtool -G eth2 rx 4096
$ sudo ethtool -G eth2 tx 4096

Adjust interrupt moderation (if the driver rejects rx-frames, set rx-usecs on its own):

$ sudo ethtool -C eth2 rx-usecs 50 rx-frames 32

Check NUMA settings (if applicable):

$ cat /sys/class/net/eth2/device/numa_node
0
$ numactl --hardware
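The point of the NUMA check is to find out which CPUs are local to the NIC, since interrupts and receive processing handled on a remote node pay extra memory latency. The sketch below combines the two sysfs lookups. The function name nic_local_cpus and its optional sysfs root argument are assumptions for illustration.

```shell
# nic_local_cpus IFACE [SYSFS_ROOT]
# Prints the NUMA node of the NIC and the CPUs that belong to that node.
# SYSFS_ROOT defaults to /sys (hypothetical parameter for testing).
nic_local_cpus() {
    local iface="$1" sys="${2:-/sys}"
    local node cpus
    node=$(cat "$sys/class/net/$iface/device/numa_node") || return 1
    # The kernel reports -1 when the device has no NUMA affinity information
    if [ "$node" -lt 0 ]; then
        echo "no NUMA affinity reported"
        return 0
    fi
    cpus=$(cat "$sys/devices/system/node/node$node/cpulist") || return 1
    echo "node=$node cpus=$cpus"
}
```

The CPU list it prints is a reasonable starting set when pinning the NIC's IRQs or the receiving application.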

Create a monitoring script to track the issue:

#!/bin/bash
# Append NIC drop counters to a log file every 5 seconds.
INTERFACE="eth2"
LOG_FILE="/var/log/nic_stats.log"
STATS_DIR="/sys/class/net/$INTERFACE/statistics"

while true; do
    RX_MISSED=$(cat "$STATS_DIR/rx_missed_errors")
    RX_DROPPED=$(cat "$STATS_DIR/rx_dropped")
    # Keep the ifconfig line too, so all three figures can be compared later
    RX=$(ifconfig "$INTERFACE" | grep 'RX.*drop')
    TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

    echo "$TIMESTAMP - RX_MISSED: $RX_MISSED | RX_DROPPED: $RX_DROPPED | IFCONFIG: $RX" >> "$LOG_FILE"
    sleep 5
done

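For persistent issues, consider these deeper investigations (the perf command only works if your ixgbe build exposes tracepoints; check with perf list 'ixgbe:*' first):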

$ perf stat -e 'ixgbe:*' -a sleep 10
$ ethtool --register-dump eth2 | grep -i 'miss\|drop'
$ dmesg | grep -i 'ixgbe\|dma\|buffer'

Remember that a rising rx_missed_errors count usually means packets reached the hardware but could not be handed to the driver, typically because of resource constraints on the host rather than problems on the network.