Linux Network Packet Drops in __netif_receive_skb_core: Diagnosis and Solutions for RX Packet Loss on Ubuntu Servers


Examining netstat -ni output shows the RX-DRP counter incrementing persistently on the physical interface (eno1), while the bridge and virtual interfaces report no drops. ethtool -S eno1 shows the drops concentrated in rx_queue_2_drops, suggesting a bottleneck in CPU core affinity or interrupt handling for that receive queue.
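
The same counters can be cross-checked with iproute2 (net-tools is deprecated on recent Ubuntu releases); ip -s link reads the same kernel statistics:

ip -s link show eno1    # the RX line includes packets/bytes/errors/dropped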

The dropwatch utility pinpoints the exact kernel function where drops occur:

sudo ./dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
6 drops at ip_forward+1b5 (0xffffffff97978615)

The /proc/net/softnet_stat shows significant values in the third column (time_squeeze: the number of times net_rx_action stopped with packets still queued because the netdev_budget or its time allowance was exhausted):

008bcbf0 00000000 0000355d 00000000 00000000
004875d8 00000000 00002408 00000000 00000000
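
The values are hexadecimal and one row is printed per CPU. A minimal bash sketch to decode the first three fields (processed, dropped, time_squeeze, in the order used by the kernel's net-procfs.c):

cpu=0
while read -r p d s _; do
    echo "cpu$cpu processed=$((16#$p)) dropped=$((16#$d)) time_squeeze=$((16#$s))"
    cpu=$((cpu + 1))
done < /proc/net/softnet_stat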

First, increase the network processing budget and backlog:

# sysctl -w net.core.netdev_budget=600
# sysctl -w net.core.netdev_max_backlog=3000
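
These sysctl writes do not survive a reboot. To persist them, a drop-in file works (the file name below is arbitrary):

cat <<'EOF' | sudo tee /etc/sysctl.d/90-netdev-tuning.conf
net.core.netdev_budget = 600
net.core.netdev_max_backlog = 3000
EOF
sudo sysctl --system    # reload all sysctl configuration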

For Intel I210 NICs (igb driver) specifically, enlarge the ring buffers and relax interrupt coalescing:

# ethtool -G eno1 rx 2048 tx 2048
# ethtool -C eno1 rx-usecs 100 rx-frames 50
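
Verify that the driver accepted the new values, since requests beyond the hardware maximum are rejected or clamped:

ethtool -g eno1    # current vs. pre-set maximum ring sizes
ethtool -c eno1    # current interrupt coalescing parameters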

Check the current IRQ assignments and pin the NIC's interrupts to specific cores:

grep eno1 /proc/interrupts
echo "2-3" | sudo tee /proc/irq/24/smp_affinity_list  # example: pin IRQ 24 to cores 2-3 on a 4-core system

Alternatively, install and enable irqbalance to distribute interrupts automatically (note that it will overwrite manual smp_affinity settings):

sudo apt install irqbalance
sudo systemctl enable --now irqbalance

For systems running multiple containers, optimize bridge settings:

sudo brctl setfd br-f4e34 0                           # set bridge forward delay to 0
sudo sysctl -w net.bridge.bridge-nf-call-iptables=0   # skip iptables for bridged traffic (bypasses Docker's bridge filtering)

Capture detailed packet processing metrics:

sudo perf probe -a '__netif_receive_skb_core'
sudo perf stat -e 'probe:__netif_receive_skb_core' -a sleep 10
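
The probe stays registered until explicitly removed, so clean up when done:

sudo perf probe -d 'probe:__netif_receive_skb_core'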

Monitor softirq distribution across CPUs:

watch -n1 'cat /proc/softirqs | grep NET_RX'
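
If sysstat is installed, mpstat shows the same pressure as a percentage of CPU time spent in softirq context (the %soft column):

sudo apt install sysstat
mpstat -P ALL 1    # one report per second, per CPU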

After applying changes, verify improvements with:

watch -n1 'cat /proc/net/softnet_stat; ethtool -S eno1 | grep drops'
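
For a rough before/after comparison, sample the interface's drop counter over a fixed window (the sysfs statistics path is standard for all network devices):

before=$(cat /sys/class/net/eno1/statistics/rx_dropped)
sleep 60
after=$(cat /sys/class/net/eno1/statistics/rx_dropped)
echo "RX drops in 60s: $((after - before))"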

If the issue persists, consider reinstalling the kernel module package that ships the igb driver:

sudo apt install --reinstall linux-modules-extra-$(uname -r)
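
Confirm which driver build is actually loaded before and after the reinstall:

ethtool -i eno1            # driver name, version, firmware
modinfo igb | head -n 5    # metadata of the module on disk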

To restate the symptom in more detail: when monitoring network interfaces with netstat -ni, RX-DRP increments consistently on the physical interface (eno1) while the bridge/veth interfaces show zero drops. The drops occur even under light traffic (~2 packets/sec during SSH sessions).

# Continuous monitoring command:
watch -n 1 "netstat -ni | grep eno1"

The Intel I210 NIC (igb driver) shows queue-specific drops in queue 2 according to ethtool:

# Check NIC-specific drops:
ethtool -S eno1 | grep -E 'rx_queue.*drops'
rx_queue_2_drops: 35  # This increments over time

Key findings from hardware diagnostics (verification commands follow the list):

  • RX checksum offloading disabled (confirmed via ethtool -k)
  • Ring buffers at default 256 (max 4096)
  • No apparent hardware errors (CRC, alignment, etc.)
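
The checks behind those findings, for reproducibility (error-counter names are as exposed by the igb driver):

ethtool -k eno1 | grep checksum    # offload state
ethtool -g eno1                    # ring buffer current/max
ethtool -S eno1 | grep -i error    # CRC, alignment, and other error counters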

Using dropwatch reveals the primary drop location:

# Build and run dropwatch (requires libnl-3 and readline development headers):
git clone https://github.com/pavel-odintsov/drop_watch
cd drop_watch/src && make
sudo ./dropwatch -l kas

The output consistently points to __netif_receive_skb_core as the main drop point (an independent tracepoint-based cross-check follows the list), indicating potential issues with:

  • SoftIRQ processing capacity
  • Backlog queue limitations
  • Packet filtering at the core networking layer
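
On kernels where bpftrace is available, the skb:kfree_skb tracepoint gives an independent confirmation without building dropwatch; a minimal sketch that ranks kernel stacks by drop count:

sudo apt install bpftrace
sudo bpftrace -e 'tracepoint:skb:kfree_skb { @drops[kstack] = count(); }'
# Interrupt with Ctrl-C to print the aggregated stacks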

Implemented the following adjustments without resolving drops:

# Increased backlog and budget parameters
echo 4096 > /proc/sys/net/core/netdev_max_backlog
echo 600 > /proc/sys/net/core/netdev_budget
echo 8000 > /proc/sys/net/core/netdev_budget_usecs  # time budget; the default of 2000 would otherwise cap the larger packet budget

# Disabled various offloading features
ethtool -K eno1 gro off lro off gso off tso off

The /proc/net/softnet_stat continues showing incrementing counters in column 3 (time_squeeze), meaning net_rx_action still exits with packets left unprocessed despite the increased budgets.
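
To quantify "continues incrementing", sum column 3 across all CPUs twice and diff (same hex decoding as above, written as a bash sketch):

sum_squeeze() {
    local total=0 c
    while read -r _ _ c _; do total=$((total + 16#$c)); done < /proc/net/softnet_stat
    echo "$total"
}
a=$(sum_squeeze); sleep 10; b=$(sum_squeeze)
echo "time_squeeze events in 10s: $((b - a))"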

With multiple bridge networks and veth pairs, we examined potential namespace-related drops:

# Check interface drops across all namespaces:
for ns in $(ip netns list | awk '{print $1}'); do
    ip netns exec $ns netstat -ni
done

Key findings (the iptables counter check appears after the list):

  • Drops only occur on physical interface, not virtual interfaces
  • No correlation between container traffic and drop rate
  • iptables rules (including Docker's) don't show matching drop counters
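
The iptables verification consisted of listing rules with their packet counters and looking for nonzero DROP matches:

sudo iptables -nvL | grep -w DROP           # pkts column is the first field
sudo iptables -t nat -nvL | grep -w DROP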

Implemented kernel tracing to capture drop events:

# Trace packet drops in real time (fetching the skb->len argument requires kernel debug symbols):
sudo perf probe --add '__netif_receive_skb_core skb->len'
sudo perf record -e probe:__netif_receive_skb_core -a -g -- sleep 30
sudo perf script

After comprehensive testing, the resolution involved:

# Apply final working configuration (run as root):
# 1. Increase ring buffers
ethtool -G eno1 rx 2048 tx 2048

# 2. Adjust IRQ balancing
sudo apt install irqbalance
sudo systemctl enable --now irqbalance

# 3. CPU affinity for NIC interrupts
for irq in $(grep eno1 /proc/interrupts | awk -F: '{print $1}'); do
    echo 0-3 > /proc/irq/$irq/smp_affinity_list
done

# 4. Disable problematic power management (pin the CPU frequency governor)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

The root cause was ultimately identified as CPU contention: Docker's network stack processing (bridge forwarding, NAT, veth) was competing with the NIC's interrupt handling on the same cores.
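
One way to enforce that separation permanently is to pin the Docker daemon (and the processes it spawns) away from the cores serving NIC interrupts. This is an illustrative sketch, not part of the original fix; the core numbers assume an 8-core host where cores 0-3 were assigned to the eno1 IRQs above:

sudo systemctl edit docker
# In the drop-in editor, add:
#   [Service]
#   CPUAffinity=4-7
sudo systemctl restart docker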