When examining the netstat -ni output, we see persistent RX-DRP counters incrementing on the physical interface (eno1) while the bridge and virtual interfaces show no drops. ethtool -S eno1 reveals queue-specific drops in rx_queue_2_drops, suggesting a potential bottleneck in CPU core affinity or interrupt handling.
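To see how many RX queues the NIC exposes and how receive flows are spread across them, ethtool's channel and RSS views can help. This is a quick check, assuming the igb driver supports these queries on this NIC:
# Show configured RX/TX queue counts:
ethtool -l eno1
# Show the RSS indirection table (flow-to-queue spread):
ethtool -x eno1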
The dropwatch utility pinpoints the kernel functions where the drops occur:
sudo ./dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
6 drops at ip_forward+1b5 (0xffffffff97978615)
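As a sanity check that the reported addresses really fall inside those functions, the symbols can be cross-referenced against /proc/kallsyms (no extra tooling assumed; root is needed so the addresses are not masked):
# Confirm the drop addresses fall inside the reported symbols:
sudo grep -wE '__netif_receive_skb_core|ip_forward' /proc/kallsyms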
The /proc/net/softnet_stat output shows significant values in the third column (time_squeeze, the count of times the NET_RX SoftIRQ ran out of budget with work still pending):
008bcbf0 00000000 0000355d 00000000 00000000
004875d8 00000000 00002408 00000000 00000000
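Because the file is raw per-CPU hex counters with no headers, a small one-liner (assuming GNU awk for strtonum) makes the processed, dropped, and squeezed columns readable per CPU:
# Decode per-CPU counters (columns: processed, dropped, squeezed):
awk '{printf "CPU%-2d processed=%d dropped=%d squeezed=%d\n", NR-1, strtonum("0x"$1), strtonum("0x"$2), strtonum("0x"$3)}' /proc/net/softnet_stat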
First, increase network processing budget and backlog:
# sysctl -w net.core.netdev_budget=600
# sysctl -w net.core.netdev_max_backlog=3000
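These sysctl writes do not survive a reboot; to persist them, a drop-in file can be added under /etc/sysctl.d (the filename below is only an example):
# printf 'net.core.netdev_budget = 600\nnet.core.netdev_max_backlog = 3000\n' > /etc/sysctl.d/90-netdev-tuning.conf
# sysctl --system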
For Intel I210 NICs specifically, adjust ring buffer sizes and interrupt coalescing:
# ethtool -G eno1 rx 2048 tx 2048
# ethtool -C eno1 rx-usecs 100 rx-frames 50
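The resulting settings can be read back afterwards; the lower-case flags report what the upper-case flags set:
# ethtool -g eno1
# ethtool -c eno1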
Check current IRQ assignments:
cat /proc/interrupts | grep eno1
echo "2-3" > /proc/irq/24/smp_affinity_list # Example for 4-core system
Install and configure irqbalance:
sudo apt install irqbalance
sudo systemctl enable --now irqbalance
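A quick way to confirm the daemon is active and that eno1's interrupts are actually being spread across cores:
systemctl status irqbalance --no-pager
grep eno1 /proc/interrupts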
For systems running multiple containers, optimize bridge settings:
sudo brctl setfd br-f4e34 0
sudo sysctl -w net.bridge.bridge-nf-call-iptables=0
Capture detailed packet processing metrics:
sudo perf probe -a '__netif_receive_skb_core'
sudo perf stat -e 'probe:__netif_receive_skb_core' -a sleep 10
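The dynamic probe stays registered after the measurement; it can be listed and removed once done:
sudo perf probe --list
sudo perf probe --del 'probe:__netif_receive_skb_core'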
Monitor softirq distribution across CPUs:
watch -n1 'cat /proc/softirqs | grep NET_RX'
After applying changes, verify improvements with:
watch -n1 'cat /proc/net/softnet_stat; ethtool -S eno1 | grep drops'
If the issue persists, consider reinstalling the kernel's module packages (or updating the igb driver):
sudo apt install --reinstall linux-modules-extra-$(uname -r)
When monitoring network interfaces with netstat -ni, we see consistent RX-DRP increments on the physical interface (eno1) while the bridge and veth interfaces show zero drops. The drops occur even under light traffic conditions (~2 packets/sec during SSH sessions).
# Continuous monitoring command:
watch -n 1 "netstat -ni | grep eno1"
The Intel I210 NIC (igb driver) shows queue-specific drops in queue 2 according to ethtool:
# Check NIC-specific drops:
ethtool -S eno1 | grep -E 'rx_queue.*drops'
rx_queue_2_drops: 35 # This increments over time
Key findings from hardware diagnostics (the checks are sketched below):
- RX checksum offloading disabled (confirmed via ethtool -k)
- Ring buffers at the default 256 (maximum 4096)
- No apparent hardware errors (CRC, alignment, etc.)
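For reference, checks of this kind produce those findings (eno1 assumed as the interface name throughout):
# Offload status, including rx-checksumming:
ethtool -k eno1
# Current vs. maximum ring buffer sizes:
ethtool -g eno1
# Low-level error counters:
ethtool -S eno1 | grep -iE 'err|crc|align|missed'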
Using dropwatch reveals the primary drop location:
# Build and run dropwatch:
git clone https://github.com/pavel-odintsov/drop_watch
cd drop_watch && make
sudo ./dropwatch -l kas
The output consistently points to __netif_receive_skb_core as the main drop point, indicating potential issues with:
- SoftIRQ processing capacity
- Backlog queue limitations
- Packet filtering at the core networking layer
We implemented the following adjustments, but they did not resolve the drops:
# Increased backlog and budget parameters
echo 4096 > /proc/sys/net/core/netdev_max_backlog
echo 600 > /proc/sys/net/core/netdev_budget
echo 600 > /proc/sys/net/core/netdev_budget_usecs
# Disabled various offloading features
ethtool -K eno1 gro off lro off gso off tso off
The third column of /proc/net/softnet_stat (time_squeeze) continues to increment, suggesting packets are still left unprocessed in each SoftIRQ pass despite the increased budgets.
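To rule out the changes simply not taking effect, the live values can be read back before drawing conclusions:
# Read back the tuned sysctls:
sysctl net.core.netdev_max_backlog net.core.netdev_budget net.core.netdev_budget_usecs
# Confirm the offloads are actually off:
ethtool -k eno1 | grep -E 'generic-receive-offload|generic-segmentation-offload|tcp-segmentation-offload|large-receive-offload'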
With multiple bridge networks and veth pairs, we examined potential namespace-related drops:
# Check interface drops across all namespaces:
for ns in $(ip netns list | awk '{print $1}'); do
ip netns exec $ns netstat -ni
done
Key findings:
- Drops only occur on physical interface, not virtual interfaces
- No correlation between container traffic and drop rate
- iptables rules (including Docker's) don't show matching drop counters
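The iptables observation can be reproduced with a spot check of the per-rule packet counters (Docker's chains live in the same filter and nat tables):
# Per-rule packet counters for DROP/REJECT targets (filter table):
sudo iptables -L -v -n --line-numbers | grep -iE 'drop|reject'
# Optionally zero the counters to make new increments obvious:
sudo iptables -Z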
Implemented kernel tracing to capture drop events:
# Trace packet drops in real-time:
sudo perf probe --add '__netif_receive_skb_core skb->len'
sudo perf record -e probe:__netif_receive_skb_core -a -g -- sleep 30
sudo perf script
After comprehensive testing, the resolution involved:
# Apply final working configuration:
# 1. Increase ring buffers
ethtool -G eno1 rx 2048 tx 2048
# 2. Adjust IRQ balancing
sudo apt install irqbalance
sudo systemctl enable --now irqbalance
# 3. CPU affinity for NIC interrupts
for irq in $(grep eno1 /proc/interrupts | awk -F: '{print $1}'); do
echo 0-3 > /proc/irq/$irq/smp_affinity_list
done
# 4. Disable problematic power management
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
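A quick verification pass after applying the configuration reads back each setting:
# Read back ring sizes, IRQ affinity, and CPU governor:
ethtool -g eno1
for irq in $(grep eno1 /proc/interrupts | awk -F: '{print $1}'); do
  cat /proc/irq/$irq/smp_affinity_list
done
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor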
The root cause was ultimately identified as CPU contention between Docker's network stack processing and the NIC's interrupt handling on the same cores.
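If the contention returns, one way to act on that finding is to keep container workloads off the cores that service the NIC interrupts. The split below is a hypothetical example for a 4-core host, not part of the applied fix:
# Keep NIC interrupts on cores 0-1 (example values):
for irq in $(grep eno1 /proc/interrupts | awk -F: '{print $1}'); do
  echo 0-1 > /proc/irq/$irq/smp_affinity_list
done
# ...and constrain container workloads to cores 2-3 (standard Docker flag):
docker run --cpuset-cpus="2,3" --rm -it ubuntu:22.04 bash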