Optimizing High-Volume TCP Packet Capture: Solving Dropped Packets on Busy Interfaces



When capturing network traffic at scale using tcpdump on high-throughput interfaces, packet drops become inevitable without proper tuning. The kernel-reported drops ("packets dropped by kernel") indicate a bottleneck in the packet capture pipeline that needs addressing.
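
To turn the kernel-reported figure into something comparable between runs, the counters tcpdump prints on stderr at exit can be parsed. The following is a minimal sketch, assuming the usual three-line summary ("packets captured", "packets received by filter", "packets dropped by kernel"); the function names are illustrative, and the ratio is only approximate because the meaning of "received by filter" differs between platforms.

```python
import re

# Matches the counter lines tcpdump writes to stderr when it exits.
_SUMMARY_RE = re.compile(
    r"^(\d+) packets? (captured|received by filter|dropped by kernel)\s*$",
    re.MULTILINE,
)


def parse_tcpdump_summary(stderr_text):
    """Return the exit counters found in tcpdump's stderr as a dict of ints."""
    counters = {}
    for count, label in _SUMMARY_RE.findall(stderr_text):
        counters[label] = int(count)
    return counters


def drop_ratio(counters):
    """Dropped packets divided by packets received by filter, or None if unknown."""
    received = counters.get("received by filter")
    dropped = counters.get("dropped by kernel")
    if not received or dropped is None:
        return None
    return dropped / received
```

Feeding it the stderr of a finished capture (for example, redirected with 2> to a log file) gives a per-run drop ratio that can be tracked while tuning buffers.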

Common bottlenecks include:

1. Kernel ring buffer overflow
2. Userspace processing latency
3. Storage I/O bottlenecks
4. CPU contention during compression

In our testing, increasing rmem_max and rmem_default reduced drops by about 50%, which suggests that receive buffer sizing is part of the solution, though not all of it:

# Current settings (check first)
sysctl net.core.rmem_max
sysctl net.core.rmem_default

# Temporary increase (example values)
echo 16777216 > /proc/sys/net/core/rmem_max
echo 4194304 > /proc/sys/net/core/rmem_default

For multi-interface capture with rotation, consider this optimized approach:

tcpdump -n -C 1000 -W 10000 -z ./compress.sh \
  -i any -G 3600 -w '/data/cap_%Y%m%d_%H%M%S.pcap' \
  -B 4096 -s 96 \
  "(net 192.168.1.0/24 or net 10.0.0.0/8)"

Key parameters explained:

  • -B 4096: Sets a 4 MiB capture buffer (the unit is KiB; -B 4096000 would request about 4 GB)
  • -s 96: Snap length that keeps the headers and truncates the payload
  • -C 1000: Starts a new file after roughly 1 GB (the unit is 1,000,000 bytes)
  • -G 3600: Hourly rotation independent of file size
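
The units matter when sizing these options: tcpdump documents -B in KiB and -C in millions of bytes. The arithmetic can be sketched as follows; the function names and the 10 Gbit/s link rate are assumptions for illustration, and the headroom figure is a rough lower bound because a small snap length stores fewer bytes per packet than arrive on the wire.

```python
def buffer_headroom_seconds(buffer_kib, link_bits_per_sec):
    """Seconds of line-rate traffic a capture buffer of buffer_kib KiB can absorb."""
    buffer_bytes = buffer_kib * 1024
    return buffer_bytes / (link_bits_per_sec / 8)


def rotation_disk_bytes(file_size_millions, file_count):
    """Upper bound on disk used by a -C/-W file ring (-C counts 1,000,000-byte units)."""
    return file_size_millions * 1_000_000 * file_count


# -B 4096 on a hypothetical 10 Gbit/s link: only a few milliseconds of headroom
headroom = buffer_headroom_seconds(4096, 10e9)

# -C 1000 with -W 10000: up to 10 TB of capture files
disk = rotation_disk_bytes(1000, 10000)
```

A buffer that covers only a few milliseconds at line rate means any stall in the writer (rotation, compression, slow disk) longer than that will show up as drops, which is why buffer size and storage throughput need to be tuned together.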

For sustained high-volume capture, consider these sysctl tweaks:

# Increase socket buffer sizes (add to /etc/sysctl.conf, then apply with: sysctl -p)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 50000

# NIC specific tuning (replace ethX with your interface)
ethtool -G ethX rx 4096 tx 4096
ethtool -C ethX rx-usecs 30 rx-frames 0

When tcpdump can't keep up:

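# Using dumpcap (Wireshark's capture engine)
# -f placed before the first -i applies to every interface;
# -b filesize is in kB, so 1000000 is about 1 GB per file
dumpcap -f "not port 22" -i eth0 -i eth1 \
  -b filesize:1000000 -b files:10000 \
  -w /data/capture.pcap

# Using PF_RING (kernel bypass)
# pfcount is PF_RING's packet-counting demo, useful for checking whether
# PF_RING keeps up on an interface; -h only prints its help text
pfcount -i eth0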

Implement this housekeeping script (cleanup.sh) to maintain disk space:

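#!/bin/bash
# Delete the oldest capture files until disk usage and free space are within limits.
TARGET_DIR="/data/captures"
MAX_USAGE=95       # percent used
MIN_FREE_GB=100    # GiB that must stay free

over_limit() {
  local usage free
  # GNU df: --output selects a single column, -BG reports whole GiB
  usage=$(df --output=pcent "$TARGET_DIR" | tail -1 | tr -dc '0-9')
  free=$(df -BG --output=avail "$TARGET_DIR" | tail -1 | tr -dc '0-9')
  # Numeric comparison; comparing "100G"-style strings with < sorts them as text
  (( usage >= MAX_USAGE || free < MIN_FREE_GB ))
}

while true; do
  while over_limit; do
    # Oldest file by modification time
    oldest=$(ls -tr "$TARGET_DIR"/*.pcap* 2>/dev/null | head -1)
    [[ -n "$oldest" ]] || break
    rm -f -- "$oldest"
  done
  sleep 300
done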

Create a monitoring solution with this Python script:

import time
from datetime import datetime

import psutil

def check_drops(interface):
    stats = psutil.net_io_counters(pernic=True).get(interface, None)
    if stats:
        return stats.dropin, stats.dropout
    return 0, 0

while True:
    ts = datetime.now().isoformat()
    eth0_in, eth0_out = check_drops('eth0')
    eth1_in, eth1_out = check_drops('eth1')
    print(f"{ts} - eth0 drops: in={eth0_in} out={eth0_out} | eth1 drops: in={eth1_in} out={eth1_out}")
    time.sleep(5)
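
The counters printed by this loop are cumulative since boot, so the interesting signal is how much they grow between polls. A minimal sketch of that delta step, assuming each snapshot is a plain dict that maps interface name to an (in, out) tuple as returned by check_drops; the drop_deltas name is illustrative:

```python
def drop_deltas(previous, current):
    """Per-interface increase in (dropin, dropout) between two counter snapshots."""
    deltas = {}
    for nic, (cur_in, cur_out) in current.items():
        # An interface seen for the first time has no baseline, so it reports zero.
        prev_in, prev_out = previous.get(nic, (cur_in, cur_out))
        # Counters can reset (driver reload, reboot); clamp so a reset reads as zero.
        deltas[nic] = (max(cur_in - prev_in, 0), max(cur_out - prev_out, 0))
    return deltas
```

In the polling loop, keep the previous snapshot, call drop_deltas(previous, current) every interval, and alert on the delta rather than on the running total.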

When monitoring multiple promiscuous interfaces handling substantial traffic, packet drops become inevitable with standard tcpdump configurations. The root causes typically involve:

  • Kernel buffer limitations
  • I/O bottlenecks during file rotation
  • CPU contention during compression
  • Inefficient interface handling with -i any

First, maximize kernel network buffers:

# Set ring buffer sizes
sudo ethtool -G ethX rx 4096 tx 4096
sudo ethtool -G ethY rx 4096 tx 4096

# Increase kernel buffer limits
echo 4194304 | sudo tee /proc/sys/net/core/rmem_max
echo 4194304 | sudo tee /proc/sys/net/core/rmem_default
echo 4194304 | sudo tee /proc/sys/net/core/wmem_max

For multi-interface capture with minimal drops:

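# tcpdump's -i takes a single interface: run one instance per interface
# (repeat with ethY). -G is required for the strftime fields in -w to expand.
tcpdump -n \
  -C 1000 \
  -W 10000 \
  -G 3600 \
  -s 0 \
  -B 4096 \
  -z /opt/scripts/compress_and_archive.sh \
  -i ethX \
  -w /data/capture_ethX_%Y%m%d_%H%M%S.pcap \
  "not (port 22 or port 53)"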

Option 1: PF_RING
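Rebuild tcpdump with PF_RING support. The tcpdump directory name varies between PF_RING releases, and the PF_RING kernel module and userland libraries need to be built and loaded first: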

git clone https://github.com/ntop/PF_RING.git
cd PF_RING/userland/tcpdump-4.9.1/
./configure --prefix=/usr/local/pfring
make
sudo make install

Option 2: Multi-process Capture
Use GNU parallel for interface-specific capture:

# -G is needed so tcpdump expands %F in the file name (here: daily rotation)
parallel -j2 'tcpdump -ni {} -s0 -C1000 -G86400 -z /opt/scripts/compress.sh \
  -w /data/capture_{}_%F.pcap \
  "not net 192.168.0.0/16"' ::: ethX ethY
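
Where GNU parallel is not available, the same one-process-per-interface layout can be sketched with Python's subprocess module. This is an illustration under assumptions, not a drop-in replacement: the function names, the /data output directory, and the file naming are hypothetical, and only tcpdump flags already used above appear in the argument list.

```python
import subprocess


def build_tcpdump_argv(interface, out_dir="/data",
                       bpf_filter="not net 192.168.0.0/16"):
    """Build the argument list for one tcpdump process bound to one interface."""
    return [
        "tcpdump", "-n",
        "-i", interface,            # one interface per process
        "-s", "0",                  # full packets, as in the parallel example
        "-C", "1000",               # rotate at roughly 1 GB
        "-w", f"{out_dir}/capture_{interface}.pcap",
        bpf_filter,
    ]


def start_captures(interfaces):
    """Start one tcpdump child per interface and return the Popen handles."""
    return [subprocess.Popen(build_tcpdump_argv(nic)) for nic in interfaces]


# Example (needs root and real interface names):
#   procs = start_captures(["ethX", "ethY"])
#   for p in procs:
#       p.wait()
```

Passing the arguments as a list avoids shell quoting problems with the filter expression, and keeping a handle per interface makes it possible to restart one capture without touching the other.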

Implement a rotating archive system:

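#!/bin/bash
# compress_and_archive.sh: called by tcpdump -z with the finished capture file as $1
set -euo pipefail
TARGET="/archive/$(date +%Y/%m/%d)"
mkdir -p "$TARGET"
# Run at low CPU priority so compression does not starve the capture process
nice -n 19 xz -T0 -9 -c "$1" > "$TARGET/$(basename "$1").xz"
# Remove the uncompressed original only after xz succeeded (set -e aborts otherwise)
rm -f -- "$1"
# Expire archives older than 30 days
find /archive -type f -mtime +30 -delete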

Track drops with this monitoring script:

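#!/bin/bash
# Alert when interface-level RX drops grow by more than THRESHOLD per interval.
# tcpdump's "packets dropped by kernel" line goes to tcpdump's own stderr and never
# appears in /proc/net/dev, so this sums the receive "drop" column instead.
THRESHOLD=1000
rx_drops() { awk 'NR>2 {sub(/:/," "); sum+=$5} END{print sum+0}' /proc/net/dev; }
prev=$(rx_drops)
while true; do
  sleep 60
  now=$(rx_drops)
  delta=$((now - prev)); prev=$now
  [ "$delta" -gt "$THRESHOLD" ] && \
    echo "Warning: $delta packets dropped in the last 60s" | \
    mail -s "Packet Drop Alert" admin@example.com
done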