Passive TCP Packet Loss Monitoring in Linux: Analyzing Retransmissions and ACKs for Network Diagnostics


The Linux kernel exposes TCP connection statistics through several interfaces that allow passive monitoring without injecting test traffic:

# Key files containing TCP metrics
/proc/net/netstat (TcpExt counters such as TCPLostRetransmit, TCPFastRetrans, TCPTimeouts)
/proc/net/snmp (Overall TCP statistics, including RetransSegs)
/sys/kernel/debug/tracing/events/tcp/tcp_retransmit_skb (for tracing)
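
Both /proc files share one layout: a line of field names followed by a line of values, each prefixed with the section name. A minimal Python sketch of a name-based parser, assuming that paired layout. The function name parse_proc_net and the sample values are illustrative and not taken from any library:

```python
def parse_proc_net(text):
    """Parse /proc/net/snmp or /proc/net/netstat text into {section: {field: int}}.

    Assumes lines come in header/value pairs that share a 'Section:' prefix.
    """
    stats = {}
    lines = [line.split() for line in text.splitlines() if line.strip()]
    # Walk the lines two at a time: field names first, then values.
    for names, values in zip(lines[0::2], lines[1::2]):
        if names[0] != values[0]:
            continue  # not a matching header/value pair; skip defensively
        section = names[0].rstrip(":")
        stats.setdefault(section, {}).update(
            (name, int(value)) for name, value in zip(names[1:], values[1:])
        )
    return stats


# Made-up sample in the same layout as the Tcp: lines of /proc/net/snmp.
SAMPLE = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 120 80 3 5 14 90000 85000 170 0 42 0
"""

tcp = parse_proc_net(SAMPLE)["Tcp"]
print(tcp["RetransSegs"], tcp["CurrEstab"])
```

On a live system the same function can be fed the file contents, for example parse_proc_net(open("/proc/net/snmp").read()). Looking fields up by name avoids depending on column positions.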

Here are the most effective tools for passive TCP loss analysis:

1. ss Command with Extended Stats

ss -e -i -p -m -t
# Output includes:
#   retrans: Retransmitted segments, shown as <currently unacked>/<total>
#   bytes_retrans: Bytes retransmitted
#   rto: Retransmit timeout value (milliseconds)
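
ss prints each socket on two lines: an address line, then an indented line of TCP details. A small Python sketch that pairs them up and extracts the total retransmit count, assuming that two-line layout and the retrans:<current>/<total> field format. The function name parse_ss_retrans and the sample text are illustrative:

```python
import re

# \b keeps this from matching inside "bytes_retrans:".
RETRANS_RE = re.compile(r"\bretrans:(\d+)/(\d+)")


def parse_ss_retrans(text):
    """Return [(local, peer, total_retrans)] for sockets that report retrans."""
    results = []
    current = None  # (local, peer) taken from the most recent address line
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Address line: State Recv-Q Send-Q Local:Port Peer:Port ...
            fields = line.split()
            current = (fields[3], fields[4]) if len(fields) >= 5 else None
            continue
        match = RETRANS_RE.search(line)
        if match and current:
            results.append((current[0], current[1], int(match.group(2))))
    return results


# Illustrative text shaped like `ss -ti` output; values are made up.
SAMPLE = (
    "State Recv-Q Send-Q Local Address:Port Peer Address:Port\n"
    "ESTAB 0 0 192.168.1.2:443 10.0.0.5:53218\n"
    "\t cubic wscale:7,7 rto:204 rtt:3.5/1.2 mss:1448 cwnd:10 "
    "bytes_retrans:2896 retrans:0/2 rcv_space:14480\n"
    "ESTAB 0 0 192.168.1.2:22 10.0.0.9:40112\n"
    "\t cubic wscale:7,7 rto:201 rtt:0.4/0.1 mss:1448 cwnd:10\n"
)

rows = parse_ss_retrans(SAMPLE)
print(rows)
```

Sockets that have never retransmitted usually omit the retrans field, so they simply do not appear in the result.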

2. tcpretrans from BCC Tools

This BPF-based tool traces TCP retransmissions in real-time:

sudo tcpretrans
# Sample output:
# TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
# 10:01:23 12345  4  192.168.1.2:443      R> 10.0.0.5:53218       ESTABLISHED
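
Individual retransmit events are easier to act on once they are grouped by remote host. A minimal Python sketch that tallies events from captured tool output, assuming the seven-column layout shown in the sample above. The function name count_by_remote and the sample lines are illustrative:

```python
from collections import Counter


def count_by_remote(lines):
    """Count retransmit events per remote host from tcpretrans-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Expected columns: TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
        if len(fields) < 7 or fields[0] == "TIME" or not fields[4].endswith(">"):
            continue  # header, blank, or unrecognized line
        remote_host = fields[5].rsplit(":", 1)[0]  # drop the port
        counts[remote_host] += 1
    return counts


# Made-up lines in the layout shown above.
SAMPLE_LINES = [
    "TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE",
    "10:01:23 12345  4  192.168.1.2:443      R> 10.0.0.5:53218       ESTABLISHED",
    "10:01:24 12345  4  192.168.1.2:443      R> 10.0.0.5:53218       ESTABLISHED",
    "10:01:29 0      4  192.168.1.2:443      R> 10.0.0.7:40110       ESTABLISHED",
]

by_remote = count_by_remote(SAMPLE_LINES)
print(by_remote.most_common(1))
```

The tool output can be redirected to a file and fed to the function line by line.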

3. Custom eBPF Monitoring

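For advanced users, this BCC (eBPF) program counts retransmissions per process. Attach it as a kprobe on tcp_retransmit_skb:

#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>

BPF_HASH(packet_loss, u32);  // key: PID, value: retransmit count (u64 by default)

int trace_tcp_retransmit(struct pt_regs *ctx, struct sock *sk) {
    // Upper 32 bits hold the process ID (tgid); the lower 32 bits are the thread ID.
    // Retransmits often fire from timer/softirq context, so the PID may not be the socket owner.
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    packet_loss.increment(pid);
    return 0;
}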

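System-wide counters give a baseline before drilling into individual connections. Look fields up by name rather than by column number, which is easy to miscount:

awk '/^Tcp:/ && !n {for (i = 1; i <= NF; i++) col[$i] = i; n = 1; next}
     /^Tcp:/ {print "TCP segments retransmitted:", $(col["RetransSegs"])
              print "Established connections:", $(col["CurrEstab"])}' /proc/net/snmp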
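
Raw counters only grow, so a useful loss signal comes from the change between two readings. A short Python sketch of that calculation, assuming two snapshots of the Tcp: counters from /proc/net/snmp are already available as dicts. The function name retransmit_ratio and the numbers are illustrative:

```python
def retransmit_ratio(prev, curr):
    """Return retransmitted segments per sent segment between two snapshots.

    Each snapshot is a dict with at least 'OutSegs' and 'RetransSegs'.
    Counter wraparound is not handled in this sketch.
    """
    sent = curr["OutSegs"] - prev["OutSegs"]
    retrans = curr["RetransSegs"] - prev["RetransSegs"]
    if sent <= 0:
        return 0.0  # nothing sent in the interval, so no meaningful ratio
    return retrans / sent


# Made-up snapshots taken some seconds apart.
before = {"OutSegs": 85000, "RetransSegs": 170}
after = {"OutSegs": 95000, "RetransSegs": 270}

ratio = retransmit_ratio(before, after)
print(f"{ratio:.2%}")
```

OutSegs excludes segments that contain only retransmitted data, so the ratio approximates retransmissions per originally sent segment.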

For ongoing monitoring, consider these visualization tools:

  • Prometheus + Grafana (using node_exporter TCP metrics)
  • Elastic Stack with Packetbeat
  • Custom dashboards using ss/tcpretrans output

This script identifies top talkers with packet loss:

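#!/bin/bash
# ss -ti prints two lines per socket: addresses first, then TCP details.
# Remember the peer from the address line and print it with the total
# retransmit count (the second number in retrans:<current>/<total>).
watch -n 5 "ss -ti | \
awk '/^[A-Z]/ {peer = \$5; next} \
match(\$0, /retrans:[0-9]+\/[0-9]+/) { \
split(substr(\$0, RSTART, RLENGTH), r, \"/\"); print peer, r[2]}' | \
sort -k2 -nr | head"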

When troubleshooting network performance issues, TCP retransmissions serve as critical indicators of packet loss. The Linux kernel maintains extensive statistics about TCP connections through several interfaces:

# Quick check for TCP retransmits
# nstat pairs each counter name with its value; -a shows absolute values, -z includes zeros
nstat -az | grep -i retrans
ss -ti | grep -i retrans

For comprehensive passive monitoring, consider these approaches:

1. ss Command with Watch

# -B1 also prints the preceding address line, so each match shows which connection it belongs to
watch -n 1 "ss -tinp | grep -B1 'retrans:'"

2. tcpretrans from perf-tools

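Brendan Gregg's perf-tools includes an ftrace-based tcpretrans that needs no BPF support. The -l option also reports tail loss probe attempts. The output below is illustrative, and the exact columns depend on the tool version: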

./tcpretrans -l
Tracing TCP retransmits... Ctrl-C to end.
PID    COMM         LADDR           LPORT RADDR           RPORT STATE
1234   nginx        192.168.1.2     80    203.0.113.45    54231 ESTABLISHED

3. eBPF-based Monitoring

For advanced users, eBPF provides real-time insights:

# Using bpftrace
bpftrace -e 'tracepoint:tcp:tcp_retransmit_skb {
    printf("TCP retransmit %s:%d -> %s:%d\n", ntop(args->saddr), 
           args->sport, ntop(args->daddr), args->dport);
}'

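Where kernel tracing is unavailable, a capture-based Python script using scapy can approximate retransmission counts by flagging data segments whose sequence number was already seen on the same flow:

from scapy.all import IP, TCP, sniff
from collections import defaultdict

retrans_counts = defaultdict(int)
seen_segments = set()  # (src, sport, dst, dport, seq); grows without bound in this sketch

def packet_callback(pkt):
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
        return
    ip, tcp = pkt[IP], pkt[TCP]
    # Payload length from the headers, so Ethernet padding is not counted as data
    payload_len = ip.len - ip.ihl * 4 - tcp.dataofs * 4
    if payload_len <= 0:  # pure ACKs carry no data to retransmit
        return
    key = (ip.src, tcp.sport, ip.dst, tcp.dport, tcp.seq)
    if key in seen_segments:  # same sequence number again: likely a retransmission
        retrans_counts[(ip.src, ip.dst)] += 1
    seen_segments.add(key)

sniff(filter="tcp", prn=packet_callback, store=0)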
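
A capture-based monitor has to decide what counts as a retransmission and how much state to keep. One common heuristic treats a data segment as a retransmission when its sequence number was already seen on the same flow. The Python sketch below isolates that logic from packet capture and bounds memory by evicting the oldest entries. The class name RetransmitDetector and the max_entries default are illustrative assumptions, and the heuristic ignores sequence wraparound and partially overlapping segments:

```python
from collections import OrderedDict, defaultdict


class RetransmitDetector:
    """Flag data segments whose (flow, seq) pair has been seen before."""

    def __init__(self, max_entries=100_000):
        self.max_entries = max_entries
        self.seen = OrderedDict()       # (src, sport, dst, dport, seq) -> True
        self.counts = defaultdict(int)  # (src, dst) -> suspected retransmits

    def observe(self, src, sport, dst, dport, seq, payload_len):
        """Record one segment; return True if it looks like a retransmission."""
        if payload_len == 0:
            return False  # pure ACKs repeat sequence numbers legitimately
        key = (src, sport, dst, dport, seq)
        if key in self.seen:
            self.seen.move_to_end(key)  # keep recently active entries longest
            self.counts[(src, dst)] += 1
            return True
        self.seen[key] = True
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # evict the oldest entry
        return False


detector = RetransmitDetector(max_entries=2)
first = detector.observe("10.0.0.1", 443, "10.0.0.2", 5000, 1000, 1448)
repeat = detector.observe("10.0.0.1", 443, "10.0.0.2", 5000, 1000, 1448)
ack = detector.observe("10.0.0.2", 5000, "10.0.0.1", 443, 77, 0)
print(first, repeat, ack, dict(detector.counts))
```

In a capture callback, each TCP packet's addresses, ports, sequence number, and payload length would be passed to observe().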

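For long-term analysis, export data to Prometheus/Grafana. Per-address labels can create many time series, so limit them on busy hosts:

# Example metric export
import prometheus_client
RETRANS_COUNTER = prometheus_client.Counter(
    'tcp_retransmits_total',
    'TCP retransmissions by source and destination IP',
    ['source_ip', 'dest_ip']
)
# Serve /metrics for Prometheus to scrape; port 8000 is an arbitrary example
prometheus_client.start_http_server(8000)

# In packet_callback, at the point where a retransmission is detected:
RETRANS_COUNTER.labels(pkt[IP].src, pkt[IP].dst).inc()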

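These sysctls do not change how retransmissions are measured. They change TCP behavior for every connection on the host, which can make failures surface sooner during diagnosis, so apply them deliberately:

# Give up on an unresponsive peer after 5 retransmission timeouts (default 15)
sysctl -w net.ipv4.tcp_retries2=5
# Keep the congestion window after idle periods instead of restarting slow start
sysctl -w net.ipv4.tcp_slow_start_after_idle=0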