Debugging short-lived TCP connections that appear in tcpdump but vanish before traditional tools like netstat or ss can capture them is particularly frustrating. These connections typically appear in TIME_WAIT state when we try to investigate, which means the process information is already gone from kernel tables.
Here are the most effective approaches I've found for tracking down these connection sources:
1. SystemTap for Real-time Monitoring
SystemTap provides kernel-level visibility into connection attempts:
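probe kernel.function("tcp_v4_connect") {
  # $uaddr is the destination sockaddr passed to connect(); its fields are in network byte order
  printf("%s [%d] connecting to %s:%d\n",
         execname(), pid(),
         ip_ntop(@cast($uaddr, "sockaddr_in")->sin_addr->s_addr),
         ntohs(@cast($uaddr, "sockaddr_in")->sin_port))
}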
2. eBPF/bcc Tools
The bcc toolkit includes powerful networking probes:
# Track TCP connections with process info
/usr/share/bcc/tools/tcpconnect
3. Audit Framework
Configure the audit subsystem to log socket connections:
auditctl -a exit,always -F arch=b64 -S connect
In the specific case mentioned (HAProxy health checks), here's how to verify:
strace -f -e trace=network -p "$(pidof haproxy)"
For containerized environments, check the network namespace:
nsenter -t "$(pidof -s haproxy)" -n netstat -antp
Remember that some monitoring methods add overhead. For production systems:
- Use sampling (e.g., SystemTap's timer probes)
- Filter by destination port
- Consider rate limiting audit events (auditctl -r sets a messages-per-second limit)
When monitoring Apache traffic with tcpdump, I observed TCP connections being established and terminated exactly every 2 seconds. The connections completed their full lifecycle (SYN→SYN-ACK→ACK→FIN) too quickly for traditional tools to capture process ownership:
# tcpdump -i any port 80
12:34:56.789 IP client.54231 > server.80: Flags [S], seq 123456
12:34:56.789 IP server.80 > client.54231: Flags [S.], seq 654321, ack 123457
12:34:56.790 IP client.54231 > server.80: Flags [.], ack 1
12:34:56.790 IP client.54231 > server.80: Flags [F.], seq 1, ack 1
12:34:56.790 IP server.80 > client.54231: Flags [.], ack 2
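To confirm that the connections really recur on a fixed schedule, the gaps between initial SYN packets can be measured from saved tcpdump text. The following is a minimal sketch, not part of the original investigation: the function name `syn_intervals` and the extra sample lines (ports 54232 and 54233) are hypothetical, and it assumes the default `HH:MM:SS.fraction` timestamp format with no wrap past midnight.

```python
import re
from datetime import datetime

# Matches the timestamp and TCP flags of a tcpdump line such as
# "12:34:56.789 IP client.54231 > server.80: Flags [S], seq 123456"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+) IP \S+ > \S+: Flags \[([^\]]+)\]")


def syn_intervals(lines):
    """Return the seconds between consecutive initial SYNs (flags exactly 'S')."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # "S." is a SYN-ACK reply, so only a bare "S" counts as a new connection.
        if m and m.group(2) == "S":
            stamps.append(datetime.strptime(m.group(1), "%H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Hypothetical capture: the first two lines follow the output above,
# the last two are invented to illustrate a repeating pattern.
sample = [
    "12:34:56.789 IP client.54231 > server.80: Flags [S], seq 123456",
    "12:34:56.789 IP server.80 > client.54231: Flags [S.], seq 654321, ack 123457",
    "12:34:58.789 IP client.54232 > server.80: Flags [S], seq 223456",
    "12:35:00.790 IP client.54233 > server.80: Flags [S], seq 323456",
]

print(syn_intervals(sample))
```

Intervals that cluster tightly around one value point to a timer-driven client such as a health checker rather than organic traffic.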
netstat -ctp shows process information only for active connections, not TIME_WAIT states. By the time you run the command, the connection has already transitioned:
# netstat -ctp
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 server:http client:54231 TIME_WAIT -
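The "-" in the PID column has a concrete cause: netstat finds a socket's owner by matching its inode against the `socket:[inode]` links under /proc/<pid>/fd, and TIME_WAIT rows in /proc/net/tcp report inode 0, so there is nothing to match. The sketch below illustrates this on IPv4 text in /proc/net/tcp format. The helper names and the sample rows are hypothetical, and the address decoding assumes a little-endian host.

```python
import socket
import struct

TCP_TIME_WAIT = "06"  # state code used in the "st" column of /proc/net/tcp


def decode_endpoint(hex_endpoint):
    """Convert '0200000A:0050' to ('10.0.0.2', 80); assumes a little-endian host."""
    addr_hex, port_hex = hex_endpoint.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def time_wait_entries(proc_net_tcp_text):
    """Return (local, remote, inode) for every TIME_WAIT row."""
    entries = []
    for row in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 10 or fields[3] != TCP_TIME_WAIT:
            continue
        # Column 9 is the socket inode; it is 0 for TIME_WAIT sockets.
        entries.append(
            (decode_endpoint(fields[1]), decode_endpoint(fields[2]), int(fields[9]))
        )
    return entries


# Hypothetical sample: one listening socket and one TIME_WAIT socket.
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 23456 1 0000000000000000 100 0 0 10 0
   1: 0200000A:0050 0100000A:D3D7 06 00000000:00000000 03:00000A5C 00000000     0        0 0 3 0000000000000000
"""

for local, remote, inode in time_wait_entries(sample):
    print(local, remote, "inode:", inode)
```

On a live system the same function could be fed the contents of /proc/net/tcp. For the tcp6 socket shown above, the equivalent data lives in /proc/net/tcp6, which uses longer address fields that this sketch does not decode.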
Use these methods to catch ephemeral connections:
1. eBPF/bcc trace
# /usr/share/bcc/tools/tcpconnect -t
TIME(s) PID COMM IP SADDR DADDR DPORT
12.345 5678 haproxy 4 10.0.0.1 10.0.0.2 80
2. SystemTap Script
probe kernel.function("tcp_v4_connect") {
  # The socket's destination fields are not filled in yet at function entry,
  # so read the destination from the sockaddr passed to connect().
  printf("%s [%d] → %s:%d\n", execname(), pid(),
         ip_ntop(@cast($uaddr, "sockaddr_in")->sin_addr->s_addr),
         ntohs(@cast($uaddr, "sockaddr_in")->sin_port))
}
3. Audit Framework
# auditctl -a exit,always -F arch=b64 -S connect -k short_tcp
# ausearch -k short_tcp -i
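Once a tracer has run for a minute or two, the repeat offender is easiest to spot by counting connections per process and destination. Here is a minimal sketch that tallies `tcpconnect -t` output of the kind shown in method 1. The function name `count_connectors` and every sample row after the first are hypothetical, and it assumes the seven-column layout above with no spaces in COMM.

```python
from collections import Counter


def count_connectors(tcpconnect_output):
    """Count connections per (COMM, DADDR, DPORT) from `tcpconnect -t` text."""
    counts = Counter()
    for line in tcpconnect_output.splitlines():
        fields = line.split()
        # Data rows have 7 columns and start with a numeric timestamp;
        # this also skips the "TIME(s) PID COMM ..." header.
        if len(fields) != 7 or not fields[0].replace(".", "", 1).isdigit():
            continue
        _time, _pid, comm, _ip, _saddr, daddr, dport = fields
        counts[(comm, daddr, int(dport))] += 1
    return counts


# Hypothetical capture: only the first data row comes from the example above.
sample = """\
TIME(s)  PID    COMM         IP SADDR            DADDR            DPORT
12.345   5678   haproxy      4  10.0.0.1         10.0.0.2         80
13.102   6001   curl         4  10.0.0.1         10.0.0.9         443
14.345   5678   haproxy      4  10.0.0.1         10.0.0.2         80
16.346   5678   haproxy      4  10.0.0.1         10.0.0.2         80
"""

for (comm, daddr, dport), n in count_connectors(sample).most_common():
    print(f"{n:3d}  {comm} -> {daddr}:{dport}")
```

A process that dominates the tally for a single destination is the first candidate to inspect.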
Once you've identified haproxy as the culprit, verify its configuration:
backend web
option httpchk
server s1 10.0.0.2:80 check inter 2s
The inter 2s parameter explains the 2-second interval observed in tcpdump.
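For production environments, deploy this bpftrace script to log short-lived outbound connections. struct sock does not record when a connection started, so the script stores the start time itself when tcp_v4_connect is called:
#!/usr/bin/bpftrace
// Remember when each outbound connection starts, keyed by socket pointer.
kprobe:tcp_v4_connect
{
    @start[arg0] = nsecs;
}

// On close, report sockets that lived for less than 5 seconds.
kprobe:tcp_close
/@start[arg0]/
{
    $duration = nsecs - @start[arg0];
    if ($duration < 5000000000) { // 5 seconds
        time("%H:%M:%S ");
        printf("Short connection (%d ms): %s pid=%d\n",
               $duration / 1000000, comm, pid);
    }
    delete(@start[arg0]);
}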