How to Identify Processes Behind Short-Lived TCP Connections on Linux Servers


Short-lived TCP connections that show up in tcpdump but vanish before traditional tools like netstat or ss can capture them are particularly frustrating to debug. By the time we investigate, they are typically in TIME_WAIT, a state in which the socket no longer belongs to any process, so the owner information is already gone.

Here are the most effective approaches I've found for tracking down these connection sources:

1. SystemTap for Real-time Monitoring

SystemTap provides kernel-level visibility into connection attempts:


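probe kernel.function("tcp_v4_connect") {
    # The destination is in the sockaddr_in passed as $uaddr (network byte order)
    addr = @cast($uaddr, "struct sockaddr_in", "kernel<linux/in.h>")->sin_addr->s_addr
    port = @cast($uaddr, "struct sockaddr_in", "kernel<linux/in.h>")->sin_port
    printf("%s [%d] connecting to %s:%d\n",
        execname(), pid(), ip_ntop(addr), ntohs(port))
}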

2. eBPF/bcc Tools

The bcc toolkit includes powerful networking probes:


# Track TCP connections with process info
/usr/share/bcc/tools/tcpconnect

3. Audit Framework

Configure the audit subsystem to log socket connections:


auditctl -a exit,always -F arch=b64 -S connect

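If you already suspect a specific process (HAProxy health checks, in the case described below), attach strace to confirm. The quotes let strace accept every PID that pgrep returns:


strace -f -e trace=network -p "$(pgrep haproxy)"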

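For containerized environments, run the check inside the process's network namespace. nsenter takes a single target PID, so pick one (here the oldest haproxy process):


nsenter -t "$(pgrep -o haproxy)" -n netstat -antp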

Remember that some monitoring methods add overhead. For production systems:

  • Use sampling (e.g., SystemTap's timer probes)
  • Filter by destination port
  • Consider rate limiting, for example capping audit messages per second with auditctl -r

When monitoring Apache traffic with tcpdump, I observed TCP connections being established and terminated exactly every 2 seconds. The connections completed their full lifecycle (SYN→SYN-ACK→ACK→FIN) too quickly for traditional tools to capture process ownership:

# tcpdump -i any port 80
12:34:56.789 IP client.54231 > server.80: Flags [S], seq 123456
12:34:56.789 IP server.80 > client.54231: Flags [S.], seq 654321, ack 123457
12:34:56.790 IP client.54231 > server.80: Flags [.], ack 1
12:34:56.790 IP client.54231 > server.80: Flags [F.], seq 1, ack 1
12:34:56.790 IP server.80 > client.54231: Flags [.], ack 2
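
To confirm the interval from a capture rather than by eye, the gaps between successive SYN packets can be computed directly. This is a minimal sketch: syn_intervals is a hypothetical helper name, and it assumes tcpdump's default HH:MM:SS.fraction timestamps and a capture that does not cross midnight.

```shell
# syn_intervals (hypothetical helper): read tcpdump text output on stdin and
# print the number of seconds between each initial SYN and the previous one.
# Only "Flags [S]," lines match, so SYN-ACK replies ("Flags [S.],") are ignored.
syn_intervals() {
    awk '/Flags \[S\],/ {
        split($1, t, ":")                      # t[1]=hours, t[2]=minutes, t[3]=seconds
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) printf "%.3f\n", now - prev  # gap since the previous SYN
        prev = now
        seen = 1
    }'
}
```

It can be fed live with something like tcpdump -l -n -i any port 80 | syn_intervals. A steady run of values near 2.000 points to a timer-driven client rather than real user traffic.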

netstat -ctp shows process information only for active connections, not TIME_WAIT states. By the time you run the command, the connection has already transitioned:

# netstat -ctp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 server:http client:54231 TIME_WAIT -

Use these methods to catch ephemeral connections:

1. eBPF/bcc trace

# /usr/share/bcc/tools/tcpconnect -t
TIME(s) PID COMM IP SADDR DADDR DPORT
12.345 5678 haproxy 4 10.0.0.1 10.0.0.2 80
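
On a busy host the raw tcpconnect stream scrolls past quickly, so it can help to aggregate it. The sketch below counts connections per PID, command, and destination. summarize_connects is a hypothetical name, and the field positions assume the -t column layout shown above with a command name that contains no spaces.

```shell
# summarize_connects (hypothetical helper): read `tcpconnect -t` output on stdin
# and print "<count> <pid> <comm> <daddr>:<dport>", most frequent first.
# Lines whose second field is not a numeric PID (such as the header) are skipped.
summarize_connects() {
    awk '$2 ~ /^[0-9]+$/ && NF >= 7 {
             count[$2 " " $3 " " $6 ":" $7]++
         }
         END { for (k in count) print count[k], k }' | sort -rn
}
```

One way to use it is to save a minute of tcpconnect -t output to a file and pipe that file through summarize_connects. A process that reconnects on a timer rises to the top of the list.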

2. SystemTap Script

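probe kernel.function("tcp_v4_connect").return {
    # Read the socket on return, once the kernel has filled in both endpoints
    sk = @entry($sk)
    printf("%s [%d] %s → %s:%d\n", execname(), pid(),
           ip_ntop(@cast(sk, "sock", "kernel")->__sk_common->skc_rcv_saddr),
           ip_ntop(@cast(sk, "sock", "kernel")->__sk_common->skc_daddr),
           ntohs(@cast(sk, "sock", "kernel")->__sk_common->skc_dport))
}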

3. Audit Framework

# auditctl -a exit,always -F arch=b64 -S connect -k short_tcp
# ausearch -k short_tcp -i
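
The -i flag asks ausearch to interpret the records. In raw output the destination appears as a hex saddr= field, and decoding one by hand shows what the rule captures. This is a sketch under stated assumptions: decode_saddr is a hypothetical helper, it handles only AF_INET, and it assumes a little-endian host, where the two-byte address family is written as 0200.

```shell
# decode_saddr (hypothetical helper): turn a raw audit saddr hex string for an
# IPv4 socket into "a.b.c.d:port".
# Layout: 2 bytes family (host byte order), 2 bytes port (network byte order),
# 4 bytes IPv4 address, then padding.
decode_saddr() {
    hex=$1
    family=$(printf '%s' "$hex" | cut -c1-4)
    if [ "$family" != "0200" ]; then
        echo "unsupported address family: $family" >&2
        return 1
    fi
    port=$(printf '%d' "0x$(printf '%s' "$hex" | cut -c5-8)")
    ip=""
    for start in 9 11 13 15; do
        octet=$(printf '%d' "0x$(printf '%s' "$hex" | cut -c"$start"-"$((start + 1))")")
        ip="${ip:+$ip.}$octet"          # join octets with dots
    done
    echo "$ip:$port"
}
```

For example, decode_saddr 020000500A0000020000000000000000 yields 10.0.0.2:80, matching the health-check target seen in the capture.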

Once you've identified haproxy as the culprit, verify its configuration:

backend web
    option httpchk
    server s1 10.0.0.2:80 check inter 2s

The inter 2s parameter explains the 2-second interval observed in tcpdump.
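
When a configuration has many backends, the explicit check intervals can be listed in one pass and compared with the period seen on the wire. A minimal sketch: list_check_intervals is a hypothetical name, and it only reports inter values written on server lines, so intervals inherited from default-server lines or from HAProxy's built-in default are not shown.

```shell
# list_check_intervals (hypothetical helper): print "<name> <address> <inter>"
# for every HAProxy server line that sets an explicit check interval.
# Reads the files given as arguments, or stdin when none are given.
list_check_intervals() {
    awk '$1 == "server" {
        for (i = 4; i < NF; i++)
            if ($i == "inter") print $2, $3, $(i + 1)
    }' "$@"
}
```

Run against the backend above, it prints s1 10.0.0.2:80 2s. On a typical install the input would be something like /etc/haproxy/haproxy.cfg, though that path is an assumption about the system layout.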

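For production environments, this bpftrace script logs outbound IPv4 connections that are closed within 5 seconds of connect(). struct sock has no field that records when a connection started, so the script stores the start time in a map keyed by the socket pointer:

#!/usr/bin/bpftrace

// Remember when each outbound connection attempt began.
kprobe:tcp_v4_connect
{
    @start[arg0] = nsecs;
}

// On close, report sockets that lived for less than 5 seconds.
kprobe:tcp_close
/@start[arg0]/
{
    $duration = nsecs - @start[arg0];

    if ($duration < 5000000000) { // 5 seconds
        time("%H:%M:%S ");
        printf("Short connection (%d ms): %s pid=%d\n",
               $duration / 1000000, comm, pid);
    }
    delete(@start[arg0]);
}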