Optimizing TCP/IP Connections: Solving Excessive TIME_WAIT States in Linux Socket Connections

When running netstat or ss -ant on Linux systems, seeing thousands of connections in TIME_WAIT state (especially targeting port 111/sunrpc) indicates a TCP/IP connection handling issue. This is a common pain point for developers working with high-throughput socket applications.

# Typical diagnostic commands:
netstat -ant | awk '/^tcp/ {print $6}' | sort | uniq -c
ss -ant | grep 'TIME-WAIT' | wc -l

Each TIME_WAIT connection holds kernel resources, and its local port, for 60 seconds after closure (Linux hardcodes this as TCP_TIMEWAIT_LEN; the RFC figure is 2*MSL). In high-traffic scenarios, this can:

  • Exhaust available ephemeral ports (32768-60999 by default on modern kernels)
  • Increase latency as new connections wait for ports
  • Trigger "Address already in use" errors
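The port-exhaustion risk is easy to quantify: with a fixed-size ephemeral range and a 60-second hold per closed connection, there is a hard ceiling on the sustainable rate of short-lived connections to any single destination. A back-of-envelope sketch (numbers assume the common 32768-60999 default range):

```python
# Back-of-envelope: ceiling on sustained short-lived connections to a
# single destination, given the ephemeral range and the TIME_WAIT hold.

def max_connection_rate(port_min=32768, port_max=60999, time_wait_secs=60):
    """Every closed connection parks its ephemeral port for
    time_wait_secs, so the usable pool caps connections/second."""
    available_ports = port_max - port_min + 1
    return available_ports / time_wait_secs

print(f"~{max_connection_rate():.0f} new connections/second sustainable")
```

Roughly 470 connections per second to one destination: well within reach of a tight reconnect loop, which is why the problem shows up so readily on localhost.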

The specific pattern showing connections between localhost ports and port 111 (sunrpc) suggests one of:

  1. An overactive NFS client implementation
  2. A misconfigured service continually querying portmapper
  3. A connection pool not properly recycling sockets
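Before reaching for sysctl, it helps to confirm the diagnosis by counting TIME_WAIT sockets toward port 111 directly. A minimal sketch that parses /proc/net/tcp, where the state column holds '06' for TIME_WAIT (the count_time_wait helper is ours, not a standard tool):

```python
# Count TIME_WAIT sockets to a given remote port straight from
# /proc/net/tcp (state column holds '06' for TIME_WAIT).
import os

def count_time_wait(lines, remote_port):
    count = 0
    for line in lines[1:]:                      # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != '06':
            continue
        port_hex = fields[2].rsplit(':', 1)[1]  # rem_address is hex ip:port
        if int(port_hex, 16) == remote_port:
            count += 1
    return count

if os.path.exists('/proc/net/tcp'):             # Linux only
    with open('/proc/net/tcp') as f:
        print(count_time_wait(f.readlines(), 111), 'TIME_WAIT sockets to port 111')
```

Run it in a loop: a count that climbs steadily while traffic is constant points at option 3, a pool that opens a fresh socket per call.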

Add these to /etc/sysctl.conf (then run sysctl -p):

# Shorten the FIN-WAIT-2 hold (note: despite its name, tcp_fin_timeout
# does NOT shorten TIME_WAIT itself, which is fixed at 60 seconds)
net.ipv4.tcp_fin_timeout = 30

# Enable reuse of TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle = 1  # Dangerous behind NAT; removed in kernel 4.12

# Increase ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535

# Raise the cap on simultaneous TIME_WAIT sockets
net.ipv4.tcp_max_tw_buckets = 2000000

For developers writing socket-based applications:

# Python example with proper socket handling
import socket

def create_connection():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # mainly useful on listening sockets
    try:
        s.connect(('localhost', 111))
        # ... handle connection ...
    finally:
        s.close()  # clean FIN; note the side that closes first enters TIME_WAIT

# Better alternative: sockets are context managers (Python 3.2+)
with socket.socket() as s:
    s.connect(('localhost', 111))
    # Closed automatically on exit, even on exceptions
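A context manager guarantees cleanup, but the structural fix is to stop paying the connect/close cost per request in the first place. A sketch of the reuse pattern under artificial conditions (a throwaway local echo server stands in for the real service; names are ours):

```python
# One long-lived connection serving many requests: the application-level
# fix that removes TIME_WAIT churn at the source.
import socket
import threading

def echo_once(listener):
    conn, _ = listener.accept()
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

# Throwaway local echo server standing in for the real service.
srv = socket.socket()
srv.bind(('127.0.0.1', 0))       # port 0: kernel picks a free port
srv.listen(1)
threading.Thread(target=echo_once, args=(srv,), daemon=True).start()

replies = 0
with socket.create_connection(srv.getsockname()) as s:
    for _ in range(100):         # 100 requests, ONE connection --
        s.sendall(b'ping')       # at most one socket ever hits TIME_WAIT
        if s.recv(1024) == b'ping':
            replies += 1
print(replies, 'requests served over a single connection')
```

One hundred request/response cycles produce at most a single TIME_WAIT entry instead of a hundred.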

For deeper investigation:

# Monitor connection states in real-time
watch -n 1 "ss -s | grep -i wait"

# Check which processes might be responsible
lsof -i :111

# Kernel connection tracking (ports are numeric; requires nf_conntrack loaded)
grep 'dport=111' /proc/net/nf_conntrack

# Network stack statistics
cat /proc/net/netstat | grep -i tcp

When you're seeing thousands of TCP connections stuck in TIME_WAIT state pointing to localhost:sunrpc (port 111), you're witnessing normal TCP protocol behavior - but at an abnormal scale. Each TIME_WAIT represents a properly closed connection that the kernel maintains for 60 seconds by default (2*MSL) to handle any delayed packets.

# View the FIN-WAIT-2 timeout; TIME_WAIT itself is fixed at 60 seconds
cat /proc/sys/net/ipv4/tcp_fin_timeout

The key observations from your netstat output reveal:

  • All connections are local (127.0.0.1)
  • Targeting port 111 (sunrpc)
  • Ephemeral ports in 60XXX range
  • No associated process (PID -)

This suggests an RPC service (like portmapper) is being hammered by local processes - potentially cron jobs, monitoring tools, or misconfigured services making rapid successive calls.
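To pin down which destination dominates, you can group TIME-WAIT entries from `ss -tan` output by peer port. A small sketch (the function name is ours; the inline sample mimics real ss output):

```python
# Group TIME-WAIT sockets from `ss -tan` output by peer port to see
# which destination dominates (e.g. 111/sunrpc).
from collections import Counter

def time_wait_by_peer_port(ss_output):
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == 'TIME-WAIT':
            peer = fields[4]                  # e.g. 127.0.0.1:111
            counts[peer.rsplit(':', 1)[1]] += 1
    return counts

sample = """State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
TIME-WAIT  0      0      127.0.0.1:60123     127.0.0.1:111
TIME-WAIT  0      0      127.0.0.1:60124     127.0.0.1:111
ESTAB      0      0      127.0.0.1:22        127.0.0.1:50000
"""
print(time_wait_by_peer_port(sample).most_common(1))  # top offender
```

Feed it live output with `subprocess.run(['ss', '-tan'], capture_output=True, text=True).stdout` and the top entry names the port being hammered.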

First, identify the source of these RPC calls:

# List registered RPC services, then watch the traffic live
sudo rpcinfo -p
sudo tcpdump -i lo -nn 'port 111' -c 100

# Check which processes use RPC (run as root)
lsof -i :111
netstat -tulp | grep rpc

For temporary relief, adjust these sysctl values:

# Shorten FIN-WAIT-2 (TIME_WAIT itself stays fixed at 60 seconds)
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

# Allow reuse of TIME_WAIT sockets for new outbound connections
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

# Increase available port range
echo '1024 65000' > /proc/sys/net/ipv4/ip_local_port_range

Make changes permanent by adding to /etc/sysctl.conf:

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65000

For sustainable fixes:

  • RPC Client Optimization: Configure clients to reuse connections (NFS mount options, rpcbind settings)
  • Connection Pooling: Implement keepalive for RPC clients
  • Service Isolation: Containerize services making excessive RPC calls
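The pooling idea in the first two bullets can be sketched in a few lines (the ConnectionPool class is ours, not a real RPC client; a throwaway local echo server again stands in for the RPC service):

```python
# Minimal connection-pool sketch: sockets are checked out, reused and
# returned, so connect/close (and hence TIME_WAIT) happens rarely.
import queue
import socket
import threading

class ConnectionPool:
    def __init__(self, address, size=4):
        self.address = address
        self.idle = queue.Queue(maxsize=size)
        self.created = 0                      # for demonstration only

    def acquire(self):
        try:
            return self.idle.get_nowait()     # reuse a warm socket
        except queue.Empty:
            self.created += 1
            return socket.create_connection(self.address)

    def release(self, conn):
        try:
            self.idle.put_nowait(conn)        # keep it warm
        except queue.Full:
            conn.close()                      # overflow: really close

# Demo against a throwaway local echo server.
def serve(listener):
    while True:
        conn, _ = listener.accept()
        with conn:
            while data := conn.recv(1024):
                conn.sendall(data)

srv = socket.socket()
srv.bind(('127.0.0.1', 0))
srv.listen(5)
threading.Thread(target=serve, args=(srv,), daemon=True).start()

pool = ConnectionPool(srv.getsockname())
for _ in range(50):
    c = pool.acquire()
    c.sendall(b'hi')
    assert c.recv(1024) == b'hi'
    pool.release(c)
print(f'{pool.created} connection(s) opened for 50 requests')
```

A production pool would also need liveness checks and thread-safety around broken sockets, but the shape is the same: acquire, use, release, reconnect only when the pool is empty.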

Example NFS client optimization:

# ('intr' has been a no-op since kernel 2.6.25, so it is omitted here)
mount -o proto=tcp,vers=3,timeo=600,retrans=2,hard \
  nfsserver:/share /mnt/share

Set up alerts for TIME_WAIT buildup:

# Nagios check example
#!/bin/bash
WARN=1000
CRIT=5000

# Note: the state column comes AFTER the address in netstat output,
# so match ":111 ... TIME_WAIT" with awk rather than 'TIME_WAIT.*:111'
count=$(netstat -ant | awk '$5 ~ /:111$/ && $6 == "TIME_WAIT"' | wc -l)
if [ "$count" -gt "$CRIT" ]; then
  echo "CRITICAL: $count RPC TIME_WAIT connections"
  exit 2
elif [ "$count" -gt "$WARN" ]; then
  echo "WARNING: $count RPC TIME_WAIT connections"
  exit 1
else
  echo "OK: $count RPC TIME_WAIT connections"
  exit 0
fi