Optimizing TCP/IP Connections: Solving Excessive TIME_WAIT States in Linux Socket Connections

When running netstat or ss -ant on Linux systems, seeing thousands of connections in TIME_WAIT state (especially targeting port 111/sunrpc) indicates a TCP/IP connection handling issue. This is a common pain point for developers working with high-throughput socket applications.

# Typical diagnostic commands:
netstat -ant | awk '/^tcp/ {print $6}' | sort | uniq -c
ss -ant | grep 'TIME-WAIT' | wc -l

Each TIME_WAIT connection holds kernel resources, and its local port, for 60 seconds after closure (Linux hardcodes this as TCP_TIMEWAIT_LEN; the RFC figure is 2*MSL). In high-traffic scenarios, this can:

  • Exhaust available ephemeral ports (32768-60999 by default on modern kernels)
  • Increase latency as new connections wait for ports
  • Trigger "Address already in use" errors
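The port-exhaustion risk is easy to quantify: with a fixed-size ephemeral range and a 60-second hold per closed connection, there is a hard ceiling on the sustainable rate of short-lived connections to any single destination. A back-of-envelope sketch (numbers assume the common 32768-60999 default range):

```python
# Back-of-envelope: ceiling on sustained short-lived connections to a
# single destination, given the ephemeral range and the TIME_WAIT hold.

def max_connection_rate(port_min=32768, port_max=60999, time_wait_secs=60):
    """Every closed connection parks its ephemeral port for
    time_wait_secs, so the usable pool caps connections/second."""
    available_ports = port_max - port_min + 1
    return available_ports / time_wait_secs

print(f"~{max_connection_rate():.0f} new connections/second sustainable")
```

Roughly 470 connections per second to one destination: well within reach of a tight reconnect loop, which is why the problem shows up so readily on localhost.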

The specific pattern showing connections between localhost ports and port 111 (sunrpc) suggests one of:

  1. An overactive NFS client implementation
  2. A misconfigured service continually querying portmapper
  3. A connection pool not properly recycling sockets
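Before reaching for sysctl, it helps to confirm the diagnosis by counting TIME_WAIT sockets toward port 111 directly. A minimal sketch that parses /proc/net/tcp, where the state column holds '06' for TIME_WAIT (the count_time_wait helper is ours, not a standard tool):

```python
# Count TIME_WAIT sockets to a given remote port straight from
# /proc/net/tcp (state column holds '06' for TIME_WAIT).
import os

def count_time_wait(lines, remote_port):
    count = 0
    for line in lines[1:]:                      # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != '06':
            continue
        port_hex = fields[2].rsplit(':', 1)[1]  # rem_address is hex ip:port
        if int(port_hex, 16) == remote_port:
            count += 1
    return count

if os.path.exists('/proc/net/tcp'):             # Linux only
    with open('/proc/net/tcp') as f:
        print(count_time_wait(f.readlines(), 111), 'TIME_WAIT sockets to port 111')
```

Run it in a loop: a count that climbs steadily while traffic is constant points at option 3, a pool that opens a fresh socket per call.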

Add these to /etc/sysctl.conf (then run sysctl -p):

# Shorten the FIN-WAIT-2 hold (note: despite its name, tcp_fin_timeout
# does NOT shorten TIME_WAIT itself, which is fixed at 60 seconds)
net.ipv4.tcp_fin_timeout = 30

# Enable reuse of TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle = 1  # Dangerous behind NAT; removed in kernel 4.12

# Increase ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535

# Raise the cap on simultaneous TIME_WAIT sockets
net.ipv4.tcp_max_tw_buckets = 2000000

For developers writing socket-based applications:

# Python example with proper socket handling
import socket

def create_connection():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # mainly useful on listening sockets
    try:
        s.connect(('localhost', 111))
        # ... handle connection ...
    finally:
        s.close()  # clean FIN; note the side that closes first enters TIME_WAIT

# Better alternative: sockets are context managers (Python 3.2+)
with socket.socket() as s:
    s.connect(('localhost', 111))
    # Closed automatically on exit, even on exceptions
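A context manager guarantees cleanup, but the structural fix is to stop paying the connect/close cost per request in the first place. A sketch of the reuse pattern under artificial conditions (a throwaway local echo server stands in for the real service; names are ours):

```python
# One long-lived connection serving many requests: the application-level
# fix that removes TIME_WAIT churn at the source.
import socket
import threading

def echo_once(listener):
    conn, _ = listener.accept()
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

# Throwaway local echo server standing in for the real service.
srv = socket.socket()
srv.bind(('127.0.0.1', 0))       # port 0: kernel picks a free port
srv.listen(1)
threading.Thread(target=echo_once, args=(srv,), daemon=True).start()

replies = 0
with socket.create_connection(srv.getsockname()) as s:
    for _ in range(100):         # 100 requests, ONE connection --
        s.sendall(b'ping')       # at most one socket ever hits TIME_WAIT
        if s.recv(1024) == b'ping':
            replies += 1
print(replies, 'requests served over a single connection')
```

One hundred request/response cycles produce at most a single TIME_WAIT entry instead of a hundred.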

For deeper investigation:

# Monitor connection states in real-time
watch -n 1 "ss -s | grep -i wait"

# Check which processes might be responsible
lsof -i :111

# Kernel connection tracking (ports are numeric; requires nf_conntrack loaded)
grep 'dport=111' /proc/net/nf_conntrack

# Network stack statistics
cat /proc/net/netstat | grep -i tcp

When you're seeing thousands of TCP connections stuck in TIME_WAIT state pointing to localhost:sunrpc (port 111), you're witnessing normal TCP protocol behavior - but at an abnormal scale. Each TIME_WAIT represents a properly closed connection that the kernel maintains for 60 seconds by default (2*MSL) to handle any delayed packets.

# View the FIN-WAIT-2 timeout; TIME_WAIT itself is fixed at 60 seconds
cat /proc/sys/net/ipv4/tcp_fin_timeout

The key observations from your netstat output reveal:

  • All connections are local (127.0.0.1)
  • Targeting port 111 (sunrpc)
  • Ephemeral ports in 60XXX range
  • No associated process (PID -)

This suggests an RPC service (like portmapper) is being hammered by local processes - potentially cron jobs, monitoring tools, or misconfigured services making rapid successive calls.
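To pin down which destination dominates, you can group TIME-WAIT entries from `ss -tan` output by peer port. A small sketch (the function name is ours; the inline sample mimics real ss output):

```python
# Group TIME-WAIT sockets from `ss -tan` output by peer port to see
# which destination dominates (e.g. 111/sunrpc).
from collections import Counter

def time_wait_by_peer_port(ss_output):
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == 'TIME-WAIT':
            peer = fields[4]                  # e.g. 127.0.0.1:111
            counts[peer.rsplit(':', 1)[1]] += 1
    return counts

sample = """State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
TIME-WAIT  0      0      127.0.0.1:60123     127.0.0.1:111
TIME-WAIT  0      0      127.0.0.1:60124     127.0.0.1:111
ESTAB      0      0      127.0.0.1:22        127.0.0.1:50000
"""
print(time_wait_by_peer_port(sample).most_common(1))  # top offender
```

Feed it live output with `subprocess.run(['ss', '-tan'], capture_output=True, text=True).stdout` and the top entry names the port being hammered.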

First, identify the source of these RPC calls:

# List registered RPC services, then watch the traffic live
sudo rpcinfo -p
sudo tcpdump -i lo -nn 'port 111' -c 100

# Check which processes use RPC (run as root)
lsof -i :111
netstat -tulp | grep rpc

For temporary relief, adjust these sysctl values:

# Shorten FIN-WAIT-2 (TIME_WAIT itself stays fixed at 60 seconds)
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

# Allow reuse of TIME_WAIT sockets for new outbound connections
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

# Increase available port range
echo '1024 65000' > /proc/sys/net/ipv4/ip_local_port_range

Make changes permanent by adding to /etc/sysctl.conf:

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65000

For sustainable fixes:

  • RPC Client Optimization: Configure clients to reuse connections (NFS mount options, rpcbind settings)
  • Connection Pooling: Implement keepalive for RPC clients
  • Service Isolation: Containerize services making excessive RPC calls
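The pooling idea in the first two bullets can be sketched in a few lines (the ConnectionPool class is ours, not a real RPC client; a throwaway local echo server again stands in for the RPC service):

```python
# Minimal connection-pool sketch: sockets are checked out, reused and
# returned, so connect/close (and hence TIME_WAIT) happens rarely.
import queue
import socket
import threading

class ConnectionPool:
    def __init__(self, address, size=4):
        self.address = address
        self.idle = queue.Queue(maxsize=size)
        self.created = 0                      # for demonstration only

    def acquire(self):
        try:
            return self.idle.get_nowait()     # reuse a warm socket
        except queue.Empty:
            self.created += 1
            return socket.create_connection(self.address)

    def release(self, conn):
        try:
            self.idle.put_nowait(conn)        # keep it warm
        except queue.Full:
            conn.close()                      # overflow: really close

# Demo against a throwaway local echo server.
def serve(listener):
    while True:
        conn, _ = listener.accept()
        with conn:
            while data := conn.recv(1024):
                conn.sendall(data)

srv = socket.socket()
srv.bind(('127.0.0.1', 0))
srv.listen(5)
threading.Thread(target=serve, args=(srv,), daemon=True).start()

pool = ConnectionPool(srv.getsockname())
for _ in range(50):
    c = pool.acquire()
    c.sendall(b'hi')
    assert c.recv(1024) == b'hi'
    pool.release(c)
print(f'{pool.created} connection(s) opened for 50 requests')
```

A production pool would also need liveness checks and thread-safety around broken sockets, but the shape is the same: acquire, use, release, reconnect only when the pool is empty.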

Example NFS client optimization:

# ('intr' has been a no-op since kernel 2.6.25, so it is omitted here)
mount -o proto=tcp,vers=3,timeo=600,retrans=2,hard \
  nfsserver:/share /mnt/share

Set up alerts for TIME_WAIT buildup:

# Nagios check example
#!/bin/bash
WARN=1000
CRIT=5000

# Note: the state column comes AFTER the address in netstat output,
# so match ":111 ... TIME_WAIT" with awk rather than 'TIME_WAIT.*:111'
count=$(netstat -ant | awk '$5 ~ /:111$/ && $6 == "TIME_WAIT"' | wc -l)
if [ "$count" -gt "$CRIT" ]; then
  echo "CRITICAL: $count RPC TIME_WAIT connections"
  exit 2
elif [ "$count" -gt "$WARN" ]; then
  echo "WARNING: $count RPC TIME_WAIT connections"
  exit 1
else
  echo "OK: $count RPC TIME_WAIT connections"
  exit 0
fi