Troubleshooting Persistent ESTABLISHED TCP Connections in Legacy Java Applications on CentOS 4.5


When monitoring our legacy Java application on CentOS 4.5, we observed frequent crashes accompanied by the following pattern:

java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)

Running netstat -nato reveals hundreds of lingering connections stuck in the ESTABLISHED state with no timer running (the "off" in the timer column):

tcp        0      0 ::ffff:10.39.151.20:10000   ::ffff:78.152.97.98:12059   ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:10.39.151.20:10000   ::ffff:78.152.97.98:49179   ESTABLISHED off (0.00/0/0)

The crashes line up with the process reaching roughly 1024 open descriptors, and the immediate constraint becomes apparent when checking the per-process limit:

$ ulimit -n
1024
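
Raising the limit buys headroom while the leak is being fixed. On CentOS 4.5 the simplest way is to raise the soft limit in the application's start script before the JVM is launched; the value and the launch command below are placeholders:

# In the application start script, before the JVM starts.
# 8192 is an example value; a non-root script can only raise the soft limit
# up to the configured hard limit.
ulimit -n 8192
exec java -jar your-app.jar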

The root cause likely stems from improper socket closure in the Java application. Here's a problematic pattern we often see:

// Bad practice - no proper cleanup
try {
    Socket socket = new Socket(host, port);
    // ... use socket ...
} catch (IOException e) {
    e.printStackTrace();
}

Instead, the correct approach should be:

// Proper resource management
Socket socket = null;
try {
    socket = new Socket(host, port);
    // ... use socket ...
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (socket != null) {
        try {
            socket.close();
        } catch (IOException e) {
            // Log closure error
        }
    }
}

For CentOS 4.5, we can tighten the kernel's TCP keepalive settings so that connections to dead peers are detected and torn down sooner:

# Add to /etc/sysctl.conf
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15

# Apply changes
sysctl -p
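
These timers only apply to sockets that have SO_KEEPALIVE enabled, and the JVM does not enable it by default, so the application has to request it per socket. A minimal sketch for an accepted connection:

// Enable TCP keepalive so the kernel's tcp_keepalive_* settings above
// actually apply to this connection.
Socket clientSocket = serverSocket.accept();
clientSocket.setKeepAlive(true);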

For JRE 1.6 applications, consider implementing connection monitoring:

// Connection tracking wrapper
import java.io.IOException;
import java.net.Socket;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class TrackedSocket extends Socket {
    // JRE 1.6 has no diamond operator, so the type argument is spelled out
    private static final Set<TrackedSocket> openSockets =
        Collections.synchronizedSet(new HashSet<TrackedSocket>());

    public TrackedSocket() {
        super();
        openSockets.add(this);
    }

    public TrackedSocket(String host, int port) throws IOException {
        super(host, port);
        openSockets.add(this);
    }

    @Override
    public void close() throws IOException {
        openSockets.remove(this);
        super.close();
    }

    public static int getOpenSocketCount() {
        return openSockets.size();
    }
}
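
One way to surface the counter is a background daemon thread that logs it periodically (a sketch; the 60-second interval and System.out are placeholders for whatever scheduler and logger the application already uses):

// Log the number of tracked sockets once a minute (JRE 1.6: no lambdas)
Thread socketMonitor = new Thread(new Runnable() {
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            System.out.println("Open tracked sockets: " + TrackedSocket.getOpenSocketCount());
            try {
                Thread.sleep(60000L);
            } catch (InterruptedException e) {
                return; // stop when interrupted
            }
        }
    }
}, "socket-monitor");
socketMonitor.setDaemon(true);
socketMonitor.start();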

To identify connection leaks:

# Count open file descriptors held by the Java process
ls /proc/<java_pid>/fd | wc -l

# Check TCP connection states
netstat -nato | grep ESTABLISHED | grep <port> | wc -l
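
The same numbers can also be read from inside the JVM. The Sun JDK 1.6 exposes descriptor counts through the com.sun.management.UnixOperatingSystemMXBean interface (Sun/Oracle-specific, so guard the cast); a minimal sketch:

import java.lang.management.ManagementFactory;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsageLogger {
    // Prints current vs. maximum descriptor usage; does nothing on non-Sun JVMs.
    public static void logFdUsage() {
        Object osBean = ManagementFactory.getOperatingSystemMXBean();
        if (osBean instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixBean = (UnixOperatingSystemMXBean) osBean;
            System.out.println("File descriptors: "
                    + unixBean.getOpenFileDescriptorCount()
                    + " of " + unixBean.getMaxFileDescriptorCount());
        }
    }
}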

Consider implementing connection timeouts in your server code:

ServerSocket serverSocket = new ServerSocket(port);
serverSocket.setSoTimeout(30000); // accept() gives up after 30 s instead of blocking forever
while (true) {
    try {
        Socket clientSocket = serverSocket.accept();
        clientSocket.setSoTimeout(15000); // reads on this connection time out after 15 s
        // Handle connection - the handler must close clientSocket in a finally block
    } catch (SocketTimeoutException e) {
        // No connection arrived within the accept timeout; loop and try again
    }
}

In this application the leak originates on the server side: sockets returned by accept() are never closed. Here's the problematic pattern we often see in legacy code:

// Bad practice - clientSocket is never closed, even on the success path
try {
    Socket clientSocket = serverSocket.accept();
    processRequest(clientSocket);
} catch (IOException e) {
    e.printStackTrace();
}

Contrast this with proper resource management using try-with-resources; note that this syntax requires Java 7 or later, so it only applies if the application can move off JRE 1.6:

// Correct implementation
try (Socket clientSocket = serverSocket.accept();
     InputStream in = clientSocket.getInputStream();
     OutputStream out = clientSocket.getOutputStream()) {
    
    processRequest(in, out);
} catch (IOException e) {
    logger.error("Connection error", e);
}
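
If the application must stay on JRE 1.6, the equivalent cleanup uses try/finally around the accepted socket (a sketch; processRequest and logger are the same placeholders as above):

Socket clientSocket = null;
try {
    clientSocket = serverSocket.accept();
    processRequest(clientSocket.getInputStream(), clientSocket.getOutputStream());
} catch (IOException e) {
    logger.error("Connection error", e);
} finally {
    if (clientSocket != null) {
        try {
            clientSocket.close(); // closing the socket also closes its streams
        } catch (IOException e) {
            // ignore failures during cleanup
        }
    }
}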

For CentOS 4.5, these additional kernel parameters in /etc/sysctl.conf help the kernel recover sockets that reach teardown; they do not reclaim connections stuck in ESTABLISHED, and tcp_tw_recycle is known to break clients behind NAT, so apply them with care:

# Recycle TIME_WAIT sockets faster (can break clients behind NAT)
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

# Decrease FIN timeout
net.ipv4.tcp_fin_timeout = 30

# Increase the system-wide file descriptor limit
# (the per-process limit reported by ulimit -n is set separately)
fs.file-max = 65536
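
To raise the per-process limit persistently rather than in the start script, /etc/security/limits.conf can be used; this only takes effect for processes started through a PAM session, and the user name and values below are placeholders:

# /etc/security/limits.conf - raise the open-file limit for the application user
# "appuser" stands in for whatever account runs the JVM
appuser  soft  nofile  8192
appuser  hard  nofile  16384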

For immediate relief, resist the temptation to kill whatever process lsof reports for these ports: every leaked socket belongs to the same Java process, so kill -9 simply takes the application down. The quickest recovery is a controlled restart of the application; before restarting, capture the stale connections for later analysis (run as root; port 10000 matches the netstat output above):

#!/bin/bash
# List ESTABLISHED connections on the service port that have no timer running
# (the leaked-socket candidates), save them, and report the count.
netstat -nato | awk '$4 ~ /:10000$/ && $6 == "ESTABLISHED" && $7 == "off"' \
    | tee /tmp/stale-connections.txt | wc -l

Track connection growth over time with:

watch -n 60 "lsof -p $(pgrep -f your_java_app) | wc -l"

Finally, these JVM arguments do not trace socket calls directly, but they shorten the JDK's positive DNS cache TTL and enable GC logging, which helps correlate connection growth with garbage collection and address caching:

-Dsun.net.inetaddr.ttl=60
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-verbose:gc