Troubleshooting Persistent ESTABLISHED TCP Connections in Legacy Java Applications on CentOS 4.5


When monitoring our legacy Java application on CentOS 4.5, we observed frequent crashes accompanied by the following pattern:

java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)

Running netstat -nato reveals hundreds of lingering connections stuck in the ESTABLISHED state with no timer running (the "off" in the timer column means the kernel has no keepalive or retransmission timer active for the socket):

tcp        0      0 ::ffff:10.39.151.20:10000   ::ffff:78.152.97.98:12059   ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:10.39.151.20:10000   ::ffff:78.152.97.98:49179   ESTABLISHED off (0.00/0/0)

The immediate constraint becomes apparent when checking system limits:

$ ulimit -n
1024
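
Raising the limit buys headroom, but as long as the application keeps leaking sockets it only postpones the crash.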

The root cause likely stems from improper socket closure in the Java application. Here's a problematic pattern we often see:

// Bad practice - the socket is never closed, so its file descriptor leaks
try {
    Socket socket = new Socket(host, port);
    // ... use socket ...
} catch (IOException e) {
    e.printStackTrace();
}

Instead, the correct approach should be:

// Proper resource management
Socket socket = null;
try {
    socket = new Socket(host, port);
    // ... use socket ...
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (socket != null) {
        try {
            socket.close();
        } catch (IOException e) {
            // Log closure error
        }
    }
}
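
On Java 1.6 the null-check-and-close boilerplate in the finally block cannot be avoided (Socket does not implement Closeable until Java 7), so many codebases centralise it in a small helper; libraries such as Apache Commons IO ship similar utilities. A minimal sketch, with SocketUtils and closeQuietly as illustrative names rather than anything from the original application:

import java.io.IOException;
import java.net.Socket;

public final class SocketUtils {

    private SocketUtils() {
    }

    // Close a socket, swallowing the IOException so callers can keep their
    // finally blocks to a single line
    public static void closeQuietly(Socket socket) {
        if (socket == null) {
            return;
        }
        try {
            socket.close();
        } catch (IOException e) {
            // nothing useful to do beyond logging the failed close
        }
    }
}

With this in place, the finally block above shrinks to a single SocketUtils.closeQuietly(socket); call.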

For CentOS 4.5, we can tighten the kernel's TCP keepalive timers so that connections to dead peers are detected and torn down sooner:

# Add to /etc/sysctl.conf
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15

# Apply changes
sysctl -p
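
These keepalive timers only apply to sockets that have SO_KEEPALIVE enabled, and the "off" timer column in the netstat output above shows that the leaked sockets do not. The application has to opt in on each socket; a minimal sketch for the accept path (the surrounding accept loop is assumed to exist):

Socket clientSocket = serverSocket.accept();
clientSocket.setKeepAlive(true);   // opt in, otherwise the sysctl timers above never apply to this socket
// ... handle the connection ...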

For JRE 1.6 applications, consider implementing connection monitoring:

// Connection tracking wrapper - only covers sockets the application creates
// through this class; sockets returned by ServerSocket.accept() are not tracked
import java.io.IOException;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class TrackedSocket extends Socket {
    private static final Set<TrackedSocket> openSockets = 
        Collections.synchronizedSet(new HashSet<TrackedSocket>()); // no diamond operator on Java 1.6

    public TrackedSocket() {
        super();
        openSockets.add(this);
    }

    public TrackedSocket(String host, int port) throws UnknownHostException, IOException {
        super(host, port);
        openSockets.add(this);
    }

    @Override
    public void close() throws IOException {
        openSockets.remove(this);
        super.close();
    }

    public static int getOpenSocketCount() {
        return openSockets.size();
    }
}
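
One low-effort way to use the counter is a daemon thread that logs it periodically, so a leak shows up as a steadily climbing number long before the 1024 limit is reached. A sketch, with the one-minute interval and System.err output as arbitrary choices:

Thread monitor = new Thread(new Runnable() {
    public void run() {
        while (true) {
            System.err.println("Open tracked sockets: " + TrackedSocket.getOpenSocketCount());
            try {
                Thread.sleep(60000);        // sample once a minute
            } catch (InterruptedException e) {
                return;                     // stop quietly on interrupt
            }
        }
    }
});
monitor.setDaemon(true);                    // don't keep the JVM alive just for monitoring
monitor.start();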

To identify connection leaks:

# Count open file descriptors for the Java process
# (plain ls, because ls -l adds a "total" line that inflates the count)
ls /proc/<java_pid>/fd | wc -l

# Check TCP connection states
netstat -nato | grep ESTABLISHED | grep <port> | wc -l
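
The descriptor count can also be sampled from inside the JVM by listing the process's own /proc entry; a minimal sketch (the /proc/self/fd path is Linux-specific, not part of any Java API):

// Count this process's open file descriptors via /proc (Linux only)
java.io.File fdDir = new java.io.File("/proc/self/fd");
String[] entries = fdDir.list();
int openDescriptors = (entries == null) ? -1 : entries.length;   // -1 if /proc is unavailable
System.err.println("Open file descriptors: " + openDescriptors);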

Consider implementing connection timeouts in your server code:

ServerSocket serverSocket = new ServerSocket(port);
serverSocket.setSoTimeout(30000); // accept() gives up with SocketTimeoutException after 30 seconds
while (true) {
    try {
        Socket clientSocket = serverSocket.accept();
        clientSocket.setSoTimeout(15000); // blocking reads fail after 15 seconds instead of hanging forever
        // Handle connection
    } catch (SocketTimeoutException e) {
        // No client connected within the accept timeout; loop and try again
    }
}
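
Timeouts by themselves do not release descriptors; the handling code still has to close every accepted socket. A sketch of the "Handle connection" step for Java 1.6, reusing the hypothetical closeQuietly helper from earlier and treating processRequest as a stand-in for the application's own request handling:

Socket clientSocket = serverSocket.accept();
clientSocket.setSoTimeout(15000);
try {
    processRequest(clientSocket);              // stand-in for the application's request handling
} catch (IOException e) {
    // log and keep serving other clients
} finally {
    SocketUtils.closeQuietly(clientSocket);    // always give the descriptor back
}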

Looking at the application code itself, the leak traces back to the accept loop, where accepted sockets are never closed. Here's the problematic pattern we often see in legacy code:

// Bad practice - the accepted socket is never closed, so every request leaks a descriptor
try {
    Socket clientSocket = serverSocket.accept();
    processRequest(clientSocket);
} catch (IOException e) {
    e.printStackTrace();
}

Contrast this with proper resource management. Note that the try-with-resources form below requires Java 7 or later; on the JRE 1.6 this application runs on, the finally-based pattern shown earlier is the equivalent:

// Correct implementation
try (Socket clientSocket = serverSocket.accept();
     InputStream in = clientSocket.getInputStream();
     OutputStream out = clientSocket.getOutputStream()) {
    
    processRequest(in, out);
} catch (IOException e) {
    logger.error("Connection error", e);
}

For CentOS 4.5, these kernel parameters in /etc/sysctl.conf can also help, although they target TIME_WAIT buildup and the system-wide descriptor ceiling rather than the leaked ESTABLISHED sockets themselves:

# Recycle TIME_WAIT sockets faster
# (tcp_tw_recycle is known to break clients behind NAT; prefer tcp_tw_reuse alone)
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

# Decrease FIN timeout
net.ipv4.tcp_fin_timeout = 30

# Increase the system-wide file descriptor limit
fs.file-max = 65536
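
Note that fs.file-max raises only the system-wide ceiling; the 1024 reported by ulimit -n is a per-process limit and has to be raised separately, for example with nofile entries in /etc/security/limits.conf or a ulimit -n call in the init script that launches the JVM.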

For immediate relief, bear in mind that every leaked socket belongs to the single Java process, so killing by PID (the usual lsof/kill approach) would take down the whole application; in practice, restarting the service is the only quick way to reclaim the descriptors. This script at least summarises which remote endpoints hold the stale connections (run as root):

#!/bin/bash
# Summarise the remote endpoints of ESTABLISHED connections with no timer running
netstat -nato | awk '/ESTABLISHED off/ {print $5}' | sort | uniq -c | sort -rn

Track connection growth over time with:

watch -n 60 "lsof -p $(pgrep -f your_java_app) | wc -l"

These JVM arguments cap the JVM's DNS cache TTL and enable GC logging, which can help here because unclosed sockets are typically only cleaned up when their finalizers run during a collection:

-Dsun.net.inetaddr.ttl=60
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-verbose:gc