Advanced Techniques for Diagnosing and Resolving File Descriptor Leaks in Java Applications


When your Java application (especially in container environments like Glassfish) starts throwing java.io.IOException: Too many open files errors, you're likely dealing with a file descriptor leak. The Linux /proc/PID/fd directory and lsof output showing numerous unidentified socket connections ("can't identify protocol") are classic indicators.
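
You can also take the same inventory from inside the JVM. A minimal, Linux-only sketch (it simply walks /proc/self/fd, the process's own view of the directory described above):

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SelfFdList {
    public static void main(String[] args) throws IOException {
        Path fdDir = Paths.get("/proc/self/fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                // Each entry is a symlink to its target, e.g. "socket:[123456]"
                System.out.println(fd.getFileName() + " -> " + Files.readSymbolicLink(fd));
            }
        }
    }
}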

Instead of raw lsof, use these targeted commands:

# Track socket FDs specifically:
lsof -p PID -a -i

# Watch FD growth in real-time:
watch -n 5 "lsof -p PID | wc -l"

# Break down FD targets by type (all threads share one FD table,
# so inspect the process-wide /proc/PID/fd rather than per-task views):
ls -l /proc/PID/fd | awk '{print $NF}' | cut -d: -f1 | sort | uniq -c | sort -rn

Adjust the JVM's name-service properties to rule out DNS-related socket churn (note: these configure DNS resolution and address caching rather than any socket tracking; sun.net.inetaddr.ttl=0 disables positive DNS caching):

System.setProperty("sun.net.spi.nameservice.provider.1", "dns,sun");
System.setProperty("sun.net.inetaddr.ttl", "0");

Then use VisualVM with the MBeans plugin (or JConsole) to monitor, as sketched programmatically below:

  • java.nio:type=BufferPool (direct and mapped buffer usage)
  • java.lang:type=OperatingSystem, whose OpenFileDescriptorCount / MaxFileDescriptorCount attributes are exposed on Unix JVMs
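
A minimal in-process read of the same numbers, assuming a HotSpot/OpenJDK JVM on a Unix platform (com.sun.management.UnixOperatingSystemMXBean is not available on every JVM):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdSnapshot {
    public static void main(String[] args) {
        // Process-wide file descriptor counts (Unix JVMs only)
        Object osBean = ManagementFactory.getOperatingSystemMXBean();
        if (osBean instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) osBean;
            System.out.printf("open FDs: %d / max %d%n",
                    unix.getOpenFileDescriptorCount(),
                    unix.getMaxFileDescriptorCount());
        }
        // Direct/mapped NIO buffer pools, a hint that channels and buffers are piling up
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("buffer pool %s: count=%d, memory=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed());
        }
    }
}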

Implement a diagnostic interceptor:

import java.net.Socket;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class FDTracker {
    // Keyed weakly on the Socket itself: entries vanish once a socket is
    // garbage collected, so whatever remains is still reachable and possibly open.
    private static final Map<Socket, Exception> openSockets =
        Collections.synchronizedMap(new WeakHashMap<>());

    public static void trackSocket(Socket s) {
        openSockets.put(s, new Exception("Socket created at:"));
    }

    public static void dumpLeaks() {
        synchronized (openSockets) {            // required when iterating a synchronizedMap
            openSockets.forEach((socket, stack) -> {
                if (!socket.isClosed()) {
                    System.err.println("Possibly leaked socket: " + socket);
                    stack.printStackTrace();
                }
            });
        }
    }
}

// Usage:
Socket s = new Socket();
FDTracker.trackSocket(s);
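
To keep the tracker useful in a long-running domain, one option (a sketch, assuming you are free to start a daemon thread in the application) is to dump candidates on a schedule:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FDTrackerScheduler {
    public static void start() {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fd-tracker-dump");
                t.setDaemon(true);  // never block JVM shutdown
                return t;
            });
        // Log suspected leaks every 5 minutes
        scheduler.scheduleAtFixedRate(FDTracker::dumpLeaks, 5, 5, TimeUnit.MINUTES);
    }
}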

Advanced procfs analysis:

# Continuous FD growth monitoring:
while true; do 
    echo "$(date) $(ls /proc/PID/fd | wc -l)"; 
    sleep 1; 
done

# Network-specific FD analysis (all TCP/UDP sockets owned by the process):
ss -tunap | grep "pid=PID"

While investigating, implement these controls:

# Increase the FD limit (temporary; only applies to processes launched from this
# shell, so restart the server afterwards, or use prlimit on the running PID):
ulimit -n 65535

# Emergency inspection: enumerate the process's socket FDs.
# A shell cannot close another process's descriptors; as a risky last resort
# you can attach with gdb and call close() on a specific FD number.
for fd in /proc/PID/fd/*; do
    if [[ $(readlink "$fd") == socket:* ]]; then
        echo "Socket FD: ${fd##*/}";
    fi
done
# e.g. gdb -p PID --batch -ex 'call (int)close(123)'

Longer-term fixes:

  1. Implement connection pooling for database/network connections (a sketch follows this list)
  2. Add finally blocks around socket cleanup where try-with-resources is not available
  3. Use try-with-resources on Java 7+:
try (Socket s = new Socket(host, port);
     InputStream is = s.getInputStream()) {
    // socket operations
}
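
To make item 1 concrete, here is an illustrative bounded pool (the class name and sizing are invented for the sketch; in production prefer your driver's or application server's built-in pool):

import java.io.IOException;
import java.net.Socket;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// At most maxSize sockets are ever open at once; no validation or eviction,
// so treat this as a shape, not a production pool.
public class SimpleSocketPool {
    private final String host;
    private final int port;
    private final Semaphore permits;
    private final Queue<Socket> idle = new ConcurrentLinkedQueue<>();

    public SimpleSocketPool(String host, int port, int maxSize) {
        this.host = host;
        this.port = port;
        this.permits = new Semaphore(maxSize);
    }

    public Socket acquire() throws IOException, InterruptedException {
        permits.acquire();                  // block while maxSize sockets are checked out
        Socket s = idle.poll();
        if (s == null || s.isClosed()) {
            s = new Socket(host, port);     // lazily open up to maxSize connections
        }
        return s;
    }

    public void release(Socket s) {
        if (!s.isClosed()) {
            idle.offer(s);                  // return a healthy socket for reuse
        }
        permits.release();                  // a closed socket simply frees its slot
    }
}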

For production systems, use BPF tools:

# Track socket creation:
bpftrace -e 'tracepoint:syscalls:sys_enter_socket { 
    printf("PID %d creating socket\n", pid); 
}'

When encountering java.io.IOException: Too many open files in a Glassfish application, we're typically dealing with one of the following:

  • Actual file descriptor leaks
  • Socket connections not being properly closed
  • OS-level resource constraints

The mysterious "can't identify protocol" sockets in lsof output suggest one of the following (a quick thread-level check follows this list):

1. Internal JVM sockets (RMI, JMX)
2. ORM/database connection pool sockets
3. Custom network connections
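
One quick way to separate the first two causes is to see which threads are doing the network work. A sketch using the standard ThreadMXBean (the thread-name patterns in the comment are typical HotSpot names, not a guarantee):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class NetworkThreadSurvey {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            String name = info.getThreadName();
            // Names like "RMI TCP Connection(...)" point at internal JVM sockets;
            // pool-style names point at the ORM / connection pool.
            if (name.contains("RMI") || name.contains("JMX")
                    || name.toLowerCase().contains("pool")) {
                System.out.println(name + " -> " + info.getThreadState());
            }
        }
    }
}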

Enhanced lsof filtering:

# Track the unidentified-socket count over time
watch -n 60 "lsof -p PID | grep -c 'identify protocol'"

# Show only the unidentified sockets
lsof -p PID | grep 'identify protocol'

JVM-level monitoring:

# Flags that help attribute RMI/JMX socket activity (add to the JVM options)
-Dcom.sun.management.jmxremote.local.only=true
-Djava.rmi.server.logCalls=true

Implement this socket tracking utility in your codebase:

import java.lang.reflect.Field;

public class SocketTracker {
    public static void dumpOpenSockets() {
        try {
            // On JDK 8, java.net.SocketImpl keeps back-references to the owning
            // Socket / ServerSocket; on JDK 9+ this reflective access is blocked
            // by the module system unless --add-opens java.base/java.net is set.
            Class<?> socketImplClass = Class.forName("java.net.SocketImpl");

            Field socketField = socketImplClass.getDeclaredField("socket");
            socketField.setAccessible(true);

            Field serverSocketField = socketImplClass.getDeclaredField("serverSocket");
            serverSocketField.setAccessible(true);

            // There is no global registry of SocketImpl instances to iterate here;
            // pair this with a heap dump (jmap -dump) and walk the instances in MAT.
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

For connection pools:

<!-- In domain.xml (or via asadmin set), on the suspect JDBC connection pool;
     keep your existing datasource-classname and property elements -->
<jdbc-connection-pool
    name="leaking-pool"
    steady-pool-size="8"
    max-pool-size="32"
    pool-resize-quantity="2"
    idle-timeout-in-seconds="300"
    connection-leak-timeout-in-seconds="60"
    connection-leak-reclaim="true"/>
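
If you cannot touch the pool configuration, the same leak-timeout idea can be approximated at the application level. A sketch (the class and its reporting method are invented for illustration, not a Glassfish API):

import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;

// Records where each connection was checked out and reports any held past leakMillis.
public class LeakWatchingDataSource {
    private final DataSource delegate;
    private final long leakMillis;
    private final Map<Connection, Long> checkedOutAt = new ConcurrentHashMap<>();
    private final Map<Connection, Exception> origins = new ConcurrentHashMap<>();

    public LeakWatchingDataSource(DataSource delegate, long leakMillis) {
        this.delegate = delegate;
        this.leakMillis = leakMillis;
    }

    public Connection getConnection() throws SQLException {
        Connection c = delegate.getConnection();
        checkedOutAt.put(c, System.currentTimeMillis());
        origins.put(c, new Exception("Connection checked out here"));
        return c;
    }

    public void release(Connection c) throws SQLException {
        checkedOutAt.remove(c);
        origins.remove(c);
        c.close();                          // returns the connection to the real pool
    }

    // Call periodically (e.g. from a scheduled task) to log suspects with their origin.
    public void reportLeaks() {
        long now = System.currentTimeMillis();
        checkedOutAt.forEach((c, since) -> {
            if (now - since > leakMillis) {
                System.err.println("Connection held for " + (now - since) + " ms");
                Exception origin = origins.get(c);
                if (origin != null) {
                    origin.printStackTrace();
                }
            }
        });
    }
}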

Create a monitoring script (save as fd_monitor.sh):

#!/bin/bash
PID=$1
INTERVAL=60
LOG_FILE="/var/log/fd_leak_$PID.log"

while true; do
    DATE=$(date +"%Y-%m-%d %H:%M:%S")
    COUNT=$(lsof -p "$PID" | grep -c 'identify protocol')
    echo "$DATE - $COUNT unidentified sockets" >> $LOG_FILE
    
    # Capture stack traces when threshold exceeded
    if [ $COUNT -gt 100 ]; then
        jstack $PID >> $LOG_FILE
        lsof -p $PID >> $LOG_FILE
    fi
    
    sleep $INTERVAL
done

For Linux systems, use these advanced techniques:

# Trace socket creation system calls
strace -f -e trace=network -p PID 2>&1 | grep 'socket('

# Monitor TCP/UDP sockets owned by the process
ss -tunap | grep "pid=PID"

# Kernel-level FD accounting (allocated, unused, system-wide max; read-only)
cat /proc/sys/fs/file-nr
sysctl fs.file-max

Implement these best practices in your code:

try (Socket socket = new Socket(host, port);
     OutputStream os = socket.getOutputStream();
     InputStream is = socket.getInputStream()) {
    // Socket operations; all three resources are closed automatically,
    // even when an exception is thrown, so no finally block is needed
} catch (IOException e) {
    // Handle exception
}

For thread pools:

ExecutorService executor = Executors.newFixedThreadPool(10);
// ...
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    executor.shutdownNow();
}));
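
A slightly more graceful variant (a sketch; the 10-second grace period is arbitrary) drains tasks before interrupting them, which gives in-flight socket work a chance to close its descriptors:

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    executor.shutdown();                        // stop accepting new tasks
    try {
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            executor.shutdownNow();             // interrupt whatever is still running
        }
    } catch (InterruptedException e) {
        executor.shutdownNow();
        Thread.currentThread().interrupt();
    }
}));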