During recent load testing of our enterprise application running on RHEL 5 (Dell PowerEdge 1955 hardware with two dual-core 2.66GHz CPUs and 8GB RAM), we hit a critical limitation: the system exhausted its default limit of 1024 file descriptors after 24 hours of sustained operation. This discovery prompted a deep dive into file descriptor optimization for high-connection workloads.
The default configuration proves insufficient for applications handling substantial bidirectional traffic. Consider this scenario:
# Current limits (typically shown as):
$ ulimit -n
1024

# For a system needing:
#   - 1000 incoming connections
#   - 1000 outgoing connections
#   - additional file operations
# => a recipe for failure at scale
Before adjusting limits, we must understand the kernel constraints. RHEL 5's default configuration has several relevant parameters:
# Check current system-wide maximum
$ cat /proc/sys/fs/file-max
797854

# Check kernel allocation behavior
# (fields: allocated handles, allocated-but-unused handles, maximum)
$ cat /proc/sys/fs/file-nr
1664    0       797854
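Both values can be raised at runtime; a sysctl write takes effect immediately but does not survive a reboot, so persist it in /etc/sysctl.conf (covered below):

$ sysctl -w fs.file-max=500000
fs.file-max = 500000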
For production systems expecting heavy loads, we recommend a multi-tiered configuration approach:
# 1. System-wide configuration (/etc/sysctl.conf):
fs.file-max = 500000
# fs.nr_open sets the per-process ceiling, but it only exists on
# kernels 2.6.25+; stock RHEL 5 (2.6.18) does not support it.
fs.nr_open = 1000000

# 2. User limits (/etc/security/limits.conf):
*        soft    nofile    100000
*        hard    nofile    500000
appuser  soft    nofile    250000
appuser  hard    nofile    500000

# 3. Application-specific initialization (C example):
#include <sys/resource.h>

void increase_fd_limit(void) {
    struct rlimit rl;
    /* raise the soft limit to whatever the hard limit allows */
    getrlimit(RLIMIT_NOFILE, &rl);
    rl.rlim_cur = rl.rlim_max;
    setrlimit(RLIMIT_NOFILE, &rl);
}
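After applying these settings, verify each tier took effect (a quick sanity check; substitute your own application user):

$ sysctl -p                                 # reload /etc/sysctl.conf
$ su - appuser -c 'ulimit -Sn; ulimit -Hn'
250000
500000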
While increasing limits is necessary, we must consider:
- Memory overhead (~1KB per FD for kernel structures)
- Ephemeral port range constraints (see the check below)
- TCP/IP stack tuning requirements
- Monitoring strategy for FD leaks
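The ephemeral port range deserves particular attention: each outgoing connection to a given destination consumes one local port, so the range caps outbound concurrency per destination. Check it with (the values shown are the usual RHEL 5 defaults):

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
# => roughly 28000 concurrent outbound connections to one destination IP:port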
For comprehensive monitoring, implement this check in your application:
#!/bin/bash
# Bash monitoring script for appuser's FD usage.
# Note: lsof -u counts open files across ALL of appuser's processes,
# while ulimit -n is a per-process limit, so treat this as a coarse
# early-warning signal rather than an exact measure.
threshold=90
current=$(lsof -u appuser | wc -l)
max=$(ulimit -n)
percentage=$((100 * current / max))

if [ "$percentage" -gt "$threshold" ]; then
    logger -p local0.warn "FD usage critical: $current/$max ($percentage%)"
fi
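To run the check periodically, a cron entry along these lines works (the script path is illustrative):

*/5 * * * * /usr/local/bin/fd_monitor.sh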
For our specific workload (2000 concurrent connections + file operations), we implemented:
# Final production configuration

# /etc/sysctl.conf:
fs.file-max = 524288
net.ipv4.ip_local_port_range = 1024 65535

# /etc/security/limits.conf:
appuser  soft  nofile  200000
appuser  hard  nofile  400000

# Application bootstrap:
echo "Setting FD limit..."
# prlimit ships with util-linux 2.21+ and is not available on stock
# RHEL 5; there, raise the limit with 'ulimit -n 400000' in the init
# script (as root) before dropping privileges.
prlimit --pid $$ --nofile=400000:400000

Note that the per-user hard limit (400000) stays comfortably below fs.file-max (524288), leaving headroom for the rest of the system.
To see why the default limit collapsed so quickly in the first place, here is a simplified Node.js model of our application's architecture:
// Example connection handling in Node.js showing FD usage
const net = require('net');
const connections = [];

for (let i = 0; i < 1500; i++) {
  // Each listening server consumes a file descriptor
  const server = net.createServer();
  server.listen(3000 + i);
  connections.push(server);

  // Each outgoing connection consumes another file descriptor
  const client = net.connect({ port: 4000 + i });
  client.on('error', () => {}); // ignore refused connections in this demo
  connections.push(client);
}
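With the demo running, you can watch the descriptor count climb from another shell (a sketch assuming a single node process):

# Count open descriptors for the node process
ls /proc/$(pidof node)/fd | wc -l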
The maximum number of file descriptors a system can handle depends on several factors:
- Kernel-level limits (/proc/sys/fs/file-max)
- User-level limits (ulimit -n)
- Per-process memory constraints
- Current system-wide allocation (/proc/sys/fs/file-nr)
Check current system-wide limits with:
cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr
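Then check the per-process limits for the current shell, which are what applications actually run up against:

ulimit -Sn    # soft limit (enforced)
ulimit -Hn    # hard limit (ceiling an unprivileged process can raise to)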
For high-volume systems expecting ~2000 connections:
# Permanent system-wide configuration
echo "fs.file-max = 500000" >> /etc/sysctl.conf
sysctl -p   # apply the sysctl change without a reboot
echo "* soft nofile 100000" >> /etc/security/limits.conf
echo "* hard nofile 150000" >> /etc/security/limits.conf
# limits.conf changes take effect at the next login session
Here's how to programmatically handle FD limits in a Python application:
import resource

def increase_file_descriptors(target=100000):
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"Current limits: soft={soft}, hard={hard}")
    try:
        # An unprivileged process may raise its soft limit up to the hard limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        print(f"Successfully increased to {target}")
    except ValueError as e:
        print(f"Failed to increase limit: {e}")
        # Fallback strategy: clamp the request to the hard limit
        new_target = min(target, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_target, hard))
        print(f"Set to maximum allowed: {new_target}")

if __name__ == "__main__":
    increase_file_descriptors()
When implementing high-FD systems:
- Monitor FD usage with lsof | wc -l (a rough count; /proc/sys/fs/file-nr gives a cheaper, more accurate system-wide figure)
- Watch for memory pressure (each FD consumes ~1KB)
- Implement connection pooling where possible
- Consider using epoll/kqueue for efficient event notification