Optimizing File Descriptor Limits (ulimit -n) for High-Volume Linux Systems: Best Practices and Performance Considerations


During recent load testing of our enterprise application running on RHEL 5 (a Dell PowerEdge 1955 with 2× dual-core 2.66GHz CPUs and 8GB RAM), we hit a critical limitation when the system exhausted its default allocation of 1024 file descriptors after 24 hours of sustained operation. This discovery prompted a deep dive into file descriptor optimization for high-connection workloads.

The default configuration proves insufficient for applications handling substantial bidirectional traffic. Consider this scenario:

# Current limits (typically shown as):
$ ulimit -n
1024

# For a system needing:
- 1000 incoming connections
- 1000 outgoing connections
- additional descriptors for log files, config files, and shared libraries
= well over 2000 descriptors against a 1024 ceiling - a recipe for failure at scale
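
A quick way to confirm the diagnosis on a live system is to count the descriptors the application process currently holds and compare against the limit that new sessions for that user receive (the PID below is a placeholder, not a value from our environment):

# Count the FDs a running process holds right now
$ ls /proc/<pid>/fd | wc -l
# Per-process limit in effect for fresh sessions of the application user
$ su - appuser -c 'ulimit -n'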

Before adjusting limits, we must understand the kernel constraints. RHEL 5's default configuration has several relevant parameters:

# Check current system-wide maximum
$ cat /proc/sys/fs/file-max
797854

# Check kernel allocation behavior
$ cat /proc/sys/fs/file-nr
1664    0       797854
# (fields: file handles currently allocated, allocated-but-unused handles,
#  and the system-wide maximum - the same value as file-max)
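
Subtracting the first field from the third gives the remaining headroom. A minimal one-liner for that, assuming the usual 2.6-era three-field layout of file-nr:

# How many more file handles the kernel will still hand out system-wide
$ awk '{printf "in use: %d, headroom: %d\n", $1, $3 - $1}' /proc/sys/fs/file-nr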

For production systems expecting heavy loads, we recommend a multi-tiered configuration approach:

# 1. System-wide configuration (/etc/sysctl.conf):
fs.file-max = 500000
# fs.nr_open caps the per-process hard limit, but it only exists on newer
# kernels (2.6.25+); stock RHEL 5 kernels do not expose it, so sysctl -p
# will report it as an unknown key there
fs.nr_open = 1000000

# 2. User limits (/etc/security/limits.conf):
* soft nofile 100000
* hard nofile 500000
appuser soft nofile 250000
appuser hard nofile 500000

# 3. Application-specific initialization (C example):
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft limit up to the hard limit granted via limits.conf */
void increase_fd_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return;
    }
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit");
}
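
None of these files take effect on their own: the sysctl settings must be reloaded, and limits.conf is only read when a new session is opened through PAM. A quick way to apply and verify, run as root:

# Load the new sysctl values without a reboot
$ sysctl -p
# Confirm the per-user limits from a fresh session for the application account
$ su - appuser -c 'ulimit -Sn; ulimit -Hn'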

While increasing limits is necessary, we must consider:

  • Memory overhead (~1KB per FD for kernel structures)
  • Ephemeral port range constraints (a quick check is shown after this list)
  • TCP/IP stack tuning requirements
  • Monitoring strategy for FD leaks
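
On the ephemeral-port point: outgoing connections to a single destination IP:port are bounded by the local port range no matter how high the FD limit goes. Checking the range (the values shown are the usual RHEL 5 defaults):

# Local port range available for outgoing connections
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
# => roughly 28000 concurrent outgoing connections per destination tuple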

For comprehensive monitoring, implement this check in your application:

#!/bin/bash
# FD usage monitor - run as appuser so ulimit -n reflects that user's limit.
# Note: lsof lists every open file (sockets, libraries, mmaps), so this is a
# deliberately conservative over-estimate of descriptor usage.
threshold=90
current=$(lsof -u appuser 2>/dev/null | tail -n +2 | wc -l)
max=$(ulimit -n)
percentage=$((100 * current / max))

if [ "$percentage" -gt "$threshold" ]; then
    logger -p local0.warn "FD usage critical: $current/$max ($percentage%)"
fi
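
Dropping the script into cron keeps the check running continuously; a sketch, assuming it is saved to a hypothetical /usr/local/bin/check_fd_usage.sh:

# /etc/cron.d/fd-monitor - run the check every 5 minutes as appuser
*/5 * * * * appuser /usr/local/bin/check_fd_usage.sh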

For our specific workload (2000 concurrent connections + file operations), we implemented:

# Final production configuration:
# /etc/sysctl.conf:
fs.file-max = 524288
net.ipv4.ip_local_port_range = 1024 65535

# /etc/security/limits.conf:
appuser soft nofile 200000
appuser hard nofile 400000

# Application bootstrap:
echo "Setting FD limit..."
# Note: prlimit ships with newer util-linux releases and is not part of a
# stock RHEL 5 install; on older systems use "ulimit -n 400000" in the
# startup script instead.
prlimit --pid $$ --nofile=400000:400000

To see why the default ceiling was reached so quickly, it helps to look at the application's connection-handling architecture, sketched below in Node.js:

// Example connection handling in Node.js showing FD usage (illustrative only)
const net = require('net');
const connections = [];

for (let i = 0; i < 1500; i++) {
    // Each listening socket consumes a file descriptor, and every
    // connection it later accepts consumes another one
    const server = net.createServer();
    server.listen(3000 + i);
    connections.push(server);

    // Each outgoing connection also consumes an FD; connect back to one of
    // our own listeners so the example is self-contained
    const client = net.connect({port: 3000 + i});
    client.on('error', () => {});  // tolerate a refused connect while the listener starts
    connections.push(client);
}
// ~4500 descriptors in total (1500 listeners + 1500 outgoing + 1500 accepted),
// far beyond the default limit of 1024

The maximum file descriptors a system can handle depends on multiple factors:

  • Kernel-level limits (/proc/sys/fs/file-max)
  • User-level limits (ulimit -n)
  • Per-process memory constraints
  • System-wide file-nr allocation

Check current system-wide limits with:

cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr

For high-volume systems expecting ~2000 connections:

# Permanent system-wide configuration
echo "fs.file-max = 500000" >> /etc/sysctl.conf
echo "* soft nofile 100000" >> /etc/security/limits.conf
echo "* hard nofile 150000" >> /etc/security/limits.conf

Here's how to programmatically handle FD limits in a Python application:

import resource

def increase_file_descriptors(target=100000):
    # An unprivileged process can raise its soft limit only up to the existing
    # hard limit; raising the hard limit itself requires root privileges
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"Current limits: soft={soft}, hard={hard}")
    
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        print(f"Successfully increased to {target}")
    except ValueError as e:
        print(f"Failed to increase limit: {e}")
        # Fallback strategy
        new_target = min(target, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_target, hard))
        print(f"Set to maximum allowed: {new_target}")

When implementing systems with high FD counts:

  • Monitor FD usage per process (ls /proc/<pid>/fd | wc -l) rather than a bare lsof | wc -l, which overcounts; a simple trend check is sketched below
  • Watch for memory pressure (each FD consumes roughly 1KB of kernel memory)
  • Implement connection pooling where possible
  • Use epoll on Linux (kqueue is the BSD equivalent) for efficient event notification
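
On the first point, descriptor leaks usually show up as a count that climbs steadily under constant load and never comes back down. A minimal trend check, assuming the PID placeholder is replaced and /var/log/fd-usage.log is writable:

# Log the FD count once a minute; steady growth under constant load suggests a leak
while true; do
    echo "$(date '+%F %T') $(ls /proc/<pid>/fd | wc -l)" >> /var/log/fd-usage.log
    sleep 60
done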