When building a stats server that needs to handle 10,000 TCP connections per second, you're dealing with several layers of potential bottlenecks. Even with modern 8-core servers, default OS configurations often impose artificial limits that need careful tuning.
- File Descriptor Limits: The default 1024 limit won't cut it (quick fix shown right after this list)
- TCP TIME_WAIT State: Can exhaust available ports
- Kernel Parameters: net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and friends
- NIC Queue Settings: IRQ balancing and ring buffers
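That first one is the quickest fix: raise the per-process file descriptor limit in the shell you launch from and make it persistent (100000 below simply mirrors the fs.file-max value set further down):
# Per-process limit for the current shell
ulimit -n 100000
# Persistent limits via /etc/security/limits.conf
*  soft  nofile  100000
*  hard  nofile  100000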
Here are the critical sysctl settings for CentOS:
# /etc/sysctl.conf
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 65536
net.core.netdev_max_backlog = 20000
fs.file-max = 100000
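These only take effect after a reload (or reboot); apply them and spot-check one:
sysctl -p
sysctl net.core.somaxconn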
Here's a basic Python implementation using asyncio:
import asyncio

counter = 0  # total TCP connections accepted so far

async def handle_client(reader, writer):
    # Count the connection, then close it straight away
    global counter
    counter += 1
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(
        handle_client, '0.0.0.0', 8888,
        backlog=10000
    )
    async with server:
        await server.serve_forever()

asyncio.run(main())
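As a crude smoke test from another machine you can open and drop a batch of connections; SERVER_IP is a placeholder and this assumes an nc build that supports -z. It only proves the listener works — it won't get anywhere near 10k connections per second, so use a real load generator for rate testing:
# Open and immediately close 1,000 connections
for i in $(seq 1 1000); do nc -z SERVER_IP 8888; done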
Essential tools to verify your configuration:
# Current connection statistics
ss -s
# Monitor TCP states
cat /proc/net/sockstat
# File descriptor usage (lsof double-counts threads; /proc/sys/fs/file-nr gives exact system-wide totals)
lsof | wc -l
# Network interrupts
cat /proc/interrupts | grep eth0
For production-grade implementations, consider:
- Using kernel bypass techniques like DPDK for extreme cases
- Implementing connection pooling if clients support it
- Exploring UDP instead of TCP if you can tolerate packet loss
- Distributing load across multiple ports if hitting single-port limits
While your 8-core box should handle this load, pay attention to:
- NIC queue configuration (ethtool -L)
- IRQ balancing (irqbalance service)
- NUMA awareness if using multi-socket systems (quick check after this list)
- PCIe bandwidth for high-speed NICs
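For the NUMA point, a quick check is which node the NIC is attached to, so you can keep the server's threads and memory on the same node (eth0 matches the interface used elsewhere here; -1 means NUMA doesn't apply):
# NUMA node the NIC hangs off
cat /sys/class/net/eth0/device/numa_node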
Digging a level deeper: at 10,000 TCP connections per second you'll run into the Linux networking stack's defaults long before you run into hardware limits. The 8-core CentOS box has plenty of CPU for a simple counter service; it's the out-of-the-box configuration that isn't built for this connection rate. Start by checking what you currently have:
# Check current kernel settings
sysctl net.ipv4.tcp_max_syn_backlog
sysctl net.core.somaxconn
sysctl net.core.netdev_max_backlog
These three parameters form the first bottleneck. The default tcp_max_syn_backlog (typically 128) limits how many half-open connections can sit in the SYN queue, somaxconn (usually 128) caps a listening socket's accept queue, and netdev_max_backlog (often 1000) restricts packets queued between the NIC driver and the kernel.
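Before changing anything, you can confirm whether the accept queue is actually the problem: for LISTEN sockets, ss reports the configured backlog in the Send-Q column and the number of connections waiting to be accepted in Recv-Q (port 8888 matches the example server above). If Recv-Q keeps brushing up against Send-Q, raise the limits:
# Send-Q = configured backlog, Recv-Q = current accept-queue depth
ss -ltn 'sport = :8888'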
# /etc/sysctl.conf optimizations
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn = 8192
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
At 10k connections per second with Linux's fixed 60-second TIME_WAIT, whichever side initiates the connections will chew through the default ephemeral port range (32768-60999, roughly 28k ports) in about 3 seconds. In practice this bites the clients, or the server itself if it makes outbound calls, but widening the range costs nothing. Expand it to 1024-65535:
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
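You can watch the resulting TIME_WAIT pressure directly:
# Sockets currently in TIME_WAIT (output includes one header line)
ss -tan state time-wait | wc -l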
For maximum performance, consider these architecture choices:
// Multi-threaded accept() loop in C
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/socket.h>
#include <unistd.h>
static int listen_sock;
static atomic_long counter;                 // shared connection counter
static void *worker(void *arg) {
    while (1) {
        int client = accept(listen_sock, NULL, NULL);   // kernel distributes accepts across threads
        if (client < 0) continue;
        atomic_fetch_add(&counter, 1);      // count the connection, then drop it
        close(client);
    }
}
int main(void) {
    int option = 1;
    listen_sock = socket(AF_INET, SOCK_STREAM, 0);
    // SO_REUSEPORT pays off most when each worker binds its own socket to the port
    setsockopt(listen_sock, SOL_SOCKET, SO_REUSEPORT, &option, sizeof(option));
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(8888) };
    bind(listen_sock, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_sock, 8192);              // backlog is capped by net.core.somaxconn
    for (long i = 0; i < sysconf(_SC_NPROCESSORS_ONLN); i++) {
        pthread_t thread;
        pthread_create(&thread, NULL, worker, NULL);
    }
    pause();                                // accepting happens in the workers
}
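To build and run the sketch above (the file name is arbitrary):
gcc -O2 -pthread accept_counter.c -o accept_counter
./accept_counter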
Even with 10G NICs, you might hit interrupt processing limits. Enable RSS (Receive Side Scaling) and spread interrupts across cores:
# Check available RSS queues
ethtool -l eth0
# Enable multi-queue
ethtool -L eth0 combined 8
# Balance IRQs across cores
service irqbalance start
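To confirm the interrupt work really is spreading out, watch per-CPU softirq time; mpstat comes with the sysstat package:
# %soft should be spread across cores, not pinned to CPU0
mpstat -P ALL 1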
Essential monitoring commands:
# Connection tracking
ss -s
# SYN backlog overflow
netstat -s | grep -i listen
# NIC statistics
ethtool -S eth0 | grep -i drop
If the ListenOverflows or ListenDrops counters are climbing, increase your backlog queues further.
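The same counters appear under their TcpExt names in nstat (part of iproute2), which is handy for watching them over time:
# Listen-queue overflow and drop counters
nstat -az | grep -i listen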