When dealing with 50,000+ concurrent TCP connections where 95% of packets are under 150 bytes, traditional network tuning approaches often fall short. The per-packet overhead of the TCP and IPv4 headers (40 bytes combined, before Ethernet framing) becomes significant relative to the payload, so the limiting factor shifts from raw bandwidth to per-packet processing cost.
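To put that overhead in perspective, a rough back-of-the-envelope calculation (assuming 20-byte TCP and 20-byte IPv4 headers, plus 38 bytes of Ethernet preamble, framing and inter-frame gap per packet):
payload            = 100 bytes
headers + framing  = 40 + 38 = 78 bytes
wire efficiency    = 100 / (100 + 78) ≈ 56%
Nearly half the wire capacity of a typical packet in this workload goes to protocol overhead, which is why per-packet cost matters more than raw throughput here.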
Your Broadcom NetXtreme II BCM57711E 10 Gigabit NIC provides hardware offload capabilities we can leverage:
# Verify current offload settings
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
tcp-segmentation-offload: on
generic-segmentation-offload: on
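Since there is little to segment at these packet sizes, checksum offload and scatter-gather arguably matter more than TSO/GSO here, so check those as well:
# Check checksum and scatter-gather offloads too
ethtool -k eth0 | grep -E 'checksumming|scatter-gather'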
Here's an enhanced /etc/sysctl.conf configuration specifically for small-packet scenarios:
# Socket buffers optimized for small packets
net.core.rmem_default = 16384
net.core.wmem_default = 16384
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# TCP window scaling for high latency
net.ipv4.tcp_window_scaling = 1
# Reduce overhead for small packets
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_base_mss = 256
# Connection management
net.ipv4.tcp_max_syn_backlog = 3240000
net.core.somaxconn = 3240000
net.ipv4.tcp_max_tw_buckets = 1440000
# Disable features that hurt small packets
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_dsack = 0
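After editing the file, the new values can be loaded without a reboot:
# Apply the updated sysctl settings
sudo sysctl -p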
For small packets, proper interrupt coalescing is crucial to prevent CPU overload:
# Check current settings
ethtool -c eth0
# Recommended settings for 10Gbps small packets:
sudo ethtool -C eth0 rx-usecs 8 rx-frames 32 tx-usecs 8 tx-frames 32
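Note that ethtool changes are lost on reboot. One way to persist them (a sketch assuming the classic ifupdown setup in /etc/network/interfaces; adapt to however your interface is actually configured) is a post-up hook:
# /etc/network/interfaces (fragment; existing address settings omitted)
iface eth0 inet static
    post-up ethtool -C eth0 rx-usecs 8 rx-frames 32 tx-usecs 8 tx-frames 32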
Consider implementing these patterns in your server code:
// Example: batch small writes and flush once the buffer fills up
// or a maximum delay has passed (requires <sys/socket.h> and <time.h>;
// buffer, buffer_len and last_send are fields of your own connection struct)
void send_buffered(int sockfd, struct connection *conn) {
    time_t now = time(NULL);
    if (conn->buffer_len >= OPTIMAL_PACKET_SIZE ||
        (now - conn->last_send) > MAX_BUFFER_DELAY) {
        send(sockfd, conn->buffer, conn->buffer_len, 0);
        conn->buffer_len = 0;
        conn->last_send = now;
    }
}
// Use MSG_MORE flag when appropriate
send(sockfd, data, len, MSG_MORE);
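As a sketch of the usual MSG_MORE pattern (the header/body buffers below are hypothetical): the flag tells the kernel that more data is coming, so consecutive small writes get coalesced into one segment instead of going out as separate packets.
// First write is held back by the kernel because of MSG_MORE
send(sockfd, header, header_len, MSG_MORE);
// Second write without the flag flushes header + body as one segment
send(sockfd, body, body_len, 0);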
Essential metrics to track after changes:
# Packet error statistics
ethtool -S eth0 | grep -E 'discard|error'
# Socket buffer utilization
cat /proc/net/sockstat
# TCP retransmission counters
netstat -s | grep -i retrans
When dealing with high-density TCP connections (50,000+) transmitting primarily small packets (1-150 bytes) on gigabit networks, traditional TCP stack configurations often underperform. The combination of connection overhead and packet processing creates bottlenecks that prevent full utilization of available bandwidth.
Based on your configuration, here are critical adjustments for Ubuntu servers:
# Network buffer optimizations
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
# Connection handling
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 10000
net.ipv4.tcp_max_syn_backlog = 40960
# TCP algorithm tuning
net.ipv4.tcp_sack = 1
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_adv_win_scale = 1
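Once loaded, it is worth spot-checking that the kernel actually accepted the values:
# Confirm a few of the applied settings
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.somaxconn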
For Broadcom NetXtreme II adapters, consider these ethtool settings:
# Disable LRO which can hurt small packet performance
ethtool -K eth0 lro off
# Enable multi-queue support
ethtool -L eth0 combined 8
# Tune interrupt coalescing (re-check the result with ethtool -c eth0)
ethtool -C eth0 rx-usecs 50 tx-usecs 50
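With multiple queues enabled, also confirm that the NIC's interrupts are actually spread across CPU cores (exact interrupt names for the bnx2x driver vary by system):
# Show supported vs. configured queue counts
ethtool -l eth0
# Check that per-queue interrupts are distributed across CPUs
grep eth0 /proc/interrupts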
When writing socket-handling code for this scenario:
// Example of optimized socket options in C
// (requires <sys/socket.h>, <netinet/in.h> and <netinet/tcp.h>)
int sock = socket(AF_INET, SOCK_STREAM, 0);
int optval = 1;
// Disable Nagle's algorithm so small writes are not delayed
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
// Increase per-socket buffer sizes (the kernel doubles the requested value)
int rcvbuf = 1024 * 1024;
int sndbuf = 1024 * 1024;
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
// Enable TCP quick ACKs (not permanent; reapply after reads if needed)
setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &optval, sizeof(optval));
Essential commands to verify performance:
# Real-time network stats
sar -n DEV 1
# Connection summary and listening sockets
ss -s
ss -tulnp
# Detailed NIC statistics
ethtool -S eth0 | grep -E "discard|error|drop"
For extreme scenarios, consider these additional parameters:
# Increase TCP memory pressure thresholds
net.ipv4.tcp_mem = 196608 262144 393216
# Shorten FIN-WAIT-2 timeout and allow TIME_WAIT reuse for outbound connections
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
# Optimize SYN packet handling
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_max_orphans = 65536
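Parameters like these are easiest to trial at runtime before committing them to /etc/sysctl.conf:
# Test a single value live, then persist it once verified
sudo sysctl -w net.ipv4.tcp_fin_timeout=15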