After examining the network statistics using netstat -s, we're seeing concerning counters that keep growing:
[root@primary data]# netstat -s | grep buffer ; sleep 10 ; netstat -s | grep buffer
20560 packets pruned from receive queue because of socket buffer overrun
997586 packets collapsed in receive queue due to low socket buffer
20587 packets pruned from receive queue because of socket buffer overrun
998646 packets collapsed in receive queue due to low socket buffer
These counters can escalate to millions on systems with longer uptimes, indicating a chronic network stack issue.
The two key metrics we're dealing with:
- Packets pruned from receive queue: packets dropped outright because the socket receive buffer was full (actual data loss; the sender has to retransmit)
- Packets collapsed in receive queue: adjacent queued segments merged to reclaim memory while the buffer was under pressure (no data loss, but extra CPU work)
Here are the relevant sysctl parameters currently in use:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
For high-throughput systems, consider these additional optimizations:
# Increase the maximum receive buffer size
net.core.rmem_max = 33554432
# Increase the maximum send buffer size
net.core.wmem_max = 33554432
# Auto-tuning receive buffer (min, default, max)
net.ipv4.tcp_rmem = 4096 87380 33554432
# Auto-tuning send buffer (min, default, max)
net.ipv4.tcp_wmem = 4096 65536 33554432
# Increase the maximum number of remembered connection requests
net.ipv4.tcp_max_syn_backlog = 8192
# Increase the listen backlog
net.core.somaxconn = 8192
# Enable TCP window scaling
net.ipv4.tcp_window_scaling = 1
# Enable TCP timestamps for better RTT estimation
net.ipv4.tcp_timestamps = 1
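Before and after applying these values, it helps to confirm what the kernel is actually using. Below is a minimal sketch (assuming a Linux /proc filesystem) that reads each setting from /proc/sys and compares it with the values suggested above; the target numbers simply mirror the list in this section and are not authoritative.
#!/usr/bin/env python3
# Minimal sketch: compare live kernel values against the recommendations above.
# The target numbers mirror the list in this section; adjust them to your own plan.

RECOMMENDED = {
    "net.core.rmem_max": "33554432",
    "net.core.wmem_max": "33554432",
    "net.ipv4.tcp_rmem": "4096 87380 33554432",
    "net.ipv4.tcp_wmem": "4096 65536 33554432",
    "net.ipv4.tcp_max_syn_backlog": "8192",
    "net.core.somaxconn": "8192",
    "net.ipv4.tcp_window_scaling": "1",
    "net.ipv4.tcp_timestamps": "1",
}

def read_sysctl(name):
    # sysctl names map to /proc/sys paths with dots replaced by slashes
    path = "/proc/sys/" + name.replace(".", "/")
    with open(path) as f:
        return " ".join(f.read().split())

if __name__ == "__main__":
    for name, want in RECOMMENDED.items():
        have = read_sysctl(name)
        status = "OK" if have == want else "DIFFERS"
        print(f"{status:8} {name}: current='{have}' recommended='{want}'")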
Here's a Python script to monitor these metrics over time:
#!/usr/bin/env python3
import time
import subprocess

def get_netstat_stats():
    """Return the cumulative pruned and collapsed counters from `netstat -s`."""
    result = subprocess.run(['netstat', '-s'], capture_output=True, text=True)
    lines = result.stdout.split('\n')
    # netstat -s omits counters that are still zero, so default to 0 if absent
    pruned_lines = [l for l in lines if 'pruned from receive queue' in l]
    collapsed_lines = [l for l in lines if 'collapsed in receive queue' in l]
    pruned = int(pruned_lines[0].split()[0]) if pruned_lines else 0
    collapsed = int(collapsed_lines[0].split()[0]) if collapsed_lines else 0
    return pruned, collapsed

if __name__ == "__main__":
    print("Time\tPruned\tCollapsed\tPruned/s\tCollapsed/s")
    last_pruned, last_collapsed = get_netstat_stats()
    last_time = time.time()
    while True:
        time.sleep(10)
        current_pruned, current_collapsed = get_netstat_stats()
        current_time = time.time()
        # Rates are per second over the sampling interval
        pruned_rate = (current_pruned - last_pruned) / (current_time - last_time)
        collapsed_rate = (current_collapsed - last_collapsed) / (current_time - last_time)
        print(f"{time.ctime()}\t{current_pruned}\t{current_collapsed}\t{pruned_rate:.2f}\t{collapsed_rate:.2f}")
        last_pruned, last_collapsed = current_pruned, current_collapsed
        last_time = current_time
For extreme cases where socket buffers are consistently overflowing:
- Consider implementing application-level flow control
- Evaluate if your application is reading from sockets fast enough
- Check for CPU saturation that might prevent timely socket reads
- Consider using SO_RCVBUF/SO_SNDBUF socket options in your application code
Example C code for setting socket buffer sizes programmatically:
#include <sys/socket.h>
#include <stdio.h>

void set_socket_buffers(int sockfd) {
    int rcvbuf_size = 1024 * 1024; // 1 MB
    int sndbuf_size = 1024 * 1024; // 1 MB

    // The kernel doubles these values to allow for bookkeeping overhead and
    // caps them at net.core.rmem_max / net.core.wmem_max.
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size)) < 0) {
        perror("setsockopt SO_RCVBUF failed");
    }
    if (setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size)) < 0) {
        perror("setsockopt SO_SNDBUF failed");
    }
}
After making changes, verify the effective buffer sizes:
# Check current buffer sizes
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/wmem_max
# Or check per-socket buffer usage (filter by your application's name)
ss -temp -a | grep your_application_name
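Note that the per-socket value the kernel reports is not exactly what was requested: on Linux the requested SO_RCVBUF/SO_SNDBUF is roughly doubled for bookkeeping and capped at net.core.rmem_max / wmem_max. A quick Python sketch to see this for yourself (mirroring the 1 MB request in the C example above):
#!/usr/bin/env python3
# Sketch: request a receive buffer and read back what the kernel actually granted.
# On Linux the returned value is roughly double the request, capped by net.core.rmem_max.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
requested = 1024 * 1024  # 1 MB, matching the C example above
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {requested} bytes, kernel granted {granted} bytes")
s.close()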
When running netstat -s on a Linux system, you might encounter two particularly concerning metrics:
20560 packets pruned from receive queue because of socket buffer overrun
997586 packets collapsed in receive queue due to low socket buffer
These counters tend to grow rapidly, sometimes reaching millions within weeks of uptime. The "pruned" packets indicate data loss, while "collapsed" packets suggest suboptimal performance.
The primary causes for these issues are:
- Insufficient socket buffer sizes (rmem/wmem)
- Network traffic bursts exceeding buffer capacity
- Applications not reading data fast enough (see the check after this list)
- Suboptimal TCP stack tuning
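For the third cause, a quick indicator is whether established sockets carry a persistently large Recv-Q, which means data is sitting in the kernel waiting for the application. A rough sketch, assuming the usual `ss -tn` column order (State, Recv-Q, Send-Q, Local, Peer) and a hypothetical 1 MB threshold:
#!/usr/bin/env python3
# Rough sketch: flag established TCP sockets whose Recv-Q exceeds a threshold,
# which usually means the application is not draining the socket fast enough.
# Assumes the usual `ss -tn` column order: State, Recv-Q, Send-Q, Local, Peer.
import subprocess

THRESHOLD = 1024 * 1024  # 1 MB of unread data; pick a value that suits your workload

out = subprocess.run(["ss", "-tn"], capture_output=True, text=True).stdout
for line in out.splitlines()[1:]:          # skip the header line
    fields = line.split()
    if len(fields) < 5:
        continue
    state, recv_q = fields[0], int(fields[1])
    if recv_q > THRESHOLD:
        print(f"{fields[3]} <- {fields[4]}: Recv-Q={recv_q} ({state})")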
While you've already adjusted some sysctl parameters, here's a more comprehensive approach:
# Recommended settings for high-throughput systems
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
For applications handling high network loads, consider implementing:
// Example: Increasing socket buffer sizes in C
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
int recv_size = 32 * 1024 * 1024; // 32 MB
int send_size = 32 * 1024 * 1024; // 32 MB
// Requests above net.core.rmem_max / wmem_max are silently capped, and setting
// SO_RCVBUF explicitly disables the kernel's receive-buffer autotuning for this socket.
if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &recv_size, sizeof(recv_size)) < 0)
    perror("setsockopt SO_RCVBUF");
if (setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &send_size, sizeof(send_size)) < 0)
    perror("setsockopt SO_SNDBUF");
After applying changes, monitor the impact with:
# Continuous monitoring script
while true; do
    netstat -s | grep -E "pruned from receive queue|collapsed in receive queue"
    ss -tem
    sleep 10
done
Also check /proc/net/sockstat for memory allocation statistics.
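The TCP line in /proc/net/sockstat reports a "mem" field counted in pages, the same unit net.ipv4.tcp_mem uses. A small sketch to convert it into bytes, assuming the usual single-line "TCP:" format:
#!/usr/bin/env python3
# Sketch: report TCP socket memory from /proc/net/sockstat.
# The "mem" field is counted in pages; multiply by the system page size for bytes.
import resource

PAGE_SIZE = resource.getpagesize()

with open("/proc/net/sockstat") as f:
    for line in f:
        if line.startswith("TCP:"):
            fields = line.split()   # e.g. TCP: inuse 52 orphan 0 tw 18 alloc 55 mem 7
            stats = dict(zip(fields[1::2], fields[2::2]))
            pages = int(stats.get("mem", 0))
            print(f"TCP sockets in use: {stats.get('inuse')}, "
                  f"buffer memory: {pages} pages ({pages * PAGE_SIZE} bytes)")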
For extreme cases, consider:
- Implementing receive packet steering (RPS); a minimal sketch follows this list
- Adjusting NIC ring buffers with ethtool (-g to view, -G to change)
- Evaluating application architecture for bottlenecks
- Considering kernel upgrades for newer TCP stack improvements
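For the RPS item above: on Linux, RPS is enabled per receive queue by writing a hexadecimal CPU mask to /sys/class/net/<iface>/queues/rx-*/rps_cpus. A hedged sketch, where the interface name and CPU mask are placeholders to adapt to your hardware:
#!/usr/bin/env python3
# Sketch: enable receive packet steering (RPS) on every RX queue of an interface
# by writing a hexadecimal CPU mask to the queue's rps_cpus file (requires root).
# IFACE and CPU_MASK are placeholders; adjust for your hardware and CPU layout.
import glob

IFACE = "eth0"       # hypothetical interface name
CPU_MASK = "f"       # hex mask: CPUs 0-3

for path in glob.glob(f"/sys/class/net/{IFACE}/queues/rx-*/rps_cpus"):
    with open(path, "w") as f:
        f.write(CPU_MASK)
    print(f"wrote mask {CPU_MASK} to {path}")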