When dealing with high-latency networks (100ms RTT in your case), traditional TCP configurations often underutilize available bandwidth. Your current setup shows:
- TCP Window Size: 5.2MB (well configured)
- Retransmission Rate: 0.29% (9018144/3085179704)
- Average Congestion Window: 3.3MB
The 200Mbps throughput suggests the congestion window isn't scaling properly despite using TCP Scalable. Let's examine why.
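A quick back-of-the-envelope check supports this: steady-state TCP throughput is roughly cwnd / RTT, and plugging in the 3.3MB average window and 100ms RTT quoted above lands in the same ballpark as the measured 200Mbps:

```shell
# Throughput ceiling implied by the observed congestion window:
# throughput ≈ cwnd / RTT
cwnd_bytes=3300000          # 3.3 MB average window from the trace
rtt_ms=100                  # 100 ms RTT
mbps=$(( cwnd_bytes * 8 / rtt_ms / 1000 ))
echo "${mbps} Mbps"         # → 264 Mbps, close to the 200 Mbps observed
```

So the measured throughput is consistent with the window simply not growing, rather than with some other bottleneck.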
Your configuration shows proper window scaling capability:
```
# sysctl values already set
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
```
However, the average congestion window (owin) of 3.3MB is below the maximum advertised window (5.2MB). This suggests one of the following:
- Insufficient buffer space for the congestion algorithm to grow
- Packet loss triggering unnecessary window reduction
- Delayed ACKs causing window growth stagnation
Add these to your existing configuration:
```
# Enable BBR for better high-latency performance
echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf

# Optimize buffer management
echo "net.ipv4.tcp_adv_win_scale=2" >> /etc/sysctl.conf
echo "net.ipv4.tcp_app_win=31" >> /etc/sysctl.conf

# Keep the congestion window open across idle periods
echo "net.ipv4.tcp_slow_start_after_idle=0" >> /etc/sysctl.conf

# Apply changes
sysctl -p
```
Use these tools to verify improvements:
```
# Real-time monitoring
ss -t -i -n -p state established '( dport = :5201 )'

# TCP diagnostics
tcptrace -l --csv your_capture.pcap > analysis.csv
```
Key metrics to watch:
- Congestion window size over time
- Retransmission patterns
- RTT variance
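A minimal sketch for tracking the first metric, congestion window over time, built on the same `ss` filter as above (the port 5201, ten-sample count, and one-second interval are assumptions to adjust for your test):

```shell
#!/bin/sh
# Sample cwnd and rtt of established iperf3 connections once per second.
for i in $(seq 1 10); do
    ts=$(date +%H:%M:%S)
    # ss prints e.g. "... rtt:100.1/0.2 ... cwnd:412 ..."; keep just those fields
    ss -t -i -n state established '( dport = :5201 or sport = :5201 )' \
        | grep -o 'cwnd:[0-9]*\|rtt:[0-9.]*/[0-9.]*' \
        | xargs echo "$ts"
    sleep 1
done
```

Note that `ss` reports cwnd in MSS units, so multiply by the MSS (typically 1448 bytes) before comparing against buffer sizes.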
Consider testing these algorithms:
```
# Available algorithms
sysctl net.ipv4.tcp_available_congestion_control

# Try BBR (often best for high latency)
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Or try HTCP
modprobe tcp_htcp
sysctl -w net.ipv4.tcp_congestion_control=htcp
```
Your buffer sizes are good, but ensure proper allocation:
```
# Check actual buffer allocation
cat /proc/sys/net/ipv4/tcp_mem
cat /proc/net/sockstat

# For persistent configuration:
echo "net.core.rmem_max=16777216" >> /etc/sysctl.conf
echo "net.core.wmem_max=16777216" >> /etc/sysctl.conf
echo "net.ipv4.tcp_rmem=4096 87380 16777216" >> /etc/sysctl.conf
echo "net.ipv4.tcp_wmem=4096 65536 16777216" >> /etc/sysctl.conf
```
Additional tweaks for high-latency scenarios:
```
# Enable F-RTO for better recovery from spurious retransmission timeouts
echo "net.ipv4.tcp_frto=2" >> /etc/sysctl.conf

# Reuse TIME-WAIT sockets for outgoing connections (requires timestamps)
echo "net.ipv4.tcp_tw_reuse=1" >> /etc/sysctl.conf

# Keep SACK and DSACK enabled - disabling them cripples loss recovery
# on long-RTT paths
echo "net.ipv4.tcp_sack=1" >> /etc/sysctl.conf
echo "net.ipv4.tcp_dsack=1" >> /etc/sysctl.conf
```
Stepping back to first principles: on a high-latency path (100ms RTT in your case), the fundamental constraint is the bandwidth-delay product (BDP). For 790Mbps with 100ms RTT:
BDP = Bandwidth * RTT = (790 * 10^6 bits/sec) * 0.1 sec = 79 * 10^6 bits / 8 = 9.875MB
Your current window settings (5.2MB) are below this theoretical requirement. While you've increased buffer sizes, several other factors need consideration.
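The calculation generalizes; a small helper (hypothetical, not part of any tool) makes it easy to recompute the required window for other bandwidth/RTT combinations:

```shell
# bdp_bytes MBPS RTT_MS -> window size in bytes needed to fill the pipe
bdp_bytes() {
    echo $(( $1 * 1000000 * $2 / 1000 / 8 ))
}
bdp_bytes 790 100   # → 9875000 bytes ≈ 9.875 MB, matching the figure above
```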
Your sysctl settings show good starting points, but let's analyze the key metrics from your test:
```
# Current TCP memory settings
echo "8192 7061504 7061504" > /proc/sys/net/ipv4/tcp_rmem
echo "8192 7061504 7061504" > /proc/sys/net/ipv4/tcp_wmem
echo 7061504 > /proc/sys/net/core/rmem_max
echo 7061504 > /proc/sys/net/core/wmem_max
```
While 'scalable' is a reasonable choice, it's worth comparing the alternatives:
```
# Available congestion controls
cat /proc/sys/net/ipv4/tcp_available_congestion_control

# Try BBR for high-BDP networks
echo "bbr" > /proc/sys/net/ipv4/tcp_congestion_control
```
BBR often outperforms traditional loss-based algorithms in high-latency scenarios by modeling the network path.
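Before switching, it's worth confirming BBR is actually available on your kernel (mainline has shipped the tcp_bbr module since 4.9); on kernels before 4.13, BBR also needs the fq qdisc for pacing. A sketch, assuming the interface name eth0:

```shell
# Load the BBR module and verify it registered
modprobe tcp_bbr 2>/dev/null
if grep -qw bbr /proc/sys/net/ipv4/tcp_available_congestion_control; then
    echo "bbr available"
    # On pre-4.13 kernels, pair BBR with the fq qdisc for pacing
    tc qdisc replace dev eth0 root fq
else
    echo "bbr missing - kernel too old?"
fi
```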
These additional settings can significantly impact performance:
```
# Increase TCP window scaling
echo 1 > /proc/sys/net/ipv4/tcp_window_scaling

# Enable TCP timestamps for better RTT estimation
echo 1 > /proc/sys/net/ipv4/tcp_timestamps

# Adjust keepalive settings
echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
```
When benchmarking, use these iperf3 parameters for more accurate results:
```
# On server
iperf3 -s -p 5201

# On client (with proper window size)
iperf3 -c server_ip -p 5201 -t 120 -w 8M -P 4 -O 3 -R
```
Key flags:
- -w 8M: Sets the socket buffer (and hence window) size to 8MB per stream
- -P 4: Uses 4 parallel streams
- -O 3: Omits first 3 seconds for warmup
- -R: Reverse mode (server-to-client)
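The -w/-P combination is chosen so the aggregate window comfortably exceeds the ~9.875MB BDP computed earlier; the arithmetic (illustrative, using the numbers from this answer):

```shell
bdp=9875000     # bytes, from 790 Mbps x 100 ms
streams=4
per_stream=$(( bdp / streams ))
echo "${per_stream} bytes/stream"   # → 2468750, ≈ 2.5 MB; 8M leaves ample headroom
```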
For extreme performance needs, consider these kernel parameters:
```
# Increase socket buffers
echo "net.core.rmem_default=12582912" >> /etc/sysctl.conf
echo "net.core.wmem_default=12582912" >> /etc/sysctl.conf
echo "net.core.rmem_max=12582912" >> /etc/sysctl.conf
echo "net.core.wmem_max=12582912" >> /etc/sysctl.conf

# TCP memory settings (min, default, max)
echo "net.ipv4.tcp_rmem=4096 12582912 25165824" >> /etc/sysctl.conf
echo "net.ipv4.tcp_wmem=4096 12582912 25165824" >> /etc/sysctl.conf

# Apply changes
sysctl -p
```
Use these commands to verify your settings during testing:
```
# Real-time TCP statistics
ss -t -i -n -p

# Detailed socket information
cat /proc/net/tcp

# Network interface statistics
ethtool -S eth0
```
Remember that optimal settings depend on your specific network characteristics. Always test changes methodically and measure their impact.
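One way to make that testing methodical: run the same iperf3 workload under each available congestion control algorithm and record the throughput. A sketch, assuming root, an iperf3 server already listening at server_ip, and iperf3's JSON output mode (-J):

```shell
#!/bin/sh
# Compare congestion control algorithms under identical iperf3 runs.
SERVER=server_ip   # replace with your server's address
for cc in $(cat /proc/sys/net/ipv4/tcp_available_congestion_control); do
    sysctl -w net.ipv4.tcp_congestion_control="$cc" > /dev/null
    # -J emits JSON; pull the last bits_per_second (the end-of-run summary)
    bps=$(iperf3 -c "$SERVER" -t 30 -O 3 -J \
        | grep -o '"bits_per_second":[0-9.e+]*' | tail -1 | cut -d: -f2)
    echo "$cc: $bps bits/sec"
done
```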