Optimizing Linux Network Bonding for Maximum Throughput: Solving Gigabit Link Aggregation Performance Issues


When implementing network bonding between two PowerEdge 6950 servers using Intel 82571EB adapters, I encountered suboptimal throughput despite proper link aggregation configuration. Initial tests showed:

# Single link performance
# Receiver: nc -l -p [port] > /dev/null
# Sender:
dd if=/dev/zero bs=1M count=8192 | nc [destination] [port]
# Result: ~98MB/s per link (expected for gigabit)

However, when bonding two interfaces with balance-rr mode, throughput plateaued at 100MB/s instead of the expected 200MB/s aggregate.

The Linux bonding documentation highlights a critical limitation: balance-rr mode causes packet reordering, which TCP can misinterpret as loss and answer with congestion-control backoff. The key parameter here is:

# Check the default value (reports net.ipv4.tcp_reordering = 3)
sysctl net.ipv4.tcp_reordering
# Raise the reordering tolerance
sysctl -w net.ipv4.tcp_reordering=127

While increasing this helped (from 70MB/s to 100MB/s), it wasn't the complete solution.

The breakthrough came from implementing multiple optimizations simultaneously:

# /etc/sysctl.conf network optimizations
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_reordering = 127
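
One step that is easy to miss: /etc/sysctl.conf is only read at boot, so reload it after editing and spot-check a value to confirm it took effect:

# Reload /etc/sysctl.conf and verify
sysctl -p
sysctl net.ipv4.tcp_reordering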

Critical bond interface settings in SLES:

# /etc/sysconfig/network/ifcfg-bond0
MTU='9216'
LINK_OPTIONS='txqueuelen 10000'
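
For completeness, the same ifcfg-bond0 file also needs the usual SLES bonding declarations. A minimal sketch, assuming the slaves are eth2 and eth4 (as in the ethtool commands below) and that the mode is set via module options; these variable names follow SLES 10/11-era conventions, so adjust to your release and hardware:

# Sketch: bonding declarations for the same file
BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=balance-rr miimon=100'
BONDING_SLAVE_0='eth2'
BONDING_SLAVE_1='eth4'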

And NIC buffer adjustments:

# For each bonded interface
ethtool -G eth2 rx 2048 tx 2048
ethtool -G eth4 rx 2048 tx 2048
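
The supported maximum varies by NIC, so check the current and pre-set maximum ring sizes before changing them:

# Show "Pre-set maximums" and "Current hardware settings"
ethtool -g eth2
ethtool -g eth4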

After applying all optimizations:

# Final throughput test
8589934592 bytes (8.6 GB) copied, 35.8489 seconds, 240 MB/s

The jumbo frames (MTU 9216) proved particularly impactful, reducing header overhead and CPU interrupts.
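
Jumbo frames only pay off if every hop carries them. A quick check is a non-fragmenting ping sized just under the MTU: 9188 bytes of ICMP payload plus 28 bytes of headers fills a 9216-byte packet (substitute your peer's address):

# Verify jumbo frames pass end to end without fragmentation
ping -M do -s 9188 -c 4 [peer-address]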

For those replicating this setup:

  1. Apply all sysctl changes before bond creation
  2. Set MTU consistently across all interfaces
  3. Monitor /proc/net/bonding/bond0 for errors (a quick check is shown after this list)
  4. Consider CPU affinity for interrupt handling
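
For point 3, /proc/net/bonding/bond0 reports the bonding mode, each slave's MII status, and a per-slave link failure counter; anything other than "up" with a failure count of 0 deserves a look:

# Quick health check on the bond and its slaves
grep -E 'Bonding Mode|MII Status|Link Failure Count' /proc/net/bonding/bond0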

Remember that balance-rr bonding requires both endpoints to be similarly configured for optimal results.


To recap the problem more generally: when implementing link aggregation between PowerEdge 6950 servers using Intel 82571EB adapters, many engineers hit an unexpected throughput ceiling. Despite bonding two 1Gbps interfaces in round-robin (balance-rr) mode, a single TCP stream often fails to reach the expected 200MB/s (2Gbps) aggregate.

Typical symptoms include:

  • 70-90MB/s throughput per direction
  • Poor scaling with bonding enabled
  • Significant performance delta between uni-directional and bi-directional tests

The fundamental limitation stems from how TCP handles packet reordering. When using balance-rr bonding:

  • Packets are striped across interfaces in sequence
  • Network stack may receive packets out of order
  • Out-of-order segments generate duplicate ACKs that TCP misreads as loss, causing spurious retransmissions and congestion-window backoff

These kernel parameters were the breakthrough in my setup:

# /etc/sysctl.conf
# net.ipv4.tcp_reordering defaults to 3
net.ipv4.tcp_reordering = 127
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Don't overlook hardware settings:

# Increase ring buffers
ethtool -G eth2 rx 2048 tx 2048
ethtool -G eth4 rx 2048 tx 2048

# Benchmark with jumbo frames
ifconfig bond0 mtu 9216
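
On systems where net-tools is deprecated, the iproute2 equivalent does the same thing; the bonding driver normally propagates the MTU change to its slaves:

# iproute2 equivalent of the ifconfig command above
ip link set dev bond0 mtu 9216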

For accurate measurements:

# Server A:
nc -l -p 5000 > /dev/null

# Server B:
dd if=/dev/zero bs=1M count=8192 | nc serverA 5000

# Monitor with:
sar -n DEV 1  # Or iftop -i bond0
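
It is also worth confirming that traffic really is striped across both slaves rather than collapsing onto one. During a transfer, the TX byte counters for eth2 and eth4 should advance at roughly the same rate:

# Sample slave TX counters twice, 10 seconds apart
for i in eth2 eth4; do cat /sys/class/net/$i/statistics/tx_bytes; done
sleep 10
for i in eth2 eth4; do cat /sys/class/net/$i/statistics/tx_bytes; done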

When balance-rr still underperforms:

  • Test LACP (802.3ad, mode 4) if your switches support it; note that it hashes per flow, so a single TCP stream stays at one link's speed, but multiple streams aggregate cleanly with no reordering (module options sketched below)
  • Consider balance-xor mode with the layer3+4 transmit hash policy, which gives the same per-flow behavior without requiring switch-side LACP
  • Evaluate specialized NICs with packet reordering engines
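
As a rough sketch of the LACP alternative, the bonding module can be loaded with the relevant options; the parameter names below are the standard bonding module parameters, and your distribution's network scripts may set them for you instead:

# Sketch: load bonding in 802.3ad (LACP) mode with a layer3+4 hash
modprobe bonding mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4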