Generic Receive Offload (GRO) coalesces multiple incoming TCP segments into larger logical units before passing them up to the IP stack, significantly reducing per-packet processing overhead. Unlike its hardware predecessor LRO, GRO is implemented in software in the kernel's receive path: drivers (including e1000e, which drives Intel's 82571EB series) hand incoming packets to the GRO layer via napi_gro_receive() rather than delivering them directly. Conceptually, the processing looks like this:
// Simplified GRO processing logic (conceptual)
while (packet_queue_not_empty) {
    current_packet = dequeue_packet();
    if (matches_existing_flow(current_packet)) {
        // Same flow and in-order: append payload to the held packet
        merge_packets(flow, current_packet);
        update_flow_timer(flow);
    } else {
        // No match: start tracking a new flow for this packet
        create_new_flow(current_packet);
    }
}
Packet Modification Transparency: GRO is completely transparent to both endpoints' TCP stacks. It neither modifies nor generates TCP ACKs; it only coalesces in-order segments belonging to the same flow (matched on the source/destination IP and port 4-tuple). Payloads and headers survive intact, and only the segmentation boundaries change during the merge.
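As a minimal sketch of what that merge does (plain userspace C, not kernel code; the gro_flow struct and gro_merge() function are illustrative names introduced here, not a real API):
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative flow state: the first segment's headers are kept
 * verbatim; only the payload buffer grows as segments are merged. */
struct gro_flow {
    uint8_t  headers[64];      /* Ethernet/IP/TCP headers of segment #1 */
    uint8_t  payload[65536];   /* coalesced payload, capped at 64KB */
    size_t   payload_len;
    uint32_t next_seq;         /* next expected TCP sequence number */
};

/* Append a segment's payload if it is the in-order continuation of
 * this flow; returns 0 on success, -1 if it cannot be merged. */
static int gro_merge(struct gro_flow *flow, const uint8_t *seg_payload,
                     size_t seg_len, uint32_t seg_seq)
{
    if (seg_seq != flow->next_seq)                   /* sequence gap */
        return -1;
    if (flow->payload_len + seg_len > sizeof(flow->payload))
        return -1;                                   /* 64KB cap hit */
    memcpy(flow->payload + flow->payload_len, seg_payload, seg_len);
    flow->payload_len += seg_len;
    flow->next_seq += (uint32_t)seg_len;             /* headers untouched */
    return 0;
}
The point of the sketch: merging only ever appends payload and advances the expected sequence number, so the receiving TCP stack sees one large, valid segment rather than anything rewritten.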
Timeout Mechanisms: GRO implementations typically flush coalesced packets on several triggers (sketched in code after this list):
- End of a NAPI poll cycle, or expiry of the optional per-device flush timer (gro_flush_timeout, off by default)
- Packet sequence number gap detection
- TCP PSH flag reception
- Maximum coalesced size threshold (64KB, the IPv4 total-length limit)
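Continuing the illustrative sketch from above (again hypothetical C reusing the gro_flow struct, not the actual kernel logic):
#include <stdbool.h>
#include <stdint.h>

#define GRO_MAX_COALESCED 65536u  /* the 64KB cap from the list above */

/* True if the held flow should be flushed up the stack now. */
static bool gro_should_flush(const struct gro_flow *flow, uint32_t seg_seq,
                             bool psh_set, bool poll_cycle_done)
{
    return poll_cycle_done                         /* NAPI cycle/timer over */
        || seg_seq != flow->next_seq               /* sequence gap */
        || psh_set                                 /* sender set TCP PSH */
        || flow->payload_len >= GRO_MAX_COALESCED; /* size threshold */
}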
The observed uneven bandwidth distribution across the four VPN tunnels (roughly 200MBps/200MBps/1MBps/1MBps instead of an even split) stems from GRO's interaction with window scaling. When window scaling is enabled:
// Problem scenario pseudocode
if (window_scaling_enabled) {
    GRO_holds_packets_longer();        // larger windows -> more segments eligible to coalesce
    TCP_may_timeout_waiting_for_ACK(); // the added delay can stall some flows entirely
}
Disabling GRO (or window scaling) forces more immediate packet delivery to the stack, explaining why the bandwidth distribution becomes even. This is particularly noticeable in forwarding setups where the intermediate device doesn't terminate TCP connections.
To verify GRO-related issues:
# Check GRO status
ethtool -k eth0 | grep generic-receive-offload
# Disable GRO temporarily for testing
ethtool -K eth0 gro off
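# (re-enable afterwards with: ethtool -K eth0 gro on)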
# Monitor receive-path (softirq) statistics
cat /proc/net/softnet_stat
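# (per-CPU rows of hex counters: packets processed, dropped, time_squeeze, ...)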
For VPN forwarding scenarios:
- Consider adjusting the per-device GRO flush timer; the sysfs value is in nanoseconds (2000000 = 2ms), and since it doesn't persist across reboots, re-apply it from a post-up hook or startup script:
echo 2000000 > /sys/class/net/eth0/gro_flush_timeout
- Test with different NIC driver versions - Intel's e1000e driver has seen significant GRO improvements in later versions
- For critical applications, evaluate ethtool -C to tune interrupt coalescing parameters (e.g. ethtool -C eth0 rx-usecs 50)
- Spread flows across CPUs with Receive Packet Steering (RPS):
echo "ffff" > /sys/class/net/eth0/queues/rx-0/rps_cpus
When disabling GRO outright isn't desirable, combining the flush-timer adjustment with RPS/queue affinity can even out flows while keeping the CPU savings.
For deeper technical understanding:
- Linux kernel documentation: Documentation/networking/scaling.txt
- Intel NIC optimization guides for specific controller families
- Research papers on TCP offload engine (TOE) architectures
- TCP/IP Architecture, Design and Implementation (Wiley)