Diagnosing and Optimizing Network Latency Between Linux Hosts: From 0.23ms to 0.1ms


When dealing with high-frequency messaging applications, even sub-millisecond latency differences matter. Your measured 0.23ms RTT between hosts suggests room for optimization. Let's break down the investigation methodology.
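
Before any tuning, pin down the baseline with a tight-interval measurement so later changes are comparable. A minimal sketch using plain ping and, optionally, sockperf (10.0.0.2 and port 11111 are placeholder values):

# 1000 probes at 10 ms spacing (sub-200 ms intervals may require root);
# look at the min/avg/max/mdev figures in the summary line
ping -c 1000 -i 0.01 10.0.0.2
# For a userspace echo benchmark closer to real traffic, sockperf reports
# percentile latencies; run "sockperf server -i 10.0.0.2 -p 11111" on the peer first
sockperf ping-pong -i 10.0.0.2 -p 11111 -t 10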

# Run the non-disruptive NIC self-tests to rule out hardware faults
ethtool -t eth0 online
ethtool -t eth1 online
# To measure base latency without switch interference, cable the two hosts
# back-to-back (direct connect) and repeat the RTT measurement over that link

First, establish a hardware baseline: the ethtool self-test will surface NIC-level faults. Next, check the ring buffer configuration (shown with ethtool, which works for Intel and most other NICs):

# Check NIC ring buffer settings
ethtool -g eth0
# Sample output:
# RX: 4096
# TX: 4096
# Smaller rings cut queuing delay but can drop packets under traffic bursts:
ethtool -G eth0 rx 256 tx 256
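
After shrinking the rings, watch the drop counters while traffic is flowing; rising numbers mean the rings are now too small for your bursts. A quick sketch (the exact counter names vary by driver):

# Refresh the NIC's drop/miss statistics once per second
watch -n 1 "ethtool -S eth0 | grep -iE 'drop|miss|no_buffer'"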

A modern switch should add well under 0.1ms per hop; cut-through switches add only a few microseconds. Make sure the switch itself is not under resource pressure:

# On switch CLI (Cisco example):
show platform hardware fed switch active fwd-asic resource-utilization
show platform hardware fed switch active fwd-asic resource tcam utilization

Adjust these sysctl parameters in /etc/sysctl.conf:

net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_low_latency = 1
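
These can be applied without a reboot. A sketch of loading and spot-checking them; note that net.ipv4.tcp_low_latency has been a no-op on recent kernels (since around 4.14), so it is harmless but ineffective there:

# Load everything from /etc/sysctl.conf and verify one value took effect
sudo sysctl -p /etc/sysctl.conf
sysctl net.core.rmem_max
# Or set a single parameter at runtime without editing the file
sudo sysctl -w net.core.rmem_max=16777216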

For UDP-based applications (common in HFT), consider this sender configuration:

// C code snippet for UDP socket tuning
// (needs <sys/socket.h>, <netinet/in.h>, <linux/net_tstamp.h>; error checks omitted)
int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
// SO_BUSY_POLL takes a busy-wait budget in microseconds, not a boolean
int busy_usecs = 50;
setsockopt(sock, SOL_SOCKET, SO_BUSY_POLL, &busy_usecs, sizeof(busy_usecs));
// SO_TIMESTAMPING takes a bitmask of SOF_TIMESTAMPING_* flags (see the
// hardware-timestamping example further down)
int ts_flags = SOF_TIMESTAMPING_RX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;
setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &ts_flags, sizeof(ts_flags));
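
Busy polling can also be enabled system-wide instead of per socket; a sketch assuming a 50 microsecond budget suits your workload:

# Microseconds to busy-poll the device queue: busy_read covers blocking reads,
# busy_poll covers select/poll/epoll
sudo sysctl -w net.core.busy_read=50
sudo sysctl -w net.core.busy_poll=50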

Bad cables usually show up as CRC errors and retransmissions rather than as pure added latency. Check with:

# Check for physical errors
ethtool -S eth0 | grep -E 'err|drop'
# Compare the counters before and after replacing a suspect cable

Interrupt placement matters as well. Pin the NIC's IRQ handlers to dedicated cores (the value written is a CPU bitmask, so 3 = CPUs 0 and 1), and stop irqbalance so it does not rewrite the affinity:

systemctl stop irqbalance
for irq in $(grep eth0 /proc/interrupts | awk '{print $1}' | sed 's/://'); do
  echo 3 > /proc/irq/$irq/smp_affinity   # bitmask 0x3 = CPU0 + CPU1
done
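
With the IRQs pinned, keep the application itself off those cores and on the NIC's NUMA node; a sketch (the 2-3 core choice and ./your_messaging_app are placeholders):

# Which NUMA node owns the NIC (-1 means single-node or unknown)
cat /sys/class/net/eth0/device/numa_node
# Run the latency-critical process on CPUs 2-3, away from the IRQ cores above
taskset -c 2-3 ./your_messaging_app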

For ultimate performance, consider DPDK or XDP:

# Example XDP load command; note that the classic xdp_drop.o object drops all
# traffic, so substitute your own compiled XDP program
ip link set dev eth0 xdp obj xdp_drop.o sec xdp
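
Verifying and unloading the program uses the same ip link machinery:

# Confirm an XDP program is attached (look for "xdp" and a program id in the output)
ip link show dev eth0
# Detach it again
ip link set dev eth0 xdp off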

When dealing with sub-millisecond latency requirements (like your current 0.23ms versus the 0.1ms target), measurements need surgical precision. Traditional tools like ping and Wireshark confirm the overall latency but don't pinpoint where it is spent.

For nanosecond-level analysis:

# Install precision timing tools
sudo apt install linuxptp ethtool tuned-utils
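
With linuxptp installed, the two hosts' clocks can be disciplined against the NIC's hardware clock, which makes one-way latency measurements meaningful; a sketch assuming eth0 carries the test traffic:

# On both hosts: run PTP over eth0 using hardware timestamps (one side becomes master)
sudo ptp4l -i eth0 -m
# On both hosts: steer the system clock from the NIC's PTP hardware clock
sudo phc2sys -s eth0 -w -m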

Run these on both hosts:

# Check NIC interrupt coalescing
ethtool -c eth0

# See how eth0's interrupt vectors are spread across CPUs
cat /proc/interrupts | grep eth0

# Check hardware offload features (hw-tc-offload shown here)
sudo ethtool -k eth0 | grep hw-tc-offload
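
If ethtool -c shows adaptive coalescing or non-zero rx-usecs, disabling coalescing trades CPU load for lower latency; a sketch (not every driver accepts every option):

# Deliver interrupts immediately instead of batching them
sudo ethtool -C eth0 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0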

Key switch parameters affecting sub-1ms latency:

# Check the forwarding mode (Juniper syntax shown; use your platform's equivalent)
show interface xe-0/0/0 | match "cut-through|store-and-forward"

# Should return "cut-through" for lowest latency

Critical /etc/sysctl.conf parameters (the first two repeat the buffer limits set earlier):

net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_low_latency=1

For UDP-based high-frequency messaging:

// C example requesting hardware timestamps via SO_TIMESTAMPING
// (flags come from <linux/net_tstamp.h>; sock_fd is an already-created UDP
// socket, and the NIC must support and have enabled hardware timestamping)
int flags = SOF_TIMESTAMPING_TX_HARDWARE |
            SOF_TIMESTAMPING_RX_HARDWARE |
            SOF_TIMESTAMPING_RAW_HARDWARE;
setsockopt(sock_fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
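
Hardware timestamps only appear if the NIC supports them and timestamping has been switched on for the interface; a sketch using ethtool and linuxptp's hwstamp_ctl:

# List the NIC's timestamping capabilities
ethtool -T eth0
# Enable hardware TX timestamps (-t 1) and timestamp all received packets (-r 1)
sudo hwstamp_ctl -i eth0 -t 1 -r 1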

Use TDR (time-domain reflectometry) cable diagnostics if the NIC supports them:

# Requires a driver/PHY with cable-test support (recent ethtool and kernel)
ethtool --cable-test eth0

In one financial trading system, we reduced latency from 0.25ms to 0.09ms by:

  1. Enabling NIC kernel bypass (DPDK; a device-binding sketch follows the list)
  2. Configuring switch port buffers to 64 bytes
  3. Using kernel-bypass libraries like libfabric
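
As a starting point for item 1, DPDK needs hugepages and the NIC bound to a userspace-capable driver; a sketch using DPDK's dpdk-devbind.py (the PCI address 0000:01:00.0 is a placeholder for your NIC):

# Reserve 2 MB hugepages for DPDK's memory pools
echo 1024 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# Load the userspace I/O driver and move the NIC under it (the kernel stack
# stops seeing the port once it is bound)
sudo modprobe vfio-pci
sudo dpdk-devbind.py --status
sudo dpdk-devbind.py --bind=vfio-pci 0000:01:00.0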