During file transfers across our MetroEthernet link between two sites (connected via a single SonicWall router), Wireshark captures consistently show:
[TCP Dup ACK #1]
[TCP Fast Retransmission]
The traceroute shows minimal latency (under 10ms) between endpoints 192.168.2.153 (client) and 192.168.1.101 (server):
traceroute to 192.168.1.101 (192.168.1.101), 30 hops max, 60 byte packets
 1  192.168.2.254  0.747 ms
 2  192.168.1.101  8.995 ms
We performed multiple hardware swaps with identical results:
- Replaced SonicWall with Cisco 1800 series router (same behavior)
- Connected laptops directly to provider equipment (same subnet)
- Bypassed all customer-premises equipment
The Wireshark analysis reveals these key patterns:
No.   Time      Source          Destination     Protocol  Info
1234  1.234567  192.168.2.153   192.168.1.101   TCP       [TCP Dup ACK #1]
1235  1.234789  192.168.1.101   192.168.2.153   TCP       [TCP Fast Retransmission]
Key metrics to calculate from the capture:
Retransmission rate = (Retransmitted packets / Total packets) × 100
Dup ACK frequency = Dup ACK count / Total ACKs
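If you want both metrics in one pass, here is a rough Python sketch that shells out to tshark (assumes tshark is installed and on the PATH; capture.pcap is a placeholder path for your own capture):

# Rough sketch: compute retransmission rate and Dup ACK frequency from a capture.
import subprocess

def count_matching(pcap, display_filter):
    # tshark prints one line per matching packet; count the lines.
    out = subprocess.run(
        ["tshark", "-r", pcap, "-Y", display_filter],
        capture_output=True, text=True, check=True
    ).stdout
    return len(out.splitlines())

pcap = "capture.pcap"  # adjust to your capture file
total_packets = count_matching(pcap, "frame")                       # every frame
total_acks = count_matching(pcap, "tcp.flags.ack == 1")
retransmissions = count_matching(pcap, "tcp.analysis.retransmission")
dup_acks = count_matching(pcap, "tcp.analysis.duplicate_ack")

print(f"Retransmission rate: {retransmissions / total_packets * 100:.2f}%")
print(f"Dup ACK frequency:   {dup_acks / total_acks:.4f}")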
Recommended provider-side tests:
# Continuous ping with timestamps (Linux ping runs continuously by default)
ping 192.168.1.101 | while read line; do echo "$(date): $line"; done
# Path MTU discovery
ping -M do -s 1472 192.168.1.101
# Jitter measurement (UDP test reports jitter and loss)
sudo apt install iperf3
iperf3 -c 192.168.1.101 -u -b 100M -t 60 -i 1
Possible Linux system tweaks (server-side):
# Check current settings
sysctl -a | grep tcp
# Recommended adjustments
sudo sysctl -w net.ipv4.tcp_sack=1
sudo sysctl -w net.ipv4.tcp_fack=1
sudo sysctl -w net.ipv4.tcp_window_scaling=1
sudo sysctl -w net.ipv4.tcp_timestamps=1
sudo sysctl -w net.ipv4.tcp_slow_start_after_idle=0
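To confirm the values actually took effect (and to capture them for your provider report), you can read them straight back from /proc. A minimal sketch, assuming a Linux server; tcp_fack may be absent on newer kernels:

# Minimal sketch: read the TCP settings back from /proc on a Linux host.
from pathlib import Path

settings = [
    "tcp_sack", "tcp_fack", "tcp_window_scaling",
    "tcp_timestamps", "tcp_slow_start_after_idle",
]
for name in settings:
    path = Path(f"/proc/sys/net/ipv4/{name}")
    try:
        print(f"net.ipv4.{name} = {path.read_text().strip()}")
    except FileNotFoundError:
        # e.g. tcp_fack has been dropped on some recent kernels
        print(f"net.ipv4.{name} not present on this kernel")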
For file transfer applications, consider implementing:
# Python example using larger socket buffers
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4194304)  # 4 MB send buffer
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4194304)  # 4 MB receive buffer
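Note that Linux caps these requests at net.core.wmem_max / net.core.rmem_max, so it is worth reading the values back to see what you actually got (the kernel reports roughly double the requested size to account for bookkeeping overhead). A quick check along those lines:

# Quick check: ask the kernel what buffer sizes were actually granted.
# Linux reports roughly double the requested size and clamps requests above
# net.core.wmem_max / net.core.rmem_max.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4194304)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4194304)
print("Effective send buffer:   ", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("Effective receive buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))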
Or implement application-level retry logic:
// JavaScript example with exponential backoff
async function reliableTransfer(data, maxRetries = 5) {
  let attempt = 0;
  while (attempt < maxRetries) {
    try {
      return await transferData(data);
    } catch (err) {
      const delay = Math.pow(2, attempt) * 1000;
      await new Promise(r => setTimeout(r, delay));
      attempt++;
    }
  }
  throw new Error('Max retries exceeded');
}
When working with circuit providers:
- Request RFC 2544 testing results
- Ask for jitter and latency measurements under load
- Request interface error counters from their switches
- Demand testing with known-good traffic patterns
When analyzing network performance issues, few things are as frustrating as persistent TCP retransmissions. In your case, we're seeing:
1. Frequent TCP Dup ACK packets
2. TCP Fast Retransmission events
3. Occurring despite low latency (sub-10 ms)
4. Persisting across different router hardware
Your testing methodology clearly points to the MetroEthernet circuit as the culprit. Key observations:
- Issue persists when bypassing routers entirely
- Same behavior when connecting laptops directly to provider equipment
- Service provider insists their tests show no problems
Let's examine what Wireshark captures typically reveal in such scenarios:
// Sample tshark filter to identify retransmission patterns
tshark -r capture.pcap -Y "tcp.analysis.retransmission || tcp.analysis.fast_retransmission" \
-T fields -e frame.number -e ip.src -e ip.dst -e tcp.seq -e tcp.ack
The output would show patterns like:
1234  192.168.2.153  192.168.1.101  12345678  87654321  [TCP Fast Retransmission]
1235  192.168.2.153  192.168.1.101  12345678  87654321  [TCP Dup ACK #1]
Since providers often claim "everything tests OK," here's a Python script to gather concrete evidence:
import socket
import time
from collections import defaultdict
def monitor_tcp_performance(dest_ip, dest_port, duration):
    """Repeatedly open TCP connections and tally failures over `duration` seconds."""
    retrans_stats = defaultdict(int)
    start_time = time.time()
    while time.time() - start_time < duration:
        try:
            with socket.create_connection((dest_ip, dest_port), timeout=2) as s:
                s.send(b'PING')
                data = s.recv(1024)
                if not data:
                    # Empty read means the peer closed the connection
                    retrans_stats['closed_by_peer'] += 1
        except socket.timeout:
            retrans_stats['timeout'] += 1
        except socket.error as e:
            retrans_stats[str(e)] += 1
        time.sleep(1)  # pace the probes so we don't flood the link
    return dict(retrans_stats)
# Example usage:
stats = monitor_tcp_performance('192.168.1.101', 445, 300)
print(f"Connection issues observed: {stats}")
MetroEthernet circuits often have hidden MTU constraints. Try this diagnostic:
# Linux MTU path discovery
ping -M do -s 1472 192.168.1.101 # Adjust size down until success
# Windows equivalent
ping -f -l 1472 192.168.1.101
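If you want to automate the "adjust size down until success" step, here is a small sketch using the Linux ping shown above (the starting payload of 1472 and the search range are just reasonable defaults):

# Sketch: walk the ICMP payload size down until a don't-fragment ping succeeds.
# Assumes Linux ping (-M do sets DF, -s sets payload size); path MTU = payload
# plus 28 bytes of IP/ICMP headers.
import subprocess

def find_path_mtu(dest, start=1472, stop=1200, step=8):
    for size in range(start, stop, -step):
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(size), dest],
            capture_output=True,
        )
        if result.returncode == 0:
            return size + 28  # add IP (20) + ICMP (8) header bytes
    return None

print(find_path_mtu("192.168.1.101"))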
When dealing with uncooperative providers, include these metrics in your reports:
- Retransmission rate percentage
- Pattern of lost segments
- Proof that local equipment isn't the bottleneck
Here's how to calculate retransmission rate from pcap data:
total_packets=$(capinfos capture.pcap | grep "Number of packets" | awk '{print $4}')
retrans_packets=$(tshark -r capture.pcap -Y "tcp.analysis.retransmission" | wc -l)
retrans_rate=$(echo "scale=2; $retrans_packets * 100 / $total_packets" | bc)
echo "Retransmission rate: $retrans_rate%"
To rule out application-level variables, run controlled throughput tests:
# Use iperf3 for controlled testing
iperf3 -c 192.168.1.101 -t 60 -i 5 -w 256K -Z
# Look for symptoms in output:
[ ID] Interval Transfer Bitrate Retr
[ 4] 0.00-5.00 sec 112 MBytes 188 Mbits/sec 43
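To turn those Retr numbers into evidence the provider can't wave away, iperf3 can emit JSON (--json), which is easy to log alongside timestamps. A sketch, assuming an iperf3 server is already running on 192.168.1.101:

# Sketch: run a TCP iperf3 test and log total retransmissions from its JSON output.
import json
import subprocess
from datetime import datetime

result = subprocess.run(
    ["iperf3", "-c", "192.168.1.101", "-t", "60", "--json"],
    capture_output=True, text=True, check=True
)
report = json.loads(result.stdout)
sent = report["end"]["sum_sent"]  # sender-side totals for the TCP test
print(f"{datetime.now().isoformat()} "
      f"throughput={sent['bits_per_second'] / 1e6:.1f} Mbit/s "
      f"retransmits={sent['retransmits']}")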