Diagnosing and Fixing TCP Connection Freezes in OpenVPN Tap Mode: MTU Issues, Kernel Differences, and UDP Packet Loss


When running OpenVPN in tap mode (necessary for multicast traffic), many administrators encounter a frustrating issue: TCP connections that freeze during large data transfers. The symptoms typically appear during operations like:

ssh user@vpn_host
# Freezes during:
cat large_file.log
ls -l /directory/with/many/files

The conventional wisdom points to MTU issues, leading to common configuration attempts:

# Typical OpenVPN server configuration attempts
fragment 1400
mssfix

However, our testing shows that even aggressive MTU reduction (down to 576 bytes) doesn't always solve the problem. Wireshark analysis confirms that no packet exceeds the configured MTU, and a near-MTU ping with the don't-fragment flag set gets through:

# Verification command: 1450-byte ICMP payload, fragmentation prohibited
ping -M do -s 1450 vpn_host
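
Note that -s sets the ICMP payload size, not the packet size: an IPv4 echo request adds 20 bytes of IP header (assuming no IP options) and 8 bytes of ICMP header. The sketch below does that arithmetic; the function names are made up for illustration.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet
# of the given MTU (20-byte IPv4 header without options + 8-byte ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Size of the IPv4 packet produced by "ping -s <payload>".
ip_size_for_ping_payload() {
    echo $(( $1 + 28 ))
}

ping_payload_for_mtu 1500        # prints 1472
ip_size_for_ping_payload 1450    # prints 1478
```

So the 1450-byte test exercises 1478-byte packets; probing a full 1500-byte path would use something like ping -M do -s "$(ping_payload_for_mtu 1500)" vpn_host.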

Through systematic testing, we discovered crucial OS-dependent behavior:

  • Ubuntu clients (12.04-13.04, kernels 3.2-3.8): Freezes occur consistently
  • CentOS 6 clients: Stable connections even during heavy transfers

Wireshark captures reveal the core issue:

  1. The tunnel drops UDP packets when the link runs out of bandwidth
  2. TCP retransmission starts as expected
  3. With Ubuntu clients, the remote host ignores valid ACKs and keeps retransmitting indefinitely

Based on our findings, implement these solutions in combination:

# OpenVPN server configuration
# (link-mtu is left out: OpenVPN accepts only one of tun-mtu and link-mtu)
fragment 1400
mssfix
tun-mtu 1500
mtu-disc yes
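
When experimenting with MTU directives, it is easy to end up with both tun-mtu and link-mtu in one file. As far as I know, OpenVPN 2.x accepts only one of the two and refuses to start otherwise. A minimal lint sketch, assuming a plain config file with one directive per line; check_mtu_directives is a hypothetical helper, not part of OpenVPN:

```shell
# Return 1 (and warn on stderr) if the given OpenVPN config file
# defines both tun-mtu and link-mtu; return 0 otherwise.
check_mtu_directives() {
    conf=$1
    if grep -Eq '^[ 	]*tun-mtu[ 	]' "$conf" &&
       grep -Eq '^[ 	]*link-mtu[ 	]' "$conf"; then
        echo "warning: $conf sets both tun-mtu and link-mtu" >&2
        return 1
    fi
    return 0
}
```

For example, check_mtu_directives /etc/openvpn/server.conf (the path is an assumption) could run before restarting the service.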

# Linux TCP stack tuning (for Ubuntu clients)
echo 1 > /proc/sys/net/ipv4/tcp_window_scaling
echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 128000 > /proc/sys/net/core/rmem_max
echo 128000 > /proc/sys/net/core/wmem_max
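
Values written with echo into /proc/sys take effect immediately but are lost on reboot. To persist them, the same settings can go into /etc/sysctl.conf, where the key is the path below /proc/sys with slashes turned into dots. A small sketch of that mapping (both helper names are hypothetical):

```shell
# Convert a /proc/sys path into the matching sysctl key,
# e.g. /proc/sys/net/ipv4/tcp_sack -> net.ipv4.tcp_sack
proc_path_to_sysctl_key() {
    echo "${1#/proc/sys/}" | tr '/' '.'
}

# Print a sysctl.conf line for a /proc/sys path and a value.
sysctl_conf_line() {
    echo "$(proc_path_to_sysctl_key "$1") = $2"
}

sysctl_conf_line /proc/sys/net/ipv4/tcp_sack 0
sysctl_conf_line /proc/sys/net/core/rmem_max 128000
```

The two calls print net.ipv4.tcp_sack = 0 and net.core.rmem_max = 128000, which can be added to /etc/sysctl.conf and loaded with sysctl -p.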

For persistent cases, consider these additional measures:

# Alternative OpenVPN configuration
socket-flags TCP_NODELAY  # Only has an effect when the tunnel itself runs over TCP
shaper 125000  # Limit bandwidth to about 1 Mbit/s (shaper takes bytes per second)

For Ubuntu systems showing persistent issues, these kernel parameters often help:

# Add to /etc/sysctl.conf
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_base_mss = 1024
net.ipv4.route.mtu_expires = 1800

Use these commands to verify your settings:

# Check current MTU
ip link show dev tap0

# Verify TCP parameters
sysctl -a | grep tcp

# Monitor retransmissions
ss -eipn
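
The per-socket details printed by ss -i are dense, so a small filter helps when watching for retransmissions. The sketch below assumes that ss reports them as a retrans:<in-flight>/<total> token, which is how recent iproute2 versions print it as far as I can tell; sum_retrans is a hypothetical helper name.

```shell
# Read "ss -ti" output on stdin and print the total number of
# retransmitted segments summed over all listed sockets.
sum_retrans() {
    awk '{
        for (i = 1; i <= NF; i++)
            # Match tokens that start with "retrans:" (this skips "bytes_retrans:")
            if (index($i, "retrans:") == 1) {
                n = split($i, parts, "/")
                total += parts[n]
            }
    } END { print total + 0 }'
}
```

Running ss -ti | sum_retrans every few seconds during a large transfer shows whether the counter keeps climbing while the session is frozen.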

When running OpenVPN in tap mode (necessary for multicast traffic), TCP connections can stall during large data transfers, such as SSH sessions that print a lot of output. MTU problems are the usual first suspect, but deeper investigation points to an interaction between packet loss in the tunnel and the TCP stack of specific kernel versions.

Standard troubleshooting suggests configuring:

fragment 1400
mssfix

However, empirical testing shows this isn't always sufficient. Packet analysis reveals:

  • No oversized packets escaping the tunnel (verified via Wireshark)
  • Successful ping tests with -s 1450 -M do parameters
  • PMTU discovery reporting 1500 bytes despite fragmentation settings

Key observations across different Linux distributions:

OS                   Kernel version   Behavior
Ubuntu 12.04-13.04   3.2-3.8          Consistent freezing
CentOS 6             2.6.32           Stable operation

Wireshark captures reveal the failure sequence:

1. UDP packet loss occurs in VPN tunnel
2. TCP retransmissions initiate
3. On Ubuntu:
   - Remote host enters persistent retransmission loop
   - Client ACKs are ignored
4. On CentOS:
   - Normal retransmission recovery
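
In a capture, the retransmission loop from step 3 shows up as the same sequence range being sent over and over. The sketch below counts repeated ranges in tcpdump's default text output (for example from tcpdump -n -r vpn_capture.pcap tcp). It assumes the usual "IP src > dst: Flags [...], seq a:b, ..." line layout; repeated_segments is a hypothetical helper.

```shell
# Read "tcpdump -n" text output on stdin and print every
# (source, destination, sequence range) that appears more than once,
# prefixed by its count, most frequent first.
repeated_segments() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "seq") {
                dst = $5
                sub(/:$/, "", dst)      # drop the colon after the destination
                range = $(i + 1)
                sub(/,$/, "", range)    # drop the comma after the range
                seen[$3 " " dst " " range]++
            }
    } END {
        for (k in seen)
            if (seen[k] > 1) print seen[k], k
    }' | sort -rn
}
```

A healthy transfer with occasional loss yields a few entries with a count of 2; a stalled connection tends to show one range with a steadily growing count.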

Beyond basic MTU settings, these adjustments may help:

# In OpenVPN server config
mssfix 1400
fragment 1400
txqueuelen 1000

# Linux sysctl adjustments (Ubuntu)
echo "net.ipv4.tcp_mtu_probing=1" >> /etc/sysctl.conf
echo "net.ipv4.tcp_window_scaling=1" >> /etc/sysctl.conf
sysctl -p
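
Appending with >> adds a duplicate line each time the commands are re-run. A sketch of an idempotent alternative follows; set_sysctl_conf is a hypothetical helper, and it assumes the target file already exists and holds key=value lines.

```shell
# Set key=value in a sysctl-style file: rewrite the first line that
# already defines the key (dropping later duplicates), or append the
# setting if the key is not present.
set_sysctl_conf() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    if awk -v key="$key" -v value="$value" '
        BEGIN { FS = "=" }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (name == key) {
                if (!done) print key "=" value
                done = 1
                next
            }
            print
        }
        END { if (!done) print key "=" value }
    ' "$file" > "$tmp"; then
        # Overwrite in place so the file keeps its owner and permissions
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

As root, set_sysctl_conf net.ipv4.tcp_mtu_probing 1 /etc/sysctl.conf followed by sysctl -p then has the same effect as the echo lines, however often it is repeated.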

Running tracepath toward the remote host has been seen to work around the freeze. It is not clear why, but the workaround likely functions by:

  • Forcing PMTU rediscovery
  • Resetting network stack timers
  • Triggering ICMP responses that unclog the TCP state

For Ubuntu systems, consider:

# Disable TCP offloading features that may conflict
ethtool -K tap0 tx off rx off sg off tso off gso off

# Alternative TCP congestion algorithm
echo "westwood" > /proc/sys/net/ipv4/tcp_congestion_control
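
Writing an algorithm name that the kernel does not know fails, and on many distributions Westwood is built as a module (tcp_westwood) that may need loading first. The kernel lists the currently usable algorithms in /proc/sys/net/ipv4/tcp_available_congestion_control. A small check, with cc_is_available as a hypothetical helper name:

```shell
# Return 0 if algorithm $1 appears in the space-separated list $2,
# and 1 otherwise.
cc_is_available() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a literal list; on a real system the list would come from
# "$(cat /proc/sys/net/ipv4/tcp_available_congestion_control)".
if cc_is_available westwood "reno cubic westwood"; then
    echo "westwood can be selected"
fi
```

If the check fails on a real system, modprobe tcp_westwood is usually enough to make the algorithm appear in the list.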

Essential troubleshooting tools:

# Real-time TCP stats
ss -ti

# MTU path discovery
tracepath -n [destination]

# Packet capture with size limits
tcpdump -i tap0 -s 1500 -w vpn_capture.pcap

When persistent issues occur, consider:

  • Testing with tun mode (if multicast isn't critical)
  • Implementing UDP-based protocols like QUIC
  • Using mosh instead of SSH for interactive sessions