Debugging WireGuard Handshake Failures Behind Dual-NAT: Comprehensive Guide for Linux Administrators



When both WireGuard peers sit behind NAT, problems emerge that a standard VPN configuration doesn't account for. The key symptom we're observing, repeated handshake attempts that never complete, typically indicates one of:

  • Packet filtering at intermediate network devices
  • Endpoint misconfiguration
  • NAT traversal failure
  • MTU/PMTUD issues

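First, let's verify the port forwarding on Router A (ZyWALL USG 100). The screenshot shows correct UDP forwarding, but confirm it with a capture on each side of the forward. Packets that show up on the router but never on Server A mean the forward itself is broken:

# On Router A (or the router's built-in packet capture if it has no tcpdump):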
tcpdump -i eth0 -n udp port 23456
# On Server A:
tcpdump -i enp1s0 -n udp port 23456

Notice the discrepancy between client and server ports:

# Client config shows:
Endpoint = wgsrv.example.com:33456

# Server config shows:
ListenPort = 23456

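This is likely the root cause. The client must send to the public port that Router A forwards, and that forward must land on the server's ListenPort. Router A forwards 23456 and the server listens on 23456, but the client sends to 33456, so its handshake initiations never reach WireGuard.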
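A quick way to catch this kind of mismatch is to pull both numbers out of the config files and compare them. The sketch below assumes copies of both configs are available on one machine and that Router A does no port translation (the public port equals the forwarded internal port); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare the server's ListenPort with the port in the
# client's Endpoint. Assumes Router A forwards the same port number it exposes.

# Print the ListenPort value from a wg-quick config file (first match only),
# ignoring any trailing "# comment".
listen_port() {
  sed -n 's/^[[:space:]]*ListenPort[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)\([[:space:]#].*\)\{0,1\}$/\1/p' "$1" | head -n 1
}

# Print the port from "Endpoint = host:port" (also handles [IPv6]:port),
# ignoring any trailing "# comment".
endpoint_port() {
  sed -n 's/^[[:space:]]*Endpoint[[:space:]]*=[[:space:]]*[^#[:space:]]*:\([0-9][0-9]*\)\([[:space:]#].*\)\{0,1\}$/\1/p' "$1" | head -n 1
}

# Print both values; return 0 only if both are set and equal.
check_ports() {
  local lp ep
  lp="$(listen_port "$1")"
  ep="$(endpoint_port "$2")"
  echo "server ListenPort=${lp:-unset} client Endpoint port=${ep:-unset}"
  [[ -n "$lp" && "$lp" == "$ep" ]]
}
```

Usage: check_ports server-wg0.conf /etc/wireguard/wg0.conf on Client B. If Router A does translate ports, compare the Endpoint port against the router's public port instead.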

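Here are the corrected configurations, which move everything to 33456 (keeping 23456 and changing only the client's Endpoint works equally well):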

# Server A /etc/wireguard/wg0.conf:
[Interface]
Address = 10.31.33.100/24
ListenPort = 33456  # Changed to match client; move the Router A forward to 33456 as well
PrivateKey = (redacted)

[Peer]
PublicKey = QnkTJ+Qd9G5EybA2lAx2rPNRkxiQl1W6hHeEFWgJ0zc=
AllowedIPs = 10.31.33.211/32

# Client B /etc/wireguard/wg0.conf:
[Interface]
PrivateKey = (redacted)
Address = 10.31.33.211/32

[Peer]
PublicKey = p62kU3HoXLJACI4G+9jg0PyTeKAOFIIcY5eeNy31cVs=
AllowedIPs = 10.31.33.0/24
Endpoint = wgsrv.example.com:33456  # Now matches server
PersistentKeepalive = 25

When basic fixes don't work, try these diagnostic steps:

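# On the client ($SERVER_PUBKEY is the server's public key; the server
# learns the client's endpoint from its first valid handshake):
wg set wg0 peer "$SERVER_PUBKEY" endpoint wgsrv.example.com:33456 persistent-keepalive 25

# On both systems:
wg show wg0 dump
journalctl -u wg-quick@wg0 -f   # wg-quick's own messages only, no handshake events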
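Reading the dump by eye is tedious. The sketch below (function name hypothetical) prints one line per peer with the endpoint WireGuard last saw and the age of the latest handshake, flagging anything older than 180 seconds, the age after which WireGuard stops using a session key. With PersistentKeepalive set, a healthy tunnel should normally stay well under that. It expects single-interface output, since wg show all dump prepends the interface name to every line.

```bash
#!/usr/bin/env bash
# Summarise `wg show <interface> dump` output read from stdin.
# Peer lines have 8 tab-separated fields: public-key, preshared-key, endpoint,
# allowed-ips, latest-handshake (epoch seconds, 0 = never), rx, tx, keepalive.
# The interface line has only 4 fields and is skipped.
# Optional $1 overrides "now" (epoch seconds), which makes the output testable.
handshake_report() {
  local now="${1:-$(date +%s)}"
  awk -F'\t' -v now="$now" '
    NF == 8 {
      peer = substr($1, 1, 8)   # shortened public key
      if ($5 == 0) {
        printf "%s endpoint=%s handshake=never\n", peer, $3
      } else {
        age = now - $5
        printf "%s endpoint=%s handshake=%ds ago%s\n", peer, $3, age, (age > 180 ? " (stale)" : "")
      }
    }'
}
```

Usage: wg show wg0 dump | handshake_report. A peer stuck at "never" on the server while the client keeps retrying means no valid initiation has arrived yet.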

For deep packet inspection:

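tcpdump -i wg0 -nn -v                  # decrypted tunnel traffic; empty until a handshake succeeds
tcpdump -i any udp port 33456 -X -vv   # encrypted WireGuard packets on the wire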
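To tell whether initiations go out but responses never come back, a capture can be narrowed to a single WireGuard message type. The first byte of every WireGuard UDP payload is the type (1 = handshake initiation, 2 = handshake response, 3 = cookie reply, 4 = transport data), and udp[8] in a pcap filter addresses that byte because the UDP header is 8 bytes long. The helper name is made up; it only builds the filter string.

```bash
#!/usr/bin/env bash
# Build a tcpdump filter for one WireGuard message type on a given UDP port.
# udp[8] is the first payload byte (the UDP header is 8 bytes long).
wg_filter() {
  local kind="$1" port="$2" type
  case "$kind" in
    initiation) type=1 ;;
    response)   type=2 ;;
    cookie)     type=3 ;;
    data)       type=4 ;;
    *) echo "unknown message kind: $kind" >&2; return 2 ;;
  esac
  printf 'udp port %s and udp[8] = %s\n' "$port" "$type"
}
```

For example, tcpdump -ni any "$(wg_filter response 33456)" on the client shows whether any handshake response ever arrives.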

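The host firewall on Server A must admit the WireGuard port. The DNAT/MASQUERADE pair only applies where a Linux box, rather than the ZyWALL, does the forwarding:

# On Server A:
iptables -A INPUT -p udp --dport 33456 -j ACCEPT

# On a Linux forwarding router only (10.150.44.188 presumably being Server A's LAN address):
iptables -t nat -A PREROUTING -p udp --dport 33456 -j DNAT --to-destination 10.150.44.188
# MASQUERADE is only needed for hairpin access from Server A's own LAN:
iptables -t nat -A POSTROUTING -p udp --dport 33456 -j MASQUERADE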
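Running these commands twice appends duplicate rules, because -A never checks for an existing match. A minimal idempotent wrapper (hypothetical function name) checks with iptables -C first and only appends when the rule is missing; the IPT variable is an assumption that lets you point it at ip6tables or a dry-run stub.

```bash
#!/usr/bin/env bash
# Idempotently add an iptables rule: check with -C, append with -A only if missing.
# IPT defaults to iptables; override it (e.g. IPT=ip6tables) before calling.
IPT="${IPT:-iptables}"

# usage: ensure_rule <table> <chain> <rule-spec...>
ensure_rule() {
  local table="$1" chain="$2"
  shift 2
  if "$IPT" -t "$table" -C "$chain" "$@" 2>/dev/null; then
    echo "present: -t $table $chain $*"
  elif "$IPT" -t "$table" -A "$chain" "$@"; then
    echo "added: -t $table $chain $*"
  else
    echo "failed: -t $table $chain $*" >&2
    return 1
  fi
}
```

Usage: ensure_rule filter INPUT -p udp --dport 33456 -j ACCEPT, which can be rerun safely from a boot script.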

After applying changes, verify connectivity:

# On client:
ping 10.31.33.100
wg show wg0 transfer

# On server:
wg show
ss -uln | grep ':33456'   # the kernel socket has no owning process, so match the port
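For scripted checks, the listening socket can be confirmed from ss -ulnH output (UDP, listening, numeric, no header). Kernel WireGuard sockets are not owned by any process, so this sketch (hypothetical function name) matches the local port in the fourth column rather than a process name.

```bash
#!/usr/bin/env bash
# Return 0 if `ss -ulnH` output on stdin shows a UDP socket bound to the port.
# The local address is column 4, e.g. 0.0.0.0:33456 or [::]:33456; matching
# ":<port>" at the end of that column avoids false hits such as 3456 vs 33456.
udp_port_listening() {
  local port="$1"
  awk -v p=":$port" '
    { n = length($4) - length(p) + 1 }
    n > 0 && substr($4, n) == p { found = 1 }
    END { exit !found }'
}
```

Usage: ss -ulnH | udp_port_listening 33456 && echo "wg0 is bound to 33456".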

The setup in question is a WireGuard VPN between two Debian systems, both behind NAT:

  • Server A runs behind ZyWALL USG 100 firewall (port 23456 forwarded)
  • Client B connects through consumer-grade VDSL router
  • Dynamic DNS updates the server's public IP via A record
  • Handshake initiates but never completes (a 5-second retry cycle, matching WireGuard's handshake retransmit timeout)

First, let's verify the packet flow is actually reaching the server:

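# On Server A: log arriving handshakes and the replies leaving
sudo iptables -I INPUT 1 -p udp --dport 23456 -j LOG --log-prefix "WG-IN: "
sudo iptables -I OUTPUT 1 -p udp --sport 23456 -j LOG --log-prefix "WG-REPLY: "

# On Client B: log outgoing handshakes
sudo iptables -I OUTPUT 1 -p udp --dport 33456 -j LOG --log-prefix "WG-OUT: "

Read the entries on each side with journalctl -k | grep WG-. If WG-OUT lines appear on the client but WG-IN never shows up on the server, packets are lost before they reach the server (wrong port or broken forward). If WG-IN appears but WG-REPLY does not, the server is discarding the handshake itself, typically because of mismatched keys or a local firewall rule.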
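With a retry every 5 seconds the kernel log fills up quickly. This sketch (hypothetical function name) reads iptables LOG lines from stdin and counts them per prefix and SRC -> DST:DPT flow, for any prefix of the form WG-NAME:, so the counts from the two ends can be compared side by side.

```bash
#!/usr/bin/env bash
# Summarise iptables LOG lines (from `journalctl -k` or dmesg) read on stdin:
# one count per WG-* prefix and SRC -> DST:DPT flow; other lines are ignored.
wg_log_summary() {
  awk '
    match($0, /WG-[A-Z]+:/) {
      prefix = substr($0, RSTART, RLENGTH - 1)   # drop the trailing colon
      src = dst = dpt = "?"
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^SRC=/) src = substr($i, 5)
        else if ($i ~ /^DST=/) dst = substr($i, 5)
        else if ($i ~ /^DPT=/) dpt = substr($i, 5)
      }
      count[prefix " " src " -> " dst ":" dpt]++
    }
    END { for (k in count) print count[k], k }' | sort -k2
}
```

Usage: sudo journalctl -k --since "10 min ago" | wg_log_summary on each machine.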

Three critical elements often missed in NAT-to-NAT WireGuard setups:

1. Endpoint Port Mismatch

The configuration shows a discrepancy:

# Server config
ListenPort = 23456

# Client config
Endpoint = wgsrv.example.com:33456  # ← must equal the port Router A forwards to the server's ListenPort

2. PersistentKeepalive Settings

For consumer NAT devices that expire idle UDP mappings quickly, a shorter keepalive helps keep an established tunnel up:

[Peer]
PersistentKeepalive = 15  # Reduced from 25 to refresh the NAT mapping more often

3. MTU Considerations

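Set a conservative MTU on both interfaces while testing. Handshake messages are small (148 bytes for the initiation, 92 for the response), so MTU alone will not block the handshake, but an oversized MTU stalls traffic once the tunnel is up: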

[Interface]
MTU = 1280  # Start with conservative value
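The arithmetic behind these numbers: each WireGuard data packet carries an outer IP header (20 bytes for IPv4, 40 for IPv6), an 8-byte UDP header, a 16-byte WireGuard header and a 16-byte Poly1305 tag. The helper below (hypothetical name) subtracts that overhead from a path MTU.

```bash
#!/usr/bin/env bash
# Largest tunnel MTU that fits inside a given path MTU.
# Overhead = outer IP header (20 IPv4 / 40 IPv6) + UDP (8)
#          + WireGuard data header (16) + Poly1305 tag (16).
wg_tunnel_mtu() {
  local path_mtu="$1" ip_version="${2:-4}" ip_hdr
  case "$ip_version" in
    4) ip_hdr=20 ;;
    6) ip_hdr=40 ;;
    *) echo "ip_version must be 4 or 6" >&2; return 2 ;;
  esac
  echo $(( path_mtu - ip_hdr - 8 - 16 - 16 ))
}
```

wg_tunnel_mtu 1500 6 gives 1420, the value wg-quick usually picks on a 1500-byte link. If Client B's VDSL line uses PPPoE (1492-byte MTU) and the tunnel runs over IPv4, wg_tunnel_mtu 1492 4 gives 1432. 1280 stays below both and is also the IPv6 minimum MTU, which makes it a safe starting point.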

Use these to gather more technical details:

# Check kernel WireGuard module
sudo modinfo wireguard

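# Inspect peer state and the conntrack (NAT) entries for the WireGuard port
sudo wg show all dump
sudo conntrack -L -p udp --dport 23456

# Check for IPv4 fragments (non-first fragments carry no UDP header,
# so match on the fragment bits alone)
sudo tcpdump -ni any 'ip[6:2] & 0x3fff != 0'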
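To measure the path MTU toward the other site directly, a binary search with don't-fragment pings works. This is a sketch for IPv4 with Linux iputils ping (-M do, -s, -W) and a hypothetical function name; if ICMP is filtered anywhere on the path every probe fails, and a single lost reply reads as "too big", so rerun it if the result looks low.

```bash
#!/usr/bin/env bash
# Find the largest ICMP payload that crosses the path with DF set.
# Path MTU = payload + 28 (20-byte IPv4 header + 8-byte ICMP header).
# Defaults search payloads 1172..1472, i.e. path MTUs 1200..1500.
find_path_mtu() {
  local host="$1" lo="${2:-1172}" hi="${3:-1472}" mid
  # Invariant: a payload of $lo is known to pass.
  if ! ping -M do -c 1 -W 1 -s "$lo" "$host" >/dev/null 2>&1; then
    echo "even a ${lo}-byte payload fails; lower the starting size" >&2
    return 1
  fi
  while (( lo < hi )); do
    mid=$(( (lo + hi + 1) / 2 ))   # round up so lo always advances
    if ping -M do -c 1 -W 1 -s "$mid" "$host" >/dev/null 2>&1; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo $(( lo + 28 ))
}
```

Run find_path_mtu against a host at the far end that answers ping, then subtract WireGuard's 60 bytes of IPv4 overhead to get a tunnel MTU.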

The ZyWALL configuration needs these adjustments beyond the port forward itself:

1. Enable "NAT Loopback" or "Hairpin NAT" (only relevant for hosts on Server A's LAN that use the public name)
2. Add a firewall exception for ESTABLISHED,RELATED states
3. Set the UDP session timeout to 180+ seconds (the default is often too short)

When traditional NAT traversal fails, consider:

UDP Hole Punching Script

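#!/bin/bash
# Simultaneous-initiation hole punch: run on BOTH peers at roughly the same time.
# Assumes each peer has a fixed ListenPort and an Endpoint pointing at the
# other's public address, so WireGuard's own handshake packets open the NAT mappings.
# Does not help if either NAT is symmetric (randomises the source port).
set -u
PEER_TUNNEL_IP="${1:?usage: $0 <peer tunnel IP, e.g. 10.31.33.100>}"
wg-quick down wg0 || true
sleep 2
wg-quick up wg0
ping -c 3 -W 1 "$PEER_TUNNEL_IP" || true   # traffic triggers an immediate handshake
wg show wg0 latest-handshakes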

Third-Party Relay Fallback

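WireGuard has no fallback-endpoint option; each peer has exactly one Endpoint at a time. A relay is therefore configured as an ordinary peer: a host with a public address that both sites connect out to, with IP forwarding enabled. On each site, replace the direct peer with:

[Peer]
PublicKey = (relay public key)
Endpoint = relay.example.com:443
AllowedIPs = 10.31.33.0/24
PersistentKeepalive = 25
PresharedKey = (redacted)  # the base64 key itself (wg genpsk), not a file path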
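A relay host needs one [Peer] entry per site, each routed by that site's tunnel address, plus IP forwarding (net.ipv4.ip_forward=1, and a FORWARD rule allowing wg0 to wg0 if the default policy drops) so it can pass packets between them. It needs no Endpoint for either site, because both connect out to it and it learns their addresses from the handshakes. A minimal sketch, assuming the relay already has its own [Interface] section; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Print one [Peer] stanza for the relay host's own wg0.conf.
# Each site is routed by its tunnel address only (/32), so the relay knows
# which peer to forward a given destination to.
relay_peer_stanza() {
  local name="$1" pubkey="$2" tunnel_ip="$3"
  cat <<EOF
[Peer]
# $name
PublicKey = $pubkey
AllowedIPs = ${tunnel_ip}/32
EOF
}

# Example, using the public keys from the configs above:
# { relay_peer_stanza "Server A" "p62kU3HoXLJACI4G+9jg0PyTeKAOFIIcY5eeNy31cVs=" 10.31.33.100
#   echo
#   relay_peer_stanza "Client B" "QnkTJ+Qd9G5EybA2lAx2rPNRkxiQl1W6hHeEFWgJ0zc=" 10.31.33.211
# } >> /etc/wireguard/wg0.conf
```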

Enable verbose WireGuard logging:

echo 'module wireguard +p' > /sys/kernel/debug/dynamic_debug/control   # as root; needs debugfs and CONFIG_DYNAMIC_DEBUG
dmesg -wH | grep -E 'wireguard|handshake'
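Once debug output is on, the client-side question is whether initiations are answered at all. The sketch below (hypothetical function name) counts the two relevant messages in a saved log; it assumes the wording current kernels use ("Sending handshake initiation to peer ..." and "Receiving handshake response from peer ..."), which may differ on other kernel versions.

```bash
#!/usr/bin/env bash
# Count WireGuard dynamic-debug handshake messages read from stdin (e.g. dmesg).
# Intended for the initiating side (Client B).
handshake_counts() {
  local log sent received
  log="$(cat)"
  sent="$(grep -c 'Sending handshake initiation' <<<"$log")"
  received="$(grep -c 'Receiving handshake response' <<<"$log")"
  echo "initiations sent: $sent, responses received: $received"
  if (( sent > 0 && received == 0 )); then
    echo "no responses: the server never answered or its replies are lost on the way back"
  fi
}
```

Usage: dmesg | handshake_counts on Client B.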
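
Quick Checklist

1. Verify that the client's Endpoint port, the port Router A forwards, and the server's ListenPort line up
2. Confirm DNS resolution gives the current IP (dig +short wgsrv.example.com)
3. Test raw UDP reachability (nc -vzu wgsrv.example.com 23456); UDP is connectionless, so this only fails if an ICMP "port unreachable" comes back
4. Check for IPv6 leaks (disable IPv6 if unused)
5. Validate system clock synchronization (an initiator whose clock jumps backwards has its handshakes rejected as replays until it catches up)
6. Review ALL firewall rules (both ends)
7. Test with minimal MTU (1280)
8. Verify kernel version compatibility (uname -r)
9. Check for conflicting VPN services
10. Validate routing tables (ip route show table all)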
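Several checklist items can be scripted so that each run produces a PASS/FAIL list. This sketch (hypothetical helper names) wraps any command as a labelled check and implements item 2 as dns_matches, which passes if any address returned by dig +short equals the expected one. The expected public IP is a placeholder that has to come from a source you trust, such as Router A's status page.

```bash
#!/usr/bin/env bash
# Run a labelled check; print PASS/FAIL and return the command's success status.
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# Checklist item 2: does the DDNS name resolve to the expected public address?
dns_matches() {
  local name="$1" expected="$2"
  dig +short "$name" | grep -Fxq "$expected"
}

# Example run (CURRENT_PUBLIC_IP is a placeholder you must fill in):
# run_check "DNS matches public IP" dns_matches wgsrv.example.com "$CURRENT_PUBLIC_IP"
# run_check "wg0 exists"            ip link show wg0
# run_check "route to server"       ip route get 10.31.33.100
```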