iptables IPv6 Fragmentation Issue: Why Second+ Fragments Get Rejected Despite UDP Port Allow Rules



When troubleshooting IPSec VPN connections over IPv6, I encountered a perplexing scenario where UDP port 500/4500 traffic was being rejected despite explicit firewall rules. Packet captures revealed a critical pattern:

# tcpdump showing fragmented packet rejection
04:00:43.311597 IP6 (hlim 51, next-header Fragment (44) payload length: 384) 2001:db8::be6b:d879 > 2001:db8:f:608::2: frag (0x5efa507c:1232|376)
04:00:43.311722 IP6 [...] ICMP6, destination unreachable, length 432
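
The numbers in this capture fit together exactly. Below is a small worked example in Python that uses only values from the tcpdump output and the kernel log. The assumption that a single first fragment carried data bytes 0 to 1231 is ours, inferred from the offset.

```python
# Worked arithmetic for the rejected fragment. Inputs come from the capture
# ("payload length: 384", "frag (0x5efa507c:1232|376)") and the kernel log
# (LEN=424); the layout of the first fragment is an assumption.
IPV6_FIXED_HEADER = 40  # bytes in the fixed IPv6 header
FRAGMENT_HEADER = 8     # bytes in the IPv6 Fragment extension header

payload_length = 384    # IPv6 payload length: everything after the fixed header
frag_offset = 1232      # byte offset of this fragment's data in the original payload
frag_data = 376         # bytes of fragment data carried by this packet

# The payload is the Fragment header followed by the fragment data.
assert payload_length == FRAGMENT_HEADER + frag_data

# LEN= in the kernel log counts the whole packet, fixed header included.
packet_len = IPV6_FIXED_HEADER + payload_length

# On the wire, the offset field counts 8-byte units.
offset_units = frag_offset // 8

# Assumption: one first fragment carried data bytes 0..1231, so its size on
# the wire was fixed header + Fragment header + 1232 bytes of data.
first_fragment_len = IPV6_FIXED_HEADER + FRAGMENT_HEADER + frag_offset

# If this is the last fragment, the reassembled payload (UDP header + data)
# ends where this fragment's data ends.
reassembled_len = frag_offset + frag_data

print(packet_len, offset_units, first_fragment_len, reassembled_len)
```

It prints 424 154 1280 1608. The 424 matches LEN=424 in the kernel log, and a 1280-byte first fragment is exactly the IPv6 minimum link MTU, which suggests (but does not prove) that the sender fragmented for a 1280-byte path MTU.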

Contrary to common assumptions, Linux's network stack can process IPv6 fragments differently from IPv4. The key factors:

  • Each IPv6 fragment carries a Fragment extension header, signalled by Next Header = 44 in the fixed IPv6 header
  • ip6tables evaluates fragments individually, before reassembly, whenever netfilter's IPv6 defragmentation (nf_defrag_ipv6) is not active
  • Only the first fragment (offset 0) contains the transport header with the UDP ports; subsequent fragments carry payload data only
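
The Fragment header itself is only 8 bytes. A minimal Python sketch of its layout per RFC 8200, section 4.5, shows why only the offset-0 fragment can carry the UDP ports. build_fragment_header and parse_fragment_header are hypothetical helpers written for this illustration.

```python
import struct

NEXTHDR_UDP = 17  # IANA protocol number for UDP


def build_fragment_header(next_header: int, offset: int, more: bool, ident: int) -> bytes:
    """Pack an 8-byte IPv6 Fragment header (layout from RFC 8200, section 4.5)."""
    if offset % 8 or not 0 <= offset <= 0xFFF8:
        raise ValueError("offset must be a multiple of 8 between 0 and 65528")
    # The upper 13 bits hold the offset in 8-byte units, followed by 2 reserved
    # bits and the M (more fragments) flag. A multiple of 8 already sits in the
    # upper 13 bits, so the M flag can simply be OR-ed into bit 0.
    offset_and_flags = offset | (1 if more else 0)
    return struct.pack("!BBHI", next_header, 0, offset_and_flags, ident)


def parse_fragment_header(header: bytes) -> dict:
    """Decode the fields of an IPv6 Fragment header."""
    next_header, _reserved, offset_and_flags, ident = struct.unpack("!BBHI", header[:8])
    offset = offset_and_flags & 0xFFF8
    return {
        "next_header": next_header,
        "offset": offset,
        "more": bool(offset_and_flags & 0x0001),
        "id": ident,
        # Only the fragment at offset 0 starts with the upper-layer (UDP) header.
        "has_udp_header": offset == 0 and next_header == NEXTHDR_UDP,
    }


first = parse_fragment_header(build_fragment_header(NEXTHDR_UDP, 0, True, 0x5EFA507C))
last = parse_fragment_header(build_fragment_header(NEXTHDR_UDP, 1232, False, 0x5EFA507C))
print(first["has_udp_header"], last["has_udp_header"])
```

This prints True False: both fragments name UDP as the upper-layer protocol, but only the first one starts with a UDP header.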

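The kernel log confirms that the rejected packet is not the first fragment. FRAG:1232 is the fragment's byte offset, the missing INCOMPLETE marker means its M flag is clear (so it is the last fragment), and no SPT/DPT fields follow PROTO=UDP because the packet carries no UDP header:

# Kernel log showing fragment rejection
Aug 26 04:00:43 grummle kernel: iptables: REJECT [...] OPT ( FRAG:1232 ID:5efa507c ) PROTO=UDP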

The firewall chain evaluates each fragment independently. While the first fragment contains UDP port information and matches our rules, subsequent fragments:

  1. Don't contain transport layer headers
  2. Never match --dport, because the udp match ignores any fragment with a non-zero offset
  3. Fall through to the default REJECT rule
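
The evaluation order can be modelled in a few lines. This is a simplified Python model, not netfilter code: Packet, udp_dport and verdict are hypothetical names, and the only kernel behaviour it mirrors is that the udp match never matches a fragment with a non-zero offset.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Packet:
    proto: str                   # upper-layer protocol (the log's PROTO=)
    frag_offset: int = 0         # 0 for unfragmented packets and first fragments
    dport: Optional[int] = None  # None when the packet carries no UDP header


def udp_dport(port: int) -> Callable[[Packet], bool]:
    """Model of '-p udp --dport PORT'."""
    def match(pkt: Packet) -> bool:
        # Like the kernel's udp match: a non-first fragment never matches,
        # because there is no UDP header to read the port from.
        if pkt.frag_offset != 0:
            return False
        return pkt.proto == "udp" and pkt.dport == port
    return match


# Port rules followed by the catch-all REJECT. The ESTABLISHED,RELATED rule is
# left out because the first packets of a new IKE exchange are not ESTABLISHED yet.
RULES: List[Tuple[Callable[[Packet], bool], str]] = [
    (udp_dport(500), "ACCEPT"),
    (udp_dport(4500), "ACCEPT"),
    (lambda pkt: True, "REJECT"),
]


def verdict(pkt: Packet) -> str:
    """Return the target of the first matching rule."""
    for match, target in RULES:
        if match(pkt):
            return target
    return "ACCEPT"  # chain policy; unreachable here because of the catch-all


first_fragment = Packet(proto="udp", frag_offset=0, dport=500)
last_fragment = Packet(proto="udp", frag_offset=1232)
print(verdict(first_fragment), verdict(last_fragment))
```

It prints ACCEPT REJECT, which is the pattern seen in the capture: the first fragment passes, the second is answered with an ICMPv6 error.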

We need to modify our ip6tables rules to handle fragments explicitly. Here's the corrected configuration:

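# Corrected IPv6 firewall rules for IPSec (ip6tables-restore syntax)
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
# First fragments still carry the UDP header: accept only those for IKE/NAT-T
-A INPUT -p udp -m frag --fragfirst -m multiport --dports 500,4500 -j ACCEPT
-A INPUT -m frag --fragfirst -j REJECT --reject-with icmp6-port-unreachable
# Remaining UDP fragments are non-first fragments, which have no ports to match
-A INPUT -p udp -m frag -j ACCEPT
-A INPUT -p udp --dport 500 -j ACCEPT
-A INPUT -p udp --dport 4500 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-port-unreachable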

1. Always place fragment rules before port-specific rules
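2. --fragmore matches fragments whose M (more fragments) flag is set, i.e. every fragment except the last; --fraglast matches the last fragment and --fragfirst the fragment at offset 0 (--fragid compares the Identification field, not the offset)
3. For IPv4, iptables has no frag match; use -f (--fragment), which matches second and further fragments
4. Consider rate limiting fragment rules to mitigate fragment-flood DoS attacks: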

# Rate-limited variant of the fragment ACCEPT rule; fragments over the limit fall through to the final REJECT
-A INPUT -p udp -m frag -m limit --limit 1000/second -j ACCEPT
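
To see why a rule such as -m frag --fragid 0 --fragmore does not select first fragments, here is a simplified Python model of the frag match options as the ip6tables-extensions manual describes them: --fragid compares the Identification field, --fragfirst tests for offset 0, and --fragmore/--fraglast test the M flag. The functions are hypothetical stand-ins for the options, not a real API, and the two-fragment layout (offset 0 with M set, offset 1232 with M clear) is an assumption based on the capture.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    ident: int   # Identification field, shared by all fragments of one packet
    offset: int  # byte offset of this fragment's data
    more: bool   # M flag: set on every fragment except the last


def fragid(frag: Fragment, lo: int, hi: int) -> bool:
    """Model of --fragid lo:hi (compares the Identification field)."""
    return lo <= frag.ident <= hi


def fragfirst(frag: Fragment) -> bool:
    """Model of --fragfirst (offset 0)."""
    return frag.offset == 0


def fragmore(frag: Fragment) -> bool:
    """Model of --fragmore (M flag set)."""
    return frag.more


def fraglast(frag: Fragment) -> bool:
    """Model of --fraglast (M flag clear)."""
    return not frag.more


CAPTURED_ID = 0x5EFA507C  # identification from the capture
fragments = [Fragment(CAPTURED_ID, 0, True), Fragment(CAPTURED_ID, 1232, False)]

for f in fragments:
    print(f.offset,
          "fragid 0 + fragmore:", fragid(f, 0, 0) and fragmore(f),
          "fragfirst:", fragfirst(f),
          "fraglast:", fraglast(f))
```

Neither fragment satisfies --fragid 0 --fragmore, because their identification is 0x5efa507c rather than 0, while --fragfirst and --fraglast pick out the first and last fragments as intended.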

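After implementing these changes:

# Verify packet acceptance; "udp port" filters do not see IPv6 fragments (Next Header 44), so match those too
tcpdump -ni eth0 'ip6 and (udp port 500 or udp port 4500 or ip6[6] == 44)'
# Check the ip6tables counters for the fragment rules
ip6tables -L INPUT -n -v | grep -A5 'frag'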

The solution maintains security while properly handling fragmented IPSec packets, allowing VPN tunnels to establish successfully.


During IPSec VPN setup between two Linux hosts, I encountered a puzzling issue where UDP packets on ports 500/4500 were being fragmented, and iptables would reject all fragments after the first one. The kernel logs showed:

Aug 26 04:00:43 grummle kernel: iptables: REJECT IN=eth0 OUT= MAC=### SRC=2001:db8::be6b:d879 DST=2001:db8:f:608::2 LEN=424 TC=0 HOPLIMIT=51 FLOWLBL=0 OPT ( FRAG:1232 ID:5efa507c ) PROTO=UDP
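
The fragment fields in this log line can be decoded mechanically. The sketch below assumes the field layout printed by the kernel's IPv6 LOG code: FRAG:<byte offset>, an INCOMPLETE marker when the M flag is set, then ID:<identification in hex>. describe_fragment is a hypothetical helper, and treating a missing DPT= as "no transport header" is a simplification.

```python
import re

LOG_LINE = (
    "Aug 26 04:00:43 grummle kernel: iptables: REJECT IN=eth0 OUT= MAC=### "
    "SRC=2001:db8::be6b:d879 DST=2001:db8:f:608::2 LEN=424 TC=0 HOPLIMIT=51 "
    "FLOWLBL=0 OPT ( FRAG:1232 ID:5efa507c ) PROTO=UDP"
)

# FRAG:<offset>, optional "INCOMPLETE " (M flag set), ID:<hex identification>
FRAG_RE = re.compile(r"FRAG:(\d+) (INCOMPLETE )?ID:([0-9a-f]+)")


def describe_fragment(line: str) -> dict:
    """Summarize the fragment fields of a netfilter LOG line for IPv6."""
    m = FRAG_RE.search(line)
    if m is None:
        return {"fragment": False}
    offset = int(m.group(1))
    return {
        "fragment": True,
        "offset": offset,
        "first": offset == 0,
        "more_fragments": m.group(2) is not None,
        "id": int(m.group(3), 16),
        # Ports are only logged when a transport header is present.
        "has_ports": "DPT=" in line,
    }


print(describe_fragment(LOG_LINE))
```

For this line it reports offset 1232, no more fragments and no ports: the last fragment of a UDP packet, rejected because nothing in it can match a port rule.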

Contrary to common belief, Linux doesn't always reassemble fragments before netfilter processing. The behavior depends on:

  • Kernel version and configuration (CONFIG_NF_DEFRAG_IPV6)
  • Whether IPv6 connection tracking is in use, because conntrack is what pulls in nf_defrag_ipv6
  • IP version (IPv4 vs IPv6)

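For IPv6 specifically, ip6tables sees fragments individually whenever nf_defrag_ipv6 is not active. Rather than being enabled on its own, it is normally activated as a side effect of IPv6 connection tracking (for example, by a conntrack or state match). Some older kernels also passed each original fragment through the ruleset even when conntrack had reassembled the packet.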
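
Whether nf_defrag_ipv6 is currently loaded as a module can be read from /proc/modules, the file lsmod formats. The sketch below scans text in that format; module_loaded is a hypothetical helper. Code built into the kernel does not appear in /proc/modules, so a negative answer is only conclusive when the defrag code is modular.

```python
def module_loaded(modules_text: str, name: str) -> bool:
    """Return True if `name` is listed in /proc/modules-formatted text.

    Built-in (non-modular) code is not listed, so False is not conclusive
    on kernels built with CONFIG_NF_DEFRAG_IPV6=y.
    """
    for line in modules_text.splitlines():
        fields = line.split()
        # The first field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc not mounted
    print("nf_defrag_ipv6 loaded:", module_loaded(text, "nf_defrag_ipv6"))
```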

The root cause is how the original rules treat fragments that reach the filter table individually:

-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m udp -p udp --dport 500 -j ACCEPT
-A INPUT -m udp -p udp --dport 4500 -j ACCEPT

Subsequent fragments don't contain the UDP header (and thus no port information), so they fail the port-based rules; and because they belong to a new IKE exchange, they don't qualify as ESTABLISHED or RELATED either.

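Here's the corrected ruleset; the same file can be loaded with both iptables-restore and ip6tables-restore:

# Drop packets that conntrack cannot classify
-A INPUT -m conntrack --ctstate INVALID -j DROP
# --ipv6 (-6): only ip6tables-restore applies this rule; iptables-restore silently skips it.
# It accepts every IPv6 packet that still carries a Fragment header, whatever its port.
-A INPUT -m frag --ipv6 -j ACCEPT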

# Main rules
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p udp -m multiport --dports 500,4500 -m conntrack --ctstate NEW -j ACCEPT

To confirm the solution works:

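  1. Check kernel defrag support: zgrep DEFRAG /proc/config.gz (or grep DEFRAG /boot/config-$(uname -r) where /proc/config.gz is absent), and lsmod | grep nf_defrag
  2. Monitor fragments with: tcpdump -ni eth0 'ip[6:2] & 0x3fff != 0 or ip6[6] == 44'
  3. Verify conntrack entries (conntrack lists IPv4 by default): conntrack -L -f ipv6 -p udp --dport 500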
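
The tcpdump expression in step 2 tests raw header bytes. As a sketch of what it checks, here is the same logic in Python; is_fragment mirrors the filter, while ipv4_header and ipv6_header are hypothetical helpers that build bare headers only for the demonstration.

```python
import struct


def is_fragment(packet: bytes) -> bool:
    """Python rendering of: ip[6:2] & 0x3fff != 0 or ip6[6] == 44

    `packet` starts at the IP header (no link-layer header).
    """
    if not packet:
        return False
    version = packet[0] >> 4
    if version == 4 and len(packet) >= 20:
        # Bytes 6-7: 3 flag bits + 13-bit fragment offset. The 0x3fff mask keeps
        # MF and the offset and drops DF and the reserved bit, so a DF-only
        # packet is not treated as a fragment.
        (flags_offset,) = struct.unpack("!H", packet[6:8])
        return (flags_offset & 0x3FFF) != 0
    if version == 6 and len(packet) >= 40:
        # Byte 6 is the fixed header's Next Header; 44 means a Fragment header
        # follows immediately. Like the tcpdump expression, this does not walk
        # any other extension headers.
        return packet[6] == 44
    return False


def ipv4_header(flags_offset: int) -> bytes:
    """Hypothetical helper: a minimal 20-byte IPv4 header with the given flags/offset."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, flags_offset, 64, 17, 0,
                       bytes(4), bytes(4))


def ipv6_header(next_header: int) -> bytes:
    """Hypothetical helper: a 40-byte IPv6 header with the given Next Header."""
    return struct.pack("!IHBB16s16s", 6 << 28, 0, next_header, 51, bytes(16), bytes(16))


print(is_fragment(ipv4_header(0x4000)))     # DF only: False
print(is_fragment(ipv4_header(0x2000)))     # MF set: True
print(is_fragment(ipv4_header(1232 // 8)))  # offset 154 units: True
print(is_fragment(ipv6_header(44)))         # Fragment header follows: True
print(is_fragment(ipv6_header(17)))         # plain UDP: False
```

The last two lines show why a "udp port 500" filter misses fragmented IPv6 UDP: the fixed header's Next Header is 44, not 17.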

When dealing with high fragment traffic:

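  • Adjust net.ipv6.ip6frag_high_thresh and net.ipv6.ip6frag_low_thresh (netfilter's own reassembly uses net.netfilter.nf_conntrack_frag6_high_thresh and nf_conntrack_frag6_low_thresh)
  • Watch the Ip6Reasm* counters in /proc/net/snmp6; /proc/net/nf_conntrack lists tracked connections, not fragments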
  • Consider rate limiting with -m limit for fragment rules
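
Reassembly activity can be sampled from the Ip6ReasmReqds, Ip6ReasmOKs, Ip6ReasmFails and Ip6ReasmTimeout counters in /proc/net/snmp6. Below is a minimal sketch that extracts them; reassembly_counters and counter_delta are hypothetical names. These counters describe the IPv6 stack's own reassembly and may not reflect netfilter's defragmentation.

```python
from typing import Dict

REASM_COUNTERS = ("Ip6ReasmTimeout", "Ip6ReasmReqds", "Ip6ReasmOKs", "Ip6ReasmFails")


def reassembly_counters(snmp6_text: str) -> Dict[str, int]:
    """Pick the IPv6 reassembly counters out of /proc/net/snmp6-formatted text.

    Each line of that file is a counter name followed by its value.
    """
    counters: Dict[str, int] = {}
    for line in snmp6_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in REASM_COUNTERS:
            counters[fields[0]] = int(fields[1])
    return counters


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-counter increase between two samples."""
    return {name: after[name] - before.get(name, 0) for name in after}


if __name__ == "__main__":
    try:
        with open("/proc/net/snmp6") as f:
            print(reassembly_counters(f.read()))
    except OSError as exc:
        print("cannot read /proc/net/snmp6:", exc)
```

To watch a trend, read the file twice a few seconds apart and pass both results to counter_delta; a growing Ip6ReasmFails or Ip6ReasmTimeout suggests fragments are being lost before reassembly completes.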