How to View and Debug Cached Path MTU (PMTU) Discovery in Linux Systems



When troubleshooting network connectivity issues, you might observe this behavior:

ping -c 4 -M do -s 1431 212.58.244.69
PING 212.58.244.69 (212.58.244.69) 1431(1459) bytes of data.
From 217.155.134.6 icmp_seq=1 Frag needed and DF set (mtu = 1458)
From 217.155.134.4 icmp_seq=2 Frag needed and DF set (mtu = 1458)

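The initial ICMP "fragmentation needed" message comes from the router (217.155.134.6). Later messages come from the local host itself (217.155.134.4). After the first message, the kernel caches the 1458-byte path MTU for this destination and rejects oversized DF packets locally instead of sending them.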
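The caching behaviour described above can be illustrated with a toy model. The sketch below is hypothetical and deliberately simplified: Linux actually keeps learned PMTUs as route exceptions, not in an array like this. The names pmtu_learn, pmtu_lookup and df_check are invented for illustration. The model shows why the first oversized DF packet leaves the host and draws an ICMP reply from the router, while later ones are refused locally until the entry expires.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Toy per-destination PMTU cache (hypothetical; not the kernel's design). */
#define PMTU_SLOTS 16
#define PMTU_EXPIRES 600   /* seconds, mirroring the net.ipv4.route.mtu_expires default */

struct pmtu_entry {
    char   dest[16];   /* dotted-quad IPv4 address, "" when unused */
    int    mtu;        /* path MTU learned from ICMP */
    time_t expires;    /* absolute expiry time */
};

static struct pmtu_entry pmtu_cache[PMTU_SLOTS];

/* Record an MTU reported by a router's "fragmentation needed" message. */
void pmtu_learn(const char *dest, int mtu, time_t now) {
    struct pmtu_entry *slot = NULL;

    for (int i = 0; i < PMTU_SLOTS && slot == NULL; i++)   /* existing entry? */
        if (strcmp(pmtu_cache[i].dest, dest) == 0)
            slot = &pmtu_cache[i];
    for (int i = 0; i < PMTU_SLOTS && slot == NULL; i++)   /* free or expired slot */
        if (pmtu_cache[i].expires <= now)
            slot = &pmtu_cache[i];
    if (slot == NULL)
        slot = &pmtu_cache[0];                              /* full: evict slot 0 */

    snprintf(slot->dest, sizeof(slot->dest), "%s", dest);
    slot->mtu = mtu;
    slot->expires = now + PMTU_EXPIRES;
}

/* MTU to use for dest: a live cached PMTU, otherwise the interface MTU. */
int pmtu_lookup(const char *dest, int if_mtu, time_t now) {
    for (int i = 0; i < PMTU_SLOTS; i++)
        if (pmtu_cache[i].expires > now && strcmp(pmtu_cache[i].dest, dest) == 0)
            return pmtu_cache[i].mtu;
    return if_mtu;
}

/* Local check for a DF packet of ip_len bytes: 0 if it may be sent,
 * otherwise the MTU to report in a locally generated "Frag needed". */
int df_check(const char *dest, int ip_len, int if_mtu, time_t now) {
    int mtu = pmtu_lookup(dest, if_mtu, now);
    return ip_len <= mtu ? 0 : mtu;
}
```

Consider 1459-byte packets (ping -s 1431) on a 1500-byte interface. The first df_check returns 0, so the packet goes out and the router answers. After pmtu_learn(..., 1458, ...), the same check returns 1458 locally. This matches the second line of the ping output.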

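The traditional netstat -rCn command shows the kernel's IPv4 routing cache, but it has limitations. Linux 3.6 removed that cache, so on current kernels the table comes back empty. On older kernels the output looks like this: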

netstat -rCn
Kernel IP routing cache
Source          Destination     Gateway         Flags   MSS Window  irtt Iface
217.155.134.4   212.58.244.69   217.155.134.6          1500 0          0 eth0

More reliable modern alternatives include:

iproute2 Method

ip route get to 212.58.244.69
212.58.244.69 via 217.155.134.6 dev eth1 src 217.155.134.4
    cache mtu 1500 advmss 1460 hoplimit 64

Kernel PMTU Cache

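The learned PMTU is not specific to TCP. The kernel stores it per destination as a route exception, and there is no /proc/net/pmtu_disc_cache file. List the exceptions with iproute2 instead (IPv4 exceptions are dumped on kernels 5.3 and later):

ip route show cache

Once a "fragmentation needed" message has been received, ip route get to 212.58.244.69 prints a cache line that carries an expiry timer and the learned value, in the form "cache expires <seconds>sec mtu 1458". The entry ages out after net.ipv4.route.mtu_expires seconds (600 by default).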

When standard tools don't reveal the actual PMTU, try these approaches:

1. Tracepath for Path Discovery

tracepath -n 212.58.244.69
 1:  217.155.134.4                           0.089ms pmtu 1500
 2:  217.155.134.6                           1.201ms 
 3:  195.99.125.101                          9.872ms pmtu 1458
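A small scanner is enough to pull the bottleneck value out of tracepath output in a script or tool. This sketch assumes tracepath's "pmtu <number>" wording shown above, and tracepath_min_pmtu is a hypothetical helper name. The path MTU is the smallest link MTU along the route, so the function keeps the minimum it finds.

```c
#include <stdlib.h>
#include <string.h>

/* Return the smallest "pmtu N" value found in tracepath output, or -1 if
 * none is present. Assumes tracepath's "pmtu <number>" wording. */
int tracepath_min_pmtu(const char *output) {
    int best = -1;
    const char *p = output;

    while ((p = strstr(p, "pmtu ")) != NULL) {
        char *end;
        p += strlen("pmtu ");            /* skip past the keyword */
        long v = strtol(p, &end, 10);
        if (end != p && v > 0 && v <= 65535 && (best < 0 || v < best))
            best = (int)v;
        p = end;                         /* continue after the number */
    }
    return best;
}
```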

2. TCP MTU Probing

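Enable TCP MTU probing (Packetization Layer PMTUD, RFC 4821) so TCP can find a workable segment size even when ICMP "fragmentation needed" messages are filtered. A value of 1 enables probing only after an ICMP black hole is detected; 2 enables it for every connection:

sysctl -w net.ipv4.tcp_mtu_probing=1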

Windows Systems

netsh interface ipv4 show destinationcache

macOS Systems

netstat -rnv

When PMTU discovery fails, consider these workarounds:

# Temporarily lower interface MTU
ip link set dev eth0 mtu 1400

# Or disable PMTU discovery (not recommended)
sysctl -w net.ipv4.ip_no_pmtu_disc=1
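Disabling PMTU discovery system-wide affects every socket. If only one application has trouble, the Linux-specific IP_MTU_DISCOVER socket option (documented in ip(7)) sets the behaviour per socket instead. The helper names set_pmtu_mode and get_pmtu_mode are hypothetical. A minimal sketch:

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>

/* Set the PMTU discovery mode for one socket (Linux-specific):
 *   IP_PMTUDISC_DONT  never do PMTU discovery (DF not set)
 *   IP_PMTUDISC_WANT  use per-route settings
 *   IP_PMTUDISC_DO    always do PMTU discovery (DF set, like ping -M do)
 * Returns 0 on success, -1 on error with errno set. */
int set_pmtu_mode(int sockfd, int mode) {
    return setsockopt(sockfd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}

/* Read the socket's current mode back; returns -1 on error. */
int get_pmtu_mode(int sockfd) {
    int mode;
    socklen_t len = sizeof(mode);
    if (getsockopt(sockfd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) < 0)
        return -1;
    return mode;
}
```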

For developers needing programmatic access to PMTU values:

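#include <netinet/in.h>
#include <sys/socket.h>

/* Return the kernel's current path MTU towards dest, or -1 on error.
 * IP_MTU is Linux-specific and only works on a connected socket, so
 * dest is connected first. Pass a UDP (SOCK_DGRAM) socket: connect()
 * on it just records the peer and sends nothing. */
int get_pmtu(int sockfd, const struct sockaddr_in *dest) {
    int mtu = 0;
    socklen_t len = sizeof(mtu);

    if (connect(sockfd, (const struct sockaddr *)dest, sizeof(*dest)) < 0)
        return -1;
    if (getsockopt(sockfd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        return -1;
    return mtu;
}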
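The following self-contained sketch reproduces the locally generated "Frag needed" from the opening ping example. It forces DF with IP_PMTUDISC_DO (the socket equivalent of ping -M do), reads IP_MTU, and then sends a UDP datagram one byte too large. Per ip(7), the kernel refuses such a datagram with EMSGSIZE instead of sending it. The function name df_send_probe and the use of port 9 are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a UDP socket to ipv4:port with DF forced,
 * store the kernel's known PMTU in *mtu_out, then send a datagram one byte
 * larger than that PMTU allows. Returns the errno from send() (0 if it was
 * accepted) or -1 if setup failed. */
int df_send_probe(const char *ipv4, unsigned short port, int *mtu_out) {
    struct sockaddr_in dest;
    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &dest.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int mode = IP_PMTUDISC_DO;          /* always set DF, like ping -M do */
    int mtu = 0;
    socklen_t len = sizeof(mtu);
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) < 0 ||
        connect(fd, (struct sockaddr *)&dest, sizeof(dest)) < 0 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0) {
        close(fd);
        return -1;
    }
    *mtu_out = mtu;

    /* IP packet size = payload + 8 (UDP header) + 20 (IPv4 header). */
    size_t payload = (size_t)mtu - 28 + 1;
    char *buf = calloc(payload, 1);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    int err = send(fd, buf, payload, 0) < 0 ? errno : 0;
    free(buf);
    close(fd);
    return err;
}
```

Against 127.0.0.1, this should return EMSGSIZE and report the loopback MTU. Against a remote address for which a router has already reported a smaller MTU, *mtu_out should reflect that cached PMTU instead.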

Understanding Path MTU (PMTU) caching becomes crucial when packets with the DF (Don't Fragment) bit set are being dropped. The locally generated "fragmentation needed" replies in the ping output above show that the system has cached the PMTU for this destination.

In the netstat -rCn output above, the column is labelled MSS, but the 1500 it shows corresponds to the interface MTU. It is not a TCP Maximum Segment Size, which would be 1460 (the MTU minus 40 bytes of IPv4 and TCP headers). Nor is it the discovered PMTU of 1458, so this output cannot confirm what the kernel has learned.
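To see the two numbers side by side, read the interface MTU with the SIOCGIFMTU ioctl and derive the corresponding TCP MSS from it. This sketch assumes Linux and no IP or TCP options; interface_mtu and tcp_mss_for_mtu are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Interface MTU as configured on the device, read with SIOCGIFMTU.
 * Returns -1 on error (for example, no such interface). */
int interface_mtu(const char *ifname) {
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works for the ioctl */
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, SIOCGIFMTU, &ifr);
    close(fd);
    return rc < 0 ? -1 : ifr.ifr_mtu;
}

/* TCP MSS for an IPv4 MTU: subtract 20-byte IPv4 and 20-byte TCP headers. */
int tcp_mss_for_mtu(int mtu) {
    return mtu - 20 - 20;
}
```

With the values from this example, a 1500-byte interface gives an MSS of 1460. The 1458-byte path MTU would allow only 1418.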


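The /proc filesystem is less useful here than it looks. /proc/net/rt_cache, the file netstat -rC reads, prints only its header line on kernels since 3.6:

cat /proc/net/rt_cache

The routing tables are world-readable (no root needed), but they list routes, not the PMTU values the kernel has learned from ICMP. There is no /proc/net/ipv4_route; the IPv4 table is /proc/net/route:

# IPv4 routing table
cat /proc/net/route

# IPv6 routing table
cat /proc/net/ipv6_route

Grepping these files for 212.58.244.69 therefore finds nothing. They hold routes to networks rather than individual destinations, and /proc/net/route prints addresses in hexadecimal. For a specific destination, query the kernel directly:

ip route get to 212.58.244.69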
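Reading /proc/net/route directly takes some decoding, because it does not print dotted-quad addresses. Each address is the 32-bit network-order value written as eight hex digits, so on a little-endian machine such as x86, 212.58.244.69 appears as 45F43AD4. The sketch below converts an address to that form; proc_route_hex is a hypothetical name.

```c
#include <arpa/inet.h>
#include <stdio.h>

/* Render an IPv4 address the way /proc/net/route shows it: the raw
 * network-order 32-bit value printed as %08X, which is byte-reversed on
 * little-endian hosts. out must hold at least 9 bytes. Returns 0 on
 * success, -1 if ipv4 is not a valid dotted-quad address. */
int proc_route_hex(const char *ipv4, char out[9]) {
    struct in_addr addr;
    if (inet_pton(AF_INET, ipv4, &addr) != 1)
        return -1;
    snprintf(out, 9, "%08X", (unsigned int)addr.s_addr);
    return 0;
}
```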
A practical debugging sequence:

  1. First verify PMTU discovery is working:
    
    ping -M do -s 1472 212.58.244.69  # Adjust size based on expected MTU
    
  2. Check the current cached value (ip route get needs an IP address, not a hostname):
    
    ip route get to 212.58.244.69 | grep mtu
    
  3. If needed, flush the PMTU cache:
    
    ip route flush cache
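The size to pass to ping -s in step 1 follows from simple arithmetic. An IPv4 header is 20 bytes and an ICMP echo header is 8, so the payload is the MTU minus 28. That is why -s 1431 in the opening example produced a 1459-byte packet, one byte over the 1458-byte PMTU. As a sketch, assuming no IP options (max_ping_payload is a hypothetical name):

```c
/* Largest ICMP echo payload (the ping -s value) that fits in a single
 * unfragmented IPv4 packet: MTU minus 20-byte IPv4 and 8-byte ICMP headers. */
int max_ping_payload(int mtu) {
    return mtu - 20 - 8;
}
```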
    

When standard tools don't show the expected PMTU, consider:


# Using tracepath which shows MTU per hop
tracepath -n 212.58.244.69

# Using tcpdump to observe PMTU discovery in action
tcpdump -n -i eth0 "icmp and icmp[0] == 3 and icmp[1] == 4"
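The tcpdump filter matches ICMP type 3 (destination unreachable) with code 4 (fragmentation needed and DF set). RFC 1191 places the next-hop MTU in bytes 6 and 7 of that ICMP header, in network byte order; a value of 0 means an older router that did not report it. The sketch below decodes it from raw bytes; icmp_frag_needed_mtu is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the start of an ICMP message, return the next-hop MTU from a
 * "fragmentation needed" message (type 3, code 4), or -1 for any other
 * message or a truncated header. Bytes 4-5 are unused; 6-7 hold the MTU. */
int icmp_frag_needed_mtu(const uint8_t *icmp, size_t len) {
    if (len < 8 || icmp[0] != 3 || icmp[1] != 4)
        return -1;
    return (icmp[6] << 8) | icmp[7];   /* network byte order */
}
```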

These sysctl settings control PMTU behavior:


# View current settings
sysctl net.ipv4.ip_no_pmtu_disc
sysctl net.ipv4.route.mtu_expires

# Lifetime of learned PMTU entries in seconds (600 is the default); not persistent across reboots
sysctl -w net.ipv4.route.mtu_expires=600
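The sysctl command reads these values from files under /proc/sys, with the dots in each name mapped to directory separators. Below is a minimal C sketch of the same lookup, assuming /proc is mounted; read_sysctl_int is a hypothetical name. The same function can read net.ipv4.tcp_mtu_probing.

```c
#include <stdio.h>
#include <string.h>

/* Read an integer sysctl through /proc/sys, e.g. "net.ipv4.ip_no_pmtu_disc"
 * -> /proc/sys/net/ipv4/ip_no_pmtu_disc. Stores the value and returns 0 on
 * success, -1 on error. The plain dot-to-slash mapping does not handle
 * names whose components themselves contain dots. */
int read_sysctl_int(const char *name, long *value) {
    char path[256];
    int n = snprintf(path, sizeof(path), "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    for (char *p = path + strlen("/proc/sys/"); *p; p++)
        if (*p == '.')
            *p = '/';

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%ld", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```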