Linux Network Issue: Investigating “FAILED” State in ARP Cache and Gateway Connectivity Problems

When your Linux server suddenly becomes unreachable from the internet for brief periods, and ip neigh show displays a "FAILED" state for your gateway, you're likely dealing with an ARP resolution issue. This typically means your server couldn't resolve the MAC address of the gateway (192.168.14.1 in your case) through ARP requests.

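Linux neighbor cache (ARP cache) entries can be in several states:

INCOMPLETE - Address resolution in progress; ARP request sent, no reply yet
REACHABLE  - Reachability confirmed recently (within reachable_time)
STALE      - Address still cached, but reachability not confirmed recently; re-checked on next use
DELAY      - A packet was sent via a STALE entry; waiting delay_first_probe_time for confirmation before probing
PROBE      - Sending unicast ARP probes to re-verify reachability
FAILED     - No reply after the maximum number of probes
PERMANENT  - Static entry that never expires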
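To see at a glance how many entries sit in each state, the listing can be tallied by its last field, which is where ip neigh show prints the state. This is a minimal sketch that assumes that layout; count_neigh_states is a hypothetical helper name, not part of iproute2.

```shell
# Hypothetical helper: tally `ip neigh show` output by NUD state.
# Assumes the state is the last whitespace-separated field of each line.
count_neigh_states() {
    awk 'NF { count[$NF]++ } END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}
```

For example, `ip -4 neigh show | count_neigh_states` restricts the tally to IPv4 (ARP) entries; a growing FAILED count across many addresses points at the segment rather than the gateway alone.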

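Common causes of a FAILED gateway entry include:

Gateway not responding to ARP requests
Network congestion causing packet loss
Incorrect network configuration
Hardware issues (faulty NIC or switch)
Firewall rules dropping ARP traffic (iptables does not see ARP; check arptables or nftables arp-family rules on the host, and ACLs on the gateway or switch)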

To gather more information about the issue:

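# Monitor ARP traffic (-n: no name lookups, -e: show MAC addresses)
sudo tcpdump -i eth0 -n -e arp
# Check ARP-related kernel parameters (net.ipv4.conf.*.arp_* and net.ipv4.neigh.*)
sysctl -a 2>/dev/null | grep -E 'arp|neigh'
# Continuous neighbor cache monitoring, with per-entry statistics
watch -n 1 ip -s neigh show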
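With -s, each entry also carries a probes counter. The sketch below (hypothetical helper gateway_probe_stats) pulls the state and probe count for one address, assuming lines shaped like `192.168.14.1 dev eth0 ref 1 used 40/43/40 probes 6 FAILED`. It compares the first field exactly, because a plain grep for 192.168.14.1 also matches addresses such as 192.168.14.10 or 192.168.14.123.

```shell
# Hypothetical helper: print state and probe count for one address
# from `ip -s neigh show` output on stdin (exact match on the first field).
gateway_probe_stats() {
    awk -v ip="$1" '$1 == ip {
        probes = "?"
        # The number after the "probes" keyword is the probe counter
        for (i = 2; i < NF; i++) if ($i == "probes") probes = $(i + 1)
        print "state=" $NF, "probes=" probes
    }'
}
```

Usage: `ip -s neigh show dev eth0 | gateway_probe_stats 192.168.14.1`.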

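1. Adjust ARP timing parameters. Both values shown are already the kernel defaults (60 s and 30000 ms), so change them to suit your network; use net.ipv4.neigh.eth0.* instead of default to affect only eth0:

sudo sysctl -w net.ipv4.neigh.default.gc_stale_time=60
sudo sysctl -w net.ipv4.neigh.default.base_reachable_time_ms=30000

To have the kernel send more probes before it marks an entry FAILED, raise ucast_solicit and mcast_solicit above their default of 3.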
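Settings made with sysctl -w are lost at reboot. One way to persist them is a file under /etc/sysctl.d/; the sketch below (hypothetical helper neigh_sysctl_conf) prints such a file's contents for a given interface.

```shell
# Hypothetical helper: emit sysctl.d lines for per-interface neighbor timers.
# $1 = interface, $2 = base_reachable_time_ms, $3 = gc_stale_time in seconds
neigh_sysctl_conf() {
    printf 'net.ipv4.neigh.%s.base_reachable_time_ms = %s\n' "$1" "$2"
    printf 'net.ipv4.neigh.%s.gc_stale_time = %s\n' "$1" "$3"
}
```

For example, `neigh_sysctl_conf eth0 30000 60 | sudo tee /etc/sysctl.d/60-neigh-eth0.conf` followed by `sudo sysctl --system` to load it; the file name is only an example.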

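2. Add a static ARP entry for the gateway (replace, unlike add, does not fail if an entry already exists):

sudo ip neigh replace 192.168.14.1 lladdr 00:22:64:b6:10:5c nud permanent dev eth0

The entry does not survive a reboot, and it will break connectivity if the gateway's MAC address ever changes (for example after an HA failover).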
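Because a permanent entry silently goes wrong if it disappears or holds the wrong MAC, it can be worth checking that it is still present, still PERMANENT, and still matches (for example after a reboot or a configuration change). A sketch, with the hypothetical helper check_static_neigh reading ip neigh show output on stdin:

```shell
# Hypothetical helper: verify a static neighbor entry.
# $1 = IP, $2 = expected MAC; prints "ok", "missing", or the mismatch.
check_static_neigh() {
    awk -v ip="$1" -v mac="$2" '
        $1 == ip {
            found = 1
            lladdr = ""
            for (i = 2; i < NF; i++) if ($i == "lladdr") lladdr = $(i + 1)
            # MACs compared case-insensitively; state must be PERMANENT
            if (tolower(lladdr) == tolower(mac) && $NF == "PERMANENT") print "ok"
            else print "mismatch: lladdr=" lladdr " state=" $NF
        }
        END { if (!found) print "missing" }'
}
```

Usage: `ip neigh show to 192.168.14.1 dev eth0 | check_static_neigh 192.168.14.1 00:22:64:b6:10:5c`.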

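3. Check for duplicate IP addresses. Send ordinary ARP requests to the gateway and check whether the replies come from more than one MAC address:

sudo arping -I eth0 -c 5 192.168.14.1

arping -D (duplicate address detection) probes with a 0.0.0.0 source and exits with status 0 only when nobody replies, so it is meant for checking your server's own address (arping -D -I eth0 -c 2 <server-ip>), not the gateway's, which always answers.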
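To make the duplicate check less error-prone than reading replies by eye, the MAC addresses can be extracted and de-duplicated. This sketch assumes iputils arping output lines like `Unicast reply from 192.168.14.1 [00:22:64:B6:10:5C]  0.693ms`; distinct_reply_macs is a hypothetical name. More than one line of output suggests two devices answering for the same address.

```shell
# Hypothetical helper: list each distinct MAC seen in arping replies.
# Assumes iputils-style "... reply from IP [MAC] ..." lines on stdin.
distinct_reply_macs() {
    sed -n 's/.*reply from [0-9.]* \[\([0-9A-Fa-f:]*\)].*/\1/p' |
        tr 'a-f' 'A-F' |        # normalise case before comparing
        LC_ALL=C sort -u
}
```

Usage: `sudo arping -I eth0 -c 5 192.168.14.1 | distinct_reply_macs`.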

Create a script to monitor and log neighbor cache issues:

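#!/bin/bash
# Run as root: writing to /var/log and flushing neighbor entries need privileges
LOG_FILE="/var/log/neighbor_monitor.log"
GATEWAY_IP="192.168.14.1"
IFACE="eth0"

check_gateway() {
    local status
    # Exact match on address and interface; the state is the last field
    status=$(ip neigh show to "$GATEWAY_IP" dev "$IFACE" | awk '{print $NF}')
    if [ "$status" = "FAILED" ]; then
        echo "$(date) - Gateway $GATEWAY_IP in FAILED state" >> "$LOG_FILE"
        # Drop only the gateway entry so the next packet triggers a fresh ARP request
        ip neigh flush to "$GATEWAY_IP" dev "$IFACE"
    fi
}

while true; do
    check_gateway
    sleep 5
done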

If the issue persists, you might need to examine kernel logs:

dmesg | grep -i arp
journalctl -k --grep="neigh"

Look for messages like "neighbour: arp_cache: neighbor table overflow!", which mean the table has hit its hard limit, net.ipv4.neigh.default.gc_thresh3 (1024 by default); in that case raise gc_thresh1, gc_thresh2, and gc_thresh3.
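Before raising the limits, you can compare the current number of entries with gc_thresh3. A rough sketch (hypothetical helper neigh_table_usage); the count is approximate, because ip neigh show may hide some entry types.

```shell
# Hypothetical helper: compare the number of listed entries with a limit.
# $1 = gc_thresh3; reads `ip -4 neigh show` output on stdin.
neigh_table_usage() {
    awk -v max="$1" 'NF { n++ }
        END { printf "%d/%d entries (%d%%)\n", n, max, (max > 0 ? n * 100 / max : 0) }'
}
```

Usage: `ip -4 neigh show | neigh_table_usage "$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)"`.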

Remember that in some environments (particularly cloud providers), ARP behavior might be intentionally modified by the hypervisor or network infrastructure.

When examining the output of ip neigh show, a "FAILED" state means the system could not complete Address Resolution Protocol (ARP) resolution for that IP address (the older arp -n command shows such entries as "(incomplete)" instead). In your case, since it appears against your gateway (192.168.14.1), it points to intermittent communication breakdowns between your server and its default gateway.

# Example of problematic output:
192.168.14.1 dev eth0  FAILED

Possible causes include:

Network congestion causing ARP packet loss
Gateway firewall blocking ARP requests
Hardware issues in switches or routers
MAC address changes (common in HA environments)
Duplicate IP addresses on the network
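In HA setups, a MAC change on the gateway shows up as a new lladdr for the same IP. The sketch below (hypothetical helper watch_mac_changes) reads neighbor lines, such as the stream printed by ip monitor neigh, and reports each time the MAC for one address changes; entries without an lladdr (FAILED, INCOMPLETE) are skipped.

```shell
# Hypothetical helper: report lladdr changes for one IP.
# $1 = IP; reads `ip monitor neigh` or repeated `ip neigh show` lines on stdin.
watch_mac_changes() {
    awk -v ip="$1" '
        $1 == ip {
            mac = ""
            for (i = 2; i < NF; i++) if ($i == "lladdr") mac = tolower($(i + 1))
            if (mac == "") next            # no lladdr: FAILED/INCOMPLETE entry
            if (last != "" && mac != last) { print "MAC changed: " last " -> " mac; fflush() }
            last = mac
        }'
}
```

Usage: `ip monitor neigh | watch_mac_changes 192.168.14.1`. Some awk implementations (notably mawk) buffer piped input, so events may appear late unless you use gawk.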

To gather more information during the failure window:

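# ARP ping the gateway five times (drop -c to keep pinging until Ctrl+C)
sudo arping -I eth0 -c 5 192.168.14.1

# Check kernel ARP table in real-time (exact address, not a substring match)
watch -n 1 "ip neigh show to 192.168.14.1"

# Capture ARP traffic, including MAC addresses (-e)
sudo tcpdump -i eth0 -n -e arp and host 192.168.14.1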
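arping prints a summary when it finishes; the sketch below (hypothetical helper arping_loss_pct) turns it into a loss percentage, assuming iputils-style `Sent 5 probes (1 broadcast(s))` and `Received 3 response(s)` lines. Repeated runs during a failure window show whether ARP replies are lost intermittently or not at all.

```shell
# Hypothetical helper: ARP reply loss from an arping summary on stdin.
arping_loss_pct() {
    awk '$1 == "Sent" { sent = $2 }
         $1 == "Received" { recv = $2 }
         END { if (sent > 0) printf "%d%%\n", (sent - recv) * 100 / sent; else print "n/a" }'
}
```

Usage: `sudo arping -I eth0 -c 5 192.168.14.1 | arping_loss_pct`.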

For environments where you can't control the gateway configuration:

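# Manually set a static ARP entry (requires root and net-tools; not persistent across reboots)
sudo arp -s 192.168.14.1 00:22:64:b6:10:5c

# Lengthen ARP timers on eth0: entries stay REACHABLE longer and STALE entries wait
# longer before probing; this cuts ARP traffic but slows detection of a real MAC change
sudo sysctl -w net.ipv4.neigh.eth0.base_reachable_time_ms=300000
sudo sysctl -w net.ipv4.neigh.eth0.delay_first_probe_time=60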

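Create a monitoring script that logs whenever the gateway entry is FAILED:

#!/bin/bash
# Run as root so the script can write to /var/log
LOG_FILE="/var/log/arp_monitor.log"
GATEWAY_IP="192.168.14.1"
IFACE="eth0"

while true; do
    # Exact match on address and interface; the state is the last field
    STATE=$(ip neigh show to "$GATEWAY_IP" dev "$IFACE" | awk '{print $NF}')
    if [[ "$STATE" == "FAILED" ]]; then
        echo "$(date) - ARP FAILED for $GATEWAY_IP" >> "$LOG_FILE"
        # Optional: restart networking (the service name depends on the
        # distribution; "networking" is Debian's ifupdown service)
        # systemctl restart networking
    fi
    sleep 5
done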
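The script above records only FAILED, and repeats the line every 5 seconds while the entry stays failed. To record every state change exactly once instead (REACHABLE -> STALE -> FAILED and so on), a variant can remember the previous state. A sketch with hypothetical helpers transition_msg and monitor_gateway_states; the loop is only defined here, not started.

```shell
# Hypothetical helper: print "OLD -> NEW" when the state changed, else nothing.
# An empty state (no entry in the cache) is shown as NONE.
transition_msg() {
    local prev=${1:-NONE} cur=${2:-NONE}
    [ "$prev" != "$cur" ] && echo "$prev -> $cur"
    return 0
}

# Hypothetical loop: poll one entry every 5 seconds, append each transition to a log.
# $1 = IP, $2 = interface, $3 = log file. Requires iproute2; run as root.
monitor_gateway_states() {
    local ip_addr=$1 iface=$2 log=$3 last="" state msg
    while true; do
        state=$(ip neigh show to "$ip_addr" dev "$iface" | awk '{print $NF}')
        msg=$(transition_msg "$last" "$state")
        [ -n "$msg" ] && echo "$(date) - $ip_addr $msg" >> "$log"
        last=$state
        sleep 5
    done
}
```

Usage: `monitor_gateway_states 192.168.14.1 eth0 /var/log/arp_monitor.log` (as root). The first iteration logs the starting state as a transition from NONE.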

For enterprise environments, consider:

Setting up keepalived for automatic failover
Implementing BFD (Bidirectional Forwarding Detection)
Using ECMP (Equal-Cost Multi-Path) routing

Watch neighbor-state transitions as they happen (mainline kernels have no arp_debug sysctl, so writing to /proc/sys/net/ipv4/conf/eth0/arp_debug fails):

ip -timestamp monitor neigh

ServerDevWorker
