When your Linux server suddenly becomes unreachable from the internet for brief periods and ip neigh show displays a "FAILED" state for your gateway, you're likely dealing with an ARP resolution issue. This typically means your server couldn't resolve the MAC address of the gateway (192.168.14.1 in your case) through ARP requests.
Linux neighbor cache (ARP cache) entries can be in several states:
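REACHABLE - Entry is valid and reachability was confirmed recently
STALE - Entry is valid but reachability has not been confirmed recently
DELAY - A packet was sent using a stale entry; the kernel is waiting for confirmation before probing
PROBE - Actively verifying reachability with ARP requests
FAILED - Resolution failed after the maximum number of probes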
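If you want to check these states from a script, here is a minimal sketch. It assumes the usual ip neigh show layout, where the address is the first field and the state is the last one. The function names are made up for illustration, and INCOMPLETE (resolution still in progress, not listed above) is treated as unresolved as well.

```shell
# Print the state of one neighbor from `ip neigh show` output read on stdin.
# The address field is compared exactly, so 192.168.14.1 does not match 192.168.14.10.
neigh_state_of() {
    awk -v ip="$1" '$1 == ip { print $NF; exit }'
}

# Succeed if the state means the kernel has no usable MAC address for the neighbor.
neigh_is_unresolved() {
    case "$1" in
        FAILED|INCOMPLETE) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not executed here):
#   state=$(ip neigh show | neigh_state_of 192.168.14.1)
#   neigh_is_unresolved "$state" && echo "gateway is unresolved"
```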
Common causes include:
- Gateway not responding to ARP requests
- Network congestion causing packet loss
- Incorrect network configuration
- Hardware issues (faulty NIC or switch)
- Firewall blocking ARP traffic
To gather more information about the issue:
# Monitor ARP traffic
sudo tcpdump -i eth0 arp
# Check kernel ARP and neighbor-table parameters (the timers live under net.ipv4.neigh)
sysctl -a 2>/dev/null | grep -E 'arp|neigh'
# Continuous neighbor cache monitoring
watch -n 1 ip -s neigh show
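On a busy segment the raw neighbor list can be hard to read, so a per-state count may help you spot a growing number of FAILED entries. This is only a sketch: the function name is made up, and it assumes the state is the last field of each line.

```shell
# Count neighbor entries per state from `ip neigh show` output read on stdin.
# Prints one "STATE COUNT" line per state, sorted by state name.
neigh_state_summary() {
    awk 'NF { count[$NF]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage (not executed here):
#   ip neigh show dev eth0 | neigh_state_summary
```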
1. Adjust ARP timeout parameters:
sudo sysctl -w net.ipv4.neigh.default.gc_stale_time=60
sudo sysctl -w net.ipv4.neigh.default.base_reachable_time_ms=30000
2. Add a static ARP entry for the gateway:
# "replace" also works when a dynamic entry already exists; "add" would fail in that case
sudo ip neigh replace 192.168.14.1 lladdr 00:22:64:b6:10:5c nud permanent dev eth0
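3. Check for duplicate IP addresses (replies from more than one MAC address indicate a conflict; -D mode is only useful for testing your own address, because the gateway itself always answers):
arping -I eth0 -c 5 192.168.14.1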
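A duplicate gateway address usually shows up as ARP replies coming from more than one MAC address. The sketch below counts the distinct MACs in saved arping output. It assumes the iputils arping format, where each reply line carries the sender MAC in square brackets; other arping implementations may print it differently. The function name is hypothetical.

```shell
# Count distinct MAC addresses in arping output read on stdin.
# Assumes reply lines contain the MAC in square brackets, e.g. "[00:22:64:B6:10:5C]".
count_reply_macs() {
    grep -oE '\[[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}\]' \
        | tr 'a-f' 'A-F' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Usage (not executed here):
#   arping -I eth0 -c 5 192.168.14.1 | count_reply_macs
# A result greater than 1 suggests two devices are answering for the same IP.
```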
Create a script to monitor and log neighbor cache issues:
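#!/bin/bash
# Log whenever the gateway's neighbor entry is in the FAILED state. Run as root.
LOG_FILE="/var/log/neighbor_monitor.log"
GATEWAY_IP="192.168.14.1"
IFACE="eth0"
check_gateway() {
    local status
    # Query the gateway's entry directly; grepping would also match 192.168.14.10 and similar
    status=$(ip neigh show to "$GATEWAY_IP" dev "$IFACE" | awk '{print $NF}')
    if [ "$status" = "FAILED" ]; then
        echo "$(date) - Gateway $GATEWAY_IP in FAILED state" >> "$LOG_FILE"
        # Attempt to restore connectivity by flushing the interface's neighbor entries
        ip neigh flush dev "$IFACE"
    fi
}
while true; do
    check_gateway
    sleep 5
done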
If the issue persists, you might need to examine kernel logs:
dmesg | grep -i arp
journalctl -k --grep="neigh"
Look for messages like "neighbour: arp_cache: neighbor table overflow", which indicate that the neighbor table is full and that the net.ipv4.neigh.default.gc_thresh1, gc_thresh2 and gc_thresh3 limits may need to be raised.
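To see how close the table is to overflowing, you can compare the current entry count with the hard limit, gc_thresh3. A rough sketch, with a made-up helper name; it only does the arithmetic, so the two numbers have to be supplied by the caller.

```shell
# Print neighbor table usage as an integer percentage of the given limit.
# Arguments: <current entry count> <limit>. Fails if the limit is not positive.
neigh_table_usage_pct() {
    entries="$1"
    limit="$2"
    [ "$limit" -gt 0 ] || return 1
    echo $(( entries * 100 / limit ))
}

# Usage (not executed here):
#   neigh_table_usage_pct "$(ip -4 neigh show | wc -l)" \
#       "$(sysctl -n net.ipv4.neigh.default.gc_thresh3)"
```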
Remember that in some environments (particularly cloud providers), ARP behavior might be intentionally modified by the hypervisor or network infrastructure.
When examining the output of ip neigh show (or the older arp -n command), a "FAILED" state indicates that the system couldn't complete Address Resolution Protocol (ARP) resolution for that particular IP address. In your case, since this appears against your gateway (192.168.14.1), it suggests temporary communication breakdowns between your server and its default gateway.
# Example of problematic output:
192.168.14.1 dev eth0 FAILED
Possible causes include:
- Network congestion causing ARP packet loss
- Gateway firewall blocking ARP requests
- Hardware issues in switches or routers
- MAC address changes (common in HA environments)
- Duplicate IP addresses on the network
To gather more information during the failure window:
# Send five ARP requests to the gateway
arping -I eth0 -c 5 192.168.14.1
# Check kernel ARP table in real-time
watch -n 1 "ip neigh show | grep 192.168.14.1"
# Capture ARP traffic
tcpdump -i eth0 -n arp and host 192.168.14.1
For environments where you can't control the gateway configuration:
# Manually set ARP entry (temporary solution)
arp -s 192.168.14.1 00:22:64:b6:10:5c
# Adjust ARP timeout parameters (in seconds)
echo 300 > /proc/sys/net/ipv4/neigh/eth0/base_reachable_time
echo 60 > /proc/sys/net/ipv4/neigh/eth0/delay_first_probe_time
Create a monitoring script to log ARP state changes:
#!/bin/bash
# Log every time the gateway's ARP entry is found in the FAILED state.
LOG_FILE="/var/log/arp_monitor.log"
GATEWAY_IP="192.168.14.1"
while true; do
    # Ask for the gateway's entry directly; a plain grep would also match 192.168.14.10
    STATE=$(ip neigh show to "$GATEWAY_IP" | awk '{print $NF}')
    if [[ "$STATE" == "FAILED" ]]; then
        echo "$(date) - ARP FAILED for $GATEWAY_IP" >> "$LOG_FILE"
        # Optional: trigger network restart
        # systemctl restart networking
    fi
    sleep 5
done
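The loop above writes a line every five seconds for as long as the entry stays FAILED. If you would rather record only the transitions, something along these lines could work. It is a sketch with a hypothetical helper name, and the caller has to remember the previous state between samples.

```shell
# Print a line only when the neighbor state differs from the previous sample.
# Arguments: <previous state> <current state> <ip>. Empty states are shown as "none".
report_transition() {
    prev="$1"
    cur="$2"
    ip="$3"
    if [ "$prev" != "$cur" ]; then
        echo "$ip: ${prev:-none} -> ${cur:-none}"
    fi
}

# Usage (not executed here):
#   prev=""
#   while true; do
#       cur=$(ip neigh show to 192.168.14.1 | awk '{print $NF}')
#       line=$(report_transition "$prev" "$cur" 192.168.14.1)
#       [ -n "$line" ] && echo "$(date) - $line" >> /var/log/arp_monitor.log
#       prev="$cur"
#       sleep 5
#   done
```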
For enterprise environments, consider:
- Setting up keepalived for automatic failover
- Implementing BFD (Bidirectional Forwarding Detection)
- Using ECMP (Equal-Cost Multi-Path) routing
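Mainline kernels do not provide an arp_debug sysctl under /proc/sys/net/ipv4/conf, so to watch neighbor state changes as they happen, use ip monitor alongside the kernel log:
ip -timestamp monitor neigh
dmesg -w | grep -i -E 'arp|neigh'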