When facing network instability, we identified several critical symptoms:
show processes cpu sorted | exc 0.00%
CPU utilization for five seconds: 99%/12%; one minute: 99%; five minutes: 99%
 PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
  12   111438973  18587995   5995 44.47% 43.88% 43.96%   0 ARP Input
 174    59541847   5198737  11453 22.39% 23.47% 23.62%   0 Hulc LED Process
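In the first line of that output, 99%/12% means 99% total CPU, of which 12% was spent at interrupt level, so most of the load here comes from processes such as ARP Input. A minimal sketch for pulling those numbers out of captured output follows; the names `parse_cpu` and `busy_processes` and the 10% threshold are illustrative assumptions, and the sample text is the output shown above.

```python
import re

# Sample copied from the output above; real output will contain more rows.
SAMPLE = """\
CPU utilization for five seconds: 99%/12%; one minute: 99%; five minutes: 99%
 PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
  12   111438973  18587995   5995 44.47% 43.88% 43.96%   0 ARP Input
 174    59541847   5198737  11453 22.39% 23.47% 23.62%   0 Hulc LED Process
"""

# Summary line: total CPU % / interrupt-level CPU % over the last five seconds
HEADER_RE = re.compile(r"five seconds: (\d+)%/(\d+)%")
# Process row: PID, runtime, invoked, uSecs, 5Sec, 1Min, 5Min, TTY, process name
PROC_RE = re.compile(
    r"^\s*(\d+)\s+\d+\s+\d+\s+\d+\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+\d+\s+(.+?)\s*$"
)


def parse_cpu(text):
    """Return (total_pct, interrupt_pct, [(process_name, five_sec_pct), ...])."""
    total = interrupt = None
    procs = []
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            total, interrupt = int(header.group(1)), int(header.group(2))
            continue
        row = PROC_RE.match(line)
        if row:
            procs.append((row.group(5), float(row.group(2))))
    return total, interrupt, procs


def busy_processes(text, threshold=10.0):
    """List processes whose five-second CPU share is at or above the threshold."""
    _, _, procs = parse_cpu(text)
    return [(name, pct) for name, pct in procs if pct >= threshold]


if __name__ == "__main__":
    total, interrupt, _ = parse_cpu(SAMPLE)
    print(f"total={total}% interrupt={interrupt}% process-level={total - interrupt}%")
    for name, pct in busy_processes(SAMPLE):
        print(f"{name}: {pct}%")
```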
Multiple MAC addresses learned on a single port were another red flag:
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
   1    001c.c06c.d620    DYNAMIC     Gi1/1/3
   1    001c.c06c.d694    DYNAMIC     Gi1/1/3
   1    001c.c06c.d6ac    DYNAMIC     Gi1/1/3
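To check this across a whole table rather than by eye, the output can be grouped by port. This is a rough sketch under stated assumptions: the function names and the `limit` parameter are made up for illustration, and the sample is the table above. Keep in mind that uplinks and ports facing another switch legitimately carry many MAC addresses, so the result is a list of ports to look at, not a verdict.

```python
import re
from collections import defaultdict

# Sample copied from the table above
SAMPLE = """\
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
   1    001c.c06c.d620    DYNAMIC     Gi1/1/3
   1    001c.c06c.d694    DYNAMIC     Gi1/1/3
   1    001c.c06c.d6ac    DYNAMIC     Gi1/1/3
"""

# Cisco dotted-hex MAC format, e.g. 001c.c06c.d620
MAC_RE = re.compile(r"^[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}$", re.IGNORECASE)


def macs_per_port(text):
    """Map each port to the set of MAC addresses learned on it."""
    ports = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        # Data rows have the form: <vlan> <mac> <type> <port>
        if len(parts) >= 4 and MAC_RE.match(parts[1]):
            ports[parts[3]].add(parts[1].lower())
    return ports


def crowded_ports(text, limit=1):
    """Return ports with more than `limit` learned MAC addresses."""
    return {
        port: sorted(macs)
        for port, macs in macs_per_port(text).items()
        if len(macs) > limit
    }


if __name__ == "__main__":
    for port, macs in crowded_ports(SAMPLE).items():
        print(f"{port}: {len(macs)} MAC addresses")
```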
Before diving deeper, we verified several configuration aspects:
- Confirmed the STP configuration: show spanning-tree
- Checked VLAN assignments: show vlan brief
- Verified that TCAM was not exhausted: show platform tcam utilization
Using Wireshark, we identified the ARP storm pattern. This Python snippet helps analyze packet captures:
from scapy.all import ARP, rdpcap

def analyze_arp(pcap_file):
    """Count ARP packets per source MAC, busiest senders first."""
    packets = rdpcap(pcap_file)
    arp_count = {}
    for pkt in packets:
        if ARP in pkt:
            src_mac = pkt[ARP].hwsrc
            arp_count[src_mac] = arp_count.get(src_mac, 0) + 1
    # Sort by packet count, highest first
    return sorted(arp_count.items(), key=lambda x: x[1], reverse=True)
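Total counts can hide short bursts, so it may also help to look at the peak rate per sender. The sketch below is one possible way to do that and is not part of the original analysis. It assumes the capture has already been reduced to (timestamp, source MAC) pairs, for example from `float(pkt.time)` and `pkt[ARP].hwsrc`; the function name `peak_arp_rate` and the fixed one-second buckets are illustrative choices.

```python
from collections import Counter


def peak_arp_rate(events, window=1.0):
    """Return {mac: highest packet count seen in any fixed window}.

    events: iterable of (timestamp_seconds, src_mac) pairs.
    window: bucket size in seconds (fixed buckets, not a sliding window).
    """
    buckets = Counter()
    for ts, mac in events:
        # Assign each packet to a bucket numbered by elapsed windows
        buckets[(mac, int(ts // window))] += 1
    peaks = {}
    for (mac, _bucket), count in buckets.items():
        peaks[mac] = max(peaks.get(mac, 0), count)
    return peaks


if __name__ == "__main__":
    sample = [(0.1, "aa"), (0.2, "aa"), (0.9, "aa"), (1.5, "aa"), (0.3, "bb")]
    for mac, peak in sorted(peak_arp_rate(sample).items()):
        print(f"{mac}: peak {peak} ARP packets per second")
```

Because the buckets are fixed, a burst that straddles a bucket boundary is split in two; a sliding window would be more precise at the cost of more code.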
To mitigate the broadcast traffic impact:
interface GigabitEthernet1/0/1
 storm-control broadcast level 20.00
 storm-control action trap
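To get a feel for what a 20.00 level allows, the percentage can be converted into bandwidth and an approximate frame rate. The arithmetic below is a sketch under stated assumptions: a 1 Gbps port, minimum-size 64-byte frames, and 20 bytes of per-frame wire overhead (preamble plus inter-frame gap). How the switch actually meters traffic may differ, and `storm_threshold` is an illustrative name.

```python
def storm_threshold(link_bps, level_percent, frame_bytes=64):
    """Return (allowed_bps, approx_frames_per_second) for a storm-control level.

    Assumes the level is a percentage of link bandwidth and that each frame
    occupies frame_bytes plus 20 bytes on the wire
    (8-byte preamble/SFD + 12-byte inter-frame gap).
    """
    allowed_bps = link_bps * level_percent / 100.0
    wire_bits = (frame_bytes + 20) * 8
    return allowed_bps, allowed_bps / wire_bits


if __name__ == "__main__":
    bps, fps = storm_threshold(1_000_000_000, 20.0)
    print(f"Allowed broadcast bandwidth: {bps / 1e6:.0f} Mbps")
    print(f"Roughly {fps:,.0f} minimum-size frames per second")
```

Under these assumptions, 20% of a gigabit port still admits a few hundred thousand small broadcast frames per second, so a lower level may be worth testing on access ports.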
This script helps track MAC address movements across ports:
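#!/bin/bash
# Log a timestamped snapshot of matching MAC entries every 10 seconds
while true; do
    date >> mac_movement.log
    # -F treats the pattern as a literal string, so the dot is not a regex wildcard
    ssh switch "show mac address-table" | grep -iF "001c.c06c" >> mac_movement.log
    sleep 10
done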
These commands proved valuable during troubleshooting:
show platform cpu packet statistics
show platform hardware forward drops
show ip arp inspection statistics
The actual solution involved multiple layers:
- Implemented port security on critical access ports
- Reduced ARP timeout to 240 seconds
- Enabled DHCP snooping with ARP inspection
- Created smaller VLANs to reduce broadcast domains
After implementing changes, we verified improvements:
show processes cpu | include ARP
show interfaces | include broadcast
show mac address-table count vlan 1
When dealing with network instability characterized by ARP broadcast storms and high CPU utilization on Cisco 3750X switches, we're typically facing one of these scenarios:
- Layer 2 loops in the network topology
- Misconfigured or malfunctioning network devices
- VLAN design issues (particularly with large broadcast domains)
- ARP cache poisoning or other security incidents
Several key commands help identify the root cause:
! Show ARP-related CPU utilization
show processes cpu sorted | exclude 0.00%
! Monitor broadcast traffic patterns
show interfaces | include line|broadcast
! Check MAC address table stability
show mac address-table dynamic count
show mac address-table dynamic vlan 1
! Verify TCAM utilization
show platform tcam utilization
Here are concrete steps to mitigate the issue:
configure terminal
! Enable storm control on the affected access ports (storm control is configured per interface)
interface range GigabitEthernet1/0/1-24
 storm-control broadcast level 20.00
 storm-control action trap
exit
! Implement port security where possible (the port must be a static access or trunk port)
interface GigabitEthernet1/0/1
 switchport mode access
 switchport port-security
 switchport port-security maximum 2
 switchport port-security violation restrict
 switchport port-security mac-address sticky
exit
! Adjust ARP timers (example for VLAN 1)
interface Vlan1
 arp timeout 300
end
For persistent issues, consider this Python script to monitor MAC flaps:
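import getpass
import re
import time

import paramiko

# Matches Cisco dotted-hex MAC addresses such as 001c.c06c.d620
MAC_RE = re.compile(r"^[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}$", re.IGNORECASE)

def monitor_mac_flaps(switch_ip, username, password, interval=60):
    ssh = paramiko.SSHClient()
    # AutoAddPolicy trusts unknown host keys: acceptable in a lab, not in production
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(switch_ip, username=username, password=password,
                look_for_keys=False, allow_agent=False)
    last_port = {}  # (vlan, mac) -> port where the address was last seen
    try:
        while True:
            # Assumes the switch accepts repeated exec requests on one connection;
            # if it does not, reconnect inside the loop instead
            stdin, stdout, stderr = ssh.exec_command('show mac address-table dynamic')
            output = stdout.read().decode()
            for line in output.splitlines():
                parts = line.split()
                # Data rows have the form: <vlan> <mac> <type> <port>
                if len(parts) >= 4 and MAC_RE.match(parts[1]):
                    vlan, mac, _, port = parts[:4]
                    key = (vlan, mac)
                    if key in last_port and last_port[key] != port:
                        print(f"MAC FLAP: {mac} (VLAN {vlan}) moved from {last_port[key]} to {port}")
                    last_port[key] = port
            time.sleep(interval)
    finally:
        ssh.close()

# Usage example (prompts for the password instead of hardcoding it)
monitor_mac_flaps('192.168.1.1', 'admin', getpass.getpass())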
Key architectural considerations:
- Segment large VLANs (/20 is too broad - consider /24 or smaller)
- Implement Private VLANs where appropriate
- Enable DHCP snooping and ARP inspection
- Consider implementing VRF-lite for different departments
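To illustrate the first point, the standard-library `ipaddress` module can show how much a broadcast domain shrinks when a /20 is carved into /24s. The network 10.10.0.0/20 below is a made-up example, not the address plan from this incident, and `segment` is an illustrative helper name.

```python
import ipaddress


def segment(network, new_prefix):
    """Split a network into subnets of the given prefix length.

    Returns (usable_hosts_before, [(subnet, usable_hosts), ...]).
    Usable hosts exclude the network and broadcast addresses.
    """
    net = ipaddress.ip_network(network)
    subnets = list(net.subnets(new_prefix=new_prefix))
    return net.num_addresses - 2, [(str(s), s.num_addresses - 2) for s in subnets]


if __name__ == "__main__":
    # Hypothetical /20, chosen only for illustration
    before, subnets = segment("10.10.0.0/20", 24)
    print(f"One /20: up to {before} hosts in a single broadcast domain")
    print(f"Split into {len(subnets)} subnets of {subnets[0][1]} hosts each")
    for name, hosts in subnets[:3]:
        print(f"  {name} ({hosts} hosts)")
```

Each broadcast ARP request then reaches at most 254 hosts instead of up to 4094, provided each subnet is also placed in its own VLAN.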
For extreme cases where standard troubleshooting fails:
! Disable proxy ARP and ARP processing on an interface for testing
! (Layer 3 commands: the port must be a routed port or SVI, not a switchport)
interface GigabitEthernet1/0/1
 no ip proxy-arp
 no arp arpa
end
! Temporarily block DHCP (UDP ports 67 and 68) on the SVI
! Note: an IP ACL cannot block ARP, because ARP is not IP traffic
access-list 100 deny udp any any eq 67
access-list 100 deny udp any any eq 68
access-list 100 permit ip any any
interface Vlan1
 ip access-group 100 in
end
Implement SNMP monitoring for these critical OIDs:
1.3.6.1.2.1.4.22.1.2 # ipNetToMediaPhysAddress (ARP table)
1.3.6.1.2.1.17.4.3.1.2 # dot1dTpFdbPort (MAC address table)
1.3.6.1.4.1.9.9.109.1.1.1.1.7 # cpmCPUTotal1minRev (CPU utilization)
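When walking dot1dTpFdbPort, each returned instance OID carries the MAC address as six decimal sub-identifiers appended to the base OID, and the value is a bridge port number rather than an interface name. A small helper along these lines can turn the suffix back into the dotted-hex form used by the switch CLI; the function name `mac_from_fdb_oid` is an assumption for illustration, and the walk itself is left to whatever SNMP tool is already in use.

```python
FDB_PORT_OID = "1.3.6.1.2.1.17.4.3.1.2"  # dot1dTpFdbPort


def mac_from_fdb_oid(oid):
    """Convert a dot1dTpFdbPort instance OID into a Cisco-style MAC string."""
    oid = oid.lstrip(".")
    prefix = FDB_PORT_OID + "."
    if not oid.startswith(prefix):
        raise ValueError(f"not a dot1dTpFdbPort instance: {oid}")
    # The table index is the MAC address, one decimal number per octet
    octets = [int(part) for part in oid[len(prefix):].split(".")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"unexpected index in OID: {oid}")
    hex_digits = "".join(f"{o:02x}" for o in octets)
    return ".".join(hex_digits[i:i + 4] for i in (0, 4, 8))


if __name__ == "__main__":
    example = "1.3.6.1.2.1.17.4.3.1.2.0.28.192.108.214.32"
    print(mac_from_fdb_oid(example))
```

The bridge port number returned as the value still has to be mapped to an interface, typically through dot1dBasePortIfIndex.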