A user recently ran into a perplexing network problem. Munin showed a significant spike on eth0 (graph: https://i.sstatic.net/vX3Di.png), and during the spike the server became unreachable over eth0, although access through the IPMI port remained intact.
The user searched the usual log files on Ubuntu 10.04 LTS, specifically /var/log/(kern|syslog|messages), but found nothing out of the ordinary, and there is no dedicated log file for eth0. So where are the logs for eth0?
In Ubuntu, one useful tool for checking network-related errors is dmesg. You can run the following command in the terminal:
```bash
dmesg | grep eth0
```
This command displays any kernel messages that mention the eth0 interface. For example, a hardware-related issue such as a cable disconnect or a driver problem might show up here.
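To narrow the kernel output down to link state changes, a small filter can help. The following is a minimal sketch: the function name link_events is hypothetical, and it assumes the driver mentions the word "link" in its messages, which is common but varies by driver.

```bash
#!/bin/bash
# Hypothetical helper: print kernel messages for one interface that mention
# the link. Reads kernel log text on stdin, so it works with dmesg output
# as well as with a saved log file.
link_events() {
    local iface="$1"
    # First grep: literal match on the interface name.
    # Second grep: case-insensitive match on "link" (assumed driver wording).
    grep -F -- "$iface" | grep -i 'link'
}

# Usage (on the affected server):
#   dmesg | link_events eth0
```

If the spike coincided with a cable or switch port problem, this would be expected to show one or more link down and link up messages around that time.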
Another option is to use ethtool. First, install it if not already installed:
```bash
sudo apt-get install ethtool
```
Then, you can use it to check the status of the eth0 interface:
```bash
sudo ethtool eth0
```
This command reports the link status, negotiated speed, duplex mode, and supported link modes. The error counters are not part of this output; they come from the driver statistics, which sudo ethtool -S eth0 prints. Monitoring those counters over time reveals any abnormal increases.
If you want more detailed and continuous monitoring, you can set up a script to periodically check these values. For example, a simple bash script could look like this:
```bash
#!/bin/bash
# Print the error-related driver counters for eth0 once a minute.
while true; do
    date
    ethtool -S eth0 | grep -i 'err'
    sleep 60
done
```
Save this script as eth0_error_monitor.sh, make it executable with chmod +x eth0_error_monitor.sh, and then run it in the background with ./eth0_error_monitor.sh &. It prints a timestamp and the error-related counters reported by ethtool every 60 seconds.
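Raw counter dumps are hard to read because most values never change. A variation is to compare two snapshots and print only the error counters that moved. The sketch below assumes the usual "name: value" layout of ethtool -S output; the function name changed_counters and the name filter (err, drop, crc, miss, collision) are illustrative choices, since counter names are driver-specific.

```bash
#!/bin/bash
# Hypothetical helper: compare two saved snapshots of `ethtool -S` output and
# print error-like counters whose value differs between them.
# Arguments: path to the older snapshot, path to the newer snapshot.
changed_counters() {
    local old="$1" new="$2"
    awk -F': *' '
        # First file: remember every counter value by name.
        NR == FNR { sub(/^[ \t]+/, "", $1); prev[$1] = $2; next }
        # Second file: report error-like counters that changed.
        {
            sub(/^[ \t]+/, "", $1)
            if (($1 in prev) && $1 ~ /err|drop|crc|miss|collision/ &&
                $2 ~ /^[0-9]+$/ && $2 != prev[$1])
                printf "%s: %s -> %s (+%d)\n", $1, prev[$1], $2, $2 - prev[$1]
        }
    ' "$old" "$new"
}

# Usage sketch (needs ethtool and a real interface):
#   ethtool -S eth0 > /tmp/eth0.prev
#   while sleep 60; do
#       ethtool -S eth0 > /tmp/eth0.now
#       changed_counters /tmp/eth0.prev /tmp/eth0.now
#       mv /tmp/eth0.now /tmp/eth0.prev
#   done
```

An empty result for an interval means none of the matched counters changed, so any line that does appear marks a minute worth correlating with the Munin graph.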
When your Munin graph shows sudden spikes in Ethernet errors, it typically indicates one of these underlying issues:
- Physical layer problems (cable/connector issues)
- Driver or NIC firmware bugs
- Network congestion or misconfiguration
- Hardware failures
Unlike application logs, Ethernet errors are primarily logged through these channels:
1. Kernel Ring Buffer (dmesg)
The most immediate source of hardware-related errors:
dmesg | grep -i eth0
dmesg | grep -i error
dmesg | grep -i dropped
2. System Log Files
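Check these standard locations, including rotated logs. zgrep also reads the compressed .gz rotations, which plain grep cannot search. journalctl exists only on systemd-based releases, so it is not available on Ubuntu 10.04:
zgrep -i eth0 /var/log/syslog*
zgrep -i error /var/log/kern.log*
journalctl -k --since "2 hours ago" | grep eth0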
3. Network Interface Statistics
Get real-time error counters:
ethtool -S eth0
ip -s link show eth0
cat /proc/net/dev | grep eth0
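The columns in /proc/net/dev are easy to misread by eye. As a rough sketch, the error and drop columns for one interface can be extracted with awk. The function name netdev_errors is hypothetical, and the column positions assume the standard 16-counter layout (8 receive counters followed by 8 transmit counters).

```bash
#!/bin/bash
# Hypothetical helper: print the error-related columns of /proc/net/dev for
# one interface. The second argument overrides the file path, which is
# useful for reading a saved copy.
netdev_errors() {
    local iface="$1" file="${2:-/proc/net/dev}"
    awk -v want="$iface" '
        # Some kernels print "eth0:1234" with no space after the colon,
        # so turn the first colon into a space before reading fields.
        { sub(/:/, " ") }
        $1 == want {
            printf "rx_errs=%s rx_drop=%s rx_fifo=%s rx_frame=%s ", $4, $5, $6, $7
            printf "tx_errs=%s tx_drop=%s tx_colls=%s tx_carrier=%s\n", $12, $13, $15, $16
        }
    ' "$file"
}

# Usage:
#   netdev_errors eth0
```

Rising rx_frame or tx_carrier values tend to point at the physical layer, while rx_drop and rx_fifo more often indicate that the host could not keep up with the traffic.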
For persistent issues, consider these deeper inspection methods:
Packet Capture Analysis
tcpdump -i eth0 -w capture.pcap
tshark -r capture.pcap -q -z io,phs
Driver-Specific Debugging
First identify your NIC driver:
ethtool -i eth0 | grep driver
lspci -v | grep -A10 Ethernet
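Then raise the driver's log verbosity. The mechanism is driver-specific: many drivers, including Intel ones, accept a message level through ethtool, whereas a debug module parameter may be load-time only and not writable under /sys/module/. Follow the kernel log while reproducing the problem (if your dmesg lacks the -w option, tail -f /var/log/kern.log works instead):
ethtool -s eth0 msglvl link on rx_err on tx_err on
dmesg -w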
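Create a script that logs interface statistics and run it hourly from cron, for example by saving it as an executable file in /etc/cron.hourly/ (use a file name without a dot, since run-parts skips names that contain one):
#!/bin/bash
# Append a timestamped snapshot of the eth0 driver counters to a log file.
LOG_FILE="/var/log/eth0_stats.log"
echo "$(date) - ETH0 Statistics" >> "$LOG_FILE"
ethtool -S eth0 >> "$LOG_FILE"
echo "------------------------" >> "$LOG_FILE"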
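If the counters point to a NIC or link problem, these steps are worth trying:
- Check the NIC firmware version, and update it with the vendor's tools if it is outdated:
ethtool -i eth0 | grep firmware
- Reset the MTU to the standard value if a mismatch is suspected:
ip link set dev eth0 mtu 1500
- Disable problematic offloads (the change does not persist across reboots):
ethtool -K eth0 gro off gso off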
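Before and after changing offloads, it helps to record which features are actually enabled, since some are fixed by the driver and cannot be changed. A small sketch that filters the output of ethtool -k (lowercase k, which shows the current settings) is given below. The function name enabled_offloads is hypothetical, and the "feature: on" layout is assumed from common ethtool versions.

```bash
#!/bin/bash
# Hypothetical helper: read `ethtool -k` output on stdin and print the names
# of the features that are currently switched on.
enabled_offloads() {
    awk -F': *' '
        # Keep lines whose value starts with "on"; strip leading indentation.
        $2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }
    '
}

# Usage:
#   ethtool -k eth0 | enabled_offloads
```

Running it once before and once after the ethtool -K command shows whether the driver accepted the change.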