When your Gentoo server suddenly becomes unresponsive with CPU and disk I/O maxed out, you need precise tools to identify the culprit processes. Linux provides several utilities, built on kernel interfaces such as /proc, for real-time and historical process monitoring.
The classic top command gives immediate visibility (-c shows full command lines, -o %CPU sorts by CPU usage):
top -c -o %CPU
For a more user-friendly interface with process tree view and color coding:
htop --sort-key=PERCENT_CPU
Key columns to watch:
%CPU: process CPU utilization (100% equals one fully busy core)
RES: resident memory usage
S: process state (R=running, S=sleeping, D=uninterruptible sleep, usually waiting on I/O)
TIME+: total CPU time consumed
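The S and TIME+ values come from /proc/<pid>/stat. The sketch below uses a hypothetical helper named proc_state_time and assumes a Linux /proc filesystem. It reads the state letter and the combined user and system CPU time for one PID, then converts clock ticks to seconds with getconf CLK_TCK:

```bash
#!/bin/bash
# proc_state_time PID (hypothetical helper): print the process state letter
# and the total CPU seconds (user + system) it has consumed, i.e. the raw
# data behind top's S and TIME+ columns. Requires Linux /proc.
proc_state_time() {
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so strip everything up to the last ") " before splitting.
    # After that, $1 is the state (field 3) and $12/$13 are utime/stime
    # (fields 14 and 15), both measured in clock ticks.
    awk -v hz="$(getconf CLK_TCK)" '{
        sub(/^.*\) /, "")
        printf "%s %.2f\n", $1, ($12 + $13) / hz
    }' "/proc/$1/stat"
}
```

For example, proc_state_time $$ prints the state and accumulated CPU seconds of the current shell. The script parses from the last ") " because a command name can itself contain spaces or parentheses.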
The pidstat tool from the sysstat package reports per-process statistics at a fixed interval, and you can log its output for later review:
# Install sysstat (Gentoo)
emerge sysstat
# Report CPU (-u), memory (-r) and disk I/O (-d) for active processes every 5 seconds, one line per process (-h)
pidstat -urd -h 5
Sample output from the CPU report (pidstat -u 5) looks like this:
12:15:07 PM   UID   PID   %usr %system  %guest   %CPU  CPU  Command
12:15:12 PM     0  1321  85.23    2.14    0.00  87.37    1  /usr/bin/python
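This shows process 1321 (/usr/bin/python, owned by UID 0, i.e. root) using 87.37% CPU, almost all of it user time (85.23%). The CPU column means the process was last scheduled on core 1.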
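To pick heavy processes out of a long pidstat -u log, a small awk filter can read the column positions from each header line instead of hard-coding field numbers. This matters because a 12-hour timestamp such as "12:15:07 PM" takes two fields, while the Average: lines take only one. The sketch below uses the hypothetical name high_cpu_rows and assumes pidstat's default column headers (PID, %CPU, Command):

```bash
#!/bin/bash
# high_cpu_rows THRESHOLD (hypothetical helper): read pidstat -u output on
# stdin and print "PID %CPU Command" for every sample above THRESHOLD.
high_cpu_rows() {
    awk -v t="$1" '
        # Header line: record which fields hold PID, %CPU and Command.
        /%CPU/ && /Command/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "PID") p = i
                if ($i == "%CPU") c = i
                if ($i == "Command") m = i
            }
            next
        }
        # Data line: skip anything before the first header and blank lines.
        c && NF >= m && $c + 0 > t + 0 { print $p, $c, $m }
    '
}
```

For instance, high_cpu_rows 80 < /var/log/pidstat.log lists every logged sample above 80% CPU. The filter prints only the first word of the Command column, which is enough for pidstat's default output.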
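For post-mortem analysis after a crash, enable process accounting, which records each process when it exits:
# Install and start BSD process accounting (Gentoo package sys-process/acct)
emerge sys-process/acct
/etc/init.d/acct start
# Commands sorted by total CPU time (the default sort order)
sa
# Per-user process counts and CPU time
sa -m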
Simplified example of cumulative CPU usage per command:
python    127.38 cpu (85.2%)
apache2    12.41 cpu (8.3%)
When standard tools aren't enough, SystemTap can trace kernel-level process activity:
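# Install SystemTap
emerge systemtap
Save the following script as cpu_profile.stp:
global process_cpu

# scheduler.cpu_on fires whenever a task starts running on a CPU, so this
# counts scheduling events per process: a rough activity proxy, not CPU time
probe scheduler.cpu_on {
    process_cpu[pid(), execname()] <<< 1
}

# Every 10 seconds, print the per-process counts and reset them
probe timer.s(10) {
    foreach ([p, name] in process_cpu) {
        printf("%d %s: %d\n", p, name, @count(process_cpu[p, name]))
    }
    delete process_cpu
}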
Run with:
stap -v cpu_profile.stp
Create a monitoring script (/usr/local/bin/monitor_cpu.sh):
#!/bin/bash
LOG=/var/log/cpu_monitor.log
echo "$(date): CPU Monitoring" >> "$LOG"
# Header plus the ten busiest processes; cmd goes last because it may contain spaces
ps -eo pid,ppid,%cpu,%mem,cmd --sort=-%cpu | head -n 11 >> "$LOG"
Make it executable (chmod +x /usr/local/bin/monitor_cpu.sh) and add it to root's crontab (crontab -e):
*/5 * * * * /usr/local/bin/monitor_cpu.sh
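After the cron job has collected a few snapshots, counting how often each PID appears helps separate persistent offenders from one-off spikes. The sketch below uses a hypothetical function name, top_offenders. It assumes that each process line in the log starts with a numeric PID and that the date line does not:

```bash
#!/bin/bash
# top_offenders LOGFILE [N] (hypothetical helper): list the N (default 5)
# PIDs that appear most often in the snapshot log, as "COUNT PID".
top_offenders() {
    local log="$1" n="${2:-5}"
    # Count only lines whose first field is a numeric PID, which skips the
    # date stamps and ps header lines. Then sort by count (highest first),
    # breaking ties by PID, and keep the first N.
    awk '$1 ~ /^[0-9]+$/ { seen[$1]++ }
         END { for (pid in seen) print seen[pid], pid }' "$log" |
        sort -k1,1nr -k2,2n |
        head -n "$n"
}
```

Run it as top_offenders /var/log/cpu_monitor.log 10. The kernel reuses PIDs, so over a long log one count can merge unrelated processes. Check the command column in the log before drawing conclusions.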
When your Linux server becomes unresponsive with 100% CPU and disk I/O spikes, the first step is identifying the culprit processes. Here are powerful command-line tools for real-time monitoring:
# Classic top command with CPU sorting
top -o %CPU
# More modern alternative (requires htop installation)
htop
# One-shot snapshot of the processes using the most CPU (header plus top 9)
ps aux --sort=-%cpu | head -n 10
For continuous monitoring between incidents, consider these approaches:
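# Append the top 10 CPU-consuming processes to a log every 5 minutes (run as root so /var/log is writable)
# cron treats an unescaped % as a newline, so each % is written as \%
(crontab -l 2>/dev/null; echo "*/5 * * * * ps -eo pid,user,\%cpu,cmd --sort=-\%cpu | head -n 11 >> /var/log/cpu_monitor.log") | crontab -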
# Alternatively use sysstat's pidstat (requires the sysstat package); it runs until stopped, so start it in the background
nohup pidstat -u 5 -p ALL >> /var/log/pidstat.log &
For deeper investigation when issues occur, Linux offers powerful tracing tools:
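# strace: count and time the system calls of a running process (press Ctrl+C to print the summary)
strace -p [PID] -c
# perf: live view of the functions consuming the most CPU in a process
perf top -p [PID]
# iotop: per-process disk I/O (requires root); -o shows only processes currently doing I/O
sudo iotop -o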
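When disk I/O rather than CPU is saturated, the processes stuck in uninterruptible sleep (state D) are usually the ones waiting on storage. The minimal filter below uses the hypothetical name dstate_procs and assumes input in the STATE PID COMMAND layout produced by ps -eo state=,pid=,comm=:

```bash
#!/bin/bash
# dstate_procs (hypothetical helper): read "STATE PID COMMAND" lines on
# stdin and print "PID COMMAND" for every process in state D.
dstate_procs() {
    awk '$1 == "D" {
        $1 = ""            # drop the state field (rebuilds $0 with single spaces)
        sub(/^ +/, "")     # remove the leading space left behind
        print
    }'
}
```

ps -eo state=,pid=,comm= | dstate_procs gives a one-shot list. Running it a few times in a row shows whether the same processes stay blocked.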
Create a simple monitoring script that sends an email alert when the busiest process exceeds a CPU threshold (it needs bc and a working mail command):
#!/bin/bash
THRESHOLD=90
# Busiest process, without the ps header line
PROCESS=$(ps -eo pid,user,%cpu,cmd --sort=-%cpu --no-headers | head -n 1)
CPU_USAGE=$(echo "$PROCESS" | awk '{print $3}')
# %cpu is a decimal and bash arithmetic is integer-only, so compare with bc
if (( $(echo "$CPU_USAGE > $THRESHOLD" | bc -l) )); then
    echo "High CPU usage alert: $PROCESS" | mail -s "CPU Alert" admin@example.com
fi
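ps computes %cpu as total CPU time divided by the process's elapsed run time, so the value is a lifetime average. A long-running process that has only just started spinning can therefore stay under the threshold. The sketch below instead samples utime and stime from /proc/<pid>/stat twice and reports usage over a short interval. The function names cpu_ticks and cpu_percent are hypothetical, and a Linux /proc filesystem is assumed:

```bash
#!/bin/bash
# cpu_ticks PID (hypothetical helper): utime + stime in clock ticks.
# Strips "pid (comm) " first because comm may contain spaces; after that,
# $12 and $13 are fields 14 and 15 of /proc/PID/stat.
cpu_ticks() {
    awk '{ sub(/^.*\) /, ""); print $12 + $13 }' "/proc/$1/stat"
}

# cpu_percent PID [SECONDS] (hypothetical helper): CPU usage over the next
# SECONDS (default 1). 100.0 means one fully busy core.
cpu_percent() {
    local pid="$1" secs="${2:-1}" hz t1 t2
    hz=$(getconf CLK_TCK)
    t1=$(cpu_ticks "$pid") || return 1
    sleep "$secs"
    t2=$(cpu_ticks "$pid") || return 1
    awk -v d="$((t2 - t1))" -v hz="$hz" -v s="$secs" \
        'BEGIN { printf "%.1f\n", 100 * d / (hz * s) }'
}
```

For example, cpu_percent 1321 5 reports the average usage of PID 1321 over the next five seconds. Using it in place of the ps-derived CPU_USAGE in the alert script would make the threshold react to current load rather than the lifetime average.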
For historical analysis, configure process accounting:
# Install process accounting tools (Debian/Ubuntu; on Gentoo the package is sys-process/acct)
sudo apt-get install acct
# Enable process accounting
sudo accton on
# Generate reports
sa -m # Per-user process counts and CPU time
sa -a # All commands, including those run only once or with unprintable names
For the most detailed diagnostics, consider kernel instrumentation:
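# SystemTap example (requires kernel headers and debug info): the 10 processes
# reading the most data, printed every 10 seconds. VFS reads include
# page-cache hits, so this overstates physical disk reads.
global reads

probe vfs.read.return {
    if (bytes_read > 0)
        reads[pid(), execname()] += bytes_read
}

probe timer.s(10) {
    printf("%-25s %-8s %10s\n", "COMMAND", "PID", "KB_READ")
    foreach ([p, name] in reads- limit 10)
        printf("%-25s %-8d %10d\n", name, p, reads[p, name] / 1024)
    delete reads
}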