Monitoring CPU Usage per Process on Linux: Diagnosing 100% CPU Spikes on a Gentoo Server



When your Gentoo server suddenly becomes unresponsive with CPU and disk I/O maxed out, you need precise tools to identify the culprit processes. Linux offers several utilities for both real-time and historical process monitoring.

The classic top command gives immediate visibility:

top -c -o %CPU

For a more user-friendly interface with process tree view and color coding:

htop --sort-key=PERCENT_CPU

Key columns to watch:

  • %CPU: Process CPU utilization
  • RES: Resident memory usage
  • S: Process state (R=running, S=sleeping, D=uninterruptible sleep, usually waiting on I/O)
  • TIME+: Total CPU time consumed

The pidstat tool from the sysstat package samples per-process statistics at a fixed interval, so its output can be logged and reviewed later:

# Install sysstat (Gentoo)
emerge sysstat

# Report CPU (-u), memory (-r) and disk I/O (-d) for active processes every 5 seconds
pidstat -urd -h 5

Sample output interpretation:

12:15:07 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
12:15:12 PM     0      1321   85.23    2.14    0.00   87.37     1  /usr/bin/python

This shows process 1321 (Python) consuming 87.37% CPU on core 1.
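To pull rows like this out of a longer capture, a small awk filter can find the %CPU column from the header instead of hard-coding its position, since the layout shifts with locale (12-hour vs 24-hour timestamps) and sysstat version. This is only a sketch: hot_pids is a hypothetical helper name, and it assumes each data row has the same field layout as the header above it, apart from the leading "#" that pidstat -h is assumed to put on its header line.

```shell
#!/bin/bash
# hot_pids LIMIT: read pidstat output on stdin and print "PID %CPU Command"
# for every row whose %CPU exceeds LIMIT. (hot_pids is an illustrative name.)
hot_pids() {
    awk -v limit="$1" '
        # Return the position of the field equal to name, or 0 if absent
        function col(name,    i) {
            for (i = 1; i <= NF; i++) if ($i == name) return i
            return 0
        }
        # Header row: remember where the columns of interest are
        col("%CPU") {
            off = ($1 == "#") ? 1 : 0   # assumed "#"-prefixed header from -h
            pid_c = col("PID") - off
            cpu_c = col("%CPU") - off
            cmd_c = col("Command") - off
            next
        }
        # Data row: print it when %CPU is above the limit
        cpu_c && NF >= cmd_c && $cpu_c + 0 > limit + 0 {
            print $pid_c, $cpu_c, $cmd_c
        }
    '
}
```

Typical use would be something like pidstat -u 5 3 | hot_pids 80, which keeps only the processes above 80% in each sample.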

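For post-mortem analysis after a crash, enable process accounting so the kernel records every process as it exits:

# Install the GNU accounting utilities (packaged as sys-process/acct on Gentoo)
emerge sys-process/acct
/etc/init.d/acct start

# Summarize CPU usage per command, with percentages (sa -m summarizes per user instead)
sa -c

Example output showing cumulative CPU usage (simplified):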

python                    127.38 cpu (85.2%)
apache2                    12.41 cpu (8.3%)

When standard tools aren't enough, SystemTap can trace kernel-level process activity:

# Install SystemTap
emerge systemtap

# Create a scheduling-activity script (cpu_profile.stp)
# It counts how often each process is switched onto a CPU, not exact CPU time
global process_cpu
probe scheduler.cpu_on {
    process_cpu[pid(), execname()] <<< 1
}
# Every 10 seconds, print the per-process counts and reset them
probe timer.s(10) {
    foreach ([p, name] in process_cpu) {
        printf("%d %s: %d\n", p, name, @count(process_cpu[p, name]))
    }
    delete process_cpu
}

Run with:

stap -v cpu_profile.stp

Create a monitoring script (/usr/local/bin/monitor_cpu.sh):

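#!/bin/bash
# Append a timestamped snapshot of the top CPU consumers to the log
LOG=/var/log/cpu_monitor.log
echo "$(date): CPU Monitoring" >> "$LOG"
# cmd goes last so the command line is not cut to a fixed column width
ps -eo pid,ppid,%cpu,%mem,cmd --sort=-%cpu | head -n 10 >> "$LOG"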

Add to cron:

*/5 * * * * /usr/local/bin/monitor_cpu.sh
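One caveat with logging ps output: the %cpu column in procps ps is the process's total CPU time divided by how long it has been running, so a long-lived process that has only just started spinning can still show a low figure. A current reading needs two samples of the cumulative CPU time in /proc/<pid>/stat. The sketch below shows the idea; stat_ticks and cpu_pct are hypothetical helper names, and it assumes a Linux /proc filesystem and a whole-second interval.

```shell
#!/bin/bash
# stat_ticks "LINE": given the contents of /proc/<pid>/stat, print utime + stime
# (cumulative CPU time in clock ticks). Helper names here are illustrative.
stat_ticks() {
    local rest="${1##*) }"    # drop "pid (comm) "; comm may contain spaces
    local -a f
    read -r -a f <<< "$rest"
    # With pid and comm removed, utime and stime are the 12th and 13th fields
    echo $(( f[11] + f[12] ))
}

# cpu_pct PID [SECONDS]: integer percentage of one core used over the interval
cpu_pct() {
    local pid=$1 secs=${2:-1} hz before after
    [ -r "/proc/$pid/stat" ] || return 1
    hz=$(getconf CLK_TCK)
    before=$(stat_ticks "$(< "/proc/$pid/stat")")
    sleep "$secs"
    [ -r "/proc/$pid/stat" ] || return 1   # process may have exited meanwhile
    after=$(stat_ticks "$(< "/proc/$pid/stat")")
    echo $(( (after - before) * 100 / (hz * secs) ))
}
```

For example, cpu_pct 1321 5 would report how much of one core PID 1321 used over the last five seconds, which can be logged alongside the ps snapshot.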

For quick reference, the same workflow in condensed form. When a Linux server becomes unresponsive with 100% CPU and disk I/O spikes, the first step is identifying the culprit processes with these real-time commands:

# Classic top command with CPU sorting
top -o %CPU

# More modern alternative (requires htop installation)
htop

# One-off snapshot of the top CPU consumers (sorted list, not a tree)
ps aux --sort=-%cpu | head -n 10

For continuous monitoring between incidents, consider these approaches:

# Log top 10 CPU-consuming processes every 5 minutes
# (% is special in crontab entries, so it is escaped as \%; >> appends instead of overwriting)
(crontab -l 2>/dev/null; echo "*/5 * * * * ps -eo pid,user,\%cpu,cmd --sort=-\%cpu | head -n 11 >> /var/log/cpu_monitor.log") | crontab -

# Alternatively use sysstat's pidstat (requires package installation)
pidstat -u -p ALL 5 >> /var/log/pidstat.log

For deeper investigation when issues occur, Linux offers powerful tracing tools:

# strace to monitor system calls
strace -p [PID] -c

# perf for performance analysis
perf top -p [PID]

# iotop for disk I/O monitoring (requires root)
sudo iotop -o
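If iotop is not installed, the kernel exposes per-process I/O counters in /proc/<pid>/io (readable by the process owner or root, on kernels built with task I/O accounting). As a rough sketch, the hypothetical helper below extracts the read_bytes and write_bytes fields, which count bytes that actually went to or came from the storage layer:

```shell
#!/bin/bash
# io_bytes: read the contents of a /proc/<pid>/io file on stdin and print
# "read_bytes write_bytes". (io_bytes is an illustrative name.)
io_bytes() {
    awk -F': *' '
        BEGIN { r = 0; w = 0 }
        # Exact matches, so cancelled_write_bytes is not picked up by mistake
        $1 == "read_bytes"  { r = $2 }
        $1 == "write_bytes" { w = $2 }
        END { print r, w }
    '
}
```

Run as root, io_bytes < /proc/1321/io prints the two cumulative counters for PID 1321; taking two readings a few seconds apart and subtracting gives a rate.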

Create a simple monitoring script that triggers alerts when thresholds are exceeded:

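#!/bin/bash
# Alert when the top CPU consumer exceeds THRESHOLD percent
THRESHOLD=90
# Second line of the sorted ps output is the busiest process (the first is the header)
PROCESS=$(ps -eo pid,user,%cpu,cmd --sort=-%cpu | head -n 2 | tail -n 1)
CPU_USAGE=$(echo "$PROCESS" | awk '{print $3}')

# %cpu is a decimal, so compare with bc rather than bash integer arithmetic
if (( $(echo "$CPU_USAGE > $THRESHOLD" | bc -l) )); then
    echo "High CPU usage alert: $PROCESS" | mail -s "CPU Alert" admin@example.com
fi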
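The bc -l call is there because %cpu is a decimal value and bash's (( )) arithmetic only handles integers. On a minimal system without bc, awk can do the same comparison. A small sketch, with exceeds as a hypothetical helper name:

```shell
#!/bin/bash
# exceeds VALUE LIMIT: exit status 0 when VALUE > LIMIT (decimals allowed)
exceeds() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}
```

In the alert script, the condition would then read: if exceeds "$CPU_USAGE" "$THRESHOLD"; then ... fi.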

For historical analysis, configure process accounting:

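# Install process accounting tools (Debian/Ubuntu shown; on Gentoo use: emerge sys-process/acct)
sudo apt-get install acct

# Enable process accounting
sudo accton on

# Generate reports
sa -m   # Per-user process counts and CPU minutes
sa -a   # List every command name instead of grouping rare ones under ***other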

For the most detailed diagnostics, consider kernel instrumentation:

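# SystemTap example (requires kernel headers and debug info)
global ticks, rbytes
# Sample whichever process is on the CPU at each profiling tick
probe timer.profile { ticks[pid(), execname()]++ }
# Accumulate bytes returned by successful reads
probe vfs.read.return {
    if (bytes_read > 0) rbytes[pid(), execname()] += bytes_read
}
# Every 10 seconds, print the 10 most sampled processes and reset
probe timer.s(10) {
    printf("%-25s %-8s %8s %12s\n", "COMMAND", "PID", "TICKS", "READ_BYTES")
    foreach ([p, name] in ticks- limit 10) {
        printf("%-25s %-8d %8d %12d\n", name, p, ticks[p, name], rbytes[p, name])
    }
    delete ticks
    delete rbytes
}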