Monitoring and Logging Process Memory/CPU Usage Over Time in Linux: Solutions for Diagnosing Resource Leaks


When diagnosing system performance issues like swap exhaustion or CPU saturation, standard monitoring tools often fall short. While solutions like Cacti, Nagios, or Munin provide excellent system-wide metrics, they typically lack granular, process-level visibility, which is exactly what you need when tracking down rogue processes.

Here are three mature solutions that meet all specified requirements:

# Solution 1: NetData (Real-time Visualization)
sudo apt install netdata
# Access via http://localhost:19999
# Process monitoring enabled by default

NetData provides per-process metrics out of the box with beautiful dashboards. For persistent logging:

# Solution 2: Prometheus + Process Exporter
wget https://github.com/ncabatoff/process-exporter/releases/download/v0.7.5/process-exporter-0.7.5.linux-amd64.tar.gz
tar xvfz process-exporter-*.tar.gz
cd process-exporter-*.linux-amd64
./process-exporter --config.path=process.yml

Sample process.yml configuration:

process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'
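
This catch-all configuration groups every process by its command name. Prometheus then needs a scrape job pointing at the exporter; a minimal sketch for prometheus.yml, assuming process-exporter is listening on its default port 9256:

scrape_configs:
  - job_name: 'process-exporter'
    static_configs:
      - targets: ['localhost:9256']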

For maximum flexibility, here is a simple bash solution:

#!/bin/bash
# Append a timestamped snapshot of per-process CPU/memory usage every 30 seconds
while true; do
  timestamp=$(date +%s)
  echo "------ ${timestamp} ------" >> /var/log/process_monitor.log
  ps -eo pid,comm,%cpu,%mem,rsz,vsz --sort=-%mem >> /var/log/process_monitor.log
  sleep 30
done
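
To keep the logger running across reboots, it can be wrapped in a small systemd service. A minimal sketch, assuming the script above has been saved as /usr/local/bin/process_monitor.sh (a hypothetical path) and made executable:

# /etc/systemd/system/process-monitor.service
[Unit]
Description=Per-process resource usage logger

[Service]
ExecStart=/usr/local/bin/process_monitor.sh
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with: sudo systemctl enable --now process-monitor.service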

For processing the collected data:

# Average CPU usage per process over the logged period
awk '$1 ~ /^[0-9]+$/ {cpu[$2]+=$3; count[$2]++}
     END {for (p in cpu) print p, cpu[p]/count[p]}' /var/log/process_monitor.log
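
The same approach works for memory: field 5 (rsz) is the resident set size in KiB, so averaging it per process highlights steadily growing candidates:

# Average resident memory (KiB) per process over the logged period
awk '$1 ~ /^[0-9]+$/ {rss[$2]+=$5; count[$2]++}
     END {for (p in rss) printf "%s %.0f\n", p, rss[p]/count[p]}' /var/log/process_monitor.log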

For visualization, consider feeding the collected metrics into Grafana; the Prometheus route above integrates with it directly.

Whichever approach you choose, also plan for:

  • Log rotation with logrotate to prevent the log from filling the disk (see the example below)
  • Process name normalization (consider using the exe path rather than comm, which is truncated to 15 characters)
  • Monitoring the monitoring system itself
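
A minimal logrotate snippet for the log file used above might look like this (rotation frequency and retention are just reasonable defaults):

# /etc/logrotate.d/process_monitor
/var/log/process_monitor.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}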

When diagnosing system performance issues like swap exhaustion or runaway processes, standard monitoring tools often fall short by only providing system-wide metrics. Here are several effective solutions for tracking individual process resource consumption:

The Linux /proc filesystem provides real-time process statistics. A simple monitoring script might look like:


#!/bin/bash
while true; do
    timestamp=$(date +%s)
    ps -eo pid,user,pcpu,pmem,cmd --sort=-%cpu | head -n 10 >> /var/log/process_monitor.log
    echo "---" >> /var/log/process_monitor.log
    sleep 30
done

This captures top CPU-consuming processes every 30 seconds. For memory-focused monitoring, change --sort=-%cpu to --sort=-%mem.
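
The same figures can also be read straight from /proc if you prefer to avoid parsing ps output; a minimal sketch that prints the resident set size of a single process, given its PID as the first argument:

#!/bin/bash
# /proc/<pid>/status exposes VmRSS (resident set size) in kB
pid="$1"
grep VmRSS "/proc/${pid}/status"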

For more sophisticated tracking with historical data:


import psutil
import time
import sqlite3

# Persist per-process CPU/memory samples in a local SQLite database
conn = sqlite3.connect('process_stats.db')
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS process_stats
             (timestamp REAL, pid INTEGER, name TEXT, cpu REAL, mem REAL)''')

while True:
    # process_iter() pre-fetches the requested attributes into proc.info and
    # skips processes that exit mid-iteration; note that the first
    # cpu_percent sample for each process is reported as 0.0
    for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_percent']):
        cursor.execute("INSERT INTO process_stats VALUES (?,?,?,?,?)",
                      (time.time(), proc.info['pid'], proc.info['name'],
                       proc.info['cpu_percent'], proc.info['memory_percent']))
    conn.commit()
    time.sleep(30)

If you would rather rely on established tools, several provide process-level tracking with history:

  • Netdata: Real-time monitoring with process-level metrics and excellent visualization
  • Prometheus + Process Exporter: Collects process metrics and integrates with alerting
  • atop: Advanced system and process monitor with historical logging
  • Glances: Cross-platform monitoring tool with process tracking

With collected data, you can identify trends using simple SQL:


-- Top memory-consuming processes (daily average)
SELECT name, AVG(mem) as avg_mem 
FROM process_stats 
WHERE timestamp > strftime('%s','now','-1 day')
GROUP BY name 
ORDER BY avg_mem DESC 
LIMIT 10;

-- Detect hourly CPU usage spikes for a single process (e.g. mysqld)
SELECT strftime('%H:00', timestamp, 'unixepoch') as hour, 
       MAX(cpu) as peak_cpu
FROM process_stats
WHERE name = 'mysqld'
GROUP BY hour;
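
Since both queries filter or group on timestamp and name, an index keeps them fast as the table grows:

-- Speeds up time-range and per-process lookups
CREATE INDEX IF NOT EXISTS idx_process_stats_time ON process_stats(timestamp, name);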

For services like Apache that might cause swap death, consider this alerting rule for Prometheus:


groups:
- name: memory.rules
  rules:
  - alert: HighMemoryProcess
    expr: namedprocess_namegroup_memory_bytes{memtype="resident"} > 1.5e9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Process group {{ $labels.groupname }} using excessive memory"
      description: "{{ $labels.groupname }} is using {{ $value }} bytes of resident memory"
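
For the rule to take effect, save it to a file and reference it from prometheus.yml (the filename here is just an assumption), then point Prometheus at an Alertmanager instance for notifications:

rule_files:
  - memory.rules.yml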