Monitoring and Logging Process Memory/CPU Usage Over Time in Linux: Solutions for Diagnosing Resource Leaks



When diagnosing system performance issues like swap exhaustion or CPU saturation, standard monitoring tools often fall short. While solutions like Cacti, Nagios, or Munin provide excellent system-wide metrics, they typically lack granular process-level visibility - exactly what we need when tracking down rogue processes.

Here are three mature solutions that record per-process usage over time:

# Solution 1: NetData (Real-time Visualization)
sudo apt install netdata
# Access via http://localhost:19999
# Process monitoring enabled by default

Netdata provides per-process metrics out of the box, grouped by application, with real-time dashboards. For persistent logging:

# Solution 2: Prometheus + Process Exporter
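wget https://github.com/ncabatoff/process-exporter/releases/download/v0.7.5/process-exporter-0.7.5.linux-amd64.tar.gz
tar xvfz process-exporter-*.tar.gz
# The archive is assumed to unpack into its own directory; place process.yml there
cd process-exporter-*/
./process-exporter --config.path=process.yml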

Sample process.yml configuration:

process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'

For maximum flexibility, a simple bash solution:

#!/bin/bash
while true; do
  timestamp=$(date +%s)
  ps -eo pid,comm,%cpu,%mem,rsz,vsz --sort=-%mem >> /var/log/process_monitor.log
  echo "------ ${timestamp} ------" >> /var/log/process_monitor.log
  sleep 30
done

For processing the collected data:

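# Average CPU per command name across the whole log
# (ps right-aligns PIDs, so match on the first field rather than the line start)
awk '$1 ~ /^[0-9]+$/ {cpu[$2]+=$3; count[$2]++}
END {for (p in cpu) print p, cpu[p]/count[p]}' /var/log/process_monitor.log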
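
The same aggregation can be sketched in Python, which makes it easier to cope with command names that contain spaces and to track peak memory at the same time. This is only a minimal sketch: it assumes the six-column layout written by the ps command above (pid, comm, %cpu, %mem, rsz, vsz), and the function name summarize_log is hypothetical.

```python
from collections import defaultdict


def summarize_log(log_text):
    """Return {command: (average %CPU, peak RSS in kB)} for a ps-style log."""
    cpu_total = defaultdict(float)
    samples = defaultdict(int)
    peak_rss = defaultdict(int)
    for line in log_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric PID; header and marker lines do not.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        # comm may contain spaces, so read the numeric columns from the right:
        # [-4] is %cpu, [-3] is %mem, [-2] is rsz, [-1] is vsz.
        try:
            cpu = float(fields[-4])
            rss = int(fields[-2])
        except ValueError:
            continue  # skip rows that do not match the assumed layout
        command = " ".join(fields[1:-4])
        cpu_total[command] += cpu
        samples[command] += 1
        peak_rss[command] = max(peak_rss[command], rss)
    return {c: (cpu_total[c] / samples[c], peak_rss[c]) for c in cpu_total}
```

It could be called as summarize_log(open('/var/log/process_monitor.log').read()); for very large logs, reading the file line by line would be a better fit.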

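For visualization, consider Grafana with the collected metrics. Log analyzers such as GoAccess target web server access logs, so the ps-style log above would need custom parsing before they could read it.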

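Whichever approach you choose, keep these points in mind:

  • Log rotation with logrotate to prevent the log from filling the disk
  • Process name normalization (comm is truncated to 15 characters, so consider using the exe path instead)
  • Monitoring the monitoring system itself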

Because standard monitoring tools report mostly system-wide metrics, it also helps to have lighter-weight ways of tracking individual process resource consumption. Here are several effective ones:

The Linux /proc filesystem exposes real-time per-process statistics, and ps reads its numbers from there. A simple monitoring script might look like:


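#!/bin/bash
# Append the top CPU consumers to the log every 30 seconds.
while true; do
    timestamp=$(date +%s)
    # Write the sample time so each block can be placed on a timeline
    echo "--- ${timestamp} ---" >> /var/log/process_monitor.log
    # head -n 11 keeps the header line plus the top 10 processes
    ps -eo pid,user,pcpu,pmem,cmd --sort=-%cpu | head -n 11 >> /var/log/process_monitor.log
    sleep 30
done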

This captures top CPU-consuming processes every 30 seconds. For memory-focused monitoring, change --sort=-%cpu to --sort=-%mem.
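
Because ps takes its numbers from /proc, the same memory figures can also be read directly from /proc/<pid>/status. The following is a minimal sketch of that idea, not a replacement for ps; the function names parse_proc_status and read_memory_kb are hypothetical, and only the VmSize, VmRSS, and VmSwap fields are extracted.

```python
def parse_proc_status(text):
    """Extract selected memory fields (in kB) from /proc/<pid>/status text."""
    wanted = {"VmSize", "VmRSS", "VmSwap"}
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or key not in wanted:
            continue
        parts = rest.split()  # e.g. ["2500", "kB"]
        if parts and parts[0].isdigit():
            values[key] = int(parts[0])
    return values


def read_memory_kb(pid):
    """Read and parse the status file of one process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_proc_status(status_file.read())
```

Kernel threads have no Vm* lines, so parse_proc_status returns an empty dict for them. A process can also exit between listing and reading, in which case read_memory_kb raises FileNotFoundError and the caller should skip that PID.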

For more sophisticated tracking with historical data:


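import sqlite3
import time

import psutil

conn = sqlite3.connect('process_stats.db')
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS process_stats
             (timestamp REAL, pid INTEGER, name TEXT, cpu REAL, mem REAL)''')

while True:
    # Use one timestamp per sampling pass so rows can be grouped by sample.
    now = time.time()
    # cpu_percent is measured since the previous call, so the first pass reports 0.0.
    # Fields the collector is not allowed to read come back as None (stored as NULL).
    rows = [(now, proc.info['pid'], proc.info['name'],
             proc.info['cpu_percent'], proc.info['memory_percent'])
            for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_percent'])]
    cursor.executemany("INSERT INTO process_stats VALUES (?,?,?,?,?)", rows)
    conn.commit()
    time.sleep(30)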

If you would rather not maintain your own collector, these existing tools cover the same ground:

  • Netdata: Real-time monitoring with process-level metrics and excellent visualization
  • Prometheus + Process Exporter: Collects process metrics and integrates with alerting
  • atop: Advanced system and process monitor with historical logging
  • Glances: Cross-platform monitoring tool with process tracking

With collected data, you can identify trends using simple SQL:


-- Top memory-consuming processes (daily average)
SELECT name, AVG(mem) as avg_mem 
FROM process_stats 
WHERE timestamp > strftime('%s','now','-1 day')
GROUP BY name 
ORDER BY avg_mem DESC 
LIMIT 10;

-- CPU usage spikes detection
SELECT strftime('%H:00', timestamp, 'unixepoch') as hour, 
       MAX(cpu) as peak_cpu
FROM process_stats
WHERE name = 'mysqld'
GROUP BY hour;
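
Averages and peaks show who is heavy, but a leak shows up as steady growth. One way to quantify that is a least-squares fit of memory against time, sketched below with Python's built-in sqlite3 module. This is an illustrative sketch under the assumption that the process_stats table above is in use; the function name growth_per_hour is hypothetical, and when several processes share a name the fit runs over all of their rows together.

```python
import sqlite3


def growth_per_hour(conn, name):
    """Least-squares slope of mem (percentage points per hour) for one name.

    Returns None when there are too few samples to fit a line.
    """
    rows = conn.execute(
        "SELECT timestamp, mem FROM process_stats "
        "WHERE name = ? AND mem IS NOT NULL ORDER BY timestamp",
        (name,),
    ).fetchall()
    if len(rows) < 2:
        return None
    mean_t = sum(t for t, _ in rows) / len(rows)
    mean_m = sum(m for _, m in rows) / len(rows)
    spread = sum((t - mean_t) ** 2 for t, _ in rows)
    if spread == 0:
        return None  # all samples share one timestamp
    slope_per_second = sum((t - mean_t) * (m - mean_m) for t, m in rows) / spread
    return slope_per_second * 3600
```

A value that stays positive across several days of data is a reasonable hint to look more closely at that process; a value near zero suggests stable memory use.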

For services like Apache that might cause swap death, consider this alerting rule for Prometheus:


groups:
- name: memory.rules
  rules:
  - alert: HighMemoryProcess
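    # process-exporter reports monitored processes per name group;
    # process_resident_memory_bytes would describe the exporter itself.
    expr: namedprocess_namegroup_memory_bytes{job="process-exporter", memtype="resident"} > 1.5e9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Process group {{ $labels.groupname }} using excessive memory"
      description: "{{ $labels.groupname }} is using {{ $value }} bytes of resident memory"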