How to Automatically Kill High CPU Usage Processes After Threshold Time on Linux

As a Linux system administrator managing game servers, I frequently encounter processes that hang and consume 100% CPU indefinitely. Brief spikes to 100% are normal during intensive operations, but sustained high CPU usage typically indicates a hung process that needs intervention.

The key challenge is distinguishing temporary high usage (normal operation) from sustained high usage (a hung state). We need to monitor each process over a time window (such as 30 seconds) before taking action.
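The time-window decision can be isolated from the process-listing details. The following is a minimal sketch of that logic only; the class name SustainedHighTracker and its parameters are illustrative assumptions, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class SustainedHighTracker:
    """Tracks how long each PID has stayed at or above a CPU threshold."""

    def __init__(self, cpu_threshold=98.0, duration_threshold=30.0):
        self.cpu_threshold = cpu_threshold
        self.duration_threshold = duration_threshold
        self._first_seen_high = {}  # PID -> timestamp of the first high sample

    def update(self, pid, cpu_percent, now):
        """Record one sample; return True once the PID has been high long enough."""
        if cpu_percent < self.cpu_threshold:
            # Usage dropped, so any earlier spike was temporary: reset the clock.
            self._first_seen_high.pop(pid, None)
            return False
        started = self._first_seen_high.setdefault(pid, now)
        return now - started >= self.duration_threshold


tracker = SustainedHighTracker()
# A spike at t=0 that recovers by t=5 never triggers.
print(tracker.update(1234, 100.0, now=0))   # False
print(tracker.update(1234, 12.0, now=5))    # False
# Sustained load from t=10 to t=40 triggers at the 30-second mark.
print(tracker.update(1234, 100.0, now=10))  # False
print(tracker.update(1234, 99.5, now=40))   # True
```

Any single sample below the threshold resets the clock, which is what separates a map-change spike from a hang.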

Here's an enhanced version of the script you found, modified to handle multiple processes and configurable thresholds:

#!/bin/bash
# Process killer script - monitors multiple processes by name

# Configuration
PROCESS_NAMES=("srcds" "mysqld" "java") # Processes to monitor
CPU_THRESHOLD=98                        # Percentage considered "high"
DURATION_THRESHOLD=30                   # Seconds above threshold before kill
CHECK_INTERVAL=5                        # Seconds between checks

# Maps PID -> epoch second at which it first crossed the threshold
declare -A high_cpu_start

# Main monitoring loop
while true; do
    for proc in "${PROCESS_NAMES[@]}"; do
        # ps -C can match several processes, so handle one "PID %CPU" pair per line.
        # Note: ps reports %CPU averaged over the process lifetime, not the
        # current load, so a long-running process that hangs crosses the
        # threshold slowly.
        while read -r pid cpu; do
            [ -n "$pid" ] || continue
            cpu_usage=${cpu%.*}    # drop the fractional part for integer comparison

            # Check if above threshold
            if [ "$cpu_usage" -ge "$CPU_THRESHOLD" ]; then
                if [ -z "${high_cpu_start[$pid]}" ]; then
                    high_cpu_start[$pid]=$(date +%s)
                    echo "$(date): $proc (PID $pid) exceeded CPU threshold"
                else
                    duration=$(( $(date +%s) - ${high_cpu_start[$pid]} ))
                    if [ "$duration" -ge "$DURATION_THRESHOLD" ]; then
                        echo "$(date): Killing $proc (PID $pid) - over threshold for $duration seconds"
                        kill -9 "$pid"
                        unset "high_cpu_start[$pid]"
                    fi
                fi
            else
                unset "high_cpu_start[$pid]"
            fi
        done < <(ps -C "$proc" -o pid=,pcpu=)
    done
    sleep "$CHECK_INTERVAL"
done
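One caveat with the script above: ps reports %CPU as CPU time divided by the time the process has been running, so it reflects the lifetime average, not the current load. A process that idles for hours and then hangs will take a long time to reach 98%. The sketch below shows one way to measure current usage instead, by sampling /proc/<pid>/stat twice and comparing the tick counters. The function names are illustrative assumptions, and the approach is Linux-specific.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the last ')' before splitting the other fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, which puts them at indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, elapsed_seconds, ticks_per_second):
    """CPU usage over the sampling interval, where 100 means one full core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * elapsed_seconds)


def sample_pid(pid, interval=1.0):
    """Measure the current CPU usage of a PID over `interval` seconds (Linux only)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, time.monotonic() - start, ticks_per_second)
```

Calling sample_pid(os.getpid()) measures the calling process itself. Tools such as top and psutil use the same delta-based idea, which is why their readings react to a hang within one sampling interval.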

For more sophisticated monitoring, consider these options:

Systemd service configuration: Add CPU usage limits (such as CPUQuota=) directly in service files
cgroups: Create control groups with CPU usage limits
Monit: Lightweight monitoring tool with process watching capabilities

When implementing automated process killing:

Log all kills for debugging purposes
Consider implementing automatic restarts after killing
Set up alerts when processes are killed frequently
Test thresholds thoroughly for each application type
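As one possible way to detect "killed frequently", the sketch below keeps a sliding window of kill timestamps per process name and signals when the count reaches a limit. The class name KillRateMonitor and the default limits are assumptions for illustration; how the alert is delivered (email, chat, pager) is left to the caller.

```python
from collections import defaultdict, deque


class KillRateMonitor:
    """Flags a process name when it has been killed too often within a time window."""

    def __init__(self, max_kills=3, window_seconds=3600):
        self.max_kills = max_kills
        self.window_seconds = window_seconds
        self._kills = defaultdict(deque)  # process name -> timestamps of recent kills

    def record_kill(self, name, now):
        """Record a kill; return True if an alert should be raised."""
        recent = self._kills[name]
        recent.append(now)
        # Discard kills that have fallen out of the window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_kills


monitor = KillRateMonitor(max_kills=3, window_seconds=600)
for timestamp in (0, 200, 400):
    alert = monitor.record_kill("srcds", now=timestamp)
    print(timestamp, alert)  # alert becomes True on the third kill
```

A process that trips this alert is usually misconfigured or hitting a real bug, so repeated kills are a signal to investigate, not only to restart.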

In Linux server administration, especially when running game servers or other long-running applications, we often encounter processes that occasionally hang and consume 100% CPU indefinitely. These runaway processes can:

Degrade overall system performance
Cause cascading failures in dependent services
Lead to unnecessary hosting costs

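Basic approaches like killall, pkill, or a one-time ps check don't work because they cannot tell a brief spike from a sustained hang:

# Bad approach - kills every matching process, no matter how much CPU it uses or for how long
pkill -f "my_game_server"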

Game servers legitimately spike to 100% CPU during operations like map changes or player loads. We need duration-based monitoring.

Here's a Python implementation that monitors multiple processes by name:

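#!/usr/bin/env python3
import time
from datetime import datetime

import psutil

TARGET_PROCS = ["srcds_linux", "minecraft_server"]
CPU_THRESHOLD = 99   # percent; 100% is rarely exact
MAX_DURATION = 30    # seconds
CHECK_INTERVAL = 5   # seconds

process_trackers = {}  # PID -> time the process first crossed the threshold

while True:
    seen_pids = set()
    # process_iter() reuses Process objects between calls, so cpu_percent
    # measures usage since the previous pass (the very first pass reports 0.0).
    for proc in psutil.process_iter(['pid', 'name', 'cpu_percent']):
        if proc.info['name'] not in TARGET_PROCS:
            continue
        pid = proc.info['pid']
        cpu = proc.info['cpu_percent']  # None if access was denied
        seen_pids.add(pid)

        if cpu is not None and cpu >= CPU_THRESHOLD:
            if pid not in process_trackers:
                process_trackers[pid] = time.time()
                print(f"{datetime.now()} - High CPU detected for PID {pid}")
            else:
                duration = time.time() - process_trackers[pid]
                if duration >= MAX_DURATION:
                    try:
                        proc.kill()
                        print(f"{datetime.now()} - Killed PID {pid} after {duration:.1f}s")
                    except (psutil.NoSuchProcess, psutil.AccessDenied) as exc:
                        print(f"{datetime.now()} - Could not kill PID {pid}: {exc}")
                    del process_trackers[pid]
        else:
            process_trackers.pop(pid, None)

    # Forget PIDs that exited on their own so a reused PID starts fresh
    for pid in set(process_trackers) - seen_pids:
        del process_trackers[pid]

    time.sleep(CHECK_INTERVAL)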

For enterprise environments, consider adding:

Logging to syslog or file
Email/SMS alerts before killing
CPU core count awareness (100% on 8 cores ≠ 100% on 1 core)
Process restart automation
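The core-count point deserves a concrete illustration. psutil reports per-process usage where 100 means one full core, so a multi-threaded process on an 8-core machine can read up to 800. The sketch below converts that figure to a share of the whole machine; the function name normalize_cpu_percent is an assumption for illustration.

```python
import os


def normalize_cpu_percent(raw_percent, cores=None):
    """Convert a per-core percentage (100 = one full core) to a share of the whole machine."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    return raw_percent / cores


print(normalize_cpu_percent(100.0, cores=8))  # 12.5
print(normalize_cpu_percent(100.0, cores=1))  # 100.0
print(normalize_cpu_percent(800.0, cores=8))  # 100.0
```

Which figure to compare against the threshold depends on the goal. A hung single-threaded loop shows up as roughly 100 in the raw value on any machine, so the raw value suits hang detection, while the normalized value better reflects the impact on the host as a whole.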

For those preferring shell scripts, this Bash version works well:

#!/bin/bash
PROCESS_NAMES=("java" "hl2_linux")
THRESHOLD_SECONDS=30
INTERVAL=10

while true; do
    for proc_name in "${PROCESS_NAMES[@]}"; do
        # -x matches the exact process name instead of any substring
        for pid in $(pgrep -x "$proc_name"); do
            # ps reports %CPU averaged over the process lifetime
            usage=$(ps -p "$pid" -o %cpu --no-headers)
            [[ -n "$usage" ]] || continue    # process exited between pgrep and ps
            if (( $(echo "$usage >= 99" | bc -l) )); then
                if [[ -f "/tmp/highcpu_$pid" ]]; then
                    start_time=$(cat "/tmp/highcpu_$pid")
                    duration=$(( $(date +%s) - start_time ))
                    if (( duration >= THRESHOLD_SECONDS )); then
                        kill -9 "$pid"
                        rm -f "/tmp/highcpu_$pid"
                        logger -t cpuwatch "Killed $proc_name (PID:$pid) after ${duration}s"
                    fi
                else
                    date +%s > "/tmp/highcpu_$pid"
                fi
            else
                rm -f "/tmp/highcpu_$pid"
            fi
        done
    done
    sleep "$INTERVAL"
done

For production servers, configure as a systemd service:

[Unit]
Description=CPU Process Monitor
After=network.target

[Service]
ExecStart=/usr/local/bin/cpu_monitor.py
Restart=always
User=root

[Install]
WantedBy=multi-user.target

ServerDevWorker
