When a Linux server hits 100% CPU utilization, SSH connections often fail with timeout errors or become unresponsive. This creates a frustrating catch-22 situation where you can't troubleshoot because you can't access the system. The root cause typically lies in how Linux's process scheduler handles CPU allocation under heavy load.
Linux uses the Completely Fair Scheduler (CFS) by default, which distributes CPU time evenly among runnable processes. Under extreme load, that fairness means sshd gets starved along with everything else. Several approaches can tilt scheduling in SSH's favor; first, confirm what is actually consuming the CPU:
# Check current CPU usage breakdown
top -b -n 1 | head -n 12
# Alternative using ps
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head
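It also helps to record how sshd is currently being scheduled before changing anything; ps from procps can show the scheduling class and nice value:
# cls shows the scheduling class (TS = normal CFS, FF = FIFO, RR = round robin), ni the nice value
ps -o pid,ni,cls,pri,cmd -p "$(pgrep -d, sshd)"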
Create a systemd override for the SSH service unit to raise its scheduling priority. Note that the unit is named ssh on Debian/Ubuntu and sshd on RHEL/Fedora, and that CPUSchedulingPolicy=fifo puts sshd on a real-time policy, so it will preempt normal CFS tasks:
# Create override directory if it doesn't exist
sudo mkdir -p /etc/systemd/system/ssh.service.d/
# Create priority configuration
echo "[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=50" | sudo tee /etc/systemd/system/ssh.service.d/priority.conf
# Reload systemd and restart SSH
sudo systemctl daemon-reload
sudo systemctl restart ssh
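A quick sanity check that the real-time policy actually applied (chrt ships with util-linux):
# Expect SCHED_FIFO with priority 50 for the listening daemon
chrt -p $(pgrep -o sshd)
# systemd's view of the unit should agree
systemctl show ssh -p CPUSchedulingPolicy -p CPUSchedulingPriority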
Alternatively, use nice values to raise the priority of SSH-related processes:
# Set high priority for existing SSH processes
sudo renice -n -10 -p $(pgrep sshd)
# renice only affects processes that are already running. Rather than a blanket
# limits.conf entry (e.g. "* hard priority -10", which would change every user session),
# persist the priority with a systemd Nice= override, as sketched below.
For more sophisticated control, use cgroups to reserve CPU for SSH. The libcgroup tooling below targets cgroup v1; on a pure cgroup v2 system, prefer the systemd resource properties shown further down:
# Install cgroup tools (if needed)
sudo dnf install libcgroup-tools
# Create cgroup configuration
echo "group ssh_priority {
cpu {
cpu.shares = 1024;
}
}" | sudo tee /etc/cgconfig.d/ssh.conf
# Apply configuration
sudo cgconfigparser -l /etc/cgconfig.d/ssh.conf
# Route sshd processes into the cgroup (cgrules format: <user>:<process> <controllers> <group>)
echo "*:sshd  cpu  ssh_priority/" | sudo tee -a /etc/cgrules.conf
When all else fails, most cloud providers offer console access:
- AWS: EC2 Instance Connect or Session Manager
- GCP: Serial Console
- Azure: Serial Console or Run Command
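If the relevant CLI is already configured, these channels can be opened without a working SSH daemon at all; a hedged sketch with placeholder instance names, zones, and resource groups:
# AWS: Session Manager shell (requires the SSM agent on the instance and the local session-manager-plugin)
aws ssm start-session --target i-0123456789abcdef0
# GCP: interactive serial console (serial port access must be enabled on the instance)
gcloud compute connect-to-serial-port my-instance --zone us-central1-a
# Azure: run a command on the VM without SSH
az vm run-command invoke -g my-rg -n my-vm --command-id RunShellScript --scripts "uptime"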
Implement these monitoring solutions to prevent future lockouts:
# Simple monitoring script example
#!/bin/bash
# The threshold is a 1-minute load average, not a CPU percentage; tune it to the machine's core count
LOAD_THRESHOLD=8
LOAD=$(awk '{print $1}' /proc/loadavg)
# [[ ... > ... ]] compares strings, so do the floating-point comparison in awk instead
if awk -v l="$LOAD" -v t="$LOAD_THRESHOLD" 'BEGIN { exit !(l > t) }'; then
    # Automatically renice SSH when the threshold is exceeded
    renice -n -20 -p $(pgrep sshd)
    logger "Load threshold exceeded - SSH priority increased"
fi
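To run the watchdog automatically, a minimal cron entry; the script path below is hypothetical, so adjust it to wherever you install the script:
# /etc/cron.d/ssh-watchdog - check the load every minute as root
* * * * * root /usr/local/sbin/ssh-watchdog.sh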
Consider setting up alternative access methods like Web Console (cockpit) or out-of-band management interfaces for critical systems.
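Cockpit, for example, can be enabled ahead of time on RHEL-family systems (package and socket names below are the upstream defaults; adjust for other distributions):
sudo dnf install cockpit
sudo systemctl enable --now cockpit.socket
# The web console then listens on https://<server>:9090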
The same approach works on RHEL/Fedora-style systems, where the service unit is named sshd rather than ssh; the recap below uses that naming. When a server is pegged at 100% CPU, these scheduler-level fixes keep emergency access available for diagnosing problems like runaway processes or DDoS attacks.
The kernel's Completely Fair Scheduler (CFS) can be nudged toward SSH through these methods:
# Start sshd by hand at high priority (only useful when no instance already holds port 22;
# otherwise prefer renice or the systemd override below)
sudo nice -n -20 /usr/sbin/sshd
# Alternative: Use renice on running process
sudo renice -n -20 -p $(pgrep sshd)
For modern systems using systemd, create an override file:
sudo mkdir -p /etc/systemd/system/sshd.service.d
sudo tee /etc/systemd/system/sshd.service.d/priority.conf <<'EOF'
[Service]
# Nice=-20 matches the renice above; CPUSchedulingPolicy/Priority (shown earlier) is an alternative
Nice=-20
EOF
Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart sshd
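To confirm the drop-in took effect after the restart:
# The override should be reflected in the unit's properties...
systemctl show sshd.service -p Nice
# ...and in the nice column of the running processes
ps -o pid,ni,cmd -p "$(pgrep -d, sshd)"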
Create a dedicated cgroup for SSH in /etc/cgconfig.conf:
group ssh_priority {
    cpu {
        # Relative weight; the default is 1024, so anything above that favors sshd under contention
        cpu.shares = 2048;
    }
    memory {
        # Cap sshd's memory so a flood of connections cannot exhaust RAM
        memory.limit_in_bytes = 512M;
        memory.memsw.limit_in_bytes = 512M;
    }
}
Alternatively, apply equivalent limits directly through systemd, which needs no libcgroup tooling:
sudo systemctl set-property sshd.service CPUAccounting=true MemoryAccounting=true
sudo systemctl set-property sshd.service CPUShares=2048 MemoryLimit=512M
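CPUShares= and MemoryLimit= are the legacy cgroup v1 property names; on a cgroup v2 system (the default on recent distributions) the equivalent call is:
# CPUWeight defaults to 100, so 200 roughly matches doubling the v1 default of 1024
sudo systemctl set-property sshd.service CPUWeight=200 MemoryMax=512M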
When SSH remains inaccessible, consider these backup strategies:
- Serial console access via GRUB configuration (see the sketch after this list)
- Out-of-band management (IPMI/iDRAC)
- Pre-configured emergency watchdog scripts
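For the serial console option, a minimal sketch of the GRUB side, assuming a RHEL-style layout, a BIOS boot path, and the first serial port (ttyS0); cloud serial consoles generally expect the same kernel settings:
# /etc/default/grub - route GRUB and the kernel console to the serial port
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
# append to the existing GRUB_CMDLINE_LINUX value:
#   console=tty0 console=ttyS0,115200n8
# Regenerate the config and make sure a login prompt listens on the serial port
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo systemctl enable --now serial-getty@ttyS0.service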
Implement proactive measures to prevent complete lockouts:
# Cron job to log high-CPU processes
*/5 * * * * ps -eo pid,user,%cpu,cmd --sort=-%cpu | head -n 10 >> /var/log/highcpu.log
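Since that job appends every five minutes, rotate the log so it cannot fill the disk; a minimal /etc/logrotate.d/highcpu sketch:
/var/log/highcpu.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}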