How to Ensure SSH Access During Memory Exhaustion: Investigating Swap Underutilization in Linux Servers


Recently, I ran into a baffling situation: my GoDaddy server would periodically become completely unresponsive to SSH. What made it particularly frustrating was that swap remained largely unused even when RAM was exhausted, so the server would effectively freeze despite having free swap capacity.

Through careful logging using a cron job that captured top output every 5 minutes, I identified the pattern:

# Normal operation (healthy state)
top - 15:13:21 up  3:12,  2 users,  load average: 0.15, 0.30, 0.33
Mem:   2064980k total,  1611252k used,   453728k free,    45852k buffers
Swap:  2096472k total,        0k used,  2096472k free,   790212k cached

# Just before crash (critical state)
top - 14:45:08 up 15:20,  0 users,  load average: 0.27, 0.16, 0.10
Mem:   2064980k total,  2007652k used,    57328k free,    60496k buffers
Swap:  2096472k total,      100k used,  2096372k free,   689584k cached
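
For reference, the logging itself was just a root cron entry along these lines (the 5-minute interval matches what I described; the log path is illustrative):

*/5 * * * * /usr/bin/top -b -n 1 >> /var/log/top_snapshots.log 2>&1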

The root cause was leaked application connections, each consuming ~30MB of RAM. After roughly 40 connections (about 1.2GB on a 2GB machine), the server would crash despite the available swap space.

Linux's memory management involves several key thresholds that affect when and how swap is used:

# Check current swappiness value
cat /proc/sys/vm/swappiness

The default value of 60 means the kernel will start swapping before RAM is completely exhausted, but in my case, the OOM killer was likely being triggered prematurely.
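
Swappiness can be tuned at runtime and persisted if you want the kernel to start pushing pages to swap earlier; the value below is a starting point to experiment with, not a recommendation:

# Apply immediately
sudo sysctl -w vm.swappiness=80

# Persist across reboots
echo 'vm.swappiness=80' | sudo tee /etc/sysctl.d/99-swappiness.conf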

Here are three technical solutions to maintain SSH access during memory pressure:

1. Create a CGroup for SSH

# Install cgroup tools if needed
sudo apt-get install cgroup-tools

# Create memory-limited cgroup for sshd
sudo cgcreate -g memory:sshd-limit
sudo cgset -r memory.limit_in_bytes=512M sshd-limit
sudo cgset -r memory.memsw.limit_in_bytes=512M sshd-limit
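
Creating the cgroup doesn't by itself put sshd into it. For a quick manual test you can attach the running daemon with cgclassify (also from cgroup-tools); the systemd drop-in below is the more durable route:

sudo cgclassify -g memory:sshd-limit $(pgrep -o -x sshd)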

# Alternatively, have systemd apply the limit: open a drop-in for the unit...
sudo systemctl edit sshd

# ...and add the following in the editor that opens
# (on cgroup v2 systems, MemoryMax= is the current name for MemoryLimit=):
[Service]
MemoryAccounting=yes
MemoryLimit=512M

2. Adjust OOM Killer Priorities

# Make the sshd daemon less likely to be killed
# (sudo doesn't apply to the redirection, so pipe through tee; pgrep -o -x picks the parent sshd)
echo -17 | sudo tee /proc/$(pgrep -o -x sshd)/oom_adj

# Permanent solution via systemd
sudo systemctl edit sshd
[Service]
OOMScoreAdjust=-500
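
systemctl edit reloads the unit files for you; restart sshd afterwards and confirm the adjustment actually landed:

sudo systemctl restart sshd
cat /proc/$(pgrep -o -x sshd)/oom_score_adj   # should print -500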

3. Implement Early Warning System

#!/bin/bash
# memory_monitor.sh
threshold=90
while true; do
    mem_usage=$(free | awk '/^Mem/{printf("%d", $3/$2*100)}')
    if [ "$mem_usage" -gt "$threshold" ]; then
        /usr/sbin/sshd -D -p 2222 &  # Start secondary sshd on alt port
        break
    fi
    sleep 30
done
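
The script needs root (it launches a second sshd), so one low-effort way to keep it running is an @reboot entry in root's crontab; the script path here is an assumption:

@reboot /usr/local/bin/memory_monitor.sh >> /var/log/memory_monitor.log 2>&1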

While the above measures help maintain access, the real fix requires addressing the application memory leaks:

// Example connection cleanup in Node.js
const cleanup = () => {
  connections.forEach(conn => {
    if (conn.isIdle) conn.close()
  });
};

setInterval(cleanup, 30000);  // Run cleanup every 30 seconds

Implement proper connection pooling and timeouts to prevent unbounded memory growth.
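
As a rough sketch of what that looks like for a plain Node.js HTTP server (the port and timeout values are placeholders to tune for your workload, not settings from my application):

const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));
server.keepAliveTimeout = 5000;   // close idle keep-alive sockets after 5s
server.headersTimeout = 10000;    // give up on clients that never finish their headers
server.requestTimeout = 30000;    // abort requests that run longer than 30s
server.listen(8080);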

When all else fails, consider these alternatives:

  • Configure serial console access (requires hardware support)
  • Set up a watchdog timer to automatically reboot unresponsive systems (a minimal config sketch follows this list)
  • Implement out-of-band management (iDRAC, iLO, IPMI)
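
For the watchdog route, the userspace watchdog daemon can reboot the box when free memory drops too low; here is a minimal sketch of /etc/watchdog.conf, with thresholds that are assumptions to tune rather than recommendations:

sudo apt install watchdog

# /etc/watchdog.conf
watchdog-device = /dev/watchdog
min-memory      = 25600    # pages; ~100MB with 4KB pages
max-load-1      = 24

sudo systemctl enable --now watchdog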

When a Linux server hits an Out-of-Memory (OOM) state, SSH access often fails despite available swap space. The core issue isn't just memory exhaustion, but how the kernel's OOM killer prioritizes processes. Even with 100KB swap usage (as shown in your logs), critical system processes may get terminated.

Your logs show a critical pattern:

Mem:   2064980k total,  2007652k used,    57328k free
Swap:  2096472k total,      100k used

Three key observations:

  • Free memory drops below 60MB (danger zone for process spawning)
  • Swap remains virtually unused despite memory pressure
  • Buffer/cache memory can't be reclaimed fast enough for new processes
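
To see how close the box actually is to that danger zone, MemAvailable is the number to watch (unlike plain free memory, it accounts for cache the kernel can still reclaim):

grep -E 'MemFree|MemAvailable|SwapFree' /proc/meminfo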

1. cgroups v2 Memory Protection:

# Create protected slice for SSH
sudo mkdir /sys/fs/cgroup/system.slice/sshd.slice
echo "100M" > /sys/fs/cgroup/system.slice/sshd.slice/memory.min

2. Systemd Unit Hardening:

# /etc/systemd/system/sshd.service.d/oomprotect.conf
[Service]
MemoryMin=100M
MemoryHigh=150M
OOMScoreAdjust=-500
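
Reload systemd and restart sshd for the drop-in to take effect, then verify the properties were picked up:

sudo systemctl daemon-reload
sudo systemctl restart sshd
systemctl show sshd -p MemoryMin -p MemoryHigh -p OOMScoreAdjust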

Emergency Console Access:

# In /etc/default/grub add:
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"
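
After editing the GRUB default, regenerate the config and make sure a login getty is running on the serial port:

sudo update-grub
sudo systemctl enable --now serial-getty@ttyS0.service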

Low-Memory SSH Alternative:

# Install dropbear for emergency access
sudo apt install dropbear
echo "DROPBEAR_OPTIONS=\"-p 2222 -s\"" >> /etc/default/dropbear

Python monitoring script example:

#!/usr/bin/env python3
# Requires the third-party psutil package (pip install psutil)
import psutil, smtplib
from email.message import EmailMessage

def check_memory():
    mem = psutil.virtual_memory()
    if mem.available < 100 * 1024 * 1024:  # 100MB threshold
        send_alert()

def send_alert():
    # Assumes a local MTA is listening on port 25
    msg = EmailMessage()
    msg["Subject"] = "Memory Critical"
    msg["From"] = "monitor@localhost"
    msg["To"] = "root@localhost"
    msg.set_content("SSH may become unavailable")
    with smtplib.SMTP("localhost", 25) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    check_memory()

Critical /etc/sysctl.conf adjustments:

vm.oom_kill_allocating_task = 0
vm.panic_on_oom = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
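
Apply the changes without a reboot. One caution: vm.overcommit_memory=2 enables strict accounting; with 2GB of RAM and an 80% ratio the commit limit is roughly swap + 1.6GB, and allocations beyond it fail outright, which not every application handles gracefully:

sudo sysctl -p /etc/sysctl.conf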

For your application issue (40 connections x 30MB):

// Node.js connection limit example
const server = require('http').createServer();
server.maxConnections = 25;  // Keep below (total_mem / connection_mem)
server.listen(8080);