Preventing Linux OOM Killer from Terminating Critical Processes: Memory Management Strategies


When a Linux system exhausts both physical memory and swap space, the Out-of-Memory (OOM) killer activates as a last-resort mechanism. It is not a separate daemon but a code path in the kernel's memory allocator, and it picks a victim process using heuristics that can look arbitrary from an admin's perspective. Since kernel 2.6.36 the score it computes is roughly:

oom_score ≈ (rss + swap + page_tables) / total_memory * 1000 + oom_score_adj
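
To see how the kernel currently ranks candidates, each process's computed score can be read straight from /proc. A minimal sketch (presenting the result with sort and head is just one option):

#!/bin/bash
# List the ten processes with the highest current OOM badness score.
# oom_score is the kernel's computed value; oom_score_adj is the admin-set bias.
for dir in /proc/[0-9]*; do
  pid=${dir##*/}
  score=$(cat "$dir/oom_score" 2>/dev/null) || continue
  name=$(cat "$dir/comm" 2>/dev/null)
  echo "$score $pid $name"
done | sort -rn | head -n 10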

For critical production systems, consider these /proc tunables:

# Temporarily set for the current session.
# Note: overcommit_ratio only takes effect in mode 2 ("never overcommit"),
# which makes allocations fail with ENOMEM instead of invoking the OOM killer.
echo 2 > /proc/sys/vm/overcommit_memory
echo 80 > /proc/sys/vm/overcommit_ratio

# Persistent configuration in /etc/sysctl.conf
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
# Prefer reclaiming page cache over swapping out anonymous pages
vm.swappiness = 10
# If an OOM still occurs, kill the task that triggered the allocation
# rather than a heuristically chosen victim
vm.oom_kill_allocating_task = 1
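
After editing /etc/sysctl.conf, reload it and confirm that the kernel is now enforcing a commit limit; CommitLimit and Committed_AS in /proc/meminfo show the ceiling and how much is currently committed:

# Reload persistent settings and check the enforced limit
sysctl -p
grep -E 'CommitLimit|Committed_AS' /proc/meminfo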

Modify oom_score_adj for essential services. Values range from -1000 to 1000; lower values reduce kill probability, and -1000 exempts a process from the OOM killer entirely:

# Protect the MySQL server (pgrep -o returns only the oldest matching PID)
echo -1000 > /proc/$(pgrep -o mysqld)/oom_score_adj

# Systemd service example (create /etc/systemd/system/mysql.service.d/oomadjust.conf):
[Service]
OOMScoreAdjust=-1000
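
After creating the drop-in, reload systemd and restart the unit so the new setting takes effect; the unit name mysql follows the drop-in path in the example above:

systemctl daemon-reload
systemctl restart mysql
# Confirm the adjustment was picked up
systemctl show -p OOMScoreAdjust mysql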

Memory control groups give fine-grained, per-application limits. The commands below use the legacy cgroup v1 interface through the cgroup-tools (libcgroup) utilities; a cgroup v2 equivalent follows the example.

# Create a cgroup with a 4GB memory limit (cgroup v1 interface)
cgcreate -g memory:/important_apps
echo 4G > /sys/fs/cgroup/memory/important_apps/memory.limit_in_bytes

# Add process to cgroup
cgclassify -g memory:important_apps $(pgrep nginx)
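
On distributions that mount only the cgroup v2 unified hierarchy, the legacy memory.limit_in_bytes file does not exist; the equivalent knob is memory.max. A minimal sketch, assuming the memory controller is enabled for child groups (on systemd systems, a unit with MemoryMax= is usually the cleaner route):

# cgroup v2 equivalent of the 4GB limit above
mkdir -p /sys/fs/cgroup/important_apps
echo 4G > /sys/fs/cgroup/important_apps/memory.max
# cgroup.procs accepts one PID per write; -o picks the oldest nginx process
echo "$(pgrep -o nginx)" > /sys/fs/cgroup/important_apps/cgroup.procs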

Implement proactive monitoring with this shell snippet:

#!/bin/bash
threshold=90
while true; do
  mem_used=$(free | awk '/^Mem:/ {printf "%.0f", $3/$2*100}')
  if [ "$mem_used" -ge "$threshold" ]; then
    logger -t memalert "Memory usage ${mem_used}% - taking action"
    # Add mitigation steps here
  fi
  sleep 30
done
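
To keep the watcher running across reboots, it can be wrapped in a small systemd service; the unit name and script path here are illustrative:

# Hypothetical unit file: /etc/systemd/system/memalert.service
[Unit]
Description=Memory usage watcher

[Service]
ExecStart=/usr/local/bin/memalert.sh
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with systemctl daemon-reload followed by systemctl enable --now memalert.service.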

When adding more physical RAM isn't an option, compressed or faster swap can buy some headroom:

# Enable zswap, a compressed cache in front of existing swap (built into kernels 3.11+)
echo 1 > /sys/module/zswap/parameters/enabled
# zstd needs a reasonably recent kernel; lz4 or lzo are widely available fallbacks
echo zstd > /sys/module/zswap/parameters/compressor

# Alternative: put swap on fast storage (e.g. NVMe)
fallocate -l 4G /fastswap
chmod 600 /fastswap
mkswap /fastswap
swapon -p 100 /fastswap
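
To confirm what is active after either change:

# Show active swap areas and the current zswap settings
swapon --show
grep -r . /sys/module/zswap/parameters/ 2>/dev/null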

Before applying the step-by-step strategies below, check the system's current memory state:


# View current memory usage
free -h
# Check swappiness value
cat /proc/sys/vm/swappiness
# Check OOM score adjustments
cat /proc/[pid]/oom_score_adj

1. Process Prioritization

Protect critical processes by adjusting their OOM scores:


# Make a process less likely to be killed (range -1000 to 1000; -1000 exempts it entirely)
echo -1000 > /proc/[pid]/oom_score_adj
# For a permanent setting, use a systemd drop-in as shown earlier:
[Service]
OOMScoreAdjust=-500
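
For services that run several worker processes (nginx, postfix and the like), the adjustment has to be written for every PID; a small sketch using nginx as the example:

# Apply an adjustment to every PID of a multi-process service
for pid in $(pgrep -x nginx); do
  echo -500 > "/proc/$pid/oom_score_adj"
done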

2. Swap Space Optimization

While not a complete solution, proper swap configuration helps:


# Create additional swap file
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Make permanent in /etc/fstab
/swapfile none swap sw 0 0
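
One caveat noted in swapon(8): on some filesystems a file preallocated with fallocate is treated as having holes and is rejected as swap; creating the file with dd is the most portable approach:

# Portable alternative to fallocate for creating the 2G swap file
dd if=/dev/zero of=/swapfile bs=1M count=2048 status=progress
chmod 600 /swapfile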

3. Cgroup Memory Limits

Contain memory usage per application using cgroups:


# Create memory-limited cgroup (cgroup v1 interface; on cgroup v2 use memory.max, as shown earlier)
cgcreate -g memory:/limited_group
echo 2G > /sys/fs/cgroup/memory/limited_group/memory.limit_in_bytes
# Assign process to cgroup
cgclassify -g memory:limited_group [pid]

4. Early Warning System

Implement monitoring to prevent OOM situations:


#!/bin/bash
THRESHOLD=90
while true; do
    MEM_USED=$(free | awk '/Mem:/ {print $3/$2 * 100}')
    if (( $(echo "$MEM_USED > $THRESHOLD" | bc -l) )); then
        logger "WARNING: Memory usage at $MEM_USED%"
        # Trigger cleanup or alert
    fi
    sleep 60
done

Beyond these immediate measures, a few longer-term strategies:

1. Kernel Parameter Tuning


# panic_on_oom=1 makes the kernel panic instead of killing processes -
# combined with kernel.panic it lets a clustered node reboot and fail fast
sysctl vm.panic_on_oom=1
# Disable overcommit so allocations fail with ENOMEM before an OOM can occur
sysctl vm.overcommit_memory=2
sysctl vm.overcommit_ratio=80
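
To make these settings survive a reboot, drop them into a sysctl.d fragment (the file name 90-oom.conf is arbitrary) and reload:

cat > /etc/sysctl.d/90-oom.conf <<'EOF'
vm.panic_on_oom = 1
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
EOF
sysctl --system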

2. Application-Level Solutions

For developers, handle allocation failure inside the application; note that under the default overcommit policy malloc rarely returns NULL, so this pairs with vm.overcommit_memory=2 above:


// C example: check for malloc failure instead of assuming allocation succeeds
#include <stdlib.h>

void *ptr = malloc(large_size);
if (ptr == NULL) {
    // Allocation failed: release held resources and exit cleanly
    cleanup_resources();
    exit(EXIT_FAILURE);
}
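
Resource limits achieve a similar effect from outside the application: capping the address space makes allocations fail inside the process before the whole system runs short. A sketch using util-linux prlimit; the 4 GiB figure and the process name myapp are placeholders:

# Cap the virtual address space of a running process at 4 GiB
prlimit --pid "$(pgrep -o myapp)" --as=4294967296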

3. Memory Pressure Monitoring


# Use PSI (Pressure Stall Information) metrics - available since kernel 4.20
cat /proc/pressure/memory
# Userspace OOM handlers such as earlyoom or oomd act on memory pressure
# before the kernel OOM killer has to step in
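
A minimal sketch of acting on PSI directly: poll the 10-second "some" average and log when it crosses a threshold (the 10% figure is arbitrary):

#!/bin/bash
# Alert when the avg10 value of the "some" line exceeds 10% memory pressure
while true; do
  avg10=$(awk '/^some/ {sub("avg10=", "", $2); print $2}' /proc/pressure/memory)
  if (( $(echo "$avg10 > 10" | bc -l) )); then
    logger -t psi-alert "Memory pressure avg10=${avg10}%"
  fi
  sleep 10
done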

If you still encounter OOM kills despite these measures:

  • Analyze OOM killer logs: dmesg | grep -i oom
  • Profile application memory usage with valgrind or pmap
  • Consider horizontal scaling for memory-intensive applications