When your Linux system faces severe memory pressure, the Out-Of-Memory (OOM) killer activates to prevent complete system failure by terminating selected processes. To understand recent incidents:
# Check OOM killer events in kernel logs
dmesg | grep -i "oom-killer"
dmesg | grep -i "killed process"
# Alternative log locations
grep -i "oom-killer" /var/log/messages
journalctl -k --grep="oom-killer"
The first process killed isn't necessarily the root cause. Use these tools to investigate memory usage patterns:
# Real-time memory monitoring
vmstat -SM 1 10
free -mh
# Process-level memory analysis
ps aux --sort=-%mem | head -n 15
top -b -o +%MEM -n 1 | head -n 20
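Because the first victim is often just the largest easy target, I also like to watch growth over time. Below is a minimal sampling sketch; the one-minute interval and log path are arbitrary choices, not part of any standard tooling:
#!/bin/bash
# Sample the top 5 resident-memory consumers once a minute so growth
# trends are visible later; stop with Ctrl-C.
OUT="/var/log/mem_top_samples.log"
while true; do
    {
        date '+%F %T'
        ps -eo pid,rss,comm --sort=-rss | head -n 6
        echo
    } >> "$OUT"
    sleep 60
done
A few hours of samples make it obvious whether one process's RSS climbs steadily (a leak) or memory disappears in a sudden spike.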
For production servers, we need deeper diagnostics:
# Install and run smem for detailed reporting
yum install smem -y
smem -t -k -p | grep -E "www|mysql|postfix"
# Check slab memory usage
cat /proc/meminfo | grep -E "Slab|SReclaimable|SUnreclaim"
# Analyze process memory maps (replace PID)
pmap -x <PID> | sort -n -k3
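For forking services such as Apache, a single worker rarely looks alarming; the pool total is what matters. A quick sketch, assuming the workers are named httpd (use apache2 on Debian-based systems):
# Sum resident memory across all Apache workers
ps -C httpd -o rss= | awk '{sum+=$1} END {printf "httpd total RSS: %.1f MiB\n", sum/1024}'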
Implement these proactive measures:
# Configure OOM killer adjustments
echo -17 > /proc/<PID>/oom_adj # Protect critical processes (replace <PID>)
sysctl -w vm.overcommit_memory=2 # More conservative allocation
# MySQL-specific tuning (example)
[mysqld]
performance_schema=ON
innodb_buffer_pool_size = 256M # Adjust based on available RAM
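On systemd hosts, note that anything echoed into /proc is lost when the process restarts. A more durable option is a unit drop-in using systemd's OOMScoreAdjust= directive; the drop-in path below is illustrative:
# /etc/systemd/system/sshd.service.d/oom-protect.conf
[Service]
OOMScoreAdjust=-1000
Run systemctl daemon-reload followed by systemctl restart sshd for the drop-in to take effect.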
Automate incident documentation with this script:
#!/bin/bash
LOG_FILE="/var/log/oom_analysis_$(date +%Y%m%d).log"
{
echo "===== OOM Killer Analysis Report ====="
date
echo -e "\nMemory Status:"
free -m
echo -e "\nRecent OOM Events:"
dmesg | grep -i "oom-killer" | tail -n 10
echo -e "\nTop Memory Consumers:"
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 15
} > "$LOG_FILE"
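To collect this report automatically, save the script (the path /usr/local/bin/oom_report.sh below is just an example) and schedule it from cron:
# /etc/cron.d/oom-report -- run the report every morning at 06:00
0 6 * * * root /usr/local/bin/oom_report.sh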
Last Tuesday at 3:47 AM, my monitoring system alerted me that both Apache and SSH had become unresponsive on our production VPS. The smoking gun was in /var/log/messages:
Jan 14 03:47:12 vps01 kernel: Out of memory: Kill process 2156 (httpd) score 887 or sacrifice child
Jan 14 03:47:12 vps01 kernel: Killed process 2156, UID 48, (httpd) total-vm:245728kB, anon-rss:142892kB, file-rss:428kB
First, we need to reconstruct the memory state before the OOM killer struck. /var/log/messages is a gold mine:
grep -i 'out of memory' /var/log/messages
grep -i 'killed process' /var/log/messages | awk -F'(' '{print $2}' | awk -F')' '{print $1}' | sort | uniq -c
The second command reveals frequent victims - in my case, httpd and mysqld kept appearing.
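It is also worth checking when the kills cluster; this rough sketch buckets "Killed process" entries by hour, assuming the syslog timestamp format shown in the excerpt above:
# Count OOM kills per hour to correlate with backups, cron jobs or traffic peaks
grep -i 'killed process' /var/log/messages | awk '{print $1, $2, substr($3,1,2)":00"}' | sort | uniq -c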
Run dmesg -T | grep -i oom to see kernel-level OOM events with human-readable timestamps. For a more detailed post-mortem:
# Install the crash utility
yum install crash -y
# Analyze the vmcore (if kdump is configured; replace <vmcore> with the captured dump)
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/<vmcore>
crash> kmem -i
crash> log
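Of course, this post-mortem route only works if a dump was actually captured, so confirm kdump is enabled first (the service name assumes a RHEL/CentOS systemd host):
# Verify that kdump is running and a crashkernel region is reserved
systemctl is-active kdump
grep crashkernel /proc/cmdline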
OOM-killer uses a badness score algorithm. To see what processes were candidates:
# View OOM score of running processes
for f in /proc/*/oom_score; do
    pid=${f#/proc/}
    pid=${pid%/oom_score}
    echo "$(cat $f) $(ps -p $pid -o comm=)"
done | sort -nr | head
In my case, this revealed PHP-FPM processes consuming abnormal memory after a WordPress plugin update.
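To confirm a suspicion like that quickly, total the pool's memory in one command; a sketch assuming the workers are named php-fpm:
# Per-worker and total PSS for all PHP-FPM processes
smem -t -k -P php-fpm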
For critical services like SSH, protect the daemon directly and tighten the overcommit settings in /etc/sysctl.conf:
# Protect SSH from OOM (pgrep -o targets the parent sshd, not every session)
echo -17 > /proc/$(pgrep -o sshd)/oom_adj
# System-wide settings for /etc/sysctl.conf
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.panic_on_oom = 0
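Two caveats: sysctl.conf changes are not applied until reloaded, and the /proc write has to be repeated after every sshd restart. On kernels 2.6.36 and later, oom_score_adj is the preferred interface; a sketch:
# Apply the sysctl settings without a reboot
sysctl -p
# Modern equivalent of oom_adj; -1000 exempts the process from OOM kills entirely
echo -1000 > /proc/$(pgrep -o sshd)/oom_score_adj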
Now I use this cron job to log memory trends every 5 minutes:
*/5 * * * * echo $(date +\%s) $(free -m | awk '/Mem:/ {print $3,$4,$7}') >> /var/log/mem.log
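Each log line is an epoch timestamp followed by used, free and available memory in MB, so a short awk pass finds the low-water mark; a sketch:
# Report the lowest "available" value recorded and when it happened
awk 'min=="" || $4 < min {min=$4; ts=$1} END {print "Lowest available:", min, "MB at epoch", ts}' /var/log/mem.log
Convert the epoch with date -d @<timestamp> to a readable time.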
Combine it with a Prometheus alerting rule (easily visualized in Grafana) that fires when available memory drops below 10%:
- alert: LowMemory
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Low memory on {{ $labels.instance }}"
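Before loading the rule, it is worth syntax-checking it; a sketch assuming the rule is saved at /etc/prometheus/rules/memory.yml (the path is illustrative) and that node_exporter is already scraping the host:
# Validate the alerting rule file before reloading Prometheus
promtool check rules /etc/prometheus/rules/memory.yml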