Decoding OOM-Killer Logs: Diagnosing and Troubleshooting Memory Issues on Ubuntu Servers


When examining your syslog, the OOM-killer messages contain several critical pieces of information:

Oct 25 07:28:04 nldedip4k031 kernel: [87946.529514] irqbalance invoked oom-killer: gfp_mask=0x80d0, order=0, oom_adj=0, oom_score_adj=0
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529516] irqbalance cpuset=/ mems_allowed=0

The memory statistics reveal an unusual pattern:

  • HighMem free: 5168196kB (plenty available)
  • Normal zone free: only 44052kB (dangerously low)
  • Active file cache: 2815 pages (very small)
  • Inactive file cache: 6849119 pages (extremely large)

The key indicator is the massive inactive file cache (over 6.8 million pages, roughly 26GB at 4kB per page) that isn't being reclaimed. The system has plenty of HighMem available, but the Normal zone (where kernel allocations occur) is exhausted; a quick way to watch this on a live system is sketched after the list below. This suggests either:

  1. A memory leak in kernel-space allocations
  2. Improper memory pressure settings causing cache not to reclaim
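
To see this zone pressure on a live system, compare each zone's free pages with its min watermark straight from /proc/zoneinfo. This is a minimal sketch (assuming the usual 4kB page size), not taken from the original logs:

# Per-zone free pages vs. the min watermark (OOM risk grows as free approaches min)
awk '/^Node/ {zone = $NF}
     $1 == "pages" && $2 == "free" {free[zone] = $3}
     $1 == "min" {min[zone] = $2}
     END {for (z in free)
              printf "%-8s free: %8d pages (%8d kB)   min: %8d pages\n",
                     z, free[z], free[z]*4, min[z]}' /proc/zoneinfo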

To proactively monitor memory usage before OOM strikes:

# Check current memory usage
grep -E 'MemTotal|MemFree|Buffers|Cached|Swap' /proc/meminfo

# Check slab usage (kernel memory)
sudo cat /proc/slabinfo | awk 'NR>2 && $3*$4/1024/1024 > 10 {print $1, $3*$4/1024/1024 "MB"}'

# Check process memory
ps aux --sort=-%mem | head -n 10

Add these to /etc/sysctl.conf:

# More aggressive cache reclaim
vm.vfs_cache_pressure = 500
vm.swappiness = 10

# OOM killer adjustments
vm.oom_kill_allocating_task = 1
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
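
Apply them without a reboot and verify that the values took effect:

sudo sysctl -p              # reload /etc/sysctl.conf
sysctl vm.swappiness vm.vfs_cache_pressure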

Create a cron job to monitor memory pressure:

#!/bin/bash
# Alert (and drop page cache) when memory usage crosses THRESHOLD percent
THRESHOLD=90
MEM_USED=$(free | awk '/^Mem/ {printf "%d", $3/$2*100}')

if [ "$MEM_USED" -gt "$THRESHOLD" ]; then
    # mail requires a configured MTA (e.g. the mailutils package)
    echo "Memory usage critical: ${MEM_USED}%" | mail -s "Memory Alert" admin@example.com
    # Force cache cleanup (needs root, so run the script as root)
    sync; echo 3 > /proc/sys/vm/drop_caches
fi
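
Save the script somewhere like /usr/local/bin/memcheck.sh (the path is only an example), make it executable, and schedule it, e.g. every five minutes via /etc/cron.d:

# /etc/cron.d/memcheck
*/5 * * * * root /usr/local/bin/memcheck.sh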

For persistent issues, enable block-dump tracing to see which processes are generating I/O and dirtying pages (vm.block_dump exists on older kernels like this one but has been removed from recent ones):

echo 1 | sudo tee /proc/sys/vm/block_dump
sudo dmesg -wH | grep -iE 'read block|write block|dirtied inode|oom|out of memory'

This shows which processes are generating the block I/O and page-writeback activity in the run-up to OOM events.


When analyzing your OOM-killer syslog entries, the critical section begins with:


Oct 25 07:28:04 nldedip4k031 kernel: [87946.529514] irqbalance invoked oom-killer: gfp_mask=0x80d0, order=0, oom_adj=0, oom_score_adj=0

This reveals that the irqbalance process (PID 948) triggered the OOM killer while attempting an order-0 kernel allocation (gfp_mask=0x80d0, i.e. GFP_KERNEL | __GFP_ZERO, which cannot be satisfied from HighMem). The subsequent memory statistics show alarming patterns:

The detailed memory report shows:


active_anon:5523 inactive_anon:354
active_file:2815 inactive_file:6849119
free:1304125
slab_reclaimable:104672 slab_unreclaimable:3419

These counters are reported in 4kB pages (a quick conversion sketch follows this list). Despite roughly 5GB of free memory (free:1304125 pages), the system triggered OOM because:

  • 6.8 million page-cache pages (roughly 26GB of file cache) were consuming memory
  • Slab usage shows 104672 reclaimable pages (~409MB) but only 3419 unreclaimable pages (~13MB)
  • No swap usage despite available swap space
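
Since the OOM report lists these counters in pages, a quick conversion (assuming 4kB pages) puts them into familiar units:

# Convert OOM-report page counts to MB (4 kB pages)
for pages in 6849119 1304125 104672; do
    echo "$pages pages = $(( pages * 4 / 1024 )) MB"
done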

The key metrics revealing memory pressure:


Normal free:44052kB min:44216kB

The "Normal" zone (where kernel allocations occur) had only 44MB free when it needed at least 44.2MB. This explains why the OOM killer activated despite overall free memory.

Create this bash script to monitor memory pressure:


#!/bin/bash
# Live memory-pressure dashboard; run it with sudo so the slabinfo read below works
watch -n 1 '
echo -e "\\nMEMORY PRESSURE:";
grep -E "MemFree|MemAvailable|Active|Inactive|Slab|SReclaimable|SUnreclaim" /proc/meminfo;
echo -e "\\nTOP MEMORY USERS:";
ps -eo pid,user,comm,%mem --sort=-%mem | head -n 10;
echo -e "\\nSLAB INFO:";
sudo cat /proc/slabinfo | awk '\''{if($3*$4/1024 > 10) print $1,$3*$4/1024 "KB"}'\'' | sort -rnk2 | head
'
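
Save it under any name you like (memwatch.sh below is just an example), make it executable, and run it under sudo:

chmod +x memwatch.sh
sudo ./memwatch.sh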

Add these kernel parameters to /etc/sysctl.conf:


vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.swappiness = 10
vm.vfs_cache_pressure = 500
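
With vm.overcommit_memory=2 the kernel enforces a hard commit limit of swap plus overcommit_ratio percent of RAM, so it is worth watching how close the system runs to that limit:

# CommitLimit = swap + RAM * vm.overcommit_ratio / 100; Committed_AS is what is currently promised
grep -E 'CommitLimit|Committed_AS' /proc/meminfo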

For Apache tuning (Apache is visible in your logs), these prefork MPM limits keep the worker pool's memory footprint bounded:

<IfModule mpm_prefork_module>
    StartServers            2
    MinSpareServers         2
    MaxSpareServers         5
    MaxRequestWorkers       50
    MaxConnectionsPerChild  10000
</IfModule>
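
On Ubuntu these directives normally live in /etc/apache2/mods-available/mpm_prefork.conf; check which MPM is active, validate the config, and reload:

a2query -M                      # shows the active MPM (should report prefork)
sudo apachectl configtest
sudo systemctl reload apache2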

Install and enable systemd-oomd (packaged on Ubuntu 22.04 and later) for proactive, pressure-based killing:


sudo apt install systemd-oomd
sudo systemctl enable --now systemd-oomd

Create custom OOM rules:


# /etc/systemd/oomd.conf
[OOM]
DefaultMemoryPressureLimit=60%
DefaultMemoryPressureDurationSec=20s
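
systemd-oomd only acts on cgroups that opt in, so after editing the config restart the daemon, mark a slice for pressure-based kills (user.slice below is just an example target), and inspect what it is watching with oomctl:

sudo systemctl restart systemd-oomd

# Opt a cgroup in to pressure-based killing
sudo systemctl set-property user.slice ManagedOOMMemoryPressure=kill
sudo systemctl set-property user.slice ManagedOOMMemoryPressureLimit=60%

# Show monitored cgroups and current pressure
oomctl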