Debugging Linux OOM Killer Triggers in 32-bit Kernel with High Memory Systems



The kernel log shows a classic Out-of-Memory (OOM) situation where the squid process (PID 15533) triggered the OOM killer. What's particularly interesting is that this occurs on a 32-bit kernel system with 36GB RAM - a configuration that's prone to memory zone exhaustion issues despite having ample physical memory.

Dec 27 09:19:05 2013 kernel: : [277622.359064] squid invoked oom-killer: gfp_mask=0x42d0, order=3, oom_score_adj=0
Dec 27 09:19:05 2013 kernel: : [277622.359069] squid cpuset=/ mems_allowed=0

The key issue stems from how 32-bit Linux kernels handle memory zones. The system divides memory into three zones:

  • DMA: First 16MB (mostly empty in our case)
  • Normal: Up to 896MB (heavily used)
  • HighMem: Everything above 896MB

The critical detail in the logs:

Dec 27 09:19:05 2013 kernel: : [277622.359382] DMA free:2332kB min:36kB low:44kB high:52kB
Dec 27 09:19:05 2013 kernel: : [277622.359384] lowmem_reserve[]: 0 573 36539 36539
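That `lowmem_reserve[]` array works as a per-zone watermark: a zone refuses to serve an allocation class once its free pages would fall below `min + lowmem_reserve[class]`. A simplified sketch of that check (a rough approximation of the kernel's `__zone_watermark_ok`, using the numbers from the log above):

```python
PAGE_KB = 4  # i686 page size

def can_allocate(free_kb, min_kb, reserve_pages):
    """Simplified zone watermark check: the zone serves a request of a
    given allocation class only while free pages stay above
    min + lowmem_reserve[class]. (The real check also subtracts the
    request size and per-order reserves.)"""
    free_pages = free_kb // PAGE_KB
    min_pages = min_kb // PAGE_KB
    return free_pages > min_pages + reserve_pages

# Log values: DMA free:2332kB min:36kB, lowmem_reserve[]: 0 573 36539 36539
print(can_allocate(2332, 36, 36539))  # HighMem-class request -> False
print(can_allocate(2332, 36, 573))    # Normal-class request  -> True
```

So HighMem-class allocations are already locked out of the DMA zone, while Normal-class ones barely still fit - exactly the asymmetry the reserve exists to enforce.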

Let's examine the memory statistics at crash time using a diagnostic script:

#!/bin/bash
# Memory zone analyzer for 32-bit systems (run as root for slabinfo)

echo "===== Zone Info ====="
grep -A 50 "Mem-Info:" /var/log/kern.log | grep -E "DMA|Normal|HighMem"

echo -e "\n===== Memory Statistics ====="
free -m
echo -e "\nBuffer/Cache:"
grep -E "Buffers|Cached" /proc/meminfo

echo -e "\n===== Slab Info ====="
# skip the two header lines, then sort by active object count (column 2)
tail -n +3 /proc/slabinfo | sort -rn -k2 | head -20

The system has plenty of free memory overall (free:6911872 - the counter is in pages, so roughly 26GB), but the issue lies in the Normal zone (DMA and Normal combined form LowMem). Kernel allocations for network operations (like TCP buffers) must come from LowMem, which becomes exhausted despite HighMem availability.
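There is a second constraint hiding in the trace: order=3 means the request needs eight physically contiguous, aligned page frames, not just eight free pages. A toy buddy-style sketch (illustrative only, not kernel code) of why free-page totals can mislead:

```python
def count_free_blocks(free_bitmap, order):
    """Count aligned runs of 2**order free page frames, buddy-style.
    free_bitmap[i] is True if page frame i is free."""
    size = 1 << order
    blocks = 0
    for start in range(0, len(free_bitmap) - size + 1, size):
        if all(free_bitmap[start:start + size]):
            blocks += 1
    return blocks

# 16 page frames, half of them free, but perfectly interleaved:
fragmented = [i % 2 == 0 for i in range(16)]
print(sum(fragmented))                   # 8 pages free in total...
print(count_free_blocks(fragmented, 3))  # ...but 0 usable order-3 blocks
```

Fragmented LowMem can therefore fail an order-3 request while reporting a healthy free-page count.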

Key indicators from the log:

Dec 27 09:19:05 2013 kernel: : [277622.359371] active_anon:658515 inactive_anon:54399
Dec 27 09:19:05 2013 kernel: : [277622.359371] active_file:1172176 inactive_file:323606
Dec 27 09:19:05 2013 kernel: : [277622.359371] free:6911872

Immediate mitigation:

# Reserve more LowMem against HighMem fallback allocations.
# lowmem_reserve_ratio is a divisor, so LOWER values reserve MORE;
# "256 256 32" is the usual default - halving the last entry doubles
# the reserve defended against HighMem-class requests:
echo "vm.lowmem_reserve_ratio = 256 256 16" >> /etc/sysctl.conf

# Switch to strict overcommit accounting
echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
echo "vm.overcommit_ratio = 80" >> /etc/sysctl.conf

# Apply without rebooting
sysctl -p
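With `vm.overcommit_memory = 2` the kernel enforces a hard commit limit instead of heuristic overcommit. Roughly (ignoring hugepage pools, which the real formula subtracts), the limit reported as `CommitLimit` in `/proc/meminfo` is:

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """CommitLimit = swap + ram * overcommit_ratio / 100
    (strict accounting, vm.overcommit_memory=2; hugepages ignored)."""
    return swap_kb + ram_kb * overcommit_ratio // 100

# 36GB RAM with a hypothetical 2GB swap and ratio 80:
print(commit_limit_kb(36 * 1024**2, 2 * 1024**2, 80))  # 32296140 kB, ~31GB
```

Note that strict accounting caps *committed address space*, not LowMem specifically - it reduces OOM pressure overall but does not by itself fix zone exhaustion.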

Long-term solution:

# Monitor LowMem usage (/proc/zoneinfo pads the zone name,
# e.g. "Node 0, zone   Normal", so match with a wildcard)
watch -n 1 'grep -A 6 "zone *Normal" /proc/zoneinfo'

# Consider upgrading to 64-bit kernel if possible
# If not, implement these squid optimizations:
squid -k reconfigure -f /etc/squid/squid.conf.optimized

For systems that must run 32-bit kernels with large memory, consider these kernel boot parameters:

# In /etc/default/grub (run update-grub afterwards).
# Kernel size suffixes are single letters (256M, not 256MB).
# Note: enlarging the vmalloc area shrinks LowMem by the same amount.
GRUB_CMDLINE_LINUX="numa=off vmalloc=256M"

And implement this monitoring script to watch LowMem pressure:

#!/usr/bin/python
import re
import sys

# ~39MB at 4KB pages; tune for your workload
FREE_PAGE_THRESHOLD = 10000

def check_lowmem():
    with open('/proc/zoneinfo') as f:
        data = f.read()

    # /proc/zoneinfo pads the zone name ("Node 0, zone   Normal") and
    # the free count follows on the next line as "pages free  NNN"
    match = re.search(r'Node 0, zone\s+Normal\s+pages free\s+(\d+)', data)
    if match and int(match.group(1)) < FREE_PAGE_THRESHOLD:
        print("WARNING: LowMem critically low!")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(check_lowmem())

When analyzing your kernel OOM logs, the key red flag is the memory zone information showing:

Mem-Info:
DMA free:2332kB min:36kB
Normal free:114488kB min:3044kB
HighMem per-cpu: multiple active allocations

This reveals a classic 32-bit Linux memory management issue where the kernel cannot properly utilize all 36GB physical RAM due to address space limitations. The 3:1 (userspace:kernel) split creates only ~896MB of "Normal" zone memory (visible in present:894968kB).
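The 896MB figure falls straight out of the i386 address-space layout. A back-of-the-envelope check (128MB is the default vmalloc/fixmap reserve carved out of the kernel's window):

```python
# Default i386 3G/1G split: user space gets 0x00000000-0xBFFFFFFF,
# the kernel gets 0xC0000000-0xFFFFFFFF
KERNEL_WINDOW_MB = 1024    # the kernel's 1GB of virtual address space
VMALLOC_RESERVE_MB = 128   # default vmalloc/fixmap/ioremap reserve

lowmem_mb = KERNEL_WINDOW_MB - VMALLOC_RESERVE_MB
print(lowmem_mb)  # 896 - the Normal-zone ceiling; present:894968kB in
                  # the log is this figure minus reserved pages and holes
```

Everything the kernel must address directly - slabs, page tables, network buffers - competes for that fixed 896MB, no matter how much HighMem exists.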

The critical log line:

squid invoked oom-killer: gfp_mask=0x42d0, order=3, oom_score_adj=0

This shows the OOM killer fired on an order-3 request: 2^3 = 8 contiguous pages, i.e. 32KB. Decoding gfp_mask=0x42d0 gives GFP_KERNEL plus __GFP_COMP and __GFP_NOWARN; __GFP_HIGHMEM is not set, so the allocation could only be satisfied from LowMem. It failed despite roughly 26GB free overall - proof of zone exhaustion.
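The zone constraint can be made explicit by decoding the mask bit by bit. The flag values below are the 3.x-era ones from `include/linux/gfp.h` (later kernels renumbered some bits), limited to the flags relevant here:

```python
# GFP flag bits as defined in 3.x-era include/linux/gfp.h
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x200: "__GFP_NOWARN",
    0x4000: "__GFP_COMP",
}

def decode_gfp(mask):
    """Return the names of the known flag bits set in a gfp_mask."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]

print(decode_gfp(0x42d0))
# ['__GFP_WAIT', '__GFP_IO', '__GFP_FS', '__GFP_NOWARN', '__GFP_COMP']
# WAIT|IO|FS is GFP_KERNEL; no zone-modifier bit is set, so the
# request was restricted to LowMem.
```

Absent `__GFP_HIGHMEM` or `__GFP_DMA`, the allocator never even considers the 35GB sitting in HighMem.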

Before crashes occur, monitor memory zones with:

# Real-time zone monitoring
watch -n 1 'cat /proc/zoneinfo | grep -E "Node|zone|free|min|low|high"' 

# Check process memory mapping
pmap -x $(pgrep squid)

# Log the full task list (per-process memory usage) when OOM fires
echo 1 > /proc/sys/vm/oom_dump_tasks

For immediate relief without 64-bit migration:

# Reserve more Normal-zone pages against HighMem fallback; the sysctl
# takes one value per zone, and lower values reserve MORE
echo "256 256 16" > /proc/sys/vm/lowmem_reserve_ratio

# zone_reclaim_mode mainly matters on NUMA systems; on a single-node
# box like this one (mems_allowed=0) it has little effect
echo 1 > /proc/sys/vm/zone_reclaim_mode

# Example sysctl.conf persistence
cat <<EOF >> /etc/sysctl.conf
vm.lowmem_reserve_ratio = 256 256 16
vm.zone_reclaim_mode = 1
EOF

Modify squid.conf to reduce memory fragmentation:

# Limit maximum object size
maximum_object_size 4 MB

# Enable memory pools
memory_pools on
memory_pools_limit 768 MB

# Cap the disk cache: 8000MB with 16 first-level and 256 second-level
# directories (smaller caches mean smaller in-LowMem index structures)
cache_dir aufs /var/spool/squid 8000 16 256

Note that seeing 36GB at all on a 32-bit kernel means PAE is already enabled; PAE extends physical addressing but not the kernel's virtual address space, so the LowMem ceiling remains. For systems with >4GB RAM the real fix is switching to a 64-bit kernel. Verify CPU compatibility first:

grep -q lm /proc/cpuinfo && echo "64-bit capable" || echo "32-bit only"

Example GRUB configuration for PAE kernel:

menuentry 'Debian GNU/Linux (3.10.24-pae)' {
    linux /boot/vmlinuz-3.10.24-pae root=UUID=... ro mem=36G
}