Debugging OOM-Killed Processes: Core Dumps and ABRT Integration Techniques


When the Linux out-of-memory (OOM) killer terminates a process, traditional debugging becomes challenging because:

  • The process is killed abruptly, with no chance to shut down cleanly
  • No core dump is written, because the OOM killer sends SIGKILL, which never produces one
  • The process's memory state is lost unless it was captured before the kill

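These settings do not make the OOM killer itself write a core dump. They ensure that a core is written when the process aborts for another reason (for example, a SIGABRT sent before the kill) and that the kernel logs per-task memory usage at OOM time: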

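# Set core pattern to store in /var/crash (the directory must exist and be writable)
echo "/var/crash/core.%e.%p.%h.%t" | sudo tee /proc/sys/kernel/core_pattern

# Remove the core size limit (applies to the current shell and its children only)
ulimit -c unlimited

# Log per-task memory usage to the kernel log when the OOM killer runs
# (diagnostic output only; this does not produce a core dump)
echo 1 | sudo tee /proc/sys/vm/oom_dump_tasks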
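
Before relying on these settings, it can help to check how the kernel will interpret the active core pattern. The following is a minimal sketch, not a standard tool: `describe_core_pattern` is a hypothetical helper name. It only classifies a pattern string as piped to a program, written to an absolute path, or written relative to the crashing process's working directory.

```shell
# describe_core_pattern (hypothetical helper): classify a core_pattern string.
describe_core_pattern() {
    case $1 in
        '|'*) echo "pipe" ;;      # core image is piped to a user-space program
        /*)   echo "absolute" ;;  # core file is written under a fixed directory
        *)    echo "relative" ;;  # core file lands in the process's working directory
    esac
}
```

A typical call would be `describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"`. A "relative" result means cores may end up in directories the process cannot write to, in which case no core file appears.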

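ABRT (Automated Bug Reporting Tool) can collect core dumps and kernel oops reports. It does not intercept OOM kills themselves, and its C/C++ hook typically installs its own core_pattern, which replaces one set by hand:

# Install ABRT components
sudo yum install abrt abrt-addon-ccpp abrt-addon-kerneloops

# Log per-task memory usage to the kernel log on OOM
sudo sysctl -w vm.oom_dump_tasks=1
sudo service abrtd restart

# List the problems ABRT has recorded
abrt-cli list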

Once you have the core dump, analyze it with:

# Basic analysis with gdb
gdb /path/to/executable /path/to/core.dump

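# Check memory mappings and allocator state
(gdb) info proc mappings
(gdb) p main_arena   # glibc malloc arena; assumes glibc debug symbols are installed

# For containerized environments, lift the core size limit when starting the container
docker run --ulimit core=-1 -d your_application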

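Implement these safeguards in your code. With the default overcommit policy, allocations rarely fail before the OOM killer acts, so the handler below mainly helps when overcommit is restricted (vm.overcommit_memory=2) or the process runs under an address-space limit: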

// C++ example: Set new handler
#include <cstdlib>   // std::abort
#include <iostream>  // std::cerr
#include <new>       // std::set_new_handler

void custom_new_handler() {
    std::cerr << "Memory allocation failed!" << std::endl;
    // Log state or trigger diagnostic
    std::abort(); // Forces core dump
}

int main() {
    std::set_new_handler(custom_new_handler);
    // Your application code
}

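Adjust OOM killer behavior through procfs and sysctl:

# Make a specific process less likely to be killed
# (range: -1000 to 1000; -1000 exempts the process entirely)
echo -15 > /proc/[PID]/oom_score_adj

# Panic the kernel on OOM instead of killing a task
# (useful together with kdump; this takes the whole machine down)
sysctl -w vm.panic_on_oom=1

# Strict overcommit accounting: allocations fail with ENOMEM instead of overcommitting
sysctl -w vm.overcommit_memory=2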
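
To see the effect of an oom_score_adj change, the kernel's current score for a process can be read back from procfs. The sketch below assumes the standard `oom_score` and `oom_score_adj` files; `oom_info` is a hypothetical helper name, and its optional second argument (an alternative procfs root) exists only to make the function easy to try against test data.

```shell
# oom_info (hypothetical helper): print "PID oom_score oom_score_adj".
# The optional second argument overrides the procfs root (default: /proc).
oom_info() {
    _pid=$1
    _root=${2:-/proc}
    # Fail quietly if the process does not exist or is not readable.
    [ -r "$_root/$_pid/oom_score" ] || return 1
    echo "$_pid $(cat "$_root/$_pid/oom_score") $(cat "$_root/$_pid/oom_score_adj")"
}
```

For example, `oom_info "$$"` reports the values for the current shell. A higher oom_score means the process is a more likely victim.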

When the Linux kernel's Out-of-Memory (OOM) killer terminates a process, it sends SIGKILL (signal 9), which cannot be caught, blocked, or handled by the process and never produces a core dump. This makes debugging memory-related issues particularly challenging. The OOM killer selects its victim by oom_score, which is derived mainly from the process's memory usage and shifted by its oom_score_adj value.
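
Because no core dump exists after such a kill, the kernel log is usually the first place to confirm that the OOM killer was responsible. The following is a small sketch that assumes the log contains lines of the form `Killed process <pid> (<name>)`, which is how current kernels report a kill; `oom_victims` is a hypothetical helper name.

```shell
# oom_victims (hypothetical helper): read kernel log text on stdin and print
# "PID NAME" for every process the OOM killer reports having killed.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}
```

It can be fed from `dmesg | oom_victims` or `journalctl -k | oom_victims`. Reading the kernel log may require root on systems that restrict dmesg.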

To capture a core dump before the OOM killer strikes, first configure core dump generation (run as root):

# Enable core dumps
echo "1" > /proc/sys/kernel/core_uses_pid
echo "/tmp/core-%e-%p-%t" > /proc/sys/kernel/core_pattern
ulimit -c unlimited

# No setting makes the OOM killer send SIGABRT instead of SIGKILL.
# These two only control logging and victim selection:
echo "1" > /proc/sys/vm/oom_dump_tasks          # log per-task memory usage on OOM
echo "1" > /proc/sys/vm/oom_kill_allocating_task  # kill the task that triggered the OOM

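For critical applications, consider implementing an early warning system that triggers a core dump when memory pressure is high but before the OOM killer activates. Note that SIGABRT terminates the application:

#!/bin/bash
# Monitor memory usage and abort the application to obtain a core dump
THRESHOLD=90  # percent of total memory in use
while true; do
    # Integer percentage of the "used" column reported by free
    MEM_USED=$(free | awk '/^Mem:/ {printf "%d", $3 / $2 * 100}')
    if (( MEM_USED > THRESHOLD )); then
        # pgrep -f matches the full command line and may return several PIDs
        PIDS=$(pgrep -f "your_application")
        if [[ -n $PIDS ]]; then
            kill -ABRT $PIDS  # a core is written only if core dumps are enabled
        fi
        break
    fi
    sleep 5
done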
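
The "used" column of `free` is only one way to measure pressure. As an alternative sketch, the percentage can be computed from `MemTotal` and `MemAvailable` in /proc/meminfo, where `MemAvailable` (kernel 3.14 and later) is the kernel's own estimate of memory that can be allocated without swapping. `mem_used_pct` is a hypothetical helper name, and its optional file argument is an assumption added so the function can be tried against sample data.

```shell
# mem_used_pct (hypothetical helper): integer percentage of memory that is
# NOT available, computed from MemTotal and MemAvailable.
# The optional argument overrides the input file (default: /proc/meminfo).
mem_used_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' \
        "${1:-/proc/meminfo}"
}
```

In the monitoring loop, `MEM_USED=$(mem_used_pct)` could stand in for the `free` pipeline. If terminating the application is not acceptable, `gcore` (shipped with gdb) can write a core file of a running process without killing it.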

Once you have the core dump, use gdb to analyze it:

gdb /path/to/your/executable /tmp/core-your_app-12345-1620000000
(gdb) bt full  # Show full backtrace
(gdb) info registers  # Examine register values
(gdb) x/100x $sp  # Examine stack contents

For production systems, consider these additional settings in /etc/sysctl.conf:

vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
kernel.core_pattern = |/usr/local/bin/core_helper %e %p %t

The core_helper script can filter which processes get core dumps and where they're stored:

#!/bin/bash
# core_helper script
APP_NAME=$1
PID=$2
TIMESTAMP=$3

if [[ "$APP_NAME" == "critical_app" ]]; then
    # The kernel streams the core image to this script on stdin
    /usr/bin/gzip -c > "/mnt/core_dumps/core-$APP_NAME-$PID-$TIMESTAMP.gz"
fi

If your process runs under systemd, these settings can help in OOM situations:

[Service]
MemoryAccounting=yes
MemoryHigh=90%
MemoryMax=95%
# One of: continue, stop, kill
OOMPolicy=stop
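
With memory accounting enabled on a cgroup v2 system, the kernel counts OOM kills per cgroup in a `memory.events` file, which gives a quick way to check whether a service has been hit. The sketch below assumes that file format (one `key value` pair per line, including an `oom_kill` key); `oom_kill_count` is a hypothetical helper name.

```shell
# oom_kill_count (hypothetical helper): print the oom_kill counter from the
# cgroup v2 memory.events file given as the first argument.
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2 }' "$1"
}
```

For a system service, the file usually sits at a path such as `/sys/fs/cgroup/system.slice/your_application.service/memory.events`. The unit name here is a placeholder, and the exact path depends on the slice the unit runs in.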