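The kernel's Out-of-Memory (OOM) killer does not rank processes by raw resident memory alone. The value in /proc/[pid]/oom_score comes from a deliberately simple heuristic that combines a process's memory footprint with a user-adjustable bias. Expressed on a 0 to 1000 scale, it is approximately:
oom_score ≈ (rss + swap_entries + page_table_pages) × 1000 / totalpages + oom_score_adj
where all sizes are in pages and totalpages is the memory the allocation may use (RAM plus swap for a system-wide OOM). The exact scaling of the exported number has changed between kernel versions.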
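The score is built from these inputs:
- Resident set size (RSS): anonymous, file-backed, and shared-memory pages the process currently holds in RAM
- Swap entries (MM_SWAPENTS): pages of the process that have been swapped out; vm.swappiness plays no part in the score
- Page tables: memory the kernel uses to map the process's address space
- oom_score_adj: a user-adjustable bias from -1000 to +1000; -1000 exempts the process from the OOM killer entirely
Virtual address space size (total_vm) is not an input; it belonged to the heuristic used before Linux 2.6.36.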
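As a worked illustration of these inputs, the following is a minimal userspace sketch of the normalized score, (rss + swap entries + page-table pages) × 1000 / totalpages + oom_score_adj, floored at 0. The function name `oom_score_estimate` is hypothetical, all sizes are assumed to be in pages, and the kernel's exact integer rounding and version-specific rescaling are deliberately ignored.

```c
/* Hypothetical helper: approximate the OOM score on a 0-1000 scale.
 * All memory arguments are in pages; totalpages is RAM + swap.
 * This is a simplified model, not the kernel's exact arithmetic. */
long oom_score_estimate(unsigned long rss, unsigned long swap_entries,
                        unsigned long pgtable_pages, long oom_score_adj,
                        unsigned long totalpages)
{
    unsigned long footprint = rss + swap_entries + pgtable_pages;

    /* Share of total memory, in thousandths */
    long score = (long)(footprint * 1000 / totalpages);

    /* The bias is added directly on the same 0-1000 scale */
    score += oom_score_adj;

    /* Floor at 0 so heavily protected processes never go negative */
    return score > 0 ? score : 0;
}
```

For example, 200,000 resident pages, 50,000 swapped-out pages and 1,000 page-table pages on a system with 1,000,000 total pages give `oom_score_estimate(200000, 50000, 1000, 0, 1000000)` = 251; with an oom_score_adj of -300 the same process scores 0.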
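Ranking by memory footprint alone would be problematic, because it gives administrators no way to say which processes matter:
#include <stdio.h>
/* Hypothetical critical service lowering its own OOM priority at startup */
void protect_from_oom(void)
{
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (f != NULL) {
        fprintf(f, "-500\n");  /* lowering needs CAP_SYS_RESOURCE */
        fclose(f);
    }
}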
Critical processes might need protection despite high memory usage, while memory-hogging user applications should be prioritized for killing.
Check current scores for all processes:
#!/bin/bash
# Print PID, command name and OOM score for every process.
for proc in /proc/[0-9]*; do
    pid=${proc#/proc/}
    # A process may exit between listing and reading; skip it if so.
    comm=$(cat "$proc/comm" 2>/dev/null) || continue
    score=$(cat "$proc/oom_score" 2>/dev/null) || continue
    printf "PID %6d: %20s - Score: %4d\n" "$pid" "$comm" "$score"
done
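The same lookup can be done from a C program, for example in a monitoring agent. This sketch uses a hypothetical helper, `read_proc_long`, that reads one integer from a /proc/[pid] file; passing a pid of 0 or less reads the calling process through /proc/self. It tolerates the same race as the script: a process that exits mid-scan simply yields an error.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read a single integer from /proc/<pid>/<name>,
 * e.g. "oom_score" or "oom_score_adj". A pid <= 0 means the caller.
 * Returns 0 and stores the value in *out, or -1 if the process is gone,
 * the file is unreadable, or its content is not an integer. */
int read_proc_long(pid_t pid, const char *name, long *out)
{
    char path[128];
    FILE *f;
    int ok;

    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%ld/%s", (long)pid, name);
    else
        snprintf(path, sizeof path, "/proc/self/%s", name);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* process exited or no permission */

    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling `read_proc_long(pid, "oom_score", &score)` for each numeric entry in /proc reproduces the script's output.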
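Adjust scores for important processes (lowering a value requires CAP_SYS_RESOURCE, typically root):
# Protect MySQL from the OOM killer
for pid in $(pgrep -x mysqld); do echo -100 > /proc/$pid/oom_score_adj; done
# Make Chrome more likely to be killed (pgrep matches every Chrome process)
for pid in $(pgrep chrome); do echo 500 > /proc/$pid/oom_score_adj; done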
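Programs can apply the same adjustment themselves by writing to the /proc file. The helper below, `set_oom_score_adj`, is a hypothetical sketch: it validates the -1000 to +1000 range before touching the file (the kernel would reject out-of-range values with EINVAL anyway) and checks the result of `fclose`, because the buffered write only reaches the kernel when the stream is flushed.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write oom_score_adj for a process (pid <= 0 means
 * the calling process). Returns 0 on success, -1 with errno set on error.
 * Raising the value is normally allowed for your own processes; lowering
 * it below its previous minimum needs CAP_SYS_RESOURCE (EACCES otherwise). */
int set_oom_score_adj(pid_t pid, int adj)
{
    char path[64];
    FILE *f;

    if (adj < -1000 || adj > 1000) {     /* valid oom_score_adj range */
        errno = EINVAL;
        return -1;
    }

    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%ld/oom_score_adj", (long)pid);
    else
        snprintf(path, sizeof path, "/proc/self/oom_score_adj");

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    fprintf(f, "%d\n", adj);
    /* The write reaches the kernel on fclose, so its result matters */
    return fclose(f) == 0 ? 0 : -1;
}
```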
A simplified version of the calculation in mm/oom_kill.c, normalized to a 0 to 1000 scale:
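unsigned long oom_badness(struct task_struct *p, unsigned long totalpages) {
    long points;
    long adj;
    adj = (long)p->signal->oom_score_adj;
    /* Footprint in pages: resident + swapped out + page tables */
    points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
             mm_pgtables_bytes(p->mm) / PAGE_SIZE;
    /* Normalize to thousandths of totalpages, then add the bias */
    points = points * 1000 / totalpages;
    points += adj;
    return points > 0 ? points : 1;
}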
When debugging out-of-memory (OOM) situations on Linux systems, many developers notice the /proc/[pid]/oom_score value but struggle to understand how it is calculated. Unlike plain memory metrics such as RSS, the OOM score combines memory usage with an administrator-controlled bias.
The kernel calculates the OOM score by considering:
/*
 * Current kernels (2.6.36 and later) rank candidate tasks by:
 * - memory footprint: resident set size + swap entries + page tables
 * - oom_score_adj value (-1000 to +1000)
 * Process CPU time (utime + stime), process age (start_time), and the
 * memory of a task's children were part of the pre-2.6.36 heuristic and
 * are no longer considered.
 */
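/* Simplified excerpt (pre-5.9 kernels); task locking omitted */
unsigned long oom_badness(struct task_struct *p, unsigned long totalpages)
{
    long points;
    long adj;
    if (oom_unkillable_task(p))
        return 0;
    adj = (long)p->signal->oom_score_adj;
    /* oom_score_adj == -1000 exempts the task entirely */
    if (adj == OOM_SCORE_ADJ_MIN)
        return 0;
    /* Footprint in pages: resident + swapped out + page tables */
    points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
             mm_pgtables_bytes(p->mm) / PAGE_SIZE;
    /* Scale adj to pages: each unit is worth totalpages / 1000 pages */
    adj *= totalpages / 1000;
    points += adj;
    return points > 0 ? points : 1;
}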
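In the page-based form of this calculation, the footprint stays in pages and oom_score_adj is converted to pages, each unit being worth totalpages / 1000. The sketch below models that arithmetic in userspace, together with the mapping that kernels 5.9 and later apply before exposing the result in /proc/[pid]/oom_score (a 0 to 2000 range, with exempt tasks reported as 0). The function names are hypothetical, the footprint is passed in directly rather than read from a task, and a 64-bit `long` is assumed so that `badness * 1000` cannot overflow.

```c
#include <limits.h>

/* Hypothetical userspace model of the kernel's page-based badness.
 * footprint_pages = rss + swap entries + page-table pages. */
long badness_model(long footprint_pages, long oom_score_adj, long totalpages)
{
    if (oom_score_adj == -1000)
        return LONG_MIN;                 /* exempt: never selected */

    /* Each unit of oom_score_adj is worth totalpages / 1000 pages */
    return footprint_pages + oom_score_adj * (totalpages / 1000);
}

/* Model of the value shown in /proc/[pid]/oom_score on 5.9+ kernels:
 * the badness is mapped into a 0-2000 range. */
unsigned long exported_score_model(long badness, long totalpages)
{
    if (badness == LONG_MIN)
        return 0;
    return (unsigned long)((1000 + badness * 1000 / totalpages) * 2 / 3);
}
```

With totalpages = 1,000,000 and a footprint of 250,000 pages, an oom_score_adj of 0 gives an exported score of 833, -500 gives 500, and -1000 gives 0.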
Raw memory usage (RSS) alone doesn't tell the complete story. Consider these scenarios:
- A long-running database process consuming 10GB vs. a memory leak in a new process consuming 2GB
- A parent process spawning many children that collectively consume memory
- System-critical processes vs. user applications
Let's examine two processes with different characteristics:
# Process A - Memory-hungry application
PID: 1234
RSS: 800MB
OOM_SCORE_ADJ: 0
CPU Time: 5 minutes
Age: 10 minutes
# Process B - Critical system service
PID: 5678
RSS: 1.2GB
OOM_SCORE_ADJ: -500
CPU Time: 3 hours
Age: 2 days
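Despite Process B using more memory, its oom_score_adj of -500 gives it a much lower oom_score, protecting it from being killed. Its longer CPU time and age make no difference: current kernels ignore both, and they only mattered to the heuristic used before Linux 2.6.36.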
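The comparison can be checked numerically. The example does not say how much memory the machine has, so this sketch assumes a hypothetical 16 GiB of RAM plus swap, takes 1.2 GB as 1229 MiB, and uses the simplified 0 to 1000 model (memory share in thousandths plus oom_score_adj, floored at 0). Swap and page-table usage are not given in the example and are ignored; CPU time and age are ignored because current kernels do not use them.

```c
/* Simplified 0-1000 model: memory share in thousandths plus the bias.
 * Sizes are in MiB; TOTAL_MIB is an assumed 16 GiB of RAM + swap. */
#define TOTAL_MIB 16384L

long model_score(long rss_mib, long oom_score_adj)
{
    long score = rss_mib * 1000 / TOTAL_MIB + oom_score_adj;
    return score > 0 ? score : 0;      /* floor at 0 */
}
```

`model_score(800, 0)` gives 48 for Process A, while `model_score(1229, -500)` gives 0 for Process B, so Process A would be chosen first.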
Administrators can influence the scoring through:
# Protect important process
echo -100 > /proc/[pid]/oom_score_adj
# Make process preferred candidate
echo 1000 > /proc/[pid]/oom_score_adj
For better OOM debugging, consider these commands:
# List processes by OOM score (procps-ng ps names these columns oom and oomadj)
ps -eo pid,comm,rss,oom,oomadj --no-headers | sort -k4 -n
# Kernel log entries from past OOM kills
dmesg | grep -iE "oom-killer|out of memory"
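When scanning logs programmatically, the killed process can be pulled out of the kernel's report line. Log formats vary between kernel versions: recent kernels print lines such as "Out of memory: Killed process PID (COMM) ...", while older ones wrote "Kill process". The helper `parse_oom_kill_line` below is a hypothetical sketch that relies only on the "process PID (COMM)" fragment common to both.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the PID and command name from a kernel
 * OOM-kill log line. Returns 0 on success, -1 if the line does not match.
 * comm receives at most comm_len - 1 characters plus a terminator. */
int parse_oom_kill_line(const char *line, int *pid, char *comm, size_t comm_len)
{
    const char *p = strstr(line, "process ");
    char name[64];

    if (p == NULL || comm_len == 0)
        return -1;

    /* "process 1234 (name)": read the PID, then everything up to ')' */
    if (sscanf(p, "process %d (%63[^)])", pid, name) != 2)
        return -1;

    snprintf(comm, comm_len, "%s", name);   /* truncates safely */
    return 0;
}
```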