Looking at the memory statistics right before the OOM event:
Mem: 15339640k total, 15268304k used, 71336k free, 3152k buffers
Swap: 0k total, 0k used, 0k free, 6608384k cached
While top shows 6.6GB cached, the key detail appears in the OOM killer's memory dump:
Node 0 Normal free:11428kB min:11548kB low:14432kB high:17320kB
active_anon:10689464kB inactive_anon:19164kB active_file:528kB
mapped:4999580kB shmem:4997080kB
Despite the large cached figure, the system shows classic symptoms of memory pressure and fragmentation in the Normal zone (see the grep sketch below for pulling these lines out of the logs):
- High anonymous memory usage (active_anon ≈ 10.7GB)
- Free memory in the Normal zone below its minimum watermark (11428kB free vs. the 11548kB min)
- Heavy shared memory usage (shmem ≈ 5GB), which top counts as "cached" but which cannot be reclaimed without swap
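If this happens again, the same lines can be pulled straight from the kernel ring buffer or syslog; a rough sketch (log location varies by distro):
dmesg | grep -E 'invoked oom-killer|Normal free:'
grep -E 'invoked oom-killer|Normal free:' /var/log/syslog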
The current configuration may be problematic:
shared_buffers = 6GB
effective_cache_size = 8GB
Recommended adjustments for Linux systems:
# Calculate based on available RAM (example for 16GB system)
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
work_mem = 64MB # Per-operation memory
maintenance_work_mem = 1GB # For VACUUM, CREATE INDEX
random_page_cost = 1.1 # For SSD storage
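If you prefer applying these with ALTER SYSTEM rather than editing postgresql.conf by hand, a sketch of applying and verifying (assumes PostgreSQL 9.4+ and a systemd service named postgresql; adjust names to your setup):
psql -U postgres -c "ALTER SYSTEM SET shared_buffers = '4GB';"
psql -U postgres -c "ALTER SYSTEM SET effective_cache_size = '12GB';"
psql -U postgres -c "ALTER SYSTEM SET work_mem = '64MB';"
sudo systemctl restart postgresql   # service name may differ; shared_buffers only changes after a restart
psql -U postgres -c "SHOW shared_buffers;"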
Add these to /etc/sysctl.conf:
# Reclaim dentry/inode caches less aggressively
vm.vfs_cache_pressure = 50
# Swap reluctantly (only relevant once swap is actually configured; this host has none)
vm.swappiness = 10
# Strict overcommit accounting: allocations fail instead of the OOM killer firing later
vm.overcommit_memory = 2
vm.overcommit_ratio = 95
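To apply these without a reboot and confirm they took effect:
sudo sysctl -p
sysctl vm.vfs_cache_pressure vm.swappiness vm.overcommit_memory vm.overcommit_ratio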
Instead of top, use this command for real-time monitoring:
watch -n1 "grep -E '^(MemFree|Cached|Active|Inactive|Swap)' /proc/meminfo"
For PostgreSQL-specific monitoring:
SELECT pid,
       state,
       now() - query_start AS duration,
       query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
-- (pg_stat_activity has no relid column, so relation sizes must be looked up separately via pg_total_relation_size)
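To keep this on screen continuously, the same query can be wrapped in watch (this assumes a local postgres superuser connection that does not prompt for a password):
watch -n 5 "psql -U postgres -c \"SELECT pid, now() - query_start AS duration FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC\""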
To identify memory fragmentation, look at the buddy allocator state directly (no kernel debug symbols are needed for this; /proc/pagetypeinfo is root-only on newer kernels):
# Free blocks per zone, grouped by order; a lack of entries in the
# higher-order (right-hand) columns indicates fragmentation
cat /proc/buddyinfo
sudo cat /proc/pagetypeinfo
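As a rough way to quantify it, this awk sketch sums how much Normal-zone memory is still available in order-4-or-larger blocks (it assumes the standard /proc/buddyinfo layout, where the order-0 count is field 5):
awk '/Normal/ { sum = 0; for (i = 9; i <= NF; i++) sum += $i * 2^(i-5) * 4; print "Normal zone, order>=4 free:", sum, "kB" }' /proc/buddyinfo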
For continuous monitoring, set up this cron job (crontab entries must stay on a single line, so the two commands are joined with &&):
*/5 * * * * /bin/date >> /var/log/memfrag.log && /bin/cat /proc/buddyinfo >> /var/log/memfrag.log
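To install it into root's crontab without opening an editor (this appends to whatever crontab already exists):
( sudo crontab -l 2>/dev/null; echo '*/5 * * * * /bin/date >> /var/log/memfrag.log && /bin/cat /proc/buddyinfo >> /var/log/memfrag.log' ) | sudo crontab -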
What we're seeing here is a classic case where the system reports available memory (primarily FS cache) while still triggering the OOM killer. The key metrics from your logs show:
Mem: 15339640k total, 15268304k used, 71336k free
Swap: 0k total, 0k used, 0k free, 6608384k cached
The memory breakdown from syslog (these counters are in 4kB pages) reveals critical details:
active_anon:3616567 inactive_anon:4798
active_file:98 inactive_file:168
free:16921 slab_reclaimable:17631 slab_unreclaimable:7534
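Converting the page counts makes the imbalance obvious: active_anon alone is roughly 14GB against about 1MB of file cache. For example:
echo "$(( 3616567 * 4 / 1024 )) MB of active anonymous memory"   # 3616567 pages x 4kB = 14466268 kB ≈ 14127 MB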
The main issue is the imbalance between anonymous memory (application heap) and file cache, combined with memory fragmentation.
Your PostgreSQL configuration shows:
shared_buffers = 6GB
effective_cache_size = 8GB
This setup requests large contiguous memory allocations, which can trigger the OOM killer even when the totals suggest free memory exists.
Immediate mitigation:
# Relax overcommit accounting (run as root). Note that overcommit_ratio
# is only consulted when overcommit_memory is set to 2.
echo 1 > /proc/sys/vm/overcommit_memory
echo 80 > /proc/sys/vm/overcommit_ratio
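While tuning overcommit it helps to watch the kernel's own accounting; CommitLimit and Committed_AS in /proc/meminfo show how close the system is to its commit limit:
grep -E 'CommitLimit|Committed_AS' /proc/meminfo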
Long-term fixes:
# Adjust PostgreSQL memory settings
shared_buffers = 4GB
effective_cache_size = 10GB
work_mem = 32MB
maintenance_work_mem = 256MB
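As a quick sanity check against the usual 25%-of-RAM rule of thumb for shared_buffers (just a sketch; round to taste):
awk '/MemTotal/ { printf "25%% of RAM: %.0f MB\n", $2 / 1024 / 4 }' /proc/meminfo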
Add these to /etc/sysctl.conf:
vm.min_free_kbytes = 524288     # keep ~512MB free to defend the zone watermarks
vm.vfs_cache_pressure = 100     # default reclaim pressure on dentry/inode caches
vm.zone_reclaim_mode = 0        # do not confine reclaim to the local NUMA node
vm.swappiness = 10              # swap reluctantly (only matters once swap is configured)
Create this monitoring script for real-time analysis:
#!/bin/bash
# Run as root so /proc/slabinfo is readable.
while true; do
    echo "===== $(date) ====="
    free -m
    grep -E 'MemFree|Buffers|Cached|Active|Inactive' /proc/meminfo
    echo "--- Top slab caches by active objects ---"
    tail -n +3 /proc/slabinfo | sort -rn -k2 | head -20
    sleep 5
done
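One way to keep it running unattended (the memwatch.sh path and filename are just examples):
sudo install -m 755 memwatch.sh /usr/local/bin/memwatch.sh   # memwatch.sh = whatever you named the script above
sudo sh -c 'nohup /usr/local/bin/memwatch.sh >> /var/log/memwatch.log 2>&1 &'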
The root cause appears to be memory fragmentation in the Normal zone combined with PostgreSQL's large contiguous memory requests. The solutions focus on better memory distribution and reducing fragmentation pressure.