Linux Kernel Memory Management Issue: Disk Cache Not Releasing Under Memory Pressure Despite 3GB Free Memory



I've been wrestling with a peculiar memory management issue on our Ubuntu servers running 2.6.31-302 x86-64 kernel. The system consistently maintains about 1.4GB of disk cache that refuses to be released even when applications desperately need more memory, leading to OOM killer activation.

Here's what free -m typically shows:

# free -m
             total       used       free     shared    buffers     cached
Mem:          7186       5615       1571          0          7       1409
-/+ buffers/cache:       4198       2988
Swap:            0          0          0

The key symptoms:

  • The "cached" memory remains stubbornly allocated
  • OOM killer triggers despite buffers/cache showing ~3GB free
  • drop_caches only reclaims minimal memory
  • The problem worsens over time (observed up to 2GB "stuck" on other servers)

The memory allocation details reveal more:

# cat /proc/meminfo
Active:          5524644 kB
Active(anon):    5492108 kB
Active(file):      32536 kB
Inactive(file):    41380 kB

This shows nearly all active memory is anonymous (application) memory, with very little file cache that could be easily reclaimed.
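
As a quick sanity check, the gap between what free reports as "cached" and what is actually easy to reclaim can be estimated straight from /proc/meminfo. This is only a rough sketch (clean file cache plus reclaimable slab as a lower bound), not an exact accounting:

awk '
    /^Cached:/           { cached = $2 }
    /^Active\(file\):/   { af = $2 }
    /^Inactive\(file\):/ { inact = $2 }
    /^SReclaimable:/     { slab = $2 }
    END {
        printf "Reported as cached:      %8d kB\n", cached
        printf "File cache (act+inact):  %8d kB\n", af + inact
        printf "Reclaimable slab:        %8d kB\n", slab
        printf "Roughly reclaimable:     %8d kB\n", af + inact + slab
    }' /proc/meminfo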

1. Swap Configuration

While some sysadmins prefer to run without swap, it genuinely helps here: without swap the kernel cannot reclaim anonymous pages at all, so the only thing it can evict under pressure is the already-small file cache:

# Recommended minimum swap (note: swapon(8) suggests creating the file with
# dd rather than fallocate, since preallocated files are rejected on some
# filesystems: dd if=/dev/zero of=/swapfile bs=1M count=4096)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
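
If the swap file is to survive a reboot it also needs an fstab entry; the path simply mirrors the /swapfile created above:

# Persist the swap file and confirm it is active
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
swapon -s    # the Swap line in free -m should now be non-zero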

2. VM Tunables

Try adjusting these kernel parameters:

# Increase cache pressure (the default is 100, so go above it to make the
# kernel reclaim dentry/inode caches more aggressively)
echo 150 > /proc/sys/vm/vfs_cache_pressure
# swappiness only matters once swap actually exists (default is 60)
echo 80 > /proc/sys/vm/swappiness

# Overcommit accounting (not a reclaim setting): overcommit_ratio is only
# honoured with overcommit_memory=2 (strict accounting). On a swapless box
# the commit limit becomes ratio% of RAM, so 50 may be too tight for this workload.
echo 2 > /proc/sys/vm/overcommit_memory
echo 50 > /proc/sys/vm/overcommit_ratio
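
Writes to /proc only last until reboot. A sketch of persisting the same values via sysctl.conf (the numbers are the examples above, not recommendations carved in stone):

# Persist the tunables and reload them immediately
cat <<'EOF' | sudo tee -a /etc/sysctl.conf
vm.vfs_cache_pressure = 150
vm.swappiness = 80
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
EOF
sudo sysctl -p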

3. Tracking Down the Stuck Cache

Use slabtop to identify kernel objects consuming memory:

# slabtop refreshes on its own; -s c sorts caches by size
slabtop -sc
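
If slabtop isn't installed, a rough per-cache size summary can be pulled from /proc/slabinfo directly (this assumes the usual slabinfo 2.x column layout; on some kernels the file is root-only):

# name, total objects, object size -> approximate MB per cache, largest first
awk 'NR > 2 { printf "%-30s %8.1f MB\n", $1, $3 * $4 / 1048576 }' /proc/slabinfo \
    | sort -k2 -rn | head -n 15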

For critical applications, consider memory cgroups:

# Create memory cgroup
cgcreate -g memory:/myapp
echo 4G > /sys/fs/cgroup/memory/myapp/memory.limit_in_bytes
echo 3G > /sys/fs/cgroup/memory/myapp/memory.soft_limit_in_bytes
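
Once the cgroup exists, its own accounting shows how much of the group's footprint is page cache versus anonymous memory, which helps judge whether the limit is realistic (paths assume the v1 memory controller mounted at /sys/fs/cgroup/memory, as above):

# cache = page cache charged to the group, rss = its anonymous memory
grep -E '^(cache|rss) ' /sys/fs/cgroup/memory/myapp/memory.stat
cat /sys/fs/cgroup/memory/myapp/memory.usage_in_bytes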

The 2.6.31 kernel is quite old. Modern kernels (4.x+) have significantly improved memory management. Consider testing with a newer LTS kernel.

Here's a script to monitor the situation:

#!/bin/bash
while true; do
    echo "===== $(date) ====="
    free -m
    echo "Active file cache: $(grep Active.file /proc/meminfo)"
    echo "Inactive file cache: $(grep Inactive.file /proc/meminfo)"
    echo "Slab usage: $(awk '/SReclaimable/ {print $2}' /proc/meminfo) kB reclaimable"
    sleep 30
done

I've been troubleshooting a critical memory management issue on Ubuntu (kernel 2.6.31-302 x86_64) where the system fails to properly release disk cache when memory pressure increases. Despite the -/+ buffers/cache line showing ~3GB free (about 1.4GB of it page cache), the OOM killer activates when physical memory is exhausted.

# Sample memory output showing the discrepancy
$ free -m
             total       used       free     shared    buffers     cached
Mem:          7186       5614       1572          0          7       1410
-/+ buffers/cache:       4196       2990
Swap:            0          0          0

The most puzzling aspects of this behavior:

  • Manual cache dropping (echo 3 > /proc/sys/vm/drop_caches) only reclaims minimal memory (see the note after this list)
  • The "stuck" cache grows over time on multiple servers
  • Complete absence of swap space (controversial sysadmin decision)
  • OOM killer triggers despite apparent available memory
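
A note on the drop_caches point above: it can only discard clean, unmapped page cache and reclaimable slab. Dirty pages, anonymous memory, and tmpfs/shared-memory pages (which are counted under "cached") are untouched, which is consistent with it recovering so little here. A minimal before/after check:

# Flush dirty pages first, then drop what is droppable and compare
grep -E '^(Cached|Dirty|Active\(file\)|Inactive\(file\)|SReclaimable)' /proc/meminfo
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
grep -E '^(Cached|Dirty|Active\(file\)|Inactive\(file\)|SReclaimable)' /proc/meminfo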

From examining /proc/meminfo, several interesting metrics stand out:

Active:          5524644 kB
Active(anon):    5492108 kB
Active(file):      32536 kB
Inactive(file):    41380 kB

The extremely high Active(anon) value suggests most memory is tied to anonymous mappings (process memory), while reclaimable file cache (Active(file) + Inactive(file), roughly 72 MB here) is a fraction of the ~1.4 GB that free reports as cached. That gap implies most of the "cached" figure is not ordinary page cache the kernel can simply drop, which would explain why reclaim and drop_caches accomplish so little.
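
To see which processes are actually holding that anonymous memory, summing Private_Dirty from each process's smaps is a workable proxy on a kernel this old (the dedicated Anonymous: field only appeared in later kernels); reading every process's smaps needs root:

# Per-process private dirty memory, largest first (rough anon/heap proxy)
for d in /proc/[0-9]*; do
    kb=$(awk '/^Private_Dirty:/ { s += $2 } END { print s + 0 }' "$d/smaps" 2>/dev/null)
    [ "${kb:-0}" -gt 0 ] && printf '%10d kB  %s\n' "$kb" "$(tr '\000' ' ' < "$d/cmdline" | cut -c1-60)"
done | sort -rn | head -n 10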

1. Adjusting VM Swappiness

swappiness controls the balance between swapping out anonymous pages and dropping file cache, so with no swap it has little practical effect; it becomes relevant as soon as swap is added:

# Check current value
$ cat /proc/sys/vm/swappiness

# Temporary adjustment
$ sudo sysctl vm.swappiness=10

# Permanent change (reload immediately afterwards with sysctl -p)
$ echo "vm.swappiness = 10" | sudo tee -a /etc/sysctl.conf
$ sudo sysctl -p

2. Modifying Cache Pressure

Increase the tendency to reclaim file cache:

# Default is 100, try increasing
$ sudo sysctl vm.vfs_cache_pressure=150

3. Implementing Cgroup Memory Limits

For critical applications, cgroups can cap how much memory a given workload is allowed to consume, so a runaway group can't starve the rest of the system:

# Create cgroup
$ sudo cgcreate -g memory:myapp

# Set memory limit
$ echo "4G" | sudo tee /sys/fs/cgroup/memory/myapp/memory.limit_in_bytes

# Launch application in cgroup
$ cgexec -g memory:myapp /path/to/application
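
After launching, it is worth confirming the limit took effect and watching whether it is actually being hit (again assuming the v1 layout under /sys/fs/cgroup/memory):

# failcnt increments each time an allocation runs into the limit
cat /sys/fs/cgroup/memory/myapp/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/myapp/memory.max_usage_in_bytes
cat /sys/fs/cgroup/memory/myapp/memory.failcnt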

These parameters in /proc/sys/vm/ might need adjustment:

  • extra_free_kbytes - a distribution/Android patch (not present in all kernels) that widens the gap between the min and low watermarks so background reclaim starts sooner
  • min_free_kbytes - the reserve the kernel keeps free for critical/atomic allocations; raising it also makes kswapd start reclaiming earlier (see the sketch after this list)
  • page-cluster - controls swap readahead (pages read from swap per fault), so it only matters once swap exists
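
For min_free_kbytes specifically, a cautious sketch (the 64 MB figure is purely illustrative; setting it too high wastes memory, too low risks atomic allocation failures):

# Inspect, then raise the reserve so kswapd starts reclaiming earlier
cat /proc/sys/vm/min_free_kbytes
sudo sysctl vm.min_free_kbytes=65536
echo 'vm.min_free_kbytes = 65536' | sudo tee -a /etc/sysctl.conf    # persist across reboots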

Create a monitoring script to track memory patterns:

#!/bin/bash
while true; do
    echo "===== $(date) ====="
    grep -E 'MemFree|Cached|Active|Inactive' /proc/meminfo
    ps -eo pid,comm,%mem --sort=-%mem | head -n 5
    sleep 30
done >> /var/log/memory_debug.log

While Linux memory management is generally excellent, edge cases like this demonstrate the importance of understanding low-level kernel behavior. The combination of no swap space and specific application memory patterns appears to create this pathological case where the kernel's normal cache reclamation heuristics fail.

For production systems, I'd recommend either implementing proper swap space (even just 1GB) or moving to a newer kernel version where memory management has seen significant improvements.