Diagnosing and Resolving Linux Kernel Page Allocation Failures in Low-Memory Xen DomU Instances



The log entry swapper: page allocation failure. order:0, mode:0x20 indicates your Linux kernel failed to allocate a single page (order:0) of memory with GFP_ATOMIC flags (mode:0x20). This typically occurs when:

  • The kernel needs memory for critical operations but can't wait for reclaim
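  • The free-page reserve (vm.min_free_kbytes) is too small to absorb a burst of atomic requests, e.g. from network receive traffic. Fragmentation is not the cause here: an order:0 request needs only a single page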
  • Available memory is critically low despite swap appearing underutilized

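With only 512MB RAM and 512MB swap in your Xen DomU, you're operating in a tight memory environment. Low swap usage doesn't mean there is enough memory: an atomic allocation can't wait for pages to be swapped out, so it can fail while swap is nearly empty. Look at actual memory pressure rather than just free memory:

# Check actual memory pressure (MemAvailable needs kernel 3.14 or later)
$ grep -E '^(MemAvailable|SwapCached|Active):' /proc/meminfo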
MemAvailable:    128436 kB
SwapCached:      2048 kB
Active:          384216 kB

This shows most memory is actively used, with only ~128MB truly available. The kernel prefers keeping pages in RAM until absolutely necessary, which explains low swap usage while still experiencing allocation failures.
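To put a number on that pressure, a small helper along these lines can report MemAvailable as a percentage of MemTotal. This is only a sketch: the function name mem_available_pct is made up for illustration, and it assumes a kernel that exposes MemAvailable in /proc/meminfo.

```shell
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo unless another file is given as the first argument.
# mem_available_pct is a hypothetical helper name, not a standard tool.
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      # Fail if the kernel does not report MemAvailable
      if (total > 0 && avail != "") printf "%d\n", avail * 100 / total
      else exit 1
    }
  ' "${1:-/proc/meminfo}"
}
```

Calling mem_available_pct with no arguments reads the live /proc/meminfo; passing a saved copy lets you compare snapshots taken at different times.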

First, confirm memory pressure with OOM killer metrics and slab info:

# Check OOM killer activity
$ dmesg | grep -i "oom"
$ grep -i oom /var/log/kern.log

# List slab caches larger than 1 MB, with size in KB (needs root; skips the two header lines)
$ awk 'NR > 2 && $3*$4/1024 > 1024 {print $1, $3*$4/1024}' /proc/slabinfo | sort -nk2

For Xen-specific analysis:

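# Check the memory assigned to each domain (run on Dom0, not inside the DomU;
# use "xl list" on newer Xen releases where the xm toolstack is gone)
$ xm list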
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  4096     4     r-----  123456.7
your-domu                                  123   512     1     -b----   7890.1

Immediate mitigation:

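# Flush dirty pages, then drop the clean page cache, dentries, and inodes.
# This frees memory only temporarily; the caches refill as the system runs.
$ sync
$ echo 3 > /proc/sys/vm/drop_caches

# Adjust swappiness temporarily (60 is the usual default, so this only
# changes anything if the value was lowered earlier)
$ echo 60 > /proc/sys/vm/swappiness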

Long-term fixes:

  • Increase DomU memory allocation in Xen configuration:
    memory = 1024
    maxmem = 1024
  • Cap the memory of non-essential processes with cgroups so they can't starve critical ones (cgroup v1 syntax, needs the libcgroup tools):
    # Create memory-limited cgroup
    cgcreate -g memory:/critical_apps
    cgset -r memory.limit_in_bytes=256M critical_apps

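For persistent issues, enable more detailed OOM logging:

# Dump per-task memory usage to the kernel log when the OOM killer runs (temporary)
$ echo 1 > /proc/sys/vm/oom_dump_tasks

# Optional, and not a logging setting: panic instead of killing a process on OOM.
# Only enable this if you want a crash dump; it takes the whole guest down.
$ echo 1 > /proc/sys/vm/panic_on_oom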

# Check per-process memory
$ ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 10

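Consider adding these settings to /etc/sysctl.conf (they are sysctls, not kernel boot parameters). Note that 65536 kB reserves 64MB, an eighth of a 512MB guest, and that vm.overcommit_memory=2 enables strict accounting, which can make large allocations in applications fail, so test both before relying on them: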

vm.min_free_kbytes=65536
vm.overcommit_memory=2
vm.overcommit_ratio=80

When you see swapper: page allocation failure. order:0, mode:0x20 in your kernel logs, this indicates the Linux kernel's memory allocator failed to fulfill a memory request. Let's break down the components:

order:0 → Requesting 2^0 = 1 page (typically 4KB)
mode:0x20 → GFP_ATOMIC allocation flag (urgent, cannot sleep)
swapper → Context: kernel's idle thread
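The breakdown above can be reproduced mechanically. The sketch below pulls the order and mode out of a log line and converts the order to pages and bytes. The function name parse_alloc_failure is hypothetical, and the 4096-byte page size is an assumption that holds on typical x86 guests but not everywhere.

```shell
# Extract order and mode from a "page allocation failure" log line and
# report how much memory was requested.
# parse_alloc_failure is a hypothetical helper; PAGE_BYTES assumes 4KB pages.
PAGE_BYTES=4096

parse_alloc_failure() {
  line=$1
  order=$(printf '%s\n' "$line" | sed -n 's/.*order:\([0-9][0-9]*\).*/\1/p')
  mode=$(printf '%s\n' "$line" | sed -n 's/.*mode:\(0x[0-9a-fA-F][0-9a-fA-F]*\).*/\1/p')
  # Bail out if the line is not an allocation-failure message
  [ -n "$order" ] && [ -n "$mode" ] || return 1
  pages=$((1 << order))   # an order-N request asks for 2^N contiguous pages
  echo "order=$order pages=$pages bytes=$((pages * PAGE_BYTES)) mode=$mode"
}
```

For example, parse_alloc_failure 'swapper: page allocation failure. order:0, mode:0x20' prints order=0 pages=1 bytes=4096 mode=0x20.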

This does not necessarily mean the system is running out of memory. The key factors to examine:

  • Swap utilization: Your 10% usage suggests physical memory pressure exists but isn't critical
  • OOM killer inactivity: No process terminations indicate the system recovered
  • Xen-specific considerations: Dom0 might be overcommitting memory to DomUs

Run these to gather more evidence:

# Check memory fragmentation
cat /proc/buddyinfo

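# List slab caches larger than 1 MB, with size in KB (needs root)
awk 'NR > 2 && $3*$4/1024 > 1024 {print $1, $3*$4/1024}' /proc/slabinfo

# Xen memory stats (run these on Dom0, not inside the DomU;
# use "xl list" on newer Xen releases where the xm toolstack is gone)
xm list
xentop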
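The raw /proc/buddyinfo output is a row of free-block counts per order, which is hard to read at a glance. For an order:0 failure the useful figure is how much free memory each zone has in total, and a sketch like the following adds it up. The name buddy_free_kb is hypothetical, and the calculation assumes 4KB pages.

```shell
# Sum the free memory per zone from buddyinfo-style input, in KB.
# Column 5 holds the count of free order-0 blocks, column 6 order-1, and so on.
# buddy_free_kb is a hypothetical helper; the factor 4 assumes 4KB pages.
buddy_free_kb() {
  awk '{
    kb = 0
    for (i = 5; i <= NF; i++)
      kb += $i * 4 * 2^(i - 5)   # blocks * 4KB * pages per block
    printf "%s %d\n", $4, kb
  }' "${1:-/proc/buddyinfo}"
}
```

A zone whose total sits near or below vm.min_free_kbytes around the time of a failure points at a reserve that is too small rather than at fragmentation.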

For a similar case with Java applications, we added these to /etc/sysctl.conf:

vm.min_free_kbytes = 65536
vm.swappiness = 10
vm.overcommit_ratio = 50

Then reload with:

sysctl -p

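Enable more detailed logging temporarily:

# Dump per-task memory usage to the kernel log when the OOM killer runs
echo 1 > /proc/sys/vm/oom_dump_tasks
# Optional, and not a logging setting: panic on OOM instead of killing a process.
# This takes the guest down, so enable it only if you want a crash dump.
echo 1 > /proc/sys/vm/panic_on_oom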

For persistent tracking, consider implementing a monitoring script:

#!/bin/bash
# Append a timestamped memory snapshot to the log every 5 minutes
LOG=/var/log/mem_debug.log
while true; do
  {
    date
    cat /proc/meminfo
    cat /proc/buddyinfo
  } >> "$LOG"
  sleep 300
done
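To line those snapshots up against actual failures, it helps to know how often failures occur and at which order. The sketch below counts them per order in a kernel log file. The name count_alloc_failures is hypothetical, and the default path /var/log/kern.log is the Debian-style location used earlier; other distributions log elsewhere.

```shell
# Count "page allocation failure" messages in a log file, grouped by order.
# count_alloc_failures is a hypothetical helper; the default log path is an
# assumption and differs between distributions.
count_alloc_failures() {
  sed -n 's/.*page allocation failure.*order:\([0-9][0-9]*\).*/\1/p' \
      "${1:-/var/log/kern.log}" |
    sort -n |
    uniq -c |
    awk '{ print "order=" $2 " count=" $1 }'
}
```

If the counts are all at order=0, raising the free-page reserve is the more promising fix; higher orders would suggest fragmentation is also involved.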

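In the DomU's configuration file on Dom0 (/etc/xen/your-domu.cfg), ensure proper memory allocation. With maxmem above memory, the balloon driver can grow the guest up to 768MB if Dom0 has memory to give:

memory = 512
maxmem = 768
# memory_slack = 64   (I can't find this in the xm/xl config documentation; verify it exists in your Xen version before using it)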