Detecting and Diagnosing Memory Fragmentation in Linux: Huge Pages Performance Impact Analysis


Memory fragmentation in Linux often manifests as gradual performance degradation in long-running server processes. Many administrators notice significant performance improvements after restarting affected processes. This issue becomes particularly noticeable when using Linux huge pages, as these large contiguous memory blocks are more susceptible to fragmentation over time.

While /proc/buddyinfo provides basic fragmentation information, several more powerful tools exist:


# Check page allocation status
cat /proc/pagetypeinfo

# Monitor huge page statistics
grep Huge /proc/meminfo

# Compaction and fragmentation counters (the exact set varies by kernel version)
cat /proc/vmstat | grep -E 'compact|frag'
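
If you want those counters in a friendlier form, a short parser helps. This is a minimal sketch; it assumes only that the compaction counters in /proc/vmstat share the compact_ prefix (true on recent kernels, though the exact set of counters varies):


#!/usr/bin/env python3
# Minimal sketch: summarize compaction-related counters from /proc/vmstat.
# Assumes only that these counters share the "compact_" prefix.

def compaction_stats():
    stats = {}
    with open('/proc/vmstat') as f:
        for line in f:
            name, value = line.split()
            if name.startswith('compact_'):
                stats[name] = int(value)
    return stats

if __name__ == "__main__":
    for name, value in sorted(compaction_stats().items()):
        print(f"{name:30s} {value}")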

Huge pages (typically 2MB or 1GB) require contiguous physical memory. Unlike standard 4KB pages, a huge page allocation fails outright if a sufficiently large contiguous block isn't available. Fragmentation builds up when:

  • Memory gets allocated and freed in varying sizes
  • Long-running processes hold memory for extended periods
  • Memory pressure causes frequent allocations/deallocations
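
A quick way to make that concrete: on x86_64 with 4KB base pages, a 2MB huge page needs a free block of order 9 (2^9 pages). The sketch below counts such blocks per zone from /proc/buddyinfo; the node and zone labels come straight from that file:


#!/usr/bin/env python3
# Sketch: count free blocks big enough for a 2MB huge page.
# Assumes x86_64 with 4KB base pages, so 2MB = order 9 (2^9 pages).

HUGE_ORDER = 9

def free_huge_candidates():
    totals = {}
    with open('/proc/buddyinfo') as f:
        for line in f:
            # Format: Node <n>, zone <name> <order-0> ... <order-10>
            fields = line.split()
            node, zone = fields[1].rstrip(','), fields[3]
            counts = [int(x) for x in fields[4:]]
            # Any free block of order >= 9 can satisfy a 2MB allocation
            totals[f"node{node}/{zone}"] = sum(counts[HUGE_ORDER:])
    return totals

if __name__ == "__main__":
    for zone, n in free_huge_candidates().items():
        print(f"{zone:20s} {n} free blocks >= 2MB")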

This Python script provides continuous fragmentation monitoring:


#!/usr/bin/env python3
import time

def monitor_fragmentation(interval=60):
    while True:
        with open('/proc/buddyinfo', 'r') as f:
            print("\n" + "="*40)
            print(f"Timestamp: {time.ctime()}")
            print("="*40)
            print(f.read())
        time.sleep(interval)

if __name__ == "__main__":
    monitor_fragmentation()
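
When reading the output, remember that each column is the count of free blocks of a given order, from order 0 (4KB) on the left to order 10 (4MB) on the right. A steady drain of the right-hand columns while the left-hand ones grow is the classic fragmentation signature.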

When fragmentation becomes problematic:


# Compact memory manually (requires CAP_SYS_ADMIN)
echo 1 > /proc/sys/vm/compact_memory

# Adjust huge page settings
echo 1024 > /proc/sys/vm/nr_hugepages
# Note: hugetlb_shm_group expects a numeric GID, not a group name
sysctl vm.hugetlb_shm_group=<numeric_gid_of_your_app_group>
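
To confirm that compaction actually recovered contiguous memory, compare high-order free blocks before and after triggering it. A minimal sketch (must run as root; the order-9 counting is the same crude heuristic used earlier):


#!/usr/bin/env python3
# Sketch: trigger compaction and compare free order-9 (2MB) blocks
# before and after. Must run as root.

def free_2mb_blocks():
    total = 0
    with open('/proc/buddyinfo') as f:
        for line in f:
            counts = [int(x) for x in line.split()[4:]]
            total += sum(counts[9:])  # orders 9 and 10
    return total

if __name__ == "__main__":
    before = free_2mb_blocks()
    with open('/proc/sys/vm/compact_memory', 'w') as f:
        f.write('1')  # triggers compaction of all zones
    after = free_2mb_blocks()
    print(f"Free >=2MB blocks: {before} -> {after}")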

For deep investigation, SystemTap scripts can track memory allocation patterns:


probe vm.pagefault {
    if (pid() == target()) {
        printf("Page fault at %p\n", address)
    }
}

probe kernel.function("__alloc_pages_nodemask") {
    # The kernel parameter is named gfp_mask; order > 0 means a
    # high-order (contiguous) allocation request
    printf("Alloc order=%d flags=%x\n", $order, $gfp_mask)
}
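
Run the script against the process under investigation with something like stap -x <pid> fragtrace.stp (the file name here is just a placeholder); target() resolves to the PID passed via -x. One caveat: __alloc_pages_nodemask was renamed to __alloc_pages around kernel 5.13, so adjust the probe point on newer kernels.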

Adjust these parameters to reduce fragmentation impact:

  • vm.extfrag_threshold (default: 500) - lower values trigger compaction sooner
  • vm.compact_unevictable_allowed (default: 1) - controls compaction of unevictable pages
  • vm.watermark_scale_factor (default: 10) - affects reclaim aggressiveness
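
All three live under /proc/sys/vm, so they can be scripted without shelling out to sysctl. A small helper sketch (writes require root; the commented-out value of 400 is purely an illustration, not a recommendation):


#!/usr/bin/env python3
# Sketch: read/set vm.* sysctls via /proc/sys (writes require root).
import os

def vm_sysctl(name, value=None):
    path = os.path.join('/proc/sys/vm', name)
    if value is not None:
        with open(path, 'w') as f:
            f.write(str(value))
    with open(path) as f:
        return f.read().strip()

if __name__ == "__main__":
    print("extfrag_threshold =", vm_sysctl('extfrag_threshold'))
    # Hypothetical tuning example; 400 is an illustration, not advice:
    # vm_sysctl('extfrag_threshold', 400)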

To recap the failure pattern: long-running Linux servers that degrade gradually and recover after a process restart are often suffering from memory fragmentation, especially when using hugepages. The issue manifests through:

  • Increased page faults despite sufficient free memory
  • Degrading throughput over time
  • Hugepage allocation failures (check the kernel log via dmesg or /var/log/messages for HugeTLB allocation-failure messages)

Beyond /proc/buddyinfo, these tools provide deeper insights:


# Comprehensive fragmentation report
cat /proc/pagetypeinfo

# NUMA-aware fragmentation analysis
numactl --hardware | grep size
cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# Real-time monitoring with ftrace
echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
cat /sys/kernel/debug/tracing/trace_pipe
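
Raw trace_pipe output is extremely chatty. A small filter that keeps only high-order allocations makes it usable; this sketch relies on the mm_page_alloc event format including an order= field, and the MIN_ORDER cutoff of 3 is an arbitrary choice:


#!/usr/bin/env python3
# Sketch: tail trace_pipe and keep only high-order page allocations.
# Requires root and the mm_page_alloc event enabled as shown above.
# MIN_ORDER = 3 is an arbitrary cutoff; order 9 = 2MB on x86_64.
import re

TRACE_PIPE = '/sys/kernel/debug/tracing/trace_pipe'
MIN_ORDER = 3

with open(TRACE_PIPE) as pipe:
    for line in pipe:
        m = re.search(r'order=(\d+)', line)
        if m and int(m.group(1)) >= MIN_ORDER:
            print(line, end='')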

Hugepages (2MB/1GB pages) are indeed more fragmentation-prone because:

  1. They require physically contiguous memory regions
  2. Long-running systems accumulate small allocations that break up contiguous zones
  3. Transparent Hugepages (THP) can actually exacerbate fragmentation through aggressive merging/splitting
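
One way to see whether THP is churning on a system is to watch its activity counters in /proc/vmstat (names like thp_fault_alloc and thp_collapse_alloc; the exact set varies by kernel, so this sketch just matches the thp_ prefix):


#!/usr/bin/env python3
# Sketch: dump THP activity counters from /proc/vmstat.
# Exact counter names vary by kernel, so match on the thp_ prefix.

with open('/proc/vmstat') as f:
    for line in f:
        if line.startswith('thp_'):
            name, value = line.split()
            print(f"{name:30s} {value}")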

For production systems, consider these approaches:


# Reserve hugepages at boot (in GRUB config)
GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=16"

# Disable THP for latency-sensitive apps
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Implement periodic defrag (cron job)
echo 1 > /proc/sys/vm/compact_memory
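
Note that the compact_memory write must run as root, so the cron job belongs in root's crontab. On NUMA machines, individual nodes can also be compacted by writing 1 to /sys/devices/system/node/node<N>/compact.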

This Python script checks fragmentation metrics:


#!/usr/bin/python3

def check_fragmentation(zone='DMA32'):
    # /proc/buddyinfo line format:
    #   Node <n>, zone <name> <order-0 count> ... <order-10 count>
    # On boxes with more than 4GB of RAM, zone='Normal' covers most memory.
    with open('/proc/buddyinfo') as f:
        for line in f:
            fields = line.split()
            if fields[3] == zone:
                counts = [int(x) for x in fields[4:]]
                total = sum(counts)
                if total == 0:
                    return False
                # Crude heuristic: share of free blocks below the max order
                fragmentation = 1 - (counts[-1] / total)
                print(f"Fragmentation index: {fragmentation:.2%}")
                return fragmentation > 0.25
    return False

if __name__ == "__main__":
    if check_fragmentation():
        print("Warning: High fragmentation detected!")

These /proc/sys/vm parameters help manage fragmentation:

  • vm.extfrag_threshold (0-1000, default 500) - lower values make the kernel compact sooner, higher values make compaction less likely
  • vm.compact_unevictable_allowed (set to 1 to let compaction migrate unevictable pages, at the cost of minor faults)
  • vm.hugepages_treat_as_movable (set to 1 to allow hugepage allocation from movable zones; note this sysctl was removed in recent kernels)