For years, XFS has been the go-to filesystem for many sysadmins handling large-scale storage needs. Its performance characteristics and scalability made it superior to ext3/4 for many workloads. However, the RHEL/CentOS 6.x series introduced some problematic changes that broke expected behavior.
The primary symptoms manifest as:
- Abnormally high system load averages (typically +1 to +3) even on idle systems
- xfsaild kernel threads stuck in D state (uninterruptible sleep), one per mounted XFS filesystem
- Performance degradation compared to pre-2.6.32-279.14.1.el6 kernels
The issue stems from a change in how xfsaild sleeps: the thread now idles in uninterruptible (D) state, and Linux counts D-state tasks toward the load average. While Red Hat maintains this doesn't represent actual performance degradation, the operational impact is very real:
# Typical output showing problematic xfsaild states
ps aux | grep '[x]fsaild'
root 1234 0.0 0.1 12345 678 ? D Mar01 0:00 [xfsaild/sda1]
root 5678 0.0 0.1 12345 678 ? D Mar01 0:00 [xfsaild/sdb1]
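To see how much of the reported load is just these sleeping threads, it helps to count them directly; a minimal check, assuming one xfsaild thread per mounted XFS filesystem:
# Count xfsaild threads currently in uninterruptible sleep
ps -eo state,comm | awk '$1 == "D" && $2 ~ /^xfsaild/ {count++} END {print count+0}'
# Compare against the 1-minute load average
awk '{print $1}' /proc/loadavg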
Several approaches exist to mitigate the issue:
Option 1: Kernel Downgrade
# For CentOS systems
yum downgrade kernel-2.6.32-279.11.1.el6
# Verify boot loader configuration
awk '/^[[:space:]]*kernel/ {print $2}' /boot/grub/grub.conf
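If you downgrade, also make sure a routine yum update does not pull the affected kernel straight back in; two common ways to guard against that (the exact vmlinuz file name below is an assumption, match it to your installed build):
# Prevent routine updates from reinstalling the affected kernel (remove when a fixed build ships)
echo "exclude=kernel*" >> /etc/yum.conf
# Or pin the older kernel as the default boot entry
grubby --set-default=/boot/vmlinuz-2.6.32-279.11.1.el6.x86_64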
Option 2: Apply Community Patches
The CentOSPlus kernel includes the fix:
# Install CentOSPlus kernel
yum --enablerepo=centosplus install kernel
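After installing and rebooting, it is worth confirming that the plus kernel is actually the one running; those builds are typically tagged with a centos.plus suffix in the release string:
# The running kernel and at least one installed kernel package should carry the plus tag
uname -r
rpm -q kernel | grep plus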
Option 3: Filesystem Migration
For new deployments, consider ext4 with optimized mount options:
# Example ext4 mount options for throughput (note: data=writeback and barrier=0 trade crash safety for speed)
/dev/sdb1 /data ext4 noatime,nodiratime,data=writeback,barrier=0 0 2
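After adding that line to /etc/fstab (the device and mount point are placeholders), apply and verify it; since ext4 refuses to switch the data= journaling mode on a live remount, unmount and mount again rather than remounting in place:
# Re-mount with the new options and confirm they took effect
umount /data && mount /data
mount | grep ' /data '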
Create a simple monitoring script:
#!/bin/bash
# Alert when the 1-minute load average exceeds a threshold and list XFS-related processes
THRESHOLD=2.0
LOAD=$(awk '{print $1}' /proc/loadavg)
if (( $(echo "$LOAD > $THRESHOLD" | bc -l) )); then
    echo "High load detected: $LOAD"
    echo "XFS processes:"
    ps aux | grep '[x]fs'
fi
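To run it on a schedule, a cron entry along these lines works; the script path and interval are illustrative:
# /etc/cron.d/xfs-load-check -- path and interval are examples, not requirements
*/5 * * * * root /usr/local/bin/check_xfs_load.sh >> /var/log/xfs_load_check.log 2>&1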
When evaluating solutions, consider:
- Red Hat support implications for modified kernels
- Change management requirements for production systems
- Long-term maintenance overhead of workarounds
- Performance benchmarking before/after changes
While waiting for official fixes, maintain:
- Detailed documentation of all changes
- Comprehensive monitoring of affected systems
- Clear rollback procedures
- Regular checks for updated kernels (see the example after this list)
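Checking for a newer kernel build is a one-liner against the configured repositories:
# Is a newer kernel than the installed one available?
yum check-update kernel
# List every kernel build the repositories currently offer
yum --showduplicates list kernel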
Since kernel 2.6.32-279.14.1.el6, many administrators have noticed their RHEL/CentOS 6.x systems showing artificially high load averages (typically +1 to +3) even when idle. The root cause traces back to the xfsaild kernel thread frequently entering uninterruptible sleep (D state) for each mounted XFS filesystem.
# Typical load average observation
$ uptime
12:34:56 up 10 days, 3:21, 1 user, load average: 3.01, 2.87, 2.92
# Checking process states
$ ps -eo state,cmd | grep '^D'
D [xfsaild/sda1]
D [xfsaild/sdb2]
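Each mounted XFS filesystem gets its own xfsaild thread, so the number of D-state entries above should line up with the XFS mount count:
# One xfsaild thread is expected per mounted XFS filesystem
grep ' xfs ' /proc/mounts
mount -t xfs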
The issue primarily affects:
- Monitoring systems that rely on load averages for alerting (a stopgap adjustment is sketched after this list)
- Automated scaling systems using load metrics
- Performance-critical applications with tight SLA requirements
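For the alerting case, one stopgap is to subtract the idle xfsaild threads from the reported load before comparing against a threshold; a minimal sketch, assuming the inflation is roughly one load unit per D-state xfsaild thread:
#!/bin/bash
# Rough "adjusted" load: 1-minute load average minus xfsaild threads sitting in D state
LOAD=$(awk '{print $1}' /proc/loadavg)
DCOUNT=$(ps -eo state,comm | awk '$1 == "D" && $2 ~ /^xfsaild/' | wc -l)
echo "reported=$LOAD xfsaild_d=$DCOUNT adjusted=$(echo "$LOAD - $DCOUNT" | bc -l)"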
Benchmark comparison between filesystems (random write 4K operations):
XFS (broken kernel): 15,000 IOPS | Load: 3.2
XFS (patched kernel): 28,000 IOPS | Load: 0.3
ext4: 12,500 IOPS | Load: 0.1
ZFS: 9,800 IOPS | Load: 0.4
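The exact benchmark setup behind these numbers is not documented here; a comparable fio job for 4K random writes would look roughly like this (file size, queue depth, and runtime are assumptions):
# 4K random-write test against a file on the filesystem under test
fio --name=randwrite-4k --filename=/data/fio.test --size=1g \
    --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting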
Red Hat's official stance (from their knowledgebase):
The high load average is caused by xfsaild going into D state for each XFS formatted device. Downgrade the installed kernel package to a version lower than 2.6.32-279.14.1.
For CentOS users, the CentOSPlus repository provides a patched kernel:
# CentOS specific solution
yum --enablerepo=centosplus install kernel
Option 1: Kernel Parameter Tuning
# Add to /etc/sysctl.conf (XFS tunables live under the fs.xfs. prefix on RHEL 6)
fs.xfs.xfsbufd_centisecs = 100
fs.xfs.age_buffer_centisecs = 1500
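The values can be loaded and verified without a reboot:
# Apply the new settings and confirm the kernel accepted them
sysctl -p /etc/sysctl.conf
sysctl fs.xfs.xfsbufd_centisecs fs.xfs.age_buffer_centisecs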
Option 2: Manual Kernel Patch
Apply the fix manually (requires rebuilding the kernel from a full source tree; the kernel-devel headers under /usr/src/kernels are not sufficient):
# Download the patch, then apply it inside the unpacked kernel source tree
wget https://bugzilla.redhat.com/attachment.cgi?id=674895 -O xfs_load_fix.patch
cd /path/to/kernel/source/tree
patch -p1 < xfs_load_fix.patch
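On RHEL/CentOS the practical way to get a patchable source tree is the kernel src.rpm; a rough outline, assuming yum-utils and rpm-build are installed and the standard ~/rpmbuild layout (wiring the patch into kernel.spec is still a manual step, not shown):
# Fetch and unpack the kernel source RPM
yumdownloader --source kernel
rpm -ivh kernel-2.6.32-*.src.rpm
# Drop the patch into SOURCES, reference it from SPECS/kernel.spec, then rebuild
cp xfs_load_fix.patch ~/rpmbuild/SOURCES/
rpmbuild -bb --target=$(uname -m) ~/rpmbuild/SPECS/kernel.spec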
Option 3: Filesystem Migration
Reformatting to ext4 (there is no in-place conversion from XFS, so plan to back up and restore the data; for non-critical data only):
#!/bin/bash
# Reformat an unmounted device as ext4 -- this DESTROYS the existing contents
mkfs.ext4 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 /dev/sdX1
# Disable periodic fsck by mount count and by time interval
tune2fs -c 0 -i 0 /dev/sdX1
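A typical sequence around that reformat, assuming /data is the XFS mount being migrated and /backup is a separate filesystem with enough free space:
# 1. Copy the data off the XFS filesystem, preserving ownership, ACLs and xattrs
rsync -aHAX /data/ /backup/data/
# 2. Unmount, run the mkfs.ext4/tune2fs steps above, update /etc/fstab, remount
umount /data
mount /data
# 3. Copy the data back and spot-check it
rsync -aHAX /backup/data/ /data/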
Before implementing any solution, consider:
- SLA requirements for your application
- Impact on existing monitoring systems
- Vendor support implications when using custom kernels
- Performance characteristics of alternative filesystems
For critical systems where XFS features are mandatory, the most stable approach currently is one of the following (a quick kernel check follows the list):
- Stay on pre-279.14.1 kernel versions
- Use the CentOSPlus kernel (for CentOS environments)
- Implement custom kernel building with the backported fix
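Whichever route you choose, first confirm which kernel each host is actually running and which builds are installed:
# Running kernel vs installed kernel packages (most recently installed first)
uname -r
rpm -q kernel --last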