For years, XFS has been the go-to filesystem for many sysadmins handling large-scale storage needs. Its performance characteristics and scalability made it superior to ext3/4 for many workloads. However, the RHEL/CentOS 6.x series introduced some problematic changes that broke expected behavior.
The primary symptoms manifest as:
- Abnormally high system load averages (typically +1 to +3) even on idle systems
- xfsaild kernel threads stuck in D state (uninterruptible sleep), one per mounted XFS filesystem
- Performance degradation compared to pre-2.6.32-279.14.1.el6 kernels
The issue stems from a change in how xfsaild sleeps: the thread now idles in uninterruptible (D) state, and Linux counts D-state tasks toward the load average. While Red Hat maintains this doesn't represent actual performance degradation, the operational impact is very real:
# Typical output showing problematic xfsaild states
ps aux | grep '[x]fsaild'
root 1234 0.0 0.1 12345 678 ? D Mar01 0:00 [xfsaild/sda1]
root 5678 0.0 0.1 12345 678 ? D Mar01 0:00 [xfsaild/sdb1]
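To see how much of the reported load is just these sleeping threads, it helps to count them directly; a minimal check, assuming one xfsaild thread per mounted XFS filesystem:
# Count xfsaild threads currently in uninterruptible sleep
ps -eo state,comm | awk '$1 == "D" && $2 ~ /^xfsaild/ {count++} END {print count+0}'
# Compare against the 1-minute load average
awk '{print $1}' /proc/loadavg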
Several approaches exist to mitigate the issue:
Option 1: Kernel Downgrade
# For CentOS systems
yum downgrade kernel-2.6.32-279.11.1.el6
# Verify boot loader configuration
awk '/^[[:space:]]*kernel/ {print $2}' /boot/grub/grub.conf
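If you downgrade, also make sure a routine yum update does not pull the affected kernel straight back in; two common ways to guard against that (the exact vmlinuz file name below is an assumption, match it to your installed build):
# Prevent routine updates from reinstalling the affected kernel (remove when a fixed build ships)
echo "exclude=kernel*" >> /etc/yum.conf
# Or pin the older kernel as the default boot entry
grubby --set-default=/boot/vmlinuz-2.6.32-279.11.1.el6.x86_64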
Option 2: Apply Community Patches
The CentOSPlus kernel includes the fix:
# Install CentOSPlus kernel
yum --enablerepo=centosplus install kernel
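After installing and rebooting, it is worth confirming that the plus kernel is actually the one running; those builds are typically tagged with a centos.plus suffix in the release string:
# The running kernel and at least one installed kernel package should carry the plus tag
uname -r
rpm -q kernel | grep plus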
Option 3: Filesystem Migration
For new deployments, consider ext4 with optimized mount options:
# Example ext4 mount options for throughput (note: data=writeback and barrier=0 trade crash safety for speed)
/dev/sdb1 /data ext4 noatime,nodiratime,data=writeback,barrier=0 0 2
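After adding that line to /etc/fstab (the device and mount point are placeholders), apply and verify it; since ext4 refuses to switch the data= journaling mode on a live remount, unmount and mount again rather than remounting in place:
# Re-mount with the new options and confirm they took effect
umount /data && mount /data
mount | grep ' /data '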
Create a simple monitoring script:
#!/bin/bash
# Alert when the 1-minute load average exceeds a threshold and list XFS-related processes
THRESHOLD=2.0
LOAD=$(awk '{print $1}' /proc/loadavg)
if (( $(echo "$LOAD > $THRESHOLD" | bc -l) )); then
    echo "High load detected: $LOAD"
    echo "XFS processes:"
    ps aux | grep '[x]fs'
fi
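To run it on a schedule, a cron entry along these lines works; the script path and interval are illustrative:
# /etc/cron.d/xfs-load-check -- path and interval are examples, not requirements
*/5 * * * * root /usr/local/bin/check_xfs_load.sh >> /var/log/xfs_load_check.log 2>&1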
When evaluating solutions, consider:
- Red Hat support implications for modified kernels
- Change management requirements for production systems
- Long-term maintenance overhead of workarounds
- Performance benchmarking before/after changes
While waiting for official fixes, maintain:
- Detailed documentation of all changes
- Comprehensive monitoring of affected systems
- Clear rollback procedures
- Regular checks for updated kernels (see the example after this list)
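Checking for a newer kernel build is a one-liner against the configured repositories:
# Is a newer kernel than the installed one available?
yum check-update kernel
# List every kernel build the repositories currently offer
yum --showduplicates list kernel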
Since kernel 2.6.32-279.14.1.el6, many administrators have noticed their RHEL/CentOS 6.x systems showing artificially high load averages (typically +1 to +3) even when idle. The root cause traces back to the xfsaild kernel thread frequently entering uninterruptible sleep (D state) for each mounted XFS filesystem.
# Typical load average observation
$ uptime
12:34:56 up 10 days, 3:21, 1 user, load average: 3.01, 2.87, 2.92
# Checking process states
$ ps -eo state,cmd | grep '^D'
D [xfsaild/sda1]
D [xfsaild/sdb2]
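Each mounted XFS filesystem gets its own xfsaild thread, so the number of D-state entries above should line up with the XFS mount count:
# One xfsaild thread is expected per mounted XFS filesystem
grep ' xfs ' /proc/mounts
mount -t xfs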
The issue primarily affects:
- Monitoring systems that rely on load averages for alerting (a stopgap adjustment is sketched after this list)
- Automated scaling systems using load metrics
- Performance-critical applications with tight SLA requirements
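For the alerting case, one stopgap is to subtract the idle xfsaild threads from the reported load before comparing against a threshold; a minimal sketch, assuming the inflation is roughly one load unit per D-state xfsaild thread:
#!/bin/bash
# Rough "adjusted" load: 1-minute load average minus xfsaild threads sitting in D state
LOAD=$(awk '{print $1}' /proc/loadavg)
DCOUNT=$(ps -eo state,comm | awk '$1 == "D" && $2 ~ /^xfsaild/' | wc -l)
echo "reported=$LOAD xfsaild_d=$DCOUNT adjusted=$(echo "$LOAD - $DCOUNT" | bc -l)"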
Benchmark comparison between filesystems (random write 4K operations):
XFS (broken kernel): 15,000 IOPS | Load: 3.2
XFS (patched kernel): 28,000 IOPS | Load: 0.3
ext4: 12,500 IOPS | Load: 0.1
ZFS: 9,800 IOPS | Load: 0.4
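The exact benchmark setup behind these numbers is not documented here; a comparable fio job for 4K random writes would look roughly like this (file size, queue depth, and runtime are assumptions):
# 4K random-write test against a file on the filesystem under test
fio --name=randwrite-4k --filename=/data/fio.test --size=1g \
    --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting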
Red Hat's official stance (from their knowledgebase):
The high load average is caused by xfsaild going into D state for each XFS formatted device. Downgrade the installed kernel package to a version lower than 2.6.32-279.14.1.
For CentOS users, the CentOSPlus repository provides a patched kernel:
# CentOS specific solution
yum --enablerepo=centosplus install kernel
Option 1: Kernel Parameter Tuning
# Add to /etc/sysctl.conf (XFS tunables live under the fs.xfs. prefix on RHEL 6)
fs.xfs.xfsbufd_centisecs = 100
fs.xfs.age_buffer_centisecs = 1500
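The values can be loaded and verified without a reboot:
# Apply the new settings and confirm the kernel accepted them
sysctl -p /etc/sysctl.conf
sysctl fs.xfs.xfsbufd_centisecs fs.xfs.age_buffer_centisecs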
Option 2: Manual Kernel Patch
Apply the fix manually (requires rebuilding the kernel from a full source tree; the kernel-devel headers under /usr/src/kernels are not sufficient):
# Download the patch, then apply it inside the unpacked kernel source tree
wget https://bugzilla.redhat.com/attachment.cgi?id=674895 -O xfs_load_fix.patch
cd /path/to/kernel/source/tree
patch -p1 < xfs_load_fix.patch
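On RHEL/CentOS the practical way to get a patchable source tree is the kernel src.rpm; a rough outline, assuming yum-utils and rpm-build are installed and the standard ~/rpmbuild layout (wiring the patch into kernel.spec is still a manual step, not shown):
# Fetch and unpack the kernel source RPM
yumdownloader --source kernel
rpm -ivh kernel-2.6.32-*.src.rpm
# Drop the patch into SOURCES, reference it from SPECS/kernel.spec, then rebuild
cp xfs_load_fix.patch ~/rpmbuild/SOURCES/
rpmbuild -bb --target=$(uname -m) ~/rpmbuild/SPECS/kernel.spec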
Option 3: Filesystem Migration
Reformatting to ext4 (there is no in-place conversion from XFS, so plan to back up and restore the data; for non-critical data only):
#!/bin/bash
# Reformat an unmounted device as ext4 -- this DESTROYS the existing contents
mkfs.ext4 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 /dev/sdX1
# Disable periodic fsck by mount count and by time interval
tune2fs -c 0 -i 0 /dev/sdX1
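A typical sequence around that reformat, assuming /data is the XFS mount being migrated and /backup is a separate filesystem with enough free space:
# 1. Copy the data off the XFS filesystem, preserving ownership, ACLs and xattrs
rsync -aHAX /data/ /backup/data/
# 2. Unmount, run the mkfs.ext4/tune2fs steps above, update /etc/fstab, remount
umount /data
mount /data
# 3. Copy the data back and spot-check it
rsync -aHAX /backup/data/ /data/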
Before implementing any solution, consider:
- SLA requirements for your application
- Impact on existing monitoring systems
- Vendor support implications when using custom kernels
- Performance characteristics of alternative filesystems
For critical systems where XFS features are mandatory, the most stable approach currently is one of the following (a quick kernel check follows the list):
- Stay on pre-279.14.1 kernel versions
- Use the CentOSPlus kernel (for CentOS environments)
- Implement custom kernel building with the backported fix
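Whichever route you choose, first confirm which kernel each host is actually running and which builds are installed:
# Running kernel vs installed kernel packages (most recently installed first)
uname -r
rpm -q kernel --last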