After a decade of stable XFS operation across Linux servers, I encountered a puzzling behavior after upgrading to RHEL/CentOS 6.2+. Filesystem capacity graphs began fluctuating sharply, and du reported significantly higher usage than ls -l or du --apparent-size:
# Diagnostic output showing discrepancy:
$ du -sh largefile.db
47G largefile.db
$ du -sh --apparent-size largefile.db
32G largefile.db
$ ls -lh largefile.db
-rw-r--r-- 1 appuser appgroup 32G Aug 15 11:23 largefile.db
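To see which direction a discrepancy runs, stat exposes both numbers directly: %s is the apparent (logical) size, and %b * %B the allocated bytes. A minimal sketch using a deliberately sparse file, which is the opposite case of the problem above (here allocated space is smaller than the apparent size; with speculative preallocation it is larger):

```shell
f=$(mktemp)
truncate -s 1M "$f"   # hole-only file: 1 MiB apparent size, no data blocks written
apparent=$(stat -c %s "$f")                              # logical size in bytes
physical=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))  # allocated bytes
echo "apparent=$apparent physical=$physical"
rm -f "$f"
```

On filesystems with hole support this prints an apparent size of 1048576 with far fewer allocated bytes.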
Using filefrag, I discovered problematic fragmentation patterns:
$ filefrag -v high_usage_file.log
Filesystem type is: 58465342
File size of high_usage_file.log is 1099511627776 (268435456 blocks)
high_usage_file.log: 11782 extents found
Key findings from ncdu scans:
- Actual disk usage: 436.8GiB
- Apparent size: 365.2GiB
- 71.6GiB of overhead; since physical usage exceeds the apparent size, this points to preallocated space rather than sparse files
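The overhead percentage implied by those numbers is easy to check; a quick awk sketch (figures taken from the ncdu scan above):

```shell
# Overhead = (physical - apparent) / apparent * 100
overhead=$(awk 'BEGIN { phys = 436.8; app = 365.2; printf "%.1f", (phys - app) / app * 100 }')
echo "$overhead"
```

This works out to 19.6, i.e. du inflated by roughly a fifth over the apparent size.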
The behavior emerged after upgrading from kernel 2.6.32-131.17.1.el6 to 2.6.32-220.23.1.el6. Research uncovered these relevant changes:
xfs: Optimize delayed allocation for sparse files
commit 5f6bed76c1032a1e20f3d650e8001f7a7bea03a4
Author: Dave Chinner <dchinner@redhat.com>
Date: Tue Mar 13 14:00:47 2012 +1100
This optimization altered how XFS handles sparse regions, particularly for:
- Database files with pre-allocated space
- Log rotation scenarios
- Virtual machine disk images
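The underlying pattern, blocks allocated past EOF that st_blocks (and therefore du) counts but the file size does not reflect, can be reproduced on most filesystems with fallocate --keep-size. A sketch, assuming the temp filesystem supports fallocate:

```shell
f=$(mktemp)
fallocate -n -l 1M "$f"   # -n / --keep-size: allocate 1 MiB without growing the file
apparent=$(stat -c %s "$f")
physical=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))
echo "apparent=$apparent physical=$physical"   # physical exceeds apparent, as du showed
rm -f "$f"
```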
Immediate remediation:
# Rewrite files whose allocation exceeds their apparent size (find's %S is
# BLOCKSIZE*st_blocks/st_size, so values above 1.0 indicate overallocation):
find /data -type f -printf "%S\t%p\n" | awk -F'\t' '$1 > 1.0 {print $2}' | \
while IFS= read -r f; do
    cp --sparse=always "$f" "$f.tmp" && mv "$f.tmp" "$f"
done
Persistent configuration:
# /etc/sysctl.d/xfs_tuning.conf
# Tune XFS background writeback/reclaim intervals (these influence how
# quickly speculative preallocation beyond EOF is cleaned up)
fs.xfs.xfssyncd_centisecs = 6000
fs.xfs.age_buffer_centisecs = 1500
Automated monitoring script:
#!/bin/bash
# Trigger a defragmentation pass once the filesystem crosses a fill threshold
THRESHOLD=0.9
FS=/data
USED_RATIO=$(df -P "$FS" | awk 'NR==2 {print $3/$2}')
if (( $(echo "$USED_RATIO > $THRESHOLD" | bc -l) )); then
    logger -t xfsmon "Usage threshold exceeded on $FS, running xfs_fsr"
    /usr/sbin/xfs_fsr -v "$FS"
fi
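The script leans on bc for the floating-point comparison; on hosts where bc is not installed, awk's exit status can serve the same purpose. A small sketch:

```shell
ratio=0.95
threshold=0.9
# awk exits 0 when the comparison holds, so it drops straight into `if`
if awk -v r="$ratio" -v t="$threshold" 'BEGIN { exit !(r > t) }'; then
    status="threshold exceeded"
else
    status="ok"
fi
echo "$status"
```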
For production systems experiencing this issue:
- Schedule regular defragmentation during low-usage periods:
# Weekly cron job (Saturdays at 03:00)
0 3 * * 6 /usr/sbin/xfs_fsr -v /data >> /var/log/xfs_fsr.log 2>&1
- Implement Zabbix monitoring for preallocation overhead (percent by which physical usage exceeds apparent size):
UserParameter=xfs.overhead.pct,p=$(du -ks /data | cut -f1); a=$(du -ks --apparent-size /data | cut -f1); echo $(( (p-a)*100/a ))
After a decade of stable XFS usage across Linux servers, a concerning pattern emerged after the RHEL/CentOS 6.2 upgrades. Systems began exhibiting:
- 20-30% discrepancies between du and du --apparent-size
- Fragmented files with thousands of extents (verified via filefrag -v filename)
- Temporary relief from xfs_fsr followed by rapid re-inflation
To quantify the issue, I developed this investigation script:
#!/bin/bash
# Compare physical vs apparent storage for large files
xfs_path="/data"
echo "Scanning ${xfs_path}..."
printf "%-40s %12s %12s\n" "FILE" "PHYSICAL" "LOGICAL"
find "${xfs_path}" -type f -size +100M -exec bash -c '
    for f; do
        phys=$(du -k "$f" | cut -f1)
        log=$(du -k --apparent-size "$f" | cut -f1)
        [ "$log" -eq 0 ] && continue    # skip empty files to avoid division by zero
        ratio=$(( (phys - log) * 100 / log ))
        [ "$ratio" -gt 10 ] && \
            printf "%-40s %12sK %12sK (%d%%)\n" \
                "${f:0:37}..." "$phys" "$log" "$ratio"
    done
' bash {} + | sort -rnk2 | head -20
The transition from kernel 2.6.32-131 to 2.6.32-220 introduced several XFS modifications:
- New speculative preallocation behavior (xfs_iomap_prealloc_size)
- Modified extent allocation heuristics (commit 39a26b14)
- Changed delayed allocation flushing thresholds
Temporary mitigation:
# Manual defragmentation (takes a mount point or device)
xfs_fsr -v /dev/sdX
# Limit speculative preallocation; note allocsize is generally not applied
# via remount, so set it in /etc/fstab and remount the filesystem fully
mount -o remount,allocsize=64k /data
# List sparse files (%S below 1.0) sorted by sparseness ratio
find /data -type f -printf "%S\t%p\n" | awk -F'\t' '$1 < 1.0' | sort -n
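The cp --sparse=always rewrite used in the remediation above can be verified on any filesystem with hole support: copy a file of literal zeros and compare allocated blocks before and after. A sketch with temporary files:

```shell
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/full" bs=4096 count=256 2>/dev/null   # 1 MiB of written zeros
cp --sparse=always "$tmp/full" "$tmp/sparse"                   # zero runs become holes
full_blocks=$(stat -c %b "$tmp/full")
sparse_blocks=$(stat -c %b "$tmp/sparse")
echo "full=$full_blocks sparse=$sparse_blocks"
rm -rf "$tmp"
```

The copy should report far fewer allocated blocks than the original.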
A longer-term mitigation is tightening the VM writeback tunables so dirty data is flushed sooner (this reduces the impact but is not a true fix):
# Add to /etc/sysctl.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_expire_centisecs = 3000
# Then reload
sysctl -p
Implement this Nagios check to track the issue:
#!/bin/bash
WARNING=15
CRITICAL=30
phys=$(du -ks /data | cut -f1)
log=$(du -ks --apparent-size /data | cut -f1)
if [ "$log" -eq 0 ]; then
    echo "UNKNOWN: /data apparent size is zero"
    exit 3
fi
delta=$(( (phys - log) * 100 / log ))
if [ "$delta" -ge "$CRITICAL" ]; then
    echo "CRITICAL: XFS storage overhead ${delta}% | overhead=${delta}%;${WARNING};${CRITICAL}"
    exit 2
elif [ "$delta" -ge "$WARNING" ]; then
    echo "WARNING: XFS storage overhead ${delta}% | overhead=${delta}%;${WARNING};${CRITICAL}"
    exit 1
else
    echo "OK: XFS storage overhead ${delta}% | overhead=${delta}%;${WARNING};${CRITICAL}"
    exit 0
fi
For severely affected systems:
# Create an optimized XFS (-m crc=1 needs a newer xfsprogs/kernel with v5 format support)
mkfs.xfs -f -d agcount=16 -l size=128m -m crc=1 /dev/sdX
# Mount with modern options (noatime already implies nodiratime)
mount -o noatime,allocsize=64k,logbsize=256k /dev/sdX /data
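Mount options given on the command line do not survive a reboot; to persist them they belong in /etc/fstab (device name and mount point here are illustrative):

```
# /etc/fstab
/dev/sdX  /data  xfs  noatime,allocsize=64k,logbsize=256k  0  0
```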