Proactive XFS Filesystem Health Monitoring in Linux: Tools and Techniques for Live Systems


After a painful experience with ext3 filesystem corruption, I switched to XFS for its metadata journaling and its handling of large files. XFS needs a different monitoring approach, though: its offline checkers (the deprecated xfs_check and its replacement, xfs_repair -n) require an unmounted filesystem, which is a non-starter for production servers.

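XFS provides several utilities for inspecting a filesystem in place:


# Capture a metadata-only image for offline analysis (best taken from an
# unmounted, read-only, or frozen filesystem; a live copy may be inconsistent)
xfs_metadump /dev/sda1 - | xfs_mdrestore - /tmp/metadump.img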

# Analyze allocation groups
xfs_db -r /dev/sda1
xfs_db> agf 0
xfs_db> p

The kernel exposes XFS runtime counters in /proc/fs/xfs/stat:


# Show the read/write call counters (rw) and byte counters (xpc)
grep -E '^(rw|xpc) ' /proc/fs/xfs/stat

# Continuous monitoring (refresh every 2 seconds)
watch -n 2 cat /proc/fs/xfs/stat
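
Raw counters are cumulative since boot, so what usually matters is the change between two samples. Here is a minimal sketch of that idea, assuming each line of /proc/fs/xfs/stat is a group name followed by integers. The function names are mine, and the sketch does not decode what each field means; check the kernel documentation for that.

```python
def parse_xfs_stat(text):
    """Parse /proc/fs/xfs/stat-style text into {group_name: [int, ...]}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = [int(v) for v in fields[1:]]
        except ValueError:
            continue  # skip any line that is not purely numeric
    return stats


def counter_delta(before, after, group):
    """Per-field increase of one counter group between two snapshots."""
    return [b - a for a, b in zip(before.get(group, []), after.get(group, []))]


def read_xfs_stat(path="/proc/fs/xfs/stat"):
    """Read and parse the live counters (path is the usual location)."""
    with open(path) as f:
        return parse_xfs_stat(f.read())
```

Take one snapshot with read_xfs_stat(), sleep for the sampling interval, take another, and pass both to counter_delta(first, second, "rw") to see how many calls happened in between.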

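For proactive error detection, use online scrubbing. It needs kernel 4.15 or newer built with online scrub support, plus xfsprogs 4.15 or newer; the stock RHEL/CentOS 7 kernel (3.10) is too old: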


# Schedule monthly scrubbing via cron
0 3 1 * * /usr/sbin/xfs_scrub /mountpoint >> /var/log/xfs_scrub.log 2>&1

# Run a verbose, check-only pass by hand (-n: report problems, change nothing)
xfs_scrub -n -v /mountpoint
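
A cron job that only appends to a log is easy to ignore, so it helps to turn the scrub result into something you can alert on. The sketch below runs a check-only scrub and decodes the exit status. The bit meanings are my reading of the xfs_scrub(8) man page, so verify them against your installed version; the function names are hypothetical.

```python
import subprocess

# Exit-status bits as I understand them from xfs_scrub(8); the status is a
# sum of these values. Treat this table as an assumption to verify.
SCRUB_FLAGS = {
    1: "filesystem errors left uncorrected",
    2: "filesystem optimizations possible",
    4: "operational error",
    8: "usage or syntax error",
}


def decode_scrub_status(code):
    """Translate an xfs_scrub exit status into a list of conditions."""
    return [msg for bit, msg in sorted(SCRUB_FLAGS.items()) if code & bit]


def run_scrub_check(mountpoint, scrub_bin="xfs_scrub"):
    """Run a check-only scrub (-n) and return (status, conditions, output)."""
    proc = subprocess.run([scrub_bin, "-n", mountpoint],
                          capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    return proc.returncode, decode_scrub_status(proc.returncode), output
```

An empty list from decode_scrub_status means the scrub reported nothing; anything else is worth a mail or a pager alert.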

Combine XFS monitoring with disk health checks:


#!/bin/bash
# Check both disk and XFS health
smartctl -H /dev/sda                        # overall SMART verdict for the disk
xfs_info /mountpoint                        # filesystem geometry
xfs_spaceman -c "freesp -s" /mountpoint     # free-space summary

Here's a Python script I use for comprehensive XFS monitoring:


import subprocess
import time
from datetime import datetime

def check_xfs_health(mountpoint):
    try:
        # Check free space (check=True raises if df exits non-zero)
        df = subprocess.run(['df', '-h', mountpoint],
                            capture_output=True, text=True, check=True)

        # Check inode usage
        inodes = subprocess.run(['df', '-i', mountpoint],
                                capture_output=True, text=True, check=True)

        # Log results
        with open('/var/log/xfs_monitor.log', 'a') as f:
            f.write(f"{datetime.now()}\n{df.stdout}\n{inodes.stdout}\n")

    except (subprocess.CalledProcessError, OSError) as e:
        print(f"Monitoring error: {e}")

if __name__ == "__main__":
    while True:
        check_xfs_health('/data')
        time.sleep(3600)  # Run hourly
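
Logging df output is useful for history, but a log nobody reads will not warn you before a volume fills up. As a possible companion, here is a sketch that reads the same numbers through os.statvfs and returns warnings once a threshold is crossed. The 90% defaults and the function names are my own assumptions, and the space figure counts reserved blocks as used, so it can differ slightly from df's Use% column.

```python
import os


def usage_percent(path):
    """Return (space_pct, inode_pct) for the filesystem holding `path`.

    Either value is None when the filesystem reports a zero total.
    """
    st = os.statvfs(path)
    space = None
    if st.f_blocks:
        space = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks
    inodes = None
    if st.f_files:
        inodes = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return space, inodes


def usage_warnings(path, space_limit=90.0, inode_limit=90.0):
    """List human-readable warnings for thresholds that are exceeded."""
    space, inodes = usage_percent(path)
    warnings = []
    if space is not None and space >= space_limit:
        warnings.append(f"{path}: {space:.1f}% of blocks in use")
    if inodes is not None and inodes >= inode_limit:
        warnings.append(f"{path}: {inodes:.1f}% of inodes in use")
    return warnings
```

Calling usage_warnings('/data') from the hourly loop and mailing any non-empty result would be one way to wire it in.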

Keep these in your toolkit:


# Check filesystem structure without modifying it (filesystem must be unmounted)
xfs_repair -n /dev/sda1

# Defragment files (when needed)
xfs_fsr /mountpoint

# Thaw a filesystem previously frozen with xfs_freeze -f
xfs_freeze -u /mountpoint

I also quickly discovered that XFS demands different maintenance practices than the ext family. Neither e2fsck nor xfs_repair is safe to run against a filesystem that is mounted read-write, so on a live XFS system health checks come down to read-only inspection, trend logging, and online scrubbing.

For a quick look at a live system, xfs_db can open the device read-only (-r) without unmounting. Treat its output as advisory, because metadata on a mounted filesystem can change while it is being read:


# Check filesystem metadata consistency
sudo xfs_db -r -c "check -v" /dev/sdX

# Verify free space accounting
sudo xfs_db -r -c "freesp -s" /dev/sdX

# Check for corruption indicators
sudo xfs_db -r -c "blockget -n" /dev/sdX

For production systems, I recommend setting up regular checks via cron. This script captures critical metrics:


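#!/bin/bash
DEVICE="/dev/sdX"
MOUNTPOINT="/mount/point"   # where $DEVICE is mounted
LOG="/var/log/xfs_health.log"

{
  date
  xfs_db -r -c "freesp -s" "$DEVICE"          # free-space histogram and summary
  xfs_quota -x -c "report -h" "$MOUNTPOINT"   # quota usage, if quotas are enabled
  df -i "$MOUNTPOINT"                         # inode usage
} >> "$LOG" 2>&1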
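
Once the log accumulates, the freesp summary is the part I would want as numbers rather than text. Below is a minimal sketch that pulls the three summary lines out of `freesp -s` output. It assumes the lines start with "total free extents", "total free blocks", and "average free extent size", which is what I see from xfs_db; confirm the wording on your version. The function name is hypothetical.

```python
import re

# Summary-line prefixes assumed from xfs_db "freesp -s" output.
_SUMMARY_PATTERNS = {
    "free_extents": re.compile(r"^total free extents\s+(\S+)", re.M),
    "free_blocks": re.compile(r"^total free blocks\s+(\S+)", re.M),
    "avg_extent_size": re.compile(r"^average free extent size\s+(\S+)", re.M),
}


def parse_freesp_summary(text):
    """Extract the summary figures from freesp -s output as floats.

    Keys whose line is missing or not numeric are simply left out.
    """
    summary = {}
    for key, pattern in _SUMMARY_PATTERNS.items():
        match = pattern.search(text)
        if not match:
            continue
        try:
            summary[key] = float(match.group(1))
        except ValueError:
            continue
    return summary
```

A falling average free extent size over successive runs suggests free space is fragmenting, which is a reasonable cue to look at xfs_fsr.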

Linux kernels 4.15 and later, paired with xfsprogs 4.15 or later, support online XFS scrubbing:


# Run a scrub on demand (schedule it monthly from cron)
sudo xfs_scrub -v /mount/point

# Dry run: report problems without repairing or optimizing anything
sudo xfs_scrub -n -v /mount/point

Watch for these warning signs:

  • Rapid growth of metadata blocks (compare the size of successive xfs_metadump images)
  • Unexpected free space discrepancies
  • Growing number of stale inodes (check with xfs_repair -n on the unmounted device)

If you suspect corruption despite monitoring, first try:


# Unmount cleanly (avoid umount -l: a lazy unmount can leave the filesystem active)
sudo umount /mount/point

# Run repair (WARNING: requires unmounted FS)
sudo xfs_repair -v /dev/sdX
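
Because running xfs_repair against a mounted device is exactly the mistake to avoid, I like a guard in front of it. The sketch below checks /proc/mounts-style text for the device before anything else happens. It is a simplification under stated assumptions: it matches the device name literally, so a filesystem mounted through a different alias (a /dev/mapper name, for example) would be missed, and the function names are mine.

```python
def mounted_at(device, mounts_text):
    """Return the mount points where `device` appears in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line starts with: <device> <mount point> <fstype> ...
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def safe_to_repair(device, mounts_path="/proc/mounts"):
    """True when `device` is not listed as mounted (literal name match only)."""
    with open(mounts_path) as f:
        return not mounted_at(device, f.read())
```

A wrapper script could call safe_to_repair("/dev/sdX") and refuse to start xfs_repair when it returns False.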