Engineering Controlled XFS Corruption: Reliable Methods for Testing xfs_repair on Multi-Terabyte Filesystems



When dealing with 50TB XFS filesystems, we hit a critical threshold where memory requirements for xfs_repair become non-trivial. Historical benchmarks suggest a rule of thumb, expressed here as a small Python helper:

# Rule-of-thumb memory estimate: 2GB per TB plus 150MB per million inodes
def required_ram_gb(fs_size_tb, inode_count):
    return fs_size_tb * 2.0 + (inode_count / 1e6) * 0.15

This means our 50TB test case could demand 100GB+ RAM during repair operations - a perfect scenario for controlled corruption testing.
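For instance, plugging the 50TB case into that helper with an assumed ~100 million inodes:

print(required_ram_gb(50, 100e6))  # 115.0 -> ~115GB, consistent with the 100GB+ figure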

For reproducible corruption scenarios, consider these methods:

# Method 1: Direct superblock manipulation (the primary superblock sits at offset 0)
dd if=/dev/zero of=/dev/sdX bs=512 count=1 conv=notrunc

# Method 2: Inode core corruption (xfs_db is the XFS analogue of ext4's debugfs)
xfs_db -x -c "inode <INODE_NUMBER>" -c "write core.mode 0" /dev/sdX

# Method 3: Superblock UUID sabotage (log recovery fails on the sb/log mismatch;
# v5/CRC filesystems may refuse a literal value and need "uuid rewrite" instead)
xfs_db -x /dev/sdX << EOF
uuid 00000000-0000-0000-0000-000000000000
EOF
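A quick read-back confirms the UUID change actually landed on disk before any repair run:

xfs_db -c "sb 0" -c "print uuid" /dev/sdX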

For systematic testing, I've developed this Python snippet that creates controlled damage patterns:

import os
import struct

# Offsets assume the XFS default 512-byte sector size
SECTOR = 512

def corrupt_xfs(device, pattern):
    with open(device, 'r+b') as f:
        if pattern == 'SB':
            # Primary superblock occupies the first sector of the device
            f.seek(0)
            f.write(b'\x00' * SECTOR)
        elif pattern == 'AGF':
            # AG 0's free-space header (AGF) is the second sector
            f.seek(1 * SECTOR)
            f.write(struct.pack('>Q', 0xFFFFFFFFFFFFFFFF))
        elif pattern == 'INODE':
            # Arbitrary early offset; locate a real inode cluster with
            # xfs_db ("inode <N>" then "daddr") for precise targeting
            f.seek(1024 * SECTOR)
            f.write(b'\xDE\xAD\xBE\xEF')
        f.flush()
        os.fsync(f.fileno())  # ensure the damage actually reaches the disk
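A minimal invocation, assuming a dedicated scratch device behind the placeholder path:

corrupt_xfs('/dev/sdX', 'SB')  # placeholder device; never point this at real data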

Based on production failures, these are the most valuable test scenarios:

  • Partial journal writes during power failure (overwrite a block inside the internal log; see the sketch after this list)
  • AG header mismatches (use xfs_db to modify allocation group counters)
  • Broken unlinked-inode chains (manipulate di_next_unlinked fields via xfs_db)
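A hedged sketch of the first scenario, assuming a scratch device at /dev/sdX: xfs_db's convert command turns the encoded logstart block number into a plain 512-byte disk address for dd to seek to (the awk/tr parsing of xfs_db's output is an assumption about its formatting; verify on your version).

# Simulated torn write: randomize 4KB inside the internal log
LOGSTART=$(xfs_db -c "sb 0" -c "print logstart" /dev/sdX | awk '{print $3}')
DADDR=$(xfs_db -c "convert fsblock $LOGSTART daddr" /dev/sdX | tr -d '()' | awk '{print $2}')
dd if=/dev/urandom of=/dev/sdX bs=512 count=8 seek=$DADDR conv=notrunc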

After creating corruption, verify it's detectable but repairable:

# Check corruption detection
xfs_repair -n /dev/sdX

# Measure repair resources
/usr/bin/time -v xfs_repair /dev/sdX

The output should show both the corruption detection and memory usage patterns we need to validate.


When testing filesystem repair tools like xfs_repair on massive (50TB+) XFS filesystems, creating predictable corruption scenarios is crucial for:

  • Validating repair effectiveness at scale
  • Benchmarking memory requirements (historically ~2GB/TB)
  • Reproducing edge cases for development

Here are three technically sound approaches I've used in production testing environments:

1. Direct Superblock Manipulation

The most reliable method for consistent corruption:

# The primary superblock always occupies the start of the device (daddr 0);
# confirm with xfs_db
xfs_db -c "sb 0" -c "daddr" /dev/sdX

# Backup original (REQUIRED)
dd if=/dev/sdX of=superblock.bak bs=4096 count=1

# Corrupt with zeroes
dd if=/dev/zero of=/dev/sdX bs=4096 count=1 conv=notrunc
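If a run needs to be rolled back, the saved copy goes straight back into place:

dd if=superblock.bak of=/dev/sdX bs=4096 count=1 conv=notrunc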

2. Inode Table Damage Simulation

Simulates common corruption patterns found in large filesystems:

# Find the root inode number, then the disk address of its inode cluster
xfs_db -c "sb 0" -c "print rootino" /dev/sdX
xfs_db -c "inode [rootino]" -c "daddr" /dev/sdX

# Corrupt inode ranges from that address (daddr values are in 512-byte
# units; adjust count for severity)
dd if=/dev/urandom of=/dev/sdX bs=512 count=1000 seek=[daddr] conv=notrunc
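A quick spot-check using the same placeholder inode number: xfs_db should now print a wrong core.magic (normally 0x494e) or refuse to read the inode at all, either of which confirms the damage took hold.

xfs_db -c "inode [rootino]" -c "print core.magic" /dev/sdX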

3. Journal Corruption

Creates recoverable but complex corruption scenarios:

# Locate the internal log (logstart is an encoded filesystem block number)
xfs_db -c "sb 0" -c "print logstart" /dev/sdX

# Convert it to a 512-byte disk address
xfs_db -c "convert fsblock [logstart] daddr" /dev/sdX

# Partial journal overwrite at that address (preserves some recovery data)
echo "CORRUPTED_JOURNAL_HEADER" | dd of=/dev/sdX bs=512 count=1 seek=[daddr] conv=notrunc

Based on historical data from 29TB repairs requiring 75GB RAM+swap:

Filesystem Size   Estimated RAM Required   Inode Factor
10TB              20GB                     +100MB/million inodes
50TB              100GB                    +500MB/million inodes
100TB             200GB+                   +1GB/million inodes

After creating controlled corruption:

# Check corruption detection
xfs_repair -n /dev/sdX

# Force repair with a memory cap (xfs_repair's -m option takes megabytes)
xfs_repair -m 81920 /dev/sdX

# Verify structure (xfs_check is deprecated; a clean dry run is the modern check)
xfs_repair -n /dev/sdX

Throughout, keep these safeguards:
  • Always work on unmounted filesystems
  • Maintain complete block device backups (dd or LVM snapshots)
  • Document exact corruption coordinates for reproducibility
  • Consider using test partitions before production systems
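
Putting it together, a minimal end-to-end cycle on a scratch device, assuming the placeholder paths used above and an ~80GB memory cap:

dd if=/dev/sdX of=sb.bak bs=4096 count=1                  # backup primary superblock
dd if=/dev/zero of=/dev/sdX bs=4096 count=1 conv=notrunc  # corrupt it
xfs_repair -n /dev/sdX                                    # dry run should flag the damage
/usr/bin/time -v xfs_repair -m 81920 /dev/sdX             # repair under the memory cap
xfs_repair -n /dev/sdX                                    # second dry run should come back clean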