XFS Filesystem Corruption Recovery: Proper xfs_repair Usage After Power Failure


This is a classic scenario after abrupt power loss: the XFS log (journal) holds metadata changes that have not yet been replayed. The key misunderstanding is that xfs_repair isn't failing; it's telling you exactly what it needs:

# Typical error message:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.

Here's the full workflow when encountering this state:

  1. Boot from rescue media (confirm your version supports XFS)
  2. Attempt initial repair:
    xfs_repair /dev/sdX
  3. If it reports unreplayed log changes, mount and unmount to replay the log:
    mount /dev/sdX /mnt
    umount /mnt
  4. Retry repair:
    xfs_repair /dev/sdX
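
The four steps above can be sketched as a small script. This is only an illustration under stated assumptions: the function names needs_log_replay and replay_and_repair are made up for this example, the device and mount point are placeholders, and the detection simply matches the wording of the error quoted earlier, which may differ between xfsprogs versions.

```shell
#!/bin/sh
# Sketch of the replay-then-repair workflow (hypothetical helper names).

# Returns 0 if captured xfs_repair output contains the dirty-log error.
needs_log_replay() {
    printf '%s\n' "$1" | grep -q 'valuable metadata changes in a log'
}

# Runs the sequence on a device and mount point supplied by the caller.
# Must be run as root; nothing here is executed until the function is called.
replay_and_repair() {
    dev=$1
    mnt=$2
    out=$(xfs_repair "$dev" 2>&1)
    status=$?
    printf '%s\n' "$out"
    if needs_log_replay "$out"; then
        # Mounting replays the log; unmount again before repairing.
        mount "$dev" "$mnt" || return 1
        umount "$mnt" || return 1
        xfs_repair "$dev"
        return $?
    fi
    return "$status"
}
```

It would be called as replay_and_repair /dev/sdX /mnt. If the mount itself fails, the function stops rather than falling back to -L, so that destructive step stays a deliberate manual decision.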

If mounting fails or you suspect deeper corruption:

Option 1: Force log clearing (last resort)

xfs_repair -L /dev/sdX

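Warning: This discards every metadata change still sitting in the log, so recently modified files may be lost or left inconsistent. Use it only when the filesystem cannot be mounted.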

Option 2: Check hardware first

smartctl -a /dev/sdX
badblocks -sv /dev/sdX
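
As a rough sketch of how the hardware check could gate the repair, the helper below inspects the text printed by smartctl -H. The function name smart_health_ok is hypothetical, and the two matched phrases are the ones smartctl commonly prints for ATA/NVMe and SCSI devices; treat them as assumptions, since the wording can vary by device type and smartmontools version.

```shell
#!/bin/sh
# Returns 0 only if the given smartctl -H output reports a healthy drive.
# The matched phrases are assumptions about smartctl's usual wording.
smart_health_ok() {
    printf '%s\n' "$1" |
        grep -Eq 'test result: PASSED|SMART Health Status: OK'
}
```

A possible use is smart_health_ok "$(smartctl -H /dev/sdX)" || echo "drive looks unhealthy, image it before repairing". A passing result does not prove the disk is fine; it only rules out failures the drive itself has already flagged.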

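For critical systems, make sure metadata checksums (CRC) are enabled. There is no kernel command-line parameter for this; it is chosen when the filesystem is created, and it is the default in current xfsprogs:

mkfs.xfs -m crc=1 /dev/sdX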

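The log size is also fixed at creation time (mkfs.xfs creates a new, empty filesystem, so this applies only when building or rebuilding a volume):

mkfs.xfs -l size=64m /dev/sdX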
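
To see how an existing filesystem was created, xfs_info prints its geometry, including a crc= field on filesystems that support metadata checksums. The sketch below pulls that flag out of captured xfs_info output; the function name xfs_crc_flag is hypothetical, and the parsing assumes the usual crc=0 or crc=1 field in that output.

```shell
#!/bin/sh
# Prints the crc flag (0 or 1) found in xfs_info output.
# Returns non-zero if no crc= field is present (for example, very old xfsprogs).
xfs_crc_flag() {
    printf '%s\n' "$1" |
        sed -n 's/.*crc=\([01]\).*/\1/p' |
        head -n 1 |
        grep .
}
```

A possible use is xfs_crc_flag "$(xfs_info /mnt)", run against the mount point of a mounted filesystem.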

Here's how we recovered a production server last quarter:

# After failed xfs_repair attempts
dmesg | grep XFS # Showed journal errors
mount -o ro,norecovery /dev/sdc1 /mnt # Read-only attempt, log not replayed
xfs_db -r -c "check" /dev/sdc1 # Read-only metadata check
umount /mnt # xfs_repair refuses to run on a mounted filesystem
xfs_repair -v /dev/sdc1 # Final repair

When an XFS filesystem is left dirty by a power failure, you're essentially dealing with log transactions that were recorded but never applied. The on-disk metadata remains inconsistent until the log is replayed, which normally happens automatically at mount time. That is why xfs_repair refuses to proceed:

# Typical error message you'll encounter
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.

Here's the proper procedure when xfs_repair asks you to mount the filesystem:

# Step 1: Mount normally so the kernel replays the log
# (do not use norecovery here; that option skips the replay)
mount /dev/sdX /mnt/repair

# Step 2: Then immediately unmount
umount /mnt/repair

# Step 3: Run repair; -L is not needed once the log has been replayed
xfs_repair -v /dev/sdX
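
Because xfs_repair will not run on a mounted filesystem, it can help to confirm the device is really unmounted before the repair step. The sketch below checks the kernel's mount table; the function name is_mounted is hypothetical, and it assumes the device appears in /proc/mounts under the same path you pass in, which is not always true for device-mapper or UUID-based mounts.

```shell
#!/bin/sh
# Returns 0 if the device is listed as a mount source in the mounts table.
# The second argument is optional and defaults to /proc/mounts.
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}
```

A possible use is is_mounted /dev/sdX && echo "still mounted, unmount before running xfs_repair".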

For particularly stubborn cases, you might need to use these additional techniques:

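# Force zeroing of the log (last resort; discards unreplayed metadata changes)
xfs_repair -L /dev/sdX

# Dry run: report problems without modifying anything
xfs_repair -n /dev/sdX

# Cap memory use on very large filesystems
# (-m takes a limit in MB; 2048 is only an example value)
xfs_repair -m 2048 /dev/sdX

# A damaged primary superblock needs no extra flag: xfs_repair searches for
# secondary superblocks on its own (there is no -S option)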

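To reduce the risk of future issues, consider these measures:

# /etc/fstab options for critical partitions. Do not add nobarrier: it disables
# write barriers, which makes corruption after power loss more likely, and
# recent kernels no longer accept it for XFS.
defaults,logbsize=256k,logbufs=8,noatime

# Regular read-only checks (the filesystem must be unmounted)
xfs_repair -n /dev/sdX

# Larger log for large filesystems (set at creation time; mkfs.xfs wipes the device)
mkfs.xfs -l size=1024m /dev/sdX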

If repair attempts consistently fail, try these last-resort options:

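# Copy the filesystem to an image first (source unmounted or mounted read-only)
xfs_copy /dev/sdX /mnt/rescue/image.xfs

# Attempt repair on the copy (-f tells xfs_repair the target is a regular file)
xfs_repair -v -f /mnt/rescue/image.xfs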

# Mount read-only for data recovery
mount -o ro,norecovery /dev/sdX /mnt/recovery
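
Once the read-only mount succeeds, the files still have to be copied somewhere safe. A minimal sketch of that step follows, assuming a destination on a different, healthy disk; the function name salvage_tree and the paths in the usage line are placeholders for illustration.

```shell
#!/bin/sh
# Copies everything under SRC (including dotfiles) into DST, preserving
# ownership, permissions and timestamps. DST is created if it is missing.
salvage_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    cp -a "$src"/. "$dst"/
}
```

A possible use is salvage_tree /mnt/recovery /mnt/rescue/files. On a damaged filesystem cp may report I/O errors for individual files and return non-zero while still copying the rest, so check its output rather than assuming a complete copy.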