Technical Deep Dive: Why Filesystem Read-Only Checks (fsck -n) Fail on Mounted Partitions



When I first ran fsck -n /dev/sda1 on a mounted ext4 partition during a late-night debugging session, the output shocked me:

fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
/dev/sda1 is mounted.

WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.

This seems counterintuitive - if we're just reading the filesystem, why the apocalyptic warning?

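With -n, e2fsck opens the device read-only and answers "no" to every prompt, so the check itself does not modify the filesystem. The warning is printed for any mounted filesystem, whatever flags you pass. The real problem with a read-only check is that its results cannot be trusted, because fsck reads the block device directly while the kernel keeps changing it:

  1. Metadata Cache Inconsistency: The kernel holds newer versions of inodes, directory entries, and bitmaps in memory that have not yet been written to disk
  2. Journaling Race Conditions: Transactions may be committed to the journal but not yet written to their final location, and with -n the journal is not replayed
  3. Concurrent Writeback: The kernel can flush dirty blocks while fsck is midway through a pass, so structures read at different moments may contradict each other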
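
Since all three problems only arise when the filesystem is mounted, a script can test for that before it ever calls fsck. Here is a minimal sketch, assuming the device shows up in /proc/mounts under the same name you pass in (device-mapper volumes and /dev/root often do not). The function name is_mounted and its optional second parameter are my own, added so the logic can be tried against a sample file:

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Exit status 0 if DEVICE is listed as a mount source, non-zero otherwise.
# MOUNTS_FILE defaults to /proc/mounts; the parameter is a hypothetical
# convenience for trying the function on a sample file.
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    # Field 1 of each line is the mount source; require an exact match
    # so that /dev/sda1 does not match /dev/sda10.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Example use:
#   if is_mounted /dev/sda1; then
#       echo "/dev/sda1 is mounted; fsck -n output would be unreliable" >&2
#   fi
```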

Let's demonstrate with a dangerous (don't try this in production) experiment:

# Create test environment
dd if=/dev/zero of=testfs.img bs=1M count=100
mkfs.ext4 testfs.img
mkdir -p /mnt/test
mount -o loop testfs.img /mnt/test

# Simulate concurrent access
(
    while true; do
        echo "test" > /mnt/test/tempfile
        rm /mnt/test/tempfile
    done
) &

Now attempt a read-only check:

fsck -n testfs.img

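You'll often see false corruption reports, such as wrong free block or inode counts and bitmap differences, even though the filesystem is consistent from the kernel's point of view. Stop the background loop and unmount /mnt/test when you are done.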

For production systems needing live checks:

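  • XFS: xfs_scrub checks a mounted filesystem online (still marked experimental in many xfsprogs releases); xfs_repair -n expects the filesystem to be unmounted
  • Btrfs: btrfs scrub start verifies checksums while mounted; btrfs check --readonly is meant for unmounted filesystems
  • ZFS: zpool scrub handles online verification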

For traditional filesystems, the only safe approaches are:

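# Method 1: Remount read-only (fails if files are open for writing)
mount -o remount,ro /mnt/point
e2fsck -fn /dev/sda1          # report only; repair from an unmounted state
mount -o remount,rw /mnt/point

# Method 2: Check an LVM snapshot instead of the live volume
lvcreate -s -n rootsnap -L 1G /dev/vg00/root
e2fsck -fy /dev/vg00/rootsnap   # journal replay and fixes touch only the snapshot
lvremove -f /dev/vg00/rootsnap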
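
The snapshot method is easy to get wrong by leaving a stale snapshot behind, so it helps to wrap it. The following is only a sketch under my own assumptions: the function name check_via_snapshot, the _fsck_snap suffix, and the DRY_RUN switch are all hypothetical, and by default it only prints the commands it would run. It uses e2fsck -fy on the snapshot so the journal is replayed there, which avoids the false positives an unreplayed journal would cause.

```shell
# check_via_snapshot VG LV [SIZE]
# With DRY_RUN=1 (the default) the commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually perform the check.
check_via_snapshot() {
    vg=$1 lv=$2 size=${3:-1G}
    snap="${lv}_fsck_snap"   # hypothetical snapshot name

    # Helper: echo the command in dry-run mode, otherwise execute it.
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    run lvcreate -s -n "$snap" -L "$size" "/dev/$vg/$lv" || return 1
    # -f forces a full check; -y lets e2fsck replay the journal and fix
    # things, but only inside the throwaway snapshot.
    run e2fsck -fy "/dev/$vg/$snap"
    rc=$?
    # Remove the snapshot whether or not the check found problems.
    run lvremove -f "/dev/$vg/$snap"
    return "$rc"
}

# Example use (prints three commands, changes nothing):
#   check_via_snapshot vg00 root
```

A non-zero return from e2fsck on the snapshot suggests the live volume deserves a proper offline check; it does not repair the original.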

For emergency debugging, consider these least-dangerous options:

# 1. Dump critical metadata first
debugfs -R "stats" /dev/sda1 > fs_metadata_backup.txt

# 2. Use filesystem-specific safe queries
tune2fs -l /dev/sda1 | grep -i "state\|errors"

# 3. Inspect only specific suspicious inodes (debugfs opens the device read-only by default)
debugfs -R "stat <inode_number>" /dev/sda1
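
The tune2fs query above is the one I would script first, because it only reads the superblock. A minimal sketch, assuming the usual "Filesystem state:" line in tune2fs -l output; the helper names fs_state and needs_check are mine:

```shell
# fs_state: read `tune2fs -l` output on stdin and print the value of
# the "Filesystem state:" field (for example "clean").
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# needs_check: succeed if the recorded state is anything other than
# exactly "clean". A missing field also counts as "needs a check".
needs_check() {
    state=$(fs_state)
    [ "$state" != "clean" ]
}

# Example use (needs root and a real device):
#   tune2fs -l /dev/sda1 | needs_check && echo "schedule an offline fsck"
```

Keep in mind the superblock state is only what the kernel last recorded, so "clean" here is a hint, not a verdict.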

Running fsck on a mounted partition is like servicing an engine while it is running. The fundamental problem is that the filesystem's on-disk state keeps changing during the check, so even read-only checks can produce false positives. Without -n the same warning appears, and this time any "repair" really would be written to the live filesystem:


# Example of dangerous fsck attempt
$ sudo fsck /dev/sda1
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
/dev/sda1 is mounted.
WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.

At first glance, read-only operations might seem harmless, but consider these technical realities:

  • Metadata caching: The kernel maintains in-memory copies of critical structures like inodes and directory entries
  • In-flight transactions: Multi-block updates may be only partly on disk when fsck reads them
  • Journal replay: ext4 and other journaling filesystems hold committed transactions in the journal, which a read-only check does not replay

When you absolutely need to check a mounted filesystem, consider these safer alternatives:


# 1. Use the filesystem's built-in online check (Btrfs: foreground, read-only scrub)
$ sudo btrfs scrub start -B -r /mnt/point

# 2. Use lsof to list the processes holding files open on the filesystem
$ sudo lsof +f -- /dev/sda1 | awk 'NR > 1 {print $1}' | sort -u

# 3. For XFS (online check, no repairs; experimental in many releases)
$ sudo xfs_scrub -n /mnt/point

For comprehensive checking, follow this bulletproof sequence:


# 1. Unmount cleanly
$ sudo umount /dev/sda1

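# 2. Run fsck, answering "yes" to every repair prompt (omit -y to be asked interactively)
$ sudo fsck -y /dev/sda1

# 3. Alternative: Force check on next reboot
#    (/forcefsck is honored by sysvinit-style boot scripts; on systemd
#    systems, add fsck.mode=force to the kernel command line instead)
$ sudo touch /forcefsck
$ sudo shutdown -r now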

Different filesystems handle live checking differently:

Filesystem   Live Check Support   Recommended Command
ext4         No                   umount first, then e2fsck -f
XFS          Limited              xfs_scrub (online); xfs_repair -n (unmounted)
Btrfs        Yes                  btrfs scrub start
ZFS          Yes                  zpool scrub
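
If you script this decision, the table boils down to a lookup on the filesystem type (which you could get from lsblk -no FSTYPE /dev/sda1). A rough sketch; the function name suggest_check and the DEVICE, MOUNTPOINT, and POOL placeholders are hypothetical, and it only prints a suggestion rather than running anything:

```shell
# suggest_check FSTYPE
# Prints whether the usual check runs online or offline, plus the
# command to use. Returns 1 for filesystem types it does not know.
suggest_check() {
    case $1 in
        ext2|ext3|ext4) echo "offline: e2fsck -f DEVICE" ;;
        xfs)            echo "offline: xfs_repair -n DEVICE" ;;
        btrfs)          echo "online: btrfs scrub start MOUNTPOINT" ;;
        zfs)            echo "online: zpool scrub POOL" ;;
        *)              echo "unknown filesystem type: $1" >&2
                        return 1 ;;
    esac
}

# Example use:
#   suggest_check "$(lsblk -no FSTYPE /dev/sda1)"
```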

Remember that even "safe" live checks can't replace proper offline verification for critical repairs.