As developers working with storage systems, filesystems, or fault-tolerant applications, we often need to test how our software behaves when underlying block devices fail. Real hardware failures are unpredictable, making controlled testing challenging. Here are several reliable methods to simulate I/O errors.
The device-mapper's 'flakey' target is perfect for simulating intermittent failures:
# Create a flakey device: passes I/O for 1 second, then fails all I/O for 2 seconds, repeating
sudo dmsetup create flakey-device --table "0 102400 flakey /dev/sdb1 0 1 2"
This creates a device that will:
- Pass I/O through normally for 1 second (the up interval)
- Fail all I/O with EIO for the next 2 seconds (the down interval)
- Repeat this cycle for as long as the device exists (see the quick check below)
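A quick way to confirm the timing is to read through the device while watching the kernel log; reads issued during a down window fail with EIO:

# Read through the flakey device; expect intermittent failures
sudo dd if=/dev/mapper/flakey-device of=/dev/null bs=4096 count=5000 iflag=direct
dmesg | tail   # look for "Buffer I/O error on dev dm-..." lines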
If you're testing kernel modules or low-level systems:
# Enable fault injection for block requests (requires CONFIG_FAIL_MAKE_REQUEST
# and debugfs mounted)
echo 100 > /sys/kernel/debug/fail_make_request/probability   # fail 100% of eligible requests
echo -1 > /sys/kernel/debug/fail_make_request/times          # no limit on failure count
echo 1 > /sys/block/sdX/make-it-fail                          # opt this device in
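If you only want a single process to see failures, the fault-injection framework also has a task filter. A minimal sketch, assuming the same debugfs interface is available (note the exec, so the workload runs in the task that opted in):

# Limit injection to tasks that opt in via /proc/<pid>/make-it-fail
echo 1 > /sys/kernel/debug/fail_make_request/task-filter
bash -c 'echo 1 > /proc/self/make-it-fail; exec dd if=/dev/sdX of=/dev/null bs=4096 count=100'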
For application-level testing without root access:
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>

/* Fail every read with EIO; a usable test fs also needs getattr/open/readdir,
   omitted here for brevity */
static int read_error(const char *path, char *buf, size_t size,
                      off_t offset, struct fuse_file_info *fi)
{
    return -EIO; /* Simulate read error */
}

static struct fuse_operations ops = {
    .read = read_error,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &ops, NULL); /* Mount with: ./errorfs /mnt/fuse -f */
}
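To try it, build against libfuse3; the package name varies by distro (libfuse3-dev on Debian/Ubuntu is an assumption here), and errorfs.c is the file name used for the sketch above:

gcc -Wall errorfs.c $(pkg-config fuse3 --cflags --libs) -o errorfs
mkdir -p /mnt/fuse
./errorfs /mnt/fuse -f   # -f keeps it in the foreground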
To detect actual media errors (note that badblocks scans the device for existing bad sectors rather than simulating failures):

# Read-only scan, 4 KiB blocks, stop after the first bad block found
sudo badblocks -sv -b 4096 -e 1 /dev/sdX
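If you need a genuine uncorrectable sector rather than a scan, hdparm can create one on drives that support WRITE_UNCORRECTABLE_EXT. This is destructive, and the sector number below is only an example:

# DESTRUCTIVE: flag LBA 1000000 as uncorrectable (example sector number)
sudo hdparm --make-bad-sector 1000000 --yes-i-know-what-i-am-doing /dev/sdX
# Undo it later by rewriting the sector
sudo hdparm --repair-sector 1000000 --yes-i-know-what-i-am-doing /dev/sdX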
For distributed systems testing, combine dm-flakey with network partitioning:
# On node1: make the local disk flaky with dm-flakey
sudo dmsetup create flaky-disk --table "0 $(blockdev --getsz /dev/sdb) flakey /dev/sdb 0 1 1"

# On node2: drop iSCSI traffic (port 3260) to simulate a network partition
sudo iptables -A INPUT -p tcp --dport 3260 -j DROP
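A quick sanity check from node1 that the partition is in effect, using bash's /dev/tcp (assumes node2 resolves to the peer's address):

timeout 3 bash -c 'cat < /dev/null > /dev/tcp/node2/3260' \
    || echo "port 3260 unreachable - partition active"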
Remember to clean up test devices after use:

sudo dmsetup remove flakey-device
sudo umount /mnt/fuse
# If you added the iptables partition rule, delete it as well
sudo iptables -D INPUT -p tcp --dport 3260 -j DROP
Testing error handling is crucial for robust storage systems. Developers need to verify how their applications behave when disks fail or return errors. Here are common scenarios where you might want to simulate I/O errors:
- Filesystem error handling verification
- Database recovery testing
- Distributed storage system fault tolerance checks
- RAID controller failure scenarios
The most reliable method is using Linux's Device Mapper to create a virtual block device that produces errors:
# Create a 1 GiB backing image on a loop device
dd if=/dev/zero of=error_image bs=1M count=1024
losetup /dev/loop0 error_image

# Set up dm-error: the dm table needs two lines, one per segment,
# so pipe them in with printf rather than a single --table string
printf '0 2097152 linear /dev/loop0 0\n2097152 512 error\n' | dmsetup create error-dev
This creates a device where the first 1 GiB (2097152 sectors of 512 bytes) works normally, followed by a 512-sector region that always fails. Any I/O touching the error region returns EIO.
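You can demonstrate the boundary directly with dd; the last sector of the linear region reads fine while the next one returns EIO:

# Last good sector (succeeds)
sudo dd if=/dev/mapper/error-dev of=/dev/null bs=512 count=1 skip=2097151
# First error sector (fails with EIO)
sudo dd if=/dev/mapper/error-dev of=/dev/null bs=512 count=1 skip=2097152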
For more dynamic control, use the kernel's fault injection capabilities:
# Make the device eligible for injected failures
echo 1 > /sys/block/sdX/make-it-fail

# Configure failure parameters under debugfs (probability, times, etc.)
echo 10 > /sys/kernel/debug/fail_make_request/probability   # fail ~10% of requests
echo 5 > /sys/kernel/debug/fail_make_request/times          # stop after 5 failures
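When the run is finished, turn injection back off so the device behaves normally again:

echo 0 > /sys/block/sdX/make-it-fail
echo 0 > /sys/kernel/debug/fail_make_request/probability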
For ext2/3/4 filesystems, debugfs can corrupt specific blocks:
debugfs -w /dev/sdX1
debugfs:  clri /path/to/file     # Clears the file's inode
debugfs:  freeb <block> <count>  # Marks in-use blocks as free, corrupting the filesystem
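After corrupting the filesystem, confirm the damage is actually detectable; a read-only check is safe:

# Force a consistency check without modifying anything; it should report the damage
sudo e2fsck -fn /dev/sdX1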
When testing network storage (iSCSI, NBD), use tc for network errors:
# Drop 10% of packets and corrupt 5% of them on eth0
sudo tc qdisc add dev eth0 root netem loss 10% corrupt 5%
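netem settings persist until removed, so inspect and delete the qdisc once the test is done:

tc qdisc show dev eth0            # confirm the netem rule is active
sudo tc qdisc del dev eth0 root   # restore normal networking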
When implementing these methods:
- Always use separate test machines or VMs
- Monitor dmesg for actual error messages
- Test both read and write failures
- Combine with stress-ng for realistic load (see the sketch after this list)
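As a sketch of the stress-ng suggestion, something like the following drives mixed disk I/O against a filesystem backed by one of the error devices above; /mnt/test is an assumed mount point:

# 4 workers writing/reading 256M each for 60 s; watch dmesg in parallel
stress-ng --hdd 4 --hdd-bytes 256M --timeout 60s --temp-path /mnt/test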
Here's how to test PostgreSQL failure recovery:
# Create an error device: the first 1048576 sectors (512 MiB) map to /dev/sdb1,
# then a 2048-sector (1 MiB) region returns errors (length chosen for illustration)
printf '0 1048576 linear /dev/sdb1 0\n1048576 2048 error\n' | dmsetup create pg-error

# Mount as the PostgreSQL data directory
mount /dev/mapper/pg-error /var/lib/postgresql/14/main

# The database will start failing when it touches the error region
systemctl start postgresql
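While the test runs, watch both the kernel log and PostgreSQL's own log for the resulting errors; the journald unit name varies by distro, so postgresql below is an assumption:

# Kernel-side view of the failing device
dmesg -w | grep -i 'i/o error' &
# PostgreSQL's view (unit name is distro-dependent)
journalctl -u postgresql -f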