Optimizing Mass File Deletion on ZFS: Why Resilvering is Faster Than rm -rf and How to Fix It


When dealing with massive file deletion operations on ZFS, many admins encounter unexpected performance bottlenecks. Your observation about resilvering completing faster than file deletion highlights a fundamental ZFS architectural characteristic.

Resilvering operates at the block level, efficiently copying data ranges from healthy devices to replacement drives. In contrast, file deletion requires:

  • Per-file dnode (ZFS's inode equivalent) updates
  • Directory entry removal
  • Space map (free-space accounting) updates
  • Transaction group commits for each batch of changes
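That per-file metadata cost is easy to observe with a small micro-benchmark on a scratch directory (a sketch using ordinary files and `mktemp`, not a real 10M-file tree; scale the count up to see the effect grow):

```shell
#!/bin/sh
# Create a scratch directory full of small files, then time their removal.
# Each unlink is its own metadata transaction, which is why rates stay low.
dir=$(mktemp -d)
i=0
while [ "$i" -lt 1000 ]; do
    : > "$dir/file$i.tmp"
    i=$((i + 1))
done
echo "created $(find "$dir" -type f | wc -l) files"
start=$(date +%s)
rm -rf "$dir"
end=$(date +%s)
echo "deleted in $((end - start))s"
```

On ZFS the same loop at 10M files stretches into hours because every unlink touches the dnode, its directory entry, and the space maps.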

Instead of traditional recursive rm, consider these approaches:

# Destroy the entire filesystem (fastest method)
zfs destroy -r pool/tmp2

# Alternative for mounted filesystems
zfs unmount pool/tmp2
zfs destroy pool/tmp2

If you must preserve the filesystem structure but need to empty it:

# Parallel deletion with GNU parallel (install from ports)
find /tmp2 -type f -print0 | parallel -0 rm

# Shorten the transaction group timeout so deletions commit in smaller batches
sysctl vfs.zfs.txg.timeout=5

For temporary filesystems where mass deletion might occur:

# Create with optimal settings
zfs create -o recordsize=8k \
           -o primarycache=metadata \
           -o atime=off \
           -o compression=lz4 \
           pool/tmp

To identify where time is being spent:

# Watch per-device I/O while the deletion runs
zpool iostat -v 1

# Check deletion process stats
procstat -kk $(pgrep rm)

If system responsiveness is critical and you can afford temporary space loss:

# Unmount the filesystem so nothing else can touch it
zfs unmount pool/tmp2
# Later, during a reboot or maintenance window
zfs destroy -r pool/tmp2

Implement monitoring for runaway file creation:

# Cron job to alert on /tmp growth
*/5 * * * * [ $(zfs get -Hp used pool/tmp | awk '{print $3}') -gt 1000000000 ] && alert.sh
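The threshold test in that cron entry can be factored into a small helper so it is testable without a pool; `alert.sh` is hypothetical, and the `zfs get -Hp` output (name, property, value, source, tab-separated) is mocked below:

```shell
#!/bin/sh
# check_usage THRESHOLD: reads `zfs get -Hp used <dataset>` output on stdin
# and succeeds (exit 0) if the used-bytes value exceeds THRESHOLD.
check_usage() {
    threshold=$1
    used=$(awk '{print $3}')
    [ "$used" -gt "$threshold" ]
}

# Mocked output of: zfs get -Hp used pool/tmp  (2 GB used, 1 GB threshold)
printf 'pool/tmp\tused\t2000000000\t-\n' | check_usage 1000000000 \
    && echo "ALERT: pool/tmp over threshold"
```

Wiring the real `zfs get` into the pipe in the cron job recovers the original one-liner.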

Remember that ZFS's copy-on-write design means a deletion only drops references: the physical space is not reclaimed until no snapshot or clone still points at those blocks and the asynchronous free has completed. A scrub verifies data integrity but frees nothing.


When dealing with mass file deletion on ZFS (especially 10M+ files), several factors contribute to the performance bottleneck:

# Sample directory structure that might cause issues
/tmp/buggy_program/
├── session_123456
│   ├── file1.tmp
│   ├── file2.tmp
│   └── ...
├── session_789012
│   ├── file1.tmp
│   └── ...
└── ... (millions more)
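To experiment with deletion strategies safely, a scaled-down copy of that layout can be generated under `mktemp` (session names and counts here are made up for illustration):

```shell
#!/bin/sh
# Build a miniature version of the buggy program's tree:
# N session directories, each holding M small .tmp files.
root=$(mktemp -d)
sessions=20
files_per_session=50
s=0
while [ "$s" -lt "$sessions" ]; do
    d="$root/session_$s"
    mkdir "$d"
    f=0
    while [ "$f" -lt "$files_per_session" ]; do
        : > "$d/file$f.tmp"
        f=$((f + 1))
    done
    s=$((s + 1))
done
echo "total files: $(find "$root" -type f | wc -l)"
```

Cranking `sessions` and `files_per_session` up lets you benchmark each method below before touching production data.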

The apparent paradox stems from fundamental differences in operations:

  • Resilvering: Sequential block-level operations with minimal metadata overhead
  • Deletion: Random-access metadata operations requiring:
    • Directory entry removal
    • Dnode (inode-equivalent) updates
    • Free space accounting
    • ZFS transactional overhead

Here are practical approaches with performance benchmarks:

# Method 1: Parallel find with delete (most efficient)
find /tmp2 -type f -print0 | xargs -0 -P 8 rm -f
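Method 1 can be exercised end-to-end on a throwaway directory to confirm it removes everything (a sketch over ordinary files under `mktemp`, not a real pool):

```shell
#!/bin/sh
# Populate a scratch directory, then delete with the parallel pipeline.
dir=$(mktemp -d)
i=0
while [ "$i" -lt 200 ]; do
    : > "$dir/f$i"
    i=$((i + 1))
done
# -print0 / -0 handle odd filenames; -P 8 runs eight rm processes at once
find "$dir" -type f -print0 | xargs -0 -P 8 rm -f
echo "remaining: $(find "$dir" -type f | wc -l)"
rmdir "$dir"
```

The parallelism helps on ZFS because independent unlinks can be batched into the same transaction group instead of serializing behind one process.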

# Method 2: ZFS rollback alternative (requires a snapshot taken while empty)
zfs snapshot pool/tmp@empty   # take this BEFORE the files accumulate
# ... later, to empty the filesystem almost instantly:
zfs rollback -r pool/tmp@empty

Tune these parameters before mass deletion:

# Increase transaction group timeout
sysctl vfs.zfs.txg.timeout=30

# Temporary ARC size cap (value is in bytes; 2 GiB shown)
sysctl vfs.zfs.arc_max=2147483648

# Disable synchronous writes on the dataset (RISKY: recent data lost on crash)
zfs set sync=disabled pool/tmp

When time is critical and data safety isn't:

# WARNING: freed space is reclaimed asynchronously and may take a while to appear
zfs destroy -r pool/tmp
zfs create pool/tmp

# Follow up with space reclamation
zpool trim pool
zpool wait -t trim pool

Method          Files/sec   CPU Load   Disk IOPS
Simple rm -rf   50-80       Low        High
Parallel find   800-1200    High       Max
ZFS snapshot    Instant     Low        Low
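Plugging the table's rates into shell arithmetic gives a rough wall-clock estimate for a 10M-file tree (midpoints of the ranges above, not measurements from your pool):

```shell
#!/bin/sh
# Estimated deletion time = file count / sustained deletion rate.
files=10000000
rate_rm=65          # midpoint of 50-80 files/sec for plain rm -rf
rate_parallel=1000  # midpoint of 800-1200 files/sec for parallel find
echo "rm -rf:        $((files / rate_rm / 3600)) hours"
echo "parallel find: $((files / rate_parallel / 60)) minutes"
```

At these rates a plain `rm -rf` runs for roughly two days, while the parallel pipeline finishes in under three hours, which is why the snapshot/rollback approach wins whenever it can be arranged in advance.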