When dealing with massive file deletion operations on ZFS, many admins encounter unexpected performance bottlenecks. Your observation about resilvering completing faster than file deletion highlights a fundamental ZFS architectural characteristic.
Resilvering operates at the block level, walking allocated data in large, mostly sequential ranges and copying it from the healthy devices to the replacement drive. In contrast, deleting millions of individual files requires, per file:
- Updates to the file's dnode (ZFS's equivalent of an inode)
- Directory entry removal
- Space map updates (ZFS's free space accounting) for every freed block
- Participation in transaction group commits, which batch and throttle the work
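To make that overhead visible on a live system, you can watch how slowly the dataset's used space drains while the rm is running. This is a minimal sketch; pool/tmp2 is the example dataset from this thread and the 5-second interval is just a convenient sampling rate:
# Sample the dataset's used space while any rm process is still alive
while pgrep -q rm; do
    zfs get -Hp -o value used pool/tmp2
    sleep 5
done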
Instead of a traditional recursive rm, consider these approaches:
# Destroy the entire filesystem (fastest method)
zfs destroy -r pool/tmp2
# If the destroy complains that the filesystem is busy, unmount it first
zfs unmount pool/tmp2
zfs destroy pool/tmp2
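If applications expect /tmp2 to keep existing, recreate an empty dataset in its place right away. A small sketch, assuming the dataset was mounted at /tmp2 as the paths in this thread suggest:
# Recreate an empty dataset on the same mountpoint
zfs create -o mountpoint=/tmp2 pool/tmp2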
If you must keep the dataset itself and only need to empty it:
# Parallel deletion with GNU parallel (install from ports)
find /tmp2 -type f -print0 | parallel -0 -X rm   # -X packs many files into each rm invocation
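Parallel rm only removes the files; the (possibly huge) directory tree is left behind. A follow-up sketch to clear it, assuming /tmp2 is the mountpoint itself and should survive:
# Remove the now-empty directories, deepest first, without touching /tmp2 itself
find /tmp2 -mindepth 1 -depth -type d -empty -delete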
# Lengthen the transaction group timeout so more frees batch into each commit (default is 5 s)
sysctl vfs.zfs.txg.timeout=30
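This is a temporary knob; note the value you started from so it can be restored once the cleanup is done. The sysctl name is the FreeBSD/OpenZFS one used elsewhere in this thread:
# Print the current value before changing it
sysctl vfs.zfs.txg.timeout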
For temporary filesystems where mass deletion might occur:
# Create with settings suited to many small, short-lived files
zfs create -o recordsize=8k \
-o primarycache=metadata \
-o atime=off \
-o compression=lz4 \
pool/tmp
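To double-check the result (or to compare against an existing dataset), zfs get accepts a comma-separated property list; a quick verification sketch:
# Confirm the tuned properties on the new dataset
zfs get -o property,value recordsize,primarycache,atime,compression pool/tmp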
To identify where time is being spent:
# Watch per-vdev I/O on the pool while the deletion runs
zpool iostat -v 1
# See where the rm process is spending its time in the kernel (thread stack traces)
procstat -kk $(pgrep rm)
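FreeBSD's base tools round this picture out: gstat shows how busy each disk is, and top has a per-process I/O mode. Both commands below are stock utilities, shown with commonly used flags:
# Per-disk busy percentage and latency
gstat -p
# Per-process I/O activity, sorted by total operations
top -m io -o total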
If system responsiveness is critical and you can afford to leave the space tied up for a while:
# Stop anything from writing more files into the dataset
zfs set readonly=on pool/tmp2
# Later, in a maintenance window, destroy it outright
zfs destroy -r pool/tmp2
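Before flipping the dataset read-only it is worth checking whether anything still has files open under it; fstat can restrict its report to a single filesystem. A quick check, assuming the mountpoint is /tmp2:
# List processes with files open on the filesystem containing /tmp2
fstat -f /tmp2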
Implement monitoring for runaway file creation:
# Cron job to alert when pool/tmp exceeds ~1 GB of used space
*/5 * * * * [ $(zfs get -Hp used pool/tmp | awk '{print $3}') -gt 1000000000 ] && alert.sh
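alert.sh here is just a placeholder for whatever notification hook you already have; a minimal hypothetical version that mails root could look like this:
#!/bin/sh
# alert.sh (hypothetical): mail the current usage of pool/tmp to root
usage=$(zfs get -H -o value used pool/tmp)
echo "pool/tmp has grown to ${usage}" | mail -s "tmp dataset growth alert" root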
Remember that ZFS frees space asynchronously: a large deletion or dataset destroy is worked off over multiple transaction groups in the background, and blocks still referenced by snapshots are not released until those snapshots are destroyed.
When dealing with mass file deletion on ZFS (especially 10M+ files), several factors contribute to the performance bottleneck:
# Sample directory structure that might cause issues
/tmp/buggy_program/
├── session_123456
│ ├── file1.tmp
│ ├── file2.tmp
│ └── ...
├── session_789012
│ ├── file1.tmp
│ └── ...
└── ... (millions more)
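Before choosing a strategy it helps to gauge the scale. The dataset's own accounting is instant, whereas counting files with find at this scale can itself take a long time; the paths and dataset names below follow the examples in this thread:
# Instant: space consumed according to ZFS
zfs list -o name,used,refer pool/tmp
# Slow but exact: count the files (expect this to take a while at 10M+ files)
find /tmp/buggy_program -type f | wc -l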
The apparent paradox stems from fundamental differences in operations:
- Resilvering: sequential block-level reads and writes with minimal metadata overhead
- Deletion: random-access metadata operations requiring, per file:
  - Directory entry removal
  - dnode (inode-equivalent) updates
  - Free space accounting in the space maps
  - ZFS transactional (TXG) overhead
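On reasonably recent OpenZFS this asymmetry is visible directly in zpool iostat's histogram modes; mass deletion shows up as a stream of small, scattered metadata writes. "pool" below stands for your pool name:
# Request-size histogram, refreshed every 5 seconds
zpool iostat -r pool 5
# Latency histogram for the same workload
zpool iostat -w pool 5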
Here are practical approaches, with rough throughput figures summarized in the table at the end:
# Method 1: Parallel find with delete (most efficient)
find /tmp2 -type f -print0 | xargs -0 -P 8 rm
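A variant that fans out one rm -rf per top-level session directory removes the directories as well and keeps each worker inside a single directory's metadata. This assumes the session directories sit directly under /tmp2; the 8-way fan-out is just an example:
# One rm -rf per session directory, 8 running at a time
find /tmp2 -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -n 1 -P 8 rm -rf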
# Method 2: ZFS snapshot rollback (requires a snapshot taken while the dataset was still empty)
zfs snapshot pool/tmp@empty     # take this right after creating the dataset
# ... later, once the buggy program has filled it with millions of files:
zfs rollback -r pool/tmp@empty  # discards everything created since, near-instantly
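Rollback only discards what was created after the snapshot, so it is a precaution to set up in advance rather than a cure after the fact. If you are unsure what snapshots already exist, list them first:
# Show existing snapshots of the dataset and its children
zfs list -t snapshot -r pool/tmp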
Tune these parameters before mass deletion:
# Lengthen the transaction group timeout so more frees batch into each commit
sysctl vfs.zfs.txg.timeout=30
# Temporary ARC size adjustment (vfs.zfs.arc_max takes a byte count, not "2G")
sysctl vfs.zfs.arc_max=2147483648
# Disable synchronous writes on the dataset (RISKY: recently written data can be lost on a crash)
zfs set sync=disabled pool/tmp2
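Whatever you change, put it back once the cleanup finishes; the values below assume the stock OpenZFS defaults, so check what your system was using beforehand:
# Revert the temporary tuning
sysctl vfs.zfs.txg.timeout=5   # 5 s is the stock default
zfs inherit sync pool/tmp2     # return to the inherited sync=standard
# For vfs.zfs.arc_max, restore the value you recorded before lowering it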
When time is critical and the data clearly doesn't need to survive:
# NOTE: freed space is reclaimed asynchronously in the background after the destroy
zfs destroy -r pool/tmp
zfs create pool/tmp
# Optional follow-up TRIM pass (only useful on SSDs or thin-provisioned storage)
zpool trim pool
zpool wait -t trim pool
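You can watch the background reclamation with the pool's freeing property, which counts down to zero as the destroyed blocks are released:
# Non-zero while an asynchronous destroy is still releasing blocks
zpool get -Hp freeing pool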
| Method | Files/sec | CPU Load | Disk IOPS |
|---|---|---|---|
| Simple rm -rf | 50-80 | Low | High |
| Parallel find | 800-1200 | High | Max |
| Snapshot rollback | Instant | Low | Low |