While both scrubbing and resilvering serve data integrity purposes in ZFS, their operational contexts differ fundamentally. A scrub is a proactive maintenance operation that reads all allocated data in the pool and verifies each block against its stored checksum, repairing bad blocks from redundant copies where they exist. In contrast, resilvering occurs when a device is replaced or returns after an outage, and the system reconstructs the data that belongs on that device.
# Basic scrub initiation
zpool scrub tank
# Resilver occurs automatically during replacement
zpool replace tank ata-ST3000DM001-9YN166_S1F0KDGY ata-ST3000DM001-9YN166_S1F0JKRR
During resilvering, ZFS does perform checksum verification but with important limitations:
- Only the blocks read to rebuild the new disk are verified
- The process is scoped to restoring redundancy, not to auditing the whole pool
- Blocks with no copy on the replaced device aren't checked
In our production environment with 40TB pools, we observed:
| Operation | Duration | Data verified |
|---|---|---|
| Scrub | 18 hours | 100% of data |
| Resilver | 9 hours | ~30% of data (varies by fragmentation) |
When checksum errors appear during scrubbing:
- Note the affected files using zpool status -v
- Initiate drive replacement
- Run a full scrub after resilvering completes
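# Recommended workflow example:
zpool scrub tank
# If errors are found, take the failing disk offline and replace it:
zpool offline tank ata-ST3000DM001-9YN166_S1F0KDGY
zpool replace tank ata-ST3000DM001-9YN166_S1F0KDGY ata-ST3000DM001-9YN166_S1F0JKRR
# A scrub cannot start while the resilver is running, so wait for it first
# (zpool wait requires OpenZFS 2.0 or later):
zpool wait -t resilver tank
zpool scrub tank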
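To decide which disk to replace, it helps to see which rows of the status table carry checksum errors. The sketch below assumes the usual `NAME STATE READ WRITE CKSUM` column layout of `zpool status`; the function name `rows_with_cksum_errors` is made up for illustration, and the output layout can differ between ZFS versions.

```bash
#!/bin/bash
# Hypothetical helper: reads `zpool status` text on stdin and prints
# "name count" for every config row whose CKSUM column is not "0".
# Pool and vdev rows are reported too if their counters are non-zero.
rows_with_cksum_errors() {
    awk '
        $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }  # header row opens the table
        /^errors:/                     { in_config = 0 }        # table ends before "errors:"
        in_config && NF >= 5 && $5 != "0" { print $1, $5 }      # string compare: counts may be abbreviated
    '
}
```

Typical use would be `zpool status tank | rows_with_cksum_errors`, run before choosing the device to pass to `zpool replace`.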
For enterprise deployments, consider implementing these ZFS event monitoring scripts:
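#!/bin/bash
# Monitor resilver progress (the progress line format varies between ZFS versions)
zpool status -v | awk '/resilvered,.*done/ {sub(/^[ \t]+/, ""); print "Resilver progress:", $0; exit}'
# List any files reported after the "errors:" line
zpool status -v | grep -A 10 "errors:" | grep -v "errors:" | grep -v "^$"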
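For alerting, a bare number is often easier to work with than a full status line. The following is a minimal sketch that pulls the percentage out of a progress line of the form `... resilvered, NN.NN% done ...`; that line format is an assumption based on common OpenZFS output, and `resilver_percent` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: reads `zpool status` text on stdin and prints the
# resilver completion percentage (without the % sign), or nothing if no
# matching progress line is present.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p' | head -n 1
}
```

It could be used as `zpool status tank | resilver_percent` inside a cron job or monitoring check.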
While both ZFS scrub and resilver operations involve data verification, their fundamental purposes differ:
# Scrub process (manual verification)
zpool scrub tank
# Resilver process (automatic during replacement)
zpool replace tank old_drive new_drive
A scrub performs comprehensive checksum validation on all blocks in the pool, while a resilver only verifies checksums for blocks that:
- Belong to the replaced device
- Are actively referenced by the filesystem
- Are written to the pool while the resilver is running
Consider this common workflow when errors appear during scrub:
# Scenario: Error detection during scrub
zpool status tank
  scan: scrub in progress, 15% done, 0h12m to go
errors: 12 data errors
# Recommended procedure
zpool scrub -s tank # Stop the scrub
zpool offline tank faulty_drive
zpool replace tank faulty_drive new_drive
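A script can make the same decision by checking the `errors:` line of the status output. This sketch assumes the all-clear message reads `errors: No known data errors`; the function name `pool_has_data_errors` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: reads `zpool status` text on stdin.
# Exit status 0 if an "errors:" line is present and is not the all-clear
# message; exit status 1 otherwise (including when no such line exists).
pool_has_data_errors() {
    awk '
        /^errors:/ { seen = 1; if ($0 !~ /No known data errors/) bad = 1 }
        END        { exit !(seen && bad) }
    '
}
```

For example, `zpool status tank | pool_has_data_errors && echo "tank needs attention"`.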
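Resilvering typically completes faster than scrubbing because:
- It only processes allocated blocks that have a copy on the affected device (free space is never read)
- It runs at higher priority than background scrubs
- It relies on ZFS's own allocation metadata, not on drive-level information such as TRIM, to decide what to copy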
The scan line of the pool status reports how much data the resilver processed and how many errors it found:
# Check resilver verification coverage
zpool status -v tank | grep -A 10 "scan: resilver"
However, it won't detect latent errors in:
- Blocks not allocated to the replaced device
- Free space areas (which a scrub does not read either, since unallocated space carries no checksums)
- Metadata not associated with the replaced device
After completing a resilver operation, always schedule a full scrub:
# Complete data integrity workflow
zpool replace tank faulty_drive new_drive
# Wait for resilver completion
zpool wait -t resilver tank
# Initiate full verification
zpool scrub tank
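On systems where `zpool wait` is not available (it arrived, as far as I know, with OpenZFS 2.0), the same workflow can be approximated by polling the status output. The sketch below assumes the scan line contains the text `resilver in progress` while a resilver is running; both function names and the 60-second interval are illustrative choices, not part of ZFS.

```bash
#!/bin/bash
# Succeeds (exit 0) while the status text on stdin reports a running resilver.
resilver_in_progress() {
    grep -q "resilver in progress"
}

# Hypothetical helper: poll until the resilver ends, then start a scrub.
# The 60-second polling interval is an arbitrary choice.
scrub_after_resilver() {
    pool="$1"
    while zpool status "$pool" | resilver_in_progress; do
        sleep 60
    done
    zpool scrub "$pool"
}
```

Calling `scrub_after_resilver tank` after `zpool replace` would then block until the resilver is done and start the full verification.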