When dealing with ZFS storage pools, you might encounter situations where a disk replacement gets stuck in progress. This typically happens when:
- The original disk shows SMART errors but isn't completely failed
- The replacement disk develops issues (like DMA_WRITE errors) during resilvering
- The process keeps restarting at certain percentages
After running zpool scrub -s tank, the scrub operation stops but the disks remain in the "replacing" state, which blocks any new replacement operation. The pool status might look something like this:
pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered.
action: Wait for the resilver to complete.
scan: resilver in progress since [timestamp]
10.1G scanned at 42.4M/s, 1.01G issued at 4.24M/s, 10.1G total
1.01G resilvered, 10.00% done
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
da0 ONLINE 0 0 0
da1 ONLINE 0 0 0
replacing-2 DEGRADED 0 0 0
da2 ONLINE 0 0 0
da3 ONLINE 0 0 4
da4 ONLINE 0 0 0
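To spot a leftover replacement without reading the whole status by eye, the output can be filtered for replacing-N vdevs. The following is a minimal sketch, not part of ZFS: list_replacing is a hypothetical helper name, and it assumes the usual indented layout of real zpool status output, where the two disks of a replacement are indented deeper than the replacing-N line itself.

```shell
# list_replacing: read `zpool status` text on stdin and print each
# replacing-N vdev followed by its child disks and their states.
# Hypothetical helper; relies on the indentation of real status output.
list_replacing() {
    awk '
        # Number of leading blanks/tabs on a line.
        function indent(s,    t) { t = s; sub(/^[ \t]+/, "", t); return length(s) - length(t) }

        # A line at the same or shallower depth ends the current group.
        inside && (NF == 0 || indent($0) <= depth) { inside = 0; print "" }

        # Lines nested under replacing-N are its child disks.
        inside { printf " %s(%s)", $1, $2 }

        # Start of a new replacing-N group; remember its depth.
        $1 ~ /^replacing-[0-9]+$/ { depth = indent($0); inside = 1; printf "%s:", $1 }

        END { if (inside) print "" }
    '
}
```

Usage would be along the lines of zpool status tank | list_replacing. For the pool shown above this should report replacing-2 together with da2 and da3, and it prints nothing when no replacement is pending.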
To properly cancel the replacement and return to the original disk:
- First, detach the replacement disk:
zpool detach tank da3
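- The replacing-2 vdev collapses back to da2 once the new disk is detached, so no further replace command is needed. Reset any leftover error counters:
zpool clear tank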
- Verify the status:
zpool status tank
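The detach and verification steps can be wrapped in a small script that prints the commands before anything is run for real. This is only a sketch: cancel_replace, run_cmd and the DRY_RUN variable are hypothetical names, and the pool and disk names are whatever applies to your system.

```shell
# run_cmd: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# cancel_replace POOL NEW_DISK
# Detach the replacement disk, then show the resulting pool status.
cancel_replace() {
    if [ $# -ne 2 ]; then
        echo "usage: cancel_replace POOL NEW_DISK" >&2
        return 2
    fi
    run_cmd zpool detach "$1" "$2" || return 1
    run_cmd zpool status "$1"
}
```

Calling cancel_replace tank da3 with the default dry-run setting only echoes the two zpool commands. Setting DRY_RUN=0 executes them, which is worth doing only after checking the echoed device names against zpool status.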
If you want to temporarily use a USB disk instead:
# First cancel the existing replacement as above
zpool detach tank da3
# Then offline the problematic disk
zpool offline tank da2
# Finally, replace it with the USB disk (daX stands for its device name)
zpool replace tank da2 /dev/daX
General precautions:
- Always have good backups before performing storage operations
- Monitor SMART status regularly with smartctl -a /dev/daX
- Consider setting up email alerts for ZFS events
- For production systems, consider using hot spares instead of manual replacements
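As a starting point for alerting, the output of zpool status -x can be checked from cron. The sketch below assumes that zpool status -x prints "all pools are healthy" (or "pool 'NAME' is healthy" when a pool is named) if nothing is wrong. pool_alert is a hypothetical helper name, and how the alert is delivered (mail, syslog, and so on) is left open.

```shell
# pool_alert: read `zpool status -x` output on stdin.
# Returns 0 silently when the pools are healthy; otherwise prints the
# status text under an alert header and returns 1.
pool_alert() {
    status_text=$(cat)
    case $status_text in
        "all pools are healthy"|"pool '"*"' is healthy")
            return 0
            ;;
    esac
    printf 'ZFS ALERT\n%s\n' "$status_text"
    return 1
}
```

A cron job could run something like zpool status -x | pool_alert and rely on cron mailing any output, assuming local mail delivery is configured.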
If the above commands don't work, try exporting and reimporting the pool:
zpool export tank
zpool import tank
For persistent issues, you might need to force the import:
zpool import -f tank
When a ZFS disk replacement gets stuck mid-process (especially due to hardware errors), you might encounter a situation where disks remain permanently marked as "replacing" in zpool status:
# zpool status tank
pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered...
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Oct 23 14:32:46 2023
10.1G scanned at 12.4M/s, 1.01G issued at 1.24M/s, 10.1G total
0 resilvered, 10.00% done
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
replacing-0 OFFLINE 0 0 0
ada1 OFFLINE 0 0 0 (DMA_WRITE errors)
ada2 ONLINE 0 0 0 (resilvering, DMA_WRITE errors)
ada3 ONLINE 0 0 0
ada4 ONLINE 0 0 0
ada5 ONLINE 0 0 0
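To tell a slow resilver from one that keeps restarting, it helps to sample the completion percentage over time. A minimal sketch, assuming the "% done" wording shown in the status output above. resilver_pct is a hypothetical helper name.

```shell
# resilver_pct: read `zpool status` text on stdin and print the
# completion percentage from the "... NN.NN% done" line, if present.
resilver_pct() {
    sed -n 's/.* \([0-9.]*\)% done.*/\1/p'
}
```

Running zpool status tank | resilver_pct every few minutes and logging the result shows whether the figure climbs steadily or repeatedly drops back to a lower value, which points to a resilver that restarts.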
Running zpool scrub -s at best stops the resilver; it does not clear the replacement state. Attempting to detach either disk results in:
# zpool detach tank ada1
cannot detach ada1: no valid replicas
Here's the step-by-step solution for FreeBSD:
# First export the pool (ensure no active operations)
zpool export tank
# Re-import with a forced import (-f) and recovery mode (-F), which may discard the last few transactions
zpool import -d /dev -f -F tank
# Verify the original disk is back in normal state
zpool status tank
# Now swap in your temporary USB disk (a raidz member is replaced, not attached)
zpool replace tank ada1 da0
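If the above doesn't work, inspect the pool configuration that ZFS has recorded:
# Show the cached configuration for the pool (the cache file on FreeBSD is /boot/zfs/zpool.cache)
zdb -C tank
# Show the on-disk label of a member disk
zdb -l /dev/ada1
Look for replacing- entries in the vdev tree. Do not open zpool.cache in vi or nano: it is a binary nvlist, and the vdev tree is also stored in the labels on each disk, so hand-editing the cache risks corrupting it without removing the replacement.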
When dealing with problematic disks:
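- Always pre-test replacement disks (badblocks is part of e2fsprogs; -w is a destructive write test that erases the disk):
badblocks -ws /dev/da0
- Leave the old disk online during a replacement when possible, so ZFS can still read from it while resilvering:
zpool replace tank ada1 ada2
- Monitor with:
zpool status -v tank; smartctl -a /dev/ada1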
Remember that forced operations may require a reboot to fully clear kernel device states.