How to Cancel an In-Progress ZFS Pool Disk Replacement: Resolving Stuck Replacing State

When dealing with ZFS storage pools, you might encounter situations where a disk replacement gets stuck in progress. This typically happens when:

The original disk shows SMART errors but isn't completely failed
The replacement disk develops issues (like DMA_WRITE errors) during resilvering
The process keeps restarting at certain percentages

After running zpool scrub -s tank, the scan stops but the disks remain in the "replacing" state, because stopping a scan does not cancel the replacement itself. Until the replacement is cancelled, you cannot start another one on the same disk. The pool status might look something like this:

pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered.
action: Wait for the resilver to complete.
scan: resilver in progress since [timestamp]
    10.1G scanned at 42.4M/s, 1.01G issued at 4.24M/s, 10.1G total
    1.01G resilvered, 10.00% done
config:

NAME                     STATE     READ WRITE CKSUM
tank                     DEGRADED     0     0     0
  raidz1-0               DEGRADED     0     0     0
    da0                  ONLINE       0     0     0
    da1                  ONLINE       0     0     0
    replacing-2          DEGRADED     0     0     0
      da2                ONLINE       0     0     0
      da3                ONLINE       0     0     4
    da4                  ONLINE       0     0     0
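
To spot a stuck replacement without reading the whole status by eye, the config tree can be scanned for replacing-N groups. The following is a minimal sketch, assuming the indentation layout shown above; list_replacing is a hypothetical helper name, not a ZFS command. It reads zpool status text on stdin and prints each replacing vdev with its children in the order listed.

```sh
#!/bin/sh
# list_replacing (hypothetical helper): read `zpool status` text on stdin and
# print each replacing-N vdev followed by its child devices, in listed order.
list_replacing() {
    awk '
        function print_group(   i, line) {
            line = vdev ":"
            for (i = 1; i <= n; i++) line = line " " kids[i]
            print line
        }
        # A replacing vdev line starts with a name such as replacing-2.
        $1 ~ /^replacing-[0-9]+$/ {
            if (inside) print_group()      # flush a directly preceding group
            indent = match($0, /[^ \t]/)   # column where the vdev name starts
            vdev = $1; n = 0; inside = 1
            next
        }
        inside {
            col = match($0, /[^ \t]/)
            if (col > indent) {            # deeper indent: a child device
                kids[++n] = $1
                next
            }
            print_group()                  # same or shallower indent ends the group
            inside = 0
        }
        END { if (inside) print_group() }
    '
}

# Usage (needs a real pool):
#   zpool status tank | list_replacing
```

For the status shown above, the helper would report the replacing-2 group with da2 and da3 as its members.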

To properly cancel the replacement and return to the original disk:

First, detach the replacement disk:
```
zpool detach tank da3
```
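Detaching the replacement removes the replacing-2 vdev and leaves da2 in its place, so no separate command is needed to clear the replacing state. If the pool still reports errors against the original disk, reset its counters:
```
zpool clear tank da2
```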
Verify the status:
```
zpool status tank
```
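
Because these commands change pool membership, it can help to preview them first. The following is a rough sketch, not an established tool: run and cancel_replace are hypothetical helper names, and DRY_RUN is an assumed environment variable that defaults to printing commands instead of executing them.

```sh
#!/bin/sh
# run (hypothetical): print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# cancel_replace POOL NEW_DEV [OLD_DEV] (hypothetical)
# Detach the replacement disk, optionally reset the old disk's error
# counters, then show the resulting pool status.
cancel_replace() {
    pool=${1:-}; new_dev=${2:-}; old_dev=${3:-}
    if [ -z "$pool" ] || [ -z "$new_dev" ]; then
        echo "usage: cancel_replace POOL NEW_DEV [OLD_DEV]" >&2
        return 2
    fi
    run zpool detach "$pool" "$new_dev" || return 1
    if [ -n "$old_dev" ]; then
        run zpool clear "$pool" "$old_dev" || return 1
    fi
    run zpool status "$pool"
}

# Preview:  cancel_replace tank da3 da2
# Execute:  DRY_RUN=0; cancel_replace tank da3 da2
```

In preview mode nothing touches the pool; each command is printed with a leading "+" so the sequence can be reviewed before it is run for real.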

If you want to temporarily use a USB disk instead:

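# First cancel the existing replacement as above
zpool detach tank da3

# Optionally offline the problematic disk. If it is still readable, leaving it
# online lets the resilver read from it and keeps the raidz1 vdev redundant.
zpool offline tank da2

# Finally, replace it with the USB disk (daX is a placeholder for its device node)
zpool replace tank da2 /dev/daX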

Always have good backups before performing storage operations
Monitor SMART status regularly with smartctl -a /dev/daX
Consider setting up email alerts for ZFS events
For production systems, consider using hot spares instead of manual replacements
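
As one way to approach the alerting tip above, a periodic job can check the output of zpool status -x, which prints a short "healthy" message when nothing is wrong. This is a minimal sketch; needs_attention is a hypothetical helper name, and the mail command in the usage comment assumes a working local mailer.

```sh
#!/bin/sh
# needs_attention (hypothetical): read the output of `zpool status -x` on
# stdin. Succeeds (exit 0) when the text is anything other than a healthy
# message, fails (exit 1) when the pools are reported healthy.
needs_attention() {
    report=$(cat)
    case $report in
        "all pools are healthy") return 1 ;;
        "pool '"*"' is healthy") return 1 ;;
        *) return 0 ;;
    esac
}

# Usage from cron (assumes local mail delivery to root works):
#   zpool status -x | needs_attention && zpool status -v | mail -s "ZFS alert" root
```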

If the above commands don't work, try exporting and reimporting the pool:

zpool export tank
zpool import tank

If the import is refused because the pool appears to be in use by another system, you can force it:

zpool import -f tank

When a ZFS disk replacement gets stuck mid-process (especially due to hardware errors), you might encounter a situation where disks remain permanently marked as "replacing" in zpool status:

# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered...
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Oct 23 14:32:46 2023
        10.1G scanned at 12.4M/s, 1.01G issued at 1.24M/s, 10.1G total
        0 resilvered, 10.00% done

config:

    NAME                      STATE     READ WRITE CKSUM
    tank                      DEGRADED     0     0     0
      raidz1-0                DEGRADED     0     0     0
        replacing-0           OFFLINE      0     0     0
          ada1                OFFLINE      0     0     0  (DMA_WRITE errors)
          ada2                ONLINE       0     0     0  (resilvering, DMA_WRITE errors)
        ada3                  ONLINE       0     0     0
        ada4                  ONLINE       0     0     0
        ada5                  ONLINE       0     0     0

Running zpool scrub -s at most stops the scan; it does not clear the replacement state. Attempting to detach either disk results in:

# zpool detach tank ada1
cannot detach ada1: no valid replicas

Here's the step-by-step solution for FreeBSD:

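# First export the pool (make sure nothing is using it)
zpool export tank

# Preview a recovery import. -F rewinds the pool to an earlier transaction
# group and discards the most recent writes; with -n nothing is changed.
zpool import -d /dev -f -F -n tank

# If the preview looks acceptable, run the recovery import for real
zpool import -d /dev -f -F tank

# Check whether the original disk is back in a normal state
zpool status tank

# Then replace the failing disk with your temporary USB disk
# (zpool attach cannot add a mirror to a raidz member)
zpool replace tank ada1 da0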

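If the above doesn't work, do not try to edit the ZFS cache file by hand. zpool.cache is a binary file used to find and import pools at boot; the pool's authoritative configuration, including the replacing vdev, is stored on the disks themselves, so changing the cache cannot clear the state.

# Inspect the cached pool configuration (read-only)
zdb -C tank

# Regenerate the cache file if it is stale (traditional FreeBSD location shown)
zpool set cachefile=/boot/zfs/zpool.cache tank

In the zdb output, a vdev of type replacing confirms that the replacement is still recorded in the pool configuration.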

When dealing with problematic disks:

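Pre-test replacement disks before use: badblocks -ws /dev/da0 (destructive: -w overwrites the whole disk; badblocks is part of e2fsprogs and is not in the FreeBSD base system)
Use the standard replace form: zpool replace tank ada1 ada2 (zpool replace has no replace=on property; its -o option accepts only ashift)
Monitor with: zpool status -v tank; smartctl -a /dev/ada1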
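
To catch a resilver that keeps restarting, the monitoring step can record the completion percentage and compare it between checks. The sketch below assumes the "NN.NN% done" scan line format shown in the status output earlier; resilver_percent and progress_regressed are hypothetical helper names.

```sh
#!/bin/sh
# resilver_percent (hypothetical): read `zpool status` text on stdin and print
# the number from a "... resilvered, NN.NN% done" line, or nothing if absent.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9][0-9.]*\)% done.*/\1/p' | head -n 1
}

# progress_regressed PREV CUR (hypothetical): succeed when CUR is lower than
# PREV, which suggests the resilver started over.
progress_regressed() {
    awk -v prev="$1" -v cur="$2" 'BEGIN { exit !(cur + 0 < prev + 0) }'
}

# Usage (needs a real pool; the 60-second interval is arbitrary):
#   prev=0
#   while sleep 60; do
#       cur=$(zpool status tank | resilver_percent)
#       [ -n "$cur" ] || break
#       progress_regressed "$prev" "$cur" && echo "resilver restarted at ${prev}%"
#       prev=$cur
#   done
```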

Remember that forced operations may require a reboot to fully clear kernel device states.
