When expanding a ZFS striped mirror (RAID10 equivalent) from 2 disks to 4 disks by adding a second mirror vdev, the existing data stays concentrated on the original mirror pair. This leaves an unbalanced workload in which the new disks sit largely idle. Unlike traditional RAID systems, ZFS does not redistribute existing data when a new vdev is added to a pool; only newly written blocks are spread across all vdevs.
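For context, the expansion itself is a single command, and the resulting imbalance shows up in per-vdev allocation (device names here are examples):
zpool add tank mirror disk3 disk4
# ALLOC stays concentrated on the original mirror until data is rewritten
zpool list -v tank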
Here are three practical approaches to redistribute data evenly across all mirrors:
# Method 1: ZFS Send/Receive to a new pool
# Create a new pool with the desired layout (requires a separate set of disks; names are examples)
zpool create newpool mirror disk3 disk4 mirror disk5 disk6
# Snapshot the source, then send the data to the new pool
zfs snapshot -r tank/data@snapshot
zfs send -R tank/data@snapshot | zfs receive newpool/data
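# Optional sanity checks before destroying anything (cheap insurance):
zfs list -r -o name,used,referenced newpool
zpool scrub newpool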
# Destroy old pool and rename new pool
zpool destroy tank
zpool export newpool
zpool import newpool tank
ZFS channel programs cannot move existing blocks (no block-rewrite primitive is exposed to them), but they are useful for scripting the administrative side: a program executes atomically with respect to other ZFS administrative operations, so it can snapshot a dataset and all of its direct children at a single consistent point before the send/receive pass. A minimal sketch:
-- balance_snap.lua: atomically snapshot a dataset and its direct children
-- before the send/receive pass; argv[1] is the parent dataset, e.g. tank/data
args = ...
argv = args["argv"]
parent = argv[1]
zfs.sync.snapshot(parent .. "@balance_start")
for child in zfs.list.children(parent) do
    zfs.sync.snapshot(child .. "@balance_start")
end
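The script runs against the pool with zfs program; the script path and dataset argument below are examples:
zfs program tank ./balance_snap.lua tank/data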
Use these commands to verify data distribution:
# Check disk utilization
zpool iostat -v 5
# View how much data each mirror vdev holds
zpool list -v poolname
Rebalancing large datasets can impact performance. Consider these best practices:
- Schedule during low-usage periods
- Set an appropriate zfs_dirty_data_max (example after this list)
- Use compression to reduce transfer size
- Monitor ARC hit ratio during process
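For the zfs_dirty_data_max item above, on Linux the OpenZFS module parameter can be adjusted at runtime; the 4 GiB value below is purely illustrative:
# Cap outstanding dirty data for the duration of the migration (Linux)
echo 4294967296 > /sys/module/zfs/parameters/zfs_dirty_data_max
# Confirm the current value
cat /sys/module/zfs/parameters/zfs_dirty_data_max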
As noted above, once the additional mirrored vdev is added (effectively growing the 2-disk RAID-10 equivalent to 4 disks), new writes distribute across all mirrors automatically, but existing data stays concentrated on the original mirror pair until it is rewritten, so reads of that data cannot take advantage of the new disks.
The most effective method involves sending the entire dataset to a new location and restoring it:
# Create a recursive snapshot of the dataset
zfs snapshot -r tank/data@preredistribute
# Send/receive to new location (could be same pool)
zfs send -R tank/data@preredistribute | zfs receive -F tank/newdata
# Verify the copy arrived (the send stream is checksummed end to end by ZFS)
zfs list -t snapshot -r tank/newdata
zfs get -r used,referenced tank/data tank/newdata
# Swap datasets
zfs rename tank/data tank/olddata
zfs rename tank/newdata tank/data
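# Check that mountpoints ended up where you expect after the renames
zfs get -r mountpoint,mounted tank/data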
# Cleanup
zfs destroy -r tank/olddata
zfs destroy -r tank/data@preredistribute
If a second pool isn't available, an online, file-level copy with rsync works as well (it still needs enough free space in the pool for a second copy of the data):
# Create new filesystem with desired properties
zfs create -o recordsize=1M -o compression=lz4 tank/temp
# Use rsync for live migration (preserves permissions)
rsync -avxHAX --progress /tank/data/ /tank/temp/
# Verify data integrity
diff -r /tank/data/ /tank/temp/
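# If writers stayed active during the copy, stop them and run a final catch-up
# pass before swapping (--delete removes files that vanished from the source)
rsync -avxHAX --delete /tank/data/ /tank/temp/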
# Swap filesystems with zfs rename (mv between datasets would copy the data again)
zfs rename tank/data tank/olddata
zfs rename tank/temp tank/data
# Old dataset cleanup
zfs destroy -r tank/olddata
During redistribution operations, monitor system performance with:
zpool iostat -v 5
arcstat 1    # shipped as arcstat.py on older releases
iostat -xm 5
Key parameters to tune during large redistributions:
- zfs set primarycache=metadata tank/data (reduces ARC churn while the bulk copy reads old data)
- zfs set sync=disabled tank/temp (only for temporary datasets; unsynced writes are lost on a crash)
- zfs_dirty_data_max, as noted above, caps how much dirty data accumulates before ZFS throttles writers
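These property overrides persist until reverted and follow a dataset through zfs rename, so once the migration finishes, list where they are still set locally and clear them (dataset names are examples):
zfs get -r -s local primarycache,sync tank
zfs inherit primarycache tank/data
zfs inherit sync tank/data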
For recurring rebalancing, the procedure can be scripted; the sketch below uses Python and assumes mbuffer is installed:
import logging
import subprocess

def rebalance_dataset(dataset, temp_location="tank/rebalance"):
    """Rewrite a dataset via send/receive so its blocks spread across all vdevs.

    temp_location must be a ZFS dataset name with enough free space, not a filesystem path.
    """
    snap_name = f"{dataset.replace('/', '_')}_rebalance"
    snapshot = f"{dataset}@{snap_name}"
    try:
        # Create an atomic, recursive snapshot of the source
        subprocess.run(["zfs", "snapshot", "-r", snapshot], check=True)
        # Stream through mbuffer to smooth out bursts; -u keeps the copy unmounted
        subprocess.run(
            f"zfs send -R {snapshot} | mbuffer -q -s 128k -m 1G | "
            f"zfs receive -F -u {temp_location}",
            shell=True, check=True,
        )
        # Sanity check: print the logical size of source and copy for comparison
        subprocess.run(
            ["zfs", "get", "-o", "name,value", "referenced", dataset, temp_location],
            check=True,
        )
        # Swap: move the old dataset aside and promote the rebalanced copy;
        # the @..._rebalance snapshots are left in place until you are satisfied
        subprocess.run(["zfs", "rename", dataset, f"{dataset}_old"], check=True)
        subprocess.run(["zfs", "rename", temp_location, dataset], check=True)
    except subprocess.CalledProcessError as e:
        logging.error(f"Rebalance failed: {e}")
        # Best-effort cleanup of the partial copy and the snapshot
        subprocess.run(["zfs", "destroy", "-r", temp_location], check=False)
        subprocess.run(["zfs", "destroy", "-r", snapshot], check=False)