Optimizing ZFS Replication vs Rsync for Large Offsite Backups Over Low-Bandwidth WAN Connections

When dealing with 15-60GB weekly photography transfers over a 700Kb/s uplink, every byte counts. The receiving end's 30Mb/s download helps, but the real bottleneck is the asymmetric DSL connection. This immediately rules out full dataset transfers and demands incremental solutions.
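
A quick back-of-envelope shows why incrementals are non-negotiable: at 700Kb/s, even the 60GB weekly maximum would tie up the link for most of a week (decimal units, protocol overhead ignored):

# 60 GB ≈ 480,000,000 Kb; at 700 Kb/s that is roughly 190 hours, i.e. nearly 8 days
echo $(( 60 * 8 * 1000 * 1000 / 700 / 3600 )) hours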

# Example incremental ZFS send between two ZFS hosts
SOURCE_POOL="tank/photos"
DEST_HOST="backup-nas"
NEW_SNAP=$(date +%Y%m%d_%H%M)

# Newest snapshot already on the destination = the incremental base
LAST_SNAP=$(ssh "$DEST_HOST" "zfs list -H -t snapshot -o name -s creation" | grep "^$SOURCE_POOL@" | tail -1)

# Create this week's snapshot on the source
zfs snapshot "$SOURCE_POOL@$NEW_SNAP"

# Send every snapshot since the common base, buffered on both ends to keep the slow pipe full
zfs send -v -I "$LAST_SNAP" "$SOURCE_POOL@$NEW_SNAP" | \
  mbuffer -q -s 128k -m 1G | \
  ssh -C "$DEST_HOST" "mbuffer -q -s 128k -m 1G | zfs receive -Fvu $SOURCE_POOL"

Key advantages:

  • Only changed blocks cross the wire, with no per-file scanning overhead
  • Atomic snapshot integrity
  • mbuffer smooths the stream so the slow WAN link never stalls
  • -I sends every intermediate snapshot since the last common one, so a failed cycle simply restarts from that snapshot on the next run
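
Before committing the uplink to a given week's transfer, a dry run reports the estimated size of the incremental stream; this reuses the variables from the block above:

# Dry run: print the estimated stream size without sending anything
zfs send -nv -I "$LAST_SNAP" "$SOURCE_POOL@$NEW_SNAP"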

If rsync is preferred, a hard bandwidth cap plus a destination-side snapshot keeps the workflow ZFS-friendly:

# rsync with bandwidth control
rsync -azhX --progress --partial \
  --bwlimit=500 \
  --delete-after \
  --rsync-path="sudo rsync" \
  /tank/photos/ \
  backup-user@backup-nas:/tank/photos/

# Post-transfer snapshot on destination
ssh backup-nas "sudo zfs snapshot tank/photos@sync_$(date +%Y%m%d)"

Consider adding these to your rsync workflow:

  • --inplace to reduce disk writes
  • --fuzzy for better rename detection
  • zstd compression (--compress --compress-choice=zstd --compress-level=3, rsync 3.2+ on both ends)
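
Putting those options together into one invocation (this assumes rsync 3.2 or newer on both ends for zstd; the paths, user, and host are the same placeholders as above):

rsync -azhX --partial --inplace --fuzzy \
  --compress-choice=zstd --compress-level=3 \
  --bwlimit=500 --delete-after \
  --rsync-path="sudo rsync" \
  /tank/photos/ backup-user@backup-nas:/tank/photos/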

For both methods, implement these reliability measures:

# Network tuning for long-haul transfers
sysctl -w net.inet.tcp.delayed_ack=0
sysctl -w net.inet.tcp.mssdflt=1460
sysctl -w net.inet.tcp.recvspace=65536
sysctl -w net.inet.tcp.sendspace=65536
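
To keep the tuning across reboots on FreeBSD, the same values can go into /etc/sysctl.conf:

# /etc/sysctl.conf
net.inet.tcp.delayed_ack=0
net.inet.tcp.mssdflt=1460
net.inet.tcp.recvspace=65536
net.inet.tcp.sendspace=65536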

On pfSense/FreeBSD routers:

# /etc/pf.conf snippet
altq on $ext_if cbq bandwidth 700Kb queue { primary, backup }
queue primary bandwidth 600Kb cbq(default)
queue backup bandwidth 100Kb cbq(borrow)

Choose ZFS replication when:

  • Dataset has many small files (better metadata handling)
  • You need point-in-time recovery guarantees
  • Source and destination ZFS versions match

Opt for rsync when:

  • Files change partially (only syncs changed portions)
  • You need cross-platform compatibility
  • Non-root access is required
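
To vet the "versions match" criterion before the first send, comparing ZFS versions on both hosts is usually enough; this assumes OpenZFS 0.8 or newer (FreeBSD 13+), where the version subcommand exists:

# Compare ZFS userland/kernel versions on source and destination
zfs version
ssh backup-nas "zfs version"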

When dealing with 15-60GB photography datasets over a 700Kb/s upload DSL connection, traditional backup methods struggle. The receiving end's 30Mb/s download helps, but we need a solution that handles:

  • Frequent interruptions
  • Minimal bandwidth consumption
  • True incremental transfers
  • Hands-off operation

ZFS replication shines with its snapshot-based differential transfers. Here's a basic implementation:

# On source NAS:
zfs snapshot tank/photos@$(date +%Y%m%d)
zfs send -R -i tank/photos@previous_snapshot tank/photos@current_snapshot | \
  ssh backup-server "zfs receive -Fduv backup/photos"

# Buffered variant: mbuffer keeps the slow pipe full (it does not, by itself, make the send resumable)
zfs send -R -i @previous tank/photos@current | mbuffer -s 128k -m 1G | \
  ssh backup-server "mbuffer -s 128k -m 1G | zfs receive -Fduv backup/photos"
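
For a transfer that actually survives a dropped connection, OpenZFS resume tokens are the mechanism. A minimal sketch, assuming OpenZFS 2.x on both ends (FreeBSD 13+ ships it) and the same placeholder names as above, shown non-recursively because a resume token applies to a single dataset stream:

# Receive with -s so an interrupted stream leaves a resume token on the destination
zfs send -i tank/photos@previous tank/photos@current | \
  ssh backup-server "zfs receive -s -Fuv backup/photos"

# After a drop, read the token and restart the stream exactly where it stopped
TOKEN=$(ssh backup-server "zfs get -H -o value receive_resume_token backup/photos")
[ "$TOKEN" != "-" ] && zfs send -t "$TOKEN" | \
  ssh backup-server "zfs receive -s -Fuv backup/photos"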

Key advantages:

  • Only changed blocks are sent
  • Atomic transfers preserving filesystem state
  • Compressed streams (-c sends blocks in their on-disk compressed form)

For more granular control, rsync remains viable:

rsync -azh --partial --progress --bwlimit=500 \
  --log-file=/var/log/photo_backup.log \
  /tank/photos/ backup-server:/backup/photos/

# With full checksum verification (reads every file on both sides, so it is slow but thorough):
rsync -azh --checksum --partial --progress \
  --bwlimit=500 --temp-dir=/backup/.temp \
  /tank/photos/ backup-server:/backup/photos/

When rsync makes sense:

  • When you only need a subset of the tree (per-file includes/excludes give fine-grained control)
  • When you need file-level resume capability
  • When ZFS version mismatch exists between systems

For both methods, consider these tweaks:

# SSH tuning:
echo "Compression yes" >> /etc/ssh/ssh_config
echo "Ciphers aes128-ctr" >> /etc/ssh/ssh_config
echo "MACs umac-64@openssh.com" >> /etc/ssh/ssh_config

# Network QoS (pf.conf example, CBQ as in the router snippet above):
altq on $ext_if cbq bandwidth 700Kb queue { primary, backup }
queue primary bandwidth 600Kb priority 7 qlimit 50 cbq(default)
queue backup bandwidth 100Kb priority 0 cbq(borrow)
match out log proto tcp from $nas_ip to $backup_ip port 22 queue backup

Implement proper logging for both solutions:

#!/bin/sh
# ZFS replication monitoring script (assumes at least one earlier tank/photos snapshot exists)
LOG=/var/log/zfs_replicate.log
SNAPSHOT=$(date +%Y%m%d)

zfs snapshot tank/photos@$SNAPSHOT || exit 1
# Incremental base: the second-newest snapshot by creation time (the one before today's)
LAST=$(zfs list -t snapshot -o name -s creation -H | grep '^tank/photos@' | tail -2 | head -1)

{
  echo "Starting replication $(date)"
  zfs send -R -i ${LAST} tank/photos@$SNAPSHOT 2>&1 | \
    mbuffer -s 128k -m 512M 2>&1 | \
    ssh backup-server "mbuffer -s 128k -m 512M | zfs receive -Fduv backup/photos 2>&1"
  echo "Completed $(date)"
} >> $LOG 2>&1
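
For the hands-off operation called for above, a weekly cron entry is enough; the script path here is a placeholder:

# /etc/crontab: run the replication script every Sunday at 02:00
0   2   *   *   0   root   /usr/local/sbin/zfs_replicate.sh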

A quick comparison of the two approaches:

Criteria              | ZFS Replication                  | Rsync
----------------------|----------------------------------|----------
Transfer efficiency   | ★★★★★                            | ★★★☆☆
Resume capability     | ★★☆☆☆ (needs receive -s tokens)  | ★★★★★
Filesystem integrity  | ★★★★★                            | ★★★☆☆
Small file handling   | ★★★★☆                            | ★★☆☆☆
Setup complexity      | ★★★★☆                            | ★★☆☆☆

For most ZFS-to-ZFS scenarios, I recommend starting with ZFS replication, using mbuffer to keep the slow link busy and receive -s resume tokens for resilience. Keep rsync in your toolkit for the edge cases where file-level control is needed.