Implementing Robust Incremental ZFS Backups: Automated Snapshot Management with Sanoid


While ZFS's built-in zfs send and zfs receive provide an excellent foundation for backups, managing snapshots by hand becomes tedious in production environments. The basic manual workflow looks like this:

# Basic manual workflow
SNAP="tank/data@backup-$(date +%Y%m%d)"  # compute the name once so both commands agree
zfs snapshot "$SNAP"
zfs send -i tank/data@previous-snap "$SNAP" | \
  ssh backup-server "zfs receive backup-pool/data"

Sanoid automates this by creating and pruning snapshots according to a policy file. Although it is less well known than general-purpose backup tools, it is actively maintained (as of 2023) and used in production by many organizations.

Installation on Debian-based systems:

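sudo apt install -y git libconfig-inifiles-perl libcapture-tiny-perl pv lzop mbuffer
git clone https://github.com/jimsalterjrs/sanoid.git
cd sanoid
git checkout "$(git tag -l | sort -V | tail -n 1)"  # latest tagged release
sudo ln -s "$(pwd)/sanoid" /usr/local/bin/
sudo ln -s "$(pwd)/syncoid" /usr/local/bin/
sudo mkdir -p /etc/sanoid
sudo cp sanoid.defaults.conf /etc/sanoid/  # sanoid will not run without its defaults file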

Create /etc/sanoid/sanoid.conf:

[tank/data]
    use_template = production
    recursive = yes

[template_production]
    frequently = 0
    hourly = 6
    daily = 7
    monthly = 3
    yearly = 0
    autosnap = yes
    autoprune = yes
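
To get a feel for what a policy like this implies, it can help to total the retention counts of a template. The following is a minimal sketch, not part of Sanoid: max_snapshots is a hypothetical helper that sums only the keys written explicitly in one section and ignores anything inherited from sanoid.defaults.conf. The result is a rough steady-state figure per dataset; with recursive = yes each child dataset keeps its own set.

```shell
#!/bin/sh
# max_snapshots FILE SECTION (hypothetical helper, not shipped with Sanoid)
# Sums the retention counts that are set explicitly in one section of a
# sanoid.conf-style file.
max_snapshots() {
    awk -v section="[$2]" '
        /^\[/ { in_section = ($0 == section) }   # track which section we are in
        in_section && /^[ \t]*(frequently|hourly|daily|weekly|monthly|yearly)[ \t]*=/ {
            split($0, kv, "=")
            total += kv[2]                       # awk reads " 6" as the number 6
        }
        END { print total + 0 }
    ' "$1"
}
```

Called as max_snapshots /etc/sanoid/sanoid.conf template_production, it prints the sum of the frequently, hourly, daily, weekly, monthly and yearly values found in that template.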

For continuous offsite backups, combine with Syncoid (Sanoid's companion tool):

# syncoid recognizes a remote target by its user@host: prefix
syncoid --sshkey=/root/.ssh/backup_key \
  --no-sync-snap \
  --recursive \
  tank/data root@backup-server:backup-pool/data

Set up regular jobs in /etc/cron.d/zfs_backups:

# Take snapshots and prune expired ones every hour
0 * * * * root /usr/local/bin/sanoid --cron --quiet

# Replicate every 4 hours
0 */4 * * * root /usr/local/bin/syncoid [options] >> /var/log/syncoid.log 2>&1
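
Cron starts a new replication run on schedule even if the previous one is still transferring. One way to avoid overlapping runs is a simple lock around the command. The sketch below is an assumption-laden illustration: run_locked is a hypothetical wrapper, the exit code 75 is an arbitrary choice, and a lock directory left behind by a killed run has to be removed by hand.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...] (hypothetical wrapper)
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so two overlapping runs cannot both get the lock.
run_locked() (
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "$(date '+%F %T') skipped: previous run still holds $lockdir" >&2
        exit 75    # arbitrary "try again later" status
    fi
    "$@"
    status=$?
    rmdir "$lockdir"    # release the lock whether or not COMMAND succeeded
    exit "$status"
)
```

A wrapper script called from cron could then run something like run_locked /run/syncoid.lock /usr/local/bin/syncoid followed by the usual options; the lock path here is only an example.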

Implement these checks to ensure backup integrity:

# Check last snapshot date
zfs list -H -t snapshot -o name,creation -s creation -d 1 tank/data | tail -n1

# Verify remote replication (counts the snapshots of the replicated dataset)
ssh backup-server "zfs list -H -t snapshot -o name -d 1 backup-pool/data | wc -l"

# Test restore: stream a replicated snapshot back into a scratch dataset
ssh backup-server "zfs send backup-pool/data@recent-snap" | zfs receive tank/restore-test
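
The snapshot date check can also be turned into a pass/fail test for monitoring. This is a minimal sketch under a few assumptions: snapshot_is_fresh and newest_creation are hypothetical helper names, date +%s is available, and the dataset has at least one snapshot (an empty creation time makes the arithmetic fail).

```shell
#!/bin/sh
# snapshot_is_fresh CREATION_EPOCH MAX_AGE_SECONDS [NOW_EPOCH]
# Succeeds when the snapshot is at most MAX_AGE_SECONDS old.
snapshot_is_fresh() (
    now=${3:-$(date +%s)}
    [ $(( now - $1 )) -le "$2" ]
)

# newest_creation DATASET
# Prints the creation time (epoch seconds) of the dataset's newest snapshot.
# -p makes zfs print exact, parsable values instead of formatted dates.
newest_creation() {
    zfs list -H -p -t snapshot -o creation -s creation -d 1 "$1" | tail -n 1
}
```

For example, snapshot_is_fresh "$(newest_creation tank/data)" 7200 || echo "tank/data has no snapshot from the last two hours" flags a dataset whose hourly snapshots have stopped.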

While Sanoid is recommended, these alternatives exist:

  • ZFS Auto Backup: Simple Python script (limited features)
  • Znapzend: Perl-based (complex configuration)
  • Borg+zfs-autobackup: For non-ZFS destinations

Maintaining reliable offsite backups for ZFS pools requires careful design to balance automation with data integrity. While ZFS's native zfs send and zfs receive provide an excellent foundation, production environments need tooling around the snapshot lifecycle.

Despite concerns about relying on a niche tool, Sanoid is widely used for ZFS snapshot management, including in production. Its policy-driven configuration keeps retention rules in one place:

# Example Sanoid configuration (/etc/sanoid/sanoid.conf)
[tank/data]
    use_template = production
    recursive = yes

[template_production]
    hourly = 24
    daily = 30
    monthly = 12
    autoprune = yes
    autosnap = yes

Here's a script that combines Sanoid with manual ZFS replication:

#!/bin/bash
set -euo pipefail

# Step 1: Create new snapshots and prune expired ones, as configured in sanoid.conf
sanoid --cron

# Step 2: Find the newest snapshot on the source and on the destination
SOURCE_SNAP=$(zfs list -t snapshot -o name -H -s creation -d 1 tank/data | tail -n 1)
DEST_SNAP=$(ssh backup-host "zfs list -t snapshot -o name -H -s creation -d 1 backup/data" 2>/dev/null | tail -n 1 || true)

# Step 3: Perform send/receive
if [ -z "$DEST_SNAP" ]; then
    # Initial transfer: the destination has no snapshots yet
    zfs send "$SOURCE_SNAP" | ssh backup-host "zfs receive -F backup/data"
elif [ "${DEST_SNAP##*@}" != "${SOURCE_SNAP##*@}" ]; then
    # Incremental transfer from the newest snapshot the destination already holds;
    # that snapshot must still exist on the source
    zfs send -i "tank/data@${DEST_SNAP##*@}" "$SOURCE_SNAP" | ssh backup-host "zfs receive -F backup/data"
fi
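
An incremental send only works from a snapshot that exists on both sides, so the base has to be chosen from the snapshots the source and destination have in common. A minimal sketch of that selection, matching by snapshot name only: latest_common and snapshot_names are hypothetical helpers, and a stricter check would compare the guid property rather than names.

```shell
#!/bin/sh
# snapshot_names DATASET
# Prints the part after '@' for each snapshot of DATASET, oldest first.
snapshot_names() {
    zfs list -H -t snapshot -o name -s creation -d 1 "$1" | sed 's/.*@//'
}

# latest_common SRC_NAMES DEST_NAMES
# Both arguments are newline-separated snapshot names; SRC_NAMES must be
# ordered oldest to newest. Prints the newest name present in both lists,
# or nothing if they share no snapshot.
latest_common() {
    # grep -F treats each line of DEST_NAMES as a fixed string,
    # -x requires a whole-line match
    printf '%s\n' "$1" | grep -Fx -e "$2" | tail -n 1
}
```

The destination list can be gathered by running the same zfs list command over ssh and piping it through sed 's/.*@//'. An empty result from latest_common means a full send is needed.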

For enterprise deployments:

  • Use mbuffer to smooth out network throughput
  • Use SSH keys or certificates instead of passwords
  • Inspect send streams with zstreamdump (zstream dump in OpenZFS 2.x)
  • Consider using ZFS encryption for sensitive data
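
As an illustration of the mbuffer point, a buffer can be placed on both ends of the pipe so that short stalls on one side do not idle the other. This is a sketch only: buffered_send is a hypothetical function, the 128k block size and 1G buffer are placeholder values to tune, and mbuffer must be installed on both hosts.

```shell
#!/bin/sh
# buffered_send BASE_SNAP NEW_SNAP HOST TARGET_DATASET (hypothetical helper)
# Incremental send with an mbuffer on the sending and the receiving side.
buffered_send() {
    zfs send -i "$1" "$2" \
        | mbuffer -q -s 128k -m 1G \
        | ssh "$3" "mbuffer -q -s 128k -m 1G | zfs receive $4"
}
```

For example: buffered_send tank/data@old tank/data@new backup-host backup/data.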

Create a verification script to run periodically:

#!/bin/bash
# -d 1 limits each listing to snapshots of the dataset itself, not its children
SOURCE_COUNT=$(zfs list -t snapshot -H -o name -d 1 tank/data | wc -l)
DEST_COUNT=$(ssh backup-host "zfs list -t snapshot -H -o name -d 1 backup/data" | wc -l)

if [ "$SOURCE_COUNT" -ne "$DEST_COUNT" ]; then
    echo "ERROR: Snapshot count mismatch!" | mail -s "ZFS Backup Alert" admin@example.com
fi
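
Counts can match while the snapshots differ, and they can differ legitimately when the two sides prune on different schedules. A complementary check is to list which source snapshots are absent from the destination. A minimal sketch, comparing by name only; missing_on_dest is a hypothetical helper that takes two newline-separated lists of snapshot names (the part after '@').

```shell
#!/bin/sh
# missing_on_dest SRC_NAMES DEST_NAMES (hypothetical helper)
# Prints every name from SRC_NAMES that does not appear in DEST_NAMES.
missing_on_dest() {
    # -F fixed strings, -x whole-line match, -v keep only non-matching lines
    printf '%s\n' "$1" | grep -Fxv -e "$2"
}
```

A script can store the output in a variable and send the alert mail only when that variable is non-empty, including the missing names in the message body.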