Open Source Deduplicating Backup Solutions for Multi-OS Server Environments: Evaluating Bacula and Alternatives


When dealing with heterogeneous server environments (CentOS + Windows Server 2003/2008) across data centers, several technical challenges emerge:

  • Cross-platform file system compatibility
  • Efficient bandwidth utilization over 1GbE links
  • Storage optimization through deduplication
  • Long-term retention policies (6 months for monthly, 1 month for weekly)

While Bacula doesn't natively support block-level deduplication, we can implement file-level deduplication through its catalog system. Here's a sample configuration snippet for the FileSet resource:

FileSet {
  Name = "DedupeFS"
  Include {
    Options {
      signature = MD5
      onefs = no
    }
    File = /path/to/data
  }
}

The signature directive records an MD5 checksum for each file in the catalog, which is what Bacula's file-level comparison (and its Base Jobs feature) keys on; onefs = no tells the File Daemon to descend into other mounted file systems rather than stopping at mount-point boundaries. For Windows systems, you'll need the Bacula FD configured with:

FileDaemon {
  Name = win2008-fd
  WorkingDirectory = "C:\\Program Files\\Bacula\\working"
  Pid Directory = "C:\\Program Files\\Bacula\\working"
  Maximum Concurrent Jobs = 20
}
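
On the Director side, the stated retention targets map naturally onto two pools with different Volume Retention periods, driven by a single schedule. A sketch along these lines (resource names are illustrative, not from any stock config):

Schedule {
  Name = "MonthlyFullWeekly"
  # Full backup on the first Sunday, retained 6 months via MonthlyPool
  Run = Level=Full Pool=MonthlyPool 1st sun at 02:00
  # Weekly differentials on remaining Sundays, retained 1 month via WeeklyPool
  Run = Level=Differential Pool=WeeklyPool 2nd-5th sun at 02:00
}

Pool {
  Name = MonthlyPool
  Pool Type = Backup
  Volume Retention = 6 months
  AutoPrune = yes
  Recycle = yes
}

Pool {
  Name = WeeklyPool
  Pool Type = Backup
  Volume Retention = 1 month
  AutoPrune = yes
  Recycle = yes
}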

1. Duplicati

Command-line example for setting up automated backups with retention policies:

duplicati-cli backup \
  s3://bucket-name/backup-path \
  /path/to/source \
  --dbpath=/var/lib/duplicati/config.db \
  --encryption-module=aes \
  --passphrase="CHANGE-ME" \
  --retention-policy="1M:1W,6M:1M" \
  --dblock-size=50MB

Each retention-policy entry is timeframe:interval, so 1M:1W thins backups within the last month down to one per week, and 6M:1M keeps one per month out to six months; anything older is deleted.
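
Restores use the same storage URL and local database; a sketch (the restore target here is arbitrary):

duplicati-cli restore \
  s3://bucket-name/backup-path \
  "*" \
  --dbpath=/var/lib/duplicati/config.db \
  --passphrase="CHANGE-ME" \
  --restore-path=/tmp/restore-test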

2. BorgBackup

Sample initialization and backup script for Linux servers:

#!/bin/bash
# Initialize repo
borg init --encryption=repokey /backup/repo

# Create a timestamped archive ({now:...} is expanded by borg)
borg create \
  --stats --progress \
  /backup/repo::server-{now:%Y-%m-%d} \
  /etc /var/www

# Prune old backups
borg prune \
  --keep-monthly=6 \
  --keep-weekly=4 \
  /backup/repo
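
Verification and restore are equally scriptable; borg check validates repository consistency, and borg extract restores an archive into the current directory (the archive name below is an example matching the {now} pattern above):

# Verify repository and archive consistency
borg check /backup/repo

# List archives, then restore one into the current directory
borg list /backup/repo
borg extract /backup/repo::server-2024-01-07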

When implementing any deduplicating solution, the storage layer underneath the backup tool matters as much as the tool itself. On Linux, a ZFS dataset tuned for backup data is a natural fit; be aware that dedup=on holds its deduplication table in RAM, so size memory accordingly (a common rule of thumb is roughly 5 GB per TB of unique data):

# ZFS dataset configuration example for backup storage
zfs create -o compression=lz4 \
           -o dedup=on \
           -o recordsize=128K \
           backuppool/bacula
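
Before committing to dedup=on, you can estimate the achievable ratio, and thus whether the RAM cost is justified, with zdb's dedup simulation (read-only; it changes nothing on the pool):

# Simulate deduplication across the pool's existing data
zdb -S backuppool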

For Windows-centric environments, the Data Deduplication role service offers built-in post-process deduplication (NTFS volumes since Server 2012; ReFS volumes since Server 2019):

# PowerShell command to install the deduplication feature
Install-WindowsFeature -Name FS-Data-Deduplication
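
Installing the feature does not deduplicate anything by itself; it has to be enabled per volume. Assuming the backup volume is D::

# Enable dedup on the backup volume, then kick off an optimization pass
Enable-DedupVolume -Volume "D:" -UsageType Default
Start-DedupJob -Volume "D:" -Type Optimization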

Essential commands to verify deduplication effectiveness:

# For ZFS systems
zpool list -v
zfs get used,compressratio,dedup backuppool/bacula

# For Windows Dedup
Get-DedupStatus | fl *
Get-DedupVolume | select Volume,Capacity,SavedSpace,SavingsRate | ft -auto

Managing backups across 30+ servers with mixed operating systems (CentOS, Windows Server 2003/2008) presents unique technical challenges. The requirement for both full monthly backups (retained for 6 months) and weekly backups (retained for 1 month) demands a sophisticated solution with intelligent storage management.

While Bacula is a robust open-source backup solution, its native deduplication capabilities are limited. We can work around this by pairing its Volume Shadow Copy Service (VSS) integration on Windows clients with deduplication handled by the storage layer underneath:

# Sample Bacula FileDaemon configuration for Windows
FileDaemon {
  Name = winserver-fd
  FDport = 9102
  WorkingDirectory = "C:/bacula/working"
  Pid Directory = "C:/bacula/pid"
  Maximum Concurrent Jobs = 10
  FDAddress = 0.0.0.0
  Plugin Directory = "C:/bacula/plugins"
}
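
Note that VSS is actually enabled in the FileSet the Windows client backs up, not in the FileDaemon resource. A minimal sketch (the path is illustrative):

FileSet {
  Name = "WinVSS"
  Enable VSS = yes
  Include {
    Options {
      signature = MD5
      compression = GZIP
    }
    File = "C:/Data"
  }
}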

For true block-level deduplication across mixed environments, consider these alternatives:

  • Duplicati: Offers client-side deduplication with AES-256 encryption
  • Restic: Provides efficient snapshots and pruning capabilities (see the sketch after this list)
  • BorgBackup: Excellent for Linux environments with compression and encryption
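
Restic otherwise goes unillustrated here, so a minimal workflow covering the same retention targets (the repository path is an assumption):

# Initialize a repository on the SAN mount
restic init --repo /san/restic-repo

# Back up system and web data (deduplicated and encrypted by default)
restic -r /san/restic-repo backup /etc /var/www

# Apply the weekly/monthly policy and reclaim space
restic -r /san/restic-repo forget --keep-weekly 4 --keep-monthly 6 --prune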

For SAN-based backup targets, a content-addressed store built on hard links can squeeze out file-level duplication (note that os.link requires source and destination on the same filesystem, and the := syntax needs Python 3.8+):

import os
import hashlib

def create_dedup_backup(source_path, dest_path):
    """Hard-link source_path into a content-addressed store under dest_path.

    Files with identical content hash to the same store path, so each
    unique blob is kept only once.
    """
    # Hash the file contents in 4 KiB chunks
    file_hash = hashlib.sha256()
    with open(source_path, 'rb') as f:
        while chunk := f.read(4096):
            file_hash.update(chunk)
    hash_str = file_hash.hexdigest()

    # Fan out into two directory levels to keep directories small
    dest_file = os.path.join(dest_path, hash_str[:2], hash_str[2:4], hash_str)
    if not os.path.exists(dest_file):
        os.makedirs(os.path.dirname(dest_file), exist_ok=True)
        os.link(source_path, dest_file)  # duplicate content is skipped above
    return os.path.relpath(dest_file, dest_path)
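
A quick usage sketch of the function above (paths are placeholders):

# Link a file into the store and record its content-addressed location
rel = create_dedup_backup('/var/www/index.html', '/san/backups/store')
print(rel)  # e.g. 'ab/cd/abcd12...'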

For the requested 6-month full backup retention with 1-month weekly backups, consider this cron job configuration (backup-script stands in for whatever wrapper drives your chosen tool):

# Monthly full backups (kept for 6 months)
0 2 1 * * /usr/local/bin/backup-script --full --retain 6

# Weekly incremental backups (kept for 4 weeks)
0 2 * * 0 /usr/local/bin/backup-script --incremental --retain 4

Implement verification checks with this shell script snippet:

#!/bin/bash
BACKUP_DIR=/san/backups
REPORT_FILE=/var/log/backup_verify.log

# Test every gzip tarball written in the last 24 hours; null-delimited
# filenames survive spaces and newlines
find "$BACKUP_DIR" -type f -name '*.tar.gz' -mtime -1 -print0 |
while IFS= read -r -d '' file; do
    if ! tar -tzf "$file" >/dev/null 2>&1; then
        echo "[$(date)] Corrupt backup detected: $file" >> "$REPORT_FILE"
    fi
done