Optimizing Duplicity Backup: Managing Excessive Cache Growth in Large-Scale Linux Backups


When running Duplicity for large backups (like your 110GB/2TB CentOS setup), the cache directory (~/.cache/duplicity) can balloon unexpectedly. Your observation of 600GB cache for a 90GB backup indicates either misconfiguration or unoptimized parameters.

The cache stores several components during backup operations:

  • Signature files (rdiff signatures used to compute incremental deltas)
  • Snapshot manifests
  • Temporary chunk files
  • Collection status files

In your case, three factors contribute to cache bloat:

1. Full root backup (excluding only /proc, /sys, /dev)
2. Default cache retention settings
3. Potential interrupted backup cycles
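The third factor is easy to check for: collection-status reports any incomplete or orphaned backup sets left behind by interrupted runs:

# Look for incomplete or orphaned backup sets from interrupted runs
duplicity collection-status sftp://user@backupserver/path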

To immediately reclaim space while keeping your backup intact:

# Safe cache cleanup (preserves the current session's metadata)
# Note: --extra-clean was removed in newer duplicity releases; drop it there
duplicity cleanup --force --extra-clean sftp://user@backupserver/path

For aggressive cleanup (when restarting backups):

# Nuclear option - clears ALL cache
rm -rf ~/.cache/duplicity/*
# Then rebuild the local metadata and verify the backup chain against the original source
duplicity verify sftp://user@backupserver/path /

Modify your Hetzner script with these parameters:

# Add to your existing duplicity command:
--archive-dir=/mnt/tmp/duplicity_cache \
--tempdir=/mnt/tmp \
--volsize=1024 \
--no-encryption    # Only if acceptable for your security requirements

Key improvements this makes:

  • Redirects cache to larger storage volume
  • Optimizes chunk size (1GB volumes)
  • Reduces encryption overhead
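Putting those options together, a full invocation might look like the sketch below (the paths, SFTP URL, and exclude list are the placeholders used above; keep or drop --no-encryption according to your security requirements):

duplicity full \
    --archive-dir=/mnt/tmp/duplicity_cache \
    --tempdir=/mnt/tmp \
    --volsize=1024 \
    --exclude=/proc --exclude=/sys --exclude=/dev \
    / sftp://user@backupserver/path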

Add this weekly cron job to prune old backup chains and orphaned files (adjust the 3M retention to match your policy):

0 3 * * 0 duplicity remove-older-than 3M --force sftp://user@backupserver/path && duplicity cleanup --force sftp://user@backupserver/path

Since you're backing up from root, explicitly exclude cache directories:

--exclude=/home/*/.cache \
--exclude=/root/.cache \
--exclude=/var/cache

Verify exclusions with:

duplicity --dry-run --exclude='**/.cache' / sftp://user@backupserver/path

For large systems, consider tiered backups:

# Full backup monthly
duplicity full / sftp://user@backupserver/path/monthly

# Weekly incrementals (a new full is forced if the last one is older than a month)
duplicity --full-if-older-than 1M / sftp://user@backupserver/path/weekly

# Daily backup of key config/content paths (much smaller cache footprint)
duplicity --no-compression --no-encryption --include-filelist <(find /etc /var/www -type f) --exclude '**' / sftp://user@backupserver/path/metadata

Create this monitoring script (/usr/local/bin/check_duplicity_cache.sh):

#!/bin/bash
CACHE_LIMIT_MB=102400  # 100GB
CURRENT_SIZE=$(du -sm ~/.cache/duplicity | awk '{print $1}')

if [ "$CURRENT_SIZE" -gt "$CACHE_LIMIT_MB" ]; then
    logger "Duplicity cache exceeded $CACHE_LIMIT_MB MB (current: $CURRENT_SIZE MB)"
    duplicity cleanup --force sftp://user@backupserver/path
fi

Set executable permissions and add to cron:

chmod +x /usr/local/bin/check_duplicity_cache.sh
echo "*/30 * * * * /usr/local/bin/check_duplicity_cache.sh" | sudo tee /etc/cron.d/duplicity_cache_check

When running large backups with duplicity (in this case a 110GB CentOS system backing up to SFTP), the cache in ~/.cache/duplicity can grow unexpectedly large - in your case ballooning to 600GB. This occurs because:

  • Duplicity maintains a local cache of file signatures and manifests
  • For full system backups, the metadata overhead multiplies rapidly
  • The cache isn't automatically pruned during active operations

First, let's verify your current cache status:

du -sh ~/.cache/duplicity              # total cache size
ls -1 ~/.cache/duplicity | wc -l       # number of cached archive directories (one per backup target)

For active backup operations, we can implement these optimizations:

# Add these parameters to your duplicity command:
--archive-dir=/tmp/duplicity_cache  # Redirect cache off /root and /home (make sure /tmp has enough space)
--volsize=250                       # Reduce volume size for better memory handling
--verbosity warning                 # Reduce log overhead

Since you're backing up from root, you'll want to explicitly exclude the cache directory. Modify your script to include:

--exclude=/root/.cache/duplicity --exclude=/home/*/.cache/duplicity

For the Hetzner script you referenced, modify the EXCLUDE variable:

EXCLUDE="--exclude /proc --exclude /sys --exclude /dev --exclude /root/.cache/duplicity --exclude /home/*/.cache/duplicity"
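As a sketch of how that variable is then consumed (same placeholder URL as above), expand $EXCLUDE unquoted so the shell splits it into separate --exclude arguments:

# EXCLUDE must stay unquoted here so each --exclude option is passed individually
duplicity $EXCLUDE / sftp://user@backupserver/path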

Duplicity won't automatically clean its cache. Implement a cleanup routine:

# Add to your backup script (drop --extra-clean on newer duplicity releases, which no longer support it):
duplicity cleanup --force --extra-clean --archive-dir=/tmp/duplicity_cache sftp://user@backupserver/path

For systems with limited space, mount a temporary filesystem:

mkdir -p /mnt/tmp_cache
mount -t tmpfs -o size=5G tmpfs /mnt/tmp_cache
export TMPDIR=/mnt/tmp_cache
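If you want that mount to persist across reboots, the equivalent /etc/fstab entry would look like this (adjust the size to what your RAM can spare):

# /etc/fstab entry for the temporary duplicity workspace
tmpfs  /mnt/tmp_cache  tmpfs  size=5G  0 0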

Create a monitoring script (check_duplicity_cache.sh):

#!/bin/bash
CACHE_SIZE=$(du -sk ~/.cache/duplicity | awk '{print $1}')  # size in KB
THRESHOLD=1048576 # 1GB in KB

if [ "$CACHE_SIZE" -gt "$THRESHOLD" ]; then
    echo "Warning: Duplicity cache size $(($CACHE_SIZE/1024))MB exceeds threshold"
    # Automatic cleanup, as above
    duplicity cleanup --force sftp://user@backupserver/path
fi