How to List All Historical Versions of a File in Duplicity Backups

When working with Duplicity's incremental backup system, one common challenge is identifying every historical version of a specific file across multiple backup sessions. Unlike simple file listings, this requires examining the metadata of all backup chains.

The most effective way to track file versions is to combine two Duplicity actions: collection-status, which enumerates every backup set in the archive, and list-current-files, which lists the files present at a chosen point in time. Note that collection-status reports backup-set metadata rather than filenames, so searching for a specific file means grepping a list-current-files listing:

duplicity list-current-files -t <time> file:///backup/path | grep "filename.ext"

To get detailed information about every version of a specific file (let's say "important_document.odt"), you'll need to:

# First, list all backup sets
duplicity collection-status file:///backup/path

# Then inspect the archive at different points in time for your file;
# list-current-files has no per-file filter, so pipe through grep
# (-t 3D means three days ago, -t 1W one week ago)
duplicity list-current-files -t 3D file:///backup/path \
    | grep important_document

duplicity list-current-files -t 1W file:///backup/path \
    | grep important_document

For regular monitoring, you might want to create a script that automatically tracks file changes:

#!/bin/bash
TARGET_FILE="important_document.odt"
BACKUP_PATH="file:///backup/path"

echo "Version history for $TARGET_FILE:"
echo "--------------------------------"

duplicity collection-status "$BACKUP_PATH" | while read -r line; do
    # Backup-set lines carry a ctime-style date, e.g.
    # "Full  Mon Jun 10 10:00:00 2024  1"; capture the type and date
    if [[ $line =~ ^(Full|Incremental)[[:space:]]+(.+[0-9]{4}) ]]; then
        type=${BASH_REMATCH[1]}
        backup_date=${BASH_REMATCH[2]}
        echo -n "Checking $type backup from $backup_date: "
        # -t accepts seconds since the epoch; GNU date does the conversion
        duplicity list-current-files -t "$(date -d "$backup_date" +%s)" "$BACKUP_PATH" \
            | grep -q "$TARGET_FILE" && echo "PRESENT" || echo "ABSENT"
    fi
done

Another method is to attempt restoring the file at different points in time and compare checksums:

# Walk every backup timestamp: strip the set type and trailing volume
# count from each backup-set line, leaving the ctime-style date, and
# convert it to epoch seconds for -t (GNU grep/sed/date assumed)
duplicity collection-status file:///backup/path | \
    grep -E '^\s*(Full|Incremental)' | \
    sed -E 's/^\s*(Full|Incremental)\s+//; s/\s+[0-9]+\s*$//' | \
while read -r backup_date; do
    ts=$(date -d "$backup_date" +%s)
    echo "Checking timestamp $backup_date"
    duplicity --file-to-restore important_document.odt \
        -t "$ts" file:///backup/path ./temp_restore
    if [ -f "./temp_restore" ]; then
        # stat -c%s is GNU stat's "size in bytes"
        echo "File existed at $backup_date with size $(stat -c%s ./temp_restore)"
        md5sum ./temp_restore
        rm ./temp_restore
    fi
done

When implementing these solutions, remember that:

  • Processing all backup sets can be time-consuming for large repositories
  • You'll need sufficient temporary storage space for the restore-and-compare methods (see the sketch after this list)
  • The accuracy depends on your backup frequency and retention policy
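For the restore-and-compare approach in particular, a disposable working directory keeps partial restores from piling up. A minimal sketch, assuming GNU coreutils and the same illustrative backup path used above:

# Restore one version into an isolated temp directory; the trap
# guarantees cleanup even if duplicity or the checksum step fails
WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

duplicity --file-to-restore important_document.odt \
    -t 3D file:///backup/path "$WORKDIR/important_document.odt" \
    && md5sum "$WORKDIR/important_document.odt"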

These techniques work because of how Duplicity stores metadata. Unlike a version control system, Duplicity doesn't maintain explicit per-file histories, but the information can be reconstructed from the manifests of each backup set.

The key command for examining file versions is:

duplicity collection-status file:///backup/path
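This outputs metadata about all backup chains. The backup-set lines look roughly like the following (an illustrative excerpt; exact spacing varies between duplicity versions), and they are the format the parsing snippets in this article key on:

 Type of backup set:                            Time:      Num volumes:
                Full         Mon Jun 10 10:00:00 2024                 1
         Incremental         Tue Jun 11 10:00:00 2024                 1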

For remote backups (like S3), use:

duplicity collection-status s3://bucket/path

To list all versions of a specific file:

duplicity list-current-files --time 2024-01-01 file:///backup/path | grep filename.txt
duplicity list-current-files --time 2024-01-15 file:///backup/path | grep filename.txt
duplicity list-current-files --time 2024-02-01 file:///backup/path | grep filename.txt

For automated processing, this Bash script checks all backup points:

#!/bin/bash
BACKUP_PATH="file:///backup/path"
TARGET_FILE="important_document.txt"

# Extract the ctime-style date from each backup-set line and convert it
# to epoch seconds, which --time accepts (GNU grep/sed/date assumed)
duplicity collection-status "$BACKUP_PATH" |
  grep -E '^\s*(Full|Incremental)' |
  sed -E 's/^\s*(Full|Incremental)\s+//; s/\s+[0-9]+\s*$//' |
  while read -r backup_date; do
    echo "Checking $backup_date"
    duplicity list-current-files --time "$(date -d "$backup_date" +%s)" "$BACKUP_PATH" 2>/dev/null |
      grep "$TARGET_FILE" &&
      echo "Found in backup from $backup_date"
  done

Another approach is to restore the file from different time points and compare checksums:

duplicity collection-status "$BACKUP_PATH" | grep -E '^\s*(Full|Incremental)' |
  sed -E 's/^\s*(Full|Incremental)\s+//; s/\s+[0-9]+\s*$//' | while read -r backup_date; do
    duplicity --file-to-restore path/to/file --time "$(date -d "$backup_date" +%s)" \
      "$BACKUP_PATH" /tmp/restored_file && md5sum /tmp/restored_file
    rm -f /tmp/restored_file  # restore refuses to overwrite an existing target
done

For large backup sets or remote storage, these operations can be slow. Consider:

  • Caching the collection-status output between runs (see the sketch after this list)
  • Running these commands during off-peak hours
  • Limiting how far back you scan, e.g. by having the loop skip dates outside the window you care about
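For the caching point, a minimal sketch (the cache location and the 60-minute freshness window are arbitrary choices for illustration, not duplicity defaults):

#!/bin/bash
BACKUP_PATH="file:///backup/path"
CACHE="/tmp/duplicity-status.cache"

# Refresh the cache only when it is missing, empty, or older than an hour
if [ ! -s "$CACHE" ] || [ -n "$(find "$CACHE" -mmin +60)" ]; then
    duplicity collection-status "$BACKUP_PATH" > "$CACHE"
fi

# Later queries read the cached copy instead of hitting the backend again
grep -E '^\s*(Full|Incremental)' "$CACHE"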

For production environments, you might want to integrate this with your monitoring:

import subprocess
from datetime import datetime

def get_file_versions(backup_path, filename):
    """Return the backup sets in which `filename` appears."""
    result = subprocess.run(['duplicity', 'collection-status', backup_path],
                            capture_output=True, text=True)
    versions = []
    for line in result.stdout.split('\n'):
        parts = line.split()
        # Backup-set lines look like: "Full  Mon Jun 10 10:00:00 2024  1"
        if len(parts) >= 6 and parts[0] in ('Full', 'Incremental'):
            timestamp = datetime.strptime(' '.join(parts[1:6]),
                                          '%a %b %d %H:%M:%S %Y')
            versions.append((timestamp, parts[0]))

    file_history = []
    for timestamp, backup_type in sorted(versions):
        # --time accepts seconds since the epoch
        epoch = str(int(timestamp.timestamp()))
        cmd = ['duplicity', 'list-current-files', '--time', epoch, backup_path]
        res = subprocess.run(cmd, capture_output=True, text=True)
        if filename in res.stdout:
            file_history.append({
                'timestamp': timestamp.strftime('%Y-%m-%d %H:%M:%S'),
                'present': True,
                'type': backup_type
            })
    return file_history