When working with Duplicity's incremental backup system, one common challenge is identifying every historical version of a specific file across multiple backup sessions. Unlike simple file listings, this requires examining the metadata of all backup chains.
The most effective way to track file versions is to combine Duplicity's collection status with file listings. Note that collection-status reports backup sets and their timestamps, not file names, so the search for a file has to run against a listing:
duplicity list-current-files file:///backup/path | grep "filename.ext"
To get detailed information about every version of a specific file (let's say "important_document.odt"), you'll need to:
# First, list all backup sets
duplicity collection-status file:///backup/path
# Then list the files as of each backup time and filter for yours
# (--file-prefix names the archive files on the backend; it does not filter listings)
duplicity list-current-files -t 3D file:///backup/path | grep "important_document"
duplicity list-current-files -t 1W file:///backup/path | grep "important_document"
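Scripting these steps means reading the timestamps out of the collection-status report. Here is a minimal Python sketch, assuming backup-set rows consist of the set type, a ctime-style timestamp, and a volume count (for example "Full  Mon Jan  1 12:00:00 2024  1"). The helper names parse_backup_sets and to_duplicity_time are hypothetical, and the row layout should be checked against the output of your Duplicity version.

```python
import re
from datetime import datetime

# Assumed row layout: "<Full|Incremental>  <ctime-style timestamp>  <volumes>"
SET_ROW = re.compile(
    r"^\s*(Full|Incremental)\s+"
    r"(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+\d+\s*$"
)


def parse_backup_sets(status_text):
    """Return (kind, datetime) pairs for every backup-set row, oldest first."""
    sets = []
    for line in status_text.splitlines():
        match = SET_ROW.match(line)
        if match:
            # Collapse the padding in "Jan  1" before parsing.
            stamp = " ".join(match.group(2).split())
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            sets.append((match.group(1), when))
    return sorted(sets, key=lambda item: item[1])


def to_duplicity_time(moment):
    """Format a local datetime as epoch seconds, a time format duplicity accepts."""
    return str(int(moment.timestamp()))
```

Each value returned by to_duplicity_time can be passed to -t in place of a relative interval such as 3D, so every backup point is inspected rather than a few hand-picked ones.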
For regular monitoring, you might want to create a script that automatically tracks file changes:
#!/bin/bash
TARGET_FILE="important_document.odt"
BACKUP_PATH="file:///backup/path"
echo "Version history for $TARGET_FILE:"
echo "--------------------------------"
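# Backup-set rows look like "Full  Mon Jan  1 12:00:00 2024  1", so the
# timestamp spans fields 2-6 rather than a single field.
duplicity collection-status "$BACKUP_PATH" |
awk '$1 == "Full" || $1 == "Incremental" {print $1, $2, $3, $4, $5, $6}' |
while read -r type stamp; do
    # Convert to epoch seconds, a time format duplicity accepts (needs GNU date)
    when=$(date -d "$stamp" +%s) || continue
    echo -n "Checking $type backup from $stamp: "
    # Redirect stdin so duplicity cannot consume the loop's input
    duplicity list-current-files -t "$when" "$BACKUP_PATH" < /dev/null | grep -q "$TARGET_FILE" \
        && echo "PRESENT" || echo "ABSENT"
done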
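A presence check only says the file existed at a backup point, not that it changed there. The sketch below narrows the history to backup points where the file's recorded modification time differs from the previous one. It assumes each list-current-files line is a ctime-style modification time followed by the path relative to the backup root; find_file_mtime and changed_versions are hypothetical helper names.

```python
from datetime import datetime


def find_file_mtime(listing_text, relative_path):
    """Return the modification time listed for relative_path, or None if absent."""
    for line in listing_text.splitlines():
        # Five timestamp fields, then the path (which may contain spaces).
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5] == relative_path:
            return datetime.strptime(" ".join(fields[:5]), "%a %b %d %H:%M:%S %Y")
    return None


def changed_versions(listings, relative_path):
    """Keep the backup points where the file's mtime differs from the previous point.

    listings is an iterable of (label, listing_text) pairs, oldest first.
    """
    versions = []
    previous = None
    for label, text in listings:
        mtime = find_file_mtime(text, relative_path)
        if mtime is not None and mtime != previous:
            versions.append((label, mtime))
        previous = mtime
    return versions
```

Matching on the full relative path instead of a substring also avoids false positives from files such as important_document.odt.bak.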
Another method is to attempt restoring the file at different points in time and compare checksums:
# Get every backup timestamp (fields 2-6 of each Full/Incremental row)
duplicity collection-status file:///backup/path |
awk '$1 == "Full" || $1 == "Incremental" {print $2, $3, $4, $5, $6}' |
while read -r stamp; do
    ts=$(date -d "$stamp" +%s) || continue   # epoch seconds (needs GNU date)
    echo "Checking timestamp $stamp"
    # --file-to-restore takes the path relative to the backup root
    duplicity restore --file-to-restore important_document.odt \
        -t "$ts" file:///backup/path ./temp_restore < /dev/null
    if [ -f "./temp_restore" ]; then
        echo "File existed at $stamp with size $(stat -c%s ./temp_restore)"
        md5sum ./temp_restore
        rm ./temp_restore
    fi
done
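The checksums are only useful once they are compared with each other. As a rough sketch of that step, assuming each restore was kept under its own path instead of being deleted, the following collapses consecutive identical copies into distinct versions. It uses SHA-256 instead of md5sum, and file_digest and distinct_versions are hypothetical helper names.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_versions(restored):
    """Keep the first backup point of each run of identical content.

    restored is a list of (label, path) pairs, oldest first.
    """
    versions = []
    previous = None
    for label, path in restored:
        current = file_digest(path)
        if current != previous:
            versions.append((label, current))
        previous = current
    return versions
```

Comparing each copy only with its predecessor means content that reverts to an earlier state is reported again, which matches how a file's history actually unfolded.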
When implementing these solutions, remember that:
- Processing all backup sets can be time-consuming for large repositories
- You'll need sufficient temporary storage space for file comparison methods
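- Accuracy depends on your backup frequency and retention policy: edits made and overwritten between two backups are never captured, and versions older than the retention window are gone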
When working with Duplicity's incremental backup system, tracking file versions requires understanding how the software stores metadata. Unlike version control systems, Duplicity doesn't maintain explicit file histories, but we can reconstruct this information from backup manifests.
The key command for finding the backup points to examine is:
duplicity collection-status file:///backup/path
This outputs metadata about all backup chains. For remote backups (like S3), pass the remote URL instead; the exact S3 URL scheme depends on your Duplicity version and backend:
duplicity collection-status s3://bucket/path
To list all versions of a specific file:
duplicity list-current-files --time 2024-01-01 file:///backup/path | grep filename.txt
duplicity list-current-files --time 2024-01-15 file:///backup/path | grep filename.txt
duplicity list-current-files --time 2024-02-01 file:///backup/path | grep filename.txt
For automated processing, this Bash script checks all backup points:
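#!/bin/bash
BACKUP_PATH="file:///backup/path"
TARGET_FILE="important_document.txt"

# The timestamp occupies fields 2-6 of each Full/Incremental row
duplicity collection-status "$BACKUP_PATH" |
awk '$1 == "Full" || $1 == "Incremental" {print $2, $3, $4, $5, $6}' |
while read -r stamp; do
    echo "Checking $stamp"
    # Pass epoch seconds, a time format duplicity accepts (needs GNU date)
    duplicity list-current-files --time "$(date -d "$stamp" +%s)" "$BACKUP_PATH" \
        2>/dev/null < /dev/null | grep "$TARGET_FILE" && echo "Found in backup from $stamp"
done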
Another approach is to restore the file from different time points and compare checksums:
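duplicity collection-status "$BACKUP_PATH" |
awk '$1 == "Full" || $1 == "Incremental" {print $2, $3, $4, $5, $6}' |
while read -r stamp; do
    timestamp=$(date -d "$stamp" +%s) || continue   # epoch seconds (needs GNU date)
    rm -f /tmp/restored_file   # duplicity will not overwrite an existing target
    duplicity restore --file-to-restore path/to/file --time "$timestamp" \
        "$BACKUP_PATH" /tmp/restored_file < /dev/null && md5sum /tmp/restored_file
done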
For large backup sets or remote storage, these operations can be slow. Consider:
- Caching collection-status output
- Running these commands during off-peak hours
- Limiting the time range by filtering the timestamp list before looping, so only backup points in the period of interest are queried
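Caching and range limiting can both be handled in the wrapper script. The following is a minimal sketch, not a definitive implementation: cached_collection_status and within_range are hypothetical helper names, the cache file location and the one-hour default are arbitrary choices, and the run parameter exists only so the function can be exercised without Duplicity installed.

```python
import subprocess
import time
from pathlib import Path


def cached_collection_status(backup_url, cache_file, max_age_seconds=3600,
                             run=subprocess.run):
    """Return collection-status output, reusing a recent cached copy if present."""
    cache = Path(cache_file)
    if cache.exists() and time.time() - cache.stat().st_mtime < max_age_seconds:
        return cache.read_text()
    result = run(["duplicity", "collection-status", backup_url],
                 capture_output=True, text=True, check=True)
    cache.write_text(result.stdout)
    return result.stdout


def within_range(timestamps, start, end):
    """Keep only the backup timestamps between start and end, inclusive."""
    return [moment for moment in timestamps if start <= moment <= end]
```

The cache should be refreshed, or simply deleted, after each new backup run so that the latest backup set is not missed.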
For production environments, you might want to integrate this with your monitoring:
import subprocess
from datetime import datetime

def get_file_versions(backup_path, filename):
    result = subprocess.run(['duplicity', 'collection-status', backup_path],
                            capture_output=True, text=True)
    # Backup-set rows look like: "Full  Mon Jan  1 12:00:00 2024  1"
    versions = []
    for line in result.stdout.split('\n'):
        parts = line.split()
        if len(parts) >= 6 and parts[0] in ('Full', 'Incremental'):
            timestamp = datetime.strptime(' '.join(parts[1:6]), '%a %b %d %H:%M:%S %Y')
            versions.append((timestamp, parts[0]))  # keep the type with its time

    file_history = []
    for timestamp, backup_type in sorted(versions):
        # Epoch seconds is one of the time formats duplicity accepts
        cmd = ['duplicity', 'list-current-files', '--time',
               str(int(timestamp.timestamp())), backup_path]
        res = subprocess.run(cmd, capture_output=True, text=True)
        if filename in res.stdout:
            file_history.append({
                'timestamp': timestamp.strftime('%Y-%m-%d %H:%M:%S'),
                'present': True,
                'type': backup_type
            })
    return file_history