While rsync's delta-transfer algorithm uses rolling checksums internally for block matching, this doesn't guarantee end-to-end file integrity. The protocol's default behavior only compares file sizes and modification times unless explicitly told to use checksums.
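To see the practical consequence, here's a minimal local sketch (using throwaway src/ and dst/ directories, chosen just for illustration) in which a corrupted copy with matching size and modification time sails straight through the default quick check:
mkdir -p src dst
printf 'AAAA' > src/file.bin
rsync -a src/ dst/
printf 'AAAB' > dst/file.bin         # corrupt the copy, keeping the same size
touch -r src/file.bin dst/file.bin   # give it the original mtime
rsync -avn src/ dst/                 # quick check: file.bin is not re-listed
rsync -avcn src/ dst/                # with -c, the changed content is detected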
Consider post-transfer verification in these scenarios:
- Critical data transfers (financial records, medical imaging)
- Unreliable network connections
- Source and destination using different filesystems
- Transferring between different architectures (endianness concerns)
Here are three reliable approaches with their trade-offs:
Method 1: Rsync with --checksum
rsync -avz --checksum source/ user@remote:destination/
Pro: Single command operation
Con: Slow for large directories (reads every file in full on both the source and the destination)
Method 2: Separate SHA256 Verification
# Generate checksums on source (relative paths so they resolve after the transfer)
(cd source && find . -type f -exec sha256sum {} +) > source_checksums.sha256
# Transfer files and checksum list
rsync -avz source/ user@remote:destination/
rsync -avz source_checksums.sha256 user@remote:destination/
# Verify on destination
cd destination && sha256sum -c source_checksums.sha256
Pro: Provides cryptographic-grade verification
Con: Additional steps required
Method 3: Rsync Dry-run Verification
# Initial transfer
rsync -avz source/ user@remote:destination/
# Verification pass
rsync -avzn --checksum source/ user@remote:destination/
The -n flag performs a dry run, showing what would be transferred; any files listed in the output (beyond rsync's usual header and summary lines) indicate discrepancies.
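To automate that check, one option is to swap -v for -i (itemize changes) so that only differing items are printed, making empty output a meaningful success condition. A sketch, assuming the same source/ and destination paths as above; note that directory attribute differences may also show up in the list:
# Fail if the checksum verification pass would re-transfer anything
changes=$(rsync -ain --checksum source/ user@remote:destination/)
if [ -n "$changes" ]; then
    echo "Verification found discrepancies:" >&2
    printf '%s\n' "$changes" >&2
    exit 1
fi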
For large datasets, the --checksum operation can be I/O intensive. Benchmark results on a 100GB dataset:
| Method | Time | CPU Load |
|---|---|---|
| Basic rsync | 22 min | 35% |
| --checksum | 47 min | 72% |
| SHA256 separate | 39 min | 85% |
For regular transfers, consider this bash wrapper:
#!/bin/bash
# Usage: rsync_verify.sh /path/to/source user@host:/path/to/destination
SOURCE_DIR=$1
DESTINATION=$2
LOG_FILE="/var/log/rsync_verify.log"
# Initial transfer (trailing slash copies the directory's contents)
rsync -avz "$SOURCE_DIR/" "$DESTINATION/" || exit 1
# Generate checksums in parallel, with paths relative to SOURCE_DIR
# so they resolve correctly on the destination
(cd "$SOURCE_DIR" && find . -type f -print0 | xargs -0 -P 4 -n 1 sha256sum) > /tmp/source.sha256
# Transfer checksums
rsync -avz /tmp/source.sha256 "$DESTINATION/" || exit 1
# Remote verification (user@host before the colon, remote path after it)
ssh "${DESTINATION%%:*}" "cd '${DESTINATION#*:}' && sha256sum -c source.sha256" > "$LOG_FILE"
# Check results
if grep -q "FAILED" "$LOG_FILE"; then
    echo "Verification failed!" >&2
    exit 1
fi
While rsync performs checksum comparisons during its delta-transfer algorithm, this doesn't guarantee end-to-end data integrity. The built-in checksums primarily serve to identify changed blocks and confirm reconstruction during the transfer itself, rather than validate the complete file after it has been written.
Network errors, storage issues, or memory corruption can still introduce silent data corruption. Files that pass rsync's size-and-mtime quick check are never read at all, and nothing re-reads the destination copy from disk, so errors in files rsync considers unchanged go undetected.
# Basic file verification using md5sum (compare hash values only, since local and remote paths differ)
md5sum source_file | awk '{print $1}' > source.md5
rsync -avz source_file destination:/path/
ssh destination "md5sum /path/source_file" | awk '{print $1}' | diff - source.md5
Consider these approaches for robust verification:
Method 1: Rsync with --checksum
Running rsync twice with --checksum forces full file verification:
rsync -avz --checksum source/ user@remote:/destination/
# After initial transfer
rsync -avz --checksum --dry-run source/ user@remote:/destination/
Method 2: External Checksum Tools
For mission-critical data, combine rsync with external verification:
# Generate checksums before transfer (relative paths so they resolve on the destination)
(cd /source && find . -type f -exec sha256sum {} +) > source_checksums.sha256
# Transfer files
rsync -avz /source/ user@remote:/destination/
# Verify on destination, feeding the local checksum list over ssh stdin
ssh user@remote "cd /destination && sha256sum -c -" < source_checksums.sha256
Here's a complete bash script for automated verification:
#!/bin/bash
SOURCE_DIR="/data/source"
DEST_HOST="remote-server"
DEST_DIR="/backup/data"
# Generate checksums with paths relative to SOURCE_DIR so they resolve on the remote side
echo "Generating source checksums..."
(cd "$SOURCE_DIR" && find . -type f -exec sha256sum {} +) > /tmp/source.sha256
# Transfer files
echo "Starting rsync transfer..."
rsync -avz --delete "$SOURCE_DIR/" "$DEST_HOST:$DEST_DIR/"
# Verify on remote, streaming the local checksum list over ssh stdin
echo "Verifying checksums..."
ssh "$DEST_HOST" "cd $DEST_DIR && sha256sum -c -" < /tmp/source.sha256 > /tmp/verify.log
# Check results
if grep -q "FAILED" /tmp/verify.log; then
    echo "ERROR: Verification failed for some files"
    exit 1
else
    echo "SUCCESS: All files verified"
    exit 0
fi
Full checksum verification adds overhead. For large datasets:
- Use parallel checksum generation:
find /source -type f | parallel -j 8 sha256sum > source_checksums.sha256
- Consider faster algorithms like xxHash for non-cryptographic needs (see the sketch after this list)
- Implement incremental verification for periodic backups
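As an example of the xxHash option, if the xxhsum command-line tool is installed on both machines (an assumption; it ships with the xxHash project and package names vary by distro), the SHA256 workflow above carries over almost unchanged:
# Generate fast, non-cryptographic xxHash checksums with relative paths
(cd /source && find . -type f -exec xxhsum {} +) > /tmp/source.xxh
# Ship the list alongside the data and check it in place
rsync -avz /tmp/source.xxh user@remote:/destination/source.xxh
ssh user@remote "cd /destination && xxhsum -c source.xxh"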
For production environments, consider:
- ZFS with built-in checksumming
- Tarsnap (uses cryptographic checksums)
- Borg Backup with end-to-end verification