When working with backup systems using rsync, verifying directory consistency between source and destination is crucial. The requirement extends beyond simple file existence checks to comparing metadata like:
- File modification timestamps
- File sizes in bytes
- Permission attributes
- Checksums for critical verification
Rsync itself provides several built-in comparison modes:
# Dry-run with itemized changes
rsync -avni --delete source/ destination/
# Checksum comparison (slow but thorough)
rsync -avc --dry-run source/ destination/
However, these outputs require parsing for programmatic use.
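One way to make the dry-run output usable from a script is to capture it with subprocess and split each itemized line into its change code and path. This is a minimal sketch, not a complete parser: the function names parse_itemized and rsync_dry_run are hypothetical, and it assumes rsync 3.x, where the -i change code occupies the first 11 columns of each line.

```python
import subprocess

def parse_itemized(output):
    """Split `rsync -i` output into (code, path) tuples.

    Assumes rsync 3.x formatting: an 11-column change code, one space,
    then the path. Lines that do not fit that shape are skipped.
    """
    changes = []
    for line in output.splitlines():
        if len(line) < 13 or line[11] != " ":
            continue
        code, path = line[:11].rstrip(), line[12:]
        is_delete = code == "*deleting"
        is_change = len(code) == 11 and code[0] in "<>ch.*" and code[1] in "fdLDS"
        if is_delete or is_change:
            changes.append((code, path))
    return changes

def rsync_dry_run(src, dst):
    """Run rsync as a dry run (-n) with itemized output (-i) and parse it."""
    result = subprocess.run(
        ["rsync", "-ani", "--delete", src, dst],
        capture_output=True, text=True, check=True,
    )
    return parse_itemized(result.stdout)
```

An empty list from rsync_dry_run("source/", "destination/") would then mean rsync found nothing to transfer or delete.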
For more control, here's a Python script using os.walk() and os.stat():
import os
from datetime import datetime

def compare_dirs(src, dst):
    """Report files that are missing from dst or differ in mtime or size.

    Only the source tree is walked, so files that exist only in dst
    are not reported.
    """
    diff_report = []
    for root, _, files in os.walk(src):
        # Mirror the current source subdirectory under the destination root
        rel_path = os.path.relpath(root, src)
        dst_path = os.path.join(dst, rel_path)
        for file in files:
            src_file = os.path.join(root, file)
            dst_file = os.path.join(dst_path, file)
            if not os.path.exists(dst_file):
                diff_report.append(f"{src_file} - MISSING IN DESTINATION")
                continue
            src_stat = os.stat(src_file)
            dst_stat = os.stat(dst_file)
            # Metadata-only check: file contents are not read here
            if src_stat.st_mtime != dst_stat.st_mtime or src_stat.st_size != dst_stat.st_size:
                diff_report.append(
                    f"{src_file} (mtime: {datetime.fromtimestamp(src_stat.st_mtime)}, size: {src_stat.st_size}) | "
                    f"{dst_file} (mtime: {datetime.fromtimestamp(dst_stat.st_mtime)}, size: {dst_stat.st_size}) | "
                    "MODIFIED"
                )
    return diff_report
To verify content rather than metadata alone, compare file hashes (MD5 below; hashlib.sha1() or hashlib.sha256() are drop-in replacements):
import hashlib

def get_file_hash(filepath):
    """Return the MD5 hex digest of a file, read in 4 KiB chunks."""
    hasher = hashlib.md5()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Add this to the comparison logic, inside the per-file loop:
src_hash = get_file_hash(src_file)
dst_hash = get_file_hash(dst_file)
if src_hash != dst_hash:
    diff_report.append(f"CONTENT DIFF: {src_file} != {dst_file}")
When dealing with large directories:
- Use multiprocessing for parallel hash calculations
- Cache previous comparison results
- Implement directory snapshotting for incremental checks
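The caching and snapshotting ideas above can be combined: record each file's size and modification time in a JSON snapshot, then diff a later snapshot against it so that only added or changed files need re-hashing. The following is a minimal sketch under those assumptions; take_snapshot, diff_snapshots, save_snapshot, load_snapshot, and the JSON layout are illustrative choices, not part of rsync or any standard tool.

```python
import json
import os

def take_snapshot(root):
    """Map each file path (relative to root) to [size, mtime_ns]."""
    snapshot = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat avoids failing on broken symlinks
            # Lists (not tuples) so a snapshot compares equal after a JSON round trip
            snapshot[os.path.relpath(full, root)] = [st.st_size, st.st_mtime_ns]
    return snapshot

def diff_snapshots(old, new):
    """Return (added, removed, changed) lists of relative paths, each sorted."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed

def save_snapshot(snapshot, path):
    """Write a snapshot to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(snapshot, f)

def load_snapshot(path):
    """Read a snapshot previously written by save_snapshot."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the snapshot holds metadata only, a file rewritten with the same size and the same mtime would go unnoticed; a periodic full hash pass covers that case.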
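For a quick check without custom scripts, standard command-line tools also work:
# Tree comparison (run tree from inside each directory so the root line matches)
(cd /source && tree -Dugps) > source_tree.txt
(cd /backup && tree -Dugps) > backup_tree.txt
diff source_tree.txt backup_tree.txt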
# Using specialized tools
sudo apt install meld # GUI diff tool
dirdiff -r source/ backup/
When dealing with directory synchronization and backup validation, rsync stands out as one of the most powerful tools in a Unix/Linux administrator's arsenal. The beauty of rsync lies not just in its synchronization capabilities, but also in its verbose comparison output options.
rsync -avn --itemize-changes /source/path/ /destination/path/
This command shows exactly what would change during synchronization. The -n flag makes it a dry run, while --itemize-changes breaks down each comparison result.
Understanding rsync's output codes is crucial for accurate directory comparison:
>f..t...... file.txt
>f.s....... file2.txt
cd+++++++++ new_dir/
- f indicates a regular file (d indicates a directory)
- . means the attribute is unchanged
- t shows a timestamp difference
- s indicates a size difference
- + marks files/directories that would be created
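To act on these codes in a script, the change string can be decoded position by position. The sketch below assumes the 11-character rsync 3.x layout that the man page describes as YXcstpoguax; the function name decode_itemized, the slot labels, and the returned dictionary shape are hypothetical.

```python
# First character: how the item is being updated
UPDATE_TYPES = {
    "<": "sent", ">": "received", "c": "created/changed locally",
    "h": "hard link", ".": "no update", "*": "message",
}
# Second character: what kind of item it is
FILE_TYPES = {
    "f": "file", "d": "directory", "L": "symlink",
    "D": "device", "S": "special",
}
# Remaining nine positions, in order. "u_slot" is reserved in older
# releases and reports access/creation time changes in newer ones.
SLOTS = ["checksum", "size", "mtime", "permissions", "owner",
         "group", "u_slot", "acl", "xattr"]

def decode_itemized(code):
    """Decode an 11-character rsync itemize string into a dictionary."""
    if len(code) != 11:
        raise ValueError(f"expected 11 characters, got {len(code)}: {code!r}")
    flags = code[2:]
    return {
        "update": UPDATE_TYPES.get(code[0], "unknown"),
        "type": FILE_TYPES.get(code[1], "unknown"),
        # A run of '+' means the item does not exist on the destination yet
        "new": flags == "+" * 9,
        # '.', ' ' (unchanged), '+' (new) and '?' (unknown) are not changes
        "changed": [name for name, ch in zip(SLOTS, flags) if ch not in ".+ ?"],
    }
```

For example, decode_itemized(">f..t......") lists only "mtime" under "changed", while a new directory such as "cd+++++++++" comes back with "new" set to True.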
For those needing more customized comparison logic, here's a Python solution using filecmp.dircmp:
import os
import filecmp
from datetime import datetime

def compare_dirs(dir1, dir2):
    # dircmp treats files with identical type, size, and mtime as equal
    # without reading their contents
    comparison = filecmp.dircmp(dir1, dir2)
    print("Files only in", dir1)
    for item in comparison.left_only:
        print(os.path.join(dir1, item))
    print("\nFiles only in", dir2)
    for item in comparison.right_only:
        print(os.path.join(dir2, item))
    print("\nCommon files with differences:")
    for item in comparison.diff_files:
        path1 = os.path.join(dir1, item)
        path2 = os.path.join(dir2, item)
        size1 = os.path.getsize(path1)
        size2 = os.path.getsize(path2)
        mtime1 = datetime.fromtimestamp(os.path.getmtime(path1))
        mtime2 = datetime.fromtimestamp(os.path.getmtime(path2))
        print(f"{path1} ({mtime1}) ({size1} bytes) | {path2} ({mtime2}) ({size2} bytes)")
    # dircmp inspects a single directory level, so recurse into shared subdirectories
    for item in comparison.common_dirs:
        compare_dirs(os.path.join(dir1, item), os.path.join(dir2, item))

# Example usage:
compare_dirs('/local/path', '/remote/path')
To verify file contents even when metadata appears identical, consider adding SHA-256 checksum verification:
import hashlib

def get_file_hash(filepath):
    """Return the SHA-256 hex digest of a file (the := operator needs Python 3.8+)."""
    hasher = hashlib.sha256()
    with open(filepath, 'rb') as f:
        while chunk := f.read(4096):
            hasher.update(chunk)
    return hasher.hexdigest()

# Add this inside the loop over comparison.diff_files. To catch changes hidden
# behind identical metadata, run the same check over comparison.same_files.
hash1 = get_file_hash(path1)
hash2 = get_file_hash(path2)
print(f"Checksums: {hash1} vs {hash2}")
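As an alternative to hashing both sides, the standard library can compare file contents byte for byte. The sketch below passes shallow=False to filecmp.cmpfiles for the files that dircmp reports as common, so files whose size and mtime match are still read and compared. The helper name deep_compare is hypothetical, and only the top-level directory is checked.

```python
import filecmp

def deep_compare(dir1, dir2):
    """Compare the contents of files present in both directories (top level only)."""
    common = filecmp.dircmp(dir1, dir2).common_files
    # shallow=False forces a content comparison even when the os.stat()
    # signatures (type, size, mtime) of the two files are identical
    match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, common, shallow=False)
    return {
        "match": sorted(match),
        "mismatch": sorted(mismatch),
        "errors": sorted(errors),  # files that could not be compared
    }
```

This reads both copies in full, which costs about the same I/O as hashing each side once, but it needs no digest storage and stops at the first differing block.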
For those preferring GUI solutions:
- Meld: Excellent graphical diff tool with directory comparison
- Beyond Compare: Powerful commercial option with detailed reporting
- KDiff3: Open-source alternative with merge capabilities
Command-line enthusiasts might prefer vimdiff for side-by-side file comparison, though it's less suited for entire directory structures.
For ongoing monitoring, consider setting up a cron job that logs differences:
0 2 * * * rsync -avn --itemize-changes /source/ /backup/ > /var/log/backup_diff_$(date +\%Y\%m\%d).log
This runs daily at 2 AM and saves output to dated log files for historical tracking.