Unlike modern filesystems like btrfs or ZFS, ext4 doesn't natively support file-level checksums for data integrity verification. This creates a challenge when you need to validate data correctness before critical operations like backups. Here are some practical approaches to implement checksum verification on ext4:
The most straightforward method is to generate and store checksums for individual files:
# Generate SHA-256 checksums for all files in a directory
find /path/to/data -type f -exec sha256sum {} \; > checksums.sha256
# Verify checksums later
sha256sum -c checksums.sha256
For more sophisticated needs, consider implementing a database of file checksums:
#!/bin/bash
# Checksum database manager
DB_FILE="/var/lib/checksum_db"
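verify_integrity() {
    while IFS= read -r line; do
        # Each line is "<64 hex digits><2 separator chars><path>";
        # slicing (instead of awk) keeps paths containing spaces intact
        file=${line:66}
        if [ ! -f "$file" ]; then
            echo "Missing: $file"
            continue
        fi
        # --status suppresses sha256sum's own output; only the exit code is used
        printf '%s\n' "$line" | sha256sum -c --status 2>/dev/null || echo "Corrupt: $file"
    done < "$DB_FILE"
}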
update_database() {
    # Write to a temporary file first so a failed run never truncates the database
    find /path/to/data -type f -exec sha256sum {} + > "$DB_FILE.tmp"
    mv "$DB_FILE.tmp" "$DB_FILE"
}
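A flat list of hashes cannot tell a deliberate edit from silent corruption. One way to approach that, sketched below in Python, is to record each file's size and modification time next to its hash: a changed hash with unchanged metadata is a strong hint of corruption, while changed metadata suggests an ordinary edit. The SQLite schema, the function names, and the "corrupt"/"modified" labels are illustrative assumptions, not part of the script above.

```python
import hashlib
import os
import sqlite3


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path):
    """Open (or create) the checksum database; the schema is an illustrative choice."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, sha256 TEXT)"
    )
    return conn


def update_database(conn, root):
    """Record size, mtime, and SHA-256 for every regular file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            info = os.stat(path)
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (path, info.st_size, info.st_mtime_ns, sha256_of(path)),
            )
    conn.commit()


def verify_integrity(conn):
    """Return (status, path) pairs with status 'missing', 'modified', or 'corrupt'."""
    problems = []
    rows = conn.execute(
        "SELECT path, size, mtime_ns, sha256 FROM files ORDER BY path"
    ).fetchall()
    for path, size, mtime_ns, recorded in rows:
        if not os.path.isfile(path):
            problems.append(("missing", path))
            continue
        if sha256_of(path) == recorded:
            continue
        info = os.stat(path)
        # Same size and mtime but different content: nothing legitimately wrote the file
        untouched = (info.st_size, info.st_mtime_ns) == (size, mtime_ns)
        problems.append(("corrupt" if untouched else "modified", path))
    return problems
```

The heuristic is not proof: a program can rewrite a file and restore its timestamp, so treat "corrupt" as "needs a closer look" rather than a verdict.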
The BSD mtree format provides a robust way to track file attributes and checksums:
# Create mtree database
mtree -c -K cksum,sha256 -p /path/to/data > /backup/data.mtree
# Verify against database
mtree -f /backup/data.mtree -p /path/to/data
For mission-critical systems, consider these enterprise-grade solutions:
- Implement a custom FUSE layer that transparently handles checksums
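- Use dm-verity for block-level verification of read-only volumes (it cannot protect a filesystem that is mounted read-write)
- Deploy auditd rules to monitor file modifications (this records who changed a file, but does not detect silent corruption on disk)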
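Backup tools and their wrappers can usually run a verification script before the backup starts; for BorgBackup this is typically done from a wrapper script or a tool such as borgmatic. Here's an example pre-backup check: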
#!/bin/bash
# Pre-backup verification script for Borg
if ! sha256sum -c /backup/checksums.sha256 >/dev/null 2>&1; then
    logger "Backup aborted: checksum verification failed"
    exit 1
fi
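The same gate can be expressed as a small Python wrapper that runs the backup command only when verification exits with status 0. This is a minimal sketch: the function name `run_backup_if_verified` is made up for illustration, and the repository and data paths in the commented usage are placeholders.

```python
import logging
import subprocess


def run_backup_if_verified(verify_cmd, backup_cmd):
    """Run backup_cmd only if verify_cmd succeeds; return the resulting exit status."""
    check = subprocess.run(
        verify_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if check.returncode != 0:
        logging.error("Backup aborted: checksum verification failed")
        return 1
    return subprocess.run(backup_cmd).returncode


# Hypothetical usage (placeholder paths, adjust to your setup):
# status = run_backup_if_verified(
#     ["sha256sum", "-c", "/backup/checksums.sha256"],
#     ["borg", "create", "/path/to/repo::{now}", "/path/to/data"],
# )
```

Passing the commands as argument lists avoids shell quoting problems with unusual paths.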
For large datasets, consider parallelizing checksum generation with GNU parallel:
# NUL-delimited names keep paths with spaces or newlines intact
find /data -type f -print0 | parallel -0 -j8 sha256sum > checksums.sha256
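Where GNU parallel is not available, a thread pool in Python gives a similar effect, since `hashlib` releases the GIL while hashing larger buffers. The sketch below is one possible shape; the function names and the default of 8 workers are assumptions, and the manifest writer only reproduces the `sha256sum` line format for ordinary file names (it does not apply the escaping `sha256sum` uses for names containing backslashes or newlines).

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path, chunk_size=1 << 20):
    """Hash one file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree_parallel(root, workers=8):
    """Return {path: sha256 hex digest} for every regular file under root."""
    paths = sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
        if os.path.isfile(os.path.join(dirpath, name))
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input paths
        return dict(zip(paths, pool.map(sha256_of, paths)))


def write_manifest(checksums, manifest_path):
    """Write '<digest>  <path>' lines, the layout sha256sum -c expects."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for path, digest in checksums.items():
            out.write(f"{digest}  {path}\n")
```

On spinning disks, more workers can make things slower because of seek overhead, so the worker count is worth measuring rather than guessing.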
Unlike modern filesystems like btrfs or ZFS, ext4 lacks built-in checksum functionality for data blocks. The filesystem only maintains checksums for metadata (the metadata_csum feature, available since Linux 3.5), leaving user data vulnerable to silent corruption.
Here are three reliable approaches to verify data integrity before backups on ext4:
1. File-level Hashing
# Generate SHA-256 checksums for all files
find /path/to/backup -type f -exec sha256sum {} + > checksums.txt
# Verify later
sha256sum -c checksums.txt
2. Block Device Verification
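# The device must be unmounted for both passes: any write, including a
# journal replay on mount, changes the digest
sudo umount /dev/sdX
# Record a SHA-256 digest of the raw device
sudo dd if=/dev/sdX bs=1M | sha256sum > device_checksum.sha256
# Verify later, again with the device unmounted
sudo dd if=/dev/sdX bs=1M | sha256sum -c device_checksum.sha256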
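A single digest for the whole device says that something changed but not where. A variation, sketched here in Python, hashes the device (or an image file) in fixed-size chunks so two runs can be compared chunk by chunk. The function names and the 1 MiB chunk size are illustrative assumptions; reading a real block device this way requires root and, as above, an unmounted device.

```python
import hashlib


def chunk_digests(device_path, chunk_size=1 << 20):
    """Return one SHA-256 hex digest per chunk_size bytes of the device or image."""
    digests = []
    with open(device_path, "rb") as dev:
        for chunk in iter(lambda: dev.read(chunk_size), b""):
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def changed_offsets(before, after, chunk_size=1 << 20):
    """Return byte offsets of chunks whose digest differs between two runs."""
    offsets = [
        index * chunk_size
        for index, (old, new) in enumerate(zip(before, after))
        if old != new
    ]
    if len(before) != len(after):
        # Sizes differ: report the offset where the shorter run ends
        offsets.append(min(len(before), len(after)) * chunk_size)
    return offsets
```

The reported offsets can then be mapped back to files with filesystem tools if a closer investigation is needed.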
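3. Continuous Protection with dm-integrity
For continuous protection, consider using Linux's device mapper integrity target (dm-integrity), which stores a checksum for every sector and returns an I/O error when a sector fails verification:
# Set up dm-integrity on a block device (formatting wipes existing data on /dev/sdX)
# Plain sha256 needs no key; hmac-sha256 additionally requires
# --integrity-key-file and --integrity-key-size
sudo integritysetup format /dev/sdX --integrity sha256
sudo integritysetup open /dev/sdX int-sdX --integrity sha256
# Create filesystem on the protected device
sudo mkfs.ext4 /dev/mapper/int-sdX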
Here's a Python script that implements differential verification:
#!/usr/bin/env python3
import hashlib
from pathlib import Path


def generate_checksums(directory):
    """Map each file path under directory to its SHA-256 hex digest."""
    checksums = {}
    for filepath in Path(directory).rglob('*'):
        if filepath.is_file():
            digest = hashlib.sha256()
            with open(filepath, 'rb') as f:
                # Read in 1 MiB chunks so large files are not loaded into memory at once
                for chunk in iter(lambda: f.read(1 << 20), b''):
                    digest.update(chunk)
            checksums[str(filepath)] = digest.hexdigest()
    return checksums


def verify_checksums(original, current):
    """Report files that are missing or whose digest changed."""
    for path, original_hash in original.items():
        if path not in current:
            print(f"File missing: {path}")
            continue
        if current[path] != original_hash:
            print(f"Checksum mismatch: {path}")


# Usage:
prev_state = generate_checksums('/backup/data')
# ... after some time ...
current_state = generate_checksums('/backup/data')
verify_checksums(prev_state, current_state)
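In practice the two snapshots are taken by separate runs, so the earlier state has to be saved somewhere. A minimal sketch of that step, assuming a JSON file as the storage format: the helper names `save_checksums`, `load_checksums`, and `diff_checksums` are hypothetical, and they operate on the same path-to-digest dictionaries the script above produces.

```python
import json
import os


def save_checksums(checksums, manifest_path):
    """Write the {path: digest} mapping to JSON via a temp file and atomic rename."""
    tmp_path = f"{manifest_path}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as out:
        json.dump(checksums, out, indent=2, sort_keys=True)
    os.replace(tmp_path, manifest_path)


def load_checksums(manifest_path):
    """Read a mapping previously written by save_checksums."""
    with open(manifest_path, encoding="utf-8") as src:
        return json.load(src)


def diff_checksums(original, current):
    """Classify paths as missing, changed, or added between two snapshots."""
    return {
        "missing": sorted(p for p in original if p not in current),
        "changed": sorted(
            p for p in original if p in current and current[p] != original[p]
        ),
        "added": sorted(p for p in current if p not in original),
    }
```

Keeping the JSON file on a different disk than the data it describes avoids losing both to the same failure.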
When implementing verification for backup systems:
- Store checksums separately from the backup data
- Consider using par2 for redundancy
- For rsync backups, use the -c flag for checksum verification
- Cloud storage users should enable object-level checksums