When performing rsync operations between an ext4 source and XFS destination filesystem, I encountered a significant size discrepancy with one particular directory. While most transfers showed minimal size differences (<1GB), this specific folder reported 264GB on source but 286GB after transfer - a 22GB (8.3%) increase.
Source FS: ext4 (Linux kernel 5.4.0-xx)
Destination: XFS (Linux kernel 5.4.0-xx)
Rsync version: 3.1.3
Command used:
rsync -rltDv --prune-empty-dirs /source/path/ /dest/path/
Three complete transfer attempts yielded identical results. Standard diagnostic tools showed:
# Source check:
du -sh --apparent-size /source/path # 264GB
du -sh /source/path # 264GB
# Destination check:
du -sh --apparent-size /dest/path # 286GB
du -sh /dest/path # 286GB
After eliminating basic transfer errors, these possibilities emerged:
- Filesystem block size differences (ext4 typically defaults to 4KB; XFS also defaults to 4KB but can be formatted with blocks up to 64KB)
- Different handling of sparse files
- Extended attributes (xattrs) storage overhead
- Metadata representation variations
Comparing filesystem characteristics revealed:
# Check source block size:
tune2fs -l /dev/sdX | grep 'Block size'
# Check destination block size:
xfs_info /mount/point | grep bsize
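If the device name is not at hand, the block size can also be read from any path on the mounted filesystem. This is a minimal sketch assuming GNU coreutils `stat`; the helper name `fs_block_size` is hypothetical and only used for illustration.

```shell
#!/bin/bash
# Print the fundamental block size (in bytes) of the filesystem holding a path.
# Assumes GNU coreutils stat; fs_block_size is a hypothetical helper name.
fs_block_size() {
    stat -f -c '%S' "$1"
}

# Example usage (placeholder paths from the question):
# fs_block_size /source/path
# fs_block_size /dest/path
```

For ext4 and XFS the value should match what `tune2fs` and `xfs_info` report, but treat it as a quick cross-check rather than a replacement for those tools.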
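In this case the target XFS filesystem had been formatted with 64KB blocks (sometimes chosen for large-file storage), while ext4 used 4KB blocks. Every file occupies a whole number of blocks, so larger blocks waste more space per file. Note that this rounding changes allocated size (plain du) only; --apparent-size sums file lengths and is not affected by block size. Block rounding is a plausible explanation for much of the growth in allocated size: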
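# Rough estimate: on average about half a block is wasted per file,
# so ~32KB per file with 64KB blocks vs ~2KB with 4KB blocks (~30KB extra per file)
(286 - 264) * 1024 * 1024 KB / 30 KB ≈ 770,000 files would account for the 22GB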
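The rounding effect can also be estimated directly from the source tree instead of from an average. The sketch below rounds every file size up to a multiple of a given block size and sums the result. It assumes GNU find (for `-printf`), and `estimate_alloc` is a hypothetical helper name. It ignores directories, metadata, sparse files, and hard links, so it gives only a rough figure.

```shell
#!/bin/bash
# Estimate the bytes a tree would occupy if every regular file were rounded up
# to a whole number of blocks of the given size.
# estimate_alloc is a hypothetical helper; requires GNU find for -printf.
estimate_alloc() {
    local dir=$1 block_size=$2
    find "$dir" -type f -printf '%s\n' |
        awk -v bs="$block_size" '
            { sum += int(($1 + bs - 1) / bs) * bs }   # round each size up to a block multiple
            END { printf "%.0f\n", sum }'
}

# Example usage (placeholder path from the question):
# estimate_alloc /source/path 4096     # estimate at 4KB blocks
# estimate_alloc /source/path 65536    # estimate at 64KB blocks
```

If the difference between the two estimates is close to the observed 22GB, block rounding is a likely cause; if it is much smaller, something else is contributing.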
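Created a test case with 1000 small files (1KB each) to reproduce the effect:
# On ext4 (4KB blocks):
mkdir -p testdir
for i in $(seq 1 1000); do dd if=/dev/zero of=testdir/file$i bs=1K count=1 status=none; done
du -sh testdir # Shows ~4MB (1000*4KB)
# After rsync to XFS (64KB blocks):
du -sh testdir # Shows ~64MB (1000*64KB)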
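rsync options that affect how space is allocated or reported:
# Allocate each destination file to its final size before writing (can reduce fragmentation):
rsync -rltDv --prune-empty-dirs --preallocate /source/ /dest/
# Recreate holes in sparse files instead of writing runs of zeros:
rsync -rltDv --sparse --prune-empty-dirs /source/ /dest/
# Print total file size and bytes transferred at the end of the run:
rsync -rltDv --stats --info=progress2 /source/ /dest/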
For production systems where exact space usage matters, consider:
- Reformatting destination with matching block size
- Using tar pipes for exact byte preservation
- Implementing post-transfer verification scripts
#!/bin/bash
SRC="/source/path"
DST="/dest/path"
# Compare file counts
find "$SRC" -type f | wc -l > src_count.txt
find "$DST" -type f | wc -l > dst_count.txt
# Compare total bytes (apparent size)
find "$SRC" -type f -printf "%s\n" | awk '{sum+=$1} END{print sum}' > src_bytes.txt
find "$DST" -type f -printf "%s\n" | awk '{sum+=$1} END{print sum}' > dst_bytes.txt
diff {src,dst}_count.txt || echo "File count mismatch"
diff {src,dst}_bytes.txt || echo "Byte count mismatch"
When transferring data between filesystems, particularly from ext4 to XFS, several factors can contribute to size discrepancies. The transfer in question used this command:
# Original rsync command used:
rsync -rltDv --prune-empty-dirs /source/path/ /destination/path/
The 22GB difference (264GB → 286GB) points to differences in how the two filesystems allocate and account for space:
- Block size allocation differences (ext4 typically defaults to 4KB; XFS also defaults to 4KB but can be formatted with blocks up to 64KB)
- Different handling of sparse files
- Extended attributes (xattrs) storage overhead
- Filesystem metadata accounting variations
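First, measure the source precisely:
# Allocated size in bytes (actual disk usage)
du -s --block-size=1 /source/path
# Apparent size in bytes (sum of file lengths), in the same unit for comparison
du -s --apparent-size --block-size=1 /source/path
# List sparse files (allocated size smaller than apparent size)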
find /source/path -type f -printf "%S\t%p\n" | awk '$1 < 1.0'
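To see how much space sparse files could account for, the total size of the holes can be summed. This is a sketch assuming GNU find, whose `%b` reports allocated 512-byte blocks; `hole_bytes` is a hypothetical helper name. Filesystems that store tiny files inline or compress data may also report fewer blocks than the file length, so the figure is an upper bound rather than an exact count.

```shell
#!/bin/bash
# Sum, over all regular files, the bytes by which apparent size exceeds
# allocated size (st_blocks * 512). hole_bytes is a hypothetical helper name.
hole_bytes() {
    find "$1" -type f -printf '%s %b\n' |
        awk '{ alloc = $2 * 512; if ($1 > alloc) sum += $1 - alloc }
             END { printf "%.0f\n", sum }'
}

# Example usage (placeholder path from the question):
# hole_bytes /source/path
```

A result near 22GB would suggest that holes in the source were written out as zeros at the destination, which is the case `--sparse` is meant to address.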
These additional options change how rsync writes and allocates destination files; whether they reduce the discrepancy depends on the data:
rsync -rltDv --prune-empty-dirs --inplace --no-whole-file \
--preallocate --sparse /source/path/ /destination/path/
XFS may also allocate space differently depending on how it was formatted, so check its geometry:
# Check XFS allocation parameters
xfs_info /destination/path
# Typical output shows allocation groups and block sizes
# Example:
meta-data=/dev/sdX isize=512 agcount=32, agsize=some-value
data = bsize=4096 blocks=value, imaxpct=25
Post-transfer validation techniques:
# Compare file counts
find /source/path -type f | wc -l
find /destination/path -type f | wc -l
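# Generate checksums with relative paths so the two lists can be compared
(cd /source/path && find . -type f -exec sha256sum {} + | sort -k2) > source_checksums
(cd /destination/path && find . -type f -exec sha256sum {} + | sort -k2) > dest_checksums
diff source_checksums dest_checksums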
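Checksumming 264GB takes a while, so a cheaper first pass is to compare relative paths and byte lengths, which shows missing files and files whose size differs. The following is a sketch assuming GNU find; `list_sizes` and `compare_trees` are hypothetical helper names.

```shell
#!/bin/bash
# Print "relative/path<TAB>size" for every regular file under a directory,
# sorted by byte order so two listings line up.
list_sizes() {
    (cd "$1" && find . -type f -printf '%P\t%s\n' | LC_ALL=C sort)
}

# Show entries that differ between two trees (missing files or differing sizes).
# Returns diff's exit status: 0 when the listings match, 1 when they differ.
compare_trees() {
    local src_list dst_list status
    src_list=$(mktemp) && dst_list=$(mktemp) || return 2
    list_sizes "$1" > "$src_list"
    list_sizes "$2" > "$dst_list"
    diff "$src_list" "$dst_list"
    status=$?
    rm -f "$src_list" "$dst_list"
    return "$status"
}

# Example usage (placeholder paths from the question):
# compare_trees /source/path /destination/path
```

If this prints nothing, every file has the same length on both sides, and any remaining difference reported by du comes from allocation or accounting rather than file contents.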
If rsync discrepancies persist, consider:
# Tar pipe method
(cd /source/path && tar cf - .) | (cd /destination/path && tar xvf -)
# With progress monitoring (requires pv); -C keeps archive paths relative
tar cf - -C /source/path . | pv -s "$(du -sb /source/path | awk '{print $1}')" \
| tar xf - -C /destination/path