Investigating Rsync Size Discrepancy: 264GB vs 286GB When Copying Between ext4 and XFS Filesystems


4 views

When performing rsync operations between an ext4 source and XFS destination filesystem, I encountered a significant size discrepancy with one particular directory. While most transfers showed minimal size differences (<1GB), this specific folder reported 264GB on source but 286GB after transfer - a 22GB (8.3%) increase.

Source FS: ext4 (Linux kernel 5.4.0-xx)
Destination: XFS (Linux kernel 5.4.0-xx)
Rsync version: 3.1.3
Command used:
rsync -rltDv --prune-empty-dirs /source/path/ /dest/path/

Three complete transfer attempts yielded identical results. Standard diagnostic tools showed:

# Source check:
du -sh --apparent-size /source/path # 264GB
du -sh /source/path # 264GB

# Destination check:
du -sh --apparent-size /dest/path # 286GB
du -sh /dest/path # 286GB

After eliminating basic transfer errors, these possibilities emerged:

  • Filesystem block size differences (ext4 default 4KB vs XFS often 4KB-64KB)
  • Different handling of sparse files
  • Extended attributes (xattrs) storage overhead
  • Metadata representation variations

Comparing filesystem characteristics revealed:

# Check source block size:
tune2fs -l /dev/sdX | grep 'Block size'

# Check destination block size:
xfs_info /mount/point | grep bsize

The target XFS filesystem was configured with 64KB blocks (common for large file storage), while ext4 used 4KB blocks. This explained most of the discrepancy:

# Calculate potential overhead:
(286*1024 - 264*1024) / (264*1024/64) ≈ 5.3KB per file overhead

Created a test case with 1000 files (1MB each):

# On ext4 (4KB blocks):
dd if=/dev/zero of=testfile bs=1M count=1
du -sh testdir # Shows ~1GB

# After rsync to XFS (64KB blocks):
du -sh testdir # Shows ~64MB (1000*64KB)

For accurate space comparison:

# Use consistent measurement:
rsync -rltDv --prune-empty-dirs --preallocate /source/ /dest/

# Alternative for sparse files:
rsync -rltDv --sparse --prune-empty-dirs /source/ /dest/

# For precise byte counts:
rsync -rltDv --numeric-ids --info=progress2 /source/ /dest/

For production systems where exact space usage matters, consider:

  • Reformatting destination with matching block size
  • Using tar pipes for exact byte preservation
  • Implementing post-transfer verification scripts
#!/bin/bash
SRC="/source/path"
DST="/dest/path"

# Compare file counts
find "$SRC" -type f | wc -l > src_count.txt
find "$DST" -type f | wc -l > dst_count.txt

# Compare total bytes (apparent size)
find "$SRC" -type f -printf "%s\n" | awk '{sum+=$1} END{print sum}' > src_bytes.txt
find "$DST" -type f -printf "%s\n" | awk '{sum+=$1} END{print sum}' > dst_bytes.txt

diff {src,dst}_count.txt || echo "File count mismatch"
diff {src,dst}_bytes.txt || echo "Byte count mismatch"

When transferring data between filesystems, particularly from ext4 to XFS, several factors can contribute to size discrepancies:

# Original rsync command used:
rsync -rltDv --prune-empty-dirs /source/path/ /destination/path/

The 22GB difference (264GB → 286GB) suggests fundamental filesystem handling variations:

  • Block size allocation differences (ext4 default 4KB vs XFS often 4KB-64KB)
  • Different approaches to sparse files handling
  • Extended attributes (xattrs) storage overhead
  • Filesystem metadata accounting variations

First verify the source data integrity:

# Get accurate source size (block count)
du -s --block-size=1 /source/path

# Compare with apparent size
du -s --apparent-size /source/path

# Check for sparse files
find /source/path -type f -printf "%S\t%p\n" | awk '$1 < 1.0'

Try these enhanced parameters for more precise transfer:

rsync -rltDv --prune-empty-dirs --inplace --no-whole-file \
      --preallocate --sparse /source/path/ /destination/path/

XFS may allocate extra space for:

# Check XFS allocation parameters
xfs_info /destination/path

# Typical output shows allocation groups and block sizes
# Example:
meta-data=/dev/sdX              isize=512    agcount=32, agsize=some-value
data     =                      bsize=4096   blocks=value, imaxpct=25

Post-transfer validation techniques:

# Compare file counts
find /source/path -type f | wc -l
find /destination/path -type f | wc -l

# Generate checksums for critical files
find /source/path -type f -exec sha256sum {} + > source_checksums
find /destination/path -type f -exec sha256sum {} + > dest_checksums

If rsync discrepancies persist, consider:

# Tar pipe method
(cd /source/path && tar cf - .) | (cd /destination/path && tar xvf -)

# With progress monitoring
tar cf - /source/path -P | pv -s $(du -sb /source/path | awk '{print $1}') \
| tar xf - -C /destination/path