When dealing with thousands of small files (typically <1KB each), rsync's default configuration becomes inefficient due to:
- High protocol overhead per file
- Excessive metadata operations
- Frequent checksum calculations
For my production servers, I use this optimized command:
    rsync -azHX --delete --numeric-ids \
        --info=progress2 --no-i-r \
        --partial-dir=.rsync-partial \
        --bwlimit=0 --compress-level=1 \
        /home/user/ user@10.1.1.1::backup
Compression Strategy (-z with level 1):
    --compress-level=1   # faster than the default level (6)
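To check whether light compression actually pays off on your own data, time a few levels directly (the source path and daemon target here are placeholders):

    # Compare compression levels on a representative tree
    # (source path and daemon target are placeholders)
    time rsync -az --compress-level=1 /home/user/ user@10.1.1.1::backup
    time rsync -az --compress-level=6 /home/user/ user@10.1.1.1::backup
    # On a fast LAN, skipping -z entirely is often fastest of all
    time rsync -aHX /home/user/ user@10.1.1.1::backup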
Transfer Protocol Choice:
    # For LAN transfers (faster but insecure): the rsync:// daemon protocol
    rsync -azHX /home/user/ rsync://user@10.1.1.1/backup

    # For WAN transfers (slower but secure): ssh with a cheap cipher
    rsync -e "ssh -T -c aes128-gcm@openssh.com -o Compression=no -x" \
        -azHX /home/user/ user@10.1.1.1:/backup/
For critical deployments, I combine rsync with GNU parallel:
    # -R (--relative) keeps each file's path at the destination; without it
    # every file lands flat in the module root. Note that -H can only
    # preserve hard links within a single batch.
    find /home/user/ -type f | \
        parallel -j 8 -X rsync -azHXR {} user@10.1.1.1::backup
| Tool | Data (10k small files) | Transfer Time |
|---|---|---|
| rsync (optimized) | ~500MB | 3m42s |
| tar over ssh | ~500MB | 2m15s |
| fpart + rsync | ~500MB | 1m53s |
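If you want to reproduce this kind of comparison on your own tree, flush the page cache between runs so the timings are comparable (a rough sketch; paths and host are placeholders):

    # Run as root; drop caches before each timed transfer
    sync && echo 3 > /proc/sys/vm/drop_caches
    time rsync -azHX /home/user/ user@10.1.1.1::backup

    sync && echo 3 > /proc/sys/vm/drop_caches
    time tar -cf - /home/user | ssh user@10.1.1.1 "tar -xf - -C /backup"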
For our enterprise backup system, we use this wrapper script:
    #!/bin/bash
    TARGET="user@10.1.1.1::backup"
    THREADS=$(nproc)
    LOG="/var/log/rsync_$(date +%Y%m%d).log"
    PARTS=$(mktemp -d)

    # Partition the tree into lists of at most 1000 files or 10MB each,
    # skipping *.tmp; lists are written to $PARTS/part.0, part.1, ...
    fpart -f 1000 -s $((10 * 1024 * 1024)) -x "*.tmp" -o "$PARTS/part" /home/user/

    # One rsync per partition list, $THREADS at a time. --delete is omitted:
    # with --files-from, parallel partitions could delete each other's files.
    ls "$PARTS"/part.* | parallel -j "$THREADS" \
        rsync -azHX --files-from={} / "$TARGET" >> "$LOG" 2>&1
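To run this on a schedule without overlapping instances, a cron entry along these lines works (the script path and schedule are illustrative); `flock -n` skips a run if the previous one is still in flight:

    # /etc/cron.d/rsync-backup (illustrative path and schedule)
    0 2 * * * root flock -n /run/rsync-backup.lock /usr/local/bin/rsync-backup.sh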
When dealing with thousands of small files, rsync's default behavior can become inefficient due to:
- High overhead from per-file metadata operations
- Excessive protocol negotiation
- Unnecessary checksum calculations
Here are the most effective rsync flags for small file optimization:
    rsync -az --partial --inplace --no-whole-file \
        --max-size=1M --min-size=1 \
        --info=progress2 --human-readable \
        --delete --compress-level=1 \
        user@10.1.1.1::backup /home/user/
Key optimizations:
- `--compress-level=1`: faster compression with minimal CPU overhead
- `--inplace`: avoids temporary file creation
- `--no-whole-file`: enables the delta-transfer algorithm
- `--info=progress2`: better overall progress reporting
For local networks, the rsync daemon protocol is generally faster than SSH because it skips encryption entirely. Compare the two invocations:

    # SSH version (slower but more secure)
    rsync -e "ssh -T -c aes128-gcm@openssh.com -o Compression=no -x" \
        -azP /source/ user@remote:/dest/

    # rsync daemon version (faster)
    rsync -azP rsync://user@remote/module/path /local/path
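The daemon variant requires rsyncd running on the remote side; starting it and listing the exported modules is enough to confirm it is reachable (the config path shown is rsync's default; module names will vary):

    # On the remote host: start the daemon (reads /etc/rsyncd.conf, port 873)
    rsync --daemon

    # From the client: list the exported modules to confirm connectivity
    rsync rsync://user@remote/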
File Batching with tar
Combine small files into a tar stream during transfer:
    # On source:
    tar -cf - /source/dir | pv | \
        ssh user@remote "tar -xf - -C /destination"

    # With progress and compression:
    tar -zcf - /source | pv -s $(du -sb /source | awk '{print $1}') | \
        ssh user@remote "tar -zxf - -C /dest"
Parallel rsync with xargs
Process multiple files simultaneously:
    # xargs appends each batch of files as rsync source arguments; the sh -c
    # wrapper keeps the destination last on every generated command line
    find /source -type f -print0 | \
        xargs -0 -n 100 -P 8 sh -c 'rsync -az --relative "$@" user@remote:/dest/' _
Optimize these filesystem parameters on both source and destination (see the persistence sketch after this list):
- Increase inotify watchers (this matters for watch-based tools like lsyncd rather than for rsync itself): `sysctl fs.inotify.max_user_watches=524288`
- Use modern filesystems (XFS, ZFS) with proper sector sizes
- Disable atime updates: `mount -o remount,noatime /path`
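To make both settings survive a reboot (the device, mount point, and file names here are placeholders):

    # Persist the sysctl value
    echo 'fs.inotify.max_user_watches=524288' > /etc/sysctl.d/99-inotify.conf
    sysctl --system

    # Persist noatime via /etc/fstab (device and mount point are placeholders)
    # /dev/sdb1  /srv/backup  xfs  defaults,noatime  0  2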
For truly massive small-file operations, evaluate these tools:
- Unison: Bidirectional sync with conflict resolution
- lsyncd: Real-time synchronization daemon
- Rclone: Cloud-optimized file transfers
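For example, rclone handles large numbers of small files well once you raise its concurrency above the defaults of 4 transfers and 8 checkers (the remote name and values here are illustrative):

    # More concurrent transfers/checks than rclone's defaults (4/8)
    rclone copy /home/user remote:backup --transfers=32 --checkers=16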