Troubleshooting Rsync Hanging at “Building File List” for Large Ext3 Storage Transfers


When dealing with large storage transfers (in this case 135GB of image files), rsync's initial file listing phase can take considerable time. This is particularly noticeable when:

  • Source contains numerous small files (common with image galleries)
  • Filesystem is ext3 (which lacks some performance optimizations of ext4)
  • Network latency exists between source and destination

First, verify if rsync is actually stuck or just processing:

# Check rsync process status
ps aux | grep rsync
# Monitor disk I/O
iotop -o
# Check network connectivity
ping xx.27.1.xx
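
If the process is alive but silent, one way to tell a slow scan from a true hang is to sample its I/O counters twice. The sketch below assumes bash and a Linux /proc filesystem; io_counter and is_making_progress are hypothetical helper names, not part of rsync. A scan served entirely from the page cache may not move read_bytes, so a flat counter is a hint rather than proof.

```shell
#!/usr/bin/env bash
# Print one counter (e.g. read_bytes, wchar) from a /proc/<pid>/io style file.
io_counter() {
    awk -v key="$2:" '$1 == key { print $2 }' "$1"
}

# Succeed if the counter grew between two samples taken $3 seconds apart.
is_making_progress() {
    local file=$1 field=$2 interval=${3:-5} before after
    before=$(io_counter "$file" "$field")
    sleep "$interval"
    after=$(io_counter "$file" "$field")
    [ "${after:-0}" -gt "${before:-0}" ]
}

# Example (run as root or as the user that owns the rsync processes):
#   for pid in $(pgrep -x rsync); do
#       is_making_progress "/proc/$pid/io" read_bytes 5 && echo "$pid is reading from disk"
#   done
```

Checking wchar the same way shows whether the sender is still pushing file list data to the other side.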

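For large transfers, consider these rsync options. Note the trailing slash on the source: it copies the contents of /storage, whereas a bare /storage would create /storage/storage on the destination:

rsync -av --partial --progress --exclude thumbs \
  --bwlimit=50000 /storage/ root@xx.27.1.xx:/storage/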

Key flags explanation:

  • --partial: Keep partially transferred files
  • --progress: Show transfer progress
  • --bwlimit: Throttle bandwidth to avoid network saturation (the value is in KiB/s, so 50000 is roughly 50 MB/s)

For extremely large datasets:

# Use tar over ssh for initial transfer
tar cf - /storage | ssh root@xx.27.1.xx "tar xf - -C /"

# Then use rsync for incremental updates
rsync -av --delete /storage/ root@xx.27.1.xx:/storage/

Ext3 specific recommendations:

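  • Enable directory indexing with tune2fs -O dir_index /dev/sdX (capital -O sets features; lowercase -o sets default mount options), then run e2fsck -fD /dev/sdX on the unmounted filesystem so that existing directories get indexed
  • Check disk health with smartctl -a /dev/sdX
  • Mount with noatime during the transfer (noatime already implies nodiratime)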
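
Before changing anything, it can help to check whether dir_index is already enabled. A minimal sketch, assuming bash and the usual "Filesystem features:" line in tune2fs -l output; has_fs_feature is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read `tune2fs -l` output on stdin; succeed if the named feature appears
# on the "Filesystem features:" line.
has_fs_feature() {
    awk -F: -v want="$1" '
        $1 == "Filesystem features" {
            n = split($2, feats, " ")
            for (i = 1; i <= n; i++) if (feats[i] == want) found = 1
        }
        END { exit !found }'
}

# Example (needs root; /dev/sdX is the placeholder device used in the text):
#   if tune2fs -l /dev/sdX | has_fs_feature dir_index; then
#       echo "dir_index already enabled"
#   fi
```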

For better visibility during long operations:

# Install and use progress viewer
yum install progress
progress -w

When dealing with large filesystems (in this case 135GB of image files on ext3), rsync can appear to hang during the initial file list building phase. The command in question was:

rsync -av --exclude thumbs /storage root@xx.27.1.xx:/storage

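What looks like a hang is usually rsync's preliminary scan: it walks the source tree and stats every file before deciding what to send. With rsync older than 3.0 on either end, the whole list is built before any data moves; from 3.0 onward rsync uses incremental recursion and starts transferring while it is still scanning.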
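
To measure the scan on its own, a dry run with --stats reports how long the file list took without moving any data. A minimal sketch, assuming bash and that the stats output contains a "File list generation time" line; file_list_seconds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read `rsync --stats` output on stdin and print the file list generation
# time in seconds.
file_list_seconds() {
    awk -F': ' '/^File list generation time/ { print $2 + 0 }'
}

# Example: -n makes this a dry run, so nothing is copied.
#   rsync -an --stats --exclude thumbs /storage/ root@xx.27.1.xx:/storage/ | file_list_seconds
```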

Several factors contribute to the prolonged "building file list" phase:

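  • Filesystem metadata operations: rsync has to read every directory and stat every entry; on ext3 without dir_index, lookups in large directories are linear, so big flat image directories are slow
  • Network latency: the file list has to reach the receiver, which checks it against the destination before the transfer begins
  • Inode processing: each of potentially millions of files needs its own stat call, and with a cold inode cache every call goes to disk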
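
To get a feel for the size of the job, you can count the entries rsync would have to visit. A rough sketch, assuming bash and a find that supports -xdev; count_entries is a hypothetical helper name, and file names containing newlines would inflate the count.

```shell
#!/usr/bin/env bash
# Count the entries under a directory, including the directory itself.
# -xdev stays on one filesystem; pruning "thumbs" mirrors --exclude thumbs.
count_entries() {
    find "$1" -xdev -name thumbs -prune -o -print | wc -l | tr -d ' '
}

# Example:
#   time count_entries /storage
```

The elapsed time is only a lower bound for rsync's scan, since find can often skip the per-file stat that rsync needs.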

Here are concrete solutions I've validated in production environments:

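# Method 1: Progress display (per-file progress once the transfer starts)
rsync -av --exclude thumbs --progress /storage/ root@xx.27.1.xx:/storage/

# Method 2: Copy only the top-level files. --no-recursive makes rsync skip
# every subdirectory, so this is a quick connectivity test, not a full transfer
rsync -av --exclude thumbs --no-recursive /storage/* root@xx.27.1.xx:/storage

# Method 3: Use a faster checksum algorithm (xxh64 needs rsync 3.2.0 or later
# on both ends; it lowers checksum cost during the transfer, not the scan)
rsync -av --exclude thumbs --checksum-choice=xxh64 /storage/ root@xx.27.1.xx:/storage/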

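For truly massive datasets, split the transfer by top-level directory so that each run builds a much smaller file list:

# Transfer each top-level directory in turn (files directly under /storage are not covered)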
for dir in /storage/*/; do
  rsync -av --exclude thumbs "$dir" root@xx.27.1.xx:"$dir"
done
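
The same per-directory split can be run with several rsync processes at once. This is a sketch under stated assumptions, not a tested production script: it assumes bash and an xargs that supports -0 and -P; parallel_sync and the RSYNC override variable are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Run one rsync per top-level directory of $1, at most $3 at a time.
# Set RSYNC=echo to preview the commands instead of running them.
parallel_sync() {
    local src=${1%/} dest=${2%/} jobs=${3:-4}
    find "$src" -mindepth 1 -maxdepth 1 -type d ! -name thumbs -print0 |
        xargs -0 -P "$jobs" -I{} sh -c '
            dir=$1; dest=$2; rsync_cmd=$3
            $rsync_cmd -a --exclude thumbs "$dir/" "$dest/$(basename "$dir")/"
        ' sh {} "$dest" "${RSYNC:-rsync}"
}

# Example: preview first, then run with 4 workers.
#   RSYNC=echo parallel_sync /storage root@xx.27.1.xx:/storage 4
#   parallel_sync /storage root@xx.27.1.xx:/storage 4
```

Each worker opens its own SSH connection, so keep the job count modest; on spinning disks, too many parallel scans can end up slower than one. Files directly under the source directory still need a separate pass.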

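Since this involves ext3, these mount options can help. Be aware that data=writeback and barrier=0 trade crash safety for speed, so revert them once the transfer is done:

# /etc/fstab entry with mount options that reduce metadata and journaling overhead
/dev/sdX /storage ext3 noatime,data=writeback,barrier=0 0 0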

To verify rsync is actually working:

# Check rsync process status
ps aux | grep rsync
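# List the files each rsync process has open (rsync runs as several processes,
# so a single $(pgrep rsync) substitution would produce an invalid path)
for pid in $(pgrep -x rsync); do ls -l "/proc/$pid/fd"; done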