Local Large Directory Copy: cp vs rsync for Preserving Permissions and Symlinks



When dealing with large local directory copies (1.8TB in this case), the choice between cp and rsync boils down to three key requirements:

  • Preservation of permissions and ownership (uid/gid)
  • Proper handling of symbolic links
  • Performance considerations for massive data transfers

For maintaining permissions and ownership, both tools can work but with different approaches:

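# cp in archive mode (-a is shorthand for -dR --preserve=all); "source/." copies the contents of source
cp -a source/. destination/

# Roughly equivalent rsync command (-H hard links, -A ACLs, -X extended attributes)
rsync -aHAX source/ destination/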

The -a flag in both commands preserves:

  1. File permissions
  2. Ownership and group information
  3. Timestamps
  4. Symbolic links (as links, not dereferenced)
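One way to see these properties in practice is to copy a small throwaway tree and inspect the result. The sketch below assumes GNU coreutils (`stat -c`, `touch -d`, `mktemp -d`); the names `data.txt` and `link` are made up for illustration.

```shell
#!/bin/sh
# Sketch: confirm that cp -a keeps mode, ownership, mtime, and symlinks on a scratch tree.
# Assumes GNU coreutils; every path below is hypothetical.
demo=$(mktemp -d)
mkdir "$demo/src"
printf 'hello\n' > "$demo/src/data.txt"
chmod 640 "$demo/src/data.txt"
touch -d '2020-01-01 00:00:00' "$demo/src/data.txt"   # give the file a recognizable mtime
ln -s data.txt "$demo/src/link"

cp -a "$demo/src" "$demo/dst"   # dst does not exist yet, so it becomes the copy of src

# Print "mode uid:gid mtime-epoch name" for the original and the copy
stat -c '%a %u:%g %Y %n' "$demo/src/data.txt" "$demo/dst/data.txt"

# The copy should still be a symlink with the same target
readlink "$demo/dst/link"
```

Both `stat` lines should agree on everything except the file name, and `readlink` should print the link target rather than fail.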

For initial local copies where the destination is empty:

Metric              cp                                rsync
Raw speed           Usually faster (single process)   Usually slower (file-list build, sender/receiver processes, per-file transfer checksum)
Memory usage        Lower                             Higher (file list maintenance)
Progress reporting  None built in                     Available with --progress or --info=progress2

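If choosing rsync, avoid features that only pay off over a network:

rsync -aHAX --whole-file --inplace source/ destination/

Key options:

  • --whole-file (-W): Skips the delta-transfer algorithm. rsync already defaults to this when both paths are local; --no-whole-file would force delta transfer, which helps over a network but slows local copies
  • --inplace: Writes directly to destination files instead of to a temporary file that is renamed afterwards, so an interrupted run can leave a partially written file behind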

After copy completion, verify integrity with:

# Compare file contents recursively (reads every byte on both sides, so it is slow on 1.8TB)
diff -r source/ destination/

# Check permission preservation
ls -l source/file.txt
ls -l destination/file.txt

# Verify symlinks
readlink source/symlink
readlink destination/symlink
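Checking one file and one symlink by hand does not scale to a large tree. A possible approach is to build a metadata manifest for each side and diff the two. The sketch below assumes GNU find (`-printf`); `tree_manifest` and `compare_trees` are hypothetical helper names, and timestamps are left out of the manifest on purpose (adding `%T@` to the format string would include them).

```shell
#!/bin/sh
# Sketch: compare the metadata of two trees without reading file contents.
# Assumes GNU find (-printf). Function names are hypothetical.

# Print one line per entry: relative path, type, octal mode, uid, gid, symlink target
tree_manifest() {
    (cd "$1" && find . -printf '%P\t%y\t%m\t%U\t%G\t%l\n' | LC_ALL=C sort)
}

# Return 0 when both trees carry identical metadata; otherwise print the differences
compare_trees() {
    _left=$(mktemp)
    _right=$(mktemp)
    tree_manifest "$1" > "$_left"
    tree_manifest "$2" > "$_right"
    diff "$_left" "$_right"
    _status=$?
    rm -f "$_left" "$_right"
    return "$_status"
}
```

Usage would look like `compare_trees source destination && echo "metadata matches"`. This complements `diff -r`, which compares contents but says nothing about modes or ownership.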

For this specific case (local copy, empty destination, need for permission preservation):

# Best option ("source/." copies the contents into the existing, empty destination)
cp -a source/. destination/

# Alternative with progress reporting
rsync -aHAX --info=progress2 source/ destination/

The simpler cp -a will generally be faster while meeting all requirements. Reserve rsync for cases needing:

  • Progress reporting
  • Potential resume capability
  • More detailed verification

When dealing with a large directory tree (1.8TB in this case) on local storage, the choice between cp and rsync boils down to several technical considerations:

  • Preservation of file metadata (permissions, ownership, timestamps)
  • Handling of symbolic links
  • Performance considerations for large transfers
  • Verification mechanisms

The standard cp command requires specific flags to maintain metadata:

cp -a /source/. /destination/  # -a is shorthand for -dR --preserve=all; "/source/." copies the contents instead of creating /destination/source

With rsync, -a covers the common metadata, but hard links, ACLs, and extended attributes need -H, -A, and -X in addition:

rsync -aHAX /source/ /destination/  # -a is shorthand for -rlptgoD (recursive, links, perms, times, group, owner, devices and special files)

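For initial local copies, cp can be slightly faster because it avoids rsync's file-list construction and per-file transfer checksum. Treat the timings below as rough illustrations, since throughput depends on the disks and on the mix of file sizes:

time cp -a big_directory/. backup/   # Average 1.8TB transfer: ~2.5 hours
time rsync -a big_directory/ backup/ # Average 1.8TB transfer: ~2.8 hours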

However, rsync offers significant advantages for subsequent runs by only copying changed files.
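To see what a repeat run would actually transfer, rsync can be asked for a dry run that only prints names. The helper below is a minimal sketch; `pending_changes` is a hypothetical name, and it relies on rsync's default quick check (size and modification time).

```shell
#!/bin/sh
# Sketch: list what a repeat rsync run would transfer, without copying anything.
# pending_changes is a hypothetical helper name.
#   -n                 dry run, nothing is written
#   --out-format='%n'  print only the name of each item that would be updated
pending_changes() {
    rsync -a -n --out-format='%n' "$1"/ "$2"/
}

# Example with hypothetical paths:
#   pending_changes big_directory backup
```

An empty listing means the destination already matches by size and mtime, so a real run would copy no file data.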

When dealing with complex directory structures:

# Preserving hard links (both tools support this)
cp -a /source/. /destination/      # -a already implies --preserve=links
rsync -aH /source/ /destination/   # -H is not part of -a and must be added

# Handling sparse files efficiently
cp --sparse=always /source/large_file /destination/
rsync -S /source/large_file /destination/
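Whether hard links and sparse regions survived can be checked on a scratch tree before committing to the full copy. This is a sketch under the assumption of GNU coreutils (`truncate`, `stat -c`); the file names are invented for the demo.

```shell
#!/bin/sh
# Sketch: check that hard links and sparse files survive cp -a --sparse=always.
# Assumes GNU coreutils; every path below is hypothetical.
work=$(mktemp -d)
mkdir "$work/src"
printf 'payload\n' > "$work/src/original"
ln "$work/src/original" "$work/src/alias"   # second name for the same inode
truncate -s 64M "$work/src/sparse.img"      # 64 MiB apparent size, no data written

cp -a --sparse=always "$work/src" "$work/dst"

# Matching inode numbers mean the two names are still hard-linked inside the copy
stat -c '%i %n' "$work/dst/original" "$work/dst/alias"

# %s is the apparent size in bytes, %b the number of blocks actually allocated
stat -c '%s bytes, %b blocks  %n' "$work/src/sparse.img" "$work/dst/sparse.img"
```

If the copy of `sparse.img` shows the same byte count but far more allocated blocks than the original, the holes were filled in during the copy.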

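rsync can compare source and destination by checksum, which cp cannot:

rsync -a -c /source/ /destination/  # Decide what to update by checksum instead of size and mtime
rsync -a -c --checksum-choice=xxh128 /source/ /destination/  # Faster hash; needs an rsync 3.2.x build with xxHash support (check rsync --version)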
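Checksum mode matters most when a file's contents differ but its size and modification time do not, a case the default quick check cannot see. The sketch below combines `-c` with a dry run so that nothing is modified; `content_diff` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report files whose contents differ between two trees, changing nothing.
# content_diff is a hypothetical helper name.
#   -c                 compare by checksum rather than size and mtime
#   -n                 dry run
#   --out-format='%n'  print only the name of each item that differs
content_diff() {
    rsync -a -c -n --out-format='%n' "$1"/ "$2"/
}

# Example with hypothetical paths:
#   content_diff /source /destination
```

Because `-c` reads every file on both sides, expect a run over 1.8TB to take roughly as long as the original copy.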

For one-time local copies where absolute performance is critical and verification isn't required, cp -a is sufficient. For other cases, especially when:

  • The copy must be verified afterwards
  • You might need to resume interrupted transfers
  • Future incremental updates are possible

rsync -aHAX is the better choice despite marginally longer initial transfer times.