When performing large-scale data migrations, copying entire directory trees can be both time-consuming and storage-intensive. The --link-dest option in rsync avoids this by creating hardlinks instead of full copies. However, rsync only links a file when its destination path lines up exactly with a file in the reference directory, so the path syntax matters.
The command you tried:
rsync -a --link-dest=$DATA $DATA $DATA/../upgrade_tmp
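fails to create hardlinks because the source argument $DATA has no trailing slash. Without it, rsync copies the directory itself, so each file lands at upgrade_tmp/<name>/<file> (where <name> is the last component of $DATA). rsync then looks for the --link-dest counterpart at $DATA/<name>/<file>. That path does not exist, so no match is found and every file is copied in full. If $DATA is a relative path, there is a second trap: a relative --link-dest directory is resolved relative to the destination, not the current directory.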
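The fix can be wrapped in a small shell function. This is a minimal sketch, assuming bash and rsync are installed; the name hardlink_copy is hypothetical. It resolves the source to an absolute path and appends the trailing slash, so destination paths line up with the --link-dest directory.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumes bash and rsync): make a hardlinked copy of the
# *contents* of SRC in DEST. hardlink_copy is a hypothetical helper name.
hardlink_copy() {
  local src=${1%/} dest=${2%/}
  # Resolve SRC to an absolute path: a relative --link-dest directory
  # would be interpreted relative to DEST, not the current directory.
  src=$(cd -- "$src" && pwd) || return 1
  # The trailing slash copies SRC's contents, so DEST/<file> lines up
  # with its --link-dest counterpart SRC/<file>.
  rsync -a --link-dest="$src" "$src/" "$dest/"
}
```

With the original variables this would be called as hardlink_copy "$DATA" "$DATA/../upgrade_tmp".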
Here's the correct approach for creating a hardlinked copy:
rsync -aH --link-dest=/original/path/ /original/path/ /destination/path/
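Key parameters:
- -a: archive mode (recursion plus preserved permissions, timestamps, symlinks, and ownership where possible)
- -H: preserves hardlinks that already exist inside the source tree; --link-dest creates its links without it
- --link-dest: reference directory; a file identical to its counterpart there is hardlinked to it instead of copied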
For a production data migration scenario:
DATA=/var/www/production
rsync -aH --link-dest=$DATA \
--exclude='*.tmp' \
--exclude='cache/*' \
$DATA/ /var/www/staging_upgrade/
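This creates a staging area where:
- Every file identical to its counterpart in $DATA (on a fresh run, all of them) becomes a hardlink rather than a copy
- A file that differs from its counterpart in the --link-dest directory is transferred as an independent copy
- Temporary files and cache contents are excluded from the migration
Because a hardlinked file has a single inode, modifying it in place in the staging area (for example, appending with >>) changes the production file too. When a file needs to change, replace it with a new file; only then does it stop sharing data with production.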
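Every linked file in the staging area shares its inode with production, so an in-place edit in staging also changes production. One safer way to change a staged file is sketched below. It assumes bash and GNU coreutils (chmod --reference), and replace_file is a hypothetical helper name. The function writes stdin to a temporary file beside the target and renames it into place, so the staging name gets a new inode and the production link keeps the old contents.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash and GNU coreutils): replace TARGET with the data
# read from stdin, leaving every other hardlink to the old file intact.
# replace_file is a hypothetical helper name.
replace_file() {
  local target=$1 tmp
  # Create the new file in TARGET's directory so the rename below
  # stays on one filesystem
  tmp=$(mktemp "$target.XXXXXX") || return 1
  cat > "$tmp"
  # mktemp creates files with mode 0600; copy TARGET's mode if it exists
  if [ -e "$target" ]; then
    chmod --reference="$target" -- "$tmp"
  fi
  # The rename gives TARGET a new inode; other links keep the old data
  mv -f -- "$tmp" "$target"
}
```

For example, sed 's/old/new/' /data/staging/app.conf | replace_file /data/staging/app.conf (app.conf is a hypothetical file name). Ownership is not carried over; as root you would also need chown --reference.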
Confirm successful hardlinking with:
find /destination/path -type f -links +1 -exec ls -li {} +
This lists every file with a link count above 1. The first column is the inode number, which should match the inode of the corresponding file in the source tree.
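A link count above 1 only shows that a file has some other link, not that it is linked to the right source file. The sketch below, assuming bash, instead compares each destination file with the file at the same relative path in the source, using the -ef test (same device and inode). count_unlinked is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): print the number of regular files under DEST
# that are not hardlinked to the file at the same relative path under
# SRC, listing each one on stderr. count_unlinked is a hypothetical name.
count_unlinked() {
  local src=${1%/} dest=${2%/} f rel n=0
  while IFS= read -r -d '' f; do
    rel=${f#"$dest"/}
    # -ef is true only when both names refer to the same device and inode
    if ! [ "$src/$rel" -ef "$f" ]; then
      printf 'not linked: %s\n' "$rel" >&2
      n=$((n + 1))
    fi
  done < <(find "$dest" -type f -print0)
  echo "$n"
}
```

A result of 0 for count_unlinked /original/path /destination/path means every destination file shares its inode with the source.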
--link-dest can also be repeated, for example to link against more than one earlier version:
rsync -aH --link-dest=/base/version1/ \
--link-dest=/base/version2/ \
/base/version2/ /new/deployment/
rsync searches the --link-dest directories in the order given and hardlinks each file to the first exact match it finds. Up to 20 directories are accepted.
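With several reference directories, the argument list is easier to build from an array. This is a minimal sketch, assuming bash and rsync; link_with_refs is a hypothetical name. Pass absolute reference paths, since relative ones are resolved against the destination.

```shell
#!/usr/bin/env bash
# Sketch: copy SRC's contents into DEST, hardlinking files that match
# any reference directory. link_with_refs is a hypothetical name.
# Usage: link_with_refs SRC DEST REF...
link_with_refs() {
  local src=${1%/} dest=${2%/} ref
  shift 2
  local -a args=()
  # rsync accepts at most 20 --link-dest directories
  if [ "$#" -gt 20 ]; then
    echo "link_with_refs: at most 20 reference directories" >&2
    return 1
  fi
  for ref in "$@"; do
    # Order matters: rsync links to the first exact match it finds
    args+=("--link-dest=$ref")
  done
  rsync -a "${args[@]}" "$src/" "$dest/"
}
```

The earlier example becomes link_with_refs /base/version2 /new/deployment /base/version1 /base/version2.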
The same technique gives you a space-efficient working copy whenever the source and destination share a filesystem. To refresh an existing copy, add --delete so that files removed from the source are also removed from the destination:
rsync -aH --delete --link-dest=/path/to/original /path/to/original/ /path/to/destination/
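Key flags explanation:
- -a: archive mode (preserves permissions, timestamps, symlinks, etc.)
- -H: preserves hardlinks that already exist within the source tree; --link-dest creates its links without it
- --delete: removes destination files that no longer exist in the source
- --link-dest: reference directory for hardlinking unchanged files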
For a data migration scenario where you need to stage changes:
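# First create the hardlinked copy (trailing slashes copy the contents)
rsync -aH --link-dest=/data/production/ /data/production/ /data/staging/
# Perform your migrations in /data/staging:
# replace files that need changing rather than editing them in place,
# because an in-place edit also changes the linked file in /data/production
# Swap the directories (two separate renames, so there is a brief
# window in which /data/production does not exist)
mv /data/production /data/production_old
mv /data/staging /data/production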
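Two mv commands are two separate renames, so there is a short window in which the production path is missing. If the application can reach its data through a symlink (an assumption, and a swapped-in technique rather than part of the rsync workflow), the switch can be made with a single rename. The sketch below assumes bash and GNU mv (-T); point_symlink is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash and GNU mv -T): repoint the symlink LINK at TARGET
# in a single rename. point_symlink is a hypothetical helper name, and
# LINK must be a symlink (or absent), not a real directory.
point_symlink() {
  local target=$1 link=${2%/}
  local tmp_link="$link.tmp.$$"
  # Build the new symlink beside the old one
  ln -sfn -- "$target" "$tmp_link" || return 1
  # -T stops mv from descending into the directory LINK points to;
  # rename(2) then replaces LINK atomically
  mv -T -- "$tmp_link" "$link"
}
```

Once /data/production has been converted to a symlink, point_symlink /data/staging /data/production switches it in one step. Processes that already opened files through the old path keep using the old tree.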
When dealing with large directory structures:
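- Add --stats to report how many files and bytes were actually transferred
- Consider --numeric-ids to keep numeric uid/gid values instead of mapping them by user and group name (relevant when the copy is used on another system)
- Use --partial to keep partially transferred files if a network transfer is interrupted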
For purely local operations, cp might be simpler:
cp -al /original /backup
The -l flag creates hardlinks instead of copies. If /backup already exists, cp creates /backup/original instead of filling /backup.
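To get the same result whether or not the destination already exists, you can copy the directory's contents ("SRC/.") instead of the directory itself. This is a sketch assuming bash and GNU cp; link_tree is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: hardlink the contents of SRC into DEST whether or not DEST
# exists yet. link_tree is a hypothetical name; assumes GNU cp.
link_tree() {
  local src=${1%/} dest=${2%/}
  mkdir -p -- "$dest" || return 1
  # "SRC/." names the directory's contents, so they land directly in
  # DEST instead of in DEST/<basename of SRC>
  cp -al -- "$src/." "$dest/"
}
```

link_tree /original /backup then behaves the same on a first run and when /backup is an existing empty directory.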
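If hardlinks aren't being created:
- Verify that the --link-dest directory and the destination are on the same filesystem
- Check that the source path ends in a slash, so destination paths line up with the --link-dest directory
- Confirm that files match their reference copies in size, modification time, and every preserved attribute (permissions, possibly ownership); rsync copies rather than links any file that differs
- Check available inodes (df -i)
- Ensure you have write permissions in the destination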
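The same-filesystem check can be scripted by comparing device numbers. This is a minimal sketch assuming bash and GNU stat; same_filesystem is a hypothetical name. For a destination that does not exist yet, pass its parent directory.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if both paths live on the same filesystem,
# which hardlinks require. same_filesystem is a hypothetical name.
# stat -c %d prints the device number (GNU coreutils syntax; BSD and
# macOS stat use -f %d instead). Returns 2 if either path is missing.
same_filesystem() {
  local a b
  a=$(stat -c %d -- "$1") || return 2
  b=$(stat -c %d -- "$2") || return 2
  [ "$a" = "$b" ]
}
```

For example, same_filesystem /data/production /data && echo "hardlinks possible".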