When maintaining a web gallery application, we often face the synchronization challenge between original images and their thumbnails. The specific requirements are:
- Original images live in /home/gallery/images/ with subdirectory organization
- Generated thumbnails reside in /home/gallery/thumbs/, mirroring the same structure
- We need to automatically remove thumbnails when their source images are deleted via FTP
- We must prevent copying original images to the thumb directory
The naive approach:
rsync -r --delete --ignore-existing /home/gallery/images/ /home/gallery/thumbs/
has two critical flaws:
- --ignore-existing prevents new thumbnail generation when needed
- The command may copy original images when thumbnails don't exist
We need a two-phase approach combining rsync with find:
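# First, remove thumbnails without corresponding originals
# (-print0 with read -d '' keeps file names with spaces or backslashes intact)
find /home/gallery/thumbs/ -type f -print0 | while IFS= read -r -d '' thumb; do
  original="/home/gallery/images/${thumb#/home/gallery/thumbs/}"
  [ -e "$original" ] || rm -v -- "$thumb"
done
# Then synchronize the directory structure only. Excluded files are protected
# from --delete, so a stale directory can only be removed once it is empty.
rsync -a --include='*/' --exclude='*' --delete /home/gallery/images/ /home/gallery/thumbs/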
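To try the orphan check safely before pointing it at real data, it can be wrapped in a function and run against a throwaway tree. This is a minimal sketch: `list_orphans` is a hypothetical helper name, and it assumes each thumbnail has exactly the same relative path and file name as its original.

```shell
#!/bin/sh
# list_orphans SRC DST: print every file under DST that has no counterpart
# at the same relative path under SRC. (Hypothetical helper name.)
list_orphans() {
    src=${1%/} dst=${2%/}
    # find hands the paths to the inner shell as arguments, so names with
    # spaces survive without relying on bash-only read options.
    find "$dst" -type f -exec sh -c '
        src=$1 dst=$2; shift 2
        for thumb in "$@"; do
            rel=${thumb#"$dst"/}
            [ -e "$src/$rel" ] || printf "%s\n" "$thumb"
        done
    ' sh "$src" "$dst" {} +
}

# Demo on a throwaway tree; nothing outside the temp directory is touched.
demo=$(mktemp -d)
mkdir -p "$demo/images/trip" "$demo/thumbs/trip"
touch "$demo/images/trip/a.jpg" "$demo/thumbs/trip/a.jpg" "$demo/thumbs/trip/old photo.jpg"
list_orphans "$demo/images" "$demo/thumbs"   # lists only "old photo.jpg"
rm -rf "$demo"
```

Printing the candidates instead of deleting them keeps the function reusable for both a dry run and a real cleanup.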
For production use, we should add verification steps:
#!/bin/bash
SRC="/home/gallery/images"
DST="/home/gallery/thumbs"
# Dry run first
echo "=== DRY RUN ==="
find "$DST" -type f -print0 | while IFS= read -r -d '' thumb; do
  original="$SRC/${thumb#"$DST"/}"
  [ -e "$original" ] || echo "Would remove: $thumb"
done
# -v makes the dry run print the directory changes it would make
rsync -anv --include='*/' --exclude='*' --delete "$SRC/" "$DST/"
read -p "Proceed with actual cleanup? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  # Remove orphaned thumbnails first so stale directories end up empty
  find "$DST" -type f -print0 | while IFS= read -r -d '' thumb; do
    original="$SRC/${thumb#"$DST"/}"
    [ -e "$original" ] || rm -v -- "$thumb"
  done
  rsync -a --include='*/' --exclude='*' --delete "$SRC/" "$DST/"
fi
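For unattended runs, the confirmation prompt could be replaced by a safety limit, so that a missing or unmounted source tree does not wipe every thumbnail. The sketch below is one possible guard, not part of the original script: `count_orphans`, `safe_to_delete`, and the limit argument are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# count_orphans SRC DST: number of files under DST with no counterpart in SRC.
count_orphans() {
    find "${2%/}" -type f -exec sh -c '
        src=$1 dst=$2; shift 2
        for thumb in "$@"; do
            rel=${thumb#"$dst"/}
            [ -e "$src/$rel" ] || echo x
        done
    ' sh "${1%/}" "${2%/}" {} + | wc -l | tr -d ' '
}

# safe_to_delete SRC DST MAX: succeed only if SRC exists and the number of
# pending deletions does not exceed MAX.
safe_to_delete() {
    [ -d "$1" ] || { echo "source $1 missing, refusing to clean" >&2; return 1; }
    pending=$(count_orphans "$1" "$2")
    [ "$pending" -le "$3" ] || { echo "$pending deletions exceed limit $3" >&2; return 1; }
}
```

A script could then run the deletion pass only when `safe_to_delete "$SRC" "$DST" 500` succeeds; the value 500 is an arbitrary example and should be tuned to the gallery's normal churn.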
To schedule this as a daily cleanup task, use a variant without the interactive prompt, since cron has no terminal to answer it:
# Add to crontab -e
0 3 * * * /usr/local/bin/gallery_cleanup.sh >> /var/log/gallery_cleanup.log 2>&1
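For rsync purists, a single command can do the deletion pass on its own. Combining --existing with --ignore-existing means no file is ever created or updated in thumbs/, so --delete is the only thing left to act (-n makes this a dry run; drop it to delete for real):
rsync -rvn --delete --existing --ignore-existing \
/home/gallery/images/ /home/gallery/thumbs/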
Though more concise, it's harder to debug and maintain.
When building web galleries, maintaining thumbnail caches that accurately reflect the source image directory is crucial. The challenge arises when users delete original images via FTP, leaving orphaned thumbnails that waste storage and create inconsistencies.
The naive approach:
rsync -r --delete --ignore-existing /home/gallery/images/ /home/gallery/thumbs/
has two critical flaws:
- Copies original images when thumbnails don't exist
- Doesn't account for the cache generation workflow
Here's the refined command that only deletes files and never transfers anything, so incomplete cache generation is left untouched:
rsync -rv --delete --existing --ignore-existing \
/home/gallery/images/ /home/gallery/thumbs/
Do not combine this with --include='*/' --exclude='*': files excluded from the transfer are also protected from deletion, so no orphaned thumbnail would ever be removed.
Option | Purpose |
---|---|
--delete | Removes destination files that no longer exist in the source |
--existing | Skips creating files (and directories) that do not already exist in the destination |
--ignore-existing | Skips updating files that already exist in the destination |
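The delete-only effect of pairing --existing with --ignore-existing can be checked on a scratch tree before trusting it with a real gallery. A minimal sketch, assuming rsync is installed; `prune_thumbs` is a hypothetical wrapper name and the file names are made up for the demo.

```shell
#!/bin/sh
# prune_thumbs SRC DST [extra rsync options]: delete from DST whatever no
# longer exists in SRC, without copying or updating any file.
prune_thumbs() {
    src=${1%/} dst=${2%/}; shift 2
    rsync -r --delete --existing --ignore-existing "$@" "$src/" "$dst/"
}

# Scratch-tree demo; nothing outside the temp directory is touched.
if command -v rsync >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir -p "$demo/images" "$demo/thumbs"
    echo original > "$demo/images/kept.jpg"
    echo original > "$demo/images/new.jpg"      # no thumbnail generated yet
    echo thumb    > "$demo/thumbs/kept.jpg"
    echo thumb    > "$demo/thumbs/orphan.jpg"   # its original was deleted
    prune_thumbs "$demo/images" "$demo/thumbs" -v
    ls "$demo/thumbs"
    rm -rf "$demo"
fi
```

After the run, thumbs/ should hold only kept.jpg with its thumbnail content intact: orphan.jpg is deleted and new.jpg is not copied across.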
For scheduled cleanup via cron, create this script at /usr/local/bin/clean_thumbs:
#!/bin/bash
# Clean thumbnails for gallery
SRC="/home/gallery/images"
DST="/home/gallery/thumbs"
logger "Starting thumbnail cleanup"
# No --exclude filters here: excluded files would be protected from --delete
rsync -rv --delete --existing --ignore-existing \
"$SRC/" "$DST/"
logger "Thumbnail cleanup completed"
Always test with --dry-run first:
rsync -rvn --delete --existing --ignore-existing \
/home/gallery/images/ /home/gallery/thumbs/
For large-scale systems, implement a parallel processing approach:
#!/bin/bash
# Process all galleries in parallel
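# A for loop runs in the current shell, so the final wait sees every job
# (jobs started inside a "find | while" pipeline belong to a subshell).
for gallery in /var/www/galleries/*/; do
  # Skip anything that is not a gallery with both trees present
  [ -d "${gallery}images" ] && [ -d "${gallery}thumbs" ] || continue
  (
    rsync -rv --delete --existing --ignore-existing \
      "${gallery}images/" "${gallery}thumbs/"
  ) &
done
wait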
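Starting one background job per gallery can overload the disk when there are many galleries. One way to cap the concurrency is xargs with -P, sketched below. `for_each_gallery` is a hypothetical helper name, and -mindepth, -maxdepth, -print0, -0 and -P are assumed to be available, as they are in GNU and BSD find and xargs.

```shell
#!/bin/sh
# for_each_gallery ROOT JOBS CMD...: run CMD once per immediate subdirectory
# of ROOT, at most JOBS at a time, with the directory appended as last argument.
for_each_gallery() {
    root=${1%/} max_jobs=$2; shift 2
    # -mindepth 1 keeps ROOT itself out of the list.
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -n 1 -P "$max_jobs" "$@"
}

# Example (not run here): prune every gallery, four at a time.
# for_each_gallery /var/www/galleries 4 sh -c '
#     [ -d "$1/images" ] && [ -d "$1/thumbs" ] &&
#     rsync -r --delete --existing --ignore-existing "$1/images/" "$1/thumbs/"
# ' sh
```

The job limit of four in the example is arbitrary; a sensible value depends on how many concurrent directory scans the storage can sustain.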
- Run as dedicated gallery user with limited privileges
- Set proper umask (0027 recommended)
- Implement filesystem monitoring (inotify) for real-time sync
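The inotify suggestion could look roughly like the following sketch, which assumes the inotifywait tool from inotify-tools is installed. `thumb_path` and `watch_deletions` are hypothetical names; file names containing newlines are not handled, and deleted directories are left for the periodic rsync pass to clean up.

```shell
#!/bin/sh
# thumb_path SRC DST FILE: map a path under SRC to the same relative path under DST.
thumb_path() {
    src=${1%/} dst=${2%/}
    rel=${3#"$src"/}
    printf '%s/%s\n' "$dst" "$rel"
}

# watch_deletions SRC DST: remove the matching thumbnail whenever a file is
# deleted from or moved out of SRC. Runs until interrupted.
watch_deletions() {
    inotifywait -m -r -q -e delete -e moved_from --format '%w%f' "${1%/}" |
    while IFS= read -r gone; do
        # rm without -r fails harmlessly if the path is a directory.
        rm -fv -- "$(thumb_path "$1" "$2" "$gone")"
    done
}

# Example (not run here):
# watch_deletions /home/gallery/images /home/gallery/thumbs
```

A watcher like this complements the scheduled cleanup rather than replacing it, since events that occur while the watcher is not running are missed.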