How to Use rsync --delete to Remove Extraneous Files While Preserving Directory Structure



When maintaining a web gallery application, the original images and their thumbnails have to be kept in sync. The specific requirements are:

  • Original images live in /home/gallery/images/ with subdirectory organization
  • Generated thumbnails reside in /home/gallery/thumbs/ mirroring the same structure
  • We need to automatically remove thumbnails when their source images are deleted via FTP
  • We must never copy original images into the thumbs directory

The naive approach:

rsync -r --delete --ignore-existing /home/gallery/images/ /home/gallery/thumbs/

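has two critical flaws:

  1. --ignore-existing only skips files that already exist in the destination, so every image that has no thumbnail yet is copied into the thumbs directory at full size
  2. A copied original then sits where the thumbnail belongs, which may prevent the gallery from generating the real thumbnail when it is needed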

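We need a two-phase approach combining find with rsync. The order matters: the filter rules that stop rsync from copying files also protect the thumbnails from --delete, so rsync can only remove a stale directory once it is empty.

# First, remove thumbnails without corresponding originals
find /home/gallery/thumbs/ -type f | while IFS= read -r thumb; do
    original="/home/gallery/images/${thumb#/home/gallery/thumbs/}"
    [ -e "$original" ] || rm -v -- "$thumb"
done

# Then synchronize the directory structure only (no files are copied)
rsync -a --include='*/' --exclude='*' --delete /home/gallery/images/ /home/gallery/thumbs/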
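
A pipe into while read splits its input on newlines, so a filename that contains one (possible with FTP uploads) would be misread. The sketch below avoids the pipe by letting find hand the paths to a small inline script as arguments. list_orphans is a hypothetical helper name, the sandbox directories are stand-ins for the gallery paths, and the sketch assumes a thumbnail keeps the relative path and name of its original.

```shell
#!/bin/sh
# list_orphans SRC DST: print thumbnails under DST with no same-named file under SRC.
# Hypothetical helper; pass both directories without a trailing slash.
list_orphans() {
    find "$2" -type f -exec sh -c '
        src=$1; dst=$2; shift 2
        for thumb in "$@"; do
            rel=${thumb#"$dst"/}
            [ -e "$src/$rel" ] || printf "%s\n" "$thumb"
        done
    ' sh "$1" "$2" {} +
}

# Sandbox demonstration with temporary stand-in directories
sandbox=$(mktemp -d)
mkdir -p "$sandbox/images/trip" "$sandbox/thumbs/trip"
touch "$sandbox/images/trip/beach day.jpg" "$sandbox/thumbs/trip/beach day.jpg"
touch "$sandbox/thumbs/trip/deleted photo.jpg"

orphans=$(list_orphans "$sandbox/images" "$sandbox/thumbs")
echo "$orphans"
```

Once the listed paths look right, the printf inside the loop could be swapped for rm -- "$thumb" to delete instead of list.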

For production use, we should add verification steps:

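#!/bin/bash

SRC="/home/gallery/images"
DST="/home/gallery/thumbs"

# Dry run first: list orphaned thumbnails, then preview directory changes.
# rsync may report stale directories as non-empty here, because the
# orphans inside them have not actually been removed yet.
echo "=== DRY RUN ==="
find "$DST" -type f | while IFS= read -r thumb; do
    original="$SRC/${thumb#"$DST"/}"
    [ -e "$original" ] || echo "Would remove: $thumb"
done
rsync -anv --include='*/' --exclude='*' --delete "$SRC/" "$DST/"

read -p "Proceed with actual cleanup? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
    # Remove orphaned thumbnails first so stale directories end up empty
    find "$DST" -type f | while IFS= read -r thumb; do
        original="$SRC/${thumb#"$DST"/}"
        [ -e "$original" ] || rm -v -- "$thumb"
    done
    # Then mirror the directory structure; the filters keep files from being copied
    rsync -a --include='*/' --exclude='*' --delete "$SRC/" "$DST/"
fi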

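To schedule this as a daily cleanup task (the confirmation prompt must be removed or bypassed first, because a cron job has no terminal to answer it):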

# Add to crontab -e
0 3 * * * /usr/local/bin/gallery_cleanup.sh >> /var/log/gallery_cleanup.log 2>&1
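
One way to let the same script serve both interactive and scheduled runs is to make the confirmation step aware of its environment. The following is only a sketch: confirm is a hypothetical helper and ASSUME_YES is an environment variable invented for this example, not something rsync or cron defines.

```shell
#!/bin/sh
# confirm: succeed only if the cleanup is allowed to proceed.
# ASSUME_YES is a hypothetical opt-in variable introduced for this sketch.
confirm() {
    [ "${ASSUME_YES:-0}" = 1 ] && return 0   # explicit opt-in, e.g. from cron
    [ -t 0 ] || return 1                     # no terminal: refuse instead of hanging
    printf 'Proceed with actual cleanup? (y/n) '
    read -r reply
    case $reply in
        [Yy]*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Non-interactive demonstration of the opt-in path
if ASSUME_YES=1 confirm; then
    echo "cleanup would run here"
fi
```

With such a helper in the script, the crontab entry could set the variable in front of the command (ASSUME_YES=1 /usr/local/bin/gallery_cleanup.sh), while a manual run would still prompt.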

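For rsync purists, the same cleanup can be done without find. Combining --existing (never create files missing from the destination) with --ignore-existing (never update files already there) leaves rsync nothing to transfer, so only --delete has any effect:

rsync -rvn --delete --existing --ignore-existing \
    /home/gallery/images/ /home/gallery/thumbs/

Remove -n once the dry-run output looks right. Though more concise, its behavior is less obvious at a glance than the find loop, which makes it harder to debug and maintain.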


When building web galleries, maintaining thumbnail caches that accurately reflect the source image directory is crucial. The challenge arises when users delete original images via FTP, leaving orphaned thumbnails that waste storage and create inconsistencies.

The naive approach:

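rsync -r --delete --ignore-existing /home/gallery/images/ /home/gallery/thumbs/

has two critical flaws:

  • Copies original images when thumbnails don't exist
  • Doesn't account for the cache generation workflow: a copied original sits where the thumbnail belongs, so the gallery may treat it as an already generated thumbnail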

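Here's the refined command. It deletes orphaned thumbnails and copies nothing, so thumbnails that have not been generated yet are left for the gallery to create:

rsync -rv --delete --existing --ignore-existing \
    /home/gallery/images/ /home/gallery/thumbs/

Do not add --include='*/' --exclude='*' to this command: exclude rules also protect matching destination files from --delete, so no thumbnail would ever be removed.

Option              Purpose
--delete            Removes destination files that have no counterpart in the source
--existing          Skips creating files that do not yet exist in the destination
--ignore-existing   Skips updating files that already exist in the destination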

For scheduled cleanup via cron, create this script at /usr/local/bin/clean_thumbs:

#!/bin/bash
# Clean thumbnails for gallery
SRC="/home/gallery/images"
DST="/home/gallery/thumbs"

logger "Starting thumbnail cleanup"
# No include/exclude filters here: excluded files would be protected from --delete
rsync -rv --delete --existing --ignore-existing \
    "$SRC/" "$DST/"
logger "Thumbnail cleanup completed"
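
One failure mode the script above does not cover: if the source directory is unexpectedly empty (for example, the mount point of an unmounted volume), --delete would treat every thumbnail as orphaned. A minimal guard sketch follows, assuming an empty source should abort the run; source_has_files is a hypothetical helper name and the sandbox directories are stand-ins.

```shell
#!/bin/sh
# source_has_files DIR: succeed if DIR exists and contains at least one regular file.
source_has_files() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f | head -n 1)" ]
}

# Sandbox demonstration with temporary stand-in directories
sandbox=$(mktemp -d)
mkdir -p "$sandbox/empty" "$sandbox/images"
touch "$sandbox/images/a.jpg"

for dir in "$sandbox/images" "$sandbox/empty" "$sandbox/missing"; do
    if source_has_files "$dir"; then
        echo "ok: ${dir##*/}"
    else
        echo "refusing to clean, source looks empty: ${dir##*/}"
    fi
done
```

In the cleanup script, a check such as source_has_files "$SRC" || exit 1 could run before rsync. rsync's own --max-delete=NUM option is a complementary safeguard that caps how many files a single run may delete.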

Always test with --dry-run first:

rsync -rvn --delete --existing --ignore-existing \
    /home/gallery/images/ /home/gallery/thumbs/

For large-scale systems, implement a parallel processing approach:

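#!/bin/bash
# Process all galleries in parallel.
# A glob is used instead of "find | while" so the background jobs stay
# children of this shell and the final wait actually waits for them.
for gallery in /var/www/galleries/*/; do
    # Skip entries that do not have both directories
    [ -d "${gallery}images" ] && [ -d "${gallery}thumbs" ] || continue
    rsync -rv --delete --existing --ignore-existing \
        "${gallery}images/" "${gallery}thumbs/" &
done
wait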
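
Starting one background job per gallery with no upper limit may saturate the disk on a host with many galleries. The sketch below shows simple batching under stated assumptions: clean_gallery is a hypothetical stand-in for the per-gallery rsync call, MAX_JOBS is an assumed tuning value, and the galleries are temporary sandbox directories.

```shell
#!/bin/sh
MAX_JOBS=4   # assumed limit; tune for the host

# Hypothetical worker: in real use this would run the per-gallery rsync cleanup.
clean_gallery() {
    touch "$1/.cleaned"
}

# Sandbox with six stand-in galleries
root=$(mktemp -d)
for n in 1 2 3 4 5 6; do mkdir -p "$root/gallery$n"; done

running=0
for gallery in "$root"/*/; do
    clean_gallery "${gallery%/}" &
    running=$((running + 1))
    if [ "$running" -ge "$MAX_JOBS" ]; then
        wait          # let the current batch finish before starting the next
        running=0
    fi
done
wait                  # wait for the final, possibly partial, batch

cleaned=$(find "$root" -name .cleaned | wc -l)
echo "galleries cleaned: $cleaned"
```

Batching is cruder than a true job pool, since a slow gallery holds up its whole batch, but it keeps the script portable across shells.
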

Operational recommendations:

  • Run the cleanup as a dedicated gallery user with limited privileges
  • Set a restrictive umask (0027 recommended)
  • Consider filesystem monitoring (inotify) to remove thumbnails in real time instead of on a schedule