How to Capture and Process Changed Files List from Rsync in Bash Scripts


When automating server synchronization tasks, we often need more than just file transfers - we need actionable data about what changed. The standard rsync output doesn't make it easy to programmatically identify which files were actually modified during synchronization.

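The --out-format option in rsync provides the key to solving this. With --out-format="%n", rsync prints one line per item it transfers or updates, containing the path relative to the transfer root. Leave out -v here, because it adds header and summary lines to the same stream. Read the output through process substitution rather than a pipe: a piped while loop runs in a subshell, so the array would be empty once the loop finishes.

# Capture the names rsync reports as changed
changed_items=()
while IFS= read -r changed_file; do
    if [[ -n "$changed_file" ]]; then
        changed_items+=("$changed_file")
    fi
done < <(rsync -az --out-format="%n" "$source_directory" "$destination_directory")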
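
The difference between a piped loop and a loop fed by process substitution can be checked in isolation. The sketch below uses a hypothetical fake_rsync function (not part of rsync) that prints names the way --out-format="%n" would, so it runs on a machine without rsync installed.

```shell
#!/bin/bash
# fake_rsync is a hypothetical stand-in that prints names the way
# rsync --out-format='%n' would; swap in the real command as needed.
fake_rsync() { printf '%s\n' 'a.txt' 'dir/' 'dir/b c.txt'; }

# Pipe: the loop body runs in a subshell, so the array stays empty here.
piped=()
fake_rsync | while IFS= read -r f; do piped+=("$f"); done

# Process substitution: the loop runs in the current shell.
kept=()
while IFS= read -r f; do
    [[ -n "$f" ]] && kept+=("$f")
done < <(fake_rsync)

echo "piped: ${#piped[@]} items, process substitution: ${#kept[@]} items"
```

With bash's default settings (lastpipe off), the first array ends up empty and the second holds all three names, including the one containing a space.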

Here's a production-ready script demonstrating the full solution:

#!/bin/bash

# Configuration
source_dir="/data/app/files"
dest_dir="/mnt/nas/backup/files"
log_file="/var/log/sync_changes.log"
user_group="appuser:appgroup"

# Initialize array
declare -a changed_items=()

# Execute rsync and capture changes
# (process substitution keeps the loop in the current shell, so the array survives)
echo "$(date) - Starting synchronization" >> "$log_file"
while IFS= read -r file; do
    if [[ -n "$file" ]]; then
        full_path="$dest_dir/$file"
        changed_items+=("$full_path")
        echo "$(date) - Changed: $full_path" >> "$log_file"
    fi
done < <(rsync -az --out-format="%n" "$source_dir/" "$dest_dir")

# Process changed files
for item in "${changed_items[@]}"; do
    if [[ -e "$item" ]]; then
        chown "$user_group" "$item"
        echo "$(date) - Permissions updated: $item" >> "$log_file"
    fi
done

echo "$(date) - Synchronization completed" >> "$log_file"

For more complex scenarios, consider these enhancements:

# 1. Dry-run for change preview
rsync -avzi --dry-run --out-format="%n" "$source_dir/" "$dest_dir"

# 2. Exclude patterns
rsync -avzi --exclude='*.tmp' --out-format="%n" "$source_dir/" "$dest_dir"

# 3. Parallel processing of changed files (NUL-delimited so names with spaces survive)
(( ${#changed_items[@]} )) && printf '%s\0' "${changed_items[@]}" | xargs -0 -P 4 -n 50 chown "$user_group"

When dealing with large file sets:

  • Use --itemize-changes instead of -v for machine-readable output
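  • rsync has no --max-depth option; to limit depth, use exclude patterns (for example, --exclude='/*/*/' should skip directories two levels below the transfer root)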
  • Consider file system monitoring tools (inotify) for real-time scenarios
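
As a sketch of what machine-readable output makes possible, the function below classifies itemized lines as new, updated, or deleted. It assumes rsync was run with --out-format='%i|%n'; the '|' separator and the classify_itemized name are choices made for this example, not rsync defaults. The sample input imitates rsync's item strings so the block runs without rsync.

```shell
#!/bin/bash
# Classify lines produced by: rsync -a --out-format='%i|%n' SRC/ DEST
# (the '|' separator is a choice made for this sketch, not an rsync default).
classify_itemized() {
    local line flags name
    while IFS= read -r line; do
        flags=${line%%|*}      # item string, e.g. ">f+++++++++"
        name=${line#*|}        # path relative to the transfer root
        case "$flags" in
            \*deleting*)  printf 'deleted\t%s\n' "$name" ;;
            ?d*)          ;;   # directories: skipped in this sketch
            ?f+++++++*)   printf 'new\t%s\n' "$name" ;;
            ?f*)          printf 'updated\t%s\n' "$name" ;;
        esac
    done
}

# Sample input imitating rsync's output, so the sketch runs without rsync.
sample=$'>f+++++++++|docs/new file.txt\n>f.st......|app.conf\ncd+++++++++|docs/\n*deleting  |old.log'
result=$(classify_itemized <<< "$sample")
printf '%s\n' "$result"
```

The deletion case is tested first on purpose: the second character of "*deleting" is a "d", so it would otherwise match the directory pattern.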

Always include proper error checking:

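# Save the status first: inside "if ! cmd", $? holds the negated result (0), not rsync's code
rsync -az --out-format="%n" "$source_dir/" "$dest_dir"
rc=$?
if (( rc != 0 )); then
    echo "ERROR: Rsync failed with exit code $rc" >&2
    exit 1
fi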
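
Not every non-zero status needs to abort a script. The sketch below maps rsync's exit status to a severity; the codes 23 (partial transfer due to error) and 24 (source files vanished during the transfer) come from the rsync man page, while treating 24 as a mere warning is a policy assumption of this example. The names rsync_severity and run_and_check are hypothetical helpers.

```shell
#!/bin/bash
# Map an rsync exit status to a severity. The codes are from the rsync
# man page; treating 24 as a warning is a policy choice for this sketch.
rsync_severity() {
    case "$1" in
        0)  echo ok ;;
        24) echo warning ;;   # source files vanished during the transfer
        *)  echo error ;;     # e.g. 23 = partial transfer due to error
    esac
}

# Hypothetical wrapper: run any command, then report on its status.
run_and_check() {
    "$@"
    local rc=$?
    local severity
    severity=$(rsync_severity "$rc")
    [[ $severity == ok ]] || echo "rsync exited with $rc ($severity)" >&2
    [[ $severity != error ]]
}
```

A call such as run_and_check rsync -a --out-format="%n" "$source_dir/" "$dest_dir" || exit 1 would then stop the script only on real errors.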

Administrators often need post-sync operations to run only on the files that changed, not on the directories rsync also reports. A file-based variant of the approach above combines rsync's built-in output formatting with grep, which drops directory entries (names ending in /):

# Run rsync with machine-readable output (names are relative to source/)
rsync -az --out-format='%n' source/ destination/ | \
    grep -v '/$' > changed_files.txt

# Read into array
mapfile -t changed_items < changed_files.txt

# Process changed files at their destination paths
for file in "${changed_items[@]}"
do
    chown user:usergroup "destination/$file"
    echo "Changed ownership for: destination/$file"
done

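If you can't use --out-format, parse the verbose output. This is more fragile, because -v mixes header and summary lines into the file list:

changed_items=()
while IFS= read -r line; do
    # Skip blank lines, directories, and rsync's header/summary lines
    case "$line" in
        ''|*/|*' incremental file list'|'sent '*|'total size is '*) continue ;;
    esac
    changed_items+=("$line")
done < <(rsync -avz source/ destination/)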

printf "%s\n" "${changed_items[@]}" > changed_files.log

Consider these improvements for production scripts:

# Add error handling (stderr is left alone so error text never enters the file list)
if ! rsync_output=$(rsync -az --out-format='%n' source/ destination/); then
    echo "Rsync failed!" >&2
    exit 1
fi

# Filter directories and empty lines (mapfile keeps names with spaces intact)
mapfile -t changed_items < <(grep -vE '/$|^$' <<< "$rsync_output")

# Verify array before processing
if [[ ${#changed_items[@]} -eq 0 ]]; then
    echo "No files changed"
    exit 0
fi

For large sync operations, use a temporary file instead of array storage:

tmpfile=$(mktemp)
rsync -az --out-format='%n' source/ destination/ | \
    grep -v '/$' > "$tmpfile"

# Names in the list are relative to source/, so test the destination path
while IFS= read -r file; do
    [ -e "destination/$file" ] && chown user:usergroup "destination/$file"
done < "$tmpfile"

rm -f "$tmpfile"
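
If the script can exit early, the temporary file is left behind. One way to guard against that, sketched here under assumptions, is an EXIT trap. The process_list helper and the scratch directory are hypothetical and exist only so the example can run without rsync or a real destination.

```shell
#!/bin/bash
# process_list is a hypothetical helper: it reads relative names from a
# list file and prints the destination path of each entry that exists.
process_list() {
    local list=$1 dest=$2 rel
    while IFS= read -r rel; do
        [[ -e "$dest/$rel" ]] && printf '%s\n' "$dest/$rel"
    done < "$list"
    return 0
}

tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"' EXIT   # clean up on any exit path

# Demo with a scratch directory in place of a real destination.
demo_dest=$(mktemp -d)
touch "$demo_dest/kept.txt"
printf '%s\n' 'kept.txt' 'missing.txt' > "$tmpfile"
found=$(process_list "$tmpfile" "$demo_dest")
echo "$found"
rm -rf "$demo_dest"
```

In a real script the printf line would be replaced by the rsync and grep pipeline, and the printed paths could be passed on to chown.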