When dealing with SFTP servers where clients continuously upload large files, a common challenge arises: how to only transfer completed files while ignoring those currently being written. Standard rsync operations will attempt to copy these in-progress files, which can lead to corrupted transfers and wasted bandwidth.
The key is to implement a check that verifies files haven't been modified for a certain period. Here's a bash script approach:
#!/bin/bash
SOURCE_DIR="/sftp/uploads"
DESTINATION="user@backup-server:/backups"
STABILITY_PERIOD=300  # 5 minutes in seconds

# Find files not modified in the last 5 minutes; -printf '%P\0' emits
# null-terminated paths relative to SOURCE_DIR, which is what --files-from expects
find "$SOURCE_DIR" -type f -mmin +$((STABILITY_PERIOD / 60)) -printf '%P\0' | \
    rsync -av --files-from=- --from0 "$SOURCE_DIR" "$DESTINATION"
For more precise detection of files being written, you can check which files are currently open:
#!/bin/bash
SOURCE_DIR="/sftp/uploads"
DESTINATION="user@backup-server:/backups"

# Build the list of files not currently opened by any process:
# comm -23 keeps paths found on disk but absent from lsof's open-file list;
# sed then strips the SOURCE_DIR prefix so --files-from gets relative paths
comm -23 \
    <(find "$SOURCE_DIR" -type f | sort) \
    <(lsof +D "$SOURCE_DIR" 2>/dev/null | awk 'NR>1 {print $9}' | sort) | \
    sed "s|^$SOURCE_DIR/||" | \
    rsync -av --files-from=- "$SOURCE_DIR" "$DESTINATION"
If a transfer does stall or get interrupted, rsync's --timeout option aborts it after the given number of seconds without I/O, and --partial keeps the partially transferred file so the next run can resume it:
rsync -av --timeout=60 --partial --progress /sftp/uploads/ user@backup-server:/backups/
For maximum reliability, combine multiple verification methods:
#!/bin/bash
SOURCE="/sftp/uploads"
DEST="user@backup-server:/backups"
LOG="/var/log/sftp_backups.log"

{
    echo "Starting backup at $(date)"

    # Step 1: Find files not modified in the last 10 minutes
    STABLE_FILES=$(mktemp)
    find "$SOURCE" -type f -mmin +10 -print0 > "$STABLE_FILES"

    # Step 2: List files currently open by any process
    OPEN_FILES=$(mktemp)
    lsof +D "$SOURCE" 2>/dev/null | awk 'NR>1 {print $9}' | sort > "$OPEN_FILES"

    # Step 3: Create the final transfer list: stable files minus open files,
    # with the SOURCE prefix stripped so --files-from paths are relative to it
    TRANSFER_LIST=$(mktemp)
    comm -23 \
        <(tr '\0' '\n' < "$STABLE_FILES" | sort) \
        "$OPEN_FILES" | sed "s|^$SOURCE/||" | tr '\n' '\0' > "$TRANSFER_LIST"

    # Step 4: Execute rsync with the null-separated list
    rsync -av --files-from="$TRANSFER_LIST" --from0 "$SOURCE" "$DEST"

    # Cleanup
    rm -f "$STABLE_FILES" "$OPEN_FILES" "$TRANSFER_LIST"

    echo "Backup completed at $(date)"
} >> "$LOG" 2>&1
For complex scenarios, consider these alternatives:
- csync2: Cluster synchronization tool with file verification
- lsyncd: Live syncing daemon with various monitoring options
- incron: Trigger actions on filesystem events (example entry after this list)
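As a rough illustration of the incron route (the watch path and destination below are placeholders reusing the earlier example paths, not a tested setup), a single incrontab entry, added with incrontab -e, can push each file the moment the writing process closes it; $@ expands to the watched directory and $# to the file that triggered the event:

/sftp/uploads IN_CLOSE_WRITE rsync -a $@/$# user@backup-server:/backups/

Clients that reopen a file to resume an upload will fire IN_CLOSE_WRITE more than once, so it helps that re-running the same rsync copy is harmless.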
When managing an SFTP server that receives continuous large file uploads from clients, copying only complete files becomes crucial. Attempting to process partially uploaded files can lead to data corruption, processing errors, or incomplete datasets. The standard rsync behavior doesn't inherently distinguish between complete and in-progress transfers.
To safely identify files that aren't actively being written to, we can use several approaches:
# Method 1: Check if the file is open by any process (lsof exits 0 if it is)
lsof /path/to/file > /dev/null 2>&1 && echo "file is still open"
# Method 2: Compare the file size at two points in time
SIZE1=$(stat -c %s /path/to/file)
sleep 5
SIZE2=$(stat -c %s /path/to/file)
[ "$SIZE1" -eq "$SIZE2" ] && echo "size unchanged; file looks complete"
Here are three practical rsync-based solutions:
1. Using --ignore-existing with Size Checks
#!/bin/bash
# First pass: build the list of stable files (size + path) on the SFTP server,
# since the rsync below pulls from that host
ssh user@sftp-server 'find /sftp/uploads -type f -mmin +5 -exec stat -c "%s %n" {} +' > stable_files.list

# Rsync only files that haven't changed in 5 minutes; cut drops the size column
# (awk '{print $2}' would truncate any path containing spaces)
rsync -avz --files-from=<(cut -d' ' -f2- stable_files.list) \
    --ignore-existing \
    user@sftp-server:/ /backup/destination/
2. Combining with inotifywait
# Monitor for file closure events
inotifywait -m -e close_write --format '%w%f' /sftp/uploads |
while IFS= read -r file
do
rsync -avz "$file" user@backup-server:/destination/
done
3. LVM Snapshot Approach
# Create LVM snapshot
lvcreate -L10G -s -n sftp-snap /dev/vg/sftp-lv
# Mount snapshot and rsync from it
mount /dev/vg/sftp-snap /mnt/sftp-snapshot
rsync -avz /mnt/sftp-snapshot/uploads/ user@backup-server:/destination/
# Cleanup
umount /mnt/sftp-snapshot
lvremove /dev/vg/sftp-snap
For more complex scenarios, these alternatives might be better suited:
- LFTP: Supports mirroring with better transfer control (see the sketch after this list)
- csync2: Designed for cluster synchronization
- Unison: Two-way file synchronization
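As a hedged sketch of the LFTP option (host, user, and paths are placeholders, and it assumes an lftp build whose mirror command supports the --older-than time specification), the same mtime-based stability idea can be expressed in a single invocation:

# Pull only files whose modification time is older than 10 minutes,
# skipping uploads that are still in progress
lftp sftp://backupuser@sftp-server -e \
    "mirror --only-newer --older-than=now-10minutes /sftp/uploads /backup/destination; quit"

This mirrors the -mmin +N checks used in the rsync scripts above, just handled inside lftp itself.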
When implementing any of these solutions:
- Always test with non-production data first
- Implement proper logging for troubleshooting
- Consider adding checksum verification for critical files (a sketch follows this list)
- Monitor disk space when using snapshot-based approaches
- Set up proper error handling in your scripts
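For the checksum point above, a minimal sketch (the file name and paths are illustrative, not from the original setup) is to compare digests on both ends after the transfer; rsync's --checksum option is an alternative that does the comparison during the transfer itself, at the cost of reading every file on both sides:

# Verify one transferred file by comparing local and remote MD5 digests
FILE="example-upload.dat"   # hypothetical file name
LOCAL_SUM=$(md5sum "/sftp/uploads/$FILE" | awk '{print $1}')
REMOTE_SUM=$(ssh user@backup-server "md5sum '/backups/$FILE'" | awk '{print $1}')
if [ "$LOCAL_SUM" != "$REMOTE_SUM" ]; then
    echo "Checksum mismatch for $FILE" >&2
fi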
# Example logging implementation
rsync -avz --log-file=/var/log/rsync_$(date +%Y%m%d).log \
--files-from=stable_files.list \
user@sftp-server:/ /backup/destination/