Storing all your data with a single cloud provider like Amazon S3 poses risks: vendor lock-in, outages, and security breaches. A robust solution is to mirror your S3 buckets to another provider (e.g., Rackspace Cloud Files) for redundancy and faster disaster recovery. A good mirroring setup needs:
- Automation: Scripted syncs to avoid manual intervention.
- Scalability: Handle large buckets efficiently.
- Incremental Updates: Only transfer modified files.
- Cross-Provider Compatibility: Support S3 and non-S3 targets.
rclone is a CLI tool well suited to this task. It supports over 40 cloud providers, including S3 and Rackspace Cloud Files, and offers incremental sync and bandwidth control.
Step 1: Install rclone
curl https://rclone.org/install.sh | sudo bash
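Verify the installation:
rclone version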
Step 2: Configure Remote Connections
Run rclone config to set up both S3 and Rackspace as remotes. Example S3 config (written to ~/.config/rclone/rclone.conf by default):
[s3source]
type = s3
provider = AWS
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
region = us-east-1
Rackspace Cloud Files config (ORD is the Chicago region; note that rclone's config format does not support inline comments, and the auth endpoint is required):
[rackspacebackup]
type = swift
env_auth = false
user = YOUR_USERNAME
key = YOUR_API_KEY
auth = https://identity.api.rackspacecloud.com/v2.0
region = ORD
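Confirm both remotes respond by listing their top-level buckets and containers:
rclone lsd s3source:
rclone lsd rackspacebackup: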
Step 3: Sync Command with Cron Automation
This command mirrors S3 to Rackspace with checksum verification:
rclone sync s3source:bucket-name rackspacebackup:container-name \
--checksum \
--transfers=16 \
--swift-chunk-size=1G \
--log-file=/var/log/rclone-sync.log
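Before the first real run, preview what would be transferred with --dry-run:
rclone sync s3source:bucket-name rackspacebackup:container-name --dry-run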
Add to cron for daily syncs:
0 3 * * * /usr/bin/rclone sync s3source:bucket-name rackspacebackup:container-name --checksum --log-file=/var/log/rclone-sync.log >> /var/log/rclone-cron.log 2>&1
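If a large sync might still be running when the next one starts, flock (from util-linux) keeps cron runs from overlapping; the lock file path here is arbitrary:
0 3 * * * flock -n /tmp/rclone-sync.lock /usr/bin/rclone sync s3source:bucket-name rackspacebackup:container-name --checksum >> /var/log/rclone-cron.log 2>&1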
For multi-TB buckets:
- Use --fast-list to reduce API calls
- Set --max-backlog=200000 for high file counts
- Throttle bandwidth with --bwlimit=10M (all three combined in the sketch below)
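Combined with the Step 3 command, a multi-TB sync might look like:
rclone sync s3source:bucket-name rackspacebackup:container-name \
  --checksum \
  --fast-list \
  --max-backlog=200000 \
  --bwlimit=10M \
  --log-file=/var/log/rclone-sync.log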
Validate syncs with rclone check. --size-only compares file sizes only (fast); --download fetches objects from both sides and compares their contents (thorough). Pick one per run:
rclone check s3source:bucket-name rackspacebackup:container-name --size-only
rclone check s3source:bucket-name rackspacebackup:container-name --download
Integrate with monitoring tools like Prometheus using rclone's remote control API.
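A minimal sketch: running the sync with the remote control server enabled exposes Prometheus metrics at /metrics on the rc port (localhost:5572 is rclone's default):
rclone sync s3source:bucket-name rackspacebackup:container-name \
  --rc \
  --rc-enable-metrics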
For enterprise needs, consider:
- AWS Storage Gateway (hybrid solution)
- MinIO as a multi-cloud sync layer
In production environments, relying solely on AWS S3 for critical data storage creates a single point of failure. Outages like the February 2017 US-EAST-1 S3 incident demonstrate the importance of cross-cloud replication, and the same approach shown above applies to alternative providers like Backblaze B2 and Wasabi, not just Rackspace Cloud Files. The rest of this guide digs into a production-grade setup: tooling options, a hardened sync script, large-bucket tuning, verification, and cost control.
For Linux-based mirroring, we have three primary approaches:
- AWS CLI + Provider SDKs: Native but requires custom scripting
- Rclone: Unified interface for 40+ cloud providers
- Terraform + Cloud Sync Tools: For infrastructure-as-code setups
Installation and configuration:
# Install rclone (Debian/Ubuntu package; the install script from Step 1 works too)
sudo apt-get install rclone
# Configure the S3 remote. Abbreviated interactive transcript: answer n for a
# new remote, name it, pick the s3 backend, then fill in the prompts shown;
# remaining questions can be left at their defaults.
rclone config
> n
> name> your-aws-profile
> Storage> s3
> access_key_id> AWS_ACCESS_KEY_ID
> secret_access_key> AWS_SECRET_ACCESS_KEY
> region> us-west-2
> endpoint> (leave blank)
> location_constraint> (leave blank)
> acl> private
# Configure the Cloud Files remote on the swift backend the same way
rclone config
> n
> name> rackspace-mirror
> Storage> swift
> user> your_rackspace_username
> key> your_api_key
> auth> https://identity.api.rackspacecloud.com/v2.0
> tenant> your_account_number
> region> DFW
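Before the first sync, confirm the stored remotes:
rclone config show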
Create a cron-driven sync solution with error handling:
#!/bin/bash
# Mirror the source bucket to Cloud Files with logging and error handling
LOG_FILE="/var/log/s3_mirror.log"
# Sync with a bandwidth timetable: 512 KiB/s from 08:00, unlimited from midnight
rclone sync your-aws-profile:source-bucket rackspace-mirror:destination-container \
  --bwlimit "08:00,512 00:00,off" \
  --checksum \
  --transfers 16 \
  --low-level-retries 20 \
  --log-file "$LOG_FILE" \
  --log-level INFO
STATUS=$?
# Error handling: capture the exit code before anything else overwrites $?,
# and timestamp the log entry at completion rather than at script start
TIMESTAMP=$(date +"%Y-%m-%d %T")
if [ "$STATUS" -eq 0 ]; then
  echo "[$TIMESTAMP] Sync completed successfully" >> "$LOG_FILE"
else
  echo "[$TIMESTAMP] ERROR: Sync failed with exit code $STATUS" >> "$LOG_FILE"
  # Add notification logic here (SNS, Slack, etc.)
fi
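Saved to a path like /usr/local/bin/s3_mirror.sh (the path is illustrative), the script slots straight into cron:
chmod +x /usr/local/bin/s3_mirror.sh
# crontab entry: run nightly at 02:00
0 2 * * * /usr/local/bin/s3_mirror.sh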
For large buckets (10TB+), consider this optimized approach. Note that --size-only and --checksum are alternatives, not companions (--size-only skips hash comparison entirely), so pick one per run; --size-only is shown here for speed:
rclone sync your-aws-profile:massive-bucket rackspace-mirror:backup-container \
  --fast-list \
  --size-only \
  --transfers 32 \
  --checkers 64 \
  --retries 10 \
  --stats 30s
Implement these verification steps:
- Daily md5sum comparisons for critical files
- Object count verification with rclone size on both remotes (see the sketch after this list)
- CloudWatch alarms for sync failures
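A minimal count-comparison sketch, assuming jq is installed and reusing the remote names configured above (rclone size --json reports a count field per remote):
#!/bin/bash
# Compare object counts between source and mirror
SRC_COUNT=$(rclone size your-aws-profile:source-bucket --json | jq .count)
DST_COUNT=$(rclone size rackspace-mirror:destination-container --json | jq .count)
if [ "$SRC_COUNT" -ne "$DST_COUNT" ]; then
  echo "MISMATCH: source=$SRC_COUNT mirror=$DST_COUNT" >&2
  exit 1
fi
echo "Object counts match: $SRC_COUNT"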
When mirroring to Rackspace or other providers:
- Enable S3 Intelligent-Tiering on source
- Use --include/--exclude filters to skip non-critical data (see the sketch after this list)
- Schedule syncs during off-peak hours
- Consider Wasabi for cost-effective hot storage
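A filtered sync might look like this (the exclude patterns are illustrative):
rclone sync your-aws-profile:source-bucket rackspace-mirror:destination-container \
  --exclude "logs/**" \
  --exclude "*.tmp" \
  --checksum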