How to Automatically Mirror an Amazon S3 Bucket to Another Cloud Provider (e.g., Rackspace) Using Linux



Storing all your data with a single cloud provider like Amazon S3 poses risks: vendor lock-in, potential outages, or security breaches. A robust solution mirrors your S3 buckets to another provider (e.g., Rackspace Cloud Files) for redundancy and faster disaster recovery. A good mirroring setup should deliver:

  • Automation: Scripted syncs to avoid manual intervention.
  • Scalability: Handle large buckets efficiently.
  • Incremental Updates: Only transfer modified files.
  • Cross-Provider Compatibility: Support S3 and non-S3 targets.

rclone is a CLI tool perfect for this. It supports 40+ cloud providers, including S3 and Rackspace Cloud Files, with incremental sync and bandwidth control.

Step 1: Install rclone

curl https://rclone.org/install.sh | sudo bash

Step 2: Configure Remote Connections

Run rclone config to set up both S3 and Rackspace as remotes. Example S3 config:

[s3source]
type = s3
provider = AWS
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
region = us-east-1

Rackspace Cloud Files config:

[rackspacebackup]
type = swift
env_auth = false
user = YOUR_USERNAME
key = YOUR_API_KEY
auth = https://identity.api.rackspacecloud.com/v2.0
region = ORD  # Chicago region

After saving both remotes, confirm the credentials work: rclone lsd s3source: should list your buckets, and rclone lsd rackspacebackup: your containers.

Step 3: Sync Command with Cron Automation

This command mirrors S3 to Rackspace with checksum verification:

rclone sync s3source:bucket-name rackspacebackup:container-name \
--checksum \
--transfers=16 \
--swift-chunk-size=1G \
--log-file=/var/log/rclone-sync.log

Add to cron for daily syncs:

0 3 * * * /usr/bin/rclone sync s3source:bucket-name rackspacebackup:container-name --checksum --log-file=/var/log/rclone-sync.log >> /var/log/rclone-cron.log 2>&1
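If a large sync outlasts the cron interval, two rclone processes can end up running at once and working against each other. A minimal guard using flock can prevent this; the wrapper name run_locked and the lock path below are illustrative:

```shell
#!/bin/bash
# Run a command only if no other instance holds the lock file.
# LOCK_FILE is an example path; pick one on a local filesystem.
LOCK_FILE="${LOCK_FILE:-/tmp/rclone-mirror.lock}"

run_locked() {
  exec 9>"$LOCK_FILE"       # open the lock file on descriptor 9
  if ! flock -n 9; then     # non-blocking: bail out if already locked
    echo "previous sync still running; skipping" >&2
    return 0
  fi
  "$@"                      # run the wrapped command while holding the lock
}

# In the cron entry, wrap the sync:
# run_locked /usr/bin/rclone sync s3source:bucket-name rackspacebackup:container-name --checksum
```

flock releases the lock automatically when the process exits, so a crashed sync cannot wedge future runs.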

For multi-TB buckets:

  • Use --fast-list to reduce API calls
  • Set --max-backlog=200000 for high file counts
  • Throttle bandwidth with --bwlimit=10M

Validate completed syncs with rclone check:

rclone check s3source:bucket-name rackspacebackup:container-name \
--download

--download fetches each object and compares its content byte for byte; for a faster but weaker check, use --size-only instead. The two are alternative comparison modes, not companions.

Integrate with monitoring tools like Prometheus using rclone's remote control API.
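One way to wire that up is to run the sync with rclone's remote control server enabled so its Prometheus-format metrics endpoint can be scraped; the bind address below is an example:

```shell
# Expose metrics at http://127.0.0.1:5572/metrics while the sync runs
rclone sync s3source:bucket-name rackspacebackup:container-name \
  --rc \
  --rc-addr 127.0.0.1:5572 \
  --rc-enable-metrics
```

Point a Prometheus scrape job at that address; transfer and error counters are exported for the life of the process.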

For enterprise needs, consider:

  • AWS Storage Gateway (hybrid solution)
  • MinIO as a multi-cloud sync layer

In production environments, relying solely on AWS S3 for critical data storage presents single points of failure. Recent S3 outages (like US-EAST-1 in 2017) demonstrate the importance of cross-cloud replication. This guide covers practical methods to mirror S3 buckets to alternative providers like Backblaze B2, Wasabi, or Rackspace Cloud Files.

For Linux-based mirroring, we have three primary approaches:

  1. AWS CLI + Provider SDKs: Native but requires custom scripting
  2. Rclone: Unified interface for 40+ cloud providers
  3. Terraform + Cloud Sync Tools: For infrastructure-as-code setups

Installation and configuration:


# Install rclone (Debian/Ubuntu)
sudo apt-get install rclone

# Configure S3 remote (interactive; answers shown after each prompt)
rclone config
n                        # create a new remote
name> your-aws-profile
Storage> s3
provider> AWS
access_key_id> AWS_ACCESS_KEY_ID
secret_access_key> AWS_SECRET_ACCESS_KEY
region> us-west-2
endpoint>                # leave blank for AWS
location_constraint>     # leave blank
acl> private

# Configure Cloud Files remote
rclone config
n                        # create a new remote
name> rackspace-mirror
Storage> swift
user> your_rackspace_username
key> your_api_key
auth> https://identity.api.rackspacecloud.com/v2.0
tenant> your_account_number
region> DFW

Create a cron-driven sync solution with error handling:


#!/bin/bash

# Environment variables
LOG_FILE="/var/log/s3_mirror.log"
TIMESTAMP=$(date +"%Y-%m-%d %T")

# Sync with time-scheduled bandwidth limiting:
# 512 KiB/s starting at 08:00, unlimited from midnight
rclone sync your-aws-profile:source-bucket rackspace-mirror:destination-container \
  --bwlimit "08:00,512 00:00,off" \
  --checksum \
  --transfers 16 \
  --low-level-retries 20 \
  --log-file "$LOG_FILE" \
  --log-level INFO

# Error handling
if [ $? -eq 0 ]; then
  echo "[$TIMESTAMP] Sync completed successfully" >> "$LOG_FILE"
else
  echo "[$TIMESTAMP] ERROR: Sync failed" >> "$LOG_FILE"
  # Add notification logic (SNS, Slack, etc.)
fi

For large buckets (10TB+), consider this optimized approach:


rclone sync your-aws-profile:massive-bucket rackspace-mirror:backup-container \
  --fast-list \
  --size-only \
  --transfers 32 \
  --checkers 64 \
  --retries 10 \
  --stats 30s

Note that --size-only and --checksum are alternative comparison modes; on multi-TB buckets, --size-only avoids expensive checksum reads at the cost of weaker change detection.

Implement these verification steps:

  • Daily md5sum comparisons for critical files
  • Object count verification with: rclone size your-aws-profile:source-bucket
  • Alerting on sync failures (e.g., a CloudWatch custom metric or any monitoring hook fed from the script's error branch)
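The object-count check can be scripted around rclone size --json, which prints JSON like {"count":1234,"bytes":5678}. A sketch, where the count_of helper is illustrative and the remote names match the configs above:

```shell
#!/bin/bash
# Extract the "count" field from `rclone size --json` output without jq.
count_of() { sed -n 's/.*"count":\([0-9]*\).*/\1/p' <<<"$1"; }

# Real usage (remote and bucket names assumed from the configs above):
#   src=$(rclone size your-aws-profile:source-bucket --json)
#   dst=$(rclone size rackspace-mirror:destination-container --json)
#   [ "$(count_of "$src")" = "$(count_of "$dst")" ] || echo "count mismatch" >&2

# Demo on a literal JSON string:
count_of '{"count":1234,"bytes":5678}'   # prints 1234
```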

When mirroring to Rackspace or other providers:

  • Enable S3 Intelligent-Tiering on the source (objects in its optional Archive Access tiers must be restored before they can be copied)
  • Use --include/exclude filters for non-critical data
  • Schedule syncs during off-peak hours
  • Consider Wasabi for cost-effective hot storage
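The include/exclude filtering mentioned above is often easiest to manage in a filter file. A sketch follows; the patterns and the /tmp path are examples, and rclone reads "-" lines as excludes and "+" lines as includes, first match winning:

```shell
#!/bin/bash
# Write an example rclone filter file; adjust patterns to your data.
cat > /tmp/mirror-filters.txt <<'EOF'
- tmp/**
- *.log
- cache/**
+ **
EOF

# Apply it during the scheduled sync (remote names as configured earlier):
#   rclone sync your-aws-profile:source-bucket rackspace-mirror:destination-container \
#     --filter-from /tmp/mirror-filters.txt
```

Keeping the rules in one file means cron entries stay short and the same filters can be reused by the verification commands.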