How to Roll Back an S3 Versioned Bucket to a Specific Point in Time: Complete Guide with Ruby Examples



When working with AWS S3 versioning, every object modification creates a new version while previous versions are retained. This becomes crucial when dealing with accidental deletions or unwanted modifications. Versioning is enabled per bucket:


# Enable versioning via AWS CLI
aws s3api put-bucket-versioning \
    --bucket your-bucket-name \
    --versioning-configuration Status=Enabled

The main difficulty isn't restoring individual files (which can be done through the AWS Console), but rather performing a complete bucket rollback to a specific timestamp. This requires:

  • Identifying correct versions for all objects
  • Handling delete markers
  • Managing thousands of objects efficiently

Here's a comprehensive Ruby script that handles point-in-time restoration:


require 'aws-sdk-s3'
require 'time'

def restore_bucket_to_point_in_time(bucket_name, target_time)
  s3 = Aws::S3::Client.new
  target_time = Time.parse(target_time)

  # First pass: collect, per key, every version no newer than the
  # target time (the response is pageable, so .each walks all pages)
  candidates = Hash.new { |hash, key| hash[key] = [] }
  s3.list_object_versions(bucket: bucket_name).each do |response|
    response.versions.each do |version|
      next if version.last_modified > target_time
      candidates[version.key] << version
    end
  end

  # Second pass: promote the newest pre-cutoff version of each key.
  # The listing already says whether a version is current (is_latest),
  # so no per-object GET is needed.
  candidates.each do |key, versions|
    target = versions.max_by(&:last_modified)
    next if target.is_latest

    # CopyObject with a versionId source creates a new current version
    # carrying the old content (keys with special characters may need
    # URL-encoding in copy_source)
    s3.copy_object(
      bucket: bucket_name,
      key: key,
      copy_source: "#{bucket_name}/#{key}?versionId=#{target.version_id}"
    )
  end
end

# Example usage:
restore_bucket_to_point_in_time('my-image-bucket', '2023-05-15T14:30:00Z')
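
Grouping by key and promoting only the newest pre-cutoff version matters: copying every matching version in listing order would leave an arbitrary (typically the oldest) version current. Also note that copy_object creates a new current version with the old content; the version history itself is never rewritten.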

Production environments require additional considerations:


# For deleted objects: removing a delete marker "un-deletes" the object.
# To roll back to target_time, remove markers created *after* the cutoff;
# markers created before it mean the object was already deleted then.
s3.list_object_versions(bucket: bucket_name).each do |response|
  response.delete_markers.each do |marker|
    next unless marker.last_modified > target_time
    s3.delete_object(
      bucket: bucket_name,
      key: marker.key,
      version_id: marker.version_id
    )
  end
end

# Large bucket optimization: list_object_versions paginates with
# key/version-id markers, not continuation tokens
def process_large_bucket(bucket_name)
  s3 = Aws::S3::Client.new
  key_marker = nil
  version_id_marker = nil

  loop do
    resp = s3.list_object_versions(
      bucket: bucket_name,
      key_marker: key_marker,
      version_id_marker: version_id_marker
    )

    process_versions(resp.versions)

    break unless resp.is_truncated
    key_marker = resp.next_key_marker
    version_id_marker = resp.next_version_id_marker
  end
end
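
process_versions above is left undefined. A minimal sketch, assuming the same newest-version-before-cutoff logic as the main script (the helper and its keyword parameters are illustrative, and a production version would carry state across pages, since versions of one key can straddle a page boundary):


# Hypothetical per-page handler: restore the newest pre-cutoff version
# of each key on this page
def process_versions(versions, s3:, bucket_name:, target_time:)
  versions
    .select { |v| v.last_modified <= target_time }
    .group_by(&:key)
    .each_value do |page_versions|
      target = page_versions.max_by(&:last_modified)
      next if target.is_latest # already the current version
      s3.copy_object(
        bucket: bucket_name,
        key: target.key,
        copy_source: "#{bucket_name}/#{target.key}?versionId=#{target.version_id}"
      )
    end
end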

For a Rails application, you can create a rake task:


# lib/tasks/s3_restore.rake
namespace :s3 do
  desc "Restore S3 bucket to specific point in time"
  task :restore, [:bucket, :time] => :environment do |t, args|
    require 'aws-sdk-s3'
    S3RestoreService.new(args[:bucket], args[:time]).execute
  end
end

# Run with: rake s3:restore[my-bucket,"2023-05-15 14:30:00"]
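
The task delegates to S3RestoreService, which is not defined above. A minimal sketch, assuming it simply wraps the restore_bucket_to_point_in_time function from earlier (the class name and file location are illustrative):


# app/services/s3_restore_service.rb (hypothetical)
class S3RestoreService
  def initialize(bucket, time)
    @bucket = bucket
    @time = time
  end

  def execute
    restore_bucket_to_point_in_time(@bucket, @time)
  end
end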

For non-Ruby environments, consider these AWS-native approaches:

  • S3 Batch Operations with manifest files
  • AWS Lambda functions triggered by CloudWatch Events (now Amazon EventBridge); a handler sketch follows this list
  • AWS Step Functions for complex restoration workflows
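
To illustrate the Lambda route, here is a minimal Ruby handler sketch. It performs only the un-delete step (removing post-cutoff delete markers); the BUCKET and TARGET_TIME environment variables and the handler wiring are assumptions:


# lambda_function.rb (hypothetical handler; assumes the aws-sdk-s3 gem
# is packaged with the function)
require 'aws-sdk-s3'
require 'time'

def lambda_handler(event:, context:)
  bucket = ENV.fetch('BUCKET')
  target_time = Time.parse(ENV.fetch('TARGET_TIME'))
  s3 = Aws::S3::Client.new

  s3.list_object_versions(bucket: bucket).each do |page|
    page.delete_markers.each do |marker|
      # Removing a delete marker created after the cutoff un-deletes
      # the object as it existed at the target time
      next unless marker.last_modified > target_time
      s3.delete_object(bucket: bucket, key: marker.key, version_id: marker.version_id)
    end
  end

  { statusCode: 200 }
end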

When dealing with large buckets:

  • Implement parallel processing, e.g. using threads (see the sketch after this list)
  • Consider regional API rate limits
  • Monitor AWS costs (PUT and COPY requests are billable)
  • Use S3 Inventory reports to enumerate objects in very large buckets
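
A minimal threaded sketch, assuming the [key, version_id] pairs to restore have already been collected (the parallel_restore helper and the thread count are illustrative):


require 'aws-sdk-s3'

# Hypothetical parallel restore: workers drain a shared queue and copy
# each chosen version over itself
def parallel_restore(bucket, pairs, thread_count: 8)
  queue = Queue.new
  pairs.each { |pair| queue << pair }
  queue.close # once drained, pop returns nil and workers exit

  workers = Array.new(thread_count) do
    Thread.new do
      s3 = Aws::S3::Client.new # one client per thread
      while (pair = queue.pop)
        key, version_id = pair
        s3.copy_object(
          bucket: bucket,
          key: key,
          copy_source: "#{bucket}/#{key}?versionId=#{version_id}"
        )
      end
    end
  end
  workers.each(&:join)
end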

To recap the core behavior: when versioning is enabled on an S3 bucket, every object modification creates a new version ID while previous versions are retained. This becomes crucial for scenarios like:

  • Accidental deletions by team members
  • Application bugs corrupting stored files
  • Rollback requirements after faulty deployments

Unlike traditional backup systems, S3 versioning requires you to:

  1. Identify target versions for each object
  2. Explicitly restore them to become current versions
  3. Handle delete markers appropriately

Here's a bash script example to restore objects modified before a specific timestamp:


#!/bin/bash
BUCKET="your-bucket-name"
RESTORE_DATE="2023-05-15T00:00:00Z"

# Note: the CLI paginates automatically. Versions are listed
# newest-first per key, so reverse the list (tac; tail -r on BSD/macOS)
# so the newest pre-cutoff version is copied last and ends up current.
aws s3api list-object-versions --bucket "$BUCKET" \
  --query "Versions[?LastModified<='$RESTORE_DATE'].[Key,VersionId]" \
  --output text | tac | while read -r key version; do
    aws s3api copy-object \
      --bucket "$BUCKET" \
      --key "$key" \
      --copy-source "$BUCKET/$key?versionId=$version"
done

For Ruby developers, here's a Rake task implementation:


require 'aws-sdk-s3'
require 'time'

namespace :s3 do
  desc "Restore bucket to point-in-time"
  task :restore, [:date] => :environment do |t, args|
    s3 = Aws::S3::Client.new
    bucket = 'your-image-bucket'
    cutoff = Time.parse(args[:date])

    # Fetch all object versions (the response is pageable, so this
    # walks every page)
    versions = s3.list_object_versions(bucket: bucket).each_with_object([]) do |resp, arr|
      arr.concat(resp.versions)
    end

    # For each key, promote only the newest version at or before the
    # cutoff; copying every match in listing order would leave the
    # oldest one current
    versions.select { |v| v.last_modified <= cutoff }
            .group_by(&:key)
            .each_value do |vs|
      version = vs.max_by(&:last_modified)
      next if version.is_latest

      begin
        s3.copy_object(
          bucket: bucket,
          key: version.key,
          copy_source: "#{bucket}/#{version.key}?versionId=#{version.version_id}"
        )
        puts "Restored: #{version.key} (v#{version.version_id})"
      rescue Aws::S3::Errors::ServiceError => e
        puts "Error restoring #{version.key}: #{e.message}"
      end
    end
  end
end

Consider these additional scenarios:


# To remove delete markers (un-delete files). IsLatest is a JSON
# boolean, so compare against the backtick literal `true`. As written
# this removes every current delete marker; filter on LastModified to
# honor a cutoff.
aws s3api list-object-versions --bucket "$BUCKET" \
  --query 'DeleteMarkers[?IsLatest==`true`].[Key,VersionId]' \
  --output text | while read -r key version; do
    aws s3api delete-object \
      --bucket "$BUCKET" --key "$key" --version-id "$version"
done

  • For large buckets, paginate raw API calls with NextKeyMarker/NextVersionIdMarker (the CLI paginates automatically)
  • Consider parallel processing for faster restoration
  • Monitor S3 API request limits (3,500 PUT/COPY/POST/DELETE requests per second per prefix)

For enterprise-scale restoration, let S3 Batch Operations do the copying:


# Generate manifest.csv of target versions. The manifest spec below
# declares three fields, so prepend the bucket name to each row
# (keys containing whitespace or commas need extra handling).
aws s3api list-object-versions --bucket "$BUCKET" \
  --query "Versions[?LastModified<='2023-05-15T00:00:00Z'].[Key,VersionId]" \
  --output text | awk -v b="$BUCKET" '{print b","$1","$2}' > manifest.csv

# Upload the manifest so the batch job can read it
aws s3 cp manifest.csv "s3://$BUCKET/manifest.csv"

# Create batch job
aws s3control create-job \
  --account-id YOUR_ACCOUNT_ID \
  --operation '{"S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::'$BUCKET'"}}' \
  --manifest '{"Spec": {"Format": "S3BatchOperations_CSV_20180820","Fields": ["Bucket","Key","VersionId"]},"Location": {"ObjectArn": "arn:aws:s3:::'$BUCKET'/manifest.csv","ETag": "'$(md5sum manifest.csv | awk '{print $1}')'"}}' \
  --report '{"Bucket": "arn:aws:s3:::'$BUCKET'","Prefix": "batch-reports","Format": "Report_CSV_20180820","Enabled": true,"ReportScope": "AllTasks"}' \
  --priority 10 \
  --role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/batch-operations-role
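
create-job prints a job ID that you can poll until the job completes. The same API is available to Ruby through the aws-sdk-s3control gem; a small sketch with placeholder IDs:


require 'aws-sdk-s3control'

# Hypothetical status check for the batch job created above
s3control = Aws::S3Control::Client.new
resp = s3control.describe_job(
  account_id: 'YOUR_ACCOUNT_ID',
  job_id: 'JOB_ID_FROM_CREATE_JOB'
)
puts resp.job.status # e.g. "Active", "Complete", "Failed"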