When working with AWS S3 versioning, every object modification creates a new version while retaining the previous ones. This becomes crucial when dealing with accidental deletions or unwanted modifications. First, enable versioning on the bucket:
# Enable versioning via AWS CLI
aws s3api put-bucket-versioning \
--bucket your-bucket-name \
--versioning-configuration Status=Enabled
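If you prefer to stay in Ruby, the same call is available on the SDK client. A minimal sketch using aws-sdk-s3 (the bucket name is a placeholder):
require 'aws-sdk-s3'

# Enable versioning on an existing bucket via the Ruby SDK
s3 = Aws::S3::Client.new
s3.put_bucket_versioning(
  bucket: 'your-bucket-name',
  versioning_configuration: { status: 'Enabled' }
)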
The main difficulty isn't restoring individual files (which can be done through the AWS Console), but rather performing a complete bucket rollback to a specific timestamp. This requires:
- Identifying correct versions for all objects
- Handling deleted markers
- Managing thousands of objects efficiently
Here's a comprehensive Ruby script that handles point-in-time restoration:
require 'aws-sdk-s3'
require 'time'
def restore_bucket_to_point_in_time(bucket_name, target_time)
  s3 = Aws::S3::Client.new
  target_time = Time.parse(target_time)

  # First pass: list every object version (the response is pageable,
  # so #each yields one page at a time)
  object_versions = []
  s3.list_object_versions(bucket: bucket_name).each do |response|
    response.versions.each do |version|
      object_versions << {
        key: version.key,
        version_id: version.version_id,
        last_modified: version.last_modified
      }
    end
  end

  # Second pass: for each key, find the newest version at or before
  # the target time and copy it forward as the new current version
  object_versions.group_by { |v| v[:key] }.each do |key, versions|
    target = versions
             .select { |v| v[:last_modified] <= target_time }
             .max_by { |v| v[:last_modified] }
    next unless target

    # Skip if the current version is already the one we want
    # (HEAD is enough to read the current version id)
    begin
      current = s3.head_object(bucket: bucket_name, key: key)
      next if current.version_id == target[:version_id]
    rescue Aws::S3::Errors::NotFound
      # Object is currently deleted (delete marker); restore it anyway
    end

    s3.copy_object(
      bucket: bucket_name,
      key: key,
      copy_source: "#{bucket_name}/#{key}?versionId=#{target[:version_id]}"
    )
  end
end
# Example usage:
restore_bucket_to_point_in_time('my-image-bucket', '2023-05-15T14:30:00Z')
Production environments require additional considerations:
# For deleted objects: remove delete markers created *after* the target
# time so those objects become visible again (deleting a delete marker
# makes the previous version current). Markers from before the target
# time are left alone, since those objects were already deleted then.
s3.list_object_versions(bucket: bucket_name).each do |page|
  page.delete_markers.each do |marker|
    next unless marker.last_modified > target_time
    s3.delete_object(
      bucket: bucket_name,
      key: marker.key,
      version_id: marker.version_id
    )
  end
end
# Large bucket optimization: paginate manually with key/version markers
# (list_object_versions does not accept a continuation token; that
# parameter belongs to list_objects_v2)
def process_large_bucket(bucket_name)
  s3 = Aws::S3::Client.new
  key_marker = nil
  version_id_marker = nil
  loop do
    resp = s3.list_object_versions(
      bucket: bucket_name,
      key_marker: key_marker,
      version_id_marker: version_id_marker
    )
    process_versions(resp.versions) # your per-page handler
    break unless resp.is_truncated
    key_marker = resp.next_key_marker
    version_id_marker = resp.next_version_id_marker
  end
end
For a Rails application, you can create a rake task:
# lib/tasks/s3_restore.rake
namespace :s3 do
desc "Restore S3 bucket to specific point in time"
task :restore, [:bucket, :time] => :environment do |t, args|
require 'aws-sdk-s3'
S3RestoreService.new(args[:bucket], args[:time]).execute
end
end
# Run with: rake s3:restore[my-bucket,"2023-05-15 14:30:00"]
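The task delegates to an S3RestoreService, which isn't defined above. A minimal sketch, assuming it simply wraps the restore_bucket_to_point_in_time function from earlier (class name and file path are illustrative):
# app/services/s3_restore_service.rb (hypothetical wrapper class)
class S3RestoreService
  def initialize(bucket, time)
    @bucket = bucket
    @time = time
  end

  def execute
    # Delegates to the function defined in the script above
    restore_bucket_to_point_in_time(@bucket, @time)
  end
end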
For non-Ruby environments, consider these AWS-native approaches:
- S3 Batch Operations with manifest files
- AWS Lambda functions triggered by CloudWatch Events (a minimal handler sketch follows this list)
- AWS Step Functions for complex restoration workflows
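For the Lambda route, a minimal Ruby handler might look like the sketch below. The event shape (bucket, target_time) is an assumption, and the handler reuses the restore function from the script above, assumed to be packaged alongside it:
# lambda_function.rb: hypothetical handler for an event-triggered restore
require 'aws-sdk-s3'
require_relative 'restore' # assumed file defining restore_bucket_to_point_in_time

def handler(event:, context:)
  bucket = event['bucket']
  target = event['target_time'] # e.g. "2023-05-15T14:30:00Z"
  restore_bucket_to_point_in_time(bucket, target)
  { statusCode: 200, body: "Restore of #{bucket} to #{target} completed" }
end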
When dealing with large buckets:
- Implement parallel processing, e.g. with a small thread pool (sketched after this list)
- Consider regional API rate limits
- Monitor AWS costs (PUT operations are billable)
- Use S3 inventory reports for very large buckets
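A minimal sketch of the thread-pool idea, assuming targets is an array of { key:, version_id: } hashes computed as in the script above (the function name and thread count are illustrative):
require 'aws-sdk-s3'

# Restore targets in parallel with a small worker pool
def parallel_restore(bucket, targets, thread_count: 10)
  queue = Queue.new
  targets.each { |t| queue << t }

  workers = Array.new(thread_count) do
    Thread.new do
      s3 = Aws::S3::Client.new # one client per thread
      loop do
        target = begin
          queue.pop(true) # non-blocking pop
        rescue ThreadError
          break # queue drained
        end
        s3.copy_object(
          bucket: bucket,
          key: target[:key],
          copy_source: "#{bucket}/#{target[:key]}?versionId=#{target[:version_id]}"
        )
      end
    end
  end
  workers.each(&:join)
end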
The same point-in-time restore can also be driven entirely from the command line. Here's a bash script example that restores each object to its newest version at or before a specific timestamp:
#!/bin/bash
BUCKET="your-bucket-name"
RESTORE_DATE="2023-05-15T00:00:00Z"

# JMESPath string literals must be backtick-quoted. Versions are listed
# newest first per key, so `awk '!seen[$1]++'` keeps only the newest
# version at or before the cutoff for each key (assumes keys contain
# no whitespace).
aws s3api list-object-versions --bucket "$BUCKET" \
  --query "Versions[?LastModified<=\`$RESTORE_DATE\`].[Key,VersionId]" \
  --output text | awk '!seen[$1]++' | while read -r key version; do
  aws s3api copy-object \
    --bucket "$BUCKET" \
    --key "$key" \
    --copy-source "$BUCKET/$key?versionId=$version"
done
For Ruby developers, here's a self-contained Rake task implementation:
require 'aws-sdk-s3'
require 'time'

namespace :s3 do
  desc "Restore bucket to point-in-time"
  task :restore, [:date] => :environment do |t, args|
    s3 = Aws::S3::Client.new
    bucket = 'your-image-bucket'
    cutoff = Time.parse(args[:date])

    # Fetch all object versions (the response is pageable; #each yields pages)
    versions = s3.list_object_versions(bucket: bucket).each_with_object([]) do |resp, arr|
      arr.concat(resp.versions)
    end

    # For each key, restore only the newest version at or before the cutoff
    versions.group_by(&:key).each do |key, key_versions|
      version = key_versions
                .select { |v| v.last_modified <= cutoff }
                .max_by(&:last_modified)
      next unless version

      begin
        s3.copy_object(
          bucket: bucket,
          key: key,
          copy_source: "#{bucket}/#{key}?versionId=#{version.version_id}"
        )
        puts "Restored: #{key} (v#{version.version_id})"
      rescue Aws::S3::Errors::ServiceError => e
        puts "Error restoring #{key}: #{e.message}"
      end
    end
  end
end
# Run with: rake s3:restore["2023-05-15 14:30:00"]
Consider these additional scenarios:
# To remove delete markers (un-delete files): deleting a marker requires
# its VersionId, so emit both columns and loop over them
aws s3api list-object-versions --bucket "$BUCKET" \
  --query 'DeleteMarkers[?IsLatest==`true`].[Key,VersionId]' \
  --output text | while read -r key version_id; do
  aws s3api delete-object \
    --bucket "$BUCKET" --key "$key" --version-id "$version_id"
done
- For large buckets, implement pagination (NextKeyMarker/NextVersionIdMarker)
- Consider parallel processing for faster restoration
- Monitor S3 API request limits: roughly 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix (a retry-configuration sketch follows)
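One way to stay under those limits from Ruby is the SDK's built-in adaptive retry mode, which applies client-side rate limiting and backs off on SlowDown (503) throttling responses:
require 'aws-sdk-s3'

# Client-side rate limiting plus exponential backoff on throttling errors
s3 = Aws::S3::Client.new(
  retry_mode: 'adaptive',
  max_attempts: 10
)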
For enterprise-scale restoration:
# Generate a manifest of target versions; keep only the newest version
# per key (versions list newest first) and prefix the bucket column
# that the CSV manifest format expects
aws s3api list-object-versions --bucket "$BUCKET" \
  --query "Versions[?LastModified<=\`2023-05-15T00:00:00Z\`].[Key,VersionId]" \
  --output text | awk -v b="$BUCKET" '!seen[$1]++ {print b","$1","$2}' > manifest.csv
# The manifest must live in S3; upload it and capture its ETag
aws s3 cp manifest.csv "s3://$BUCKET/manifest.csv"
ETAG=$(aws s3api head-object --bucket "$BUCKET" --key manifest.csv \
  --query ETag --output text | tr -d '"')
# Create batch job
aws s3control create-job \
  --account-id YOUR_ACCOUNT_ID \
  --operation '{"S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::'$BUCKET'"}}' \
  --manifest '{"Spec": {"Format": "S3BatchOperations_CSV_20180820","Fields": ["Bucket","Key","VersionId"]},"Location": {"ObjectArn": "arn:aws:s3:::'$BUCKET'/manifest.csv","ETag": "'$ETAG'"}}' \
  --report '{"Bucket": "arn:aws:s3:::'$BUCKET'","Prefix": "batch-reports","Format": "Report_CSV_20180820","Enabled": true,"ReportScope": "AllTasks"}' \
  --priority 10 \
  --role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/batch-operations-role
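Once the job is created, you can track its progress with aws s3control describe-job --account-id YOUR_ACCOUNT_ID --job-id JOB_ID (the JobId is returned by create-job), and review the completion report written under the batch-reports prefix configured above.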