How to Batch Delete S3 Objects with Specific Prefix Using AWS CLI and SDKs

When dealing with Amazon S3 buckets containing hundreds of thousands of objects, performing targeted deletions can be challenging. The standard S3 console interface isn't designed for bulk operations on specific prefixes, especially when you need to delete a large subset of objects matching a particular pattern.

S3 provides multiple ways to delete objects; the first two are sketched right after this list:

  1. Single object deletion (inefficient for bulk operations)
  2. Multi-object deletion (max 1000 objects per request)
  3. Batch operations (most efficient for large-scale deletions)
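
As a quick illustration of the first two options, here is a minimal boto3 sketch; the bucket and key names are placeholders, and the batched call is what the rest of this article builds on:

import boto3

s3 = boto3.client('s3')

# Option 1: one request per object (simple, but slow for bulk work)
s3.delete_object(Bucket='your-bucket-name', Key='abc_1/example-0001.csv')

# Option 2: up to 1,000 keys removed in a single request
s3.delete_objects(
    Bucket='your-bucket-name',
    Delete={
        'Objects': [
            {'Key': 'abc_1/example-0002.csv'},
            {'Key': 'abc_1/example-0003.csv'},
        ],
        'Quiet': True,  # only failures are reported in the response
    },
)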

The most straightforward approach is the high-level AWS CLI rm command (add --dryrun first to preview what would be removed):

aws s3 rm s3://your-bucket-name/abc_1 --recursive

For more control and verification:

# First list objects to verify
aws s3api list-objects-v2 --bucket your-bucket-name --prefix abc_1 --output json

# Then perform deletion
aws s3api delete-objects --bucket your-bucket-name --delete "$(aws s3api list-objects-v2 --bucket your-bucket-name --prefix abc_1 --output json --query '{Objects: Contents[].{Key: Key}}')"

Note that delete-objects accepts at most 1,000 keys per request, so this one-liner only works when the prefix matches 1,000 or fewer objects, and it fails if the listing returns nothing.

For programmatic control or integration with other systems:

import boto3

def delete_s3_prefix(bucket_name, prefix):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    
    # List and delete in batches of 1000
    delete_us = {'Objects': []}
    for obj in bucket.objects.filter(Prefix=prefix):
        delete_us['Objects'].append({'Key': obj.key})
        
        # S3 API limits deletions to 1000 objects per request
        if len(delete_us['Objects']) >= 1000:
            bucket.delete_objects(Delete=delete_us)
            delete_us = {'Objects': []}
    
    # Delete remaining objects
    if len(delete_us['Objects']):
        bucket.delete_objects(Delete=delete_us)

# Usage
delete_s3_prefix('your-bucket-name', 'abc_1')

For extremely large buckets (millions of objects), consider:

  • S3 Batch Operations - Create a manifest file and use S3's batch processing
  • Step Functions - Break the operation into smaller chunks
  • Lambda - Use parallel execution with multiple Lambda functions (a thread-based sketch of the same idea follows this list)
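
The parallel-execution idea can be prototyped locally before committing to Step Functions or Lambda. The sketch below is only that, a sketch: it pages through the listing in chunks of up to 1,000 keys and hands each chunk's delete_objects call to a thread pool; the bucket name, prefix, and worker count are placeholders:

import boto3
from concurrent.futures import ThreadPoolExecutor

def parallel_delete(bucket_name, prefix, workers=8):
    # boto3 clients (unlike resources) can be shared across threads
    s3 = boto3.client('s3')

    def delete_page(objects):
        # One API call per batch of up to 1,000 keys
        s3.delete_objects(Bucket=bucket_name,
                          Delete={'Objects': objects, 'Quiet': True})
        return len(objects)

    paginator = s3.get_paginator('list_objects_v2')
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
            objects = [{'Key': obj['Key']} for obj in page.get('Contents', [])]
            if objects:
                futures.append(pool.submit(delete_page, objects))
        deleted = sum(f.result() for f in futures)
    print(f"Deleted {deleted} objects under prefix '{prefix}'")

# Hypothetical usage
parallel_delete('your-bucket-name', 'abc_1')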

Before running any deletion script:

  1. Enable versioning or create a backup
  2. Start with a dry-run to verify the objects that would be deleted (a small listing sketch follows this list)
  3. Monitor AWS costs (DELETE requests themselves are free, but the LIST requests used to enumerate objects are billed, and some storage classes charge early-deletion fees)
  4. Consider S3 lifecycle policies for future automation
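
For the dry-run step, something as small as the sketch below is enough; it only lists, never deletes, and the bucket and prefix names are placeholders:

import boto3

def preview_prefix(bucket_name, prefix):
    """Show what a deletion of this prefix would touch, without deleting anything."""
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    count = total_bytes = 0
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):
            print(obj['Key'])
            count += 1
            total_bytes += obj['Size']
    print(f"{count} objects, {total_bytes} bytes would be deleted")

# Hypothetical usage
preview_prefix('your-bucket-name', 'abc_1')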

Always implement proper error handling in your scripts:

try:
    response = bucket.delete_objects(Delete=delete_us)
    if 'Errors' in response:
        for error in response['Errors']:
            print(f"Failed to delete {error['Key']}: {error['Message']}")
except Exception as e:
    print(f"Error during deletion: {str(e)}")

When working with large AWS S3 buckets containing hundreds of thousands of objects, selectively deleting files with specific prefixes becomes non-trivial. The standard AWS Console interface doesn't provide efficient bulk operations for this scenario.

For programmatic deletion of objects matching a prefix pattern, we have several effective methods:

1. AWS CLI Pipeline

A simple approach for ad hoc use: list the matching keys and pipe each one to an individual delete-object call. Keep in mind that this issues one API request per object, so it is the slowest of the methods shown here:

aws s3api list-objects-v2 --bucket YOUR_BUCKET --prefix "abc_1" --output text --query "Contents[].{Key:Key}" | \
xargs -I {} aws s3api delete-object --bucket YOUR_BUCKET --key {}

2. Python Boto3 Implementation

For programmatic control with per-object logging:

import boto3

def delete_s3_prefix(bucket_name, prefix):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    
    # Delete objects one at a time, logging each key
    # (simple, but slower than batched delete_objects calls)
    for obj in bucket.objects.filter(Prefix=prefix):
        obj.delete()
        print(f"Deleted: {obj.key}")

# Example usage:
delete_s3_prefix('your-bucket-name', 'abc_1')
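
If you do not need per-object logging, boto3's object collections also expose a batch delete action that issues multi-object delete requests in chunks of 1,000 behind the scenes; a minimal sketch with a placeholder bucket name:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('your-bucket-name')

# delete() is a collection batch action: boto3 pages through the listing
# and sends delete_objects requests for you
bucket.objects.filter(Prefix='abc_1').delete()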

3. Lambda Function for Large-Scale Deletion

For extremely large buckets (millions of objects):

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    
    for page in paginator.paginate(Bucket='your-bucket', Prefix='abc_1'):
        if 'Contents' in page:
            objects = [{'Key': obj['Key']} for obj in page['Contents']]
            s3.delete_objects(
                Bucket='your-bucket',
                Delete={'Objects': objects}
            )
            print(f"Deleted batch of {len(objects)} objects")

A few points worth keeping in mind:

  • Batch operations are significantly faster than individual deletions
  • For 300,000+ objects, consider parallel processing
  • Monitor AWS API rate limits (S3 supports roughly 3,500 PUT/COPY/POST/DELETE requests per second per prefix); a retry-configuration sketch follows this list
  • Use S3 Inventory to analyze objects before deletion
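
To stay under those rate limits, the S3 client can be told to back off automatically when it is throttled; a short sketch using botocore's retry configuration (the retry count is illustrative):

import boto3
from botocore.config import Config

# Retry throttled calls (e.g. SlowDown errors) with adaptive client-side rate limiting
s3 = boto3.client('s3', config=Config(retries={'max_attempts': 10, 'mode': 'adaptive'}))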

Always implement proper error handling:

try:
    response = s3.delete_objects(
        Bucket=bucket_name,
        Delete={
            'Objects': objects_to_delete,
            'Quiet': True
        }
    )
    if 'Errors' in response:
        for error in response['Errors']:
            print(f"Failed to delete {error['Key']}: {error['Message']}")
except Exception as e:
    print(f"Error during deletion: {str(e)}")

For truly massive datasets, create a manifest file and use S3 Batch Operations. Note that S3 Batch Operations has no built-in "delete object" operation, so the usual pattern is to point the job at a Lambda function that deletes each object listed in the manifest (or to tag the objects and let a lifecycle expiration rule remove them):

aws s3control create-job \
    --account-id YOUR_ACCOUNT_ID \
    --operation '{"LambdaInvoke": {"FunctionArn": "arn:aws:lambda:YOUR_REGION:YOUR_ACCOUNT_ID:function:YOUR_DELETE_FUNCTION"}}' \
    --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::your-bucket/manifest.csv","ETag":"your-etag"}}' \
    --report '{"Bucket":"arn:aws:s3:::your-report-bucket","Prefix":"reports","Format":"Report_CSV_20180820","Enabled":true}' \
    --priority 10 \
    --role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/S3BatchOperationsRole
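
The manifest.csv referenced above can be generated from a prefix listing; a minimal sketch in which the bucket names and manifest key are placeholders:

import csv
import io
import boto3

def write_manifest(source_bucket, prefix, manifest_bucket, manifest_key):
    """Write a Bucket,Key CSV manifest of every object under the prefix."""
    s3 = boto3.client('s3')
    buf = io.StringIO()
    writer = csv.writer(buf)
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            writer.writerow([source_bucket, obj['Key']])
    s3.put_object(Bucket=manifest_bucket, Key=manifest_key,
                  Body=buf.getvalue().encode('utf-8'))

# Hypothetical usage
write_manifest('your-bucket-name', 'abc_1', 'your-bucket', 'manifest.csv')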