How to Programmatically Determine the Actual Storage Size of AWS EBS Snapshots for Accurate Billing



While AWS bills customers for the storage an EBS snapshot actually consumes (only the blocks changed since the previous snapshot), the AWS console and most API responses report the original volume size. This discrepancy creates challenges for:

  • Cost allocation and billing verification
  • Storage optimization efforts
  • Capacity planning for snapshot archiving

Here are three technical methods to get accurate size data:

# AWS CLI method (using snapshot-ids.txt)
aws ec2 describe-snapshots --snapshot-ids $(cat snapshot-ids.txt) \
--query "Snapshots[*].[SnapshotId,VolumeSize,StartTime]" \
--output table

However, the VolumeSize column is still the provisioned size of the source volume, not the storage actually billed. For true consumption:

# Get snapshot storage metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EBS \
--metric-name SnapshotStorageUsed \
--dimensions Name=SnapshotId,Value=snap-1234567890abcdef0 \
--start-time $(date -u -d "1 day ago" +%F)T00:00:00Z \
--end-time $(date -u +%F)T23:59:59Z \
--period 86400 \
--statistics Average \
--output json

For billing verification across all snapshots in the account:

import boto3
from datetime import datetime, timedelta

client = boto3.client('ce')

response = client.get_cost_and_usage(
    TimePeriod={
        'Start': (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d'),
        'End': datetime.now().strftime('%Y-%m-%d')
    },
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    Filter={
        'Dimensions': {
            'Key': 'USAGE_TYPE',
            'Values': ['EBS:SnapshotUsage']
        }
    }
)
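
The snapshot charge sits a few levels deep in the response; extracting it looks roughly like this (same response object as above):

for period in response['ResultsByTime']:
    cost = period['Total']['UnblendedCost']
    print(period['TimePeriod']['Start'], cost['Amount'], cost['Unit'])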

After measuring actual sizes, consider these storage reduction techniques:

  • Schedule snapshots during low-write periods
  • Use separate volumes for static vs dynamic data
  • Implement snapshot lifecycle policies to archive older snapshots (see the sketch below)
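
For that last bullet, a minimal sketch of a lifecycle policy using Data Lifecycle Manager via boto3 (the role ARN and the Backup=true target tag are placeholder assumptions; substitute your own):

import boto3

dlm = boto3.client('dlm')

# Snapshot every volume tagged Backup=true once a day and keep the
# seven most recent snapshots per volume.
dlm.create_lifecycle_policy(
    ExecutionRoleArn='arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole',
    Description='Daily EBS snapshots, 7-day retention',
    State='ENABLED',
    PolicyDetails={
        'PolicyType': 'EBS_SNAPSHOT_MANAGEMENT',
        'ResourceTypes': ['VOLUME'],
        'TargetTags': [{'Key': 'Backup', 'Value': 'true'}],
        'Schedules': [{
            'Name': 'daily',
            'CreateRule': {'Interval': 24, 'IntervalUnit': 'HOURS', 'Times': ['03:00']},
            'RetainRule': {'Count': 7},
        }],
    },
)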

Remember that EBS snapshots are incremental: each snapshot stores only the blocks that changed since the previous snapshot and references earlier snapshots for the rest. That also means deleting an old snapshot does not necessarily free its full reported size, because later snapshots may still reference its blocks.
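
If you want to verify this block-level behavior yourself, the EBS direct APIs (ListSnapshotBlocks and ListChangedBlocks) expose exactly which blocks a snapshot holds. A minimal sketch, assuming the caller has ebs:ListSnapshotBlocks and ebs:ListChangedBlocks permissions (the helper names here are illustrative):

import boto3

ebs = boto3.client('ebs')

def _pages(op, **kwargs):
    # Follow NextToken until the API runs out of pages
    while True:
        resp = op(**kwargs)
        yield resp
        token = resp.get('NextToken')
        if not token:
            return
        kwargs['NextToken'] = token

def snapshot_data_bytes(snapshot_id):
    # Non-empty bytes visible in a snapshot; for the first snapshot in a
    # lineage this approximates the storage it actually holds.
    return sum(len(p.get('Blocks', [])) * p['BlockSize']
               for p in _pages(ebs.list_snapshot_blocks, SnapshotId=snapshot_id))

def incremental_bytes(first_snapshot_id, second_snapshot_id):
    # Bytes that differ between two snapshots of the same volume -
    # roughly what the newer snapshot adds on top of the older one.
    return sum(len(p.get('ChangedBlocks', [])) * p['BlockSize']
               for p in _pages(ebs.list_changed_blocks,
                               FirstSnapshotId=first_snapshot_id,
                               SecondSnapshotId=second_snapshot_id))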


While AWS documentation clearly states that "you're only billed for changed blocks" in EBS snapshots, the actual storage consumption remains frustratingly opaque. The standard AWS CLI command aws ec2 describe-snapshots only shows the original volume size, not the incremental storage being billed.

Many engineers try to estimate size through:

aws ec2 describe-snapshots --snapshot-ids snap-1234567890abcdef0

But the output's VolumeSize field refers to the source volume, not the snapshot's actual storage footprint.

Here are three reliable methods to uncover the true size:

Method 1: CloudWatch Metrics

The AWS/EBS namespace contains the SnapshotStorageUsed metric:

aws cloudwatch get-metric-statistics \
--namespace AWS/EBS \
--metric-name SnapshotStorageUsed \
--dimensions Name=SnapshotId,Value=snap-1234567890abcdef0 \
--statistics Average \
--period 3600 \
--start-time $(date -u -d "1 day ago" +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ)

Method 2: Cost Explorer API

Filter on the EBS snapshot usage type in Cost Explorer (this aggregates all snapshot usage; the filter works on usage types, not individual snapshot IDs):

aws ce get-cost-and-usage \
--time-period Start=2023-01-01,End=2023-01-31 \
--granularity MONTHLY \
--metrics "UsageQuantity" \
--filter '{"Dimensions": {"Key": "USAGE_TYPE", "Values": ["EBS:SnapshotUsage"]}}'
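
Note that usage type values are typically region-prefixed outside us-east-1 (for example USW2-EBS:SnapshotUsage in us-west-2), so the plain EBS:SnapshotUsage filter may return nothing. A quick way to discover the exact values present in your account, sketched with boto3's GetDimensionValues call:

import boto3

ce = boto3.client('ce')

# List every usage type containing 'SnapshotUsage' billed in the period
values = ce.get_dimension_values(
    TimePeriod={'Start': '2023-01-01', 'End': '2023-01-31'},
    Dimension='USAGE_TYPE',
    SearchString='SnapshotUsage',
)
for v in values['DimensionValues']:
    print(v['Value'])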

Method 3: Lambda-based Audit System

Create a scheduled Lambda function that pulls the metric for every snapshot you own:

import boto3
from datetime import datetime, timedelta

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    cloudwatch = boto3.client('cloudwatch')

    # Paginate so accounts with many snapshots are fully covered
    paginator = ec2.get_paginator('describe_snapshots')
    for page in paginator.paginate(OwnerIds=['self']):
        for snap in page['Snapshots']:
            response = cloudwatch.get_metric_statistics(
                Namespace='AWS/EBS',
                MetricName='SnapshotStorageUsed',
                Dimensions=[{'Name': 'SnapshotId', 'Value': snap['SnapshotId']}],
                StartTime=datetime.utcnow() - timedelta(days=1),
                EndTime=datetime.utcnow(),
                Period=3600,
                Statistics=['Average']
            )
            # Store results in DynamoDB for historical tracking
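
To close the loop on that final comment, a minimal persistence sketch; the snapshot-size-history table, its SnapshotId/Date key schema, and the helper name are assumptions for illustration:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('snapshot-size-history')  # hypothetical table

def store_datapoints(snapshot_id, response):
    # One item per CloudWatch datapoint, keyed by snapshot and timestamp
    for point in response['Datapoints']:
        table.put_item(Item={
            'SnapshotId': snapshot_id,
            'Date': point['Timestamp'].isoformat(),
            # Stored as a string; DynamoDB rejects Python floats
            'AverageBytes': str(point['Average']),
        })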

Armed with actual size data, consider these optimization strategies:

  • Schedule snapshots during low-change periods
  • Use fsfreeze on Linux instances before snapshotting
  • Implement tiered retention policies (see the sketch after this list)
  • Consider alternative backup solutions for highly volatile data
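
For the tiered-retention bullet, snapshots that must be kept but are rarely restored can be moved to the cheaper archive tier. A minimal sketch, assuming the archive tier's restrictions are acceptable for your snapshots, with an arbitrary 90-day cutoff:

import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client('ec2')
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

paginator = ec2.get_paginator('describe_snapshots')
for page in paginator.paginate(OwnerIds=['self']):
    for snap in page['Snapshots']:
        if snap['StartTime'] < cutoff:
            # Move cold snapshots to the archive storage tier
            ec2.modify_snapshot_tier(SnapshotId=snap['SnapshotId'],
                                     StorageTier='archive')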