How to Configure Cache-Control Headers for AWS CloudFront-S3 Origin Files


After migrating to AWS infrastructure, I discovered that responses for static files served through CloudFront from the S3 origin carried no Cache-Control headers. This became a problem when I tried to implement a proper browser caching strategy.

The setup consists of:

  • EC2 instance running Nginx reverse proxy
  • Apache backend serving dynamic content
  • S3 bucket for static assets
  • CloudFront CDN distributing both EC2 and S3 content

The key observation: requests for static files bypass the EC2 entirely, going directly through CloudFront to S3.

There are actually three ways to implement Cache-Control headers in this scenario:

Option 1: S3 Object Metadata

You can set Cache-Control headers at the S3 object level. For bulk operations, use the AWS CLI:

aws s3 cp s3://your-bucket/ s3://your-bucket/ --recursive \
--metadata-directive REPLACE \
--cache-control "public, max-age=31536000" \
--exclude "*" \
--include "*.css" \
--include "*.js" \
--include "*.png" \
--include "*.jpg" \
--include "*.jpeg" \
--include "*.gif" \
--include "*.webp"
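The `--exclude "*"` followed by several `--include` flags can look contradictory at first. As far as I understand the CLI's filter rules, filters are evaluated in order and the last one that matches a key wins. The sketch below imitates that rule with Python's `fnmatch` so you can preview which keys would be touched; `is_selected` and `FILTERS` are illustrative names, not part of the AWS CLI.

```python
from fnmatch import fnmatchcase


def is_selected(key, filters):
    """Return True if `key` would be copied under ordered (kind, pattern) filters."""
    selected = True  # with no filters, every key is included
    for kind, pattern in filters:
        if fnmatchcase(key, pattern):
            # A later matching filter overrides an earlier one.
            selected = (kind == "include")
    return selected


# Mirrors a subset of the flags in the command above (assumed ordering).
FILTERS = [
    ("exclude", "*"),
    ("include", "*.css"),
    ("include", "*.js"),
    ("include", "*.png"),
]

if __name__ == "__main__":
    for key in ["css/site.css", "js/app.js", "index.html"]:
        print(key, is_selected(key, FILTERS))
```

Running the command with `--dryrun` first is the authoritative check; this sketch only helps reason about the ordering.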

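Option 2: CloudFront Behavior Settings

CloudFront passes through whatever Cache-Control header the origin returns. To add or override the header at the CDN, attach a response headers policy to the behavior:

  1. Go to CloudFront distribution → Behaviors tab
  2. Edit the behavior that matches your static files
  3. Under "Response headers policy", select or create a policy
  4. Add a custom header named "Cache-Control" with the desired value, and enable origin override if it should replace a value coming from S3
  5. Set the minimum, default, and maximum TTL (for example 0 to 31536000 seconds) to control how long CloudFront itself keeps objects

Note that the legacy "Cache Based on Selected Request Headers" whitelist only decides which request headers are forwarded to the origin and included in the cache key. Adding "Cache-Control" there does not add the header to responses.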

Option 3: Lambda@Edge

For advanced control, use Lambda@Edge to modify headers:

// Attach this function to the origin-response (or viewer-response) event.
exports.handler = async (event) => {
    const response = event.Records[0].cf.response;

    // Header names are lowercase keys mapping to arrays of { key, value }.
    response.headers['cache-control'] = [{
        key: 'Cache-Control',
        value: 'public, max-age=31536000, immutable'
    }];

    return response;
};

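For most use cases, Option 2 (CloudFront Behavior) provides the best balance of performance and maintainability. The Lambda@Edge approach adds per-invocation cost and can add roughly 50-100 ms to each request it runs on; attaching it to the origin-response event limits that overhead to cache misses.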

After implementation, verify with curl:

curl -I https://your-distribution.cloudfront.net/path/to/file.js

Or programmatically check headers using any HTTP client library.
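As one way to do the programmatic check, here is a minimal sketch using only the Python standard library. The function names are my own, and the URL you pass in is whatever your distribution serves; nothing here is specific to CloudFront.

```python
import urllib.request


def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def fetch_cache_control(url):
    """Send a HEAD request and return the parsed Cache-Control header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        # A missing header yields an empty dict rather than an error.
        return parse_cache_control(response.headers.get("Cache-Control", ""))
```

Calling `fetch_cache_control("https://your-distribution.cloudfront.net/path/to/file.js")` should return an empty dict while the header is missing, and a dict containing `max-age` once one of the options above is in place.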

Additional recommendations:

  • Consider varying cache durations by file type
  • Implement versioning in filenames for immutable assets
  • Use CloudFront invalidation when updating critical files
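Filename versioning is what makes a one-year max-age safe: when the content changes, the URL changes with it, so a stale cached copy is never requested. A minimal sketch of one common scheme (a short content hash inserted before the extension) follows; `versioned_key` and the 8-character digest length are assumptions of mine, not something the setup above prescribes.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_key(key, content, digest_length=8):
    """Insert a short content hash before the file extension of an S3 key."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(key)
    # "js/app.js" becomes "js/app.<digest>.js"; the directory part is kept.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    print(versioned_key("js/app.js", b"console.log('v1');"))
    print(versioned_key("js/app.js", b"console.log('v2');"))
```

Most build tools can generate names like this for you; the point of the sketch is only that the key is derived from the bytes, so the HTML that references it must be regenerated on each deploy and should itself use a short cache lifetime.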

When migrating static assets to AWS S3 with CloudFront CDN, many developers encounter a surprising behavior: their carefully configured caching strategies seem to disappear. The files serve perfectly, but browser caching doesn't work as expected because the Cache-Control headers are missing from responses.

In this setup, requests for static files bypass your EC2 infrastructure completely. The flow looks like this:

Client → CloudFront CDN → S3 Bucket → Client

This means any caching headers set in your Nginx/Apache configuration won't affect these static assets.

Option 1: Setting Headers at S3 Origin

The most reliable method is configuring Cache-Control headers at the S3 object level. For new uploads, you can specify headers during the PUT operation:

aws s3 cp localfile.txt s3://my-bucket/ \
  --cache-control "max-age=31536000, public"

For existing objects, copy each object onto itself with replaced metadata:

aws s3 cp s3://my-bucket/file.txt s3://my-bucket/file.txt \
  --metadata-directive REPLACE \
  --cache-control "max-age=31536000, public"

Option 2: Using CloudFront Behaviors

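A cache policy on the behavior controls how long CloudFront itself keeps objects at the edge. It does not rewrite the Cache-Control header sent to browsers; to change that header at the CDN, attach a response headers policy to the same behavior.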

  1. Navigate to CloudFront distributions
  2. Select your distribution
  3. Go to "Behaviors" tab
  4. Create or edit a behavior
  5. Under "Cache key and origin requests", select or create a cache policy with your desired TTL
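If the goal is to change the header the browser receives rather than the edge TTL, a response headers policy with a custom header is one option. The sketch below builds such a policy with boto3's `create_response_headers_policy`; the policy name, comment text, and helper function names are placeholders I chose, and the policy still has to be attached to a behavior after it is created.

```python
def build_policy_config(name, cache_control):
    """Build a response headers policy config that sets one custom header."""
    return {
        "Name": name,
        "Comment": "Adds Cache-Control to responses for static assets",
        "CustomHeadersConfig": {
            "Quantity": 1,
            "Items": [
                {
                    "Header": "Cache-Control",
                    "Value": cache_control,
                    # True replaces any Cache-Control the origin already sent.
                    "Override": True,
                }
            ],
        },
    }


def create_policy(name, cache_control):
    """Create the policy and return its ID (requires AWS credentials)."""
    import boto3  # imported here so the config builder works without boto3

    cloudfront = boto3.client("cloudfront")
    result = cloudfront.create_response_headers_policy(
        ResponseHeadersPolicyConfig=build_policy_config(name, cache_control)
    )
    return result["ResponseHeadersPolicy"]["Id"]
```

For example, `create_policy("static-assets-cache", "public, max-age=31536000")` would return an ID that can then be selected under the behavior's response headers policy setting.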

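For those with thousands of existing S3 objects, manual updates aren't practical. Here's a Python script using boto3 to update headers in bulk. Because MetadataDirective='REPLACE' discards the existing metadata, the script reads each object's Content-Type and user metadata first and passes them back:

import boto3

BUCKET = 'your-bucket-name'

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get('Contents', []):
        key = obj['Key']
        # Read the current metadata so REPLACE does not reset it.
        head = s3.head_object(Bucket=BUCKET, Key=key)
        # Copy the object onto itself; copy_object handles objects up to 5 GB.
        s3.copy_object(
            Bucket=BUCKET,
            Key=key,
            CopySource={'Bucket': BUCKET, 'Key': key},
            MetadataDirective='REPLACE',
            CacheControl='max-age=31536000, public',
            ContentType=head['ContentType'],
            Metadata=head['Metadata'],
        )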

After making changes, verify headers are being served correctly:

curl -I https://your-distribution.cloudfront.net/path/to/asset.js

Look for the Cache-Control header in the response. If using CloudFront behaviors, remember that changes may take several minutes to propagate.

For complex caching requirements, consider:

  • Setting different TTLs for different file types
  • Implementing versioning in filenames for cache busting
  • Using Lambda@Edge for dynamic header manipulation
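To illustrate the first and last points together, here is a sketch of a Lambda@Edge handler written in Python that picks a Cache-Control value from the file extension of the requested URI. It assumes the function is attached to the origin-response event; the extension table, the durations, and the default value are illustrative choices, not recommendations from the setup above.

```python
import os

# Hypothetical policy table; extensions and durations are illustrative.
CACHE_CONTROL_BY_EXTENSION = {
    ".css": "public, max-age=31536000, immutable",
    ".js": "public, max-age=31536000, immutable",
    ".png": "public, max-age=2592000",
    ".html": "public, max-age=300",
}
DEFAULT_CACHE_CONTROL = "public, max-age=3600"


def lambda_handler(event, context):
    """Origin-response handler: set Cache-Control based on the request URI."""
    cf = event["Records"][0]["cf"]
    response = cf["response"]
    # The extension is lowercased so "/app.JS" and "/app.js" are treated alike.
    extension = os.path.splitext(cf["request"]["uri"])[1].lower()
    value = CACHE_CONTROL_BY_EXTENSION.get(extension, DEFAULT_CACHE_CONTROL)
    response["headers"]["cache-control"] = [
        {"key": "Cache-Control", "value": value}
    ]
    return response
```

Long lifetimes with `immutable` only make sense for assets whose filenames change with their content, which is why the HTML entry in this sketch is kept short.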