Automating AMI Rotation in AWS Auto Scaling Groups with Zero Downtime

When managing production infrastructure with AWS Auto Scaling Groups (ASGs), a common pain point is updating the underlying Amazon Machine Images (AMIs) while maintaining availability. Manually scaling the group up and down to cycle instances onto a new AMI works, but it adds operational overhead and can open windows of downtime.

Here are proven approaches to automate AMI rotation:

# CloudFormation example using UpdatePolicy
"MyASG": {
  "Type": "AWS::AutoScaling::AutoScalingGroup",
  "UpdatePolicy": {
    "AutoScalingRollingUpdate": {
      "MaxBatchSize": "2",
      "MinInstancesInService": "1",
      "PauseTime": "PT5M",
      "WaitOnResourceSignals": "true"
    }
  }
}
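
A rolling update like this starts when the stack's launch template or launch configuration changes, for example when a new AMI ID is passed in as a stack parameter. The sketch below shows one way that might look with boto3. The parameter name `AmiId` and the helper names are assumptions for illustration, and the CloudFormation client is passed in as an argument so the logic can be exercised without AWS access.

```python
def build_parameters(current_keys, ami_key, ami_id):
    """Keep every existing stack parameter and override only the AMI one."""
    params = []
    for key in current_keys:
        if key == ami_key:
            params.append({"ParameterKey": key, "ParameterValue": ami_id})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def roll_out_ami(cfn, stack_name, ami_id, ami_key="AmiId"):
    """Update the stack with a new AMI; the UpdatePolicy then drives the rollout.

    `cfn` is expected to be a boto3 client, e.g. boto3.client("cloudformation").
    `ami_key` is a hypothetical parameter name; use whatever your template defines.
    """
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    keys = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    if ami_key not in keys:
        raise ValueError(f"stack {stack_name} has no parameter named {ami_key}")
    response = cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,  # only the parameter value changes
        Parameters=build_parameters(keys, ami_key, ami_id),
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    return response["StackId"]
```

A call might look like `roll_out_ami(boto3.client("cloudformation"), "web-app-stack", "ami-12345678")`, where the stack name is a placeholder.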

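For critical production systems, consider creating a parallel ASG with the new AMI (a blue/green deployment):

Create a new launch template with the updated AMI
Stand up a new ASG behind the same load balancer, registered to its own target group
Gradually shift traffic using weighted target groups on an Application Load Balancer (Classic Load Balancers do not support weights; Route 53 weighted records are an alternative)
Decommission the old ASG after validation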
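
The traffic-shifting step can be sketched as follows, assuming an Application Load Balancer whose listener forwards to two target groups, one per ASG. The function names, step size, and pause length are illustrative assumptions; `modify_listener` with a weighted `ForwardConfig` is the underlying boto3 call.

```python
import time


def weight_steps(step=25):
    """Yield (old_weight, new_weight) pairs that move traffic to the new group."""
    if not 0 < step <= 100:
        raise ValueError("step must be between 1 and 100")
    new = 0
    while new < 100:
        new = min(100, new + step)
        yield 100 - new, new


def shift_traffic(elbv2, listener_arn, old_tg_arn, new_tg_arn,
                  step=25, pause_seconds=300, sleep=time.sleep):
    """Gradually move a listener's default forward action to the new target group.

    `elbv2` is expected to be a boto3 client, e.g. boto3.client("elbv2").
    """
    for old_weight, new_weight in weight_steps(step):
        elbv2.modify_listener(
            ListenerArn=listener_arn,
            DefaultActions=[{
                "Type": "forward",
                "ForwardConfig": {
                    "TargetGroups": [
                        {"TargetGroupArn": old_tg_arn, "Weight": old_weight},
                        {"TargetGroupArn": new_tg_arn, "Weight": new_weight},
                    ]
                },
            }],
        )
        if new_weight < 100:
            sleep(pause_seconds)  # validation window before the next step
```

In practice the pause between steps is where you would check alarms or error rates before continuing, and abort by setting the weights back to 100/0.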

SSM Automation documents can orchestrate parts of the process. The AWS-managed AWS-UpdateLinuxAmi runbook, for example, launches an instance from a source AMI, patches it, and creates a new image:

aws ssm start-automation-execution \
  --document-name "AWS-UpdateLinuxAmi" \
  --parameters "AutomationAssumeRole=arn:aws:iam::123456789012:role/AutomationServiceRole,SourceAmiId=ami-12345678,IamInstanceProfileName=MyInstanceProfile,TargetAmiName=web-app-{{timestamp}}"
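
An automation execution runs asynchronously, so a script that needs the result has to poll for it. The following is a minimal sketch of starting a runbook and waiting for a terminal status with boto3. The function name, polling interval, and the set of statuses treated as terminal are assumptions; approval-based workflows use additional statuses that are not handled here.

```python
import time


def run_automation(ssm, document_name, parameters,
                   poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Start an SSM Automation execution and wait for it to finish.

    `ssm` is expected to be a boto3 client, e.g. boto3.client("ssm").
    `parameters` maps parameter names to lists of strings, as the API requires.
    Returns (status, outputs).
    """
    execution_id = ssm.start_automation_execution(
        DocumentName=document_name,
        Parameters=parameters,
    )["AutomationExecutionId"]

    terminal = {"Success", "Failed", "TimedOut", "Cancelled"}
    for _ in range(max_polls):
        execution = ssm.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in terminal:
            return status, execution.get("Outputs", {})
        sleep(poll_seconds)
    raise TimeoutError(f"automation {execution_id} did not finish in time")
```

The returned outputs can then feed the next step, such as creating a launch template version from the newly built image.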

For GitOps workflows, integrate AMI updates into your CI/CD pipeline:

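// Sample Jenkins pipeline stage; assumes the AWS CLI is installed on the agent
stage('Update ASG') {
  steps {
    script {
      // Create a new launch template version from version 1 with the new AMI
      def newVersion = sh(
        script: """aws ec2 create-launch-template-version \\
          --launch-template-id lt-0123456789abcdef \\
          --source-version 1 \\
          --launch-template-data '{"ImageId":"${params.AMI_ID}"}' \\
          --query 'LaunchTemplateVersion.VersionNumber' --output text""",
        returnStdout: true
      ).trim()
      // Point the ASG at the new version; existing instances are only
      // replaced once they are cycled, for example by an instance refresh
      sh """aws autoscaling update-auto-scaling-group \\
        --auto-scaling-group-name web-app-asg \\
        --launch-template LaunchTemplateId=lt-0123456789abcdef,Version=${newVersion}"""
    }
  }
}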

Always test new AMIs in staging first
Monitor health checks during rotation
Consider canary deployments for major changes
Implement proper rollback procedures

The sections below look at three ways to automate AMI rotation in your Auto Scaling Groups (ASGs) in more detail:

1. Using AWS Systems Manager (SSM) Automation

This native AWS solution provides the most integrated approach. Create an SSM Automation document that:

Creates a new launch template version with the updated AMI
Gradually replaces instances using rolling updates
Verifies health checks before proceeding


# Sample AWS CLI command to start the automation.
# Replace the document name and parameters with those of the custom document you created.
aws ssm start-automation-execution \
  --document-name "AWS-UpdateAutoScalingGroup" \
  --parameters '{
    "AutoScalingGroupName":["your-asg-name"],
    "LaunchTemplateName":["your-launch-template"],
    "LaunchTemplateVersion":["$LATEST"],
    "MinHealthyPercentage":["90"],
    "WaitOnResourceSignals":["false"]
  }'
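
The "verifies health checks before proceeding" step above could be implemented along these lines, for instance in a script step of the automation document. This is a sketch under the assumption that Auto Scaling's own view of instance health is a good enough gate; the function name is hypothetical.

```python
def check_group_health(autoscaling, asg_name):
    """Report whether the group is at capacity with every instance healthy.

    `autoscaling` is expected to be a boto3 client, e.g. boto3.client("autoscaling").
    Returns (ok, unhealthy_instance_ids).
    """
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"][0]

    # An instance counts as healthy only when it is in service and passing checks
    unhealthy = [
        inst["InstanceId"]
        for inst in group["Instances"]
        if inst["LifecycleState"] != "InService" or inst["HealthStatus"] != "Healthy"
    ]
    at_capacity = len(group["Instances"]) >= group["DesiredCapacity"]
    return at_capacity and not unhealthy, unhealthy
```

If the group uses ELB health checks, `HealthStatus` already reflects the load balancer's verdict after the health check grace period.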

2. AWS CodePipeline Integration

For CI/CD pipelines, you can trigger AMI updates through CodePipeline:


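# CloudFormation snippet for Pipeline configuration
Resources:
  AMIUpdatePipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      RoleArn: !GetAtt PipelineRole.Arn   # IAM role defined elsewhere in the template
      ArtifactStore:
        Type: S3
        Location: your-artifact-bucket
      Stages:
        - Name: Source
          Actions:
            - Name: SourceAction
              ActionTypeId:
                Category: Source
                Owner: AWS
                Provider: CodeCommit
                Version: '1'
              Configuration:
                RepositoryName: your-repo
                BranchName: main
              OutputArtifacts:
                - Name: SourceOutput
        - Name: Build
          Actions:
            - Name: BuildAMIAction
              ActionTypeId:
                Category: Build
                Owner: AWS
                Provider: CodeBuild
                Version: '1'
              Configuration:
                ProjectName: your-build-project
              InputArtifacts:
                - Name: SourceOutput
        - Name: Deploy
          Actions:
            # "AutoScaling" is not a documented CodePipeline action provider, so this
            # sketch invokes a Lambda function (placeholder name) that updates the ASG
            - Name: UpdateASG
              ActionTypeId:
                Category: Invoke
                Owner: AWS
                Provider: Lambda
                Version: '1'
              Configuration:
                FunctionName: your-asg-update-function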
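
One option for the final stage is a Lambda function invoked by the pipeline. Such a function has to report its result back to CodePipeline, otherwise the action waits until it times out. The sketch below assumes the ASG name is passed as JSON in the action's `UserParameters` under a hypothetical `asg_name` key, and that the group's launch template already resolves to the new AMI (for example via `$Latest`).

```python
import json


def handle_pipeline_job(event, autoscaling, codepipeline):
    """Start an instance refresh for a CodePipeline job and report the outcome.

    `autoscaling` and `codepipeline` are expected to be boto3 clients.
    Returns the instance refresh ID, or None if the job was marked as failed.
    """
    job = event["CodePipeline.job"]
    job_id = job["id"]
    try:
        config = job["data"]["actionConfiguration"]["configuration"]
        params = json.loads(config["UserParameters"])  # assumed JSON payload
        refresh = autoscaling.start_instance_refresh(
            AutoScalingGroupName=params["asg_name"],
            Preferences={"MinHealthyPercentage": 90},
        )
        codepipeline.put_job_success_result(jobId=job_id)
        return refresh["InstanceRefreshId"]
    except Exception as exc:
        # Tell CodePipeline the action failed so the stage stops here
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(exc)},
        )
        return None
```

The actual Lambda entry point would be a thin wrapper such as `lambda_handler(event, context)` that creates the two clients with `boto3.client(...)` and calls this function.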

3. Custom Lambda Function Solution

For maximum control, implement a Lambda function triggered by CloudWatch Events (now Amazon EventBridge):


import boto3

def lambda_handler(event, context):
    autoscaling = boto3.client('autoscaling')
    ec2 = boto3.client('ec2')
    
    # Get current ASG configuration
    asg = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=['your-asg-name']
    )['AutoScalingGroups'][0]
    
    # Create new launch template version with updated AMI
    new_launch_template = ec2.create_launch_template_version(
        LaunchTemplateName='your-template',
        SourceVersion='$LATEST',
        LaunchTemplateData={
            'ImageId': 'ami-1234567890abcdef0'
        }
    )
    
    # Update ASG with new launch template
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName='your-asg-name',
        LaunchTemplate={
            'LaunchTemplateName': 'your-template',
            'Version': str(new_launch_template['LaunchTemplateVersion']['VersionNumber'])
        },
        MinSize=asg['MinSize'],
        MaxSize=asg['MaxSize'],
        DesiredCapacity=asg['DesiredCapacity']
    )
    
    # Implement instance refresh
    refresh = autoscaling.start_instance_refresh(
        AutoScalingGroupName='your-asg-name',
        Preferences={
            'MinHealthyPercentage': 90,
            'InstanceWarmup': 300
        }
    )
    
    return {
        'statusCode': 200,
        'body': f"Instance refresh initiated: {refresh['InstanceRefreshId']}"
    }
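
The handler above returns as soon as the refresh is started. To know whether the rotation actually succeeded, something has to track the refresh afterwards. Below is a minimal polling sketch; the function name and timing defaults are assumptions. Because a refresh can outlast Lambda's maximum execution time, a loop like this is better suited to a CI job or a separate scheduled check.

```python
import time


def wait_for_refresh(autoscaling, asg_name, refresh_id,
                     poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll an instance refresh until it reaches a final status and return it.

    `autoscaling` is expected to be a boto3 client, e.g. boto3.client("autoscaling").
    """
    final = {"Successful", "Failed", "Cancelled",
             "RollbackSuccessful", "RollbackFailed"}
    for _ in range(max_polls):
        refresh = autoscaling.describe_instance_refreshes(
            AutoScalingGroupName=asg_name,
            InstanceRefreshIds=[refresh_id],
        )["InstanceRefreshes"][0]
        if refresh["Status"] in final:
            return refresh["Status"]
        sleep(poll_seconds)
    raise TimeoutError(f"instance refresh {refresh_id} still running")
```

A caller would treat anything other than `Successful` as a signal to investigate or roll back.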

Always test new AMIs in a staging environment first
Implement health checks that accurately reflect application state
Use canary deployments when possible (gradual rollout)
Monitor CloudWatch metrics during rotation
Set appropriate instance warm-up times
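
A gradual, canary-style rollout can be approximated with instance refresh checkpoints, which pause the refresh after a given percentage of the group has been replaced. The helper below is a sketch that builds the `Preferences` argument for `start_instance_refresh`; the helper name and default values are illustrative assumptions.

```python
def canary_preferences(checkpoints=(10, 50, 100), delay_seconds=600,
                       min_healthy=90, warmup=300):
    """Build instance refresh preferences that pause at each checkpoint.

    `checkpoints` are percentages of the group replaced before each pause and
    must be strictly ascending; end with 100 to replace every instance.
    """
    points = list(checkpoints)
    if not points or any(not 0 < p <= 100 for p in points):
        raise ValueError("checkpoints must be percentages between 1 and 100")
    if points != sorted(set(points)):
        raise ValueError("checkpoints must be strictly ascending")
    return {
        "MinHealthyPercentage": min_healthy,
        "InstanceWarmup": warmup,
        "CheckpointPercentages": points,
        "CheckpointDelay": delay_seconds,  # seconds to wait at each checkpoint
    }
```

It would be used as `autoscaling.start_instance_refresh(AutoScalingGroupName="your-asg-name", Preferences=canary_preferences())`, giving a window at 10% and 50% to check metrics and cancel if something looks wrong.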

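Implement CloudWatch alarms to detect issues during rotation. HealthyHostCount is published by the load balancer rather than by Auto Scaling, so the example below uses the AWS/ApplicationELB namespace and is keyed on the target group and load balancer (the dimension values are placeholders):

aws cloudwatch put-metric-alarm \
  --alarm-name "ASG-HealthCheck-Failures" \
  --metric-name "HealthyHostCount" \
  --namespace "AWS/ApplicationELB" \
  --statistic "Average" \
  --period 60 \
  --threshold 2 \
  --comparison-operator "LessThanThreshold" \
  --dimensions "Name=TargetGroup,Value=targetgroup/your-target-group/0123456789abcdef" "Name=LoadBalancer,Value=app/your-load-balancer/0123456789abcdef" \
  --evaluation-periods 2 \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:your-sns-topic"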

For rollback scenarios, maintain previous launch template versions and implement automation to revert if alarms trigger.
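
A rollback along those lines might be sketched as follows. It assumes the previous launch template version is the last known good one and that all versions fit in a single `describe_launch_template_versions` response; both function names are hypothetical. The sketch only repoints the group, so replacing instances that already run the bad AMI still requires cycling them afterwards.

```python
def previous_version(ec2, template_name):
    """Return the second-highest launch template version number."""
    versions = ec2.describe_launch_template_versions(
        LaunchTemplateName=template_name
    )["LaunchTemplateVersions"]
    numbers = sorted(v["VersionNumber"] for v in versions)
    if len(numbers) < 2:
        raise RuntimeError("no earlier launch template version to roll back to")
    return numbers[-2]


def rollback_if_alarming(cloudwatch, ec2, autoscaling,
                         alarm_name, asg_name, template_name):
    """Repoint the ASG at the previous template version when the alarm fires.

    All three clients are expected to be boto3 clients.
    Returns the version rolled back to, or None if the alarm is not in ALARM.
    """
    alarms = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"]
    if not alarms or alarms[0]["StateValue"] != "ALARM":
        return None
    version = previous_version(ec2, template_name)
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchTemplate={
            "LaunchTemplateName": template_name,
            "Version": str(version),  # the API expects the version as a string
        },
    )
    return version
```

This could run from a Lambda function subscribed to the alarm's SNS topic, in which case the alarm state check acts as a guard against stale notifications.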
