How to Gracefully Restart All Tasks in an AWS ECS Service Without Downtime



When managing microservices in AWS ECS, we often need to propagate configuration changes across all running tasks. The brute-force approach of scaling the service down to 0 and back up creates unacceptable downtime. Here's a better way:

One approach is to stop the service's tasks one at a time through the ECS API and let the service scheduler launch a replacement for each, so the remaining tasks keep serving traffic. Here's a Python implementation using boto3:

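import time

import boto3

def restart_ecs_tasks(cluster, service, delay_seconds=15):
    ecs = boto3.client('ecs')

    # Get all running tasks (list_tasks returns at most 100 per page)
    tasks = []
    paginator = ecs.get_paginator('list_tasks')
    for page in paginator.paginate(
        cluster=cluster,
        serviceName=service,
        desiredStatus='RUNNING'
    ):
        tasks.extend(page['taskArns'])

    # Stop each task with a delay between them so the service scheduler
    # can launch a replacement before the next task goes down
    for task in tasks:
        ecs.stop_task(
            cluster=cluster,
            task=task,
            reason='Configuration update'
        )
        # Fixed delay: assumes a replacement is up within this window
        time.sleep(delay_seconds)

    print(f"Stopped {len(tasks)} tasks in service {service}; ECS launches replacements")

# Usage example
if __name__ == '__main__':
    restart_ecs_tasks('production-cluster', 'config-service')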
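A fixed 15-second sleep assumes every replacement task starts within that window, which may not hold for slow-starting containers. The sketch below instead blocks on boto3's built-in ECS waiters between stops: `tasks_stopped` for the task just stopped, then `services_stable`, which polls until the service has a single deployment and `runningCount` equals `desiredCount`. It assumes the desired count stays constant during the restart; note that `services_stable` does not by itself prove a new task has passed load balancer health checks. `rolling_restart` is a hypothetical name, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, List


def rolling_restart(ecs: Any, cluster: str, service: str) -> List[str]:
    """Stop a service's tasks one at a time, waiting for ECS to restore
    a steady state before stopping the next one.

    `ecs` is expected to be a boto3 ECS client (boto3.client("ecs")).
    Returns the ARNs of the tasks that were stopped.
    """
    # Collect every running task; list_tasks returns at most 100 per page.
    task_arns: List[str] = []
    paginator = ecs.get_paginator("list_tasks")
    for page in paginator.paginate(
        cluster=cluster, serviceName=service, desiredStatus="RUNNING"
    ):
        task_arns.extend(page["taskArns"])

    stopped_waiter = ecs.get_waiter("tasks_stopped")
    stable_waiter = ecs.get_waiter("services_stable")

    for arn in task_arns:
        ecs.stop_task(cluster=cluster, task=arn, reason="Configuration update")
        # Block until this task has actually reached STOPPED...
        stopped_waiter.wait(cluster=cluster, tasks=[arn])
        # ...then until runningCount is back to desiredCount.
        stable_waiter.wait(cluster=cluster, services=[service])

    return task_arns
```

Usage: `rolling_restart(boto3.client("ecs"), "production-cluster", "config-service")`. The waiters give up after their default number of attempts and raise `botocore.exceptions.WaiterError`, which stops the loop instead of taking down more tasks.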

For zero-downtime deployments, consider these additional techniques:

  • Blue-green deployments through AWS CodeDeploy
  • Connection draining via the ALB target group's deregistration delay
  • The ECS deployment circuit breaker, which can roll back a failing deployment
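The last two techniques are plain API settings. The sketch below builds the keyword arguments for `ecs.update_service` (rolling-update limits plus the circuit breaker) and for `elbv2.modify_target_group_attributes` (the deregistration delay). The 100/200 percentages and the 30-second drain time are illustrative assumptions rather than recommendations, and `zero_downtime_settings` is a hypothetical helper.

```python
from typing import Any, Dict


def zero_downtime_settings(
    cluster: str,
    service: str,
    target_group_arn: str,
    drain_seconds: int = 30,
) -> Dict[str, Dict[str, Any]]:
    """Return keyword arguments for two boto3 calls:
    ecs.update_service(...) and elbv2.modify_target_group_attributes(...).
    """
    ecs_kwargs = {
        "cluster": cluster,
        "service": service,
        "deploymentConfiguration": {
            # Never drop below the desired count; allow up to 2x during rollout.
            "minimumHealthyPercent": 100,
            "maximumPercent": 200,
            # Stop a failing deployment and roll back to the last good one.
            "deploymentCircuitBreaker": {"enable": True, "rollback": True},
        },
    }
    elbv2_kwargs = {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            {
                # How long the ALB waits for in-flight requests to a
                # deregistering (draining) target to complete.
                "Key": "deregistration_delay.timeout_seconds",
                "Value": str(drain_seconds),
            }
        ],
    }
    return {"ecs": ecs_kwargs, "elbv2": elbv2_kwargs}
```

Apply them with `boto3.client("ecs").update_service(**s["ecs"])` and `boto3.client("elbv2").modify_target_group_attributes(**s["elbv2"])`. The circuit breaker applies only to services using the rolling-update (ECS) deployment controller.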

Always verify the restart operation completed successfully:

aws ecs describe-services \
  --cluster production-cluster \
  --services config-service \
  --query 'services[].events[]' \
  --output table

This shows the deployment events and helps identify any issues during the rolling restart.
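The event table suits a one-off check; a script can instead inspect the `deployments` list that the same `describe_services` call returns. A minimal sketch, assuming the service uses the default rolling-update (ECS) deployment controller, which reports a `rolloutState` on each deployment. `rollout_status` is a hypothetical helper that takes the response dict from `boto3.client("ecs").describe_services(...)`.

```python
from typing import Any, Dict


def rollout_status(describe_services_response: Dict[str, Any]) -> str:
    """Summarize the first service in a describe_services response.

    Returns "FAILED" if the PRIMARY deployment's rolloutState is FAILED,
    "COMPLETED" when only one deployment remains and runningCount equals
    desiredCount, and "IN_PROGRESS" otherwise.
    """
    service = describe_services_response["services"][0]
    deployments = service["deployments"]
    # The newest deployment is the one marked PRIMARY.
    primary = next(d for d in deployments if d["status"] == "PRIMARY")

    if primary.get("rolloutState") == "FAILED":
        return "FAILED"
    if len(deployments) == 1 and service["runningCount"] == service["desiredCount"]:
        return "COMPLETED"
    return "IN_PROGRESS"
```

Usage: `rollout_status(ecs.describe_services(cluster="production-cluster", services=["config-service"]))`, polled until it returns something other than `"IN_PROGRESS"`.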

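For quick one-off operations, the same loop can be written with the AWS CLI:

#!/bin/bash
set -euo pipefail

CLUSTER="production-cluster"
SERVICE="config-service"

TASKS=$(aws ecs list-tasks \
  --cluster "$CLUSTER" \
  --service-name "$SERVICE" \
  --desired-status RUNNING \
  --query "taskArns" \
  --output text)

# $TASKS is left unquoted on purpose so it splits into one ARN per iteration
for TASK in $TASKS; do
  aws ecs stop-task \
    --cluster "$CLUSTER" \
    --task "$TASK" \
    --reason "Configuration update" > /dev/null
  # Give ECS time to launch a replacement before stopping the next task
  sleep 15
done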

Stopping tasks by hand works, but ECS can replace them for you. Forcing a new deployment starts fresh tasks from the service's current task definition and retires the old ones within the rolling-update limits set by minimumHealthyPercent and maximumPercent, so the service keeps serving traffic throughout.

Here are three approaches to reloading configuration this way:


# Method 1: Using the AWS CLI (Bash)
aws ecs update-service \
  --cluster your-cluster-name \
  --service your-service-name \
  --force-new-deployment \
  --region us-west-2

# Method 2: Programmatic way with the AWS SDK (Python)
import boto3

client = boto3.client('ecs')
response = client.update_service(
    cluster='your-cluster-name',
    service='your-service-name',
    forceNewDeployment=True
)
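`update_service` returns as soon as the new deployment is created, not when it finishes. A sketch that returns the ID of the new PRIMARY deployment and, optionally, blocks on boto3's `services_stable` waiter until the old tasks are gone. `force_redeploy` is a hypothetical name; the client is passed in rather than created inside.

```python
from typing import Any


def force_redeploy(ecs: Any, cluster: str, service: str, wait: bool = True) -> str:
    """Start a new deployment of the service's current task definition
    and return the ID of the new PRIMARY deployment.

    `ecs` is expected to be a boto3 ECS client (boto3.client("ecs")).
    """
    response = ecs.update_service(
        cluster=cluster, service=service, forceNewDeployment=True
    )
    # The deployment created by this call is the one marked PRIMARY.
    primary = next(
        d for d in response["service"]["deployments"] if d["status"] == "PRIMARY"
    )
    if wait:
        # Polls describe_services until only one deployment remains and
        # runningCount == desiredCount.
        ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
    return primary["id"]
```

Usage: `force_redeploy(boto3.client("ecs"), "your-cluster-name", "your-service-name")`.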

For critical services, consider blue-green deployments through AWS CodeDeploy (the service must be created with the CODE_DEPLOY deployment controller):


# Method 3: Using AWS CodeDeploy for a controlled rollout
# (AppSpec "version" must be 0.0; TaskDefinition takes the task definition ARN)
aws deploy create-deployment \
  --application-name your-app \
  --deployment-group-name your-dg \
  --revision '{
    "revisionType": "AppSpecContent",
    "appSpecContent": {
      "content": "{\"version\":0.0,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:us-west-2:ACCOUNT_ID:task-definition/your-task-def:REVISION\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":80}}}}]}"
    }
  }'
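Hand-escaping JSON inside a shell string is easy to get wrong. A sketch that builds the same AppSpec document in Python and lets `json.dumps` handle the quoting; the container name and port mirror the CLI example, and `build_ecs_appspec` is a hypothetical helper.

```python
import json


def build_ecs_appspec(task_definition_arn: str, container_name: str, container_port: int) -> str:
    """Return an ECS AppSpec document as a JSON string, for use as
    revision={"revisionType": "AppSpecContent",
              "appSpecContent": {"content": <this string>}}
    in a CodeDeploy create_deployment call.
    """
    appspec = {
        "version": 0.0,  # currently the only allowed AppSpec version
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition_arn,
                        # Container and port the load balancer routes to.
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }
    return json.dumps(appspec)
```

Usage: pass the result as `revision={"revisionType": "AppSpecContent", "appSpecContent": {"content": content}}` to `boto3.client("codedeploy").create_deployment(applicationName="your-app", deploymentGroupName="your-dg", revision=...)`.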

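For stateful services where an immediate restart isn't possible, reload configuration in place instead:

  1. Expose a configuration hot-reload endpoint in the application
  2. Use S3 Event Notifications to trigger reloads when a configuration object in S3 changes
  3. Store configuration in AWS Systems Manager Parameter Store, which tracks a version number for every parameter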
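The third option can avoid restarts altogether: each task polls Parameter Store and reloads only when the parameter's version number changes. A minimal sketch, assuming all configuration lives in a single parameter and the application supplies its own reload callback; `ParameterWatcher` and the parameter name in the usage line are hypothetical.

```python
from typing import Any, Callable, Optional


class ParameterWatcher:
    """Re-read an SSM Parameter Store value and invoke a callback only
    when the parameter's Version number changes.

    `ssm` is expected to be a boto3 SSM client (boto3.client("ssm")).
    """

    def __init__(self, ssm: Any, name: str, on_change: Callable[[str], None]) -> None:
        self.ssm = ssm
        self.name = name
        self.on_change = on_change
        self.version: Optional[int] = None  # last version applied

    def poll(self) -> bool:
        """Fetch the parameter once; return True if a reload was triggered."""
        param = self.ssm.get_parameter(Name=self.name, WithDecryption=True)["Parameter"]
        if param["Version"] == self.version:
            return False
        self.version = param["Version"]
        self.on_change(param["Value"])
        return True
```

Usage: `watcher = ParameterWatcher(boto3.client("ssm"), "/config-service/settings", apply_config)`, where `apply_config` stands for your application's reload function, then call `watcher.poll()` on a timer. Each poll is one `GetParameter` request, so a modest interval keeps API usage low.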

Always verify successful deployment:


aws ecs describe-services \
  --cluster your-cluster \
  --services your-service \
  --query 'services[0].events' \
  --output table