How to Gracefully Restart All Tasks in an AWS ECS Service Without Downtime


When managing microservices in AWS ECS, we often need to propagate configuration changes across all running tasks. The brute-force approach of scaling the service down to 0 and back up creates unacceptable downtime. Here's a better way:

One approach is to stop tasks one at a time through the ECS API and let the service scheduler launch a replacement for each, so the remaining tasks keep serving traffic. Here's a Python implementation using boto3:

import time

import boto3

def restart_ecs_tasks(cluster, service, delay=15):
    ecs = boto3.client('ecs')

    # Collect all running tasks (list_tasks results are paginated)
    tasks = []
    paginator = ecs.get_paginator('list_tasks')
    for page in paginator.paginate(
        cluster=cluster,
        serviceName=service,
        desiredStatus='RUNNING'
    ):
        tasks.extend(page['taskArns'])

    # Stop tasks one at a time; the service scheduler launches replacements
    for task in tasks:
        ecs.stop_task(
            cluster=cluster,
            task=task,
            reason='Configuration update'
        )
        # Fixed pause to give ECS time to launch the replacement.
        # This does not confirm that the new task is healthy.
        time.sleep(delay)

    print(f"Stopped {len(tasks)} tasks in service {service}")

# Usage example
restart_ecs_tasks('production-cluster', 'config-service')

For zero-downtime deployments, consider these additional techniques:

  • Blue-green deployment configuration
  • Task draining with ALB connection draining
  • ECS deployment circuit breaker
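
Of the techniques listed above, the deployment circuit breaker is a setting on the service itself. The following is a minimal sketch of how it might be enabled together with rolling-update limits through boto3. The helper name `rolling_update_config` and the 100/200 percentages are illustrative assumptions, not values from this post.

```python
def rolling_update_config(min_healthy=100, max_percent=200, rollback=True):
    """Build a deploymentConfiguration dict for ecs.update_service.

    The default percentages are illustrative assumptions; pick values
    that match your service's capacity headroom.
    """
    return {
        # Stop a failing deployment and optionally roll back to the last good one
        "deploymentCircuitBreaker": {"enable": True, "rollback": rollback},
        # Lower bound on healthy tasks during the roll (percent of desiredCount)
        "minimumHealthyPercent": min_healthy,
        # Upper bound on running tasks during the roll (percent of desiredCount)
        "maximumPercent": max_percent,
    }

# Hypothetical usage (requires boto3 and AWS credentials):
#   ecs = boto3.client("ecs")
#   ecs.update_service(
#       cluster="production-cluster",
#       service="config-service",
#       deploymentConfiguration=rolling_update_config(),
#   )
```

Keeping the minimum at 100 percent means ECS starts replacement tasks before stopping old ones, at the cost of temporarily running extra capacity.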

Always verify the restart operation completed successfully:

aws ecs describe-services \
  --cluster production-cluster \
  --services config-service \
  --query 'services[].events[]' \
  --output table

This shows the deployment events and helps identify any issues during the rolling restart.
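
Reading the events table is a manual check. For scripts, the same `describe_services` response can be inspected programmatically. This is a rough sketch, assuming a rollout counts as finished when a single PRIMARY deployment remains and its running count matches its desired count; `rollout_finished` is a hypothetical helper name.

```python
def rollout_finished(service):
    """Return True when a service dict from describe_services looks settled."""
    deployments = service.get("deployments", [])
    # While a rolling update is in progress, the old and new deployments coexist
    if len(deployments) != 1:
        return False
    primary = deployments[0]
    return (
        primary.get("status") == "PRIMARY"
        and primary.get("runningCount") == primary.get("desiredCount")
    )

# Hypothetical usage (requires boto3 and AWS credentials):
#   resp = boto3.client("ecs").describe_services(
#       cluster="production-cluster", services=["config-service"])
#   print(rollout_finished(resp["services"][0]))
```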

For quick operations, you can use the AWS CLI:

#!/bin/bash
CLUSTER="production-cluster"
SERVICE="config-service"

TASKS=$(aws ecs list-tasks \
  --cluster "$CLUSTER" \
  --service-name "$SERVICE" \
  --desired-status RUNNING \
  --query "taskArns" \
  --output text)

# $TASKS is left unquoted on purpose so the ARNs split on whitespace
for TASK in $TASKS; do
  aws ecs stop-task \
    --cluster "$CLUSTER" \
    --task "$TASK" \
    --reason "Configuration update"
  sleep 15
done

Stopping tasks by hand is not the only option. ECS can perform the rolling replacement itself when you force a new deployment, and for critical services a blue-green deployment gives more control over the cutover.

Here are three production-ready approaches to achieve configuration reloads:


# Method 1: Using AWS CLI (Bash)
aws ecs update-service \
  --cluster your-cluster-name \
  --service your-service-name \
  --force-new-deployment \
  --region us-west-2

# Method 2: Programmatic way with AWS SDK (Python)
import boto3

client = boto3.client('ecs')
response = client.update_service(
    cluster='your-cluster-name',
    service='your-service-name',
    forceNewDeployment=True
)
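
The `update_service` call above returns as soon as the deployment is registered, not when it completes. A minimal sketch of how a script might block until the roll finishes, using boto3's built-in `services_stable` waiter; the function name `force_redeploy_and_wait` is hypothetical, and the client is passed in so the function can be exercised without AWS access.

```python
def force_redeploy_and_wait(ecs, cluster, service):
    """Force a new deployment, then block until ECS reports the service stable."""
    ecs.update_service(cluster=cluster, service=service, forceNewDeployment=True)
    # 'services_stable' polls describe_services; it raises
    # botocore.exceptions.WaiterError if the service does not settle in time.
    waiter = ecs.get_waiter("services_stable")
    waiter.wait(cluster=cluster, services=[service])

# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   force_redeploy_and_wait(boto3.client("ecs"), "your-cluster-name", "your-service-name")
```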

For critical services, consider implementing blue-green deployments:


# Using AWS CodeDeploy for controlled rollout
aws deploy create-deployment \
  --application-name your-app \
  --deployment-group-name your-dg \
  --revision '{
    "revisionType": "AppSpecContent",
    "appSpecContent": {
      "content": "{\"version\":1,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"your-task-def\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":80}}}}]}"
    }
  }'
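
Hand-escaping the AppSpec JSON as in the command above is easy to get wrong. As a sketch, the same revision structure could be built in Python and serialized with `json.dumps`. The helper name `ecs_appspec_revision` is hypothetical, and the field values simply mirror the placeholders in the command above.

```python
import json

def ecs_appspec_revision(task_definition, container_name, container_port):
    """Build a CodeDeploy 'revision' dict with the AppSpec embedded as a JSON string."""
    appspec = {
        "version": 1,  # mirrors the value used in the CLI command above
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port,
                    },
                },
            }
        }],
    }
    return {
        "revisionType": "AppSpecContent",
        # CodeDeploy expects the AppSpec itself as a string, so serialize it here
        "appSpecContent": {"content": json.dumps(appspec)},
    }

# Hypothetical usage (requires boto3 and AWS credentials):
#   boto3.client("codedeploy").create_deployment(
#       applicationName="your-app",
#       deploymentGroupName="your-dg",
#       revision=ecs_appspec_revision("your-task-def", "web", 80),
#   )
```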

For stateful services where immediate restart isn't possible:

  1. Implement configuration hot-reload endpoints
  2. Use S3 Event Notifications to trigger reloads
  3. Consider parameter store with version tracking
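
The version-tracking idea in the list above can be sketched as a small poller that remembers the last version it saw and reports when it changes. This is an illustration under assumptions: `VersionWatcher` is a hypothetical class, and the version source is injected as a callable so the logic does not depend on AWS.

```python
class VersionWatcher:
    """Report when a configuration version number changes between polls."""

    def __init__(self, fetch_version):
        # fetch_version: zero-argument callable returning the current version
        self._fetch = fetch_version
        self._last_seen = None

    def changed(self):
        """Return True if the version differs from the previous poll.

        The first poll only records a baseline and returns False.
        """
        current = self._fetch()
        is_new = self._last_seen is not None and current != self._last_seen
        self._last_seen = current
        return is_new

# Hypothetical usage with SSM Parameter Store (parameter name is made up):
#   ssm = boto3.client("ssm")
#   watcher = VersionWatcher(
#       lambda: ssm.get_parameter(Name="/config-service/settings")["Parameter"]["Version"])
#   if watcher.changed():
#       reload_config()  # your application's own reload hook
```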

Always verify successful deployment:


aws ecs describe-services \
  --cluster your-cluster \
  --services your-service \
  --query 'services[0].events' \
  --output table