During our ECS deployments using CloudFormation, we noticed ALB target groups consistently taking the full deregistration delay (default 300 seconds) to remove old containers, even when no active connections existed. This contradicts AWS's official documentation, which states:
"Elastic Load Balancing immediately completes the deregistration process [...] if a deregistering target has no in-flight requests and no active connections."
After extensive testing with minimal traffic (only developer requests), we identified several non-obvious factors:

- Health check connections: the ALB keeps health-checking a target while it drains
- TCP keepalives: modern HTTP clients and the ALB itself maintain persistent connections
- ECS service timing: the container shutdown sequence determines when connections actually terminate

This is the problematic health check configuration we started from:

```yaml
# CloudFormation snippet showing the problematic health check configuration
TargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    HealthCheckIntervalSeconds: 30
    HealthCheckTimeoutSeconds: 5
    HealthyThresholdCount: 2
    UnhealthyThresholdCount: 2
    HealthCheckPath: /status
    HealthCheckPort: "8080"
    HealthCheckProtocol: HTTP
```
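To measure what the ALB actually does, rather than what the documentation promises, a simple polling loop timestamps every target state transition. A rough sketch; `[TARGET_GROUP_ARN]` is a placeholder:

```bash
#!/bin/sh
# Poll the target group once per second and print the target states,
# so the real deregistration time can be read off the timestamps.
while true; do
  states=$(aws elbv2 describe-target-health \
    --target-group-arn [TARGET_GROUP_ARN] \
    --query "TargetHealthDescriptions[].TargetHealth.State" \
    --output text)
  echo "$(date +%T) $states"
  sleep 1
done
```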
We implemented these adjustments across our deployment pipeline:

- Reduced the deregistration delay from 300s to 30s for faster cycling
- Added connection termination to the application's shutdown hook
- Configured a short (3s) keep-alive timeout on the targets themselves, since connection reuse between the ALB and a target is not controlled by any target group attribute

The delay can be changed in place, without a stack update:

```bash
# AWS CLI command to modify deregistration behavior
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30
```
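Whether the attribute actually took effect can be checked immediately (same ARN as above):

```bash
aws elbv2 describe-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 \
  --query "Attributes[?Key=='deregistration_delay.timeout_seconds']"
```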
Use these CloudWatch metrics to monitor actual draining behavior:

| Metric | Namespace | Significance |
|---|---|---|
| HealthyHostCount | AWS/ApplicationELB | Shows the actual registration state |
| RequestCount | AWS/ApplicationELB | Verifies zero traffic during the drain |
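Pulling `HealthyHostCount` around a deployment makes the drain window visible from the command line. A sketch; the dimension values and time window are placeholders in the format CloudWatch expects:

```bash
# Minimum per minute dips while old targets drain out of the group
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HealthyHostCount \
  --dimensions Name=TargetGroup,Value=targetgroup/my-targets/73e2d6bc24d8a067 \
               Name=LoadBalancer,Value=app/my-alb/50dc6c495c0c9188 \
  --start-time 2024-01-01T12:00:00Z \
  --end-time 2024-01-01T12:30:00Z \
  --period 60 \
  --statistics Minimum
```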
ECS task definitions have no Kubernetes-style `preStop` lifecycle hook, so we do the equivalent connection cleanup with a SIGTERM trap in the entrypoint plus `stopTimeout`:

```json
{
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx:latest",
      "stopTimeout": 30,
      "entryPoint": ["sh", "-c"],
      "command": [
        "trap 'sleep 5; nginx -s quit; wait $PID' TERM; nginx -g 'daemon off;' & PID=$!; wait $PID"
      ]
    }
  ]
}
```

The `sleep 5` gives the ALB a moment to mark the target as draining before nginx stops accepting connections, and `stopTimeout` keeps ECS from sending SIGKILL while the graceful shutdown runs.
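To roll this out, register the revised definition and point the service at it; `taskdef.json`, `my-cluster`, and `my-service` are placeholders:

```bash
aws ecs register-task-definition --cli-input-json file://taskdef.json
aws ecs update-service --cluster my-cluster --service my-service \
  --task-definition web   # family name alone resolves to the newest revision
```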
For containers that don't respond to SIGTERM, consider adding TCP connection tracking and forced termination to your application's shutdown sequence.
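A minimal sketch of the connection-tracking idea, usable inside the trap shown earlier; it assumes `ss` exists in the image and the service listens on port 80:

```bash
# Wait up to 25 seconds for established connections on :80 to drain,
# then shut nginx down gracefully regardless.
i=0
while [ "$(ss -Htn state established '( sport = :80 )' | wc -l)" -gt 0 ] && [ "$i" -lt 25 ]; do
  i=$((i + 1))
  sleep 1
done
nginx -s quit
```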
The same full-300-second drain showed up during plain ECS service updates, so to rule out anything specific to our pipeline I reproduced the behavior in a minimal test environment:
```yaml
Resources:
  TestService:
    Type: AWS::ECS::Service
    Properties:
      DeploymentConfiguration:
        DeploymentCircuitBreaker:
          Enable: true
          Rollback: true
        MaximumPercent: 200
        MinimumHealthyPercent: 100
      LoadBalancers:
        - ContainerName: "web"
          ContainerPort: 80
          TargetGroupArn: !Ref TargetGroup
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: "DISABLED"  # tasks run in private subnets
          SecurityGroups:
            - !Ref SecurityGroup
          Subnets: !Split [",", !ImportValue "PrivateSubnets"]
```
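Each test run then just forces a new deployment, which cycles the tasks without touching the task definition (cluster and service names are placeholders):

```bash
aws ecs update-service \
  --cluster test-cluster \
  --service test-service \
  --force-new-deployment
```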
Through CloudWatch metrics and the ALB access logs, I identified three potential culprits:

- Health check connections being counted as active
- TCP keep-alives from the ALB to targets
- ECS task networking cleanup latency
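One wrinkle: ALB access logs don't record health check requests, so the first culprit has to be confirmed from the target's own logs, where the checks arrive with the `ELB-HealthChecker/2.0` user agent. A sketch assuming the `awslogs` log driver and a placeholder log group name:

```bash
# Count health check hits in the last 10 minutes (timestamps in milliseconds)
aws logs filter-log-events \
  --log-group-name /ecs/test-service \
  --filter-pattern "ELB-HealthChecker" \
  --start-time "$(($(date +%s) - 600))000"
```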
Here's the working configuration that solved the issue. Note that CloudFormation has no `DeregistrationDelayTimeoutSeconds` property; the delay is set through `TargetGroupAttributes`:

```yaml
TargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    HealthCheckIntervalSeconds: 10
    HealthCheckTimeoutSeconds: 6
    HealthyThresholdCount: 2
    UnhealthyThresholdCount: 2
    TargetType: ip
    Port: 80
    Protocol: HTTP
    TargetGroupAttributes:
      - Key: deregistration_delay.timeout_seconds
        Value: "30"  # reduced from the 300s default
    VpcId: !ImportValue "VpcId"
```
For HTTP/HTTPS services, implement connection timeouts in your application:

```nginx
# Nginx configuration example
keepalive_timeout 10s;
keepalive_requests 100;
send_timeout 60s;
```
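One caveat: if a target closes a keep-alive connection just before the ALB reuses it, the ALB can return intermittent 502s, and the ALB's idle timeout defaults to 60s. Keeping the load balancer's idle timeout below the target's `keepalive_timeout` avoids that race; `[LOAD_BALANCER_ARN]` is a placeholder:

```bash
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn [LOAD_BALANCER_ARN] \
  --attributes Key=idle_timeout.timeout_seconds,Value=5
```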
Use these AWS CLI commands to monitor deregistration:

```bash
# Check target health (note the quoted JMESPath string literal)
aws elbv2 describe-target-health \
  --target-group-arn [TARGET_GROUP_ARN] \
  --query "TargetHealthDescriptions[?TargetHealth.State=='draining']"

# Check network connections (EC2 launch type; requires SSM access)
aws ssm send-command \
  --instance-ids [INSTANCE_ID] \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["netstat -anp | grep ESTABLISHED"]'
```
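Fargate tasks have no instance to run SSM commands on; ECS Exec can run the same check inside the container instead. This assumes `enableExecuteCommand` is turned on for the service and the image ships a shell plus `netstat`; cluster and container names are placeholders:

```bash
aws ecs execute-command \
  --cluster test-cluster \
  --task [TASK_ID] \
  --container web \
  --interactive \
  --command "netstat -anp"
```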