Implementing Health Checks for ECS Tasks Without ELB: Zero-Downtime Deployment for Spring Boot Containers



When deploying Spring Boot applications on ECS without Elastic Load Balancing, the container health check is the only signal ECS has about whether a new task is ready, which makes it crucial for zero-downtime deployments. A common stumbling block is a health status that stays stuck at "UNKNOWN" even though the configuration looks correct.

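The examples you tried with exit 0 and exit 1 point to a common misconception. ECS does not probe an endpoint by itself: it runs the command inside the container and looks only at the exit code, so exit 0 should eventually report HEALTHY and exit 1 UNHEALTHY. Neither tells you anything about whether Spring Boot is actually serving requests, which is why a useful check must call the application. If the status stays "UNKNOWN" even with exit 0, it typically indicates one of these scenarios:

1. The running task still uses a task definition revision without the healthCheck block (for example, the service was not updated to the new revision)
2. The container that defines the health check is not marked essential, so its result does not count toward the task's health
3. The checks are still being evaluated: no result has been recorded yet, or failures are still falling inside startPeriod
4. The container instance runs an ECS agent too old to support container health checks (EC2 launch type)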
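The task-level status shown in the console is derived from the container-level results. The Java sketch below is a simplified model of the rule described in the ECS task health documentation; the types `Health`, `ContainerState`, and `TaskHealth` are hypothetical and exist only for illustration. It shows why a task can stay UNKNOWN indefinitely: if no essential container defines a health check, there is nothing to aggregate.

```java
import java.util.List;

// Hypothetical types modelling the ECS rule for task health (simplified).
enum Health { HEALTHY, UNHEALTHY, UNKNOWN }

record ContainerState(String name, boolean essential, boolean hasHealthCheck, Health health) {}

final class TaskHealth {
    private TaskHealth() {}

    static Health of(List<ContainerState> containers) {
        // Only essential containers that define a health check are considered.
        List<ContainerState> checked = containers.stream()
                .filter(c -> c.essential() && c.hasHealthCheck())
                .toList();
        if (checked.isEmpty()) {
            // No essential container has a health check: the task stays UNKNOWN.
            return Health.UNKNOWN;
        }
        if (checked.stream().anyMatch(c -> c.health() == Health.UNHEALTHY)) {
            return Health.UNHEALTHY;
        }
        if (checked.stream().allMatch(c -> c.health() == Health.HEALTHY)) {
            return Health.HEALTHY;
        }
        // At least one essential check is still being evaluated.
        return Health.UNKNOWN;
    }
}
```

In this model a failing non-essential sidecar does not affect the task, while a single UNHEALTHY essential container does.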

For Spring Boot applications, use the Actuator health endpoint, which is available at /actuator/health (Spring Boot 2.x and later) once spring-boot-starter-actuator is added to the dependencies. Here's a working task definition health check configuration:

"healthCheck": {
    "command": [
        "CMD-SHELL",
        "curl -f http://localhost:8080/actuator/health || exit 1"
    ],
    "interval": 30,
    "retries": 3,
    "startPeriod": 60,
    "timeout": 5
}
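  • interval: 30 seconds between checks (shorter intervals detect failures sooner but run the command more often)
  • retries: 3 consecutive failures mark the container as UNHEALTHY (and the task, if the container is essential)
  • startPeriod: 60-second grace period during which failures are not counted, giving the app time to initialize
  • timeout: a check that takes longer than 5 seconds counts as a failure, so hanging checks cannot block evaluation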
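To sanity-check these values, the hypothetical Java record below enforces the ranges ECS accepts for each field (interval between 5 and 300 seconds, timeout between 2 and 60, retries between 1 and 10, startPeriod between 0 and 300) and estimates how long a broken application can keep its HEALTHY status. The estimate is an assumption based on Docker's probe loop, where the next check starts one interval after the previous one finishes; treat it as a rough upper bound, not an ECS guarantee.

```java
// Hypothetical helper (not an AWS API): checks healthCheck values against the
// ranges ECS accepts and estimates the worst-case time to detect a failure.
record HealthCheckSettings(int intervalSeconds, int timeoutSeconds, int retries, int startPeriodSeconds) {

    HealthCheckSettings {
        requireRange("interval", intervalSeconds, 5, 300);
        requireRange("timeout", timeoutSeconds, 2, 60);
        requireRange("retries", retries, 1, 10);
        requireRange("startPeriod", startPeriodSeconds, 0, 300);
    }

    private static void requireRange(String field, int value, int min, int max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                    field + " must be between " + min + " and " + max + ", got " + value);
        }
    }

    /**
     * Rough upper bound, in seconds, between the app breaking right after a passing
     * check and the container being marked UNHEALTHY. Assumes each probe starts one
     * interval after the previous one finished and every failing probe hangs until
     * the timeout.
     */
    int worstCaseDetectionSeconds() {
        return retries * (intervalSeconds + timeoutSeconds);
    }
}
```

For the configuration above, `new HealthCheckSettings(30, 5, 3, 60).worstCaseDetectionSeconds()` returns 105, so a failure that begins right after a passing check can go undetected for up to about 105 seconds under these assumptions.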

If health checks still don't work:

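  1. Verify the Actuator health endpoint is exposed over HTTP (it is by default; check management.endpoints.web.exposure settings in application.properties if you have customized them)
  2. Test the health endpoint manually inside the container, for example with ECS Exec
  3. Check the ECS agent logs for health check execution errors (EC2 launch type)
  4. Ensure the tool the command relies on (here, curl) is installed in the container image; slim base images often omit it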
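For step 2, or for images that ship a JDK but not curl, a small single-file Java program can stand in for `curl -f`. This is a sketch, not part of Spring Boot or ECS: the class name `HealthProbe`, its default URL, and the 3-second timeout are assumptions. It exits 0 for status codes below 400 and 1 otherwise, matching `curl -f`, and treats connection errors and timeouts as failures. Launching a `.java` file directly requires Java 11+ and a full JDK (JRE-only images lack the compiler), and every check pays JVM startup time, so keep `timeout` generous.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical single-file probe; run inside the container with:
//   java HealthProbe.java http://localhost:8080/actuator/health
class HealthProbe {

    // Mirrors curl -f: only HTTP 400 and above fail (Actuator returns 503 when DOWN).
    static int exitCodeFor(int statusCode) {
        return statusCode < 400 ? 0 : 1;
    }

    static int probe(URI uri, Duration timeout) {
        // HttpClient does not follow redirects by default, like curl without -L.
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return exitCodeFor(response.statusCode());
        } catch (IOException e) {
            // Connection refused, request timeout, etc. count as a failed check.
            return 1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost:8080/actuator/health";
        System.exit(probe(URI.create(url), Duration.ofSeconds(3)));
    }
}
```

In the task definition, the CMD-SHELL string would then become something like `java /app/HealthProbe.java http://localhost:8080/actuator/health` (the /app path is hypothetical).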

For custom logic, such as requiring exactly HTTP 200 from your own /health endpoint, you might use:

"healthCheck": {
    "command": [
        "CMD-SHELL",
        "if [ $(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/health) -eq 200 ]; then exit 0; else exit 1; fi"
    ],
    "interval": 20,
    "retries": 5,
    "startPeriod": 90,
    "timeout": 3
}

When deploying Spring Boot applications in Docker containers on AWS ECS without Elastic Load Balancing (ELB), many developers encounter unexpected behavior with task health checks. The core issue manifests when:

  • Health check commands like ["CMD-SHELL","exit 0"] don't change the UNKNOWN status
  • Service updates can't properly roll out due to undetermined health states
  • Documentation gaps leave developers troubleshooting in the dark

Without a load balancer there is no target group health check, so ECS relies solely on the container health check defined in the task definition. That check reports only the exit code of its command, so a placeholder like this one says nothing about the application, even when it does report HEALTHY:

"healthCheck": {
  "command": ["CMD-SHELL","exit 0"],
  "interval": 30,
  "timeout": 5,
  "retries": 3
}
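Because ECS only sees the exit code, it helps to model how a sequence of results turns into a container status. The Java sketch below is a simplified model based on Docker's documented HEALTHCHECK behavior, which ECS reports with its own names (Docker's "starting" state appears as UNKNOWN); `ContainerHealth` and `HealthTracker` are hypothetical names. It shows that a literal `exit 0` is enough to reach HEALTHY; it simply never tells you whether the application works.

```java
// Simplified, hypothetical model of Docker HEALTHCHECK rules as surfaced by ECS:
// only the exit code matters, failures inside startPeriod are ignored until the
// first success, and `retries` consecutive counted failures mean UNHEALTHY.
enum ContainerHealth { UNKNOWN, HEALTHY, UNHEALTHY }

final class HealthTracker {
    private final int retries;
    private final long startPeriodSeconds;
    private boolean started = false;          // true after the first successful probe
    private int consecutiveFailures = 0;
    private ContainerHealth status = ContainerHealth.UNKNOWN;

    HealthTracker(int retries, long startPeriodSeconds) {
        this.retries = retries;
        this.startPeriodSeconds = startPeriodSeconds;
    }

    /**
     * Records one probe result.
     * @param exitCode exit code of the health check command (0 = success)
     * @param probeStartSeconds seconds between container start and the start of this probe
     * @return the container status after this probe
     */
    ContainerHealth onProbeResult(int exitCode, long probeStartSeconds) {
        if (exitCode == 0) {
            started = true;                    // a success ends the start period early
            consecutiveFailures = 0;
            status = ContainerHealth.HEALTHY;
        } else if (started || probeStartSeconds >= startPeriodSeconds) {
            consecutiveFailures++;             // any non-zero exit code counts as a failure
            if (consecutiveFailures >= retries) {
                status = ContainerHealth.UNHEALTHY;
            }
        }
        // Failures inside the start period leave the status unchanged.
        return status;
    }
}
```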

For Spring Boot applications, the command should call an actual endpoint rather than return a fixed exit code. Here's the working configuration:

"healthCheck": {
  "command": [
    "CMD-SHELL",
    "curl -f http://localhost:8080/actuator/health || exit 1"
  ],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 60
}
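  • startPeriod: Gives your application time to start (critical for Spring Boot)
  • curl -f: Returns a non-zero exit code for HTTP 400 and above; Actuator answers 503 when the health status is DOWN
  • Port: The URL must use the port the application listens on (server.port). The check runs inside the container, so the port mapping matters for other clients, not for the check itself

A complete task definition using awsvpc networking might look like this (task-level cpu, memory, and IAM role settings omitted):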
{
  "family": "spring-boot-app",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "app-container",
      "essential": true,
      "image": "your-ecr-repo/spring-boot-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080
        }
      ],
      "healthCheck": {
        "command": [
          "CMD-SHELL",
          "curl -f http://localhost:8080/actuator/health || exit 1"
        ],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}

After deployment:

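  1. Check the task's health status in the ECS console (or the healthStatus field returned by aws ecs describe-tasks)
  2. View stopped tasks to see whether they were stopped for failing container health checks
  3. Examine the application's CloudWatch logs for startup errors; the health check command's own output is not shipped there (on EC2 container instances, docker inspect shows it)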

Common issues include:

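  • Insufficient startPeriod for Spring Boot initialization (the maximum ECS accepts is 300 seconds)
  • The health endpoint missing or moved: spring-boot-starter-actuator not on the classpath, health excluded via management.endpoints.web.exposure in application.properties, or Actuator served on a separate management.server.port
  • The application bound to an address other than localhost (for example via server.address), so the in-container request cannot reach it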