When an AWS Elastic Load Balancer (ELB) fails, it doesn't automatically take down your EC2 instances. The instances continue running, but new client traffic can no longer reach them through that load balancer, since it is the entry point for incoming requests.
In AWS architecture, ELBs are highly available by design. They're actually distributed systems themselves, consisting of multiple nodes across Availability Zones. A complete failure is extremely rare, but let's examine the scenario:
// Example of checking instance health despite LB failure
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2();

async function checkInstanceHealth(instanceIds) {
  const params = {
    InstanceIds: instanceIds,
    IncludeAllInstances: true // also return instances that aren't in the "running" state
  };
  const data = await ec2.describeInstanceStatus(params).promise();
  return data.InstanceStatuses.map(status => ({
    InstanceId: status.InstanceId,
    State: status.InstanceState.Name,
    Status: status.InstanceStatus.Status
  }));
}
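A quick usage sketch (the instance ID is a placeholder):
// Call it with the instance IDs registered behind the load balancer
checkInstanceHealth(['i-0123456789abcdef0'])
  .then(statuses => console.log(statuses))
  .catch(console.error);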
Modern AWS load balancers (ALB/NLB) have automatic redundancy (and you can verify target health yourself, as sketched after this list):
- Multi-AZ deployment by default
- Continuous health checks
- Automatic failover between nodes
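A minimal sketch of that direct check, assuming the same aws-sdk v2 setup as above; the target group ARN in the usage comment is a placeholder:
// Sketch: query target health directly via the ELBv2 API
const AWS = require('aws-sdk');
const elbv2 = new AWS.ELBv2();

async function checkTargetHealth(targetGroupArn) {
  const data = await elbv2.describeTargetHealth({ TargetGroupArn: targetGroupArn }).promise();
  return data.TargetHealthDescriptions.map(d => ({
    Target: d.Target.Id,
    Port: d.Target.Port,
    State: d.TargetHealth.State, // e.g. "healthy", "unhealthy", "draining"
    Reason: d.TargetHealth.Reason
  }));
}

// checkTargetHealth('arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067')
//   .then(console.log);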
For mission-critical applications, consider patterns such as multi-region failover:
# CloudFormation snippet for multi-region failover
Resources:
  PrimaryLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internet-facing
      Subnets: !Ref PublicSubnets
      SecurityGroups: [!Ref LoadBalancerSecurityGroup]
  FailoverDNS:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId  # parameter assumed: the zone that owns ApplicationDomain
      Name: !Sub "${ApplicationDomain}."
      Type: A
      SetIdentifier: Primary           # required whenever Failover is set
      Failover: PRIMARY
      AliasTarget:
        HostedZoneId: !GetAtt PrimaryLoadBalancer.CanonicalHostedZoneID
        DNSName: !GetAtt PrimaryLoadBalancer.DNSName
        EvaluateTargetHealth: true
# A matching SECONDARY record pointing at the standby region's load balancer completes the pair
Implement comprehensive monitoring:
// CloudWatch alarm for LB health (UnHealthyHostCount is published per target group,
// so the alarm needs both the LoadBalancer and TargetGroup dimensions)
{
  "AlarmName": "High-Unhealthy-Hosts",
  "MetricName": "UnHealthyHostCount",
  "Namespace": "AWS/ApplicationELB",
  "Statistic": "Average",
  "Dimensions": [
    {
      "Name": "LoadBalancer",
      "Value": "app/my-load-balancer/50dc6c495c0c9188"
    },
    {
      "Name": "TargetGroup",
      "Value": "targetgroup/my-targets/73e2d6bc24d8a067"
    }
  ],
  "Period": 60,
  "EvaluationPeriods": 2,
  "Threshold": 1,
  "ComparisonOperator": "GreaterThanThreshold"
}
Consider these advanced patterns:
- Active-active deployment across regions
- DNS-based failover with Route53
- Service mesh with retry logic
- Circuit breakers in application code
// Example circuit breaker implementation
class CircuitBreaker {
  constructor(request, options = {}) {
    this.request = request;                 // async function that performs the protected call
    this.state = "CLOSED";
    this.failureThreshold = options.failureThreshold || 5;
    this.successThreshold = options.successThreshold || 2;
    this.timeout = options.timeout || 5000; // how long to stay OPEN before probing again
    this.failureCount = 0;
    this.successCount = 0;
  }

  async fire() {
    if (this.state === "OPEN") {
      throw new Error("Circuit breaker is OPEN");
    }
    try {
      const response = await this.request();
      return this.success(response);
    } catch (err) {
      return this.fail(err);
    }
  }

  success(response) {
    if (this.state === "HALF") {
      this.successCount++;
      if (this.successCount >= this.successThreshold) {
        this.close();
      }
    } else {
      this.failureCount = 0; // a success while CLOSED clears accumulated failures
    }
    return response;
  }

  fail(err) {
    this.failureCount++;
    if (this.failureCount >= this.failureThreshold) {
      this.open();
    }
    throw err;
  }

  open() {
    this.state = "OPEN";
    setTimeout(() => this.half(), this.timeout);
  }

  half() {
    this.state = "HALF";
    this.successCount = 0; // count probe successes from zero each time
  }

  close() {
    this.state = "CLOSED";
    this.failureCount = 0;
    this.successCount = 0;
  }
}
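One way to use the breaker is to wrap the health-check call from earlier; the thresholds below are arbitrary example values:
// Usage sketch: protect an upstream call with the breaker
const breaker = new CircuitBreaker(
  () => checkInstanceHealth(['i-0123456789abcdef0']), // placeholder instance ID
  { failureThreshold: 3, successThreshold: 2, timeout: 10000 }
);

async function callWithBreaker() {
  try {
    return await breaker.fire();
  } catch (err) {
    // While the circuit is open, fall back to cached data or a degraded response
    console.error('Upstream unavailable:', err.message);
    return [];
  }
}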
When an AWS Elastic Load Balancer (ELB) fails, the behavior depends on the type of failure and your architecture configuration. The key thing to understand is that ELB itself is a managed service with built-in redundancy.
In AWS architecture, ELB failures are extremely rare because:
- ELBs are distributed across multiple Availability Zones by default
- AWS automatically replaces unhealthy ELB nodes
- The service has multiple redundant components
However, in the extremely unlikely event of a complete ELB failure:
# Example of checking instance health directly
import boto3

ec2 = boto3.client('ec2')
# IncludeAllInstances=True also returns instances that aren't in the "running" state
response = ec2.describe_instance_status(
    InstanceIds=['i-1234567890abcdef0'],
    IncludeAllInstances=True
)
print(response['InstanceStatuses'][0]['InstanceState']['Name'])
The critical point is that your EC2 instances continue running normally. They don't fail just because the load balancer fails. The impact is:
- Existing connections to instances remain active
- New connections can't be established through the failed ELB
- Health checks from the ELB stop reaching your instances
Here's how to implement DNS failover as a backup:
// Route 53 failover configuration example
{
  "Comment": "Failover configuration",
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "example.com",
      "Type": "A",
      "SetIdentifier": "Primary",
      "Failover": "PRIMARY",
      "AliasTarget": {
        "HostedZoneId": "Z3DZXE0EXAMPLE",
        "DNSName": "dualstack.primary-elb-123456789.us-west-2.elb.amazonaws.com",
        "EvaluateTargetHealth": true
      }
    }
  }]
}
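To apply a change batch like this programmatically rather than through the console, one option is the changeResourceRecordSets API; a sketch using the same aws-sdk as above, with a placeholder hosted zone ID:
// Sketch: submit the failover record change to Route 53
const AWS = require('aws-sdk');
const route53 = new AWS.Route53();

const changeBatch = { /* the change batch JSON shown above */ };

route53.changeResourceRecordSets({
  HostedZoneId: 'Z1234567890EXAMPLE', // placeholder: the zone that owns example.com
  ChangeBatch: changeBatch
}).promise()
  .then(res => console.log('Change status:', res.ChangeInfo.Status))
  .catch(console.error);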
To minimize impact from any potential ELB issues:
- Enable cross-zone load balancing
- Distribute instances across multiple AZs
- Implement health checks at both ELB and application level (see the sketch after this list)
- Consider using multiple ELBs in different regions
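For the application-level side of that, a common pattern is a lightweight HTTP endpoint the load balancer can probe. A minimal sketch using Express, which is an assumption here rather than something the original setup specifies:
// Sketch: application-level health endpoint for the ELB health check to hit
const express = require('express');
const app = express();

app.get('/health', (req, res) => {
  // Return 200 only when the app's own dependencies look healthy;
  // the checks behind this are illustrative and app-specific
  res.status(200).json({ status: 'ok' });
});

app.listen(8080, () => console.log('Health endpoint listening on 8080'));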
Set up CloudWatch alarms to detect ELB issues (substitute your own load balancer name and SNS topic):
aws cloudwatch put-metric-alarm \
  --alarm-name "ELB-Unhealthy-Hosts" \
  --metric-name "UnHealthyHostCount" \
  --namespace "AWS/ELB" \
  --statistic "Maximum" \
  --dimensions Name=LoadBalancerName,Value=my-load-balancer \
  --period 60 \
  --threshold 0 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 1 \
  --alarm-actions "arn:aws:sns:us-west-2:123456789012:my-sns-topic"
For critical applications, consider implementing these backup access methods:
- Direct instance access via SSH/RDP (with proper security groups; see the sketch after this list)
- Secondary ELB in another region
- API Gateway direct integration
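For the direct-access option, you might keep an emergency ingress rule ready to apply. A sketch with a placeholder security group ID and CIDR; keep the range as narrow as possible and revoke it afterwards:
// Sketch: temporarily allow SSH from an admin CIDR for direct instance access
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2();

ec2.authorizeSecurityGroupIngress({
  GroupId: 'sg-0123456789abcdef0', // placeholder security group ID
  IpPermissions: [{
    IpProtocol: 'tcp',
    FromPort: 22,
    ToPort: 22,
    IpRanges: [{ CidrIp: '203.0.113.0/24', Description: 'Emergency admin access' }]
  }]
}).promise()
  .then(() => console.log('SSH ingress added - revoke it when finished'))
  .catch(console.error);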