Scalable HAProxy Cluster Deployment on AWS EC2: High Availability and Layer 7 Routing Solutions



When ELB's limited L7 capabilities don't meet your needs, a custom HAProxy cluster on EC2 offers superior flexibility. The challenge lies in creating a self-healing infrastructure that:

  • Automatically detects node failures
  • Distributes traffic evenly across AZs
  • Handles dynamic cluster membership changes

Here's the stack I've successfully deployed for high-traffic services:

HAProxy Cluster (3+ nodes) → Keepalived (VRRP) → AWS Route53 (DNS failover)
│
├── Auto Scaling Group (cross-AZ)
├── EC2 User Data for bootstrap
└── CloudWatch for health checks

Keepalived Configuration (for VIP management):

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        chk_haproxy
    }
}
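One EC2-specific caveat: a VPC only delivers traffic to addresses that are actually assigned to an instance's ENI, so the VRRP takeover alone won't move the VIP. A common companion is a notify_master script that reassigns an Elastic IP (or a secondary private IP) through the AWS API whenever a node becomes MASTER. Below is a minimal sketch, assuming the instance profile allows ec2:AssociateAddress and that the allocation ID is staged in /etc/keepalived/eip-allocation-id (a hypothetical path); wire it up with a notify_master "/etc/keepalived/failover.sh" line inside vrrp_instance VI_1.

#!/bin/bash
# /etc/keepalived/failover.sh (hypothetical path) - run by keepalived on MASTER transition.
# Reassociates the cluster's Elastic IP with this node via the EC2 API.
set -euo pipefail

ALLOCATION_ID=$(cat /etc/keepalived/eip-allocation-id)   # e.g. an eipalloc-... ID (assumption)
# IMDSv1 shown for brevity; fetch an IMDSv2 token first if your fleet enforces it.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

aws ec2 associate-address \
    --allocation-id "$ALLOCATION_ID" \
    --instance-id "$INSTANCE_ID" \
    --allow-reassociation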

For auto-discovery of backend servers, use this HAProxy template with Consul:

backend web_servers
    balance roundrobin
    option httpchk GET /health
    server-template web 1-10 _web._tcp.service.consul resolvers consul resolve-opts allow-dup-ip check
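The server-template line assumes a resolvers section named consul that points HAProxy at the local Consul agent's DNS endpoint (port 8600 by default). A minimal sketch:

resolvers consul
    nameserver consul 127.0.0.1:8600
    accepted_payload_size 8192
    hold valid 5s

Because the template resolves SRV records, the backend ports should come from Consul as well, so none are hard-coded above.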

CloudFormation snippet for the Launch Template:

UserData: 
  Fn::Base64: |
    #!/bin/bash
    yum install -y haproxy keepalived
    aws s3 cp s3://my-config-bucket/haproxy.cfg /etc/haproxy/
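    # Assumption: keepalived.conf is staged in the same bucket and the instance
    # profile grants s3:GetObject; the line above only copies the HAProxy config.
    aws s3 cp s3://my-config-bucket/keepalived.conf /etc/keepalived/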
    systemctl enable --now haproxy keepalived

Essential CloudWatch metrics to alert on (a collection sketch follows this list):

  • HAProxy queue depth > 10
  • Active connections > 80% of maxconn
  • 5xx error rate > 1% for 5 minutes
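HAProxy doesn't publish any of these to CloudWatch by itself, so each node needs a small exporter. A minimal sketch, assuming the stats socket is enabled in haproxy.cfg (stats socket /var/run/haproxy.sock) and that socat and the AWS CLI are on the box; run it from cron or a systemd timer:

#!/bin/bash
# Push HAProxy queue depth and current sessions to CloudWatch custom metrics.
# In "show stat" CSV output, field 2 = svname, field 3 = qcur, field 5 = scur.

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
STATS=$(echo "show stat" | socat stdio /var/run/haproxy.sock)

QUEUE=$(echo "$STATS"    | awk -F, '$2=="BACKEND"  {q+=$3} END {print q+0}')
SESSIONS=$(echo "$STATS" | awk -F, '$2=="FRONTEND" {s+=$5} END {print s+0}')

aws cloudwatch put-metric-data --namespace HAProxy \
    --metric-name QueueDepth --value "$QUEUE" \
    --dimensions InstanceId="$INSTANCE_ID"

aws cloudwatch put-metric-data --namespace HAProxy \
    --metric-name CurrentSessions --value "$SESSIONS" \
    --dimensions InstanceId="$INSTANCE_ID"

The same CSV also exposes per-backend hrsp_5xx counters if you prefer to compute the error rate on the node instead of from logs.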

Critical findings from our 6-node cluster handling 50K RPS:

  1. Always set maxconn 20% below instance limits
  2. Use TCP health checks for L4, HTTP for L7
  3. Enable option log-health-checks for debugging

Building a highly available HAProxy cluster on AWS EC2 requires careful consideration of several components. The core challenge lies in maintaining continuous traffic flow while handling node failures and scaling events.

# Infrastructure components needed:
1. HAProxy nodes (min 3 for quorum)
2. IPVS load balancer layer
3. Route 53 for DNS failover
4. Auto Scaling Groups for node management
5. AWS Lambda for configuration updates

IPVS (IP Virtual Server) acts as our frontend load balancer, directing traffic to healthy HAProxy nodes. Here's a sample configuration:

# Sample IPVS configuration
ipvsadm -A -t 192.168.0.1:80 -s rr
ipvsadm -a -t 192.168.0.1:80 -r 10.0.1.10:80 -g
ipvsadm -a -t 192.168.0.1:80 -r 10.0.2.10:80 -g
ipvsadm -a -t 192.168.0.1:80 -r 10.0.3.10:80 -g
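Because -g selects direct routing, every HAProxy real server must also hold the VIP on its loopback and stay silent for ARP on it, or return traffic breaks. A sketch of the real-server side, using the VIP from the example above; note that inside a VPC you typically also have to make the VIP routable (for example a route-table entry pointing at the director's ENI and source/destination checks disabled), which ipvsadm itself does not handle:

#!/bin/bash
# LVS-DR real-server setup on each HAProxy node: own the VIP locally,
# but never answer ARP for it (the director keeps ownership on the wire).
ip addr add 192.168.0.1/32 dev lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.lo.arp_ignore=1
sysctl -w net.ipv4.conf.lo.arp_announce=2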

For the HAProxy layer, we recommend using keepalived with a floating VIP. This example shows the MASTER node of a 3-node cluster; the BACKUP nodes reuse the same file with a lower priority (see the sketch after the block):

# /etc/keepalived/keepalived.conf (node1)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        192.168.0.1/24 dev eth0
    }
}
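The BACKUP nodes change only the state and priority; a sketch for node2 (node3 would use priority 99):

# /etc/keepalived/keepalived.conf (node2)
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        192.168.0.1/24 dev eth0
    }
}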

To handle dynamic EC2 environments, implement automatic node discovery using AWS APIs:

import boto3

def get_haproxy_nodes():
    ec2 = boto3.client('ec2')
    response = ec2.describe_instances(
        Filters=[
            {'Name': 'tag:Role', 'Values': ['haproxy']},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )
    return [i['PrivateIpAddress'] for r in response['Reservations'] for i in r['Instances']]

Distribute nodes across availability zones using AWS Auto Scaling Groups:

{
    "AutoScalingGroupName": "haproxy-cluster",
    "AvailabilityZones": ["us-east-1a", "us-east-1b", "us-east-1c"],
    "LaunchConfigurationName": "haproxy-launch-config",
    "MinSize": 3,
    "MaxSize": 6,
    "DesiredCapacity": 3,
    "HealthCheckType": "ELB",
    "Tags": [
        {
            "Key": "Role",
            "Value": "haproxy",
            "PropagateAtLaunch": true
        }
    ]
}

Implement comprehensive health checks combining EC2 status checks and application-level monitoring:

#!/bin/bash
# Health check script for HAProxy
if curl -s http://localhost:8080/health | grep -q "healthy"; then
    exit 0
else
    exit 1
fi
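The script assumes something on the node answers /health on port 8080 with a body containing "healthy". One way to provide that from HAProxy itself (2.2 or newer, where http-request return is available) is a tiny dedicated frontend; a sketch:

frontend health-in
    bind *:8080
    mode http
    http-request return status 200 content-type text/plain string "healthy" if { path /health }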

While SSL can be terminated elsewhere in the stack, here's how to configure HAProxy for SSL termination:

frontend https-in
    bind *:443 ssl crt /etc/ssl/certs/mydomain.pem
    mode http
    default_backend servers

backend servers
    balance roundrobin
    server web1 10.0.1.20:80 check
    server web2 10.0.2.20:80 check
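If HAProxy terminates TLS on 443, it usually also owns port 80 and redirects plain HTTP; a minimal companion frontend (not part of the original config, shown as a sketch):

frontend http-in
    bind *:80
    mode http
    http-request redirect scheme https code 301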