When building fault-tolerant systems on AWS EC2, traditional HA solutions like Corosync run into trouble in their classic multicast configuration. An EC2 VPC does not deliver multicast traffic between instances by default, so cluster members have to talk to each other over unicast, which pushes us toward alternative clustering technologies.
HashiCorp's Consul provides a robust alternative with native unicast support. Here's a basic service definition that registers HAProxy together with a health check:
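# Script checks only run if the agent sets enable_local_script_checks = true
services {
  name = "haproxy"
  port = 80

  check {
    # curl -f exits non-zero when the health endpoint returns an HTTP error
    args     = ["curl", "-f", "http://localhost:80/health"]
    interval = "10s"
    timeout  = "1s"
  }
}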
Key advantages include:
- Gossip-based membership protocol (Serf) works in unicast environments
- Built-in DNS interface for service discovery
- Automatic health checking and failover
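To illustrate how a client could consume Consul's health information, here is a minimal Python sketch that asks a local agent for the passing instances of a service through the HTTP API (`/v1/health/service/<name>`). The agent address and the helper names `healthy_endpoints` and `query_healthy` are assumptions for illustration, not part of the original setup.

```python
import json
import urllib.request

# Assumed: a Consul agent listening on its default HTTP port on this host.
CONSUL_URL = "http://127.0.0.1:8500"


def healthy_endpoints(entries):
    """Extract (address, port) pairs from a /v1/health/service response body."""
    endpoints = []
    for entry in entries:
        service = entry["Service"]
        # Service.Address may be empty; fall back to the node's address.
        address = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, service["Port"]))
    return endpoints


def query_healthy(service_name, base_url=CONSUL_URL):
    """Return endpoints of instances whose health checks are all passing."""
    url = f"{base_url}/v1/health/service/{service_name}?passing=true"
    with urllib.request.urlopen(url, timeout=2) as response:
        return healthy_endpoints(json.load(response))
```

A load balancer template or a small failover script could call `query_healthy("haproxy")` and route only to the returned endpoints.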
For simpler VIP failover scenarios, Keepalived offers a unicast-compatible solution. Example configuration for EC2:
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100

    # Send VRRP adverts directly to the other nodes instead of multicast
    unicast_peer {
        10.0.1.5
        10.0.1.6
    }

    virtual_ipaddress {
        10.0.1.100/24
    }
}
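Note: Timers need careful tuning for cloud environments. On EC2, the VPC does not redirect traffic to an address just because a node announces it with gratuitous ARP, so a failover also has to move the address (a secondary private IP or an Elastic IP) through the AWS API, typically from a Keepalived notify script.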
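One way to make the virtual IP actually follow the MASTER role on EC2 is a notify hook that reassigns the address through the AWS API. The Python sketch below assumes `boto3` is installed, that the instance has IAM permission for `AssignPrivateIpAddresses`, and that `ENI_ID` is replaced with the node's real network interface ID; the script itself and the name `handle_transition` are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Keepalived notify hook: claim the VIP when this node becomes MASTER."""
import sys

VIP = "10.0.1.100"                 # virtual address from the configuration above
ENI_ID = "eni-0123456789abcdef0"   # placeholder: this node's network interface ID


def handle_transition(state, ec2_client, eni_id=ENI_ID, vip=VIP):
    """Move the VIP to this node's interface on MASTER; do nothing otherwise.

    Returns True if a reassignment was requested.
    """
    if state != "MASTER":
        return False
    ec2_client.assign_private_ip_addresses(
        NetworkInterfaceId=eni_id,
        PrivateIpAddresses=[vip],
        AllowReassignment=True,  # take the address even if a peer still holds it
    )
    return True


def main(argv):
    # Keepalived invokes notify scripts as:
    #   <script> <GROUP|INSTANCE> <name> <state> <priority>
    state = argv[3] if len(argv) > 3 else ""
    if state != "MASTER":
        return
    import boto3  # third-party; imported only when a reassignment is needed
    handle_transition(state, boto3.client("ec2"))


if __name__ == "__main__":
    main(sys.argv)
```

Keepalived would call such a script through a `notify` directive inside the `vrrp_instance` block; the script path is up to you.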
For AWS-specific deployments, consider these integrated options:
- ALB/ELB: For stateless services, use target group health checks
- Route 53: DNS-based failover with health checking
- EKS/ECS: Container orchestration with built-in service recovery
When evaluating alternatives, compare them on these factors:
| Solution | Quorum Support | Automation Friendly | State Handling |
|---|---|---|---|
| Consul | Yes | Excellent | Limited |
| Keepalived | No | Moderate | Basic |
| Route 53 | N/A | Excellent | None |
For Solr specifically, consider combining ZooKeeper (which SolrCloud already uses for cluster coordination) with custom health checks rather than traditional HA clustering.
When implementing high availability on AWS EC2, we immediately hit the multicast limitation. Traditional solutions like Corosync are non-starters in their default multicast configuration. Heartbeat+Pacemaker works over unicast, but it presents operational challenges in dynamic cloud environments, such as hardcoded peer addresses:
# Typical Heartbeat config limitation (two-node only)
autojoin none
ucast eth0 10.0.1.2 # Hardcoded peer IP
After extensive testing across multiple AWS regions, these alternatives proved most effective for unicast environments:
1. Keepalived with VRRP
While traditionally used for L4 failover, Keepalived can manage service-level HA when combined with custom scripts:
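vrrp_script chk_haproxy {
    # Exits 0 while an haproxy process exists
    script "killall -0 haproxy"
    # Run the check every 2 seconds
    interval 2
    # Add 2 to the priority while the check passes
    weight 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101

    # Needed on EC2, where multicast adverts are not delivered.
    # 10.0.1.6 is an assumed peer address; list your other nodes here.
    unicast_peer {
        10.0.1.6
    }

    virtual_ipaddress {
        10.0.1.100/24 dev eth0
    }

    track_script {
        chk_haproxy
    }
}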
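The interaction between `priority` and the script `weight` is easy to get wrong, so here is a small worked example in Python. It models how Keepalived adjusts priority for a tracked script with a non-zero weight; the backup node's base priority of 100 is an assumption, since only the primary's configuration is shown above.

```python
def effective_priority(base, weight, check_ok):
    """Model Keepalived's priority adjustment for one tracked script.

    Positive weight: added while the check passes.
    Negative weight: added (lowering priority) while the check fails.
    A weight of 0 is not modeled here; in that case a failing check puts
    the instance into the FAULT state instead of adjusting priority.
    """
    if weight > 0:
        return base + weight if check_ok else base
    if weight < 0:
        return base if check_ok else base + weight
    return base


# Primary (priority 101) and an assumed backup (priority 100), both with weight 2.
primary_healthy = effective_priority(101, 2, check_ok=True)
primary_failed = effective_priority(101, 2, check_ok=False)
backup_healthy = effective_priority(100, 2, check_ok=True)

# Failover only happens if the healthy backup outranks the failed primary,
# i.e. the weight must exceed the gap between the base priorities.
failover_triggers = backup_healthy > primary_failed
```

With these numbers the healthy primary sits at 103, the failed primary drops to 101, and the healthy backup at 102 takes over. If the base priorities were 110 and 100, a weight of 2 would never trigger a failover.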
2. Consul with Health Checks
HashiCorp's Consul provides service discovery with native health checking capabilities. This example registers Solr with an HTTP health check, which is the basis for failing over to healthy instances:
service {
  name = "solr"
  port = 8983

  check = {
    id       = "solr-http"
    name     = "Solr HTTP Check"
    # On multi-core installs the ping handler is per core: /solr/<core>/admin/ping
    http     = "http://localhost:8983/solr/admin/ping"
    interval = "10s"
    timeout  = "1s"
  }
}
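For debugging a check like this outside of Consul, it can help to reproduce its decision locally. The sketch below mirrors how Consul maps HTTP status codes to check states (2xx is passing, 429 is warning, anything else is critical); the function names and the `probe` helper are illustrative assumptions, not Consul APIs.

```python
import urllib.error
import urllib.request


def classify_http_check(status_code):
    """Map an HTTP status code to a Consul-style check state."""
    if 200 <= status_code <= 299:
        return "passing"
    if status_code == 429:
        return "warning"
    return "critical"


def probe(url, timeout=1.0):
    """Fetch `url` once and classify the result like an HTTP health check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_http_check(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return classify_http_check(err.code)
    except OSError:
        # Connection refused, DNS failure, or timeout.
        return "critical"
```

Running `probe("http://localhost:8983/solr/admin/ping")` on a node shows roughly what the agent would report for the check defined above.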
3. Patroni for PostgreSQL-like Services
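Patroni itself only manages PostgreSQL, but its architecture pattern carries over to other stateful services: each node competes for a leader key with a TTL in a distributed configuration store such as etcd. The configuration below illustrates the pieces involved: a cluster scope, a REST API for health probes, and the etcd-backed leader key settings: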
scope: solr_cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.1:8008

etcd:
  hosts: 10.0.1.1:2379,10.0.1.2:2379,10.0.1.3:2379

bootstrap:
  dcs:
    ttl: 30
    retry_timeout: 10
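The leader-key pattern behind the `ttl` setting can be sketched in a few lines of Python. This is a single-process, in-memory stand-in for the configuration store, written only to show the lease logic; the `LeaseStore` class is hypothetical, and a real deployment would rely on etcd's atomic operations rather than this code.

```python
import time


class LeaseStore:
    """In-memory stand-in for a leader key with a TTL in a configuration store."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock        # injectable so the logic can be tested
        self.holder = None
        self.expires_at = 0.0

    def acquire_or_renew(self, node):
        """Return True if `node` holds the leader lease after this call."""
        now = self.clock()
        lease_free = self.holder is None or now >= self.expires_at
        if lease_free or self.holder == node:
            # Either take over an expired lease or extend our own.
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False
```

Each node would call `acquire_or_renew` on a loop shorter than the TTL. The leader keeps its role as long as it renews in time; if it stops (crash, network partition), the lease expires and another node wins the next attempt.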
The true test of any HA solution in cloud environments is its ability to be fully automated. Terraform combined with cloud-init provides the most robust approach:
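resource "aws_instance" "ha_node" {
  count         = 3
  ami           = var.ami_id        # var.* inputs are assumed to be declared elsewhere
  instance_type = var.instance_type
  subnet_id     = var.subnet_id

  # A resource cannot reference its own instances, so node addresses are
  # derived from the subnet CIDR instead of aws_instance.ha_node.*.private_ip.
  private_ip = cidrhost(var.subnet_cidr, 10 + count.index)

  user_data = templatefile("${path.module}/cloud-init.yaml", {
    node_index         = count.index
    cluster_nodes      = [for i in range(3) : cidrhost(var.subnet_cidr, 10 + i)]
    service_definition = jsonencode({ service = { name = "solr", port = 8983 } })
  })
}

#cloud-config
# File: cloud-init.yaml (the #cloud-config header must stay on the first line)
packages:
  - consul
  - haproxy
write_files:
  - path: /etc/consul.d/solr.json
    content: |
      ${service_definition}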
Each solution presents different characteristics:
| Solution | Service Types | Node Limit | Learning Curve |
|---|---|---|---|
| Keepalived | L4/L7 | 255 | Low |
| Consul | Any | Unlimited | Medium |
| Patroni | Stateful | Unlimited | High |
The choice ultimately depends on your specific service requirements and operational constraints. For most AWS deployments, Consul-based solutions provide the best balance of flexibility and reliability.