Traditional single-instance load balancers (like HAProxy or Nginx) hit a practical ceiling, commonly in the 50k-100k concurrent-connection range without aggressive tuning of file descriptors, kernel buffers, and TLS session handling, even on beefy hardware. When facing web-scale traffic (think 2M+ persistent WebSocket connections), we need horizontal scaling strategies.
The first tier of scaling involves DNS round-robin with multiple A/AAAA records:
example.com. 300 IN A 192.0.2.1
example.com. 300 IN A 192.0.2.2
example.com. 300 IN A 192.0.2.3
Combine this with health checks, using Route53 routing policies (latency-based or failover) or Cloudflare's Load Balancer, so unhealthy LB instances are pulled from rotation automatically.
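A quick way to sanity-check the rotation once the records above are live (dig ships with bind-utils/dnsutils; successive answers should cycle the address order):

for i in 1 2 3; do dig +short example.com A; done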
For global applications, implement Anycast BGP routing:
# Sample BGP configuration (Juniper syntax)
protocols {
    bgp {
        group anycast-group {
            type external;
            local-address 203.0.113.1;
            export [ advertise-anycast ];
            neighbor 192.88.99.1 {
                peer-as 64496;
            }
        }
    }
}
# Policy referenced above; 198.51.100.0/24 stands in for your anycast prefix
policy-options {
    policy-statement advertise-anycast {
        from {
            route-filter 198.51.100.0/24 exact;
        }
        then accept;
    }
}
Modern solutions employ these architectural patterns:
- Active-Active HAProxy: Using keepalived with VRRP (CARP is the BSD analogue); see the sketch after this list
- Nginx Plus Clustering: With shared memory zone sync
- Envoy Mesh: Sidecar proxy auto-scaling
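A minimal keepalived sketch for the active-active pattern above, assuming two HAProxy nodes on eth0 sharing the virtual IP 192.0.2.10 (interface, VIP, and priorities are placeholders):

# /etc/keepalived/keepalived.conf on node 1
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150            # the peer node runs the same block with a lower priority
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24
    }
}

For true active-active you would typically define a second vrrp_instance with the roles reversed, so each node masters one VIP and backs up the other.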
For cloud-native deployments, here's a sample k8s configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scalable-ingress
  annotations:
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webapp-service
            port:
              number: 80
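To roll this out, save the manifest (the filename below is arbitrary) and confirm the ingress gets provisioned:

kubectl apply -f scalable-ingress.yaml
kubectl get ingress scalable-ingress    # ADDRESS column populates once the controller binds it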
To shard 2M+ connections across three backends at the packet-filter level:
# TCP sharding with iptables: the nth counters cascade, so each rule only
# sees what earlier rules let through. NAT is decided once per connection,
# so established flows stay pinned to their chosen backend.
iptables -t nat -A PREROUTING -p tcp --dport 443 \
  -m statistic --mode nth --every 3 --packet 0 \
  -j DNAT --to-destination 10.0.1.1:443   # 1/3 of new connections
iptables -t nat -A PREROUTING -p tcp --dport 443 \
  -m statistic --mode nth --every 2 --packet 0 \
  -j DNAT --to-destination 10.0.1.2:443   # 1/2 of the remainder = 1/3
iptables -t nat -A PREROUTING -p tcp --dport 443 \
  -j DNAT --to-destination 10.0.1.3:443   # everything left = final 1/3
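To confirm the split is behaving, watch the per-rule packet counters:

iptables -t nat -L PREROUTING -n -v --line-numbers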
Watch these saturation signals when deciding to scale out:

| Metric | Threshold | Tool |
|---|---|---|
| New connections/sec | >10k | Netdata |
| SSL handshakes/sec | >5k | Prometheus |
| HTTP/2 streams | >100/conn | Envoy stats |
Stepping back: while a single load balancer (LB) works well for typical workloads, extreme cases like 2 million persistent HTTP connections require a distributed approach. The fundamental question becomes: how do we scale out load balancers while maintaining performance and reliability?
Here are the most effective approaches for scaling load balancers:
- DNS Round Robin: Distributes traffic across multiple LB instances
- Anycast Routing: Uses BGP to route to the nearest LB instance
- L4/L7 Cluster: Creates a coordinated group of load balancers
Here's a configuration example for a scalable HAProxy setup:
# haproxy.cfg - Frontend configuration
frontend http-in
    bind *:80
    mode http
    default_backend servers

# Backend configuration: a first-tier HAProxy fanning out to second-tier LBs
backend servers
    balance roundrobin
    server lb1 192.168.1.10:80 check
    server lb2 192.168.1.11:80 check
    server lb3 192.168.1.12:80 check
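Before reloading, validate the file (the path is the conventional default; adjust to your install):

haproxy -c -f /etc/haproxy/haproxy.cfg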
For extreme scalability, consider these approaches:
- BGP Anycast: Announce the same IP from multiple locations
- ECMP Routing: Equal-cost multi-path routing at network layer
- Consistent Hashing: For persistent connection distribution (see the sketch after this list)
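A minimal consistent-hash ring sketch in Python (class name, replica count, and node list are illustrative, not from any particular LB): each client key maps to the first node clockwise on the ring, so adding or removing a node only remaps the keys in its arc rather than reshuffling everything.

import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each physical node gets `replicas` virtual points on the ring
        # to smooth out the distribution.
        self.replicas = replicas
        self.ring = {}          # ring position -> node
        self.sorted_keys = []   # sorted ring positions
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.sorted_keys, h)

    def get(self, key):
        # First ring position clockwise of the key's hash; wrap past the end.
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, h) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = HashRing(["10.0.1.1", "10.0.1.2", "10.0.1.3"])
print(ring.get("client-203.0.113.7"))  # the same client always lands on the same LB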
Implement robust monitoring to handle dynamic scaling:
# Sample Prometheus config for LB monitoring
scrape_configs:
  - job_name: 'haproxy'
    static_configs:
      - targets: ['lb1:9101', 'lb2:9101', 'lb3:9101']
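To act on the thresholds from the table above, here is a sketch of a Prometheus alerting rule; the metric name assumes the standard haproxy_exporter, so adjust it to whatever your exporter actually exposes:

# alerts.yml
groups:
  - name: lb-saturation
    rules:
      - alert: HighNewConnectionRate
        expr: sum(rate(haproxy_frontend_connections_total[1m])) > 10000  # assumed exporter metric
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "New connections/sec above 10k across the LB fleet"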
Remember that the optimal solution depends on your specific traffic patterns, protocol requirements, and infrastructure constraints.