Scalable Load Balancer Architectures: Handling 2M+ Persistent HTTP Connections in Distributed Systems



Traditional single-instance load balancers (like HAProxy or Nginx) typically max out around 50k-100k concurrent connections, even on beefy hardware. When facing web-scale traffic (think 2M+ persistent WebSocket connections), we need horizontal scaling strategies.
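
Before scaling out, it's worth confirming the per-instance ceiling isn't just an OS default, since it is often a kernel limit rather than a proxy limit. A sketch of the usual knobs (values are illustrative, not tuned recommendations):

# /etc/sysctl.d/99-loadbalancer.conf (illustrative values)
fs.file-max = 2097152                        # system-wide file descriptor cap
net.core.somaxconn = 65535                   # listen backlog ceiling
net.ipv4.ip_local_port_range = 1024 65535    # ephemeral ports for upstream legs
net.netfilter.nf_conntrack_max = 2097152     # only if conntrack/NAT is in the path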

The first tier of scaling involves DNS round-robin with multiple A/AAAA records:


example.com.  300 IN  A  192.0.2.1
example.com.  300 IN  A  192.0.2.2
example.com.  300 IN  A  192.0.2.3
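
Most resolvers rotate the order of the returned records between queries, which you can check directly (the ordering shown is illustrative):

$ dig +short A example.com
192.0.2.3
192.0.2.1
192.0.2.2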

Combine this with health checks using Route53's latency-based routing or Cloudflare's Load Balancer for automatic failover.
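
As a minimal sketch, a Route53 health check for one of the records above can be created with the AWS CLI; the /healthz path, port, and thresholds here are assumptions:

aws route53 create-health-check \
  --caller-reference lb-192-0-2-1 \
  --health-check-config '{
    "IPAddress": "192.0.2.1",
    "Port": 443,
    "Type": "HTTPS",
    "ResourcePath": "/healthz",
    "RequestInterval": 30,
    "FailureThreshold": 3
  }'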

For global applications, implement Anycast BGP routing:


# Sample BGP configuration (Juniper syntax)
protocols {
    bgp {
        group anycast-group {
            type external;
            local-address 203.0.113.1;
            neighbor 192.88.99.1 {
                peer-as 64496;
            }
            export [ advertise-anycast ];
        }
    }
}
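
The advertise-anycast export policy that configuration references would look something like this; the /24 anycast prefix is an assumption for illustration:

# Companion export policy (anycast prefix assumed)
policy-options {
    policy-statement advertise-anycast {
        term anycast {
            from {
                route-filter 198.51.100.0/24 exact;
            }
            then accept;
        }
        then reject;
    }
}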

Modern solutions employ these architectural patterns:

  • Active-Active HAProxy: using keepalived's VRRP implementation for floating virtual IPs (sketched below)
  • Nginx Plus Clustering: With shared memory zone sync
  • Envoy Mesh: Sidecar proxy auto-scaling
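
Here's a minimal keepalived sketch for the active-active HAProxy pattern; the interface name, router IDs, and virtual IPs are assumptions. Each node is MASTER for one VIP and BACKUP for the other, so both carry traffic until one fails:

# /etc/keepalived/keepalived.conf on lb1
# (lb2 mirrors this with the priorities swapped)
vrrp_instance VIP_1 {
    state MASTER
    interface eth0              # assumed NIC name
    virtual_router_id 51
    priority 150
    advert_int 1
    virtual_ipaddress {
        192.0.2.100/24
    }
}

vrrp_instance VIP_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.101/24
    }
}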

For cloud-native deployments, here's a sample k8s configuration:


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scalable-ingress
  annotations:
    # consistent hashing: requests for the same URI always land on the same upstream pod
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webapp-service
            port:
              number: 80

To distribute 2M+ connections, you can shard new flows across backends at the packet-filter level:


# TCP sharding with iptables. The nat PREROUTING chain only sees the
# first packet of a connection, so conntrack pins the rest of each
# connection to the backend chosen here.
iptables -t nat -A PREROUTING -p tcp --dport 443 \
  -m statistic --mode nth --every 3 --packet 0 \
  -j DNAT --to-destination 10.0.1.1:443    # 1 of every 3 new connections

iptables -t nat -A PREROUTING -p tcp --dport 443 \
  -m statistic --mode nth --every 2 --packet 0 \
  -j DNAT --to-destination 10.0.1.2:443    # half of the remainder

# the final third; without this rule, a third of connections are never DNAT'd
iptables -t nat -A PREROUTING -p tcp --dport 443 \
  -j DNAT --to-destination 10.0.1.3:443

Key signals to monitor at this scale:

Metric                 Threshold    Tool
New connections/sec    >10k         Netdata
SSL handshakes/sec     >5k          Prometheus
HTTP/2 streams         >100/conn    Envoy stats

Modern web applications often face the challenge of handling massive connection volumes. While a single load balancer (LB) works well for typical workloads, extreme cases like 2 million persistent HTTP connections require a distributed approach. The fundamental question becomes: how do we scale out load balancers while maintaining performance and reliability?

Here are the most effective approaches for scaling load balancers:

  1. DNS Round Robin: Distributes traffic across multiple LB instances
  2. Anycast Routing: Uses BGP to route to the nearest LB instance
  3. L4/L7 Cluster: Creates a coordinated group of load balancers

Here's a configuration example for a scalable HAProxy setup:


# haproxy.cfg - Frontend configuration
frontend http-in
    bind *:80
    mode http
    default_backend servers

# Backend tier fanning out to the next hop of load balancers
backend servers
    mode http                  # backends default to TCP mode; must match the frontend
    balance roundrobin
    server lb1 192.168.1.10:80 check
    server lb2 192.168.1.11:80 check
    server lb3 192.168.1.12:80 check

For extreme scalability, consider these approaches:

  • BGP Anycast: Announce the same IP from multiple locations
  • ECMP Routing: Equal-cost multi-path routing at the network layer (see the sketch after this list)
  • Consistent Hashing: For persistent connection distribution
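
As a sketch of the ECMP approach, Linux can spread flows across several next hops for a single service IP; all addresses here are illustrative:

# Route the service VIP via three equal-cost next hops; the kernel
# hashes each flow onto one of them
ip route add 203.0.113.10/32 \
    nexthop via 10.0.1.1 weight 1 \
    nexthop via 10.0.1.2 weight 1 \
    nexthop via 10.0.1.3 weight 1

# Hash on L4 ports too, so individual flows (not just host pairs)
# spread evenly (Linux 4.12+)
sysctl -w net.ipv4.fib_multipath_hash_policy=1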

Implement robust monitoring to handle dynamic scaling:


# Sample Prometheus config for LB monitoring
scrape_configs:
  - job_name: 'haproxy'
    static_configs:
      - targets: ['lb1:9101', 'lb2:9101', 'lb3:9101']
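
Pairing the scrape config with an alert tied to the thresholds above closes the loop. A sketch of a rules file; the metric name is the one exposed by the community haproxy_exporter, so verify it against your exporter version:

# lb-alerts.yml, loaded via rule_files in prometheus.yml
groups:
  - name: lb-alerts
    rules:
      - alert: HighNewConnectionRate
        expr: sum(rate(haproxy_frontend_connections_total[1m])) > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "New connection rate above 10k/s across the LB fleet"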

Remember that the optimal solution depends on your specific traffic patterns, protocol requirements, and infrastructure constraints.