Optimal Strategies for Load Balancing Long-Lived TCP Connections with Session Stickiness


When dealing with long-running TCP connections (typically 10-15 hours) where clients establish multiple concurrent sockets, traditional load balancing approaches often fall short. The key requirements are:

  • Persistent connection handling (15KB/s sustained traffic per socket)
  • Session stickiness for related connections (3 sockets per client)
  • Health monitoring for backend servers
  • Minimal connection disruption during failovers

Despite initial doubts, HAProxy is actually well-suited for this use case when properly configured. Here's a sample configuration that addresses the core requirements:


frontend tcp_frontend
    bind *:80
    mode tcp
    timeout client 15h
    default_backend tcp_backend

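backend tcp_backend
    mode tcp
    balance leastconn
    timeout connect 5s
    timeout server 15h
    stick-table type ip size 200k expire 16h
    stick on src
    # maxconn is sized so that one server can absorb all ~900 connections
    # if the other fails; at 300 each, the excess would wait in the queue
    server server1 192.168.1.10:80 check maxconn 1000
    server server2 192.168.1.11:80 check maxconn 1000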

The critical components that make this work:


# Session stickiness based on source IP
stick-table type ip size 200k expire 16h
stick on src

# Inactivity timeouts set long enough to outlast the longest expected connection
timeout client 15h
timeout server 15h

# Connection-based load balancing
balance leastconn
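
To make the interaction between `stick on src` and `balance leastconn` concrete, here is a minimal sketch in Python. It is a toy model, not HAProxy's actual implementation: the class name `StickyLeastConn` and the client addresses are made up for illustration, and it ignores entry expiry, health checks, and disconnects.

```python
from collections import defaultdict


class StickyLeastConn:
    """Toy model of `balance leastconn` combined with `stick on src`."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}  # open connections per server
        self.stick = {}  # source IP -> server name (stands in for the stick table)

    def connect(self, src_ip):
        server = self.stick.get(src_ip)
        if server is None:
            # No stick entry yet: choose the server with the fewest connections
            # and remember the choice for this source IP.
            server = min(self.active, key=self.active.get)
            self.stick[src_ip] = server
        self.active[server] += 1
        return server


lb = StickyLeastConn(["server1", "server2"])
placements = defaultdict(set)  # source IP -> set of servers its sockets reached

# 300 hypothetical clients, each opening 3 sockets.
for client in range(300):
    ip = f"10.0.{client // 250}.{client % 250 + 1}"
    for _ in range(3):
        placements[ip].add(lb.connect(ip))

print(lb.active)
print(all(len(servers) == 1 for servers in placements.values()))
```

In this model every client's three sockets end up on a single server, while new clients are still spread by connection count.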

For ~900 total connections (300 clients × 3 sockets):

  • HAProxy can easily handle this load (tested up to 50k concurrent connections on modest hardware)
  • Memory usage will be minimal (~5MB for stick tables)
  • CPU utilization typically under 5% for this traffic pattern

If you prefer kernel-level balancing, Linux Virtual Server (LVS) with persistence is an alternative:


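# Linux Virtual Server (LVS) with persistence
# -p (persistence timeout in seconds, 54000 = 15h) is set on the virtual service, not on the real servers
ipvsadm -A -t 203.0.113.1:80 -s lc -p 54000
ipvsadm -a -t 203.0.113.1:80 -r 192.168.1.10:80 -m
ipvsadm -a -t 203.0.113.1:80 -r 192.168.1.11:80 -m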

Or using Nginx TCP load balancing:


stream {
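    upstream backend {
        # Hashing on the client address keeps a client's sockets on one server;
        # least_conn on its own provides no session affinity
        hash $remote_addr consistent;
        server 192.168.1.10:80;
        server 192.168.1.11:80;
    }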

    server {
        listen 80;
        proxy_pass backend;
        proxy_timeout 15h;
    }
}

Essential metrics to track:

  • Active connections per backend
  • Session table utilization
  • Connection duration distribution
  • Health check failures
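
As a rough sketch of how the first and last of these metrics might be collected, the Python below parses HAProxy's CSV statistics format (the output of `show stat` on the stats socket, assuming one is configured). The `SAMPLE` text is an illustrative, heavily trimmed stand-in: real output carries many more columns, and the numbers here are invented. The helper name `per_server_metrics` is hypothetical.

```python
import csv
import io

# Illustrative sample only: trimmed to a few columns, with made-up values.
SAMPLE = """# pxname,svname,scur,smax,status,chkfail
tcp_backend,server1,450,460,UP,0
tcp_backend,server2,448,455,UP,2
tcp_backend,BACKEND,898,915,UP,
"""


def per_server_metrics(csv_text):
    """Return {(proxy, server): metrics} for real servers in the CSV text."""
    # HAProxy prefixes the header row with "# "; strip it so that
    # DictReader sees clean column names.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("# ")))
    metrics = {}
    for row in reader:
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        metrics[(row["pxname"], row["svname"])] = {
            "active": int(row["scur"]),                 # current sessions
            "status": row["status"],                    # e.g. UP or DOWN
            "check_failures": int(row["chkfail"] or 0), # failed health checks
        }
    return metrics


for key, values in per_server_metrics(SAMPLE).items():
    print(key, values)
```

Session table utilization is not part of this output; it is reported separately by the `show table` command on the same socket.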

When dealing with persistent TCP connections lasting 10-15 hours, traditional load balancing approaches often fall short. The current client-side round-robin implementation with multiple public IPs creates several pain points:

  • No real-time server health monitoring
  • No intelligent traffic distribution
  • Difficulty maintaining session affinity across multiple ports

Despite initial doubts, HAProxy is a good fit for exactly this use case. A sample configuration:

# Sample HAProxy configuration for persistent TCP connections
frontend tcp_front
    bind *:80
    mode tcp
    option tcplog
    timeout client 15h
    default_backend tcp_back

backend tcp_back
    mode tcp
    balance leastconn
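    # expire is set longer than the 15h connection lifetime so that sockets
    # opened later by the same client still find their stick entry
    stick-table type ip size 200k expire 16h
    stick on src
    timeout connect 5s
    timeout server 15h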
    server server1 192.168.1.10:80 check
    server server2 192.168.1.11:80 check
    server server3 192.168.1.12:80 check

The key settings are:

  • stick-table: Maintains the client-IP-to-server mapping
  • stick on src: Sends every connection from the same source IP to the same backend (clients sharing a NAT address will also land together)
  • 15h timeouts: Inactivity timeouts long enough that connections lasting 10-15 hours are not cut off
  • leastconn: Distributes new clients by current connection count, which suits long-lived connections better than round-robin

For the 3-connection requirement, consider either:

# Option 1: Separate frontends with shared stick table
frontend port1
    bind *:8001
    # ... same config as above ...
    use_backend tcp_back

frontend port2
    bind *:8002
    # ... same config as above ...
    use_backend tcp_back

Or implement port ranges:

# Option 2: Port range binding
frontend tcp_range
    bind *:8001-8003
    # ... rest of config ...

At ~300 clients with 3 connections each (900 total) and 15KB/s per connection:

  • Network throughput: ~13.5MB/s aggregate - trivial for modern hardware
  • Connection count: Well below HAProxy's limits (typically handles 10k+ connections)
  • Memory usage: ~2MB for stick tables at this scale
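
The throughput and connection figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in this article plus the three-server backend from the sample configuration; the single-failure case is an added assumption for sizing headroom.

```python
CLIENTS = 300
SOCKETS_PER_CLIENT = 3
KB_PER_SEC_PER_SOCKET = 15
SERVERS = 3  # as in the three-server sample backend

total_connections = CLIENTS * SOCKETS_PER_CLIENT
aggregate_mb_per_sec = total_connections * KB_PER_SEC_PER_SOCKET / 1000
aggregate_mbit_per_sec = aggregate_mb_per_sec * 8

# Even spread in normal operation, and after losing one server (assumed scenario).
per_server_normal = total_connections / SERVERS
per_server_one_down = total_connections / (SERVERS - 1)

print(f"connections:          {total_connections}")
print(f"aggregate throughput: {aggregate_mb_per_sec} MB/s "
      f"({aggregate_mbit_per_sec} Mbit/s)")
print(f"per server (all up):  {per_server_normal:.0f}")
print(f"per server (one down): {per_server_one_down:.0f}")
```

The per-server numbers are the useful ones when choosing a `maxconn` value: each server should be able to accept its share after a failure, not just the share it carries when everything is healthy.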

While HAProxy is my top recommendation, other options include:

  1. Nginx: TCP load balancing via the stream module; active health checks require the commercial Nginx Plus
  2. AWS Network Load Balancer: For cloud deployments
  3. Keepalived + IPVS: For Linux-based solutions