Debugging Apache Bench SSL Handshake Failures at High Concurrency Levels (200+ Threads)


When stress-testing HTTPS endpoints with Apache Bench (ab), many developers hit SSL handshake failures that appear only above a specific concurrency threshold. Here the failures start above 155 concurrent connections, so a run with -c 200 reproduces them reliably:

ab -n 1000 -c 200 https://example.com/api/v1/endpoint

The primary culprit is OpenSSL's session cache mechanism. Each SSL handshake requires:

  • TLS version negotiation
  • Cipher suite selection
  • Certificate verification
  • Session ticket generation

At high concurrency, the server's OpenSSL session cache (20,480 sessions by default) can become a bottleneck. To see how far a handshake gets when many connections start at once, trace its state transitions with s_client:

openssl s_client -connect example.com:443 -state -debug

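1. Increase the session cache size on the server:

# OpenSSL's configuration file has no command for the session cache size;
# the server application sets it through SSL_CTX_sess_set_cache_size().
# Raise it in the server's own configuration instead, for example:

# nginx (one megabyte holds roughly 4,000 sessions)
ssl_session_cache shared:SSL:10m;

# Apache httpd with mod_ssl (cache size in bytes)
SSLSessionCache "shmcb:/usr/local/apache2/logs/ssl_gcache_data(512000)"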

2. Use keep-alive connections:

ab -k -n 1000 -c 200 https://example.com

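3. Spread connections across several local proxies:

# Using xinetd to create local proxies. xinetd accepts one port per service,
# so repeat this block with a new service name for each port (5000, 5001, ...).
# Proxies on the same host still share its source IP; run them on separate
# hosts if the server limits connections per client IP.
service example-proxy
{
    type = UNLISTED
    socket_type = stream
    protocol = tcp
    wait = no
    user = nobody
    port = 5000
    server = /usr/bin/nc
    server_args = example.com 443
}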
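
The benefit of keep-alive in step 2 comes down to simple arithmetic, sketched below. The function name estimate_handshakes is made up for this illustration, and the estimate assumes the server honors keep-alive and never closes a connection early, which real servers may not do.

```shell
# estimate_handshakes REQUESTS CONCURRENCY KEEPALIVE
# Rough count of TLS handshakes an ab run performs.
# KEEPALIVE is 1 when ab is run with -k, 0 otherwise.
# Assumption: with keep-alive, each of the CONCURRENCY connections
# handshakes once and is then reused for all of its requests.
estimate_handshakes() {
    local requests=$1 concurrency=$2 keepalive=$3
    if [ "$keepalive" -eq 1 ] && [ "$concurrency" -lt "$requests" ]; then
        echo "$concurrency"
    else
        # Without keep-alive every request opens a new connection
        echo "$requests"
    fi
}

estimate_handshakes 1000 200 0   # prints 1000
estimate_handshakes 1000 200 1   # prints 200
```

Under these assumptions, -k cuts the handshake load for the example run from 1000 to 200.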

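For heavy SSL load testing:

  • wrk2: Multi-threaded, constant-throughput load generator (the binary is still named wrk, and the -R request rate is required)
    wrk -t4 -c200 -d30s -R200 --latency https://example.com
  • vegeta: Constant-rate load testing
    echo "GET https://example.com" | vegeta attack -rate=200 -duration=30s | vegeta report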

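Check whether the server resumes sessions with s_client (cache hit and miss counters are kept by the server, not the client). The -reconnect option reconnects several times with the same session, and each connection is reported as New or Reused:

openssl s_client -connect example.com:443 -reconnect < /dev/null 2>&1 | grep -E "^(New|Reused), "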

Key metrics to watch:

  • Session cache hits/misses
  • Session ID context size
  • Renegotiation requests
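
As a rough client-side proxy for cache hits and misses, the New and Reused lines that s_client prints with -reconnect can be tallied. This is a minimal sketch: summarize_reuse is a hypothetical helper name, and the exact line format may differ between OpenSSL versions.

```shell
# summarize_reuse: read s_client output on stdin and count how many
# connections negotiated a new session versus resumed an existing one.
# Assumes lines of the form "New, TLSv1.2, Cipher is ..." and
# "Reused, TLSv1.2, Cipher is ...".
summarize_reuse() {
    awk '/^New, /    { new++ }
         /^Reused, / { reused++ }
         END         { printf "new=%d reused=%d\n", new, reused }'
}

# Example (hits the network, so it is left commented out):
# openssl s_client -connect example.com:443 -reconnect < /dev/null 2>&1 | summarize_reuse
```

A result with reused=0 suggests the server is not resuming sessions, so every connection pays for a full handshake.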

When running Apache Bench (ab) with more than 155 concurrent connections, I consistently saw SSL handshake failures even though every request ultimately completed:

SSL handshake failed (5).
SSL handshake failed (5).
...
Complete requests:      200
Failed requests:        0
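
To pin down where the failures begin, ab can be run at several concurrency levels while counting the failure lines. The sketch below assumes ab is on the PATH and that failures show up as the "SSL handshake failed" lines above; the function names are hypothetical helpers, not part of ab.

```shell
# Count "SSL handshake failed" lines in ab output read from stdin.
count_handshake_failures() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status
    grep -c 'SSL handshake failed' || true
}

# Run ab once per concurrency level and print "<concurrency> <failures>".
# Usage: sweep_concurrency URL LEVEL...
sweep_concurrency() {
    local url=$1
    shift
    local c
    for c in "$@"; do
        # ab reports handshake errors on stderr, so merge it into stdout
        printf '%s %s\n' "$c" \
            "$(ab -n 1000 -c "$c" "$url" 2>&1 | count_handshake_failures)"
    done
}

# Example (hits the network, so it is left commented out):
# sweep_concurrency https://example.com/api/v1/endpoint 100 150 155 160 200
```

The first level that reports a non-zero count marks the threshold to investigate.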

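The number in parentheses is the result of SSL_get_error(); 5 is SSL_ERROR_SYSCALL, which means the failure happened at the socket level (for example, the peer closed or reset the connection) rather than in the TLS protocol itself. The zero failed requests suggest that the affected connections were reopened and then succeeded. This behavior typically stems from one of these system limitations: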

  • Session cache pressure: The server's session cache may not keep up with many simultaneous new handshakes
  • File descriptor limits: Each connection uses one descriptor; check the limit with ulimit -n and raise it if needed
  • TCP/IP stack limitations: Kernel parameters such as the listen backlog and ephemeral port range may need tuning for many concurrent connections
  • Server-side limitations: The target server may enforce connection or rate limits
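
The file descriptor check can be scripted before a run. This is a minimal sketch: check_fd_limit is a hypothetical helper, and the reserve of 32 descriptors for stdio, logs, and libraries is an assumed safety margin, not a documented ab requirement.

```shell
# check_fd_limit CONCURRENCY [LIMIT]
# Prints "ok" when the descriptor limit leaves room for CONCURRENCY
# sockets plus a small reserve; otherwise explains the shortfall and
# returns 1. LIMIT defaults to the current soft limit (ulimit -n).
check_fd_limit() {
    local concurrency=$1
    local limit=${2:-$(ulimit -n)}
    local reserve=32   # assumed headroom, adjust to taste
    if [ "$limit" = "unlimited" ]; then
        echo "ok"
        return 0
    fi
    if [ "$limit" -ge $((concurrency + reserve)) ]; then
        echo "ok"
    else
        echo "too low: limit=$limit, need at least $((concurrency + reserve))"
        return 1
    fi
}

# Example: check the current shell's limit against a -c 200 run
# check_fd_limit 200
```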

Here are concrete steps to resolve the issue:

# Increase file descriptors for the current shell (Linux/macOS)
ulimit -n 65536

# Use keep-alive to reduce handshakes
ab -n 1000 -c 200 -k https://example.com/

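# Enable session tickets in OpenSSL's configuration (if you control the server).
# Tickets are on by default, so this only matters if they were turned off.
# Add to openssl.cnf; the openssl_conf line must come before any [section]: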
openssl_conf = openssl_init
[openssl_init]
ssl_conf = ssl_sect
[ssl_sect]
system_default = system_default_sect
[system_default_sect]
Options = SessionTicket

For more realistic testing:

# Recommended ab command: -k already sends "Connection: Keep-Alive",
# -l accepts responses of varying length, and -Z pins the cipher suite
# (TLS 1.2 and below)
ab -n 5000 -c 200 \
   -k -l \
   -Z ECDHE-RSA-AES256-GCM-SHA384 \
   https://example.com/api/v1/test

While running tests, monitor these metrics in separate terminals:

# Watch open connections
watch -n 1 "netstat -an | grep ESTABLISHED | wc -l"

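# Trace the handshake states of a single connection
# (stdin is redirected so s_client exits instead of waiting for input)
openssl s_client -connect example.com:443 -state -debug < /dev/null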
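
Counting only ESTABLISHED sockets hides connections piling up in other states such as TIME_WAIT, which can point to ephemeral port exhaustion when keep-alive is off. The sketch below groups netstat output by state; tcp_state_counts is a hypothetical helper and assumes the state is the last column of each tcp line, as on typical Linux and macOS netstat output.

```shell
# tcp_state_counts: read `netstat -an` output on stdin and print one
# "<STATE> <count>" line per TCP state, sorted by state name.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { count[$NF]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Example (run in a separate terminal during the test):
# watch -n 1 "netstat -an | awk '\$1 ~ /^tcp/ { c[\$NF]++ } END { for (s in c) print s, c[s] }' | sort"
# or, in a script where the function is defined:
# netstat -an | tcp_state_counts
```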

If limitations persist, consider these alternatives:

  • wrk2: More modern HTTP benchmarking tool
  • locust: Python-based load testing framework
  • vegeta: Golang-based HTTP load testing tool

Remember that each testing scenario requires different optimizations. The key is to systematically identify and eliminate bottlenecks.