When implementing CONN_MAX_AGE=60
in our Django 1.6.7 application with PostgreSQL 9.3, we noticed an unexpected pattern:
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'CONN_MAX_AGE': 60,  # Connection lifetime in seconds
        # other params...
    }
}
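For reference, CONN_MAX_AGE does not create a connection pool: each worker thread keeps at most one open connection, and Django's close_old_connections() (wired to the request_started and request_finished signals) decides whether to keep or close it. A minimal sketch of that lifecycle, assuming a configured project:
# persistent_connection_check.py -- illustrative sketch
from django.db import close_old_connections, connection

connection.ensure_connection()                  # open (or reuse) this thread's connection
print(connection.connection.get_backend_pid())  # backend PID serving this thread

# Django runs this at request start and end: the connection is closed if it is
# unusable or older than CONN_MAX_AGE, otherwise it stays open for the next
# request handled by the *same* thread.
close_old_connections()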
Instead of connection reuse, we observed:
- Immediate connection count doubling (from ~250 to max_connections)
- Connection timeouts appearing in Sentry alerts
- Server load increasing from 20-30 to critical levels
Our Gunicorn configuration uses eventlet workers:
gunicorn myapp.wsgi:application --worker-class=eventlet --workers 8
This combination creates specific challenges:
- Eventlet's green threads each maintain their own connection state
- Django's persistent-connection handling (CONN_MAX_AGE) stores connections in thread-locals and wasn't designed for coroutine-based workers
- Each greenthread therefore opens its own persistent connection (see the sketch after this list)
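To see why this combination multiplies connections, here is a minimal standalone sketch (illustrative, not from our codebase): eventlet's monkey-patching makes threading.local() greenlet-local, and a thread-local is exactly where Django keeps its per-thread database connection, so every greenlet looks like a brand-new thread with nothing to reuse:
# greenlet_local_demo.py -- illustrative only; stands in for Django's thread-local storage
import eventlet
eventlet.monkey_patch()  # same patching gunicorn's eventlet worker applies

import threading

store = threading.local()  # after monkey-patching this is greenlet-local

def handle(request_id):
    # Django does the equivalent: no connection in "this thread", so open a new one
    if not hasattr(store, 'conn'):
        store.conn = 'connection opened by greenlet %d' % request_id
    return store.conn

pool = eventlet.GreenPool()
print(list(pool.imap(handle, range(5))))  # five different "connections", zero reuse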
To verify connection behavior, we added monitoring:
# utils/db_monitor.py
from django.db import connections

def check_connections():
    """Print the PostgreSQL backend PID behind each open Django connection."""
    for conn_name in connections:
        conn = connections[conn_name]
        if conn.connection is not None:
            print("Connection %s: PID %s" % (conn_name, conn.connection.get_backend_pid()))
Key findings showed:
- Identical backend PIDs appearing multiple times
- No actual connection reuse despite persistence
- Connection churn during request processing
Option 1: Switch to Sync Workers
gunicorn myapp.wsgi:application --worker-class=sync --workers 8
Pros:
- Proper connection pooling behavior
- Predictable connection reuse
Option 2: Implement External Pooling
Using pgBouncer in transaction pooling mode; note that this mode does not preserve per-session state (SET commands, prepared statements, advisory locks) across transactions:
; pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
Option 3: Upgrade Stack Components
# requirements.txt
Django>=2.2 # Improved connection handling
psycopg2-binary>=2.8
| Solution | Connections | Load Avg | Timeout Rate |
|---|---|---|---|
| Original | 500+ | 35 | 12% |
| Sync Workers | 120 | 18 | 0.5% |
| pgBouncer | 80 | 15 | 0.2% |
For production systems:
- Start with pgBouncer for immediate relief
- Plan migration to newer Django versions
- Consider connection pool monitoring:
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
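The same query can also be run through Django itself, e.g. from a management command or cron job (sketch; seeing other sessions' state may require suitable database privileges):
# utils/connection_states.py -- illustrative sketch
from django.db import connection

def connection_states():
    """Return a dict mapping backend state ('active', 'idle', ...) to count."""
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT state, count(*) FROM pg_stat_activity GROUP BY state")
        return dict(cursor.fetchall())
    finally:
        cursor.close()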
When enabling Django's built-in persistent connections via CONN_MAX_AGE in a PostgreSQL environment, we observed an unexpected behavior: connections were persisting (as evidenced by the rising connection count), but they weren't being effectively reused. In our monitoring this showed up as connection spikes reaching PostgreSQL's max_connections limit, followed by connection rejections.
The root cause appears to be an incompatibility between Django's connection handling and the eventlet worker model. Eventlet's green threads don't play nicely with Django's thread-local connection storage: because eventlet makes thread-locals greenlet-local, each green thread opens its own connection, and the persistence mechanism never hands an existing connection back to a new greenlet.
# Problematic configuration in gunicorn_config.py
worker_class = 'eventlet'
workers = 8
To confirm this was our issue, we ran several tests:
- Monitor pg_stat_activity to see connection lifetimes
- Check Django's connection registry with debug middleware (a sketch follows this list)
- Compare behavior with different worker classes
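A sketch of such a debug middleware (hypothetical name; written old-style for Django 1.6's MIDDLEWARE_CLASSES, adapt for the new-style MIDDLEWARE setting on 1.10+):
# myapp/middleware.py -- illustrative sketch
import logging

from django.db import connection

logger = logging.getLogger('connection_debug')

class ConnectionDebugMiddleware(object):
    """Logs which PostgreSQL backend PID served each request, so connection
    reuse (or the lack of it) becomes visible in the logs."""

    def process_response(self, request, response):
        if connection.connection is not None:  # a query was actually made
            logger.info("path=%s backend_pid=%s",
                        request.path,
                        connection.connection.get_backend_pid())
        return response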
We found three viable approaches to resolve this:
Option 1: Switch Worker Class
# gunicorn_config.py: recommended async worker for Django+PostgreSQL
import multiprocessing

worker_class = 'gevent'
workers = (2 * multiprocessing.cpu_count()) + 1
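A related caveat: psycopg2 is a C extension and will block the gevent event loop unless it is patched to cooperate. The psycogreen package provides that patch, and gunicorn's post_fork hook is one place to apply it (sketch, assuming psycogreen is installed):
# gunicorn_config.py (continued) -- sketch, requires the psycogreen package
def post_fork(server, worker):
    # Make psycopg2 yield to the gevent hub instead of blocking the worker
    from psycogreen.gevent import patch_psycopg
    patch_psycopg()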
Option 2: External Connection Pooling
Using pgBouncer in transaction pooling mode:
# settings.py with pgBouncer
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',  # use postgresql_psycopg2 on Django < 1.9
        'HOST': 'localhost',
        'PORT': '6432',  # pgBouncer port
        'NAME': 'mydb',
        'CONN_MAX_AGE': 600,
    }
}
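To confirm that traffic really goes through the pooler rather than straight to PostgreSQL, the live connection's DSN can be inspected (sketch; get_dsn_parameters() needs psycopg2 >= 2.7):
# run in ./manage.py shell -- should report port 6432 (pgBouncer), not 5432
from django.db import connection

connection.ensure_connection()
print(connection.connection.get_dsn_parameters().get('port'))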
Option 3: django-db-geventpool
For those committed to eventlet, the django-db-geventpool package provides a greenlet-aware connection pool:
# installation
pip install django-db-geventpool
# settings.py configuration
DATABASES = {
    'default': {
        'ENGINE': 'django_db_geventpool.backends.postgresql_psycopg2',
        'CONN_MAX_AGE': 0,  # let the pool, not Django, manage connection lifetime
        'OPTIONS': {
            'MAX_CONNS': 20,
            'REUSE_CONNS': 10,
        },
        # other params...
    }
}
After implementing pgBouncer, we observed:
- Connection count reduced from ~250 to stable 40-50
- Load average dropped from 20-30 to 8-12
- No more connection timeout errors