In production Ruby on Rails environments, the Unicorn application server presents unique challenges when balancing CPU utilization and memory footprint. While the traditional (n_cores + 1)
formula works well for CPU-bound applications, web applications often have different characteristics:
```ruby
# Typical unicorn.rb configuration
worker_processes 12 # For 12-core system
listen "/tmp/unicorn.sock", backlog: 64
timeout 30
```
Unicorn's architecture requires special consideration because:
- Workers block for the full duration of I/O operations (DB queries, external API calls)
- MRI Ruby's GIL prevents true parallel execution of Ruby code within a single process
- The OS scheduler can context-switch to another worker while one sits in I/O wait, which is exactly what makes extra workers pay off
A simplified model of an I/O-bound request handler:

```ruby
# Simulation of an I/O-bound workload: the worker is pinned to this
# request but does no CPU work during the network round-trip
def api_call
  sleep(0.1) # simulate 100ms of network latency
  # process response
end
```
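To turn the simulation into an actual measurement, here is a minimal benchmark sketch (the process count and 100ms latency are illustrative) showing that forked workers overlap their I/O waits:

```ruby
require "benchmark"

def api_call
  sleep(0.1) # simulate 100ms of network latency
end

# 12 sequential calls in one process: ~1.2s of wall time
sequential = Benchmark.realtime { 12.times { api_call } }

# 12 calls across 12 forked processes: ~0.1s of wall time,
# because each process waits on "I/O" independently
forked = Benchmark.realtime do
  pids = Array.new(12) { fork { api_call } }
  pids.each { |pid| Process.wait(pid) }
end

puts format("sequential: %.2fs, forked: %.2fs", sequential, forked)
```

This is why a worker count above the core count can raise throughput for I/O-heavy workloads: sleeping processes cost scheduler slots, not CPU cycles.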
For a 12-core Xeon E5645 system with a load average of ~6:

```ruby
# Optimized unicorn.rb configuration
worker_processes 18 # 1.5x core count to cover time spent in I/O wait
working_directory "/var/www/app"
listen "/tmp/unicorn.sock", backlog: 128
after_fork do |server, worker|
  # each forked worker needs its own database connection
  ActiveRecord::Base.establish_connection
end
```
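The 1.5x multiplier can be derived from a common sizing heuristic, workers ≈ cores × (1 + wait_time / compute_time). This is a rule of thumb rather than anything Unicorn documents, and the timings below are assumed for illustration:

```ruby
# Sizing heuristic: size the worker pool so that, on average, one
# runnable worker exists per core even while others wait on I/O
cores        = 12
compute_time = 0.100 # seconds of CPU work per request (assumed)
io_wait      = 0.050 # seconds blocked on I/O per request (assumed)

workers = (cores * (1 + io_wait / compute_time)).ceil
puts workers # => 18
```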
Key metrics to monitor when tuning workers:
| Metric | Target Value | Monitoring Command |
|---|---|---|
| Memory per worker | < 300MB | `ps -o rss= -p $(pgrep -f unicorn)` |
| Request queue | < 5% dropped | `netstat -xlnp \| grep unicorn` |
| CPU utilization | 60-80% | `mpstat -P ALL 1 5` |
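The first row's one-liner only dumps raw kilobyte counts; a small Ruby sketch (the `unicorn` pgrep pattern is an assumption about your process titles) makes the 300MB budget explicit:

```ruby
# Print per-worker RSS in MB and flag workers over the 300MB budget
`pgrep -f unicorn`.split.each do |pid|
  rss_mb = `ps -o rss= -p #{pid}`.to_i / 1024.0
  flag = rss_mb > 300 ? "  <-- over budget" : ""
  puts format("pid %s: %.1f MB%s", pid, rss_mb, flag)
end
```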
For memory-constrained environments:

```sh
# vm.overcommit_memory=2 enables strict accounting: allocations beyond
# swap + overcommit_ratio% of RAM fail fast instead of waking the OOM killer
sysctl -w vm.overcommit_memory=2
```

```ruby
# Copy-on-write friendly unicorn.rb settings
preload_app true
before_fork do |server, worker|
  GC.start(full_mark: true, immediate_sweep: true) # clean the heap before forking
end
```
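To confirm that copy-on-write sharing is holding up, you can compare shared versus private pages per worker. This Linux-only sketch assumes a kernel new enough to expose `/proc/<pid>/smaps_rollup` and that worker process titles match `unicorn worker`:

```ruby
# Report how much of each worker's memory is shared with the master
`pgrep -f "unicorn worker"`.split.each do |pid|
  rollup     = File.read("/proc/#{pid}/smaps_rollup")
  shared_kb  = rollup.scan(/^Shared_(?:Clean|Dirty):\s+(\d+)/).flatten.sum(&:to_i)
  private_kb = rollup.scan(/^Private_(?:Clean|Dirty):\s+(\d+)/).flatten.sum(&:to_i)
  puts "pid #{pid}: #{shared_kb / 1024} MB shared, #{private_kb / 1024} MB private"
end
```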
Unicorn's architecture fundamentally differs from threaded servers like Puma. As a pre-forking Rack server, each Unicorn worker is a separate process with its own memory space. The master process manages worker lifecycles but doesn't handle requests directly. This design has important implications for CPU utilization:
```ruby
# Typical Unicorn config showing worker processes
worker_processes 12 # For 12-core system
preload_app true
listen "/tmp/unicorn.sock", backlog: 1024
```
In your 12-core system with a peak load of 6, the rationale for additional workers stems from:
- I/O Wait States: workers block on database queries, external APIs, and filesystem operations, leaving cores idle
- Request Queue Management: even with a large backlog, additional workers drain the accept queue faster and prevent TCP socket drops during spikes (see the queue check after this list)
- Slow Clients: HTTP clients with high latency can tie up a worker for the full response
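The raindrops gem (a Unicorn dependency, from the same authors) can inspect that queue directly. A minimal sketch, assuming the socket path from the configs above and raindrops' `Raindrops::Linux.unix_listener_stats` API:

```ruby
require "raindrops"

# A persistently non-zero queued count means every worker is busy
# and new requests are waiting in the listen backlog
stats = Raindrops::Linux.unix_listener_stats(["/tmp/unicorn.sock"])
stats.each do |addr, info|
  puts "#{addr}: #{info.active} active, #{info.queued} queued"
end
```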
For memory-constrained deployments, consider these optimizations:

```ruby
# Memory-conscious configuration
worker_processes 8   # 2/3 of CPU cores leaves headroom for the OS and master
timeout 30           # aggressive timeout reclaims stuck workers quickly
preload_app true     # share the loaded app's memory via copy-on-write
before_fork do |server, worker|
  GC.respond_to?(:compact) && GC.compact # Ruby 2.7+: defragment before forking
end
```
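If workers still bloat over time, the community unicorn-worker-killer gem recycles them gracefully from the Rack layer. The thresholds below are illustrative, and the API shown reflects my reading of that gem's documentation rather than anything in this setup:

```ruby
# config.ru
require "unicorn/worker_killer"

# Restart each worker after serving 3072-4096 requests
# (randomized so workers do not all restart at once)
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096

# Restart a worker once its memory use exceeds 192-256 MB
use Unicorn::WorkerKiller::Oom, (192 * 1024**2), (256 * 1024**2)

run Rails.application
```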
Instead of pure worker increases, combine strategies:
- Reverse Proxy Buffering: configure Nginx to buffer slow clients so workers are freed as soon as the response is handed off
- Connection Pooling: optimize database connection reuse (see the pool check below)
- Vertical Scaling: upgrade to CPUs with hyperthreading for more hardware threads per core
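For the pooling point, ActiveRecord (Rails 5.1+) exposes live pool statistics that show whether workers are fighting over connections before you resize anything; the pool size itself lives in database.yml:

```ruby
# In a Rails console: non-zero :waiting means requests are blocked on
# connection checkout, so the pool (not worker count) is the bottleneck
stat = ActiveRecord::Base.connection_pool.stat
# => { size: 5, connections: 5, busy: 4, dead: 0, idle: 1, waiting: 2, checkout_timeout: 5 }
puts "pool exhausted" if stat[:waiting] > 0
```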
A SaaS application with similar specs achieved optimal performance with:
```ruby
# Production-tuned configuration
worker_processes ((ENV['RAILS_MAX_THREADS'] || 4).to_i * 1.5).to_i # must be an Integer
preload_app true # required for the fork hooks below to pay off
stderr_path "/var/log/unicorn/stderr.log"
stdout_path "/var/log/unicorn/stdout.log"
before_fork do |server, worker|
  # the master's DB connection must not leak into forked workers
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end
after_fork do |server, worker|
  # each worker opens its own connection
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```
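The disconnect/reconnect pair is what makes `preload_app true` safe: without it, every forked worker would inherit the master's database socket, and multiple processes sharing one socket corrupts the connection. Dropping it in `before_fork` and reopening in `after_fork` gives each worker its own.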