Optimizing Unicorn Worker Processes: CPU Core Allocation and Memory Efficiency in Ruby on Rails


In production Ruby on Rails environments, the Unicorn application server presents unique challenges in balancing CPU utilization against memory footprint. While the traditional (n_cores + 1) formula works well for CPU-bound applications, web applications are typically I/O-bound and behave differently:

# Typical unicorn.rb configuration
worker_processes 12  # For 12-core system
listen "/tmp/unicorn.sock", backlog: 64
timeout 30

Unicorn's architecture requires special consideration because:

  • Workers block during I/O operations (DB queries, external API calls)
  • MRI Ruby's GIL prevents true parallel execution within a process
  • OS schedulers can context-switch between workers during I/O wait

A simple simulation shows where a worker's time goes during an I/O-bound request:

# Simulation of an I/O-bound workload
require 'benchmark'

def api_call
  sleep(0.1)  # Simulate network latency; the worker is idle while blocked
end

elapsed = Benchmark.realtime { 10.times { api_call } }
puts "#{elapsed.round(2)}s elapsed, almost all of it I/O wait"  # ~1.0s
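
Because a worker contributes no CPU work while blocked, a common sizing heuristic scales the core count by the I/O-to-CPU ratio. A minimal sketch (the per-request timings below are assumed for illustration, not measured):

# Heuristic: workers ~ cores * (1 + io_wait / cpu_time)
cores    = 12
cpu_time = 0.100  # seconds of CPU per request (assumed)
io_wait  = 0.050  # seconds blocked on I/O per request (assumed)

workers = (cores * (1 + io_wait / cpu_time)).round
puts workers  # => 18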

For a 12-core system (dual six-core Xeon E5645) with a load average around 6:

# Optimized unicorn.rb configuration
worker_processes 18  # 1.5x core count
working_directory "/var/www/app"
listen "/tmp/unicorn.sock", backlog: 128

after_fork do |server, worker|
  # Give each worker its own database connection after fork
  ActiveRecord::Base.establish_connection
end

Key metrics to monitor when tuning workers:

Metric              Target Value    Monitoring Command
Memory per worker   < 300 MB        ps -o rss= -p "$(pgrep -f unicorn)"
Request queue       < 5% dropped    netstat -xlnp | grep unicorn
CPU utilization     60-80%          mpstat -P ALL 1 5
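
These checks script easily; a small Ruby sketch mirroring the memory command above (assumes ps and pgrep are on the PATH):

# Sum resident memory (RSS) across all unicorn processes
pids = `pgrep -f unicorn`.split
rss_kb = pids.sum { |pid| `ps -o rss= -p #{pid}`.to_i }  # ps reports RSS in kB
puts "#{pids.size} processes, #{rss_kb / 1024} MB total RSS"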

For memory-constrained environments:

# Enable kernel memory overcommit (mode 1 = always overcommit) so fork()
# of a large preloaded app relies on copy-on-write instead of strict accounting
sysctl -w vm.overcommit_memory=1

# Copy-on-write friendly unicorn.rb
preload_app true

before_fork do |server, worker|
  # Run a full GC in the master so workers fork from a settled heap
  GC.start(full_mark: true, immediate_sweep: true)
end
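
To verify that copy-on-write is actually paying off, compare shared versus private pages for a worker. A Linux-only sketch (smaps_rollup requires Linux 4.14+; pass a worker pid from pgrep -f unicorn):

# Report shared vs. private memory for a process, in kB
def cow_stats(pid)
  fields = Hash.new(0)
  File.foreach("/proc/#{pid}/smaps_rollup") do |line|
    name, value = line.split(':')
    fields[name] = value.to_i if value  # values are reported in kB
  end
  { shared_kb:  fields['Shared_Clean'] + fields['Shared_Dirty'],
    private_kb: fields['Private_Clean'] + fields['Private_Dirty'] }
end

p cow_stats(Process.pid)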

Unicorn's architecture fundamentally differs from threaded servers like Puma. As a pre-forking Rack server, each Unicorn worker is a separate process with its own memory space. The master process manages worker lifecycles but doesn't handle requests directly. This design has important implications for CPU utilization:

# Typical Unicorn config showing worker processes
worker_processes 12 # For 12-core system
preload_app true
listen "/tmp/.unicorn.sock", backlog: 1024

In a 12-core system with a peak load of 6, the rationale for additional workers stems from the factors below (a quick Little's law check follows the list):

  • I/O Wait States: When workers block on database queries, external APIs, or filesystem operations
  • Request Queue Management: Even with backlog, additional workers prevent TCP socket drops during spikes
  • Slow Clients: HTTP clients with high latency can tie up workers
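
Little's law turns these factors into a concrete worker count; the arrival rate and latency below are illustrative assumptions, not measurements:

# Little's law: requests in flight = arrival rate * average latency
arrival_rate = 120.0  # requests/second at peak (assumed)
avg_latency  = 0.150  # seconds per request, dominated by I/O wait (assumed)

in_flight = (arrival_rate * avg_latency).ceil
puts in_flight  # => 18; each in-flight request occupies one worker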

For memory-constrained deployments, consider these optimizations:

# Memory-conscious configuration
worker_processes 8 # 2/3 of CPU cores
timeout 30         # Reclaim workers stuck longer than 30s
preload_app true   # Share memory across workers via copy-on-write

before_fork do |server, worker|
  GC.compact if GC.respond_to?(:compact) # Ruby 2.7+: compact the heap so forked pages stay shared
end

Instead of pure worker increases, combine strategies:

  1. Reverse Proxy Buffering: Configure Nginx to buffer slow clients so they don't occupy workers
  2. Connection Pooling: Optimize database connection reuse (see the sketch after this list)
  3. Vertical Scaling: Upgrade to CPUs with more cores or hyperthreading
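
For the connection pooling item, size the pool to the process model; this sketch assumes Rails 6.1+ (for connection_db_config) and an illustrative pool size of 2:

# unicorn.rb -- single-threaded workers need only a small pool;
# total DB connections ~ worker_processes * pool size
after_fork do |server, worker|
  db_config = ActiveRecord::Base.connection_db_config.configuration_hash
  ActiveRecord::Base.establish_connection(db_config.merge(pool: 2))
end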

A SaaS application with similar specs achieved optimal performance with:

# Production-tuned configuration
worker_processes ((ENV['RAILS_MAX_THREADS'] || 4).to_i * 1.5).round  # worker_processes requires an Integer
stderr_path "/var/log/unicorn/stderr.log"
stdout_path "/var/log/unicorn/stdout.log"

before_fork do |server, worker|
  # Drop the master's connection; it must not be shared across fork
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each worker establishes its own connection
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
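
With preload_app true, this disconnect/reconnect pair is essential: the master loads the application (and opens its database connection) once, so each forked worker must drop the inherited connection and establish its own.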