Optimizing Unicorn Worker Processes: CPU Core Allocation and Memory Efficiency in Ruby on Rails


In production Ruby on Rails environments, the Unicorn application server presents unique challenges when balancing CPU utilization and memory footprint. While the traditional (n_cores + 1) formula works well for CPU-bound applications, web applications often have different characteristics:

# Typical unicorn.rb configuration
worker_processes 12  # For 12-core system
listen "/tmp/unicorn.sock", backlog: 64
timeout 30

Unicorn's architecture requires special consideration because:

  • Workers block during I/O operations (DB queries, external API calls)
  • MRI Ruby's GIL prevents true parallel execution within a process
  • OS schedulers can context-switch between workers during I/O wait

Benchmark showing worker utilization during I/O:

# Simulation of an I/O-bound workload: the worker's CPU sits idle for the
# full sleep, yet the process can serve no other request in the meantime
def api_call
  sleep(0.1)  # Simulate network latency (MRI releases the GIL while sleeping)
  # Process response
end
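The simulation above can be extended into a small measurement showing how little CPU a blocked worker actually consumes. A minimal sketch using the stdlib Benchmark module (the 10-request loop and 100 ms latency are illustrative numbers):

```ruby
# Measure wall-clock vs. CPU time for a worker blocked on simulated I/O.
require 'benchmark'

def api_call
  sleep(0.1) # simulate network latency; MRI releases the GIL here
end

times = Benchmark.measure { 10.times { api_call } }
wall = times.real
cpu  = times.utime + times.stime
puts format('wall: %.2fs, cpu: %.2fs, idle: %.0f%%', wall, cpu, 100 * (1 - cpu / wall))
```

On a typical machine this reports near-total idle time, which is exactly the headroom that extra workers beyond the core count can absorb.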

For a 12-core Xeon E5645 system with load average ~6:

# Optimized unicorn.rb configuration
worker_processes 18  # 1.5x core count
working_directory "/var/www/app"
listen "/tmp/unicorn.sock", backlog: 128

after_fork do |server, worker|
  ActiveRecord::Base.establish_connection
end
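Rather than hard-coding 18, the same heuristic can be computed at boot. A sketch, where the 1.5x multiplier is the I/O-bound assumption from above and `Etc.nprocessors` is Ruby's stdlib core count:

```ruby
# unicorn.rb sketch: size workers from the detected core count.
require 'etc'

IO_MULTIPLIER = 1.5                  # heuristic for I/O-heavy Rails apps
cores   = Etc.nprocessors
workers = (cores * IO_MULTIPLIER).round

# Inside unicorn.rb this is simply: worker_processes workers
# (guarded here so the file also runs outside Unicorn's config DSL)
worker_processes workers if respond_to?(:worker_processes)
```

On a 12-core host this yields 18, matching the tuned configuration above; CPU-bound apps would shrink the multiplier toward 1.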

Key metrics to monitor when tuning workers:

Metric              Target value     Monitoring command
Memory per worker   < 300 MB         ps -o rss= -p $(pgrep -f 'unicorn worker')
Request queue       < 5% dropped     netstat -xlnp | grep unicorn
CPU utilization     60-80%           mpstat -P ALL 1 5
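The per-worker memory check can be scripted rather than eyeballed. A sketch, assuming Linux `ps` reports RSS in kilobytes; `worker_rss_mb` and `over_limit` are hypothetical helper names, not a Unicorn API:

```ruby
# Parse `ps -o rss=` output (one RSS value in KB per line) into MB figures
# and flag workers above a memory limit.
def worker_rss_mb(ps_output)
  ps_output.lines.map { |l| l.strip.to_i / 1024.0 }
end

def over_limit(ps_output, limit_mb: 300)
  worker_rss_mb(ps_output).select { |mb| mb > limit_mb }
end

# Against live workers (hypothetical shell-out):
#   ps_output = `ps -o rss= -p $(pgrep -f 'unicorn worker' | paste -sd,)`
sample = "204800\n358400\n"   # two workers: 200 MB and 350 MB
p over_limit(sample)          # → [350.0]
```

A cron job around this is a common poor-man's worker killer; gems like unicorn-worker-killer do the same in-process.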

For memory-constrained environments:

# Kernel tuning (shell): vm.overcommit_memory=2 enforces strict accounting,
# so oversized allocations fail fast instead of waking the OOM killer
sysctl -w vm.overcommit_memory=2

# unicorn.rb: copy-on-write friendly config
preload_app true
GC.start(full_mark: true, immediate_sweep: true)  # major GC in the master before forking
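To get the most out of `preload_app`, the GC work belongs in the master just before workers fork, so children start from densely packed, copy-on-write-shared pages. A hedged unicorn.rb sketch:

```ruby
# unicorn.rb sketch: maximize copy-on-write sharing across workers.
preload_app true

before_fork do |server, worker|
  # A full GC (plus heap compaction on Ruby 2.7+) in the master means
  # worker heaps diverge as little as possible right after fork.
  GC.start(full_mark: true, immediate_sweep: true)
  GC.compact if GC.respond_to?(:compact)
end
```

Note that any object the workers later mutate still un-shares its page; the win comes from the large read-mostly portions of a loaded Rails app.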

Unicorn's architecture fundamentally differs from threaded servers like Puma. As a pre-forking Rack server, each Unicorn worker is a separate process with its own memory space. The master process manages worker lifecycles but doesn't handle requests directly. This design has important implications for CPU utilization:

# Typical Unicorn config showing worker processes
worker_processes 12 # For 12-core system
preload_app true
listen "/tmp/unicorn.sock", backlog: 1024

On a 12-core system with a peak load average around 6, the rationale for additional workers stems from:

  • I/O Wait States: When workers block on database queries, external APIs, or filesystem operations
  • Request Queue Management: Even with backlog, additional workers prevent TCP socket drops during spikes
  • Slow Clients: HTTP clients with high latency can tie up workers
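These points can be folded into a back-of-envelope sizing rule: if a request spends fraction f of its wall time blocked on I/O, each core can interleave roughly 1/(1 - f) workers. A sketch, where the `workers_for` helper and the 0.33 I/O fraction are illustrative assumptions:

```ruby
# Rough worker sizing from the I/O-wait fraction (back-of-envelope, not a
# Unicorn API). io_fraction = share of request time spent blocked on I/O.
def workers_for(cores:, io_fraction:)
  raise ArgumentError, 'io_fraction must be < 1' if io_fraction >= 1
  (cores / (1.0 - io_fraction)).ceil
end

p workers_for(cores: 12, io_fraction: 0.33)  # → 18
```

An app blocking a third of its time on the database lands on the 1.5x figure used earlier; a mostly CPU-bound app (f near 0) collapses back to roughly one worker per core.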

For memory-constrained deployments, consider these optimizations:

# Memory-conscious configuration
worker_processes 8 # 2/3 of CPU cores
timeout 30         # Aggressive timeout
preload_app true   # Share memory via copy-on-write
GC.respond_to?(:compact) && GC.compact # Ruby 2.7+ memory optimization

Instead of pure worker increases, combine strategies:

  1. Reverse Proxy Buffering: Configure Nginx to buffer slow clients
  2. Connection Pooling: Optimize database connection reuse
  3. Vertical Scaling: Upgrade to CPUs with hyperthreading
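On the connection-pooling point: Unicorn workers are single-threaded, so each worker needs only one ActiveRecord connection, and the database's connection limit must cover workers multiplied by the per-process pool. A quick sanity check with illustrative numbers:

```ruby
# Each Unicorn worker handles one request at a time, so pool: 1 per process
# suffices; the DB's max_connections must cover all workers combined.
workers        = 18   # from the tuned configuration above
pool_per_proc  = 1    # single-threaded workers need a single connection
db_connections = workers * pool_per_proc
p db_connections  # → 18
```

Oversizing `pool` in database.yml on Unicorn only parks idle connections on the database server.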

A SaaS application with similar specs achieved optimal performance with:

# Production-tuned configuration
worker_processes ((ENV['RAILS_MAX_THREADS'] || 4).to_i * 1.5).round  # must be an Integer
stderr_path "/var/log/unicorn/stderr.log"
stdout_path "/var/log/unicorn/stdout.log"

before_fork do |server, worker|
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end