Automatically Restart MongoDB on Crash in Ubuntu Using Upstart/Supervisord



When running MongoDB in production on Ubuntu servers, a common issue emerges: the mongod process can crash (OOM killer, segmentation fault, and so on), and the default Upstart job does not restart it. Upstart only restarts a job whose configuration contains a respawn stanza. systemd is no different in this respect: a unit is restarted only if it sets Restart= explicitly.

For systems still using Upstart (the default init system through Ubuntu 14.10), edit the MongoDB job file:


# /etc/init/mongod.conf
description "MongoDB Database Server"
author "MongoDB, Inc"

start on runlevel [2345]
stop on runlevel [06]

respawn                  # This enables automatic restart
respawn limit 5 10       # Max 5 restarts within 10 seconds

pre-start script
    mkdir -p /var/lib/mongodb/
    chown mongodb:mongodb /var/lib/mongodb/
end script

exec /usr/bin/mongod --config /etc/mongod.conf

For Ubuntu 16.04+ using systemd, create or modify the service unit:


# /etc/systemd/system/mongodb.service
[Unit]
Description=MongoDB Database Service
After=network.target

[Service]
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongod --config /etc/mongod.conf
Restart=always
RestartSec=10
TimeoutSec=300

[Install]
WantedBy=multi-user.target

Then reload systemd, enable the unit at boot, and start it:


sudo systemctl daemon-reload
sudo systemctl enable mongodb
sudo systemctl start mongodb

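Alternatively, run mongod under supervisord. supervisord can only manage processes that stay in the foreground, so make sure /etc/mongod.conf does not enable forking: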


# /etc/supervisor/conf.d/mongodb.conf
[program:mongodb]
command=/usr/bin/mongod --config /etc/mongod.conf
autostart=true
autorestart=true
startretries=5
stderr_logfile=/var/log/mongodb.err.log
stdout_logfile=/var/log/mongodb.out.log
user=mongodb

After implementing any solution, verify with:


# For upstart
initctl status mongod

# For systemd
systemctl status mongodb

# For supervisord
supervisorctl status

Implement proper monitoring using tools like:

  • Monit
  • Nagios checks
  • Custom health scripts with cron

When running MongoDB in production environments, process stability is crucial. Upstart, which Ubuntu replaced with systemd as the default init system in 15.04, restarts a crashed job only when the job file asks for it. On current releases the usual solution is systemd, which supervises the process and can restart it automatically.

First, let's verify your current setup:

ps -ef | grep mongod
systemctl status mongod

If the machine still runs Upstart, the steps below do not apply until it has moved to systemd. Ubuntu 16.04+ uses systemd by default, and MongoDB provides native systemd unit files.

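Here is a MongoDB systemd unit with restart handling, at /lib/systemd/system/mongod.service. Package upgrades can overwrite files under /lib/systemd/system, so a copy at /etc/systemd/system/mongod.service, which takes precedence, is the safer place for local changes: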

[Unit]
Description=MongoDB Database Server
After=network.target

[Service]
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongod --config /etc/mongod.conf
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=10s
StartLimitInterval=60
StartLimitBurst=3
# After this timeout systemd sends SIGKILL; 5 seconds may be too short for a clean mongod shutdown
TimeoutStopSec=5
PrivateTmp=true

[Install]
WantedBy=multi-user.target

The critical directives for crash recovery are:

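  • Restart=always: Restarts the service whenever the process exits, whether cleanly, with an error code, or from a signal; an explicit systemctl stop is still honored
  • RestartSec=10s: Waits 10 seconds before each restart to avoid thrashing
  • StartLimitInterval=60: Sets the window, in seconds, over which start attempts are counted
  • StartLimitBurst=3: Allows at most 3 starts within that window; once the limit is exceeded, systemd stops restarting the service and marks the unit as failed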
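
How RestartSec interacts with the start limit is easier to see with numbers. The sketch below is a simplified model, not systemd's exact accounting: it assumes the service crashes immediately on every start, so start attempts happen at t = 0, RestartSec, 2 × RestartSec, and so on. The function name start_limit_hit is hypothetical.

```shell
# Simplified model (an assumption): the service dies right after each start,
# so attempts occur at t = 0, RestartSec, 2*RestartSec, ...
# Arguments: RestartSec, StartLimitInterval, StartLimitBurst (all integers).
start_limit_hit() {
    restart_sec=$1
    interval=$2
    burst=$3
    # The first refused attempt is number burst+1, at t = burst * restart_sec.
    refused_at=$(( burst * restart_sec ))
    if [ "$refused_at" -le "$interval" ]; then
        echo "start limit hit after $burst starts, at about ${refused_at}s"
    else
        # The counting window expires before the extra attempt arrives.
        echo "start limit not hit: restarts continue every ${restart_sec}s"
    fi
}

start_limit_hit 10 60 3   # the values used in the unit file above
start_limit_hit 30 60 3   # a longer delay between restarts
```

Under this model, the values in the unit file give up after about 30 seconds of continuous crashing, while a RestartSec of 30 would keep retrying indefinitely. Real timings drift by however long mongod takes to start and fail.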

After creating the file:

sudo systemctl daemon-reload
sudo systemctl enable mongod
sudo systemctl start mongod

To test the crash recovery:

sudo kill -9 $(pgrep mongod)
journalctl -u mongod -f

After roughly 10 seconds (the RestartSec delay), the logs should show systemd starting MongoDB again.
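
To turn the manual test into something scriptable, a small polling helper can wait for the service to report active again. This is a minimal sketch; wait_until is a hypothetical helper name, and the 30-second timeout in the usage comment is an arbitrary choice that leaves room for the 10-second restart delay.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds (returns 0)
# or TIMEOUT_SECONDS have been spent sleeping (returns 1).
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
    return 0
}

# Example use after killing mongod (not executed here):
# wait_until 30 systemctl is-active --quiet mongod && echo "mongod recovered"
```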

For additional monitoring, implement a health check script (/usr/local/bin/mongo-healthcheck):

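#!/bin/bash
# Ping mongod and give up after 10 seconds, so a hung server counts as a failure.
# Newer MongoDB releases ship the mongosh shell instead of mongo.
if ! timeout 10 mongo --quiet --eval 'db.runCommand({ping:1})' &> /dev/null; then
    systemctl restart mongod
    exit 1
fi
exit 0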

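Make the script executable (chmod +x), then schedule it in /etc/crontab or in a file under /etc/cron.d. The user field (root below) is only valid there, not in a per-user crontab: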

*/5 * * * * root /usr/local/bin/mongo-healthcheck

This provides a secondary monitoring layer that can catch cases where MongoDB is running but unresponsive.

If restarts aren't working:

  • Check journalctl -xe for errors
  • Verify permissions on MongoDB data directory
  • Ensure sufficient disk space exists
  • Check for port conflicts with sudo netstat -tulnp | grep 27017
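
The permission and disk-space checks can be scripted. The sketch below assumes GNU stat, as found on Ubuntu; check_data_dir is a hypothetical helper name, and the path, owner, and threshold in the usage comment are examples taken from the configurations above.

```shell
# check_data_dir DIR OWNER MIN_FREE_KB
# Hypothetical helper: verifies that DIR exists, is owned by OWNER,
# and has at least MIN_FREE_KB kilobytes free on its filesystem.
check_data_dir() {
    dir=$1
    owner=$2
    min_free_kb=$3

    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi

    # stat -c is GNU coreutils syntax (an assumption that holds on Ubuntu).
    actual_owner=$(stat -c '%U' "$dir")
    if [ "$actual_owner" != "$owner" ]; then
        echo "owner is $actual_owner, expected $owner"
        return 1
    fi

    # Column 4 of POSIX-format df output is the available space in KB.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -lt "$min_free_kb" ]; then
        echo "low disk space: ${free_kb} KB free"
        return 1
    fi

    echo "ok"
}

# Example call with a 1 GB threshold (not executed here):
# check_data_dir /var/lib/mongodb mongodb 1048576
```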