When critical processes like Varnish crash intermittently, it can wreak havoc on production environments. Many sysadmins face situations where the monitoring tool (in this case Monit) detects the failure but subsequent restart attempts fail, leaving the service down.
Your existing setup has a logical structure but might need some adjustments:
check process varnish with pidfile /var/run/varnish.pid
    start program = "/etc/init.d/varnish start" with timeout 60 seconds
    stop program = "/etc/init.d/varnish stop"
    if failed host 192.168.1.100 port 80 protocol http
        and request "/healthcheck" then restart
    if 3 restarts within 5 cycles then timeout
    group server
Several factors could cause restart failures:
- PID file not being properly cleaned up
- Port conflicts when restarting
- Resource starvation preventing new instances
- Improper shutdown sequences
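The first cause, a pidfile left behind by a crashed process, is easy to test for. The following is a minimal sketch, not part of the original setup: `pidfile_is_stale` is a hypothetical helper name, and it assumes it runs as root (as Monit normally does) so that `kill -0` can see the varnishd process.

```shell
#!/bin/bash
# Sketch: report whether a pidfile is stale, i.e. the file exists but no
# live process has the PID it contains.
# Returns 0 when stale, 1 when the pidfile is absent or the process is alive.
pidfile_is_stale() {
    local pidfile="$1" pid
    [ -f "$pidfile" ] || return 1            # no pidfile, nothing to clean up
    pid=$(tr -d '[:space:]' < "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;             # empty or non-numeric content counts as stale
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

A possible use before a manual start: `pidfile_is_stale /var/run/varnish.pid && rm -f /var/run/varnish.pid`.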
Try this more robust configuration that handles edge cases better:
check process varnish with pidfile /var/run/varnish.pid
    # Stop first so a half-dead instance or stale pidfile cannot block the start.
    # Run as root (Monit's default): the init script must bind port 80 and write the pidfile.
    start program = "/bin/bash -c '/etc/init.d/varnish stop; sleep 2; /etc/init.d/varnish start'"
        with timeout 90 seconds
    stop program = "/etc/init.d/varnish stop"
        with timeout 30 seconds
    if failed host 127.0.0.1 port 80
        protocol http request "/healthcheck"
        with timeout 10 seconds for 3 times within 5 cycles
        then restart
    if 5 restarts within 10 cycles then exec "/usr/local/bin/alert_admin.sh"
    # Requires a separate "check file varnish_bin" entry for the varnishd binary.
    depends on varnish_bin
    group cache_services
If Monit continues to be problematic, consider these alternatives:
Systemd Service Recovery
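For systems using systemd, add these directives to your service unit file, ideally through a drop-in override so that package updates do not overwrite them. On newer systemd releases the two StartLimit settings belong in the [Unit] section, with StartLimitInterval renamed to StartLimitIntervalSec: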
[Service]
Restart=on-failure
RestartSec=5s
StartLimitInterval=100s
StartLimitBurst=5
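One way to apply such directives without editing the packaged unit is a drop-in file. This is only a sketch under assumptions: `write_restart_dropin` is a hypothetical helper, the unit is assumed to be named varnish.service, and the file uses the newer [Unit] placement and `StartLimitIntervalSec` spelling, so adjust it for older systemd versions.

```shell
#!/bin/bash
# Sketch: write a systemd drop-in with restart settings into the given directory.
# The directory is a parameter so the function can be tried out on a scratch path.
write_restart_dropin() {
    local dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=100
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5s
EOF
}
```

On a real host this might be called as `write_restart_dropin /etc/systemd/system/varnish.service.d`, followed by `systemctl daemon-reload`.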
Supervisor Approach
Supervisord provides more sophisticated process control:
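[program:varnish]
; -F keeps varnishd in the foreground; supervisord cannot track a daemonized process
command=/usr/sbin/varnishd -F -f /etc/varnish/default.vcl -s malloc,256m
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/varnish.err.log
stdout_logfile=/var/log/varnish.out.log
; an unprivileged user cannot bind the default port 80; remove this line or add -a with a port above 1024
user=varnish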
When automatic restarts fail, check these components:
- Examine /var/log/varnish.log for startup errors
- Verify permissions on the PID file directory
- Test manual start/stop sequences
- Check for port conflicts with: netstat -tulnp | grep ':80 '
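A plain grep for 80 also matches ports such as 8080 or PIDs containing 80, which can hide the real conflict. The following sketch narrows the match to one exact port; `listeners_on_port` is a hypothetical helper name, and it assumes the usual `address:port` column layout printed by `ss -tlnp` or `netstat -tlnp`.

```shell
#!/bin/bash
# Sketch: filter ss/netstat listener output (read from stdin) down to
# sockets bound to exactly the given port.
listeners_on_port() {
    local port="$1"
    # ":PORT" followed by whitespace, so 80 does not match 8080 or 1080
    grep -E ":${port}[[:space:]]"
}
```

For example, `ss -tlnp | listeners_on_port 80` should list only the process currently holding port 80, if there is one.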
A more comprehensive health check can prevent false positives:
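#!/bin/bash
# Healthy if the HTTP endpoint returns 200 or, failing that, if the
# varnishd management interface still answers a ping.
# --max-time keeps a hung Varnish from blocking the check itself.
response=$(curl -s -o /dev/null --max-time 5 -w "%{http_code}" http://localhost/healthcheck)
if [ "$response" = "200" ]; then
    exit 0
elif varnishadm ping | grep -q "PONG"; then
    exit 0
else
    exit 1
fi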
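Another way to cut down on false positives is to require several consecutive failures before reporting the service as down. This is a minimal sketch of that idea; `retry` is a hypothetical helper, and the script path in the usage note is an assumption, not something from the original setup.

```shell
#!/bin/bash
# Sketch: run a command up to MAX times, pausing DELAY seconds between attempts.
# Succeeds as soon as one attempt succeeds; fails only if every attempt fails.
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    while true; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Assuming the health check is saved as /usr/local/bin/varnish_health.sh, `retry 3 2 /usr/local/bin/varnish_health.sh` would report failure only after three failed attempts spaced two seconds apart.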
Many sysadmins face situations where Varnish, the high-performance HTTP accelerator, keeps crashing unexpectedly. The standard Monit configuration often fails to restart it properly, leaving the site down. Here's a deeper look at why this happens and how to fix it.
The typical Monit configuration has several potential failure points when dealing with Varnish:
check process varnish with pidfile /var/run/varnish.pid
    start program = "/etc/init.d/varnish start" with timeout 60 seconds
    stop program = "/etc/init.d/varnish stop"
    if failed host 127.0.0.1 port 80 protocol http
        and request "/blank.html" then restart
    if 3 restarts within 5 cycles then timeout
Common issues include:
- The pidfile location might be incorrect (varies by distro)
- Init scripts might not properly clean up stale processes
- Port 80 checks might fail even when Varnish is technically running
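Since the pidfile location varies, it helps to confirm which path is actually in use before pointing Monit at it. The sketch below simply returns the first candidate that exists; `first_existing_file` is a hypothetical helper, and the candidate paths in the usage note are illustrative guesses that should be checked against your distro's init script or unit file.

```shell
#!/bin/bash
# Sketch: print the first argument that names an existing regular file.
# Returns 1 if none of the candidates exist.
first_existing_file() {
    local candidate
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

With Varnish running, something like `first_existing_file /var/run/varnish.pid /run/varnishd.pid /var/run/varnishd.pid` shows which path belongs in the `check process` line.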
Option 1: Enhanced Monit Configuration
Try this more robust configuration that adds additional checks and proper cleanup:
check process varnish with pidfile /var/run/varnish.pid
    start program = "/bin/bash -c '/etc/init.d/varnish stop; sleep 2; /etc/init.d/varnish start'"
    stop program = "/etc/init.d/varnish stop"
    if failed host 127.0.0.1 port 80 protocol http
        and request "/blank.html" with timeout 15 seconds for 3 times within 4 cycles then restart
    if 5 restarts within 5 cycles then exec "/bin/bash -c 'echo \"Varnish keeps crashing\" | mail -s \"Varnish Alert\" admin@example.com'"
    # Requires a separate "check file varnish_bin" entry for the varnishd binary.
    depends on varnish_bin
    group varnish
Option 2: Systemd-based Solution
For systems using systemd, create a service unit with automatic restart:
[Unit]
Description=Varnish HTTP accelerator
After=network.target

[Service]
# varnishd stays in the foreground because of -F, so use Type=simple.
# Type=forking with a PIDFile would wait for a fork that never happens.
Type=simple
Restart=always
RestartSec=5
ExecStart=/usr/sbin/varnishd -j unix,user=varnish -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
ExecReload=/usr/sbin/varnishreload

[Install]
WantedBy=multi-user.target
Option 3: Supervisord Alternative
For more control, consider using Supervisord:
[program:varnish]
command=/usr/sbin/varnishd -j unix,user=varnish -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
autostart=true
autorestart=true
startretries=5
stderr_logfile=/var/log/varnish/supervisor_err.log
stdout_logfile=/var/log/varnish/supervisor_out.log
When troubleshooting Varnish crashes:
- Check shared memory log activity: varnishstat -1 -f 'MAIN.shm_*'
- Monitor worker threads: varnishstat -1 -f 'MAIN.threads*'
- Verify backend health probes: varnishlog -g raw -i Backend_health
- Examine recent panics: journalctl -u varnish | grep -i panic
To reduce crashes:
- Implement proper VCL error handling
- Set conservative timeouts for backends
- Monitor memory usage and adjust malloc allocation
- Regularly check for and install security updates
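Monitoring malloc usage can be scripted from the storage counters. The sketch below is an illustration under assumptions: `malloc_used_percent` is a hypothetical helper, and it assumes the default storage name s0, where `SMA.s0.g_bytes` is the space in use and `SMA.s0.g_space` is the space still free.

```shell
#!/bin/bash
# Sketch: read `varnishstat -1` output on stdin and print the percentage
# of malloc storage in use (integer, rounded down).
# Exits non-zero if neither counter is present.
malloc_used_percent() {
    awk '
        $1 ~ /\.g_bytes$/ { used  = $2 }   # bytes currently allocated
        $1 ~ /\.g_space$/ { space = $2 }   # bytes still available
        END {
            if (used + space == 0) exit 1
            printf "%d\n", (used * 100) / (used + space)
        }'
}
```

A possible invocation is `varnishstat -1 -f 'SMA.s0.*' | malloc_used_percent`; a value that stays near 100 suggests the `-s malloc` size is too small for the working set.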