Troubleshooting systemd Service Restart Failures: Why Restart=always Doesn’t Work and How to Fix It


When examining your autossh@.service unit file and its status output, we can identify several important details:

[Unit]
Description=Tunnel For %i
After=network.target

[Service]
User=autossh
ExecStart=/usr/bin/autossh -M 0 -N -o "ExitOnForwardFailure yes" -o "ConnectTimeout=1" -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -R 40443:installserver:40443 -R 8080:installserver:8080 tunnel@%i
Restart=always

[Install]
WantedBy=multi-user.target

The key line in the status output is:

Active: failed (Result: start-limit) since Wed, 2016-02-10 14:33:34 CET; 2 weeks and 1 days ago

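The start-limit result means systemd's start rate limiting has kicked in. By default, systemd applies these limits:

  • StartLimitBurst=5: at most 5 starts are allowed per window
  • StartLimitIntervalSec=10s: the length of that window (the setting was named StartLimitInterval before systemd 230)

Once a unit exceeds the limit, systemd refuses further start attempts, including automatic restarts, regardless of your Restart=always setting. With the default RestartSec of 100 ms, a tunnel that fails immediately uses up all five starts in well under a second.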

To ensure your service keeps restarting indefinitely, modify your unit file as follows:

[Unit]
Description=Tunnel For %i
After=network.target
StartLimitIntervalSec=0

[Service]
User=autossh
ExecStart=/usr/bin/autossh -M 0 -N -o "ExitOnForwardFailure yes" -o "ConnectTimeout=1" -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -R 40443:installserver:40443 -R 8080:installserver:8080 tunnel@%i
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

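Let's examine the critical additions. The comments sit on their own lines because systemd only treats # as a comment marker at the start of a line; a trailing comment would become part of the value.

# Disables start rate limiting completely
StartLimitIntervalSec=0
# Waits 5 seconds between restart attempts
RestartSec=5

On systemd releases older than 230, the first setting is called StartLimitInterval and belongs in the [Service] section instead of [Unit].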
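Edits to a unit file only take effect after systemd reloads its configuration, and a unit that is already in the failed state keeps its start-limit result until it is reset. A minimal sketch of that sequence follows, wrapped in a hypothetical helper named apply_unit_change. The SYSTEMCTL override is an assumption added here so the steps can be dry-run; it is not a systemd feature.

```shell
# Hypothetical helper: reload unit files, clear the failed state, restart the unit.
# Usage (as root): apply_unit_change autossh@eins-work.service
apply_unit_change() {
    unit=$1
    # SYSTEMCTL is an assumed override for dry runs, e.g. SYSTEMCTL=echo
    ctl=${SYSTEMCTL:-systemctl}

    # Make systemd re-read unit files from disk
    "$ctl" daemon-reload || return 1
    # Clear the failed state and the start-limit counter; errors are ignored
    # here because the restart below reports the real outcome
    "$ctl" reset-failed "$unit" || true
    # Start the unit again with the new settings
    "$ctl" restart "$unit"
}
```

Setting SYSTEMCTL=echo before calling the helper prints the three systemctl subcommands instead of executing them, which is a cheap way to check the order of operations first.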

For more controlled behavior, raise the limits instead of disabling them completely:

[Unit]
# 5-minute window
StartLimitIntervalSec=300
# Allow 30 starts within that window
StartLimitBurst=30
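Whether a given combination can actually trip the limit also depends on RestartSec: if the service fails immediately, consecutive starts are roughly RestartSec apart, so the limit is reached when StartLimitBurst multiplied by RestartSec still fits inside StartLimitIntervalSec. The following is a rough sketch of that arithmetic in a hypothetical shell function, start_limit_verdict. It assumes the service exits instantly and ignores how long each start attempt really runs, so treat the result as an estimate.

```shell
# Hypothetical helper: estimate whether an instantly failing service trips
# systemd's start limit. All times are in milliseconds.
# Usage: start_limit_verdict RESTART_MS BURST INTERVAL_MS
start_limit_verdict() {
    restart_ms=$1
    burst=$2
    interval_ms=$3

    # An interval of 0 disables start rate limiting
    if [ "$interval_ms" -eq 0 ]; then
        echo "never"
        return 0
    fi

    # Starts happen at 0, R, 2R, ...; the first start beyond the allowed
    # burst happens at BURST * R and is refused if it is still in the window
    if [ $(( restart_ms * burst )) -le "$interval_ms" ]; then
        echo "trips"
    else
        echo "never"
    fi
}

start_limit_verdict 100 5 10000      # defaults: RestartSec=100ms, 5 starts per 10 s
start_limit_verdict 5000 5 10000     # RestartSec=5 with the default limits
start_limit_verdict 5000 30 300000   # RestartSec=5, 30 starts per 5 minutes
```

Under these assumptions the three calls print trips, never and trips. With RestartSec=5 the default limits can no longer be reached by automatic restarts, while 30 starts per 300 seconds would still be exhausted after roughly two and a half minutes of continuous failure.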

When troubleshooting, these commands provide valuable insights:

# View complete service configuration
systemctl show autossh@eins-work

# Check restart history
journalctl -u autossh@eins-work --since "1 hour ago"
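
The output of systemctl show is long, so it can help to filter it down to the properties that describe the start limit. A small sketch, assuming the property names Result, NRestarts, StartLimitBurst and StartLimitIntervalUSec as printed by recent systemd releases (older releases print StartLimitInterval instead and have no NRestarts); start_limit_props is a hypothetical helper name:

```shell
# Hypothetical helper: keep only the start-limit related KEY=VALUE lines
# from `systemctl show` output read on stdin.
# Usage: systemctl show autossh@eins-work.service | start_limit_props
start_limit_props() {
    grep -E '^(Result|NRestarts|StartLimitBurst|StartLimitInterval(USec)?)='
}
```

systemctl show also accepts -p NAME to select individual properties directly; the filter is mainly convenient when you are unsure which spelling your systemd version uses.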

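Consider these enhancements for critical services. As above, comments must sit on their own lines:

[Service]
...
# Fail the start if startup takes longer than 30 seconds
TimeoutStartSec=30
# Wait at most 10 seconds for a clean stop before sending SIGKILL
TimeoutStopSec=10
# On stop, signal only the main autossh process; leftover child processes are not killed by systemd
KillMode=process
...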

When Restart=always does not behave as expected, the status output is the first place to look. Here it shows Result: start-limit, which means the restarts were stopped by systemd's start rate limiting rather than by the Restart= setting itself.

Systemd rate-limits unit starts by default, and automatic restarts count toward the limit. The relevant settings, shown here with example values rather than the defaults, are:

[Unit]
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Restart=always

With these values, if the service is started more than 5 times (StartLimitBurst) within 60 seconds (StartLimitIntervalSec), systemd stops attempting to restart it.

Here's the proper way to configure a service that should always restart:

[Unit]
Description=Tunnel For %i
After=network.target
StartLimitIntervalSec=0

[Service]
User=autossh
ExecStart=/usr/bin/autossh -M 0 -N -o "ExitOnForwardFailure yes" -o "ConnectTimeout=1" -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -R 40443:installserver:40443 -R 8080:installserver:8080 tunnel@%i
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

The key settings:

  • StartLimitIntervalSec=0: disables start rate limiting completely
  • RestartSec=5: adds a 5-second delay between restart attempts
  • Restart=always: restarts the service however it exits, whether cleanly, with an error code, or on a signal

To reset the start limit counter for a currently failing service:

systemctl reset-failed autossh@eins-work.service
systemctl start autossh@eins-work.service

For SSH-related services, consider adding these improvements:

[Service]
Environment="AUTOSSH_GATETIME=0"
Environment="AUTOSSH_POLL=60"
...

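AUTOSSH_GATETIME=0 disables autossh's gate time, so autossh keeps retrying even when the very first ssh connection attempt fails, which matters when the service starts before the network is fully up. AUTOSSH_POLL sets the interval, in seconds, at which autossh polls its monitoring connection; because this unit runs autossh with -M 0, which turns monitoring off, that setting likely has little effect here.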