When running scheduled backups via systemd timers, we often encounter a frustrating scenario: the backup script completes its primary function but returns a non-zero exit code due to non-critical warnings. While technically successful, this triggers systemd's failure state mechanism, potentially preventing subsequent timer executions.
Systemd treats any non-zero exit code as a failure by default. This becomes problematic with a unit like the following:
[Unit]
Description=Backup Service
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm backup-image
The service enters failed state when the container exits with warnings, despite successful backup completion.
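To confirm that the exit code is what put the unit into the failed state, `systemctl show` prints properties as `Key=Value` lines, including `Result` and `ExecMainStatus`. A small sketch of a helper that pulls one property out of that output; the helper name `get_prop` is hypothetical:

```shell
#!/bin/bash
# Read "Key=Value" lines on stdin (the format printed by `systemctl show`)
# and print the value belonging to the requested key.
get_prop() {
    local key="$1" line
    while IFS= read -r line; do
        case "$line" in
            "$key="*) printf '%s\n' "${line#"$key="}" ;;
        esac
    done
}

# Example invocation on a host with systemd:
#   systemctl show -p Result -p ExecMainStatus backup.service | get_prop ExecMainStatus
```

Reading `ExecMainStatus` this way shows which exit code the container actually returned, which helps decide which codes are safe to tolerate.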
1. Exit Code Normalization in Entrypoint
Modify your container's entrypoint to always return success:
#!/bin/bash
/backup/script.sh || true # Forces exit code 0
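One drawback of `|| true` is that it hides genuine failures as well as warnings. A minimal sketch of an entrypoint that still returns 0 but writes the original exit code to the container log first; the function name `run_tolerant` is hypothetical, and `/backup/script.sh` is the path used in the example above:

```shell
#!/bin/bash
# Run a command, report its real exit code on stderr, then return 0.
run_tolerant() {
    local rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "backup exited with code $rc (treated as success)" >&2
    fi
    return 0
}

# In the entrypoint this would be called as:
#   run_tolerant /backup/script.sh
```

Because the message goes to stderr, it ends up in the journal when the container is started by systemd, so the real exit code is still visible after the fact.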
2. Systemd Service Configuration Options
Use these directives in your service unit:
[Service]
# Treat exit codes 1 and 2 as success in addition to 0
SuccessExitStatus=0 1 2
# An empty value clears the list of exit codes that force a restart
RestartForceExitStatus=
3. Post-Execution Cleanup
Add a reset mechanism:
[Service]
ExecStopPost=/bin/bash -c "systemctl reset-failed %n"
4. Timer-Specific Configuration
Ensure your timer unit includes:
[Timer]
Unit=backup.service
# Runs a missed job once the timer is active again; only takes effect with OnCalendar= timers
Persistent=true
For more sophisticated monitoring:
[Service]
ExecStart=/usr/bin/bash -c '/backup/script.sh; echo $? > /run/backup.status'
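The status file can then be read by a separate monitoring check. A rough sketch, assuming exit code 1 means "completed with warnings"; that convention and the function name `classify_backup_status` are assumptions, not something the backup tool guarantees:

```shell
#!/bin/bash
# Print ok / warning / failed / unknown based on the exit code stored in a status file.
classify_backup_status() {
    local file="${1:-/run/backup.status}" code=""
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 0
    fi
    # read returns non-zero on an empty file; code then stays empty
    read -r code < "$file" || true
    case "$code" in
        0) echo "ok" ;;
        1) echo "warning" ;;            # assumed warning code
        ''|*[!0-9]*) echo "unknown" ;;  # empty or non-numeric content
        *) echo "failed" ;;
    esac
}
```

A monitoring job could call `classify_backup_status` with no argument to read `/run/backup.status` and alert only on `failed` or `unknown`.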
Maintain visibility while handling exit codes:
[Service]
StandardOutput=journal
StandardError=journal
# Drops messages less severe than warning (notice, info, debug)
LogLevelMax=warning
When running backup services in containers managed by systemd and fleet, we frequently encounter a frustrating scenario: the backup script completes successfully but returns non-zero exit codes due to non-critical warnings. This causes the service to enter a failed state, which then prevents the associated timer from triggering subsequent executions.
Systemd treats any non-zero exit code as a failure by default. While this makes sense for most services, it becomes problematic for backup operations where:
- Warnings about non-critical files are common
- Partial success is still valuable
- We want the timer to continue triggering regardless
1. Custom Exit Code Handling in Service Unit
Modify your backup.service to ignore specific exit codes:
[Service]
ExecStart=/usr/bin/docker run --rm backup-container
SuccessExitStatus=0 1 2
Restart=on-failure
RestartSec=60s
2. Wrapper Script Approach
Create a wrapper script that handles exit code conversion:
#!/bin/bash
/backup/actual-script.sh
exit_status=$?
if [ "$exit_status" -eq 1 ]; then
    # Known warning condition: report success to systemd
    exit 0
else
    # Anything else is passed through unchanged
    exit "$exit_status"
fi
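If more than one exit code denotes a warning, the same idea extends to a list. A sketch, assuming the tolerated codes are passed as a space-separated string; the function `normalize_exit` and the variable `TOLERATED_CODES` are illustrative names:

```shell
#!/bin/bash
# Return 0 if the given exit code is in the tolerated list;
# otherwise return the exit code unchanged.
normalize_exit() {
    local status="$1" tolerated="$2" code
    for code in $tolerated; do
        if [ "$status" -eq "$code" ]; then
            return 0
        fi
    done
    return "$status"
}

# Usage inside a wrapper:
#   /backup/actual-script.sh
#   normalize_exit "$?" "${TOLERATED_CODES:-1}"
#   exit $?
```

Keeping the list in one variable makes it easy to keep the wrapper and any `SuccessExitStatus=` setting in agreement.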
3. Automatic State Reset
Use ExecStopPost to clear the failed state:
[Service]
ExecStart=/usr/bin/docker run --rm backup-container
ExecStopPost=/usr/bin/systemctl reset-failed backup.service
For fleet environments, you might need to combine these approaches. The most robust solution is often:
[Unit]
Description=Backup Service
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup-wrapper
SuccessExitStatus=0 1
ExecStopPost=/bin/sh -c "systemctl reset-failed %n"
[Install]
WantedBy=multi-user.target
For critical backups, consider implementing explicit success tracking:
[Service]
ExecStart=/bin/bash -c '/backup/script.sh && touch /var/run/backup.success'
ExecStopPost=/bin/bash -c 'if [ -f /var/run/backup.success ]; then rm /var/run/backup.success; exit 0; else exit 1; fi'