Debugging “Job Canceled” Errors During systemd Service Stops: Causes and Solutions


When working with systemd services, a "Job for [service].service canceled" message during a stop means the stop job was removed from systemd's job queue before it finished. This differs from normal termination, where the job either completes or, if it runs too long, is reported as timed out.

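From hands-on troubleshooting, the service's own unit file is a frequent culprit. The unit below daemonizes without declaring Type=forking, so systemd may lose track of the real main process, and it allows only 30 seconds for ExecStop= to finish:

# Example of a problematic service unit that can make stops unpredictable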
[Service]
ExecStart=/usr/bin/my-service --daemonize
ExecStop=/usr/bin/my-service --shutdown
TimeoutStopSec=30s
KillSignal=SIGTERM

When the stop operation originates from Ansible, a short task timeout often contributes. Note that timeout is a task keyword, not an argument of the systemd module:

# Ansible playbook snippet that might trigger the issue
- name: Stop test service
  ansible.builtin.systemd:
    name: test-server
    state: stopped
  timeout: 10  # Task-level limit; potentially too short for complex shutdowns

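Systemd maintains a job queue for service operations, and each unit can have only one pending job at a time. A cancellation typically occurs when:

  • A conflicting job for the same unit is enqueued and replaces the pending one (for example, a start or restart request arrives while the stop is still running)
  • Someone cancels the job explicitly with systemctl cancel
  • The stop runs for a long time, which widens the window for either of the above; a job that merely exceeds its own timeout is normally reported as "timed out" rather than "canceled"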
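To see whether another job is queued for the unit, the output of systemctl list-jobs can be filtered by unit name. The following is a minimal sketch, not a definitive tool: jobs_for_unit is a hypothetical helper name, and it assumes the four-column layout (JOB, UNIT, TYPE, STATE) that systemctl list-jobs --no-legend prints.

```shell
#!/bin/sh
# Hypothetical helper: print the queued job for one unit.
# Reads `systemctl list-jobs --no-legend` output on stdin, which is assumed
# to contain four whitespace-separated columns: JOB UNIT TYPE STATE.
jobs_for_unit() {
    awk -v unit="$1" '$2 == unit { print $1, $3, $4 }'
}

# Typical use on a live system (commented out so the sketch has no side effects):
# systemctl list-jobs --no-legend | jobs_for_unit test-server.service
```

An empty result means no job is currently queued for that unit. A start or restart job appearing while a stop is expected points to a competing request.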

Try these diagnostic commands when facing cancellations:

# Check service dependencies
systemctl list-dependencies test-server.service

# View complete journal logs with microsecond precision
journalctl -u test-server.service --since "1 hour ago" --no-pager -o short-precise

# Show current state, remaining processes, and recent log lines (-l prevents truncation)
systemctl status test-server.service -l

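Here's a service configuration template that reduces cancellation risks. Type=notify assumes the service signals readiness through sd_notify(); pick a different Type= if yours does not: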

[Unit]
Description=Robust Service Template
After=network.target

[Service]
Type=notify
ExecStart=/usr/bin/my-service
ExecStop=/usr/bin/clean-shutdown-script
TimeoutStopSec=120s
KillMode=mixed
Restart=on-failure

[Install]
WantedBy=multi-user.target

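For complex cases, understanding how systemd tracks jobs helps. A job is either waiting or running, and it finishes with a result such as done, canceled, timeout, failed, or dependency. The canceled result can be reached from either state:

JOB_RUNNING → JOB_CANCELED (the job is canceled or replaced while it is executing)
JOB_WAITING → JOB_CANCELED (the job is canceled or replaced before it starts)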
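Because each result points to a different root cause, it can help to translate the result name into a first hypothesis before digging into logs. The sketch below is illustrative only: explain_job_result is a hypothetical helper, and the explanations are summaries rather than systemd's own wording.

```shell
#!/bin/sh
# Hypothetical helper: map a systemd job result name to a likely cause.
explain_job_result() {
    case "$1" in
        "done")       echo "job completed" ;;
        "canceled")   echo "job was canceled or replaced by a conflicting job" ;;
        "timeout")    echo "job exceeded its timeout" ;;
        "failed")     echo "unit operation failed" ;;
        "dependency") echo "a job this one depended on failed" ;;
        *)            echo "unrecognized result: $1" ;;
    esac
}

# Example:
# explain_job_result canceled
```

In practice, "canceled" is the only one of these that calls for looking at what else was talking to systemd at the time, rather than at the service itself.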

Modify your Ansible tasks to handle cancellations gracefully. The timeout here is the task-level keyword, since the systemd module has no timeout argument:

- name: Stop service with retry logic
  ansible.builtin.systemd:
    name: test-server
    state: stopped
  timeout: 120
  register: stop_result
  retries: 3
  delay: 5
  until: stop_result is succeeded
  ignore_errors: true

- name: Force kill if necessary
  ansible.builtin.command: systemctl kill -s SIGKILL test-server.service
  when: stop_result is failed
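The same retry-then-escalate pattern can be sketched in plain shell for hosts that are not managed through Ansible. This is a minimal sketch under stated assumptions: retry is a hypothetical helper written for this example, and the attempt count and delay simply mirror the playbook values above.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        # Stop retrying as soon as the command succeeds.
        "$@" && return 0
        # Give up once the attempt budget is used.
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Mirrors the playbook: three attempts, five seconds apart, then a forced kill
# (commented out so the sketch has no side effects):
# retry 3 5 systemctl stop test-server.service ||
#     systemctl kill -s SIGKILL test-server.service
```

Escalating to SIGKILL skips the service's clean shutdown path, so it is best kept as a last resort after the retries are exhausted.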

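When systemctl reports "Job for [service].service canceled", the stop job was removed from the queue before it completed. The usual triggers are:

  • Another process (such as Ansible or a second systemctl call) issues a conflicting command, for example a start or restart while the stop is still pending
  • An administrator or script runs systemctl cancel
  • A dependency pulls the unit back in: starting another unit that requires this service enqueues a start job that conflicts with the pending stop
  • The service's ExecStop= operation runs long, which does not cancel the job by itself but leaves more time for one of the above to happen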

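When stopping services through Ansible, the task's own time limit might contribute to the problem. The service module has no timeout argument and no built-in default; a limit applies only if the task keyword is set:

# Typical Ansible service module usage that could cause issues
- name: Stop test server
  ansible.builtin.service:
    name: test-server
    state: stopped
  timeout: 30  # Task-level limit; must cover the unit's full stop time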

Systemd has multiple relevant timeout settings:

# Check current timeouts for the service
systemctl show test-server.service | grep Timeout

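# Common relevant settings (systemctl show reports the effective value as TimeoutStopUSec):
# TimeoutStopSec=         per-service limit, set in the [Service] section of the unit file
# DefaultTimeoutStopSec=  system-wide default, set in /etc/systemd/system.conf (90s unless changed)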
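Since the Ansible task timeout should be longer than the unit's stop timeout, it helps to have both in seconds. The sketch below converts a time span such as "1min 30s" into seconds. It is a rough illustration, not a full parser: timespan_to_seconds is a hypothetical helper, it assumes whole-number tokens with the h, min, s, or ms suffixes, and it rejects anything else (including "infinity" and fractional values).

```shell
#!/bin/sh
# Hypothetical helper: convert a time span like "1min 30s" to whole seconds.
# Supports only integer tokens ending in h, min, s, or ms; returns 1 otherwise.
timespan_to_seconds() {
    total=0
    for token in $1; do
        case "$token" in
            *min) factor=60;   num=${token%min} ;;
            *h)   factor=3600; num=${token%h} ;;
            *ms)  factor=0;    num=${token%ms} ;;  # sub-second part is dropped
            *s)   factor=1;    num=${token%s} ;;
            *)    return 1 ;;
        esac
        # Reject empty or non-numeric values instead of guessing.
        case "$num" in
            ''|*[!0-9]*) return 1 ;;
        esac
        total=$((total + num * factor))
    done
    echo "$total"
}

# Typical use on a live system (commented out so the sketch has no side effects):
# stop_timeout=$(systemctl show test-server.service -p TimeoutStopUSec --value)
# timespan_to_seconds "$stop_timeout"
```

If the result is larger than the timeout set on the Ansible task, the task will give up before systemd has finished stopping the service.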

First, examine the complete service lifecycle:

journalctl -u test-server --no-pager -n 100
systemctl status test-server.service

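Then check what Ansible itself sees. The services fact is only populated after service_facts has run:

# Add debug to your playbook
- name: Gather service facts
  ansible.builtin.service_facts:

- name: Debug service state
  ansible.builtin.debug:
    var: ansible_facts.services["test-server.service"]  # includes state and status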

Option 1: Extend timeouts in the unit file, then run systemctl daemon-reload so the change takes effect:

[Service]
TimeoutStopSec=300
KillMode=mixed

Option 2: Raise the task-level timeout in Ansible:

- name: Stop service with extended timeout
  ansible.builtin.service:
    name: test-server
    state: stopped
  timeout: 120  # Task-level timeout in seconds

For complex cases, consider:

  1. Strace the service's main process while it stops:
     strace -p "$(systemctl show test-server -p MainPID --value)"
  2. Check for conflicting systemd transactions:
     systemctl list-jobs