How to Identify Failed systemd Services in Linux: Troubleshooting and Commands



On systemd-based Linux servers, systemctl is the main tool for monitoring services. A common sign of trouble is a degraded state in the summary printed by systemctl status:

● server.example.com
    State: degraded
     Jobs: 0 queued
   Failed: 1 units
    Since: Fri 2023-08-18 14:22:15 UTC
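In a script, the same overall state can be read without parsing the status banner. The sketch below assumes systemctl is-system-running is available on the host (it prints a single word such as running or degraded). The helper name describe_system_state is hypothetical and only used for illustration.

```shell
#!/bin/bash
# Hypothetical helper: turn the word printed by `systemctl is-system-running`
# into a short human-readable summary.
describe_system_state() {
    case "$1" in
        running)  echo "OK: all units started cleanly" ;;
        degraded) echo "WARNING: at least one unit has failed" ;;
        "")       echo "UNKNOWN: no state reported" ;;
        *)        echo "INFO: system state is $1" ;;
    esac
}

# is-system-running exits non-zero for any state other than "running",
# so "|| true" keeps the script going even under "set -e".
state=$(systemctl is-system-running 2>/dev/null || true)
describe_system_state "$state"
```

A degraded result is the cue to list the failed units.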

The most direct way to list the failed units is:

systemctl list-units --state=failed
# Equivalent shorthand form; --no-pager writes straight to stdout instead of opening a pager
systemctl list-units --failed --all --no-pager

This outputs a clean list of failed units like:

UNIT                     LOAD   ACTIVE SUB    DESCRIPTION
nginx.service           loaded failed failed nginx - high performance web server

For deeper investigation, use these commands:

# Show last 20 journal entries for a specific service
journalctl -u nginx.service -n 20 --no-pager

# Follow logs in real-time
journalctl -u nginx.service -f

# Check service dependencies
systemctl list-dependencies nginx.service
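When several units have failed, the journal check can be repeated for each of them. This is a minimal sketch that combines the commands above. The function name show_failed_logs is hypothetical, and it assumes a systemd version that supports the --plain option of list-units.

```shell
#!/bin/bash
# Hypothetical helper: print the last N journal lines (default 20)
# for every unit currently in the failed state.
show_failed_logs() {
    local lines="${1:-20}"
    # --plain keeps the unit name in the first column (no status bullet).
    systemctl list-units --state=failed --no-legend --plain |
        awk '{print $1}' |
        while read -r unit; do
            echo "=== $unit ==="
            # </dev/null stops journalctl from touching the loop's stdin.
            journalctl -u "$unit" -n "$lines" --no-pager </dev/null
        done
}
```

Calling show_failed_logs 50 would print the last 50 entries per failed unit.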

If the failed unit is Nginx, test its configuration first:

# Test configuration before restart
nginx -t

# Common remediation steps
systemctl daemon-reload        # only needed if unit files were edited
systemctl reset-failed nginx   # clear the failed state
systemctl start nginx
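The configuration test and the restart can be tied together so that a broken configuration never reaches the start step. This is one possible sketch; the function name restart_nginx_if_valid is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: only restart Nginx when `nginx -t` accepts the config.
restart_nginx_if_valid() {
    if nginx -t; then
        systemctl reset-failed nginx
        systemctl start nginx
    else
        echo "nginx configuration test failed; not restarting" >&2
        return 1
    fi
}
```

The non-zero return value lets a calling script or deployment job stop early when the configuration is invalid.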

For monitoring scripts, this short check is useful:

#!/bin/bash
# --plain drops the leading status bullet so the unit name is always the first column
failed_services=$(systemctl list-units --state=failed --no-legend --plain | awk '{print $1}')
[ -z "$failed_services" ] || echo "Alert: Failed services - $failed_services"

On a systemd host, a degraded state means at least one unit has failed. The systemctl status summary reports the overall state and the number of failed units, but not which units they are.

Here are three effective ways to find failed services:

# 1. List all failed units
systemctl --failed

# 2. Restrict the list to failed service units
systemctl list-units --type=service --state=failed

# 3. List every unit, including inactive ones, with its state
systemctl list-units --all

Once you've identified the failed service (let's say nginx.service), investigate further:

# View detailed status
systemctl status nginx.service -l

# Check service dependencies
systemctl list-dependencies nginx.service

# Examine journal logs
journalctl -u nginx.service -b

For persistent failures, try these diagnostic commands:

# Show service startup time
systemd-analyze blame | grep nginx

# Verify unit file syntax
systemd-analyze verify nginx.service

# Check whether the unit is enabled, disabled, or masked
systemctl is-enabled nginx.service
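The same facts can be read in a script-friendly form with systemctl show. The sketch below assumes a systemd version that supports the --value option. The helper names unit_property, is_masked, and summarize_unit are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read one property of a unit via `systemctl show`.
unit_property() {
    systemctl show "$1" --property="$2" --value
}

# A masked unit reports LoadState=masked.
is_masked() {
    [ "$(unit_property "$1" LoadState)" = "masked" ]
}

# Print a one-line summary. For service units, Result describes how the
# last run ended (for example "exit-code" or "timeout").
summarize_unit() {
    printf '%s: active=%s result=%s\n' "$1" \
        "$(unit_property "$1" ActiveState)" \
        "$(unit_property "$1" Result)"
}
```

For example, is_masked nginx.service && echo "unit is masked" flags a unit that cannot be started until it is unmasked.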

Create a monitoring script to alert about failed services:

#!/bin/bash

# --plain drops the leading status bullet so the unit name is always the first column
failed_services=$(systemctl --failed --no-legend --plain | awk '{print $1}')

if [[ -n "$failed_services" ]]; then
    echo "Alert: Failed services detected:"
    echo "$failed_services"
    # Add your notification logic here
fi
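As one possible form of the notification logic, the unit names can be collapsed into a single line that is easy to send to syslog or a chat webhook. This is only a sketch: the function name format_alert and the message wording are assumptions, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: build a one-line alert from a newline-separated
# list of unit names passed as the first argument.
format_alert() {
    local units count
    # Join the names with spaces and trim the trailing space.
    units=$(printf '%s\n' "$1" | tr '\n' ' ' | sed 's/ *$//')
    # Count non-empty lines, i.e. the number of failed units.
    count=$(printf '%s\n' "$1" | grep -c .)
    printf 'Alert: %s failed unit(s): %s\n' "$count" "$units"
}

# Example use inside the monitoring script (logger writes to syslog):
# [ -n "$failed_services" ] && format_alert "$failed_services" | logger -t failed-units -p daemon.err
```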

Set this script to run periodically via cron for proactive monitoring.
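A cron entry for this could look like the sketch below. The install path /usr/local/bin/check-failed-units.sh and the five-minute interval are assumptions; adjust both to your setup.

```shell
#!/bin/bash
# Hypothetical install path for the monitoring script.
script_path="/usr/local/bin/check-failed-units.sh"

# Five crontab time fields (every 5 minutes) followed by the command.
cron_line="*/5 * * * * $script_path"
echo "$cron_line"

# To install it for the current user on a real host, uncomment:
# ( crontab -l 2>/dev/null; echo "$cron_line" ) | crontab -
```

Cron typically mails any output of the job to the crontab owner when local mail is configured, so the script's echo lines can double as the alert.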