When managing enterprise infrastructure, ensuring critical services like sendmail, xinetd, automount, and samba remain operational is paramount. While these services are generally stable, unexpected crashes can occur, making monitoring essential.
Nagios Core comes with a powerful generic plugin called check_procs that can monitor any running process. The basic syntax is:
define command {
    command_name    check_procs
    command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -C $ARG3$
}
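The warning and critical arguments are ranges, and a value triggers an alert when it falls outside the range. The sketch below is a simplified model of the threshold format described in the Nagios plugin development guidelines, not check_procs' actual parser; the function names parse_range, alerts and check_state are made up for illustration.

```python
import math


def parse_range(spec):
    """Return (low, high, inverted) for a Nagios-style threshold range."""
    inverted = spec.startswith("@")      # "@" flips the test: alert when inside
    if inverted:
        spec = spec[1:]
    if ":" in spec:
        start, end = spec.split(":", 1)
    else:
        start, end = "0", spec           # a bare "N" means the range 0..N
    low = -math.inf if start == "~" else float(start or 0)
    high = math.inf if end == "" else float(end)
    return low, high, inverted


def alerts(spec, value):
    """True when the value should raise an alert for this range."""
    low, high, inverted = parse_range(spec)
    inside = low <= value <= high
    return inside if inverted else not inside


def check_state(count, warn, crit):
    """Map a process count to a Nagios exit code; critical is tested first."""
    if alerts(crit, count):
        return 2
    if alerts(warn, count):
        return 1
    return 0
```

Under these rules, check_state(0, "1:1", "1") yields 1 (WARNING) because zero still lies inside the range 0..1, while check_state(0, "1:1", "1:") yields 2 (CRITICAL).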
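Here's how to configure service checks for various critical processes. The thresholds use Nagios range syntax: 1:1 means exactly one process, 1: means at least one, and a bare 1 means zero to one, so it never alerts on a missing process: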
Monitoring Sendmail
define service {
    use                     generic-service
    host_name               mail-server
    service_description     Sendmail Process
    check_command           check_procs!1:1!1:!sendmail
}
Checking Samba Services
define service {
    use                     generic-service
    host_name               file-server
    service_description     Samba Process
    check_command           check_procs!1:!1:!smbd
}
For more complex scenarios, you can define dedicated commands with the thresholds built in:
# Check OpenVPN with exact process count
define command {
    command_name    check_openvpn
    command_line    $USER1$/check_procs -w 1:1 -c 1: -C openvpn
}
# Check ClamAV with argument matching
define command {
    command_name    check_clamd
    command_line    $USER1$/check_procs -w 1:1 -c 1: -a '/usr/sbin/clamd'
}
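For services that spawn multiple processes (like xinetd), give the warning threshold a range and keep the critical range at least as wide. A count is only OK when it falls inside both ranges, so a critical range narrower than the warning range would flag every healthy count as critical:
define service {
    use                     generic-service
    host_name               network-server
    service_description     Xinetd Processes
    check_command           check_procs!3:5!1:!xinetd
}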
For services not easily detectable via process name, consider writing custom plugins:
#!/bin/bash
# check_mcafee.sh
if pgrep -x "uvscan" >/dev/null
then
    echo "OK: McAfee processes running"
    exit 0
else
    echo "CRITICAL: McAfee not running"
    exit 2
fi
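A custom plugin can also report UNKNOWN (exit code 3) when the check itself cannot run, which the shell script above does not distinguish. The following is a minimal Python sketch of the same pgrep-based check, assuming procps pgrep exit codes (0 for a match, 1 for no match, higher values for errors). The function name check_running and its runner parameter are hypothetical; runner exists only so the logic can be exercised without spawning a process.

```python
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_running(name, runner=subprocess.run):
    """Return (exit_code, message) for an exact-name process check via pgrep."""
    try:
        result = runner(["pgrep", "-x", name], capture_output=True, text=True)
    except OSError as exc:
        # pgrep is missing or could not be executed
        return UNKNOWN, f"UNKNOWN: could not run pgrep: {exc}"
    if result.returncode == 0:
        count = len(result.stdout.split())   # pgrep prints one PID per line
        return OK, f"OK: {count} {name} process(es) running"
    if result.returncode == 1:
        return CRITICAL, f"CRITICAL: {name} not running"
    return UNKNOWN, f"UNKNOWN: pgrep exited with status {result.returncode}"
```

A wrapper script would print the message and pass the code to sys.exit(), so Nagios sees both the status line and the exit status.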
When monitoring numerous processes, consider:
- Combining related checks into single service definitions
- Using process group monitoring instead of individual checks
- Adjusting check intervals based on service criticality
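Combining related checks can be as simple as folding several per-process results into one status line. This sketch assumes the process counts have already been collected; the function name combine and the example process names are illustrative only.

```python
OK, CRITICAL = 0, 2


def combine(counts, minimums):
    """Fold several per-process results into one (exit_code, message) pair.

    counts   -- mapping of process name to the number of instances found
    minimums -- mapping of process name to the minimum required instances
    """
    missing = [
        f"{name} ({counts.get(name, 0)}/{need})"
        for name, need in sorted(minimums.items())
        if counts.get(name, 0) < need
    ]
    if missing:
        # One failing process is enough to make the whole group critical.
        return CRITICAL, "CRITICAL: below minimum: " + ", ".join(missing)
    summary = ", ".join(f"{name}={counts.get(name, 0)}" for name in sorted(minimums))
    return OK, "OK: " + summary
```

The trade-off is granularity: one service definition means one notification for the whole group, with the affected process named only in the status text.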
When managing critical infrastructure, simply checking if a service is listening on a port isn't always sufficient. Many essential services like sendmail, xinetd, or openvpn can appear "up" while their actual processing components might have crashed. This is where process-level monitoring becomes crucial.
The Nagios Exchange contains specialized plugins for popular services, but many critical, less common services lack dedicated monitoring solutions. While check_procs covers basic checks, it often requires complex command-line arguments and lacks service-specific intelligence.
Here's a Python-based solution that provides more flexibility than the standard check_procs:
#!/usr/bin/env python3
"""Nagios plugin: count processes matching a name or command-line substring."""
import argparse
import os
import sys

import psutil


def check_process(process_name, min_count=1, max_count=None, exact_match=False):
    """Print a Nagios status line and return the matching exit code."""
    # Skip this plugin and its parent: their command lines contain the search term.
    ignored = {os.getpid(), os.getppid()}
    needle = process_name.lower()
    count = 0
    for proc in psutil.process_iter(['name', 'cmdline']):
        if proc.pid in ignored:
            continue
        # Fields that cannot be read (e.g. access denied) come back as None.
        if exact_match:
            matched = proc.info['name'] == process_name
        else:
            matched = needle in ' '.join(proc.info['cmdline'] or []).lower()
        if matched:
            count += 1

    if count < min_count:
        print(f"CRITICAL: Only {count} {process_name} processes (needs at least {min_count})")
        return 2
    # Without --max there is no upper limit.
    if max_count is not None and count > max_count:
        print(f"WARNING: {count} {process_name} processes (should be at most {max_count})")
        return 1
    print(f"OK: {count} {process_name} processes running")
    return 0


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Check how many matching processes are running")
    parser.add_argument("-p", "--process", dest="process_name", required=True,
                        help="Process name to check")
    parser.add_argument("-m", "--min", dest="min_count", default=1, type=int,
                        help="Minimum required processes")
    parser.add_argument("-M", "--max", dest="max_count", type=int,
                        help="Maximum allowed processes")
    parser.add_argument("-e", "--exact", action="store_true", dest="exact_match",
                        help="Exact process name match")
    options = parser.parse_args()
    sys.exit(check_process(
        options.process_name,
        options.min_count,
        options.max_count,
        options.exact_match
    ))
- Flexible matching: Can match by exact process name or search command lines
- Count thresholds: Set minimum and maximum allowed process counts
- Python-based: Easier to maintain and extend than shell scripts
- psutil library: More reliable than parsing ps output
Here are some example Nagios command definitions for common services:
# Check for at least 1 sendmail process
define command {
    command_name    check_sendmail_process
    command_line    /usr/local/nagios/libexec/check_process.py -p sendmail -e
}
# Check for exactly 3 clamd processes
define command {
    command_name    check_clamd_process
    command_line    /usr/local/nagios/libexec/check_process.py -p clamd -m 3 -M 3
}
# Check for samba processes (non-exact match)
define command {
    command_name    check_smbd_process
    command_line    /usr/local/nagios/libexec/check_process.py -p smbd -m 1
}
For environments with thousands of processes:
- Cache the process list if checking multiple services
- Consider running the check less frequently for non-critical services
- On Linux systems, parsing /proc directly might be faster than psutil
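The first and last points can be combined: read the process table once, then answer several service checks from that single snapshot. This is a Linux-only sketch that reads /proc/<pid>/comm; the function name snapshot_process_names and the proc_root parameter are assumptions made for illustration, with proc_root there mainly so the function can be pointed at a test directory. Whether it actually beats psutil on a given host would need measuring.

```python
import os
from collections import Counter


def snapshot_process_names(proc_root="/proc"):
    """Read every <proc_root>/<pid>/comm once and count the process names."""
    names = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():          # only numeric entries are PIDs
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                names[handle.read().strip()] += 1
        except OSError:
            # The process exited between listdir() and open(), or is unreadable.
            continue
    return names
```

A caller would take one snapshot and then look up names["smbd"], names["xinetd"] and so on, since a Counter returns 0 for names it never saw. Note that the kernel truncates comm to 15 characters, so longer process names need a different source such as cmdline.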
For production environments, you might also consider:
- Systemd-based checks using systemctl is-active
- Supervisor process monitoring
- Kernel audit system integration
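As a rough sketch of the systemd option, the output of systemctl is-active can be mapped onto Nagios exit codes. The function names state_to_status and check_unit are hypothetical, and treating transitional states such as "activating" as UNKNOWN is a design choice made for this example, not a systemd or Nagios rule.

```python
import subprocess

OK, CRITICAL, UNKNOWN = 0, 2, 3


def state_to_status(unit, state):
    """Map a unit state string to a Nagios (exit_code, message) pair."""
    if state == "active":
        return OK, f"OK: {unit} is active"
    if state in ("inactive", "failed", "deactivating"):
        return CRITICAL, f"CRITICAL: {unit} is {state}"
    # Anything else (activating, reloading, empty output) is left undecided.
    return UNKNOWN, f"UNKNOWN: {unit} is {state or 'in an unreported state'}"


def check_unit(unit):
    """Run `systemctl is-active <unit>` and translate its output."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit], capture_output=True, text=True
        )
    except OSError as exc:
        # systemctl is missing, e.g. on a host that does not run systemd
        return UNKNOWN, f"UNKNOWN: could not run systemctl: {exc}"
    return state_to_status(unit, result.stdout.strip())
```

Unlike a process count, this asks the service manager for its view of the unit, so it also covers services whose process names are hard to match.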