When managing enterprise infrastructure, ensuring critical services like sendmail, xinetd, automount, and samba remain operational is paramount. While these services are generally stable, unexpected crashes can occur, making monitoring essential.
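The standard Nagios Plugins package includes a generic plugin, check_procs, that can monitor any running process. A command definition that passes the warning range, critical range, and command name as arguments looks like this: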
define command {
    command_name    check_procs
    command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -C $ARG3$
}
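The -w and -c arguments take ranges rather than plain numbers, which is easy to misread. The Python sketch below models the three common forms (N, N: and N:M), assuming check_procs follows the standard Nagios plugin range rules. It is a simplified illustration, not the plugin's actual parser; the function names are invented for this example, and inverted or @-prefixed ranges are left out.

```python
def parse_range(spec):
    """Split 'N', 'N:' or 'N:M' into (low, high); high=None means no upper bound."""
    if ":" not in spec:
        return 0, int(spec)              # "5"   -> 0..5, i.e. at most 5
    low, high = spec.split(":", 1)
    return int(low), (int(high) if high else None)  # "1:" -> at least 1


def is_outside(count, spec):
    """True when count falls outside the range, so the alert should fire."""
    low, high = parse_range(spec)
    return count < low or (high is not None and count > high)


def evaluate(count, warning, critical):
    """Return a Nagios exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    if is_outside(count, critical):
        return 2
    if is_outside(count, warning):
        return 1
    return 0
```

Under this model, -w 1:1 -c 1 reports only WARNING when the process is missing, because a bare 1 means "at most one". Writing the critical range as 1: is what turns zero processes into CRITICAL.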
Here's how to configure service checks for various critical processes:
Monitoring Sendmail
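define service {
    use                     generic-service
    host_name               mail-server
    service_description     Sendmail Process
    check_command           check_procs!1:1!1:!sendmail
}
The critical range is written 1: (at least one process). A bare 1 would mean "at most one", which leaves a missing sendmail at WARNING instead of CRITICAL.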
Checking Samba Services
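define service {
    use                     generic-service
    host_name               file-server
    service_description     Samba Process
    check_command           check_procs!1:!1:!smbd
}
Both ranges are 1: (at least one process) because smbd forks a child for each client connection, so a healthy server often runs more than one.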
For services that need different matching rules, you can define dedicated commands with fixed thresholds:
# Check OpenVPN: warn unless exactly one process, critical if none
define command {
    command_name    check_openvpn
    command_line    $USER1$/check_procs -w 1:1 -c 1: -C openvpn
}

# Check ClamAV by matching the argument string instead of the command name
define command {
    command_name    check_clamd
    command_line    $USER1$/check_procs -w 1:1 -c 1: -a '/usr/sbin/clamd'
}
For services that spawn multiple processes (like xinetd):
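define service {
    use                     generic-service
    host_name               network-server
    service_description     Xinetd Processes
    check_command           check_procs!3:5!1:10!xinetd
}
The warning range must sit inside the critical range; otherwise no count can satisfy both and the check never returns OK. The bounds here are illustrative, so set them from the counts you observe on a healthy host.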
When check_procs cannot express the test you need, consider writing a small custom plugin:
#!/bin/bash
# check_mcafee.sh
# Exit 0 (OK) if a process named exactly "uvscan" exists, otherwise 2 (CRITICAL)
if pgrep -x "uvscan" >/dev/null
then
    echo "OK: McAfee processes running"
    exit 0
else
    echo "CRITICAL: McAfee not running"
    exit 2
fi
When monitoring numerous processes, consider:
- Combining related checks into single service definitions
- Using process group monitoring instead of individual checks
- Adjusting check intervals based on service criticality
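As one way to picture the first two suggestions, the sketch below takes a single snapshot of process names and compares it against several minimum counts, so related daemons share one service check. The function names and the Samba daemon grouping are assumptions made for illustration, and psutil is a third-party package imported only when a live snapshot is requested.

```python
from collections import Counter

OK, CRITICAL = 0, 2


def combined_status(running_names, required):
    """Compare one process snapshot against several minimum counts.

    running_names: iterable of process names from a single snapshot.
    required: dict mapping process name -> minimum number of instances.
    Returns (exit_code, message) following Nagios plugin conventions.
    """
    counts = Counter(running_names)
    missing = [f"{name} ({counts[name]}/{minimum})"
               for name, minimum in sorted(required.items())
               if counts[name] < minimum]
    if missing:
        return CRITICAL, "CRITICAL: below minimum: " + ", ".join(missing)
    return OK, f"OK: all {len(required)} process groups running"


def snapshot():
    """Collect every process name once (requires the third-party psutil)."""
    import psutil
    return [p.info["name"] for p in psutil.process_iter(["name"])]
```

A plugin built on this would call combined_status(snapshot(), {"smbd": 1, "nmbd": 1}), print the message, and exit with the returned code. The trade-off is coarser alerting: one service entry covers the whole group.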
When managing critical infrastructure, simply checking if a service is listening on a port isn't always sufficient. Many essential services like sendmail, xinetd, or openvpn can appear "up" while their actual processing components might have crashed. This is where process-level monitoring becomes crucial.
The Nagios Exchange hosts specialized plugins for popular services, but many critical, less common services have no dedicated plugin. check_procs covers basic checks, but it requires carefully chosen range arguments and has no service-specific logic.
Here's a Python-based solution that provides more flexibility than the standard check_procs. It depends on the third-party psutil package:
#!/usr/bin/env python3
"""Nagios plugin: count processes by exact name or command-line substring."""
import argparse
import os
import sys

import psutil  # third-party package


def check_process(process_name, min_count=1, max_count=None, exact_match=False):
    """Print a Nagios status line and return the matching exit code."""
    own_pid = os.getpid()
    count = 0
    for proc in psutil.process_iter(['name', 'cmdline']):
        # Skip this plugin itself: its command line contains the search term
        if proc.pid == own_pid:
            continue
        try:
            if exact_match:
                if proc.name() == process_name:
                    count += 1
            elif process_name.lower() in ' '.join(proc.cmdline()).lower():
                count += 1
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue

    if count < min_count:
        print(f"CRITICAL: Only {count} {process_name} processes (needs at least {min_count})")
        return 2
    # A max_count of None means there is no upper limit
    if max_count is not None and count > max_count:
        print(f"WARNING: {count} {process_name} processes (should be at most {max_count})")
        return 1
    print(f"OK: {count} {process_name} processes running")
    return 0


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Check process counts for Nagios")
    parser.add_argument("-p", "--process", dest="process_name", required=True, help="Process name to check")
    parser.add_argument("-m", "--min", dest="min_count", default=1, type=int, help="Minimum required processes")
    parser.add_argument("-M", "--max", dest="max_count", type=int, help="Maximum allowed processes")
    parser.add_argument("-e", "--exact", action="store_true", dest="exact_match", help="Exact process name match")
    args = parser.parse_args()
    sys.exit(check_process(
        args.process_name,
        args.min_count,
        args.max_count,
        args.exact_match
    ))
Key features of this script:
- Flexible matching: Can match by exact process name or search command lines
- Count thresholds: Set minimum and maximum allowed process counts
- Python-based: Easier to maintain and extend than shell scripts
- psutil library: More reliable than parsing ps output
Here are some example Nagios command definitions for common services:
# Check for at least 1 sendmail process
define command {
    command_name    check_sendmail_process
    command_line    /usr/local/nagios/libexec/check_process.py -p sendmail -e
}

# Check for exactly 3 clamd processes
define command {
    command_name    check_clamd_process
    command_line    /usr/local/nagios/libexec/check_process.py -p clamd -m 3 -M 3
}

# Check for samba processes (non-exact match)
define command {
    command_name    check_smbd_process
    command_line    /usr/local/nagios/libexec/check_process.py -p smbd -m 1
}
For environments with thousands of processes:
- Cache the process list if checking multiple services
- Consider running the check less frequently for non-critical services
- On Linux systems, reading /proc directly might be faster than psutil
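The caching idea in the first bullet could be sketched as follows: several checks on the same host share one process snapshot stored in a file and rebuild it only when it is older than a chosen age. The function name, the JSON file format, and the collect callback are assumptions for illustration; only the standard library is used.

```python
import json
import os
import tempfile
import time


def cached_process_names(cache_path, max_age, collect):
    """Return process names, reusing a snapshot younger than max_age seconds.

    collect is a zero-argument callable returning a fresh list of names; it is
    only called when the cache file is missing, stale, or unreadable.
    """
    try:
        if time.time() - os.path.getmtime(cache_path) < max_age:
            with open(cache_path) as handle:
                return json.load(handle)
    except (OSError, ValueError):
        pass  # fall through and rebuild the cache

    names = collect()
    # Write to a temporary file and rename it, so a concurrent check
    # never reads a half-written snapshot.
    directory = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(names, handle)
    os.replace(tmp_path, cache_path)
    return names
```

The maximum age should stay well below the shortest check interval, otherwise a crashed service can go unnoticed until the snapshot expires.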
For production environments, you might also consider:
- Systemd-based checks using systemctl is-active
- Supervisor process monitoring
- Kernel audit system integration
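For the systemd option, a minimal sketch of a wrapper that translates the output of systemctl is-active into Nagios exit codes might look like this. The function names are hypothetical, and treating transitional states such as activating as CRITICAL is a design assumption that you may prefer to relax to WARNING.

```python
import subprocess

OK, CRITICAL, UNKNOWN = 0, 2, 3


def state_to_nagios(unit, state):
    """Map the text printed by `systemctl is-active` to (exit_code, message)."""
    if state == "active":
        return OK, f"OK: {unit} is active"
    if state in ("inactive", "failed", "activating", "deactivating"):
        return CRITICAL, f"CRITICAL: {unit} is {state}"
    return UNKNOWN, f"UNKNOWN: {unit} reported state '{state}'"


def check_unit(unit):
    """Run systemctl and translate its answer; UNKNOWN if it cannot be run."""
    try:
        result = subprocess.run(["systemctl", "is-active", unit],
                                capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN: could not run systemctl: {exc}"
    return state_to_nagios(unit, result.stdout.strip())
```

A plugin script would call check_unit with a unit name such as smbd.service, print the message, and exit with the returned code. Unlike a process count, this reflects what systemd believes about the unit, so it complements the process-level check and does not replace it.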