By default, Nagios uses ICMP ping to determine host availability, which becomes problematic in environments where ICMP is blocked. This creates false "down" alerts despite servers being operational through other protocols.
Nagios supports multiple plugin-based checks for host availability:
# SSH check example
define command {
command_name check_ssh_alive
command_line $USER1$/check_ssh -H $HOSTADDRESS$ -t 30
}
# HTTP check example
define command {
command_name check_http_alive
command_line $USER1$/check_http -H $HOSTADDRESS$ -I $HOSTADDRESS$ -t 30
}
Modify your host definition (usually kept in an object file that nagios.cfg loads via cfg_file or cfg_dir, rather than in nagios.cfg itself):
define host {
host_name webserver01
alias Web Server
address 192.168.1.100
check_command check_ssh_alive ; or check_http_alive
max_check_attempts 3
check_interval 5
retry_interval 1
check_period 24x7
notification_interval 30
notification_period 24x7
notification_options d,u,r
contact_groups admins
}
For comprehensive monitoring, implement multiple check methods:
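# "use generic-service" assumes the template from the sample configuration;
# it supplies required settings such as max_check_attempts and check_period.
define service {
use generic-service
host_name webserver01
service_description SSH_Availability
check_command check_ssh_alive
check_interval 5
retry_interval 1
}
define service {
use generic-service
host_name webserver01
service_description HTTP_Availability
check_command check_http_alive
check_interval 5
retry_interval 1
}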
When replacing ping checks:
- SSH checks add ~300ms overhead compared to ICMP
- HTTP checks typically complete within 500ms
- Adjust the -t timeout in each check command to leave headroom above the latency you actually observe
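The latency figures above depend on the network, so it may be worth measuring a check on your own hosts before choosing a timeout. The following is a minimal sketch, not part of Nagios: `elapsed_ms` is a hypothetical helper, and it assumes GNU `date` with nanosecond support (`%N`).

```shell
# elapsed_ms: hypothetical helper that prints how long a command took, in ms.
# Assumes GNU date (the %N format is not available in every date implementation).
elapsed_ms() {
    start=$(date +%s%N)
    # Only the duration matters here, so the command's exit status is ignored.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example (plugin path as used in the troubleshooting commands; adjust as needed):
# elapsed_ms /usr/lib/nagios/plugins/check_ssh -H 192.168.1.100 -t 30
```

Running it a few times against the same host gives a rough baseline to compare with the -t value in the command definition.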
Common issues and solutions:
# Verify plugin execution manually
/usr/lib/nagios/plugins/check_ssh -H 192.168.1.100
# Check Nagios debug logs
tail -f /var/log/nagios/nagios.debug
# Validate configuration
nagios -v /etc/nagios/nagios.cfg
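When a plugin is run by hand, the exit status matters as much as the text it prints: by plugin convention, 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. The sketch below wraps a manual run and prints that interpretation; `state_name` and `run_plugin` are hypothetical helper names introduced only for illustration.

```shell
# state_name: map a plugin exit status to the conventional state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNKNOWN (non-standard exit status $1)" ;;
    esac
}

# run_plugin: run any plugin command line, show its output and interpreted state.
run_plugin() {
    output=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$output"
    printf 'exit status %s => %s\n' "$status" "$(state_name "$status")"
    return "$status"
}

# Example (adjust the plugin path for your installation):
# run_plugin /usr/lib/nagios/plugins/check_ssh -H 192.168.1.100 -t 30
```

A status outside 0-3 (for example 127 when the plugin path is wrong) usually points to an installation or permissions problem rather than a host problem.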
When working in restricted network environments where ICMP/ping traffic is blocked, Nagios will incorrectly report servers as "down" despite them being operational. This occurs because Nagios defaults to using ping checks for basic host availability.
Nagios provides several robust alternatives to ICMP-based checks:
1. SSH-Based Host Alive Check
Create a custom check command in your Nagios configuration:
define command {
command_name check_ssh_alive
command_line $USER1$/check_ssh -H $HOSTADDRESS$ -p 22 -t 30
}
Then apply it to your host definition:
define host {
use linux-server
host_name webserver1
address 192.168.1.100
check_command check_ssh_alive
max_check_attempts 3
...
}
2. HTTP/HTTPS Service Check
For web servers, HTTP checks are often more reliable than SSH:
define command {
command_name check_http_alive
command_line $USER1$/check_http -H $HOSTADDRESS$ -I $HOSTADDRESS$ -t 30
}
define service {
use generic-service
host_name webserver1
service_description HTTP Availability
check_command check_http_alive
check_interval 5
retry_interval 1
}
For more complex scenarios, consider these approaches:
NRPE Checks: When direct SSH/HTTP access isn't possible, use NRPE for remote execution:
define command {
command_name check_nrpe_alive
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_load
}
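Combined Checks: A check_command runs exactly one command; the ! separator passes arguments ($ARG1$, $ARG2$, ...) to that command and does not chain checks. To accept either protocol as proof of life, combine the plugins in a single command_line:
define command {
command_name check_http_or_ssh_alive
# The command line contains shell syntax, so Nagios runs it through a shell;
# || falls back to the SSH check only when the HTTP check fails.
command_line $USER1$/check_http -I $HOSTADDRESS$ -t 30 || $USER1$/check_ssh -H $HOSTADDRESS$ -t 30
}
define service {
use generic-service
host_name appserver1
service_description Combined Alive Check
check_command check_http_or_ssh_alive
check_interval 5
}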
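Another way to combine methods is a small wrapper plugin that tries several probes in order and reports which one answered. This is only a sketch under stated assumptions: `check_any_alive` is a hypothetical name, not a standard Nagios plugin, and the plugin paths in the usage comment are the ones used in the troubleshooting commands.

```shell
# check_any_alive: hypothetical wrapper. Each argument is one complete probe
# command line. Exits 0 (OK) as soon as any probe succeeds, otherwise 2 (CRITICAL).
check_any_alive() {
    failed=""
    for probe in "$@"; do
        if sh -c "$probe" >/dev/null 2>&1; then
            echo "OK - alive via: $probe"
            return 0
        fi
        failed="$failed [$probe]"
    done
    echo "CRITICAL - all probes failed:$failed"
    return 2
}

# Example:
# check_any_alive \
#     "/usr/lib/nagios/plugins/check_http -I 192.168.1.100 -t 30" \
#     "/usr/lib/nagios/plugins/check_ssh -H 192.168.1.100 -t 30"
```

Saved as an executable script in the plugin directory, a function like this could be referenced from a command definition in the same way as the stock plugins. Note that the worst case runtime is the sum of all probe timeouts.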
When replacing ping checks with service-based verification:
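- Increase plugin timeouts (30-60 seconds instead of the default 10), keeping them within the host_check_timeout and service_check_timeout limits set in nagios.cfg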
- Adjust max_check_attempts to account for temporary service fluctuations
- Monitor your Nagios server's resource usage as these checks are more intensive
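Longer timeouts and more attempts both delay the moment a real outage becomes a HARD state and triggers a notification. The arithmetic below is a rough, pessimistic estimate: it assumes the default interval_length of 60 seconds and that every attempt runs to its full timeout. `time_to_hard_state` is a hypothetical helper, not a Nagios tool.

```shell
# time_to_hard_state: estimate seconds from the first failed check to a HARD state.
# Args: max_check_attempts retry_interval plugin_timeout_seconds [interval_length_seconds]
time_to_hard_state() {
    attempts=$1
    retry_interval=$2
    timeout=$3
    interval_length=${4:-60}   # assumption: Nagios default of 60 seconds per interval unit
    # (attempts - 1) retry waits, plus every attempt running to its timeout
    echo $(( (attempts - 1) * retry_interval * interval_length + attempts * timeout ))
}

# With max_check_attempts 3, retry_interval 1, and -t 30 as in the examples:
# time_to_hard_state 3 1 30    # prints 210
```

Raising the timeout to 60 seconds with the same settings pushes the estimate to 300 seconds, which is worth weighing against how quickly you need to hear about an outage.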
If checks still show incorrect status:
# Verify check execution manually:
/usr/lib/nagios/plugins/check_http -H 192.168.1.100
# Check Nagios debug logs:
tail -f /var/log/nagios/nagios.debug | grep HOSTNAME
Remember to reload Nagios after configuration changes:
systemctl reload nagios
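A reload with a broken configuration can leave monitoring in a bad state, so it is common to validate first and reload only on success. A minimal sketch, assuming the config path and commands shown earlier; `reload_if_valid` is a hypothetical helper, and the `NAGIOS_BIN` and `RELOAD_CMD` variables are introduced here only so the binary and reload command can be overridden.

```shell
# reload_if_valid: validate the configuration, reload only if validation passes.
# Arg: path to the main config file (defaults to /etc/nagios/nagios.cfg).
reload_if_valid() {
    cfg=${1:-/etc/nagios/nagios.cfg}
    if ${NAGIOS_BIN:-nagios} -v "$cfg" >/dev/null 2>&1; then
        ${RELOAD_CMD:-systemctl reload nagios}
    else
        echo "Configuration check failed; not reloading. Run: ${NAGIOS_BIN:-nagios} -v $cfg" >&2
        return 1
    fi
}

# Example:
# reload_if_valid /etc/nagios/nagios.cfg
```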