Your monitoring setup consists of:
- 42 services across 8 hosts
- Primary check type: check_http (5-min and 1-min intervals)
- Concurrent Cacti operations: 6 hosts polled every minute
With 400MB RAM being the critical constraint, consider these metrics:
# Sample check_http memory usage (per process)
$ ps -o rss= -p "$(pgrep -d, -f check_http)" | awk '{sum+=$1} END {print sum/NR " kB"}'
1264 kB
# Total concurrent processes during peak
$ pgrep -cf '/usr/lib/nagios/plugins/check_'
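To see how much of the 400MB budget the plugins themselves consume, multiply the per-process RSS by the concurrency cap (the 12 matches the max_concurrent_checks value suggested below; figures illustrative):

# Peak plugin footprint ≈ per-process RSS × max concurrent checks
$ echo "$((1264 * 12)) kB"
15168 kB

The plugins are a small slice; the Nagios daemon, Apache, and Cacti's poller account for most of the remaining RAM.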
Rough capacity figures from comparable production deployments:
| Hardware | Service Capacity | Recommended Interval |
|---|---|---|
| Single-core 2GHz + 1GB RAM | ~80 checks | 5-minute baseline |
| Dual-core 2GHz + 2GB RAM | 150-200 checks | 1-minute for critical services |
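Before changing intervals, check how far behind the scheduler already runs. nagiostats ships with Nagios and reports check latency (sample numbers illustrative):

$ nagiostats | grep -i 'service latency'
Active Service Latency:     0.001 / 12.415 / 3.871 sec

Average latency climbing toward the check interval is the clearest sign the box is saturated.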
Implement these tweaks before hardware upgrades:
# /etc/nagios/nagios.cfg
# Cap parallel plugins so they fit the 400MB RAM budget
max_concurrent_checks=12
# Reap check results every 2 seconds so the queue stays short
check_result_reaper_frequency=2
# Kill hung plugins before they accumulate
service_check_timeout=30
# Skip per-check environment macros; saves CPU and RAM
enable_environment_macros=0
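A sanity check before reloading catches typos in the directives above (the init script path is an assumption; adjust for your distribution):

$ nagios -v /etc/nagios/nagios.cfg
$ /etc/init.d/nagios reload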
# Distributed monitoring example:
define host {
    host_name       remote_satellite
    address         192.168.1.100
    check_command   check_nrpe!load_satellite
}
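The check_nrpe command referenced above needs its own definition; a minimal sketch, assuming the plugin lives in the usual /usr/lib/nagios/plugins path:

define command {
    command_name    check_nrpe
    command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}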
For environments with more than ~50 checks:
- NRPE satellites (command defined in nrpe.cfg on the remote host):
  # Sample NRPE configuration
  command[check_http]=/usr/lib/nagios/plugins/check_http -I $ARG1$ -u $ARG2$
- Modular setup, pushing results to the master with NSCA (see the passive service sketch after this list):
  # Load distribution with NSCA
  /usr/sbin/send_nsca -H nagios_master -c /etc/nagios/send_nsca.cfg
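On the central server, a service fed by NSCA is defined passive-only; a minimal sketch, assuming a generic-service template exists in your object configuration:

define service {
    use                      generic-service
    host_name                remote_satellite
    service_description      Satellite Load
    check_command            check_dummy!0    ; placeholder, never actively run
    active_checks_enabled    0
    passive_checks_enabled   1
}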
Priority order for resource allocation:
- RAM upgrade to 2GB (fivefold memory headroom; the biggest single win)
- SSD storage for check result processing
- Additional CPU cores for concurrent check processing
Your Nagios setup (2.0GHz CPU, 400MB RAM, RAID10) monitoring 42 services across 8 hosts at 5-minute intervals (some at 1-minute), plus Cacti polling 6 hosts every minute, is pushing the hardware's limits. Load averages consistently above 4-6 on a single core indicate resource contention.
# Example check_interval impact analysis:
Hosts: 8 | Services: 42
1-minute checks: 12 services → 720 checks/hour
5-minute checks: 30 services → 360 checks/hour
Total: 1080 checks/hour + Cacti (6 hosts × 60 = 360 polls/hour)
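The same arithmetic as a one-liner, handy when re-planning intervals (the 12/30 split comes from the analysis above):

# checks/hour = services × (60 / interval_minutes)
$ awk 'BEGIN { print 12 * 60/1 + 30 * 60/5 }'
1080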
If the conservative settings above still leave checks lagging, try these more aggressive nagios.cfg values before resorting to hardware upgrades:
# Increase check processing parallelism
max_concurrent_checks=50
service_check_timeout=30
check_result_reaper_frequency=2
# Reduce disk I/O
use_retained_program_state=1
interval_length=60
For hardware-constrained environments, consider NSCA passive checks:
# On remote host:
define command {
    command_name    submit_to_nsca
    command_line    /usr/bin/printf "%s\t%s\t%s\t%s\n" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATEID$" "$SERVICEOUTPUT$" | /usr/sbin/send_nsca -H nagios.server -c /etc/nagios/send_nsca.cfg
}
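send_nsca reads one tab-delimited result per line (host, service description, return code, plugin output), so the pipeline can be tested by hand; the hostname and service name below match the earlier examples:

# Manual test from the remote host
$ printf 'remote_satellite\tSatellite Load\t0\tOK - manual test\n' | /usr/sbin/send_nsca -H nagios.server -c /etc/nagios/send_nsca.cfg
1 data packet(s) sent to host successfully.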
If optimizations fail, prioritize upgrades in this order:
- RAM: 400MB → 2GB (reduces swap thrashing)
- CPU: 2.0GHz → 3.0GHz+ multi-core (helps parallel checks)
- Storage: RAID10 SSD (faster check result processing)
A typical Nagios server handling 100+ services would have:
- 4GB RAM
- 4 CPU cores @ 2.5GHz+
- SSD storage
- Load averages between 1.5 and 3.0
Use this bash script to simulate check loads:
#!/bin/bash
# Spawn 100 dummy checks at roughly 10/second to approximate peak plugin load
for i in {1..100}; do
    /usr/lib/nagios/plugins/check_dummy 0 "Test $i" &
    sleep 0.1
done
wait
echo "Load test completed"