Your monitoring setup consists of:
- 42 services across 8 hosts
- Primary check type: check_http (5-min and 1-min intervals)
- Concurrent Cacti operations: 6 hosts polled every minute
With 400MB RAM being the critical constraint, consider these metrics:
```
# Average check_http memory usage (per process)
$ ps -C check_http -o rss= | awk '{sum+=$1} END {print sum/NR " kB"}'
1264 kB

# Total concurrent processes during peak
$ /usr/lib/nagios/plugins/check_load -w 5 -c 10
```
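Multiplying those two figures gives a rough upper bound on plugin memory. A sketch, assuming 12-way concurrency (matching the `max_concurrent_checks=12` suggestion further down):

```sh
#!/bin/sh
# Back-of-envelope: plugin RSS at peak concurrency
rss_kb=1264      # average check_http RSS from the ps sample above
concurrent=12    # assumption: matches max_concurrent_checks
echo "$(( rss_kb * concurrent / 1024 )) MB"   # prints "14 MB"
```

So the plugins themselves cost only ~15MB; most of the 400MB is consumed elsewhere (the Nagios daemon, Cacti's poller, the web stack), which is why the RAM upgrade tops the priority list.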
Based on empirical data from production deployments:
| Hardware | Service Capacity | Recommended Interval |
|---|---|---|
| Single-core 2GHz + 1GB RAM | ~80 checks | 5-minute baseline |
| Dual-core 2GHz + 2GB RAM | 150-200 checks | 1-minute for critical services |
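A rough way to read this table: the steady-state number of simultaneously running checks is arrival rate × average plugin runtime (Little's law). A sketch with this setup's check rate, assuming a 10-second average plugin runtime:

```sh
#!/bin/sh
# concurrent ≈ checks/hour × avg runtime (s) / 3600
checks_per_hour=1080   # 42 services at mixed 1-/5-minute intervals
avg_runtime_s=10       # assumption; substitute your measured plugin times
echo "≈ $(( checks_per_hour * avg_runtime_s / 3600 )) checks running at once"
```

Three-ish concurrent checks is well within even the single-core row; the pain on 400MB boxes comes from memory pressure, not CPU.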
Implement these tweaks before hardware upgrades:
```
# /etc/nagios/nagios.cfg
max_concurrent_checks=12
check_result_reaper_frequency=2
service_check_timeout=30
enable_environment_macros=0
```

```
# Distributed monitoring example:
define host {
    host_name       remote_satellite
    address         192.168.1.100
    check_command   check_nrpe!load_satellite
}
```
For environments running more than ~50 checks:
- NRPE Satellites:
```
# Sample NRPE configuration
command[check_http]=/usr/lib/nagios/plugins/check_http -I $ARG1$ -u $ARG2$
```
- Modular Setup:
```
# Load distribution with NSCA
/usr/sbin/send_nsca -H nagios_master -c /etc/nagios/send_nsca.cfg
```
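`send_nsca` expects one tab-delimited result per line on stdin; a sketch of a manual submission (the host and service names here are placeholders):

```sh
#!/bin/sh
# Format: host<TAB>service<TAB>return_code<TAB>plugin_output
line="$(printf '%s\t%s\t%s\t%s' webserver1 HTTP 0 'OK - HTTP 200 in 0.3s')"
printf '%s\n' "$line" | /usr/sbin/send_nsca -H nagios_master -c /etc/nagios/send_nsca.cfg
```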
Priority order for resource allocation:
- RAM upgrade to 2GB (immediate 400% capacity increase)
- SSD storage for check result processing
- Additional CPU cores for concurrent check processing
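The 400% figure is just the relative increase from 400MB to 2GB (taking 2GB as 2000MB for round numbers):

```sh
#!/bin/sh
# relative increase = (new - old) / old × 100
echo "$(( (2000 - 400) * 100 / 400 ))% more RAM headroom"
```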
Your Nagios setup (2.0GHz CPU, 400MB RAM, RAID10), monitoring 42 services across 8 hosts at 5-minute intervals (some at 1-minute) while Cacti polls 6 hosts every minute, is pushing the limits of that hardware. Sustained load averages above 4-6 indicate resource contention.
```
# Example check_interval impact analysis:
Hosts: 8 | Services: 42
1-minute checks: 12 services → 720 checks/hour
5-minute checks: 30 services → 360 checks/hour
Total: 1080 checks/hour + Cacti (6 hosts × 60 = 360 polls/hour)
```
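The same arithmetic as a script, so it can be re-run when intervals change (the service counts are this setup's):

```sh
#!/bin/sh
one_min=12     # services checked every minute
five_min=30    # services checked every 5 minutes
cacti_hosts=6  # hosts polled by Cacti each minute
checks=$(( one_min * 60 + five_min * 12 ))
polls=$(( cacti_hosts * 60 ))
echo "$checks checks/hour + $polls Cacti polls/hour"   # prints "1080 checks/hour + 360 Cacti polls/hour"
```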
Try these nagios.cfg tweaks before hardware upgrades:
```
# Increase check processing parallelism
max_concurrent_checks=50
service_check_timeout=30
check_result_reaper_frequency=2

# Reduce disk I/O
use_retained_program_state=1
interval_length=60
```
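To sanity-check `max_concurrent_checks=50`, assume the worst case where every check runs all the way to its 30-second timeout:

```sh
#!/bin/sh
# worst-case concurrent = checks/hour × timeout (s) / 3600
checks_per_hour=1080
timeout_s=30
echo "worst case: $(( checks_per_hour * timeout_s / 3600 )) concurrent checks"
```

Nine is comfortably under 50, so the scheduler should not queue checks even if every plugin hangs until timeout.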
For hardware-constrained environments, consider NSCA passive checks:
```
# On the remote host: forward a check result to the central server
# (send_nsca reads host<TAB>service<TAB>return_code<TAB>output on stdin)
define command {
    command_name    submit_to_nsca
    command_line    /usr/bin/printf '%s\t%s\t%s\t%s\n' '$HOSTNAME$' '$SERVICEDESC$' '$SERVICESTATEID$' '$SERVICEOUTPUT$' | /usr/sbin/send_nsca -H nagios.server -c /etc/nagios/send_nsca.cfg
}
```
If optimizations fail, prioritize upgrades in this order:
- RAM: 400MB → 2GB (reduces swap thrashing)
- CPU: 2.0GHz → 3.0GHz+ multi-core (helps parallel checks)
- Storage: RAID10 SSD (faster check result processing)
A typical Nagios server handling 100+ services would have:
- 4GB RAM
- 4 CPU cores @ 2.5GHz+
- SSD storage
- Load averages between 1.5-3.0
Use this bash script to simulate check loads:
```bash
#!/bin/bash
# Fire 100 dummy checks, roughly 10 per second, to simulate plugin fork load
for i in {1..100}; do
    /usr/lib/nagios/plugins/check_dummy 0 "Test $i" &
    sleep 0.1
done
wait
echo "Load test completed"
```