When our server room AC failed last month, the temperature climbed from 70°F to 90°F and went unnoticed until the server fans became audibly distressed. The incident exposed a gap in our monitoring: we tracked individual server temperatures through IPMI, but had no room-level environmental monitoring.
An effective solution should:
- Measure ambient temperature at multiple room locations
- Support configurable threshold alerts (email/SMS/SNMP)
- Provide historical data for capacity planning
- Integrate with existing monitoring stacks
1. Raspberry Pi + Sensors (Budget DIY option):
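```python
# Python script for a DHT22 sensor on GPIO pin 4
import smtplib
from email.message import EmailMessage
import Adafruit_DHT

sensor = Adafruit_DHT.DHT22
pin = 4

# read_retry() returns temperature in Celsius, or None if every attempt fails
humidity, temp_c = Adafruit_DHT.read_retry(sensor, pin)

if temp_c is not None:
    temperature = temp_c * 9 / 5 + 32  # convert Celsius to Fahrenheit
    if temperature > 80:  # threshold in Fahrenheit
        msg = EmailMessage()  # handles headers and the non-ASCII degree sign
        msg["Subject"] = "Server room temperature alert"
        msg["From"] = "monitor@example.com"
        msg["To"] = "admin@example.com"
        msg.set_content(f"ALERT: Server room temp {temperature:.1f}°F")
        with smtplib.SMTP('smtp.gmail.com', 587) as server:
            server.starttls()
            server.login("monitor@example.com", "password")  # placeholder; don't hardcode real credentials
            server.send_message(msg)
```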
2. APC NetBotz (Enterprise solution):
- Supports multiple environmental sensors
- SNMP trap integration
- Web interface with historical graphs
While temperature is the critical metric, consider also monitoring:
Parameter | Why Monitor | Ideal Range |
---|---|---|
Humidity | Prevents static/condensation | 40-60% RH |
Water Detection | Early flood warning | Dry |
Airflow | Cooling efficiency | Varies by rack |
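The ranges in the table translate into simple checks. Below is a minimal sketch, assuming readings arrive as a dictionary with the hypothetical keys humidity_rh and water_detected; airflow is left out because its target varies by rack. A missing reading is reported rather than treated as healthy, since a dead sensor should not look like a dry, comfortable room.

```python
from typing import Any, Dict, List

# Ideal relative-humidity band from the table (percent RH)
HUMIDITY_MIN = 40.0
HUMIDITY_MAX = 60.0


def check_environment(readings: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for readings outside their ideal range.

    Hypothetical keys: 'humidity_rh' (float, % RH) and 'water_detected' (bool).
    """
    problems: List[str] = []

    humidity = readings.get("humidity_rh")
    if humidity is None:
        problems.append("humidity: no reading")
    elif humidity < HUMIDITY_MIN:
        problems.append(f"humidity {humidity:.0f}% RH below {HUMIDITY_MIN:.0f}% (static risk)")
    elif humidity > HUMIDITY_MAX:
        problems.append(f"humidity {humidity:.0f}% RH above {HUMIDITY_MAX:.0f}% (condensation risk)")

    water = readings.get("water_detected")
    if water is None:
        problems.append("water detection: no reading")
    elif water:
        problems.append("water detected")

    return problems


print(check_environment({"humidity_rh": 35.0, "water_detected": False}))
# prints ['humidity 35% RH below 40% (static risk)']
```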
For Nagios users:
```
define command {
    command_name    check_room_temp
    command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.4.1.318.1.1.10.2.3.1.1.2.1 -w 75 -c 85
}
```
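Nagios derives service state from a plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and with check_snmp the -w 75 -c 85 arguments mean values above 75 warn and values above 85 are critical. If you ever need a custom check instead, the sketch below mirrors that logic; read_temperature() is a hypothetical placeholder for your actual sensor read.

```python
from typing import Optional, Tuple

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(temp_f: Optional[float], warn: float = 75.0, crit: float = 85.0) -> Tuple[int, str]:
    """Map a reading to (exit code, plugin output line).

    Above warn -> WARNING, above crit -> CRITICAL. A missing reading is
    UNKNOWN rather than OK, so a broken sensor still gets attention.
    """
    if temp_f is None:
        return UNKNOWN, "UNKNOWN - no temperature reading"
    if temp_f > crit:
        code = CRITICAL
    elif temp_f > warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is shown to humans; text after it is performance data for graphing
    output = f"{STATUS_NAMES[code]} - Room temperature {temp_f:.1f}F | temp={temp_f:.1f};{warn:g};{crit:g}"
    return code, output


def read_temperature() -> Optional[float]:
    # Hypothetical placeholder: replace with a real sensor or SNMP read
    return 78.2


def main() -> int:
    code, output = evaluate(read_temperature())
    print(output)
    return code

# As an installed plugin, end the script with: sys.exit(main())
```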
Prometheus scrape configuration example, assuming an exporter is serving the sensor's metrics at netbotz:9100:
```yaml
scrape_configs:
  - job_name: 'environment'
    static_configs:
      - targets: ['netbotz:9100']
```
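Prometheus can only scrape targets that serve its text exposition format over HTTP. For a DIY sensor, a minimal exporter can be sketched with the Python standard library alone; the metric names, the location label, and read_sensor() are hypothetical. The official prometheus_client library does the same job with less code, and appliances that speak SNMP, such as NetBotz units, can instead be scraped through Prometheus's snmp_exporter.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict


def read_sensor() -> Dict[str, float]:
    # Hypothetical placeholder: return real readings from your sensor here
    return {"temperature_celsius": 24.5, "humidity_percent": 45.0}


def render_metrics(readings: Dict[str, float], location: str) -> str:
    """Format readings as gauges in the Prometheus text exposition format.

    The location label is not escaped, so it must not contain quotes,
    backslashes, or newlines.
    """
    lines = []
    for name, value in sorted(readings.items()):
        metric = f"room_{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f'{metric}{{location="{location}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_sensor(), location="server-room").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9100) -> None:
    # Blocks forever, answering each scrape with a fresh sensor read
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Port 9100 is also node_exporter's default, so pick another port if both run on the same host.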
- Set progressive alerts (75°F warning, 85°F critical)
- Include device location in alerts ("Rack A3 Ambient")
- Configure multiple notification channels (SMS for after-hours)
- Test alert delivery monthly
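The first three practices can be sketched in a few small functions. This assumes readings in °F and the 75°F/85°F thresholds above; the business hours used to decide when SMS is added (Monday to Friday, 08:00 to 18:00) and the channel names are hypothetical choices.

```python
from datetime import datetime
from typing import List, Optional

WARN_F = 75.0  # warning threshold (°F)
CRIT_F = 85.0  # critical threshold (°F)


def alert_level(temp_f: float) -> Optional[str]:
    """Return 'CRITICAL', 'WARNING', or None if no alert is needed."""
    if temp_f > CRIT_F:
        return "CRITICAL"
    if temp_f > WARN_F:
        return "WARNING"
    return None


def format_alert(level: str, location: str, temp_f: float) -> str:
    """Put the sensor location in every message, e.g. 'Rack A3 Ambient'."""
    return f"[{level}] {location}: {temp_f:.1f}°F"


def channels_for(now: datetime) -> List[str]:
    """Email and Slack always; add SMS outside the assumed business hours."""
    channels = ["email", "slack"]
    after_hours = now.weekday() >= 5 or not (8 <= now.hour < 18)
    if after_hours:
        channels.append("sms")
    return channels
```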
Last Thursday's AC failure taught me a hard lesson: manual checks aren't a reliable way to monitor critical infrastructure. When our server room hit 90°F (32°C), we avoided hardware damage only because someone noticed the fans' abnormal noise. That incident prompted me to build an automated monitoring setup.
After evaluating multiple options, I found these solutions most effective:
- Standalone Environmental Monitors: Devices like APC NetBotz or ITWatchDogs provide comprehensive monitoring (temperature, humidity, airflow)
- Raspberry Pi with Sensors: A cost-effective DIY solution using DHT22 or DS18B20 sensors
- Smart PDUs: Enterprise-grade PDUs from vendors such as Eaton or Raritan often accept plug-in environmental sensors
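For the DS18B20 option, the Linux w1-therm driver (enabled on a Raspberry Pi with dtoverlay=w1-gpio in the Pi's config.txt) exposes each sensor as a w1_slave file under /sys/bus/w1/devices/. The first line of that file ends in YES when the CRC check passed, and the second ends with t= followed by the temperature in thousandths of a degree Celsius. A sketch of reading it, with hypothetical function names:

```python
import glob
import os
from typing import Dict, Optional


def parse_w1_slave(text: str) -> Optional[float]:
    """Return degrees Celsius from w1_slave contents, or None on a bad read."""
    lines = text.strip().splitlines()
    # Line 1 ends with "YES" only if the CRC check passed
    if len(lines) < 2 or not lines[0].rstrip().endswith("YES"):
        return None
    # Line 2 ends with "t=<millidegrees Celsius>"
    _, sep, raw = lines[1].partition("t=")
    if not sep:
        return None
    return int(raw) / 1000.0


def read_all_ds18b20() -> Dict[str, Optional[float]]:
    """Read every DS18B20 on the bus (their 1-Wire IDs start with '28-')."""
    readings: Dict[str, Optional[float]] = {}
    for path in sorted(glob.glob("/sys/bus/w1/devices/28-*/w1_slave")):
        sensor_id = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            readings[sensor_id] = parse_w1_slave(f.read())
    return readings
```

Because several DS18B20s can share one 1-Wire bus, keying readings by sensor ID makes it easy to map each probe to a room location.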
Here's a basic script I wrote to monitor temperature with a Raspberry Pi and a DHT22 sensor:

```python
import smtplib
from email.message import EmailMessage
from time import sleep

import Adafruit_DHT

# Sensor configuration
DHT_SENSOR = Adafruit_DHT.DHT22
DHT_PIN = 4

# Alert thresholds
TEMP_HIGH = 80  # Fahrenheit
CHECK_INTERVAL = 300  # 5 minutes


def send_alert(current_temp):
    msg = EmailMessage()
    msg["Subject"] = "Server Room Temperature Alert"
    msg["From"] = "monitoring@yourdomain.com"
    msg["To"] = "admin@yourdomain.com"
    msg.set_content(
        f"Critical temperature detected: {current_temp:.1f}°F\n"
        "Immediate action required."
    )
    try:
        # Assumes a local mail server is listening on localhost
        with smtplib.SMTP('localhost') as smtp_obj:
            smtp_obj.send_message(msg)
    except (smtplib.SMTPException, OSError) as exc:
        print(f"Error: unable to send email: {exc}")


while True:
    # The DHT22 reports Celsius; read_retry() returns (None, None) on failure
    humidity, temperature = Adafruit_DHT.read_retry(DHT_SENSOR, DHT_PIN)
    temp_f = temperature * 9 / 5 + 32 if temperature is not None else None

    if temp_f is not None and temp_f > TEMP_HIGH:
        send_alert(temp_f)

    sleep(CHECK_INTERVAL)
```
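As written, the loop sends an email every five minutes for as long as the room stays hot. One way to cut that noise, sketched here as an assumption rather than part of the original script, is a cooldown: alert on the first breach, then at most once per cooldown window, and reset as soon as the temperature recovers. AlertThrottle and its 30-minute default are hypothetical.

```python
import time
from typing import Callable, Optional


class AlertThrottle:
    """Suppress repeat alerts for `cooldown` seconds while a condition persists."""

    def __init__(self, cooldown: float = 1800.0, clock: Callable[[], float] = time.monotonic):
        self.cooldown = cooldown
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self.last_sent: Optional[float] = None

    def should_send(self, breached: bool) -> bool:
        if not breached:
            self.last_sent = None  # condition cleared: the next breach alerts immediately
            return False
        now = self.clock()
        if self.last_sent is None or now - self.last_sent >= self.cooldown:
            self.last_sent = now
            return True
        return False

# Hypothetical wiring into the monitoring loop:
#   throttle = AlertThrottle(cooldown=1800)
#   ...
#   if temp_f is not None and throttle.should_send(temp_f > TEMP_HIGH):
#       send_alert(temp_f)
```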
For larger setups, SNMP integrates better with existing monitoring systems. Most commercial environmental monitors support SNMP, both for polling and for sending traps. Here's a sample Nagios service that polls the sensor's temperature over SNMP:
```
define service {
    use                     generic-service
    host_name               server-room-sensor
    service_description     Temperature
    check_command           check_snmp!-o .1.3.6.1.4.1.17373.4.1.2.1.4.1 -w 75 -c 85 -l "Temperature"
}
```
While temperature is the primary concern, these factors also need monitoring:
Metric | Ideal Range | Monitoring Method |
---|---|---|
Humidity | 40-60% RH | Hygrometer (often combined with temp sensors) |
Airflow | Positive pressure | Differential pressure sensors |
Water Detection | None | Leak detection strips |
Effective alerting requires multiple channels:
- Email/SMS for primary notifications
- Slack/Teams webhooks for team awareness
- SNMP traps for integration with your NMS (network management system)
- Local alarms for on-site personnel
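With several channels in play, a failure in one (an expired webhook, say) should not stop the others from firing. A sketch of that fan-out, assuming each channel is wrapped as a function that takes the message text; the send_email name in the comment is hypothetical.

```python
import logging
from typing import Callable, Dict, List

Notifier = Callable[[str], None]


def notify_all(message: str, channels: Dict[str, Notifier]) -> List[str]:
    """Send `message` on every channel; return the names of channels that failed."""
    failed: List[str] = []
    for name, send in channels.items():
        try:
            send(message)
        except Exception:  # deliberately broad: one broken channel must not block the rest
            logging.exception("alert channel %s failed", name)
            failed.append(name)
    return failed

# Hypothetical wiring:
#   failed = notify_all(msg, {"email": send_email, "slack": slack_alert})
#   if failed: fall back to a local alarm
```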
Here's a Python snippet for sending alerts to Slack:
```python
import requests


def slack_alert(message):
    # Incoming-webhook URL from your Slack app configuration (placeholder)
    webhook_url = 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
    slack_data = {'text': message}

    # json= serializes the payload and sets Content-Type: application/json
    response = requests.post(webhook_url, json=slack_data, timeout=10)
    if response.status_code != 200:
        raise ValueError(
            f'Slack request failed with error {response.status_code}, {response.text}'
        )
```
In implementing this across multiple server rooms, I've learned:
- Always have redundant sensors: a single sensor is a single point of failure, which defeats the purpose
- Place sensors at different heights: heat rises, creating microclimates (the top of a rack often runs warmer than the bottom)
- Monitor intake and exhaust separately: the difference between them helps diagnose airflow issues
- Test alerting monthly: notifications that silently fail are worse than none, because they create false confidence
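Two of these lessons translate directly into checks. A sketch, assuming readings in °F keyed by hypothetical sensor names: redundant sensors at the same spot should agree within a tolerance, and the exhaust-minus-intake difference for a rack is worth tracking over time, since a shift from its usual value points at an airflow change. The 3°F tolerance is an assumption, not a standard.

```python
from statistics import median
from typing import Dict, List


def disagreeing_sensors(readings: Dict[str, float], tolerance_f: float = 3.0) -> List[str]:
    """Return sensors that differ from the group median by more than tolerance_f.

    Meant for redundant sensors at one location. With three or more, the
    median outvotes a single faulty sensor; with exactly two, both are
    flagged when they disagree, since you can't tell which one is wrong.
    """
    if len(readings) < 2:
        return []
    mid = median(readings.values())
    return sorted(name for name, value in readings.items() if abs(value - mid) > tolerance_f)


def rack_delta_t(intake_f: float, exhaust_f: float) -> float:
    """Exhaust minus intake temperature across a rack, in °F."""
    return exhaust_f - intake_f
```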