Best Server Room Temperature Monitoring Solutions with Alert Notifications (Email/SNMP) for DevOps Teams



When our server room AC failed last month, the temperature climbed from 70°F to 90°F and went unnoticed until the server fans became audibly loud. The incident exposed a gap in our monitoring: we tracked individual server temperatures through IPMI but had no view of the room environment as a whole.

An effective solution should:

  • Measure ambient temperature at multiple room locations
  • Support configurable threshold alerts (email/SMS/SNMP)
  • Provide historical data for capacity planning
  • Integrate with existing monitoring stacks

1. Raspberry Pi + Sensors (Budget DIY option):

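# Python script for DHT22 sensor
import smtplib

import Adafruit_DHT

sensor = Adafruit_DHT.DHT22
pin = 4  # GPIO pin the sensor's data line is wired to
humidity, temperature = Adafruit_DHT.read_retry(sensor, pin)

# read_retry reports Celsius, or None if every read attempt failed
if temperature is not None:
    temp_f = temperature * 9 / 5 + 32
    if temp_f > 80:  # Threshold in Fahrenheit
        # sendmail() encodes str messages as ASCII, so the degree sign is left out;
        # the Subject header and empty line make this a valid message
        msg = f"Subject: Server room temperature alert\n\nALERT: Server room temp {temp_f:.1f} F"
        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login("monitor@example.com", "password")  # placeholder credentials
        server.sendmail("monitor@example.com", "admin@example.com", msg)
        server.quit()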

2. APC NetBotz (Enterprise solution):

  • Supports multiple environmental sensors
  • SNMP traps integration
  • Web interface with historical graphs

While temperature is critical, consider adding:

Parameter       | Why Monitor                  | Ideal Range
Humidity        | Prevents static/condensation | 40-60% RH
Water Detection | Early flood warning          | Dry
Airflow         | Cooling efficiency           | Varies by rack

For Nagios users:

define command {
    command_name check_room_temp
    command_line /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ \
    -o .1.3.6.1.4.1.318.1.1.10.2.3.1.1.2.1 -w 75 -c 85
}

Prometheus scrape configuration example (this assumes an exporter reachable at netbotz:9100 exposes the sensor readings):

scrape_configs:
  - job_name: 'environment'
    static_configs:
      - targets: ['netbotz:9100']
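
A scrape target like the one above has to serve metrics in the Prometheus text exposition format. For a DIY sensor, here is a minimal sketch of what that could look like using only the standard library. The metric name, the `location` label, and the `read_sensors` placeholder are my own assumptions, not something a particular device provides.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric name; pick whatever fits your naming scheme.
METRIC = "server_room_temperature_fahrenheit"


def read_sensors():
    """Placeholder: return {location: temp_f}. Replace with real sensor reads."""
    return {"rack_a3_intake": 72.5}


def render_metrics(readings):
    """Render readings in the Prometheus text exposition format."""
    lines = [
        f"# HELP {METRIC} Ambient temperature in degrees Fahrenheit.",
        f"# TYPE {METRIC} gauge",
    ]
    for location, temp_f in sorted(readings.items()):
        lines.append(f'{METRIC}{{location="{location}"}} {temp_f}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_sensors()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Serve /metrics until interrupted (port matches the scrape config)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` on the sensor host would make `http://<host>:9100/metrics` scrapeable. In practice the official Python client library is probably a better fit than hand-rolled output; this only shows what the scraper expects to find.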

When configuring alerts:

  • Set progressive alerts (75°F warning, 85°F critical)
  • Include device location in alerts ("Rack A3 Ambient")
  • Configure multiple notification channels (SMS for after-hours)
  • Test alert delivery monthly
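
The first two points can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and I've assumed a reading exactly at the threshold counts as a breach.

```python
WARNING_F = 75.0   # warning threshold from the list above
CRITICAL_F = 85.0  # critical threshold from the list above


def classify(temp_f):
    """Map a Fahrenheit reading to a severity level."""
    if temp_f >= CRITICAL_F:
        return "CRITICAL"
    if temp_f >= WARNING_F:
        return "WARNING"
    return "OK"


def format_alert(location, temp_f):
    """Return an alert line that names the sensor location, or None if OK."""
    severity = classify(temp_f)
    if severity == "OK":
        return None
    return f"{severity}: {location} at {temp_f:.1f} F"
```

For example, `format_alert("Rack A3 Ambient", 86.2)` yields `CRITICAL: Rack A3 Ambient at 86.2 F`, which tells the on-call person where to look without opening a dashboard.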

Last Thursday's AC failure taught me a hard lesson - traditional manual checks aren't reliable for critical infrastructure monitoring. When our server room hit 90°F (32°C), we narrowly avoided hardware damage only because someone noticed the fans' abnormal noise. This incident prompted me to build a robust monitoring solution.

After evaluating multiple options, I found these solutions most effective:

  • Standalone Environmental Monitors: Devices like APC NetBotz or ITWatchDogs provide comprehensive monitoring (temperature, humidity, airflow)
  • Raspberry Pi with Sensors: Cost-effective DIY solution using DHT22 or DS18B20 sensors
  • Smart PDUs: Enterprise-grade units like Eaton or Raritan often include environmental monitoring

Here's a basic script I wrote to monitor temperature via a Raspberry Pi and DHT22 sensor:

import Adafruit_DHT
import smtplib
from time import sleep

# Sensor configuration
DHT_SENSOR = Adafruit_DHT.DHT22
DHT_PIN = 4

# Alert thresholds
TEMP_HIGH = 80  # Fahrenheit
CHECK_INTERVAL = 300  # 5 minutes

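def send_alert(current_temp):
    sender = 'monitoring@yourdomain.com'
    receivers = ['admin@yourdomain.com']
    # Headers and body must be separated by a truly empty line.
    # sendmail() encodes str messages as ASCII, so the degree sign is left out.
    message = (
        "Subject: Server Room Temperature Alert\n"
        "\n"
        f"Critical temperature detected: {current_temp:.1f} F\n"
        "Immediate action required."
    )

    try:
        # Assumes a mail transfer agent is listening on localhost
        with smtplib.SMTP('localhost') as smtp_obj:
            smtp_obj.sendmail(sender, receivers, message)
    except (smtplib.SMTPException, OSError) as exc:
        print(f"Error: unable to send email: {exc}")

while True:
    # read_retry returns (None, None) if the sensor never responds
    humidity, temperature = Adafruit_DHT.read_retry(DHT_SENSOR, DHT_PIN)

    # The library reports Celsius; convert to Fahrenheit
    temp_f = temperature * 9 / 5 + 32 if temperature is not None else None

    if temp_f is not None and temp_f > TEMP_HIGH: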
        send_alert(temp_f)
    
    sleep(CHECK_INTERVAL)
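
One thing to watch with a loop like this: while the room stays hot, it sends a new email on every check. A possible way to rate-limit that is a small cooldown helper. This is a sketch, and the class name and the 30-minute default are my own choices.

```python
import time


class AlertThrottle:
    """Allow an alert through at most once per cooldown period."""

    def __init__(self, cooldown_seconds=1800, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock  # injectable so the logic can be tested
        self._last_sent = None

    def should_send(self):
        now = self.clock()
        if self._last_sent is None or now - self._last_sent >= self.cooldown:
            self._last_sent = now
            return True
        return False
```

In the loop, the alert call would then be guarded with something like `if throttle.should_send(): send_alert(temp_f)`.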

For larger setups, SNMP integrates better with existing monitoring systems. Most environmental sensors support SNMP traps, and their readings can also be polled. Here's a sample Nagios service definition that polls a temperature OID with check_snmp:

define service{
    use                  generic-service
    host_name            server-room-sensor
    service_description  Temperature
    check_command        check_snmp!-o .1.3.6.1.4.1.17373.4.1.2.1.4.1 -w 75 -c 85 -l "Temperature"
}
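
If the sensor hangs off a Raspberry Pi rather than an SNMP device, the same warning and critical thresholds can be applied by a small custom plugin. Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one line of output, optionally followed by performance data after a `|`. Below is a rough sketch; the function names and the `read_temp_f` callable are hypothetical.

```python
import sys

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(temp_f, warn=75.0, crit=85.0):
    """Return (exit_code, status_line) for a Fahrenheit reading."""
    if temp_f is None:
        return UNKNOWN, "TEMPERATURE UNKNOWN - no reading from sensor"
    if temp_f > crit:
        state, label = CRITICAL, "CRITICAL"
    elif temp_f > warn:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    # Performance data: label=value;warn;crit
    perfdata = f"temp={temp_f:.1f};{warn};{crit}"
    return state, f"TEMPERATURE {label} - {temp_f:.1f}F | {perfdata}"


def main(read_temp_f):
    """Run the check; read_temp_f is any callable returning a float or None."""
    state, message = evaluate(read_temp_f())
    print(message)
    sys.exit(state)
```

A real plugin would end with something like `main(my_sensor_read)` and be registered as a Nagios command in the same way as `check_snmp`.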

While temperature is primary, these factors also need monitoring:

Metric          | Ideal Range       | Monitoring Method
Humidity        | 40-60% RH         | Hygrometer (often combined with temp sensors)
Airflow         | Positive pressure | Differential pressure sensors
Water Detection | None              | Leak detection strips
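
The ranges in the table translate into simple checks. Here is a sketch under a few assumptions: the function name is made up, differential pressure is taken to be in pascals, and the leak sensor is treated as a plain boolean.

```python
HUMIDITY_MIN = 40.0  # % RH, from the table
HUMIDITY_MAX = 60.0  # % RH, from the table


def environment_issues(humidity_pct, pressure_diff_pa, leak_detected):
    """Return a list of human-readable problems; empty means all clear."""
    issues = []
    if not HUMIDITY_MIN <= humidity_pct <= HUMIDITY_MAX:
        issues.append(f"humidity {humidity_pct:.1f}% RH outside 40-60% RH")
    if pressure_diff_pa <= 0:
        issues.append("airflow: no positive pressure")
    if leak_detected:
        issues.append("water detected")
    return issues
```

Returning a list rather than a single status lets one alert mention every problem found in the same pass.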

Effective alerting requires multiple channels:

  1. Email/SMS for primary notifications
  2. Slack/Teams Webhooks for team awareness
  3. SNMP Traps for integration with NMS
  4. Local Alarms for on-site personnel
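
With several channels in play, one failing channel should not stop the others from firing. A minimal fan-out sketch follows; `notify_all` and the channel names are hypothetical, and each channel is just a callable that takes the message.

```python
def notify_all(message, channels):
    """Send message through every channel; return {name: exception} for failures."""
    failures = {}
    for name, send in channels.items():
        try:
            send(message)
        except Exception as exc:  # keep going so other channels still fire
            failures[name] = exc
    return failures
```

Usage might look like `notify_all("CRITICAL: Rack A3 at 90 F", {"email": send_email, "slack": send_slack})`, where the two senders are whatever functions you already have. A non-empty return value is worth logging, since it means at least one channel is broken.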

Here's a Python snippet for sending alerts to Slack:

import requests

def slack_alert(message):
    webhook_url = 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
    slack_data = {'text': message}

    # json= serializes the payload and sets the Content-Type header;
    # the timeout keeps a stalled request from blocking the monitor
    response = requests.post(webhook_url, json=slack_data, timeout=10)
    
    if response.status_code != 200:
        raise ValueError(
            f'Slack request failed with error {response.status_code}, {response.text}'
        )

In implementing this across multiple server rooms, I've learned:

  • Always have redundant sensors - single points of failure defeat the purpose
  • Place sensors at different heights - heat rises, creating microclimates
  • Monitor intake and exhaust separately - helps diagnose airflow issues
  • Test alerting monthly - notifications that don't work are worse than no notifications
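
The redundancy and intake/exhaust points can be combined in code. As a rough sketch (the function names are mine), taking the median of redundant sensors means one dead or wildly wrong sensor does not skew the room reading, and the exhaust-minus-intake difference gives a quick airflow indicator.

```python
from statistics import median


def room_temperature(readings):
    """Median of the valid readings; None if every sensor failed."""
    valid = [r for r in readings if r is not None]
    if not valid:
        return None
    return median(valid)


def exhaust_delta(intake_f, exhaust_f):
    """Temperature rise across the rack; a growing delta hints at airflow trouble."""
    return exhaust_f - intake_f
```

With readings of 72.0, a failed sensor (None), 74.0, and a faulty 110.0, `room_temperature` returns 74.0 rather than being dragged up by the outlier.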