How to Stop APC Management Card from Spamming Battery Alert Emails Programmatically


When an APC Smart-UPS with a management card (an AP9617 in this case) detects a battery issue during a self-test, it enters an aggressive notification loop that can flood your inbox. The syslog entries from the card look like this:

Dec 27 21:19:10 10.16.15.50 UPS: Started a self-test. 0x0137
Dec 27 21:19:12 10.16.15.50 UPS: At least one faulty battery exists. 0x0119
Dec 27 21:19:28 10.16.15.50 UPS: Failed a self-test. 0x0106
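
If these entries are collected on a syslog server, the event codes can be extracted for later filtering or throttling. This is a minimal sketch that assumes the line layout is exactly as in the sample above; `EVENT_RE` and `parse_event` are illustrative names, not part of any APC tool.

```python
import re

# Layout assumed from the sample lines:
# "<Mon DD HH:MM:SS> <host> UPS: <message> <0xNNNN event code>"
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+UPS:\s+"
    r"(?P<msg>.*?)\s+(?P<code>0x[0-9A-Fa-f]{4})$"
)

def parse_event(line):
    """Return a dict with ts, host, msg and code, or None if the line does not match."""
    match = EVENT_RE.match(line.strip())
    return match.groupdict() if match else None

sample = [
    "Dec 27 21:19:10 10.16.15.50 UPS: Started a self-test. 0x0137",
    "Dec 27 21:19:12 10.16.15.50 UPS: At least one faulty battery exists. 0x0119",
    "Dec 27 21:19:28 10.16.15.50 UPS: Failed a self-test. 0x0106",
]

events = [parse_event(line) for line in sample]
codes = [event["code"] for event in events if event]
print(codes)
```

The resulting (host, code) pairs are a convenient key for deciding which alerts to suppress.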

The management card maintains an "unacknowledged alarm" state until one of the following happens:

  • The physical device is inspected and reset
  • The battery condition improves (unlikely without replacement)
  • The notification threshold is modified programmatically

Here are three technical approaches to stop the email bombardment:

Method 1: SNMP Command Reset

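The following net-snmp command is sometimes suggested for clearing the alarm flag:

snmpset -v1 -c private 10.16.15.50 \
1.3.6.1.4.1.318.1.1.1.2.2.1.0 i 1

It is described as a "UPS test acknowledge" command, but in APC's PowerNet MIB the OID 1.3.6.1.4.1.318.1.1.1.2.2.1 is upsAdvBatteryCapacity, a read-only value, so expect the agent to reject the write. Check the MIB that matches your firmware for a writable object before relying on this method.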

Method 2: APC Web Interface API

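For cards with a web interface, you can script the acknowledgement. The form path and field name below are assumptions that vary by firmware, so confirm them with your browser's developer tools before use:

import requests

# Factory-default credentials; replace with your own.
auth = ('apc', 'apc')
# Unverified endpoint and form field; check them against your card's web UI.
url = 'http://10.16.15.50/Forms/ups_alarm_1'
data = {'AlarmAcknowledge': 'Acknowledge'}

# The timeout keeps the script from hanging if the card is unreachable.
response = requests.post(url, data=data, auth=auth, timeout=10)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
print(response.status_code)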

Method 3: Email Filter Rule

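As a temporary mitigation, create a server-side filter for the alert messages. Check the subject and headers of a real alert from your card first, then match on them, for example: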

Subject: "UPS Alarm: Battery Fault"
X-Apc-Event-Code: 0x0119

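If a host also monitors the UPS with apcupsd, its event logging is configured in apcupsd.conf. These settings affect only apcupsd, not the e-mails sent by the management card itself:

# Write events to a separate, size-limited log
EVENTSFILE /etc/apcupsd/apcevents.critical
# Maximum size of the events file, in kilobytes
EVENTSFILEMAX 10
# MININTERVAL 3600   (not a documented apcupsd directive; left commented out)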

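To read the current battery state without changing anything, walk the upsAdvBattery subtree:

snmpwalk -v1 -c public 10.16.15.50 1.3.6.1.4.1.318.1.1.1.2.2

Look for these OIDs (names as defined in the PowerNet MIB):

  • 1.3.6.1.4.1.318.1.1.1.2.2.3.0 (upsAdvBatteryRunTimeRemaining: estimated runtime left)
  • 1.3.6.1.4.1.318.1.1.1.2.2.4.0 (upsAdvBatteryReplaceIndicator: 1 = no replacement needed, 2 = battery needs replacing)
  • 1.3.6.1.4.1.318.1.1.1.7.2.3.0 (upsAdvTestDiagnosticsResults: 1 = ok, 2 = failed; outside the walked subtree, so query it separately with snmpget)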
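
The integer values returned for these objects are easier to act on once decoded. This is a minimal sketch that assumes net-snmp output with numeric OIDs (the `-On` option) and the enumerations from APC's PowerNet MIB; verify both against your MIB version. `decode` and the lookup tables are illustrative names.

```python
import re

# Enumerations assumed from the PowerNet MIB; confirm against your firmware's MIB.
REPLACE_INDICATOR = {1: "noBatteryNeedsReplacing", 2: "batteryNeedsReplacing"}
TEST_RESULT = {1: "ok", 2: "failed", 3: "invalidTest", 4: "testInProgress"}

OIDS = {
    "1.3.6.1.4.1.318.1.1.1.2.2.4.0": ("upsAdvBatteryReplaceIndicator", REPLACE_INDICATOR),
    "1.3.6.1.4.1.318.1.1.1.7.2.3.0": ("upsAdvTestDiagnosticsResults", TEST_RESULT),
}

# Matches ".1.3.6... = INTEGER: 2" as well as "... = INTEGER: someName(2)".
LINE_RE = re.compile(
    r"^\.?(?P<oid>\d+(?:\.\d+)+)\s*=\s*INTEGER:\s*(?:[\w-]+\()?(?P<val>\d+)"
)

def decode(line):
    """Return (object name, decoded value) for a known OID line, else None."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    entry = OIDS.get(match.group("oid"))
    if entry is None:
        return None
    name, enum = entry
    value = int(match.group("val"))
    return name, enum.get(value, f"unknown({value})")

print(decode(".1.3.6.1.4.1.318.1.1.1.2.2.4.0 = INTEGER: 2"))
print(decode(".1.3.6.1.4.1.318.1.1.1.7.2.3.0 = INTEGER: 2"))
```

Feed it the lines printed by snmpget or snmpwalk; lines for other OIDs are ignored.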

The same failure can be triggered by the weekly self-test on an APC Smart-UPS 3000 with an AP9617 management card, which produces the same three syslog entries shown above and the same relentless e-mail notifications.

The management card treats a failed battery test as a persistent critical event, which stays active until one of the following happens:

  • The physical battery is replaced
  • The alert condition is manually cleared
  • The notification threshold is modified

Method 1: SNMP Command Reset
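Use this Linux command to acknowledge alerts (requires the net-snmp tools, which provide snmpset). Verify the OID against the PowerNet MIB for your firmware before running it: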

snmpset -v1 -c private 10.16.15.50 1.3.6.1.4.1.318.2.1.1.7.2.3.0 i 6

Method 2: APC Web Interface
Navigate to http://[UPS_IP]/cgi-bin/alert_ack.cgi, if your firmware exposes that page, and select all alert types to acknowledge them.

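To prevent future floods while keeping monitoring active, note that in the PowerNet MIB the 1.3.6.1.4.1.318.1.1.1.7.2 subtree holds the self-test objects rather than e-mail throttling settings: .7.2.1.0 is upsAdvTestDiagnosticSchedule and .7.2.3.0 is the read-only upsAdvTestDiagnosticsResults. What can be changed over SNMP is the self-test schedule, which stops the weekly test from re-raising the failure. E-mail recipients and per-event actions are configured on the card itself.

# Set the self-test schedule (upsAdvTestDiagnosticSchedule): 3 = weekly, 5 = never
snmpset -v1 -c private 10.16.15.50 \
1.3.6.1.4.1.318.1.1.1.7.2.1.0 i 5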
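
A one-alert-per-hour limit can also be enforced on the receiving side, for example in a script that relays the card's messages. This is a minimal sketch of that idea, not a feature of the card; `AlertThrottle` and its parameters are hypothetical names, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time

class AlertThrottle:
    """Allow at most one alert per key within min_interval seconds."""

    def __init__(self, min_interval=3600.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_sent = {}  # key -> time the last alert was let through

    def should_send(self, key):
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # still inside the quiet period for this key
        self._last_sent[key] = now
        return True

# Demonstration with a fake clock instead of real time.
fake_now = [0.0]
throttle = AlertThrottle(min_interval=3600.0, clock=lambda: fake_now[0])
key = ("10.16.15.50", "0x0119")

results = [throttle.should_send(key)]   # first alert goes through
fake_now[0] = 60.0
results.append(throttle.should_send(key))   # one minute later: suppressed
fake_now[0] = 3600.0
results.append(throttle.should_send(key))   # one hour later: goes through
print(results)
```

Keying on (host, event code) keeps a different UPS or a different event from being suppressed by the battery alert.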

For systems where you can't immediately modify UPS settings, create a server-side filter. Example Postfix header check (enable it with header_checks = regexp:/etc/postfix/header_checks in main.cf):

/etc/postfix/header_checks:
/^Subject:.*(Self-Test Failed|Faulty Battery)/ DISCARD
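
Before deploying a header check, it helps to try the pattern against sample subjects. This is a minimal sketch using Python's re module; the sample subjects are made up, so substitute ones copied from real alerts. Postfix regexp tables match case-insensitively by default, which `re.IGNORECASE` imitates here.

```python
import re

# Same pattern as the header check above, compiled case-insensitively.
PATTERN = re.compile(r"^Subject:.*(Self-Test Failed|Faulty Battery)", re.IGNORECASE)

def would_match(header_line):
    """Return True if the header line would be caught by the pattern."""
    return PATTERN.search(header_line) is not None

# Hypothetical subjects for illustration only.
samples = [
    "Subject: UPS: Self-Test Failed",
    "Subject: UPS Alarm: Battery Fault",
    "Subject: Weekly capacity report",
]

for line in samples:
    print(line, "->", would_match(line))
```

Note that a subject such as "UPS Alarm: Battery Fault" is not caught by this pattern, so align the alternatives with the subjects your card actually sends.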

Check current alert status with:

snmpwalk -v1 -c public 10.16.15.50 1.3.6.1.4.1.318.1.1.1.7.2

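In the PowerNet MIB this subtree holds the self-test objects. Look at 1.3.6.1.4.1.318.1.1.1.7.2.3.0 (upsAdvTestDiagnosticsResults): it reads 1 after a passed self-test and 2 after a failed one.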