When an APC Smart-UPS with a network management card (an AP9617 in this case) detects a battery problem during its self-test, it can enter a notification loop that floods your inbox. The key entries in syslog are:
Dec 27 21:19:10 10.16.15.50 UPS: Started a self-test. 0x0137
Dec 27 21:19:12 10.16.15.50 UPS: At least one faulty battery exists. 0x0119
Dec 27 21:19:28 10.16.15.50 UPS: Failed a self-test. 0x0106
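The three event codes in that excerpt (0x0137 self-test started, 0x0119 faulty battery, 0x0106 self-test failed) are enough to spot the failure automatically. Below is a minimal Python sketch. It assumes every syslog line has exactly the layout shown above, and the function names are hypothetical:

```python
import re

# Matches lines such as:
# "Dec 27 21:19:12 10.16.15.50 UPS: At least one faulty battery exists. 0x0119"
# (assumed layout, taken from the excerpt above)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) UPS: "
    r"(?P<msg>.*?) (?P<code>0x[0-9A-Fa-f]{4})$"
)


def parse_ups_line(line):
    """Return (timestamp, host, message, event_code) or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("ts"), m.group("host"), m.group("msg"), m.group("code").lower()


def failed_self_test(lines):
    """True if the log contains a 'Failed a self-test' (0x0106) event."""
    codes = {parsed[3] for parsed in map(parse_ups_line, lines) if parsed}
    return "0x0106" in codes
```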
The management card keeps the alarm in an "unacknowledged" state until one of the following happens:
- The physical device is inspected and reset
- The battery condition improves (unlikely without replacement)
- The notification threshold is modified programmatically
Here are three technical approaches to stop the email bombardment:
Method 1: SNMP Command Reset
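Use the net-snmp tools to attempt to clear the alarm flag (this needs the card's write community, shown here as private):
snmpset -v1 -c private 10.16.15.50 \
1.3.6.1.4.1.318.1.1.1.2.2.1.0 i 1
This is intended as a "UPS test acknowledge" command. Be aware, however, that in the APC PowerNet-MIB 1.3.6.1.4.1.318.1.1.1.2.2.1.0 is upsAdvBatteryCapacity, a read-only battery-charge value, so the card will most likely reject this set. Confirm the OID against the MIB for your firmware before relying on it.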
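If you script this step, calling snmpset with an argument list is safer than passing it a shell string. Below is a minimal sketch. It assumes the net-snmp snmpset binary is on PATH. The names build_snmpset_cmd and snmpset are hypothetical helpers, and you still need to check any OID you pass against the MIB:

```python
import subprocess


def build_snmpset_cmd(host, community, oid, value, snmp_type="i", version="1"):
    """Build an argv list for net-snmp's snmpset (no shell involved)."""
    return ["snmpset", f"-v{version}", "-c", community, host, oid, snmp_type, str(value)]


def snmpset(host, community, oid, value, snmp_type="i", timeout=10):
    """Run snmpset and return its stdout; raises CalledProcessError on failure."""
    cmd = build_snmpset_cmd(host, community, oid, value, snmp_type)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=timeout)
    return result.stdout.strip()
```

Because no shell is involved, community strings containing characters such as $ or ! are passed through unchanged.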
Method 2: APC Web Interface API
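The AP9617 also has a web interface, so you can script the acknowledgement over HTTP. The form path and field name below are placeholders. Capture the real ones from your card's web UI (for example with your browser's developer tools). If your firmware uses a login page instead of HTTP basic authentication, this simple approach will not authenticate:
import requests

# Factory-default NMC credentials; use your own if you have changed them
auth = ('apc', 'apc')
# Placeholder form path and field name: verify against your card's web UI
url = 'http://10.16.15.50/Forms/ups_alarm_1'
data = {'AlarmAcknowledge': 'Acknowledge'}

response = requests.post(url, data=data, auth=auth, timeout=10)
response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
print(response.status_code)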
Method 3: Email Filter Rule
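As a temporary mitigation, create a server-side filter for the alert messages. The match values below are illustrative. Copy the exact Subject text (and any APC-specific header) from one of the alerts you actually receive: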
Subject: "UPS Alarm: Battery Fault"
X-Apc-Event-Code: 0x0119
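You can express the same two match rules as a small Python predicate, for example to test them against a saved alert before deploying the real server-side filter. This is a sketch. The subject text, the X-Apc-Event-Code header and the function name are the illustrative values from above, not confirmed APC output:

```python
from email import message_from_string

# Illustrative match rules from the text; replace them with the Subject
# and headers your management card actually sends.
SUBJECT_MARKER = "UPS Alarm: Battery Fault"
EVENT_HEADER = "X-Apc-Event-Code"
EVENT_CODE = "0x0119"


def is_ups_battery_alert(raw_message):
    """Return True if the raw RFC 822 message matches either filter rule."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    code = (msg.get(EVENT_HEADER) or "").strip().lower()
    return SUBJECT_MARKER in subject or code == EVENT_CODE
```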
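If the emails come from apcupsd running on a host (rather than from the management card itself), the relevant settings live in apcupsd.conf. Note that EVENTSFILE and EVENTSFILEMAX only control apcupsd's local event log (EVENTSFILEMAX is the maximum file size in kilobytes). MININTERVAL is not a directive documented in the apcupsd manual, so this fragment will not by itself throttle email:
# Location and maximum size (kB) of the apcupsd event log
EVENTSFILE /etc/apcupsd/apcevents.critical
EVENTSFILEMAX 10
# Not a documented apcupsd directive; verify before relying on it
MININTERVAL 3600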
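EVENTSFILE and EVENTSFILEMAX govern apcupsd's local event log rather than mail. One way to get the hourly behaviour the fragment aims for is to throttle in whatever script forwards the alerts. Below is a minimal in-memory sketch. AlertThrottle is a hypothetical class, and the 3600-second default mirrors the fragment. Its state is lost when the process exits:

```python
import time


class AlertThrottle:
    """Suppress repeats of the same event code within min_interval seconds."""

    def __init__(self, min_interval=3600, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_sent = {}  # event code -> time the last alert was let through

    def should_send(self, event_code):
        """Return True (and record the time) if this alert may be forwarded."""
        now = self.clock()
        last = self._last_sent.get(event_code)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_sent[event_code] = now
        return True
```

Call should_send() with the event code (for example "0x0119") before sending each message. The throttle drops repeats inside the window and lets the first alert after the window through. The injectable clock exists only to make the behaviour easy to test.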
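To check the current battery state without resetting anything:
snmpwalk -v1 -c public 10.16.15.50 1.3.6.1.4.1.318.1.1.1.2.2
Look for these OIDs (names per the APC PowerNet-MIB):
- 1.3.6.1.4.1.318.1.1.1.2.2.4.0 (upsAdvBatteryReplaceIndicator: 1 = no replacement needed, 2 = battery needs replacing)
- 1.3.6.1.4.1.318.1.1.1.2.2.3.0 (upsAdvBatteryRunTimeRemaining)
Self-test results live in a different subtree, at 1.3.6.1.4.1.318.1.1.1.7.2.3.0 (upsAdvTestDiagnosticsResults), so query that OID separately.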
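The Python sketch below turns raw integers such as these into readable states. It parses net-snmp output printed with numeric OIDs (the -On option), for example from snmpget -v1 -On -c public 10.16.15.50 1.3.6.1.4.1.318.1.1.1.2.2.4.0 1.3.6.1.4.1.318.1.1.1.7.2.3.0. It handles upsAdvBatteryReplaceIndicator and upsAdvTestDiagnosticsResults. The value maps assume the PowerNet-MIB enumerations, so check them against the MIB for your firmware. decode_status is a hypothetical name:

```python
import re

# Assumed PowerNet-MIB enumerations; verify against your firmware's MIB
REPLACE_INDICATOR = {1: "noBatteryNeedsReplacing", 2: "batteryNeedsReplacing"}
DIAG_RESULTS = {1: "ok", 2: "failed", 3: "invalidTest", 4: "testInProgress"}

OIDS = {
    ".1.3.6.1.4.1.318.1.1.1.2.2.4.0": ("upsAdvBatteryReplaceIndicator", REPLACE_INDICATOR),
    ".1.3.6.1.4.1.318.1.1.1.7.2.3.0": ("upsAdvTestDiagnosticsResults", DIAG_RESULTS),
}

# Accepts ".1.3...2.2.4.0 = INTEGER: 2" and, when MIBs are loaded,
# ".1.3...2.2.4.0 = INTEGER: batteryNeedsReplacing(2)"
LINE_RE = re.compile(r"^(\S+) = INTEGER: (?:\w+\()?(\d+)\)?$")


def decode_status(snmp_output):
    """Map known OIDs in `-On` net-snmp output to {object name: state label}."""
    status = {}
    for line in snmp_output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m.group(1) not in OIDS:
            continue  # skip other objects and non-INTEGER values
        name, labels = OIDS[m.group(1)]
        value = int(m.group(2))
        status[name] = labels.get(value, f"unknown({value})")
    return status
```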
When an APC Smart-UPS 3000 with an AP9617 management card fails its weekly self-test (especially because of a battery fault), it can trigger relentless email notifications. The logs show:
Dec 27 21:19:10 10.16.15.50 UPS: Started a self-test. 0x0137
Dec 27 21:19:12 10.16.15.50 UPS: At least one faulty battery exists. 0x0119
Dec 27 21:19:28 10.16.15.50 UPS: Failed a self-test. 0x0106
The management card treats a failed battery test as a persistent critical event and keeps alerting until one of the following happens:
- The physical battery is replaced
- The alert condition is manually cleared
- The notification threshold is modified
Method 1: SNMP Command Reset
Use this Linux command to acknowledge alerts (it requires the net-snmp snmpset tool). Confirm the OID against the PowerNet-MIB for your firmware before running it:
snmpset -v1 -c private 10.16.15.50 1.3.6.1.4.1.318.2.1.1.7.2.3.0 i 6
Method 2: APC Web Interface
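Browse to http://[UPS_IP]/cgi-bin/alert_ack.cgi and tick every alert type to acknowledge it. Page paths vary between firmware versions. If this URL returns an error, find the equivalent page in the card's web menus.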
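To prevent future floods while keeping monitoring active, the idea is to raise the minimum alert interval and the severity threshold via SNMP. Check the OIDs first. In the PowerNet-MIB, 1.3.6.1.4.1.318.1.1.1.7.2.1.0 is upsAdvTestDiagnosticSchedule (the self-test schedule) and 1.3.6.1.4.1.318.1.1.1.7.2.3.0 is upsAdvTestDiagnosticsResults (read-only), so these values are unlikely to have the effect described in the comments.
# Configure email throttling via SNMP (OIDs unverified, see above)
# 1st OID: minimum alert interval (seconds)
# 2nd OID: severity threshold (3 = warning and above)
snmpset -v1 -c private 10.16.15.50 \
1.3.6.1.4.1.318.1.1.1.7.2.1.0 i 3600 \
1.3.6.1.4.1.318.1.1.1.7.2.3.0 i 3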
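For systems where you can't immediately change the UPS settings, create a server-side filter. Here is an example Postfix header check. Enable it with header_checks = regexp:/etc/postfix/header_checks in main.cf, then run postfix reload. The DISCARD action silently drops the whole message:
/etc/postfix/header_checks:
/^Subject:.*(Self-Test Failed|Faulty Battery)/ DISCARD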
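Before reloading Postfix, it helps to check which subjects the pattern actually catches. Postfix regexp tables match case-insensitively by default, so this Python sketch uses re.IGNORECASE. The sample subject lines and the would_discard name are hypothetical:

```python
import re

# Same pattern as the header_checks rule; Postfix regexp tables are
# case-insensitive by default, hence re.IGNORECASE.
PATTERN = re.compile(r"^Subject:.*(Self-Test Failed|Faulty Battery)", re.IGNORECASE)


def would_discard(header_line):
    """Return True if a header line matches the header_checks rule."""
    return PATTERN.search(header_line) is not None


for header in [
    "Subject: UPS: Failed a self-test.",
    "Subject: Self-Test Failed on 10.16.15.50",
    "Subject: At least one faulty battery exists.",
]:
    print(would_discard(header), header)
```

The first sample shows the catch. A subject worded like the syslog message ("Failed a self-test") does not contain "Self-Test Failed", so adjust the alternatives to the wording your card really uses.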
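Check the current self-test status with:
snmpwalk -v1 -c public 10.16.15.50 1.3.6.1.4.1.318.1.1.1.7.2
This subtree holds the UPS self-test objects (upsAdvTest in the PowerNet-MIB). Look for 1.3.6.1.4.1.318.1.1.1.7.2.3.0 (upsAdvTestDiagnosticsResults), where 1 means the last self-test passed and 2 means it failed.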
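The self-test result object (upsAdvTestDiagnosticsResults, 1.3.6.1.4.1.318.1.1.1.7.2.3.0 in the PowerNet-MIB) only changes when a new self-test runs. After replacing the battery, you need to run (or wait for) another self-test before it can report ok. Below is a minimal polling sketch. fetch_result is a caller-supplied function (for example one wrapping snmpget), and wait_for_ok is a hypothetical name:

```python
import time


def wait_for_ok(fetch_result, interval=60, max_checks=30, sleep=time.sleep):
    """Poll until fetch_result() returns 1 (ok); return True if it did.

    fetch_result must return the integer value of upsAdvTestDiagnosticsResults
    (assumed: 1 = ok, 2 = failed, 3 = invalidTest, 4 = testInProgress).
    """
    for attempt in range(max_checks):
        if fetch_result() == 1:
            return True
        if attempt < max_checks - 1:
            sleep(interval)  # no sleep after the final check
    return False
```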