When deploying network monitoring at enterprise scale (5,000+ devices), SNMP trap handling becomes critical infrastructure. The core requirements break down into:
- Single entry point for all network devices (HA preferred)
- Dynamic routing based on source IP ranges (see the sketch after this list)
- UDP-aware load distribution so back-end processing can scale horizontally
- Optional packet mirroring for secondary processing
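To make the routing requirement concrete, the subnet-to-collector mapping itself is only a few lines of Python. This is a minimal sketch; the subnets and collector addresses are placeholders that echo the iptables examples further down:

import ipaddress

# Placeholder subnet-to-collector map, echoing the ranges used in the
# iptables examples below.
ROUTES = {
    ipaddress.ip_network('10.0.0.0/19'): '10.1.2.3',
    ipaddress.ip_network('10.0.32.0/21'): '10.1.2.4',
}

def collector_for(src_ip):
    """Return the processing server responsible for a trap source, or None."""
    addr = ipaddress.ip_address(src_ip)
    for subnet, collector in ROUTES.items():
        if addr in subnet:
            return collector
    return None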
Through testing several approaches, here's what we found:
Option 1: iptables DNAT Routing
Our current production implementation uses Linux netfilter rules:
# Site ABC devices → Processing server 10.1.2.3
iptables -t nat -A PREROUTING -p udp --dport 162 \
-s 10.0.0.0/19 -j DNAT --to-destination 10.1.2.3
# Mirror a copy of matching traps to a secondary processor (requires the xt_TEE module)
iptables -t mangle -A PREROUTING -p udp --dport 162 \
-s 10.0.32.0/21 -j TEE --gateway 10.1.2.4
Pros: Near-line-rate performance (handles 50k+ traps/sec on modest hardware)
Cons: Limited to L3/L4 filtering, no application-layer inspection
Option 2: Custom snmptrapd Handler
A Python-based alternative for deeper inspection:
import subprocess

# Registered as the callback of a pysnmp NotificationReceiver
# (see the wiring sketch below).
def cbFun(snmpEngine, stateReference, contextEngineId, contextName,
          varBinds, cbCtx):
    # getTransportInfo() returns (transportDomain, (sourceIP, sourcePort))
    transportDomain, transportAddress = \
        snmpEngine.msgAndPduDsp.getTransportInfo(stateReference)
    src_ip = transportAddress[0]
    if src_ip.startswith('10.0.'):
        # Re-emit toward the site processor; varBinds would need to be
        # flattened to 'OID type value' triples for the snmptrap CLI.
        subprocess.run(['snmptrap', '-v2c', '-c', 'public', '10.1.2.3',
                        '', '1.3.6.1.4.1.0'], check=False)
    elif src_ip.startswith('10.1.'):
        # Mirror to two destinations (arguments elided)
        subprocess.run(['snmptrap', ...], check=False)
        subprocess.run(['snmptrap', ...], check=False)
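For completeness, here is a minimal sketch of how that callback gets wired up with pysnmp's notification receiver. It assumes pysnmp 4.x with the asyncore transport, UDP port 162, and the 'public' community; adjust to your environment:

from pysnmp.entity import engine, config
from pysnmp.carrier.asyncore.dgram import udp
from pysnmp.entity.rfc3413 import ntfrcv

snmpEngine = engine.SnmpEngine()

# Listen for traps on UDP/162 (ports below 1024 need privileges)
config.addTransport(snmpEngine, udp.domainName,
                    udp.UdpTransport().openServerMode(('0.0.0.0', 162)))

# SNMPv1/v2c community string (assumed 'public')
config.addV1System(snmpEngine, 'trap-area', 'public')

# Route every received notification through cbFun defined above
ntfrcv.NotificationReceiver(snmpEngine, cbFun)

snmpEngine.transportDispatcher.jobStarted(1)
try:
    snmpEngine.transportDispatcher.runDispatcher()
finally:
    snmpEngine.transportDispatcher.closeDispatcher()

Anything heavier than prefix checks, such as OID filtering or payload rewriting, also fits naturally in cbFun, which is exactly what the kernel-level option cannot do.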
For complex deployments, consider these specialized solutions:
HAProxy UDP Configuration
frontend snmp_traps
    bind :162
    mode udp
    default_backend trap_processors

backend trap_processors
    mode udp
    balance source
    server s1 10.1.2.3:162 check
    server s2 10.3.2.1:162 check
Note: treat this block as illustrative. Open-source HAProxy does not load-balance arbitrary UDP traffic; UDP support of this kind requires the HAProxy Enterprise UDP module, while the LVS/keepalived approach shown at the end handles UDP natively.
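For context, "balance source" (and the source-IP persistence mentioned for F5 below) pins each device to one processor by hashing its source address over the live servers. Conceptually it amounts to something like this simplified sketch, not the proxy's actual implementation:

import zlib

def pick_server(src_ip, servers):
    # Deterministic hash of the source address over the server pool: a given
    # device always reaches the same trap processor while pool membership is
    # unchanged, which keeps per-device trap ordering intact.
    return servers[zlib.crc32(src_ip.encode()) % len(servers)]

# e.g. pick_server('10.0.12.7', ['10.1.2.3', '10.3.2.1'])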
Commercial Load Balancers
F5 BIG-IP configurations should include:
- UDP profile with SNMP protocol support
- Persistence based on source IP
- iRule scripting for advanced routing logic
For most large deployments, we recommend a hybrid approach:
- Use iptables for initial fan-out to regional collectors
- Implement HAProxy for final distribution to processing nodes
- Consider Kafka or RabbitMQ for durable queueing when processing requires persistence (a minimal producer sketch follows)
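To illustrate the queueing bullet, here is a minimal producer sketch using the kafka-python client; the broker address, topic name, and the enqueue_trap hook are placeholders for whatever your receiver hands over:

from kafka import KafkaProducer

# Placeholder broker; in practice this comes from configuration.
producer = KafkaProducer(bootstrap_servers='kafka01:9092')

def enqueue_trap(src_ip, payload):
    # Key by source IP so all traps from one device land in the same
    # partition and are consumed in order.
    producer.send('snmp-traps', key=src_ip.encode(), value=payload)

Consumers can then replay or re-process traps after a processing-node outage, which is the point of the durability layer.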
Performance metrics from our 5k-device deployment:
Solution | Throughput | Latency | CPU Usage |
---|---|---|---|
iptables | 58k traps/sec | 0.3ms | 12% |
HAProxy | 32k traps/sec | 1.2ms | 45% |
Custom Handler | 8k traps/sec | 15ms | 78% |
For routing logic beyond what the options above can express, consider a custom router built on Go's concurrency primitives:
package main

import (
	"log"
	"net"
	"strings"
)

// shouldMirror decides whether a packet is also copied to a secondary
// destination, based on the sending device's source IP.
func shouldMirror(src net.Addr) bool {
	udpAddr, ok := src.(*net.UDPAddr)
	return ok && strings.HasPrefix(udpAddr.IP.String(), "10.1.")
}

func main() {
	ln, err := net.ListenPacket("udp", ":162")
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	// Note: re-sending from user space rewrites the source IP to this relay,
	// unlike the DNAT approach, which preserves the original device address.
	primary, err := net.Dial("udp", "10.1.2.3:162")
	if err != nil {
		log.Fatal(err)
	}
	mirror, err := net.Dial("udp", "10.1.2.4:162")
	if err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 65507) // maximum UDP payload
	for {
		n, src, err := ln.ReadFrom(buf)
		if err != nil {
			log.Printf("read: %v", err)
			continue
		}
		// Add source-based routing logic here (e.g. a subnet-to-collector map).
		if shouldMirror(src) {
			pkt := make([]byte, n)
			copy(pkt, buf[:n])
			go mirror.Write(pkt) // duplicate to the secondary processor
		}
		if _, err := primary.Write(buf[:n]); err != nil {
			log.Printf("forward: %v", err)
		}
	}
}
Solution | Pros | Cons |
---|---|---|
F5 BIG-IP | Hardware-accelerated, GUI config | Expensive, proprietary |
HAProxy | Mature OSS L4/L7 proxy | Steep learning curve; UDP needs the Enterprise module |
Telegraf | Plugin architecture | Needs custom dev |
Concretely, at this scale we recommend the following hybrid stack:
- Frontend: Keepalived + iptables DNAT (for HA and raw throughput)
- Middleware: Custom Go router (for complex routing rules)
- Backend: Kafka queue (for decoupling processors)
# Keepalived config snippet
virtual_server 10.0.0.10 162 {
    protocol UDP
    real_server 10.1.2.3 162 {
        weight 1
    }
}