This week I’ve been involved in an internal project to integrate HP Procurve Manager alerts into SCOM. Of the various strategies I tried, one was to use SNMP traps. I had little experience with the SCOM SNMP modules, so I had to check the documentation and other articles from the community. What I found is that while the SNMP discovery and monitoring process is well documented, the documentation on SNMP traps is scarce, and many in the community keep complaining about SNMP traps not working. In this article I want to recap what I found. Many of these points may seem obvious, but since I spent a few hours sorting out this mess, I think it is worth nailing them down:
- As you may suspect, the SNMP service must be installed and started, as well as the SNMP Trap service.
- Contrary to what you find on the web, no particular configuration is needed. The OS-level SNMP settings apply to incoming SNMP messages, not to outgoing ones; it is the agent (in this case OpsMgr) that needs to take care of the proper settings (community, version and so on). Same story for the trap configuration: no specific configuration is needed, and once again it is the agent’s duty to set it properly.
- If you want to receive SNMP traps, the device must be discovered, and the IP address of the trap sender must match the address of the discovered device.
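As a rough illustration, the checklist above can be scripted. The following is only a sketch in Python, not something OpsMgr provides. It assumes the Windows short service names `SNMP` and `SNMPTRAP`, relies on the English output format of `sc query`, and uses hypothetical helper names (`parse_sc_state`, `query_service_state`, `missing_prerequisites`, `sender_matches_device`).

```python
import ipaddress
import re
import subprocess

# Assumed short names of the SNMP and SNMP Trap services;
# verify them on your own systems.
REQUIRED_SERVICES = ("SNMP", "SNMPTRAP")


def parse_sc_state(sc_output):
    """Return the state word (e.g. 'RUNNING') from `sc query` output, or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_service_state(name):
    """Run `sc query <name>` (Windows only) and return the parsed state."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout)


def missing_prerequisites(states):
    """Given {service name: state}, list required services that are not RUNNING."""
    return [name for name in REQUIRED_SERVICES if states.get(name) != "RUNNING"]


def sender_matches_device(trap_sender_ip, discovered_ip):
    """True when the trap sender address equals the discovered device address."""
    return ipaddress.ip_address(trap_sender_ip) == ipaddress.ip_address(discovered_ip)
```

On the server that receives the traps, you might build the `states` dictionary with `query_service_state` for each name in `REQUIRED_SERVICES` and treat any non-empty result from `missing_prerequisites` as a reason to stop and fix the services first.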
This is all you need to do to set up the SNMP environment for OpsMgr, but do not expect any help from OpsMgr itself. For example, if the trap service is down you won’t get any error or warning message in the event log, so you’d better do your job right. And then there’s the caveat I spent my time on, the integration between the OpsMgr agent and the SNMP Trap service:
- The SNMP Trap service must be running when the HealthService (HS) starts; if you start it after the HS, you won’t get any traps. This is a big issue at OS startup: if the HS initializes before the SNMP Trap service, there’s no chance you’ll get any traps.
- If you restart the SNMP Trap service while the HS is running (this can be due to human intervention, service crashes, setup programs, and so on), you’ll stop receiving traps. The handle becomes invalid, but OpsMgr records no error in the event log.
This is a serious issue (IMHO), and the team should build a patch that checks whether the SNMP receiver handle is valid and, if not, records an error event and then periodically retries opening a new handle. Until then the only rule I can give is: if you stop receiving SNMP traps, or if you need to recycle the SNMP Trap service, always restart the HS.
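The rule above boils down to comparing two start times: whenever the SNMP Trap service started after the HS, the HS has to be restarted. Here is a minimal sketch in Python under stated assumptions. How you obtain the two start times is left open, the function names are hypothetical, and `HealthService` is assumed to be the service name of the HS. The `net stop` and `net start` commands are the standard Windows ones.

```python
import subprocess
from datetime import datetime


def health_service_needs_restart(trap_service_started: datetime,
                                 health_service_started: datetime) -> bool:
    """True when the SNMP Trap service (re)started at or after the HS start.

    A tie is treated as unsafe, so it also returns True.
    """
    return trap_service_started >= health_service_started


def restart_health_service(run=subprocess.run):
    """Stop and start the HS with `net stop` / `net start` (Windows only).

    `run` is injectable so the command sequence can be checked without
    touching a real service.
    """
    run(["net", "stop", "HealthService"], check=True)
    run(["net", "start", "HealthService"], check=True)
```

The check covers both cases described above: the HS winning the race at OS startup, and the trap service being recycled while the HS is already running.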
I observed this behavior on Windows 2008 SP2 x64 and Windows 2003 SP2 x64 systems, using OpsMgr 2007 R2 CU2. For Windows 2003 systems, you may check KB article 982501 for another issue related to SNMP monitoring.
This posting is provided "AS IS" with no warranties, and confers no rights.