Are your SNMP traps working?


This week I’ve been involved in an internal project to integrate HP Procurve Manager alerts into SCOM. Among the various strategies I tried, one was to use SNMP traps. I had little experience with the SCOM SNMP modules, so I had to check the documentation and other articles from the community. What I found is that, while the SNMP discovery and monitoring process is well documented, the documentation on SNMP traps is scarce, and many in the community keep complaining about SNMP traps not working. In this article I want to recap what I found. Many of these points may seem obvious, but since I spent a few hours sorting out this mess, I think it is worth nailing them down:

  1. As you may suspect, the SNMP service must be installed and started, and so must the SNMP Trap service.
  2. Contrary to what you find on the web, no particular configuration is needed. The OS SNMP settings apply to incoming SNMP messages, not outgoing ones; it is the agent (in this case OpsMgr) that needs to take care of the proper settings (community, version and so on). Same story for the trap configuration: no specific configuration is needed, and once again it is the agent’s duty to set it properly.
  3. If you want to receive SNMP traps, the device must be discovered, and the IP address of the trap sender must match the address of the discovered device.
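
As a quick check of the first point, here is a minimal sketch that parses the output of the Windows `sc query` command and reports whether the two services are running. The short service name SNMPTRAP appears in the comments below; SNMP as the short name of the SNMP service, and the English `STATE` label in the `sc` output, are assumptions, and the helper names are made up for illustration.

```python
import re
import subprocess

# SNMPTRAP is the name used in the comments below; SNMP is an assumption.
REQUIRED_SERVICES = ("SNMP", "SNMPTRAP")

# Matches a line such as "        STATE              : 4  RUNNING".
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_state(sc_output):
    """Return the state word (e.g. 'RUNNING') from `sc query` output, or None."""
    match = _STATE_RE.search(sc_output)
    return match.group(1) if match else None


def service_state(name):
    """Run `sc query <name>` (Windows only) and return the parsed state."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_state(result.stdout)


def missing_or_stopped(states):
    """Given {service: state}, list the services that are not RUNNING."""
    return [name for name, state in states.items() if state != "RUNNING"]


def report():
    """Print one line per required service that is not running."""
    states = {name: service_state(name) for name in REQUIRED_SERVICES}
    for name in missing_or_stopped(states):
        print(f"{name}: {states[name] or 'not installed or not reported'}")
```

Calling `report()` on the server that receives the traps prints nothing when both services are up. Since OpsMgr itself stays silent when the trap service is down, a check like this has to run outside of it.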

This is all you need to do to set up the SNMP environment for OpsMgr, but do not expect any help from OpsMgr itself. For example, if the trap service is down you won’t get any error or warning message in the event log, so you had better do your job right. And then there’s the caveat I spent my time on, the integration between the OpsMgr agent and the SNMP Trap service:

  1. The SNMP Trap service must be running when the HealthService (HS) starts; if you start it after the HS, you won’t get any traps. This is a big issue at OS startup: if the HS initializes before the SNMP Trap service, there’s no chance you will get any traps.
  2. If you restart the SNMP Trap service (this can be due to human intervention, service crashes, setup programs, and so on) while the HS is running, you’ll stop receiving traps. The handle becomes invalid, but OpsMgr records no error in the event log.

This is a serious issue (imho), and the team should build a patch that checks whether the SNMP receiver handle is valid and, if not, records an error event and then periodically retries to open a new handle. Until then, the only rule I can give is: if you stop receiving SNMP traps, or if you need to recycle the SNMP Trap service, always restart the HS.
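
The rule above, together with the service dependency suggested in the comments below, can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: the function names are made up, the list of existing dependencies has to be supplied by the caller (for example from the DependOnService registry value), and `sc config ... depend=` replaces the whole dependency list, which is why the existing entries are merged in explicitly.

```python
import subprocess

TRAP_SERVICE = "SNMPTRAP"
HEALTH_SERVICE = "HealthService"


def recovery_commands():
    """Commands, in order, that bring the trap service up before the HS."""
    return [
        ["net", "stop", HEALTH_SERVICE],
        # `net start` reports an error if the service already runs; ignored below.
        ["net", "start", TRAP_SERVICE],
        ["net", "start", HEALTH_SERVICE],
    ]


def run_recovery(run=subprocess.run):
    """Execute the recovery sequence (Windows only, elevated prompt)."""
    for command in recovery_commands():
        run(command, check=False)


def merged_dependencies(existing, required=(TRAP_SERVICE,)):
    """Append the required services to the existing list, skipping duplicates."""
    merged = list(existing)
    seen = {name.lower() for name in merged}
    for name in required:
        if name.lower() not in seen:
            merged.append(name)
            seen.add(name.lower())
    return merged


def dependency_command(existing, service=HEALTH_SERVICE):
    """Build the `sc config <service> depend= a/b/c` command line."""
    return ["sc", "config", service, "depend=", "/".join(merged_dependencies(existing))]
```

For example, `dependency_command(["ExistingDep"])`, where ExistingDep is a placeholder for whatever the HS already depends on, yields `sc config HealthService depend= ExistingDep/SNMPTRAP`. A dependency only addresses the startup order; if the trap service crashes and restarts while the HS is running, the restart rule still applies.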

I observed this behavior on Windows 2008 SP2 x64 and Windows 2003 SP2 x64 systems, using OpsMgr 2007 R2 CU2. For Windows 2003 systems, you may check KB article 982501 for another issue related to snmp monitoring.

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.

  1. #1 by Gordon Fecyk on November 26, 2013 - 3:03 pm

    Is it enough to make the HealthService service depend on the SNMPTRAP service? There’s no handy configuration tool for this, but one can edit a service’s Registry value “DependOnService” to add a list of services that have to start before this one will start. It will then show up in the service’s Dependencies tab.

  2. #2 by Dave Murphy on June 19, 2012 - 5:09 pm

    Are you running SNMP on the actual management server or not? I am referring to your first point above. I can use Wireshark to see the traps coming into the SCOM Management Server, but I cannot seem to get the server to acknowledge or process them. I am wondering whether not having the SNMP services running on the MS is part of the problem.

    • #3 by Daniele Grandini on June 19, 2012 - 5:31 pm

      Hi Dave, sorry but I don’t understand your question.
      What I try to describe in this post is the dependency between the OpsMgr SNMP module and the OS SNMP and trap services. There are situations where the OpsMgr SNMP module ends up with an invalid handle and stops processing traps. To avoid this situation you should set a dependency on the SNMP services for the Health Service.
      Am I missing anything?

  3. #4 by Chris on September 16, 2010 - 2:59 pm

    This was very helpful, thanks!

    Another caveat when working with SNMP Traps is the trap must arrive at the MS, Gateway, or Proxy Agent which is currently handling that device.

    One of the nice features of SCOM is that you can move devices from one MS, Gateway, or Proxy Agent to another to balance load, or add new servers when needed.

    With SNMP Traps this causes a problem, because you need the trap to be sent to the server currently monitoring the device sending the traps.

    The way around this is a small program called samplicator, which will receive a UDP stream on an interface and replicate that stream to multiple destination hosts, while spoofing the source IP address. This allows you to send all traps to all SCOM systems monitoring your network infrastructure, and enables you to move devices around on your servers and/or have “failover” without losing your trap functionality.

    This allows you to scale your SNMP Monitoring infrastructure, without having to worry about which network device is being monitored by which SCOM server.

    -Chris
