Network discovery reference (#sysctr #scom)


This is another reference post that I hope to keep updated as I learn more on the subject. The topic is Operations Manager network discovery.

Fundamentals

There are a few rules you need to keep in mind for network discovery:

·         Network discovery must be run from a management server (gateways are management servers)

·         Just one network discovery rule per management server is allowed.

·         Network discovery can be recursive or explicit. Recursive network discovery takes one or more seed devices and tries to find connected devices from the ARP tables of the seeds; in explicit discovery, every device must be defined in the discovery rule. Recursive discovery doesn’t work for SNMPv3 devices.

·         Network discovery can use SNMP, ICMP or both. If you want to have the device response time dashboard widget loaded with data, ICMP must be included in discovery.

·         Network discovery, by default, can be triggered by changes in the monitored devices. This can be a blessing or a curse; you just need to know it happens. More info in Notes from the field.

·         To make it possible for OpsMgr to build the relationships between the network devices and the connected servers, the OS management packs must be imported and NIC discovery must be enabled (it is by default). AFAIK it correlates the MAC addresses of the servers’ NICs with the MAC addresses found on the device ports (I haven’t found any explicit documentation on this yet).

·         Devices are named in OpsMgr using DNS resolution on the following items in the order listed; the first one to succeed wins: 1) Loopback IP 2) sysName 3) Public IP 4) Private IP 5) SNMP Agent IP. You can force the use of sysName by modifying the discovery configuration file, though I’m not sure it’s supported (http://blogs.inframon.com/post/2012/07/15/How-to-use-the-MIB2-System-Name-for-a-device-in-SCOM-2012.aspx)

·         Network discovery rules are persisted in an unsigned management pack Microsoft.SystemCenter.NetworkDiscovery.Internal

·         Operations Manager specific firewall rules must be enabled (see Troubleshooting)

·         We can basically have two discovery actions:

-          Full discovery, that is, a discovery triggered manually or on a schedule

-          Limited discovery, that is, a discovery of a single device or a few devices, triggered by a manual single-device rediscover task or by an SNMP trap.
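
The naming precedence in the list above can be sketched in Python. This only illustrates the "first successful lookup wins" idea; the function names, the dictionary keys and the injectable resolver are my own assumptions and not how OpsMgr implements it.

```python
import socket

# Order taken from the list above; the key names are hypothetical labels.
NAMING_ORDER = ["loopback_ip", "sys_name", "public_ip", "private_ip", "snmp_agent_ip"]


def dns_lookup(value):
    """Return the DNS name for an address or name, or None if resolution fails."""
    try:
        return socket.gethostbyaddr(value)[0]
    except OSError:
        return None


def pick_device_name(candidates, resolve=dns_lookup):
    """Return (source, name) for the first candidate that resolves, else None."""
    for source in NAMING_ORDER:
        value = candidates.get(source)
        if not value:
            continue  # this item is not known for the device
        name = resolve(value)
        if name:
            return source, name
    return None
```

Passing a fake resolver (for example the `get` method of a dictionary) makes it easy to see which item would win for a given device without touching real DNS.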

Troubleshooting

The first level of information gathering for troubleshooting is the event log on the discovering management server, where the whole process is well traced. You can also follow the discovery events from the OpsMgr console under the Operations Manager\Network Discovery folder.

I traced a normal network discovery session. The events are typically logged in pairs, in the form of start/stop or open/close events:

Start ID   Closing ID   Description
12001      12008        Limited discovery started / discovery completed
12002      12008        Full discovery started / discovery completed
12005      12007        Post processing started / completed
12003      12004        Device probe started / completed (event param 5 reports the device IP address)
12023      12024        Connections to computer started / completed
12127      12021        Discovery started for seed (event param 5 reports the device IP address) / discovery successful
12187      12021        Seed rediscovery started / finished
12014      n.a.         Filtered (i.e. excluded) devices
12199      n.a.         Connections to computers lost (since previous discovery)
12121      n.a.         Network topology cleared
12020      n.a.         Rediscover all devices when repository has been cleared (deleted)
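
When reading a long event log it helps to have the pairs above in a lookup table. The following is a minimal sketch that only encodes the IDs listed in this post; the names `EVENT_PAIRS` and `describe` are hypothetical.

```python
# Start event ID -> (closing event ID or None, short description), from the table above.
EVENT_PAIRS = {
    12001: (12008, "Limited discovery"),
    12002: (12008, "Full discovery"),
    12005: (12007, "Post processing"),
    12003: (12004, "Device probe"),
    12023: (12024, "Connections to computers"),
    12127: (12021, "Discovery for seed"),
    12187: (12021, "Seed rediscovery"),
    12014: (None, "Filtered (excluded) devices"),
    12199: (None, "Connections to computers lost"),
    12121: (None, "Network topology cleared"),
    12020: (None, "Rediscover all devices after repository cleared"),
}


def describe(event_id):
    """Return a short label for a network discovery event ID."""
    if event_id in EVENT_PAIRS:
        closing, text = EVENT_PAIRS[event_id]
        return f"{text} (start, closed by {closing})" if closing else text
    # Not a start event: see whether it closes one or more of them.
    closed = [text for closing, text in EVENT_PAIRS.values() if closing == event_id]
    if closed:
        return " / ".join(closed) + " (completed)"
    return "unknown event"
```

Note that 12008 and 12021 each close two different start events, so a closing ID alone does not tell you which kind of discovery just finished.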

 

A typical full discovery is composed of the following events, in order:

12002 started
12121 topo cleared
12127 discovery for node x
12003 probe start for device x
12004 probe completed for device x
12005 post proc started
12007 post proc finished
12021 discovery for node x successful
12014 filtering devices
12008 completed
12023 connections start
12024 connections end

A typical limited discovery follows the same pattern:

12001 limited started
12187 discovery for node x
12003 probe start for device x
12004 probe completed for device x
12005 post proc started
12007 post proc finished
12021 discovery for node x successful
12014 filtering devices
12008 completed
12023 connections start
12024 connections end
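
A quick way to spot a discovery that hung or was interrupted is to check that every start event in a trace has its closing event. This is a rough sketch working on a plain list of event IDs in log order; how you extract that list from the event log is left out, and the function name is my own.

```python
# Start event ID -> closing event ID, as listed in the event table.
CLOSING_ID = {12001: 12008, 12002: 12008, 12005: 12007, 12003: 12004,
              12023: 12024, 12127: 12021, 12187: 12021}


def unclosed_starts(event_ids):
    """Return start IDs that never saw their closing event, in order of appearance."""
    open_starts = []  # start events still waiting for their closing event
    for event_id in event_ids:
        if event_id in CLOSING_ID:
            open_starts.append(event_id)
            continue
        # A closing event settles the most recent start that expects it.
        for i in range(len(open_starts) - 1, -1, -1):
            if CLOSING_ID[open_starts[i]] == event_id:
                del open_starts[i]
                break
    return open_starts
```

Fed with the full discovery sequence listed above it returns an empty list; a trace that stops after a probe start returns the pending start IDs.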

 

 

 

The second level of troubleshooting involves turning on tracing. This is done by modifying the network discovery configuration file, which by default is C:\Program Files\System Center Operations Manager 2012\Server\NetworkMonitoring\conf\discovery\discovery.conf. The discovery.conf file is very well documented, but I don’t know whether any modification is supported unless CSS asks for it, so it’s always a good idea to make a backup copy before any change, and remember that any OpsMgr fix can overwrite your changes. What Stefan wrote (see References) has changed slightly in RTM and SP1: the resulting log file is now in c:\windows\temp and there are a couple of new switches. To quickly recap, to perform a full debugging session:

  1. Remove all the files in the Program Files\System Center Operations Manager 2012\Server\networkmonitoring\local\repos directory
  2. Edit networkmonitoring\conf\discovery\discovery.conf

 

Change

    #
    # Enable verbose logging of discovery progress.
    #
    LogDiscoveryProgress = FALSE

To

    #
    # Enable verbose logging of discovery progress.
    #
    LogDiscoveryProgress = TRUE
    DebugEnabled = TRUE

To have extra information, change

    #
    # Enable SNMP Tracing during device discovery
    #
    # enableSNMPTrace = TRUE

    #
    # Enable ICMP Tracing during device discovery
    #
    # enableICMPTrace = TRUE

To

    #
    # Enable SNMP Tracing during device discovery
    #
    enableSNMPTrace = TRUE

    #
    # Enable ICMP Tracing during device discovery
    #
    enableICMPTrace = TRUE

Case is significant.

 

  3. Stop and restart the health service

4.       Turn on OM tracing

a.       go to the Program Files\System Center Operations Manager 2012\Server\Tools directory

b.      Run StartTracing.cmd DBG

 

5.       Run discovery

 

6.       Turn off OM tracing

a.       go to the Program Files\System Center Operations Manager 2012\Server\Tools directory

b.      Run StopTracing.cmd

 

7.       Turn off network discovery engine logging

a.       Edit networkmonitoring\conf\discovery\discovery.conf

 

Change

    #
    # Enable verbose logging of discovery progress.
    #
    LogDiscoveryProgress = TRUE
    DebugEnabled = TRUE

To

    #
    # Enable verbose logging of discovery progress.
    #
    LogDiscoveryProgress = FALSE
    DebugEnabled = FALSE

And change

    #
    # Enable SNMP Tracing during device discovery
    #
    enableSNMPTrace = TRUE

    #
    # Enable ICMP Tracing during device discovery
    #
    enableICMPTrace = TRUE

To

    #
    # Enable SNMP Tracing during device discovery
    #
    # enableSNMPTrace = TRUE

    #
    # Enable ICMP Tracing during device discovery
    #
    # enableICMPTrace = TRUE

Case is significant.

 

b.      Stop and restart the health service
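
The discovery.conf edits in steps 2 and 7 can be scripted so that the same switches are flipped every time. The following is a minimal sketch that works on the file content as text; the function names are hypothetical, and appending a key that is not already in the file is an assumption of the sketch, not documented behavior. As with manual edits, I don't know if this is supported.

```python
import re


def set_conf_values(text, values):
    """Return discovery.conf text with each `Key = VALUE` in `values` applied.

    An existing line for the key (commented out or not) is rewritten in place;
    keys that are not found are appended at the end (an assumption of this sketch).
    """
    lines = text.splitlines()
    for key, value in values.items():
        # Case-sensitive match, since case is significant in this file.
        pattern = re.compile(r"^\s*(#\s*)?" + re.escape(key) + r"\s*=")
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = f"{key} = {value}"
                break
        else:
            lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


def comment_out(text, keys):
    """Return the text with the active `Key = ...` line of each key commented out."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if any(re.match(r"^\s*" + re.escape(k) + r"\s*=", line) for k in keys):
            lines[i] = "# " + line.strip()
    return "\n".join(lines) + "\n"
```

To enable tracing you would read the file, keep a backup copy, and write back the result of `set_conf_values` with LogDiscoveryProgress, DebugEnabled, enableSNMPTrace and enableICMPTrace set to TRUE. To revert, set the first two to FALSE and pass the two trace switches to `comment_out`. The health service restart still has to be done separately.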

In certain cases more tracing is needed on the OpsMgr side. In that case you have to call CSS and get a trace information file to add to the public ones; I’m not authorized to share such a file.

Notes from the field

The typical reason for discovery failure is a firewall issue on the management server. Here the rule is pretty straightforward: the Operations Manager specific rules *must* be enabled. It doesn’t matter if other rules for the same protocols are enabled; the OpsMgr specific rules need to be turned on. Another important note: you can, if you really want, disable the firewall, but you’d better not disable the firewall service. In the latter case I’ve seen all sorts of strange behaviors.

The rules are easily identifiable:

[Two screenshots showing the Operations Manager firewall rules]

The second issue is more subtle. I mentioned earlier that discoveries can be triggered by specific configuration change traps from the devices. This happens if:

-          The device is configured to send traps to the monitoring management server (discovery and monitoring management servers can be different, but this is another story)

-          The change traps are enabled

When this happens a limited network discovery is triggered (now you know you’ll get event id 12001). So far so good. This feature has two drawbacks:

-          If the device(s) send too many traps (the discovery is triggered once for every trap), your management server gets backlogged with limited discovery tasks, which in turn have an impact on CPU and memory usage and block scheduled full discoveries. This is not a situation you want to be in.

-          There’s a bug (still there in SP1 UR1): when a limited discovery is triggered on a device, the computer connections associated with other devices are lost (event id 12199), so your topology is basically gone.

Personally, I prefer not to use trap-based discovery. To achieve this you can:

-          Filter out traps with the windows firewall

-          Not configure devices to send traps to management servers, currently traps are not used for network monitoring (this can change in future and you can have a custom MP that uses traps)

-          Disable the discovery rules that, btw, are not marked as discovery rules but rather are standard ones (sigh)

Currently I identified the following rules:

Trap Received (3Com Card Inserted) – System.NetworkManagement.3Com.Node.CardInsertedEvent

Trap Received (3Com Module Inserted) – System.NetworkManagement.3Com.Node.ModuleInsertedEvent

Trap Received (Cisco Configuration Change) – System.NetworkManagement.Cisco.Node.ConfigurationChanged

Trap Received (Cisco Configuration Management Event) – System.NetworkManagement.Cisco.Node.ConfigurationManagementEvent

Trap Received (Cisco FRU Inserted) – System.NetworkManagement.Cisco.Node.FRUInsertedEvent

Trap Received (Cisco FRU Removed) – System.NetworkManagement.Cisco.Node.FRURemovedEvent

Trap Received (Cisco Module Status Change) – System.NetworkManagement.Cisco.Node.ModuleStatusChange

Trap Received (Cisco Reload) – System.NetworkManagement.Cisco.Node.Reload

A sample rule is structured as the following:

      <Rule ID="System.NetworkManagement.Cisco.Node.ConfigurationManagementEvent" Enabled="true" Target="NetworkLibrary!System.NetworkManagement.Cisco_Node" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
        <Category>Discovery</Category>
        <DataSources>
          <DataSource ID="Trap" TypeID="NetworkLibrary!System.NetworkManagement.TrapTriggerProvider">
            <IP>$Target/Property[Type="NetworkLibrary!System.NetworkManagement.Node"]/SNMPAddress$</IP>
            <TriggerOID>.1.3.6.1.4.1.9.9.43.2.0.1</TriggerOID>
          </DataSource>
        </DataSources>
        <WriteActions>
          <WriteAction ID="WA" TypeID="NetworkLibrary!System.NetworkManagement.TrapDiscoveryRequestPublishData" />
        </WriteActions>
      </Rule>
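
Since these rules all share the same write action, other trap-triggered discovery rules can be looked for by scanning management pack XML for it. A minimal sketch, assuming you have the management pack available as XML text with un-namespaced elements like the sample above; the function name is hypothetical, and the match is on the suffix of the TypeID because the reference alias (NetworkLibrary here) may differ between management packs.

```python
import xml.etree.ElementTree as ET


def trap_discovery_rules(mp_xml):
    """Return (rule ID, trigger OID) for rules that publish a trap discovery request."""
    root = ET.fromstring(mp_xml)
    found = []
    for rule in root.iter("Rule"):
        publishes = any(
            wa.get("TypeID", "").endswith("TrapDiscoveryRequestPublishData")
            for wa in rule.iter("WriteAction")
        )
        if publishes:
            # TriggerOID is the trap OID that fires the limited discovery.
            found.append((rule.get("ID"), rule.findtext(".//TriggerOID")))
    return found
```

The returned rule IDs are the candidates to disable with an override if you don't want trap-based discovery.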

 

References

My fellow MVP Stefan wrote a while ago about network discovery troubleshooting: http://www.code4ward.net/main/Blog/tabid/70/EntryId/105/Troubleshooting-Network-Discovery-in-SCOM-2012.aspx

Inframon tip for using sysName instead of the DNS name: http://blogs.inframon.com/post/2012/07/15/How-to-use-the-MIB2-System-Name-for-a-device-in-SCOM-2012.aspx

- Daniele

This posting is provided “AS IS” with no warranties, and confers no rights.
