There are many questions about which components of network devices are monitored by System Center network monitoring. This post should clear up some of those questions by covering what gets monitored and the conditions under which that monitoring applies. Which components of a network device are monitored depends on three things: what gets discovered on the device, what monitoring the discovered components support, and whether that monitoring is enabled.
Devices are discovered differently depending on their manufacturer, model, and system Object Identifier (OID). For example, a Cisco Catalyst 3560 will be discovered with interfaces, processors, memory, fans, power supplies, and temperature sensors, but a Cisco 2950 will get only interfaces, processors, and memory, even though it may also have fans, power supplies, and temperature sensors. Other devices may have only their interfaces discovered, with no peripheral components. The best way to determine what was discovered is to open the diagram view for a network node. Below you can see the memory (MEM), processor (PSR), and ports discovered in the diagram view.
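Because discovery keys off the device's system Object Identifier, it can help to know which sysObjectID a device reports before checking it against the supported-devices list. The sketch below is not part of Operations Manager: it only builds the raw bytes of an SNMPv2c GET request for sysObjectID.0 (1.3.6.1.2.1.1.2.0) using the BER encoding that SNMP uses. The community string "public" is an assumption, and in practice a tool such as `snmpget` does the same job.

```python
def encode_length(n: int) -> bytes:
    """BER length: short form below 128, otherwise long form (0x80 | byte count)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, body: bytes) -> bytes:
    """Wrap body in a BER tag-length-value triple."""
    return bytes([tag]) + encode_length(len(body)) + body


def encode_int(n: int) -> bytes:
    """BER INTEGER for a non-negative n (a leading zero byte keeps the sign bit clear)."""
    return tlv(0x02, n.to_bytes(n.bit_length() // 8 + 1, "big"))


def encode_oid(dotted: str) -> bytes:
    """BER OBJECT IDENTIFIER: first two arcs packed as 40*a+b, the rest in base 128."""
    arcs = [int(a) for a in dotted.split(".")]
    out = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]                    # final byte of the arc: high bit clear
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))   # earlier bytes carry the continuation bit
            arc >>= 7
        out.extend(reversed(chunk))
    return tlv(0x06, bytes(out))


def get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """SNMPv2c GetRequest message asking for a single OID."""
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")   # value is NULL in a request
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0)
              + tlv(0x30, varbind))                      # error-status and error-index are 0
    return tlv(0x30, encode_int(1) + tlv(0x04, community.encode()) + pdu)  # version 1 = v2c


SYS_OBJECT_ID = "1.3.6.1.2.1.1.2.0"
request = get_request("public", SYS_OBJECT_ID)  # "public" is an assumed community string
```

Sent over UDP to port 161 on the device, this request returns the device's sysObjectID; the arc after 1.3.6.1.4.1 in that value is the vendor's enterprise number (9 for Cisco).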
Interfaces are not all discovered equally. Interfaces on devices that implement the Interfaces MIB (RFC 2863) and MIB-II (RFC 1213) can have more monitoring available than interfaces on other devices. This can include OperStatus, AdminStatus, and performance counters such as percent utilization and error packets. For devices that don't implement these MIBs, only the existence of the interface may be discovered, or the OperStatus may depend on a vendor-specific MIB. To find out whether a particular interface is monitored through the standard MIBs, open Health Explorer on the interface from the device's diagram view.
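The status and utilization values mentioned above come from standard IF-MIB objects. The post doesn't give the exact formula Operations Manager uses, so treat the following as a hedged illustration: percent utilization is commonly derived from two samples of an octet counter such as ifInOctets and the interface's ifSpeed, allowing for counter wrap. The status code tables are the enumerations defined in RFC 2863.

```python
# Enumerations from IF-MIB (RFC 2863).
IF_ADMIN_STATUS = {1: "up", 2: "down", 3: "testing"}
IF_OPER_STATUS = {1: "up", 2: "down", 3: "testing", 4: "unknown",
                  5: "dormant", 6: "notPresent", 7: "lowerLayerDown"}


def counter_delta(prev: int, curr: int, bits: int = 32) -> int:
    """Increase between two counter samples, tolerating a single wrap."""
    return (curr - prev) % (1 << bits)


def percent_utilization(octets_prev: int, octets_curr: int,
                        seconds: float, if_speed_bps: int, bits: int = 32) -> float:
    """Share of link capacity used over the polling interval, in percent."""
    bits_transferred = counter_delta(octets_prev, octets_curr, bits) * 8
    return 100.0 * bits_transferred / (seconds * if_speed_bps)


def describe_status(admin: int, oper: int) -> str:
    """Human-readable AdminStatus/OperStatus pair; unknown codes map to 'invalid'."""
    return (f"admin={IF_ADMIN_STATUS.get(admin, 'invalid')}, "
            f"oper={IF_OPER_STATUS.get(oper, 'invalid')}")
```

For example, `percent_utilization(0, 125_000, 10, 1_000_000)` returns 10.0: 125,000 octets is 1,000,000 bits over 10 seconds on a 1 Mbit/s link. On fast links a 32-bit counter can wrap more than once per interval, which is why IF-MIB also defines 64-bit counters such as ifHCInOctets (pass `bits=64` for those).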
Port 1 can be monitored using the standard MIBs. Under the "Interface Status" rollup monitor, you can see that monitors for AdminStatus and OperStatus are present. Under the Performance rollup, the High Discard rollup monitor contains specific monitors that check the input and output discard rates on the port. However, the picture also shows that, even though this interface can be monitored through the standard MIBs, no monitoring is enabled for it in this case.
Interface 30 on the device is an example of an interface for which performance counters will not be collected or monitored. It is monitored only through AdminStatus and OperStatus, as shown by the two status monitors under the Interface Status aggregate monitor. Under the High Discard Percentage aggregate monitor, the High Input Rate Discard monitor that appears for Port 1 is missing, which indicates that performance counters won't be collected.
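The difference between Port 1 and interface 30 comes down to which counters the device returns. The sketch below is a hypothetical illustration, not Operations Manager's implementation: it reports which MIB-II input counters are missing from a sample (the situation in which a discard-rate monitor would be absent) and, when they are all present, computes an input discard percentage. Defining that percentage as discards divided by discards plus delivered unicast and non-unicast packets is an assumption.

```python
# MIB-II (RFC 1213) input counters used below; RFC 2863 deprecates ifInNUcastPkts
# in favour of ifInMulticastPkts and ifInBroadcastPkts.
REQUIRED = ("ifInDiscards", "ifInUcastPkts", "ifInNUcastPkts")


def counter_delta(prev: int, curr: int, bits: int = 32) -> int:
    """Increase between two counter samples, tolerating a single wrap."""
    return (curr - prev) % (1 << bits)


def missing_counters(sample: dict) -> list:
    """Counters the device did not return; without them no rate can be computed."""
    return sorted(name for name in REQUIRED if name not in sample)


def input_discard_percent(prev: dict, curr: dict) -> float:
    """Discarded inbound packets as a percentage of all inbound packets (assumed definition)."""
    gaps = set(missing_counters(prev)) | set(missing_counters(curr))
    if gaps:
        raise ValueError(f"counters not available: {sorted(gaps)}")
    discards = counter_delta(prev["ifInDiscards"], curr["ifInDiscards"])
    delivered = (counter_delta(prev["ifInUcastPkts"], curr["ifInUcastPkts"])
                 + counter_delta(prev["ifInNUcastPkts"], curr["ifInNUcastPkts"]))
    total = discards + delivered
    return 0.0 if total == 0 else 100.0 * discards / total
```

Raising an error rather than returning 0.0 when counters are missing keeps "no data" distinct from "no discards", which mirrors the post's point that such interfaces simply have no performance monitors.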
Processor and memory can be monitored out of the box on devices where those components are discovered. http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=26831 has information on which devices will have processor and memory monitored. Other peripherals, such as fans, power supplies, cards, and temperature sensors, do not get monitoring out of the box when discovered. SNMP-based rules and monitors for these components can be created in the Authoring pane of the Operations Manager console.
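For a peripheral such as a temperature sensor, an authored SNMP monitor polls an OID and compares the value with a threshold. The following is a conceptual model only, not how Operations Manager implements monitors: a two-state monitor that turns Critical after a configurable number of consecutive samples above the threshold. The OID is a placeholder you would take from your vendor's MIB, and the threshold and sample count are illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdMonitor:
    """Conceptual two-state monitor for a polled SNMP value (hypothetical model)."""
    oid: str                  # placeholder: take the real OID from the vendor MIB
    threshold: float
    consecutive: int = 3      # samples above the threshold before the state changes
    state: str = field(default="Healthy", init=False)
    _over: int = field(default=0, init=False, repr=False)

    def evaluate(self, value: float) -> str:
        # Count consecutive breaches; any sample at or below the threshold resets the count.
        self._over = self._over + 1 if value > self.threshold else 0
        self.state = "Critical" if self._over >= self.consecutive else "Healthy"
        return self.state


temperature = ThresholdMonitor(oid="<temperature OID from vendor MIB>",
                               threshold=70, consecutive=2)
```

Requiring several consecutive samples delays the alert by a few polling intervals but avoids alerting on a single momentary spike.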
Rather than monitoring every interface out of the box, Operations Manager monitors only the interfaces that were discovered as connected. This avoids noisy alerts on interfaces that are not connected, and avoids excess monitoring on interfaces that won't return a valid performance counter. As a result, the default state for an interface is to have all monitoring disabled. Monitoring is enabled for interfaces that are known to be connected, and it can also be enabled for any other interface you want to monitor. An interface is monitored only if it is a member of one of three groups in Operations Manager. Each of these groups enables all the standard interface workflows, provided the device supports the standard MIBs, and also enables any vendor-specific workflows for the interface.
Relay Network Adapters Group
This group contains interfaces that connect two devices. When a full discovery that includes both devices runs, the interfaces linking the two devices are added to this group and monitoring is enabled for them.
Managed Computer Network Adapters Group
When an agent computer is directly connected to a device, the device interface it connects to is added to this group. For this to work, the management group must have the Windows Operating System management pack for the agent's operating system, the Windows Client Network Discovery management pack, and the Windows Server Network Discovery management pack. Discovery of the agent's operating system must be complete, including discovery of the agent's network adapters. When network discovery then runs for the device, it should stitch the port on the device to the network adapter on the computer and add the port to this group.
Critical Network Adapters Group
You can update this group from the Authoring pane of the Operations Manager console. Adding any interface to this group enables monitoring for it. For example, if the interface connected to your web server isn't monitored, adding that interface to this group will give you alerts on problems with it.
Advanced Network Adapters Group
There is also a fourth group, which behaves a little differently from the other three. This group turns on some additional advanced workflows for interfaces that the other groups don't enable. These workflows are disabled out of the box because they often duplicate performance counters that are already collected. They include advanced performance counters such as Cisco collision packets, which are already reported as part of error packets by the monitoring that the other three groups enable. If you want visibility into a particular performance metric, adding the interface to this group is one way to get that extra data.
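The enablement rules for these four groups can be summarized as a small decision function. This is a hypothetical model written from the description in this post, not code from the management packs; in particular, treating the Advanced Network Adapters Group as taking effect independently of the other three is an assumption.

```python
STANDARD_GROUPS = {
    "Relay Network Adapters Group",
    "Managed Computer Network Adapters Group",
    "Critical Network Adapters Group",
}
ADVANCED_GROUP = "Advanced Network Adapters Group"


def enabled_workflows(groups, supports_standard_mibs: bool) -> list:
    """Which interface workflow families would run, given the interface's group memberships."""
    groups = set(groups)
    enabled = set()
    if groups & STANDARD_GROUPS:
        enabled.add("vendor-specific")               # any of the three groups enables these
        if supports_standard_mibs:
            enabled.add("standard status and performance")
    if ADVANCED_GROUP in groups:
        enabled.add("advanced performance")          # assumed to apply on its own
    return sorted(enabled)  # an empty list means the interface is not monitored
```

An interface that belongs to none of the groups gets an empty list, which matches the default of all monitoring disabled.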
When trying to determine whether your network monitoring is working correctly, ask the following questions to see what monitoring is in effect.
Did Discovery Work?
Before any monitoring starts, discovery of the device must complete successfully. Network monitoring has dependencies very similar to those of discovery, because both are SNMP-based. In the discovery rule, be sure you specified that the device should be monitored via SNMP only or via SNMP and ICMP. A future blog post should cover troubleshooting network discovery.
What was Discovered?
The next thing to check is what was discovered on the network device. Use the diagram view of the network device in the console to see which of its components were discovered. Use Health Explorer on the interfaces to see whether performance counter and status monitoring is available. If components aren't being discovered on your device and its performance counters aren't being monitored, the device likely doesn't support the standard MIBs.
Interface Monitoring Enabled?
Check whether the interface you want to monitor is a member of one of the network monitoring groups. You can view group membership in the Authoring pane of the console under Groups.
Network Monitoring Management Pool Availability
When the network discovery rule was created, a resource pool was specified for monitoring the devices. By default this is the All Management Servers pool, but using a dedicated pool for network monitoring servers is advised. If the network device is behind a firewall or at a remote site, a dedicated pool is necessary. Check the discovery rules in the Administration pane of the console to see which pool should be monitoring the devices. Then check Resource Pools in the Administration pane to be sure that the pool contains only management servers that can contact the network device. You may need to create multiple network discovery rules and resource pools so that network monitoring runs from the correct locations.
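The pool check above amounts to verifying that every member of the pool can reach the device. As a sketch under stated assumptions, the function below takes a hand-maintained inventory mapping each management server to the networks it can route to (the server names and subnets are hypothetical) and lists the pool members that cannot reach a given device address.

```python
import ipaddress


def unreachable_members(pool: dict, device_ip: str) -> list:
    """Pool members whose reachable networks do not include device_ip."""
    ip = ipaddress.ip_address(device_ip)
    return sorted(
        server for server, networks in pool.items()
        if not any(ip in ipaddress.ip_network(net) for net in networks)
    )


# Hypothetical inventory: server name -> networks that server can contact.
network_pool = {
    "MS01": ["10.11.64.0/24"],
    "MS02": ["10.11.64.0/24", "10.11.65.0/24"],
    "MS03": ["192.168.0.0/16"],   # e.g. a server on the other side of a firewall
}
```

Any server the function returns is a candidate for removal from the pool, or a sign that a separate discovery rule and pool are needed for that part of the network. Routing is only part of the picture; a firewall must also allow SNMP (UDP 161) from every pool member.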
11013 Event - SNMP Get Timeout
When an SNMP workflow times out because the device did not reply in time, the Health Service logs event 11013 to the Operations Manager log. With the out-of-the-box workflows, Operations Manager retries the SNMP query at the next interval. A monitor in Operations Manager detects these events.
Log Name: Operations Manager
Source: Health Service Modules
Event ID: 11013
SNMP GET request to IP Address 10.11.64.25 has timed out. This can be due to the device being offline or to the workflow using incorrect credentials.
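To check independently whether a device answers within a timeout, you can send a GET yourself from a pool member. The sketch below sends an already-encoded SNMP request over UDP and gives up after a timeout, printing a message worded like event 11013. It is a manual check, not the Health Service's logic: the timeout value and the immediate retry are assumptions (Operations Manager itself waits for the next interval), and 161 is the standard SNMP agent port.

```python
import socket


def snmp_get_with_timeout(host: str, request: bytes, port: int = 161,
                          timeout: float = 2.0, retries: int = 1):
    """Send an encoded SNMP request over UDP; return the reply bytes, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(retries + 1):
            sock.sendto(request, (host, port))
            try:
                reply, _ = sock.recvfrom(65535)
                return reply
            except socket.timeout:
                continue  # no answer within the timeout; try again if attempts remain
    print(f"SNMP GET request to IP Address {host} has timed out.")
    return None
```

With SNMPv1 and v2c, an agent typically drops a request with the wrong community string without replying, so a timeout here can point to credentials as easily as to an offline device, which is why the event text lists both causes.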
11009 Event - SNMP Get Failure
When a network device is queried for a particular value, the value might not be present. In that case, the Health Service logs event 11009 in the Operations Manager log, and the workflows that use this value are unloaded.
Log Name: Operations Manager
Event ID: 11009
Error in SNMP GET response from IP Address: 10.11.64.68, Status: noSuchInstance(129).
One or more workflows were affected by this.
Workflow name: System.NetworkManagement.MIB2_dot3.NetworkAdapter.InputPacketErrorPct
Instance name: PORT- 268
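The status in event 11009 reflects what the device put in the response. In SNMPv2c, a missing value is reported not through error-status but as one of three exception values in the variable binding, defined in RFC 3416: noSuchObject (tag 128), noSuchInstance (129), and endOfMibView (130), which matches the noSuchInstance(129) in the event above. An SNMPv1 agent instead sets error-status to noSuchName (2). The sketch below is a minimal, hypothetical parser for a single-varbind GetResponse that reports either case; it assumes well-formed input and does no validation.

```python
EXCEPTIONS = {0x80: "noSuchObject", 0x81: "noSuchInstance", 0x82: "endOfMibView"}  # RFC 3416


def read_tlv(data: bytes, pos: int):
    """Return (tag, value, next_pos) for the BER element starting at pos."""
    tag, length = data[pos], data[pos + 1]
    pos += 2
    if length & 0x80:                      # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, data[pos:pos + length], pos + length


def varbind_problem(response: bytes):
    """Describe why a single-varbind GetResponse carries no value, or return None if it does."""
    _, message, _ = read_tlv(response, 0)          # SEQUENCE { version, community, PDU }
    pos = read_tlv(message, 0)[2]                  # skip version
    pos = read_tlv(message, pos)[2]                # skip community
    _, pdu, _ = read_tlv(message, pos)             # GetResponse-PDU
    pos = read_tlv(pdu, 0)[2]                      # skip request-id
    _, error_status, pos = read_tlv(pdu, pos)
    pos = read_tlv(pdu, pos)[2]                    # skip error-index
    status = int.from_bytes(error_status, "big")
    if status:
        return f"errorStatus({status})"            # e.g. noSuchName(2) from an SNMPv1 agent
    _, varbinds, _ = read_tlv(pdu, pos)
    _, varbind, _ = read_tlv(varbinds, 0)          # first VarBind: SEQUENCE { name, value }
    value_tag = read_tlv(varbind, read_tlv(varbind, 0)[2])[0]
    if value_tag in EXCEPTIONS:
        return f"{EXCEPTIONS[value_tag]}({value_tag})"
    return None
```

A None result means the device returned a real value for the OID, so the workflow should not have been unloaded for that reason.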
This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.
Now that SCOM is able to associate a server with a particular port on a switch, is there any form of alert filtering so that if the switch goes down we don't get bombarded with alerts from all the servers hanging off that switch? I.e., does SCOM now do dependencies?
About the topic 'What Gets Discovered': I want to know whether there is a list or an Excel sheet available that tells me in advance what OM12 covers for a particular network device. This would be better than discovering the device and checking the Diagram view to find out what is and isn't covered.
didn't you see this list: www.microsoft.com/.../details.aspx
From my point of view this answers your question
Peter Forster, MVP Virtual Machine, Austria
Is there a limit on the recommended number of devices and interfaces that SCOM will monitor?
@ Steve Burkett
System Center Operations Manager 2012 will be able to determine whether network devices are online or offline. If a device is offline, the workflows targeting the device will be suspended, so you won't receive alerts for that device beyond the device offline alert. However, the management packs that monitor application servers are not aware of the offline network device and will continue to send alerts. For example, if the device was connected to a database server, the applications connecting to the database would still raise database unavailable alerts.
There will be a sizing guide available when System Center Operations Manager 2012 is released that will cover recommended configurations and limits.
Regarding 11009: I have multiple environments where the resolutions do not apply. I can do an SNMP GET on the reported OID as often as I like and I get a value back.