TechNet Blogs

Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

In OpsMgr 2007, when an agent experiences a heartbeat failure, several things happen. Diagnostics run, and possibly recoveries as well. Alerts are raised, and possibly notifications go out.

But what happens if my Operations team misses one of these alerts? What can I do to "spot check" agents with issues?

Well, any time an agent has a heartbeat failure, we gray out the icon of the agent's last known state in each state view.

However, you CAN create a State View that will turn Red or Yellow just like any other state view. Simply create a new State View and scope the class to Health Service Watcher (Agent).

I called mine Heartbeat State View:

[Screenshot: the Heartbeat State View, scoped to Health Service Watcher (Agent)]

This view shows us when any of the agent Health Service Watcher monitors are unhealthy. In my case, OWA and EXCH1 have issues: OWA is DOWN, while the EXCH1 agent's HealthService is stopped.

[Screenshot: the Heartbeat State View showing OWA and EXCH1 as unhealthy]

However, here is the issue. This view shows us when ANY monitor rolls up an unhealthy state, and that includes heartbeat failures AND Computer Not Reachable (the server's IP stack is down):

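[Screenshot: heartbeat failure and Computer Not Reachable monitors both rolling up to the watcher's state]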
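To make the distinction in the title concrete: a heartbeat failure by itself means the agent has gone quiet, while DOWN means the agent has gone quiet AND the computer does not answer ping. Here is a minimal Python sketch of that decision. `classify` and `AgentStatus` are hypothetical names for illustration, not part of OpsMgr. The sketch assumes that the ping result only matters once the heartbeat has failed, because that is when OpsMgr pings the computer.

```python
from enum import Enum


class AgentStatus(Enum):
    HEALTHY = "healthy"
    HEARTBEAT_FAILURE = "heartbeat failure only"  # agent is silent, host still answers ping
    DOWN = "down"                                 # agent is silent AND host does not answer ping


def classify(heartbeat_ok: bool, ping_ok: bool) -> AgentStatus:
    """Hypothetical helper: tell a quiet agent apart from a down server."""
    if heartbeat_ok:
        # Ping is only checked after a missed heartbeat, so it is ignored here.
        return AgentStatus.HEALTHY
    if ping_ok:
        return AgentStatus.HEARTBEAT_FAILURE
    return AgentStatus.DOWN


# The two cases from the post: OWA is DOWN, EXCH1's HealthService is stopped.
print(classify(heartbeat_ok=False, ping_ok=False))  # AgentStatus.DOWN
print(classify(heartbeat_ok=False, ping_ok=True))   # AgentStatus.HEARTBEAT_FAILURE
```

Only the DOWN case is what the report below is after. The HEARTBEAT_FAILURE case still needs attention, but the box itself is up.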

What if I want a State View to ONLY show me computers that are DOWN, as in not heartbeating AND not responding to any PING? Most customers consider this their "most critical situation". Well, I haven't found an easy way to do that, so I wrote a report which handles it. This report queries the OpsDB for the state of the "Computer Not Reachable" monitor and only displays the servers where that monitor is critical. It is based on the following query:

SELECT bme.DisplayName,
       s.LastModified AS LastModifiedUTC,
       DATEADD(hh, -5, s.LastModified) AS 'LastModifiedCST (GMT-5)'  -- fixed -5 hour offset from UTC; change it for your time zone
FROM State AS s
INNER JOIN BaseManagedEntity AS bme
        ON s.BaseManagedEntityId = bme.BaseManagedEntityId
WHERE s.MonitorId IN (SELECT MonitorId FROM Monitor
                      WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown')  -- "Computer Not Reachable"
  AND s.HealthState = 3   -- 3 = Critical
  AND bme.IsDeleted = 0   -- skip deleted objects
ORDER BY s.LastModified DESC
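If you want to try the query's logic without touching a production OpsDB, here is a hedged sketch that runs the same filters against SQLite from Python. The three tables are hypothetical, heavily trimmed stand-ins that keep only the columns the query touches, and the sample rows are made up. In SQLite, `datetime(..., '-5 hours')` plays the role of T-SQL's `DATEADD(hh, -5, ...)`. The output shows which rows survive: only objects whose Microsoft.SystemCenter.HealthService.ComputerDown monitor is in state 3 and that are not deleted.

```python
import sqlite3

# Hypothetical, trimmed stand-ins for the OpsDB tables the report query reads.
SCHEMA = """
CREATE TABLE Monitor (MonitorId INTEGER PRIMARY KEY, MonitorName TEXT);
CREATE TABLE BaseManagedEntity (BaseManagedEntityId INTEGER PRIMARY KEY,
                                DisplayName TEXT, IsDeleted INTEGER);
CREATE TABLE State (BaseManagedEntityId INTEGER, MonitorId INTEGER,
                    HealthState INTEGER, LastModified TEXT);
"""

# Same filters as the report query, in SQLite syntax.
QUERY = """
SELECT bme.DisplayName,
       s.LastModified AS LastModifiedUTC,
       datetime(s.LastModified, '-5 hours') AS LastModifiedLocal
FROM State AS s
JOIN BaseManagedEntity AS bme ON s.BaseManagedEntityId = bme.BaseManagedEntityId
WHERE s.MonitorId IN (SELECT MonitorId FROM Monitor
                      WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown')
  AND s.HealthState = 3      -- critical
  AND bme.IsDeleted = 0      -- not deleted
ORDER BY s.LastModified DESC
"""


def servers_down(conn):
    """Return (DisplayName, LastModifiedUTC, LastModifiedLocal) rows."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Monitor VALUES (?, ?)", [
    (1, "Microsoft.SystemCenter.HealthService.ComputerDown"),
    (2, "Example.Heartbeat.Monitor"),  # made-up name for some other monitor
])
conn.executemany("INSERT INTO BaseManagedEntity VALUES (?, ?, ?)", [
    (10, "OWA", 0),
    (11, "EXCH1", 0),
    (12, "OLDSERVER", 1),  # deleted object
])
conn.executemany("INSERT INTO State VALUES (?, ?, ?, ?)", [
    (10, 1, 3, "2008-06-27 20:00:00"),  # OWA: not reachable -> reported
    (11, 1, 1, "2008-06-27 21:00:00"),  # EXCH1: still answers ping -> skipped
    (11, 2, 3, "2008-06-27 21:00:00"),  # EXCH1: critical, but on a different monitor
    (12, 1, 3, "2008-06-26 08:00:00"),  # deleted -> skipped
])

print(servers_down(conn))
# [('OWA', '2008-06-27 20:00:00', '2008-06-27 15:00:00')]
```

EXCH1 is critical on a monitor, but not on the Computer Not Reachable monitor, so it stays out of the report. That is exactly the heartbeat-only case the state view could not separate.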

You can import this report if you have created a data source as shown in my previous post: 

http://blogs.technet.com/kevinholman/archive/2008/06/27/creating-a-new-data-source-for-reporting-against-the-operational-database.aspx

Import this report into your custom folder and run it. If you like the output, you can schedule it so you receive it first thing every day:

[Screenshot: sample output of the Servers Down report]

***** Update 6-30-08: I removed a section of the original query relating to maintenance mode. We found that if a down server had never been in maintenance mode, it would not show up in the report. The query and report download have been updated to address this.
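The removed maintenance mode clause isn't shown in this post, so the following is only an assumption about the kind of bug the update describes. Suppose a query inner-joins to a table that holds a row only for objects that have ever been in maintenance mode. Any server with no such row then silently drops out. A LEFT JOIN keeps those servers and excludes only the ones currently in maintenance. The table and column names below (`DownServer`, `MaintenanceHistory`, `InMaintenance`) are hypothetical, and the example runs in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DownServer (EntityId INTEGER, DisplayName TEXT);
-- Hypothetical table: a row exists only for servers that have EVER been in maintenance mode
CREATE TABLE MaintenanceHistory (EntityId INTEGER, InMaintenance INTEGER);
INSERT INTO DownServer VALUES (1, 'OWA'), (2, 'SQL1'), (3, 'WEB1');
INSERT INTO MaintenanceHistory VALUES (2, 0), (3, 1);  -- OWA has never been in maintenance mode
""")

# Inner join: OWA disappears because it has no maintenance mode row at all.
inner = conn.execute("""
    SELECT d.DisplayName FROM DownServer d
    JOIN MaintenanceHistory m ON m.EntityId = d.EntityId
    WHERE m.InMaintenance = 0
    ORDER BY d.DisplayName
""").fetchall()

# Left join: servers with no row are kept; only those currently in maintenance are excluded.
left = conn.execute("""
    SELECT d.DisplayName FROM DownServer d
    LEFT JOIN MaintenanceHistory m ON m.EntityId = d.EntityId
    WHERE m.InMaintenance IS NULL OR m.InMaintenance = 0
    ORDER BY d.DisplayName
""").fetchall()

print(inner)  # [('SQL1',)]
print(left)   # [('OWA',), ('SQL1',)]
```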

Report is attached below:

Published Friday, June 27, 2008 10:39 PM by kevinhol

Attachment(s): Servers_Down_Report.rdl

Comments

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Monday, June 30, 2008 10:59 AM by StuartR

With MOM 2005, we can accomplish this quite easily using the following approach:

A SERVER DOWN alert can be generated in response to an internally generated ping failure event, created by a MOM Agent ping script that is part of a MOM Agent connectivity rule.

This rule monitors for the internal failure event and will generate a "Service Unavailable" alert indicating that the Agent Computer is most likely down (or has lost network connectivity).

Note that 12 ping attempts over a 90-second period, along with an additional ping after 15 minutes, must all have failed before this alert is generated.
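The condition described above can be sketched as a small check: the alert fires only when every one of the 12 quick pings and the follow-up ping 15 minutes later have failed. This is a hypothetical illustration. The function name and the list-of-booleans input are made up, and MOM 2005's actual script isn't shown here.

```python
QUICK_ATTEMPTS = 12  # quick pings spread over about 90 seconds


def should_raise_server_down(quick_results, follow_up_ok):
    """Hypothetical helper: True only when all 12 quick pings AND the
    follow-up ping sent 15 minutes later have failed.

    quick_results holds one bool per quick attempt (True = reply received).
    """
    if len(quick_results) != QUICK_ATTEMPTS:
        raise ValueError(f"expected {QUICK_ATTEMPTS} quick ping results")
    return not any(quick_results) and not follow_up_ok


print(should_raise_server_down([False] * 12, follow_up_ok=False))           # True
print(should_raise_server_down([False] * 11 + [True], follow_up_ok=False))  # False
print(should_raise_server_down([False] * 12, follow_up_ok=True))            # False
```

A single successful reply anywhere in the sequence is enough to suppress the alert. That is what keeps a brief network blip from paging someone.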

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Monday, June 30, 2008 1:01 PM by kevinhol

One note to add: in OpsMgr you will get a distinct alert whenever an agent does not respond to ping, in addition to the heartbeat failure alert. What we don't have is a state view JUST for computers that are down.

You could easily write a custom monitor that runs a ping script, build your own state view for it in the console, and not need this report. The benefit of the report is that you can schedule it and deliver it via email or SharePoint.

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Monday, July 28, 2008 9:20 AM by daviesg

Hi Kevin

The trouble is that by creating the monitor you mention, you are actually duplicating work that OpsMgr is already doing. It sort of highlights the lack of logic in some functionality.

To me, it makes no sense that I have to write a ping script as a monitor when OpsMgr has a much more powerful solution: agent heartbeat with an associated ping of servers on which the agent heartbeat has been missed. I just need to get that information into the console, and the fact that OpsMgr can't do that is something of a design flaw.

As I mentioned on the newsgroups, I don't think the report is feasible for near real time info in a large environment.

Cheers

Graham

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Monday, July 28, 2008 9:23 AM by daviesg

Ahhh... I didn't read that properly before I posted! I meant that not being able to incorporate agent health state into the computer state view is something of a flaw. I realise there are the agent health state views, as per my posting in the newsgroup ;-)

# A cool way to use a web page view in the console - run a report!

Friday, November 07, 2008 5:14 PM by Kevin Holman's OpsMgr Blog

Here is a unique way to use web page views in the OpsMgr console. You can create a web page view in the

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Monday, December 22, 2008 1:19 AM by bradje

Hi,

I do not understand how to IMPORT the report into the new Custom report folder.

I notice that the reports on my reporting server are *.rpdl but this attachment is *.rdl.

How do I get this report into the new folder?

Thx,

John Bradshaw

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Tuesday, December 30, 2008 7:00 AM by hnehnes

Hi,

I have a different problem on the same topic. If a server goes down, I do not receive any alerts. When I open Health Explorer with the above settings, I see only white bullets under Availability, except for Local Health Service Availability. Computer not Reachable, ... are disabled in their sealed MP. What is wrong in our configuration, and what do I have to change to get an alert when a server goes down?

Thanks

Hendrik

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Friday, January 23, 2009 6:17 AM by Ren

I tried to use this RDL file by following the steps mentioned on this site. When I run the report, I get this error: "An error has occurred during report processing.

Cannot create a connection to data source 'ops'.

For more information about this error navigate to the report server on the local server machine, or enable remote errors."

Please advise.

Ren

# re: Whoops!

Friday, January 23, 2009 9:13 AM by kevinhol

You are correct - It looks like in this RDL file I named my data source "Ops" instead of "OpsDB".

Simply open the RDL file, edit that, and import it. Or simply go to your imported report, edit it, and change the data source to your live data source that points to the OpsDB.
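If you would rather patch the file before importing, here is a minimal standard-library Python sketch. It assumes the usual RDL layout, where the report's data source is declared as `<DataSource Name="...">` and each dataset refers to it through a `<DataSourceName>` element. It also updates a `<DataSourceReference>` whose text is exactly the old name. `rename_data_source` is a hypothetical helper, not a Reporting Services tool. Check the result in a text editor or Report Designer before importing.

```python
import io
import xml.etree.ElementTree as ET


def rename_data_source(rdl, old, new):
    """Hypothetical helper: rename a report data source in RDL text.

    Updates <DataSource Name="old">, plus any <DataSourceName> or
    <DataSourceReference> element whose text is exactly `old`.
    Note: ElementTree drops XML comments and the XML declaration.
    """
    if isinstance(rdl, str):
        rdl = rdl.encode("utf-8")
    # Re-register the file's own namespace prefixes (e.g. the default RDL
    # namespace) so the output doesn't get invented ns0:, ns1: prefixes.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(rdl), events=("start-ns",)):
        ET.register_namespace(prefix, uri)

    root = ET.fromstring(rdl)
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip "{namespace}"
        if local_name == "DataSource" and elem.get("Name") == old:
            elem.set("Name", new)
        elif local_name in ("DataSourceName", "DataSourceReference") and (elem.text or "").strip() == old:
            elem.text = new
    return ET.tostring(root, encoding="unicode")


# Usage (back up the original first):
# with open("Servers_Down_Report.rdl", "rb") as f:
#     fixed = rename_data_source(f.read(), "Ops", "OpsDB")
# with open("Servers_Down_Report.rdl", "w", encoding="utf-8") as f:
#     f.write(fixed)
```

`register_namespace` changes module-wide state in ElementTree. That is harmless in a one-off script like this, but worth knowing if you reuse the function elsewhere.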

# re: Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

Friday, February 13, 2009 4:16 PM by mccreerJ

Do you know of any way to set up subscriptions for only "ping failed" notifications? Right now, every time a server fails a heartbeat and cannot be pinged, we receive two text messages: one for the heartbeat failure and one for the ping failure.

# re: Subscribe to server down alerts only?

Friday, February 13, 2009 4:36 PM by kevinhol

YES!  I do.  :-)

In R2, this is super easy, because we can subscribe to alerts rule by rule, monitor by monitor.

In SP1, it is doable, just a bit more difficult. Please see my how-to post at:

http://blogs.technet.com/kevinholman/archive/2008/10/12/creating-granular-alert-notifications-rule-by-rule-monitor-by-monitor.aspx
