Covering support and news on SMS, MOM, Configuration Manager, Operations Manager and System Center Essentials.
http://blogs.technet.com/smsandmom/default.aspx
Yesterday the PerformancePoint team released a sample solution that shows how to build cubes from the Operations Manager DW and then build balanced IT scorecards using Office PerformancePoint server from these cubes.
The solution contains the queries to pull from the OperationsManagerDW using SSIS, the sample cubes, and sample scorecards.
The solution can be found on www.microsoft.com/BI and www.microsoft.com/PerformancePoint
This moves the quality data we collect in OpsMgr into the hands of the BI Analyst and the suite of BI tools that SSAS/PerformancePoint support.
Daniel Savage
Program Manager | System Center Operations Manager.
In my last post, I described how to create an actionable alert for a specific unit monitor, the status code monitor. You can do the same for all the other unit monitors. In response to that post, John Curtiss wrote, 'Availability aggregate rollups for the web application monitors are pretty useless'. John is right. The rollup simply says 'something is wrong in this web app' and that it is down. To understand why, let us look at the monitor tree. Below is most of the monitor tree that forms the health of a web application. The leaf nodes are the unit monitors, and their health is rolled up to aggregate monitors. Unit monitors can be numeric, content match, or security certificate related.
The alert generated by the aggregate monitor of the web application (Web app - URL) does not contain a precise description that identifies the exact cause of the problem. A web application alert can result from multiple failures. An alert is raised to indicate a problem, and ideally there is one alert per problem. For example, if a status code error causes the status code monitor to go to Error, and that in turn causes the Web app monitor to go to Error, the alert is generated because of the status code error. Suppose the status code problem is then fixed, but in the meantime another failure occurs, say an expired certificate. The Web app monitor is still in Error, but because of a different problem. Since the Web app monitor remained in the Error state, the alert stays in the same resolution state (New) and no new alert is generated. If the alert description mentioned the first problem, the status code, it could mislead the user into thinking that the cause was the status code and not the certificate expiration.
An alert is only an indication of a problem; it does not assist in diagnosing the problem. Diagnosis is a complex process that may require additional data collection, which is why going to the Health Explorer is the preferable method. At the aggregate level a problem may have been triggered by multiple causes, whereas at the unit monitor level we have a precise indication of the problem. Hence, unit monitors can carry precise descriptions that indicate the problem, whereas at the aggregate level it is harder to create a precise description. If you think that the majority of your problems are due to status codes, I would recommend using the alert description stated in the feedback thread, but it is hard to generalize an alert description at the aggregate level. An alert is not intended to be the mechanism for live problem diagnostics.
Another factor to take into consideration is reducing the number of outstanding alerts in the system. Alerting at the aggregate level is meant to generate one alert at the application level instead of a separate alert for each problem. Constant generation of alerts is undesirable in most cases. Hence, by default we have disabled alerting at the unit monitor level. Users still have the option of enabling the alert on every unit monitor where they need it. Alerts for monitors in sealed management packs can be enabled using overrides. One could develop a tool using the SDK that automatically applies the appropriate overrides for a large number of web applications.
On the implementation level, there are optimizations in the monitoring infrastructure that intentionally avoid updating the monitor state on every state change notification unless the state is truly going to change from one state to another. In the above example, if the monitor goes to Error due to the status code and then remains in Error due to another problem, there is no need to update the state from Error to Error or to generate another alert at the aggregate level. If we did that for every event, it would generate a lot of state update notifications, which could create other performance and scalability problems.
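The behavior described above can be illustrated with a small simulation. This is only a toy model, assuming a worst-of rollup and an alert whose text is frozen when the aggregate state first changes; the class name, state names, and alert text are made up for illustration and are not OpsMgr code or its SDK.

```python
from dataclasses import dataclass, field

# Health states ordered from best to worst (illustrative names).
SEVERITY = {"Success": 0, "Warning": 1, "Error": 2}


@dataclass
class AggregateMonitor:
    """Toy worst-of rollup over named unit monitors (hypothetical model)."""
    unit_states: dict = field(default_factory=dict)
    state: str = "Success"
    alerts: list = field(default_factory=list)

    def report(self, unit_name: str, unit_state: str) -> None:
        """Record a unit monitor's state and roll it up to the aggregate."""
        self.unit_states[unit_name] = unit_state
        rolled_up = max(self.unit_states.values(), key=SEVERITY.__getitem__)
        if rolled_up == self.state:
            # Unchanged state (for example Error -> Error): no update, no alert.
            return
        self.state = rolled_up
        if rolled_up != "Success":
            # The alert text reflects only the cause seen at this transition.
            self.alerts.append(f"{rolled_up}: first cause was {unit_name}")


web_app = AggregateMonitor()
web_app.report("Status code", "Error")         # aggregate goes to Error, one alert
web_app.report("Certificate expiry", "Error")  # aggregate already in Error
web_app.report("Status code", "Success")       # certificate is still failing
print(web_app.state)
print(web_app.alerts)
```

In this sketch the aggregate ends up in Error with a single alert that still names the status code, even though the certificate is now the only failing unit monitor, which is the misleading situation described earlier.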
We are looking into options for making the aggregate monitor alerts more usable in one of our next releases. The following questions may help me refine the proposal:
- Would it be okay if the alert description indicates the first error condition, from when the monitor went to Error/Warning and created the alert, but is not updated for subsequent state change events?
- What if the alert description is not updated after the alert is created, but the history is updated with subsequent changes?
- What does the user need in order to determine the cause of an error after the error has gone away and been resolved?
I would like to hear thoughts from the readers.
Next, let me look into bulk editing of configuration of the monitors.
With Operations Manager 2007, monitoring Web based applications using a synthetic transaction approach became much easier with the inclusion of the Web application monitoring template. This replaced the Web Sites and Services Management Pack in MOM 2005 and leveraged the benefits of the model based approach and state-centric monitoring available in OpsMgr. One of the side effects was that the Alerts raised when the watcher node detected a problem have a blank description. In this article, I will describe some tips to help you make the Alerts more meaningful and actionable.
Since I work on monitoring of Web based applications, I often hear from customers who use the Web application templates that the default alert descriptions for web applications are useless. When a Web application synthetic transaction fails, whether because of an error status code or a DNS lookup failure, the Web application alert does not indicate anything useful.
In order to reproduce the problem, I created a simple web application with a non-existent URL. I selected all the default settings to create a new Web application. Within a couple of minutes, I received two alerts which showed up in the Alert view - One for the Web request and the other one for the Web application.
As you can see, neither of them had a good description. So what was the real cause of the problem? I had to click on the Health Explorer and review the monitor tree.
Now you can see that the Alert was raised because the HTTP Status code is 404 - not found. As a user, I have to look through the alerts, load up the health explorer and then identify the unit monitor causing the alert. That's too many steps. If I am forwarding this alert to a separate system from where I cannot launch the Health Explorer, then I am out of luck.
The problem is detected by a specific unit monitor that does not alert by default. In the above case, it is the Status code monitor. I can make it generate an actionable alert with the following simple steps:
1. Right click and get properties of the monitor from the health explorer.
2. Enable the Alert for that monitor
3. Set the Alert description as follows
Status code is $Data/Context/RequestResults/RequestResult["1"]/BasePageData/StatusCode$
4. Reset the monitor. Close the alerts and wait for it to fire again.
Now I get a new Alert for status code with the detail of the failure that I can act upon. That is much more useful.
How did I find the magic description string for the alert? There is no magic to this - the information is in the context of the Alert as you can see in the State change Event above in the Health Explorer. (For more details on authoring alert descriptions, please refer to the Authoring guide). It has the request code in it. For the exact string in the context, use the following steps:
1. Go to the Authoring space and Edit the settings for the Web application. In the Web application editor, click on the Run Test link
2. Once the Test is run, click the 'View Full results'
3. Now click on the Raw tab to see full details.
4. Identify the StatusCode field (or any other field that you are interested in) that you want to see in the Alert description. Also note the field's name in the XML.
5. Construct the parameterized alert description based on the field you are interested in. For the status code it looks as follows, where %ReqID% stands for the ID of the request (1 in the example above):
$Data/Context/RequestResults/RequestResult["%ReqID%"]/BasePageData/StatusCode$
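If you need the same description for several requests, the string can be generated rather than typed by hand. The helper below is a minimal sketch that only formats text in the pattern shown above; the function names are hypothetical, it does not talk to OpsMgr, and any field name other than StatusCode is an assumption that you should check against the Raw tab.

```python
def field_placeholder(request_id: int, field_name: str = "StatusCode") -> str:
    """Format the alert-description parameter for one field of one request."""
    return (
        "$Data/Context/RequestResults/"
        f'RequestResult["{request_id}"]/BasePageData/{field_name}$'
    )


def status_code_description(request_id: int) -> str:
    """Build the alert description text used in the steps above."""
    return f"Status code is {field_placeholder(request_id)}"


# Request 1 reproduces the description entered for the status code monitor.
print(status_code_description(1))
```

The resulting text still has to be pasted into the Alert description of the corresponding unit monitor, as in the earlier steps.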
Now you know a little documented trick for creating actionable alert descriptions. Do you think this was useful?
If we know this, why did we not make the Web application alert description reflect this information by default? For that, check my next blog post.
(Updated PS script below which works on PowerShell V2 and above)
We have had some customers ask us if there is a way to use PowerShell to remove an agent managed computer from OpsMgr 2007 after that computer has been turned off. The scenario is that a machine with an agent installed on it gets decommissioned for some reason, and the user would now like to use PowerShell to get rid of all the traces left by that particular agent machine. The attached PowerShell script should take care of this for you. Note: This script is not officially supported by the product team and should be used only when needed.
You need to specify the FQDN of the machine as an argument when you run the script: .\DeleteAgent.ps1 satyamachine.vel.net
Why am I not able to execute custom scripts in PowerShell?
On some PowerShell installs you will not be able to run this script because PowerShell is probably running in a restrictive script execution mode: "Restricted" does not allow scripts to run at all, and "AllSigned" requires scripts to be signed. You can get around this by doing the following.
To change the script execution mode from the default RemoteSigned script execution mode, use the Set-ExecutionPolicy cmdlet in the OpsMgr 2007 Command Shell (for example, Set-ExecutionPolicy Unrestricted). The OpsMgr 2007 Command Shell recognizes the change to the policy immediately.
Users that want to set a consistent script execution mode for all computers that are running the OpsMgr 2007 Command Shell should apply the script execution mode setting by using an Active Directory group policy. You configure the Active Directory group policy to set the ExecutionPolicy value located under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PowerShell\1\ShellIds\Microsoft.PowerShell registry key to the desired script execution mode.
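As a rough guide to what each script execution mode permits, the four common policies can be summarized as a small decision function. This is a simplified sketch of the documented behavior, not PowerShell's own logic; the function and parameter names are hypothetical, and details such as trusted publishers and confirmation prompts are left out.

```python
def script_allowed(policy: str, *, signed: bool, downloaded: bool) -> bool:
    """Approximate whether a script may run under a given execution policy."""
    if policy == "Restricted":
        return False                     # no scripts run at all
    if policy == "AllSigned":
        return signed                    # every script must be signed
    if policy == "RemoteSigned":
        return signed or not downloaded  # only downloaded scripts need a signature
    if policy == "Unrestricted":
        return True                      # all scripts run
    raise ValueError(f"unknown policy: {policy}")


# An unsigned script saved locally, which is the case assumed here for DeleteAgent.ps1.
for policy in ("Restricted", "AllSigned", "RemoteSigned", "Unrestricted"):
    print(policy, script_allowed(policy, signed=False, downloaded=False))
```

Under these assumptions an unsigned local script is blocked only by Restricted and AllSigned, so either RemoteSigned or Unrestricted would let it run.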
Satya Vel | Program Manager | System Center