Company Knowledge captures the steps required to resolve an alert in your OpsMgr installation. Product Knowledge provides the application developer's view of the causes of an alert and the suggested resolution steps. Together, the two give any operator the best steps to take to resolve an alert.
Product Knowledge is embedded in a rule or monitor when it is authored. Company Knowledge can be added at any time provided you have the correct applications loaded and you are logged into the console with an account that is assigned to the correct role.
Software requirements for the machine on which you will access and edit the Company Knowledge (this was tested on a management server):
Role requirements:
-Chris Fox
This post outlines the supported configurations for Operations Manager Reporting and helps align them with the supported configurations of SQL Reporting Services.
Note: This does not cover the OperationsManagerDW component. For information on the high availability solutions for the OperationsManagerDW see - http://technet.microsoft.com/en-us/library/bb309428.aspx
Install Notes: Operations Manager Reporting is fully supported in this configuration.
Install Notes: Operations Manager Reporting is not supported in this configuration as OM Reporting installs a custom security extension as part of the setup of the front end components which cannot be replicated across the web farm.
Install Notes: Operations Manager Reporting is not supported in this configuration. There is an issue during Operations Manager Reporting upgrade that causes the upgrade to fail if the ReportServerDBs are not on the first installed node when the upgrade is performed. Aside from this issue Operations Manager will work as expected in this scenario.
Based on the above, the first 2 are the supported options. For high availability scenarios we recommend Option 2, combined with the recovery knowledge found in the Operations Guide, which shows how to recover from a failure in the reporting components (http://download.microsoft.com/download/7/4/d/74deff5e-449f-4a6b-91dd-ffbc117869a2/OM2007_OpsGuide.doc).
The Operations Manager 2007 SDK documentation is available on MSDN at http://go.microsoft.com/fwlink/?LinkId=108753, and it is now also available on the Microsoft Download Center at http://go.microsoft.com/fwlink/?LinkId=108754.
Recent updates were made to the class descriptions and the architecture overview topic, and a terms list (glossary) was added. The biggest change in this documentation update is the addition of the following new topics and code examples:
Monitoring Object and Partial Monitoring Object Comparison - http://msdn2.microsoft.com/en-us/library/bb960508.aspx
Exceptions in Operations Manager - http://msdn2.microsoft.com/en-us/library/bb960495.aspx
How to Create State Views, Diagram Views, and View Folders - http://msdn2.microsoft.com/en-us/library/bb960509.aspx
How to Delete Views and Folders - http://msdn2.microsoft.com/en-us/library/bb960497.aspx
How to Create a Group - http://msdn2.microsoft.com/en-us/library/bb960490.aspx
How to Delete a Group - http://msdn2.microsoft.com/en-us/library/bb960483.aspx
How to Create a Knowledge Article - http://msdn2.microsoft.com/en-us/library/bb960494.aspx
How to Automate the Setup of URL Monitoring - http://msdn2.microsoft.com/en-us/library/bb960493.aspx
How to Create an Override - http://msdn2.microsoft.com/en-us/library/bb960481.aspx
How to Create an Override for a Diagnostic - http://msdn2.microsoft.com/en-us/library/bb960498.aspx
How to Create an Override for a Discovery - http://msdn2.microsoft.com/en-us/library/bb960485.aspx
How to Create an Override for a Monitor - http://msdn2.microsoft.com/en-us/library/bb960482.aspx
How to Create an Override for a Recovery - http://msdn2.microsoft.com/en-us/library/bb960489.aspx
How to Create an Override for a Rule - http://msdn2.microsoft.com/en-us/library/bb960505.aspx
How to Get Information About an Override - http://msdn2.microsoft.com/en-us/library/bb960504.aspx
How to Create a Unit Monitor - http://msdn2.microsoft.com/en-us/library/bb960506.aspx
How to Create an Event Log Unit Monitor - http://msdn2.microsoft.com/en-us/library/bb960507.aspx
How to Query for Agents - http://msdn2.microsoft.com/en-us/library/bb960492.aspx
How to Query for All Monitoring Objects in an Error State - http://msdn2.microsoft.com/en-us/library/bb960500.aspx
How to Query for All Rules That Have a Non-Category Override - http://msdn2.microsoft.com/en-us/library/bb960479.aspx
How to Query for Diagnostics - http://msdn2.microsoft.com/en-us/library/bb960499.aspx
How to Query for Discoveries - http://msdn2.microsoft.com/en-us/library/bb960486.aspx
How to Query for Management Packs - http://msdn2.microsoft.com/en-us/library/bb960491.aspx
How to Query for Monitors - http://msdn2.microsoft.com/en-us/library/bb960480.aspx
How to Query for Overrides - http://msdn2.microsoft.com/en-us/library/bb960503.aspx
How to Query for Recoveries - http://msdn2.microsoft.com/en-us/library/bb960487.aspx
How to Query for Rules - http://msdn2.microsoft.com/en-us/library/bb960496.aspx
How to Query for Tasks - http://msdn2.microsoft.com/en-us/library/bb960488.aspx
How to Query for Computers Running Windows Server 2003 - http://msdn2.microsoft.com/en-us/library/bb960501.aspx
Operations Manager 2007 SDK Glossary - http://msdn2.microsoft.com/en-us/library/cc268403.aspx
Every week I see a customer running into an issue where management servers or agents show up as not monitored. While there are a number of reasons why these roles may show up as ‘not monitored’, I wanted to list a few things to check to help you figure out what the problem may be. Below are some steps for drilling down into why you are seeing a 'Not Monitored' state for Management Servers and Agents.
1) If you are seeing Not Monitored on the “Root Management Server”, log on to the RMS and verify that the Configuration and Health Services are running. This can be done by opening the Services snap-in. If a Management Server or Agent is showing up as not monitored, verify that the Health Service is running on each of those individual roles.
2) If you are upgrading from RTM to SP1, make sure that you import the latest updated management packs. The core system MPs are updated automatically after the upgrade, but the non-core MPs are not, and users need to re-import those MPs manually. Kevin Holman wrote a good blog post about this, which can be found here. If you are seeing unmonitored objects after you upgrade, this could be the reason. All the SP1 MPs should have version 6.0.6278.0. You can find the version of each MP by opening the OpsMgr console, going to the Administration view and selecting the Management Packs node; the version number is listed in the Version column. If you see an MP with a version number below 6278, you will need to re-import that MP. Kevin wrote this SQL script, which also lists the MPs and their version numbers.
SELECT MPName, MPFriendlyName, MPVersion, MPIsSealed
FROM ManagementPack WITH(NOLOCK)
ORDER BY MPName
3) If a user renames or shuts down a server that is being monitored, the server may show up as not monitored. This happens only if a MS or an agent was just deployed and someone then brings the server down. In most cases you would instead see a grey icon with a checkbox, which indicates that the server stopped heartbeating and that its last known state was healthy.
4) If there is no trust between the domain the agent is in and the domain of the management server, you are likely to get the ‘not monitored’ state. You would also see this issue if there is only a one-way trust between the management server and the agent. In OpsMgr 2007, agents initiate communication with the MS, and the MS uses the same channel to communicate back to the agents. An easy way to determine whether this is the issue is to check the event log on the agent for error events stating that mutual authentication could not be established.
5) Gateway Servers are among the biggest culprits for the ‘not monitored’ state. 99% of the time, when they show as not monitored it means that users have not configured the certificates correctly, have not run the Gateway approval command line tool, or are not using the right Public Key Infrastructure (PKI).
6) When you deploy a management pack and the action account is configured as a low-privilege account, some workflows (monitors/rules/discoveries/tasks/diagnostics/recoveries) may not be able to execute. By default they run under the low-privilege account, which may not have sufficient rights to access the instrumentation they need in order to function properly. You can get more information on this from Boris’s blog.
7) If the agent action account does not have enough privileges, some of the properties of the server will show up as not monitored. Check the OpsMgr agent event log for event 1201; the agent health service normally logs that event for each management pack it downloads. Alternatively, you can go to the Health Service State folder under %Program files%\Microsoft System Center Operations Manager 2007\ (there should be a management pack folder) and check whether the agent downloaded the various MPs. If there are no MPs in the folder, it is another sign that the agent is not getting configuration from the Management Server.
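The version comparison described in step 2 can be sketched in a few lines of Python. This is only an illustration and not part of OpsMgr: the function names and the sample rows are hypothetical, and it assumes the MPName and MPVersion columns from the SQL script have already been exported as (name, version) pairs. The 6.0.6278.0 value is the SP1 version quoted in step 2, so the check is only meaningful for the MPs that ship with the product.

```python
SP1_VERSION = "6.0.6278.0"  # SP1 management pack version quoted in step 2


def parse_version(text):
    """Turn a dotted version string such as '6.0.6278.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def packs_below_sp1(rows, minimum=SP1_VERSION):
    """Return the names of management packs whose version is below `minimum`.

    `rows` is an iterable of (name, version) pairs (a hypothetical input shape).
    """
    floor = parse_version(minimum)
    return [name for name, version in rows if parse_version(version) < floor]


# Hypothetical sample data; real rows would come from the ManagementPack query.
sample = [
    ("Example.Core.Library", "6.0.6278.0"),
    ("Example.Old.Pack", "6.0.5000.0"),
]
print(packs_below_sp1(sample))
```

Comparing tuples of integers rather than raw strings avoids mis-ordering builds such as 6.0.900.0 and 6.0.6278.0.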
While the above steps may not be the direct solution to the problem they should help put you on the right track to diagnose the root of the issue.
We had a question the other day asking how you know if you have the latest version of the Operations Manager documentation. The easiest way to do this is to get to the documentation from this page:
http://technet.microsoft.com/en-us/opsmgr/bb498235.aspx
This page lists all documentation along with the publish date, so you can compare it with any copies of the guides you've downloaded earlier.
I get asked a lot about the best way to use Operations Manager Reporting to answer business-specific questions such as "How do I gain efficiencies with my IT Operators?" or "How can I continually improve my monitoring system?". Although these may be process related, the data we keep within the data warehouse can be used in conjunction with our reporting to supplement these business improvements.
Throughout this blog series I will explore various data collections and attempt to show how these questions can be best answered.
Alert data is a good place to start, since it is what most operators work with when using the monitoring system. Typically they will have an operations console and will act upon alerts as and when they happen. Mining this data therefore allows you to see which objects cause the most work, which alerts take the longest to fix and which rules may be configured incorrectly.
Let's start with what we provide out of the box.
Alerts Report: Shows all alerts for a managed entity or group of managed entities over a specified time period. This is particularly useful for answering questions like "For the past week, which of my Windows 2003 Servers had the most Priority 1 alerts?" Click for Link to Alerts Report
Alert Detail Report: Available as a drill-through from the Alerts report, this shows all instances of a specific alert, which is useful for determining which alerts have high repeat counts. It also includes additional data such as Owner and Ticket ID. Click for Link to Alert Details Report
Most Common Alerts Report: This report is extremely useful when you are trying to fine tune your environment to ensure your operators are spending their time working on priority issues. Filtered by Management Pack, it shows the most common alerts by percentage of total, the time taken to resolve these alerts and the average time taken per alert. Click for link to Most Common Alerts Report 1 Report 2
So using features such as published reports and linked reports you can create views into the Alert data that help you not only see what is happening but also drive improvements over time.
The reports above provide a certain amount of functionality, but there may be cases where you need to drill in further, with questions such as "On average for last week, how long were alerts open?", "How many alerts per category were closed last week?" and "What is the average length of time an alert takes to close, per alert category?"
Answering these requires querying the additional data available in the warehouse, whether to create your own reports, pivot in Excel or publish to a dashboard for regular review.
A simple query that answers these additional questions is shown below.
SELECT
COUNT(vAlertDetail.AlertGuid) AS TotalAlerts,
Alert.vAlert.AlertName,
Alert.vAlert.Category,
AVG(Alert.vAlert.RepeatCount) AS AverageRepeatCount,
vManagedEntity.Path,
vManagedEntity.DisplayName,
vManagedEntity.ManagedEntityDefaultName,
vResolutionState.ResolutionStateName,
AVG(Alert.vAlertResolutionState.TimeFromRaisedSeconds) AS AverageOpenTimeSeconds
FROM Alert.vAlertDetail INNER JOIN
Alert.vAlert ON Alert.vAlertDetail.AlertGuid = Alert.vAlert.AlertGuid INNER JOIN
vManagedEntity ON Alert.vAlert.ManagedEntityRowId = vManagedEntity.ManagedEntityRowId INNER JOIN
Alert.vAlertResolutionState ON Alert.vAlert.AlertGuid = Alert.vAlertResolutionState.AlertGuid INNER JOIN
vResolutionState ON Alert.vAlertResolutionState.ResolutionState = vResolutionState.ResolutionStateId
WHERE
Alert.vAlert.RaisedDateTime BETWEEN getutcdate()-7 AND getutcdate()
AND Alert.vAlert.AlertName <> ''
--AND vResolutionState.ResolutionStateName = 'New'
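GROUP BY
Alert.vAlert.AlertName,
Alert.vAlert.Category,
vManagedEntity.Path,
vManagedEntity.DisplayName,
vManagedEntity.ManagedEntityDefaultName,
vResolutionState.ResolutionStateName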
ORDER BY COUNT(vAlertDetail.AlertGuid) DESC, AVG(Alert.vAlertResolutionState.TimeFromRaisedSeconds) DESC
This query is an example and has not been performance tuned. You should evaluate your dataset size and tune the query accordingly so that it does not affect DW performance.
To view a schema diagram of the Alert data click here.
This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
Daniel Savage, Program Manager | System Center Operations Manager
Our evaluation upgrade strategy has changed from the RTM version of OpsMgr 2007 to the SP1 version, so in today's post I wanted to share the various evaluation upgrade scenarios customers may have.
General note for all upgrades: When you upgrade any evaluation version (180 days) of OpsMgr 2007 to a fully licensed version, you never lose any existing data or functionality, and you can keep using all your custom or modified management packs. When we run the license upgrade, all we are doing is updating an encrypted database entry that records the license version, which stops the alerts telling users that their license has expired.
Scenario A: I have OpsMgr 2007 Evaluation and want to upgrade to fully licensed OpsMgr 2007.
Customers need to get the Licensed (Select) CD image from the Microsoft Volume Licensing site and run the Licensing Wizard MSI tool, which can be found under the setup folder.
Scenario B: I have Desktop Error Monitoring (DEM) that came with Microsoft Desktop Optimization Pack (MDOP) and want to upgrade to the fully licensed OpsMgr 2007 version.
Users just need to run the Licensing Wizard MSI tool, which can be found under the setup folder of the licensed version of OpsMgr 2007.
Scenario C: I have OpsMgr 2007 SP1 Release Candidate (RC) Evaluation that came with Microsoft Desktop Optimization Pack (MDOP) and want to upgrade to the fully licensed OpsMgr 2007 version.
Customers need to get the Licensed Service Pack 1 (Select) CD image from the Microsoft Volume Licensing site and run OMSetup.exe, which automatically upgrades the license version to the full OpsMgr 2007 SP1 version.
Scenario D: I have OpsMgr 2007 SP1 RTM Evaluation and want to upgrade to the fully licensed OpsMgr 2007 SP1 RTM version.
Scenario E: I have Desktop Error Monitoring (DEM) that came with Microsoft Desktop Optimization Pack (MDOP) and want to upgrade to the fully licensed OpsMgr 2007 Service Pack 1 version.
Users first need to run the DEM Service Pack 1 upgrade package, found on the download center, which brings their install to DEM with SP1. To then upgrade to a fully licensed version of OpsMgr 2007 SP1, all they need to do is download the Licensed Service Pack 1 (Select) CD image from the Microsoft Volume Licensing site and run OMSetup.exe, which automatically upgrades the license version to the full OpsMgr 2007 SP1 version.
How do I know when my license will expire?
OpsMgr 2007 will drop alerts in the Operations Console 2 weeks, 1 week, 48 hours, 24 hours and 1 hour before the evaluation expires.
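To get a feel for that schedule, here is a small Python sketch that works out when each warning would be due for a given expiry time. It is an illustration only: the expiry date is made up, the names are hypothetical, and it assumes the alerts are raised exactly at the stated lead times, which the product may not guarantee.

```python
from datetime import datetime, timedelta

# Lead times quoted in the post: 2 weeks, 1 week, 48 hours, 24 hours and 1 hour.
ALERT_LEAD_TIMES = [
    timedelta(weeks=2),
    timedelta(weeks=1),
    timedelta(hours=48),
    timedelta(hours=24),
    timedelta(hours=1),
]


def expected_alert_times(expiry):
    """Return when each expiry warning would be due, earliest first."""
    return [expiry - lead for lead in ALERT_LEAD_TIMES]


# Hypothetical expiry date, used only for illustration.
expiry = datetime(2008, 6, 30, 12, 0)
for due in expected_alert_times(expiry):
    print(due.isoformat(sep=" "))
```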
Here is the next entry in the posts I’m migrating over. Again, this is intended to be a “snapshot in time” supplement to the Implementing System Center Operations Manager 2007 at Microsoft white paper.
The Root Management Server (RMS)
So what exactly does the RMS do?
The RMS server, by definition, is the first management server installed in a management group. The RMS is differentiated from other management servers (MS) by two distinct services and a host of distinct workflows that run as a part of the health service on the RMS.
The “SDK Service” (OMSDK): When I hear the term “SDK” I typically think code libraries that I can use to write custom code against. With OpsMgr 2007 the SDK is really two things: 1) A software development kit - http://msdn2.microsoft.com/en-us/library/bb437575.aspx 2) A service running on the RMS, which is the single point of access for SDK connectivity. It is the latter part of that definition that is most relevant when thinking about deployment.
The “Config Service” (OMCFG): In MOM 2005 the centralized configuration store for the management group was the OpsDB, and each MOM 2005 management server queried the OpsDB directly to get its understanding of configuration. This was a fairly costly process that required constant resource overhead on the DB server, which was already busy enough processing operational data. The OpsDB is still the central store of configuration in OpsMgr 2007, but the RMS server has taken on the role of being the single point of access for that configuration data from the DB via the Config Service. All other systems in the management group get their configuration (directly or indirectly) from the RMS.
Workflows under the “Health Service”: A number of distinct workflows, assigned to the RMS by rules in several of the out of the box MPs, are run exclusively by the HealthService on the RMS. Examples of these workflows include “AD assignment rules”, “Notifications”, “Health Watcher Instances” and the “OpsDB partitioning and grooming processes”. In effect, many things that in MOM 2005 were scripts, SQL jobs or functionality written directly into product code now run as rules on the RMS.
The RMS Platform
Now that you have a basic idea of the role an RMS plays in a management group let’s talk a bit about how Microsoft IT deployed this role. Given all the distinct functions the RMS serves, and the scale of IT’s management groups, they opted for the same server platform as their Operational Database (OpsDB) servers:
o Server Model: HP ProLiant DL385 G1
o Processors: 2 x dual core 2.2 GHz AMD Opteron processors (4 procs in the OS’ eyes)
o RAM: 8 GB
o Drives: 2 SAN drives; one for the cluster quorum and the other for storing the various OpsMgr 2007 service state directories that are shared between nodes of the RMS.
o Quorum drive: 2 GB RAID 5. Nothing fancy here; less than 20 MB is actually in use.
o State drive: 10 GB RAID 0+1. Typically less than 3 GB of actual data on this drive, but I/O is high at scale.
o OS: Windows Server 2003 Enterprise x64 Edition with SP1
Using that platform Microsoft IT has seen the average RMS at 38.6% CPU utilization and memory paging of ~86 pages/sec. The state drive is quite busy sustaining an average of ~1200 transfers/sec and an average data rate of 14.89MB per second. In both cases ~95% of the drive activity is writes.
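As a rough sanity check on those state drive figures, the average transfer size and the write load can be derived from the numbers above. This sketch assumes 1 MB = 1024 KB and that the ~95% write share applies to both the transfer count and the data rate, as the paragraph states; the variable names are illustrative only.

```python
transfers_per_sec = 1200      # average transfers/sec on the state drive
data_rate_mb_per_sec = 14.89  # average data rate in MB/sec
write_share = 0.95            # ~95% of the drive activity is writes

# Average size of a single transfer, in KB (assuming 1 MB = 1024 KB).
avg_transfer_kb = data_rate_mb_per_sec * 1024 / transfers_per_sec

# Portion of the load that is writes.
writes_per_sec = transfers_per_sec * write_share
write_mb_per_sec = data_rate_mb_per_sec * write_share

print(f"average transfer size: {avg_transfer_kb:.1f} KB")
print(f"writes per second: {writes_per_sec:.0f}")
print(f"write data rate: {write_mb_per_sec:.1f} MB/sec")
```

The result is a workload of many small writes (roughly 13 KB each), which is worth keeping in mind when choosing the RAID level for the state drive.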
If you take resource utilization down to the level of the OpsMgr 2007 specific processes running on the RMS, the top consumer in IT’s deployments is the config service (Microsoft.MOM.ConfigServiceHost.exe), followed by the Health Service (HealthService.exe) and then the SDK service (Microsoft.MOM.Sdk.ServiceHost.exe) and Monitoring Host (MonitoringHost.exe) processes. The following table shows the average “% Processor Time” and “Private Bytes” for the relevant processes on the RMS:
Process                                % Processor Time (average)   Private (Mega)Bytes (average)
Microsoft.MOM.ConfigServiceHost.exe    18.62375                     3310.138577
HealthService.exe                      12.556125                    2289.074942
Microsoft.MOM.Sdk.ServiceHost.exe      2.438125                     1024.69
MonitoringHost.exe                     0.739875                     1249.211722
Redundancy
Given that the RMS is so vital to the functionality of a management group, IT planned from the earliest design phases to invest in high availability (HA) for this role. With that in mind, IT worked with the OpsMgr product group early on to test the setup and use of clustered RMSs. A clustered RMS consists of a resource group containing a network name, a dedicated IP, 3 shared services (HealthService; Config Service; SDK Service) and a shared drive for holding the central state files used by the shared services (referred to above as the state drive). With clustering of the RMS configured, automated failover can occur during an unexpected outage, and planned failovers can be performed during system upgrades or maintenance work. Microsoft IT’s experiences to date with RMS clustering have been very positive, but the key takeaway from the deployment of both the RMS and the OpsDB is that the monitoring team has built up its knowledge around configuring and working with clusters and clustered resources. The setup process is well documented in the OpsMgr 2007 deployment guide.
Similar to how IT deployed the OpsDB, they rely on a different approach for business continuance/disaster recovery (BC/DR) than they do for HA. To achieve this, Microsoft IT deployed an additional management server in each management group, whose sole purpose is to await the day that a disaster occurs. Here are the general steps that were taken to set up this “RMS standby”:
o Build out the standard management group.
o Install the final management server on the geographically remote server.
o Backup the clustered RMS’ encryption keys with the “SecureStorageBackup.exe” tool from the \SupportTools directory (a one-time deal).
In the future event of a disaster the encryption keys can be restored to the remote management server, the OpsDB failed over, and then the remote management server can be promoted to be the RMS.
The RMS golden rule: Location, location, location!
Choosing the right location (domain and network) for a root management server is very important when designing a deployment. Whether intentionally and proactively, or inadvertently and reactively, the IT monitoring team found themselves thinking a lot about the following features when choosing where they were going to setup their root management servers:
o Mutual authentication: In MOM 2005 it was optional, but in OpsMgr 2007 mutual authentication is required. This means that for every communication channel, both ends need to be able to confirm the identity of the other endpoint. This can be done via Kerberos (domain/forest trusts) or via certificate based authentication. As such, IT intentionally joins every RMS to the Active Directory domain that has the greatest number of two-way trusts with the various domains where agents reside.
o AD assignment rules: One of the top 3 new features of OpsMgr 2007 (in my unofficial opinion) is the fact that agents can now be installed with zero-configuration, and when they start up they can query AD for what management groups, and servers they should be talking to. That configuration information that exists in AD is maintained by rules running on the root management server. If AD assignment rules are to be used then the RMS must be able to communicate with domains that rules will be run against.
o Operations Console/SDK Access: The SDK service running on the RMS is the only point to which users’ consoles can connect. If the SDK service is inaccessible then visibility to the monitoring data being collected by that management group is inaccessible, and in some senses the management group can be considered logically unavailable. As such IT was deliberate to locate their RMS’ in locations that are widely accessible.
o User Roles across tiers: Microsoft IT is deployed in a tiered fashion with three management groups as of the time of this post. Their users access a single management group, and from there they use the “Show Connected Alerts” feature to view alerts from all management groups within a single view. When a user turns on “Show Connected Alerts”, their credentials are passed to the various mid-tier management groups to get the alerts from each. During that authentication process the scope of their user role is applied and the resulting alert set is filtered appropriately. Ensuring that all RMS servers can communicate and authenticate with the same domains allows user roles to be defined once in a central MG and then replicated to the other MGs. This is by no means a requirement, but it can simplify administration.
o The Data Warehouse: While most write activity occurs between the management servers and the data warehouse DB server, the RMS does write some data to the DWH DB as well. This communication needs to be accounted for when designing a deployment. Further details on the ports and protocols required can be found in the Operations Manager 2007 Security Guide.
OpsMgr Community,
If you are receiving “Performance Module could not find a performance counter” alerts in the Operations Manager Console, please perform the following steps to disable the rule via an override.
Note: When you run the Operations Console on a computer that is not a Management Server, the Connect To Server dialog box displays. In the Server name text box, type the name of the Operations Manager 2007 Management Server to which you want the Operations Console to connect.
1. Navigate to the Authoring Space in the Console.
2. Select “Rules” under “Management Pack Objects”.
3. Type “Performance Data Source Module” in the “Look for:” box and click “Find Now”. Be sure that no Scope is set that filters out the “Health Service” target.
4. Find the rule, “Performance Data Source Module could not find a performance counter” under “Type: Health Service (2)”, right-click, select “Overrides”, “Disable the Rule”, “For all objects of type: Health Service”.
5. When prompted, “Are you sure you want to disable this rule for Health Service?” click “Yes”.
Microsoft is working on the long-term solution to address this problem.
Thanks, Justin Incarnato
-------------------------
This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
-------------------------