Our team has made a few posts around APM with Operations Manager 2012: how to get things running, how it works, and how to simulate errors for testing. Here I’m going to talk about the application-centric alerts you will see in OM when you start using APM.
There are two main consoles used for working with APM events: the Operations Console and the Application Diagnostics Console. The Operations Console is where you do your alert triaging, and this is the same as working with any other feature of Operations Manager. The Application Diagnostics Console, installed when you install the Web Console, provides deeper diagnostics for the application events that are collected.
By default, performance thresholds are set quite high; this is because you want to make sure you find the major bottlenecks before you start looking deeper. The first time you configure monitoring, use the defaults and then work on tightening the monitoring as you find and resolve the main issues within the application. A good target for the performance threshold is a 5-8 second response time for web applications. This is the threshold at which users typically start to abandon pages because they perceive them as ‘slow’.
With the Operations console there are a couple of ways we raise alerts:
Alerting Rules
There is a rule for each type of event we alert on: Performance, Connectivity, Security and Application Failure. We raise an individual alert when those types of events are detected in the monitored application. These alerts do not affect the health state of the monitored application since a single performance or exception event doesn’t mean your application is unhealthy.
These alerts provide a deep dive into the issues that are happening with the monitored application. Performance alerts provide context around the slow calls and which tier is the root cause of the issue. Exception alerts tell you the type of exception raised, where it came from and the call stack that led up to it. This is the information you need to know so that alerts can be handled correctly: was the root cause a ‘slow query’, ‘connection refused by host’, ‘Invalid Logon’, etc.
Monitors
Following the mantra that a single captured event does not make our application unhealthy, we have 3 monitors defined for the applications that monitor performance counters that get registered when the System Center Management APM service is installed:
The % Exception Events and % Performance Events monitors are your indicators that the application’s reliability is on the decline. If you are getting a high number of exception or performance events, these monitors will let you know and turn your application unhealthy since it’s time to dig into the individual performance and exception events to find the root cause.
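The idea behind these percentage-based monitors can be illustrated with a minimal sketch. It assumes the monitor compares the share of requests that produced an exception or performance event against two thresholds; the `IntervalCounts` shape and the 10% / 25% thresholds are hypothetical values chosen for illustration, not product defaults.

```python
from dataclasses import dataclass


@dataclass
class IntervalCounts:
    """Request counts for one sampling interval (hypothetical shape)."""
    total_requests: int
    exception_events: int
    performance_events: int


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole; an idle interval counts as 0%."""
    return 0.0 if whole == 0 else 100.0 * part / whole


def health_state(counts: IntervalCounts,
                 warning_pct: float = 10.0,
                 critical_pct: float = 25.0) -> str:
    """Map the worse of the two event ratios to a health state.

    The thresholds are illustrative assumptions, not product defaults.
    """
    worst = max(percent(counts.exception_events, counts.total_requests),
                percent(counts.performance_events, counts.total_requests))
    if worst >= critical_pct:
        return "Critical"
    if worst >= warning_pct:
        return "Warning"
    return "Healthy"
```

In this sketch a single exception in a thousand requests leaves the application healthy, which matches the point that one captured event on its own should not change the health state.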
Alerts are only raised in the Operations Console, but the underlying events can be accessed through the Operations Console (alerting rules) or Application Diagnostics.
Moving between consoles
When server or client performance or exception alerts are raised you are given a description of the problem, KB around what the problem signifies and Alert Context that provides a closer look at the cause of the issue.
When working with the Alert Context there is a link in the top left corner that allows you to transition from the Operations Console to Application Diagnostics. With Application Diagnostics you can dig deeper into the alert and look at not only the current event but also related events, similar events, event chains and a snapshot of the server performance at the time of the event. These concepts are outlined in more detail in the Operations Guide.
Controlling Alerts
Finding that you are getting too many alerts in OM? You can disable the alerting rules that I outlined above by un-checking Performance and Exception event monitoring in the template. This will stop the alerts from being raised in OM, but they will continue to be logged to Application Diagnostics.
The flow for working with alerts changes a bit when you do this. Now you use the monitors in OM to be notified that a large percentage of performance or exception events is occurring, and you use Application Diagnostics to drill into the problems. This works well if triaging is done solely by the application team: they can use Application Diagnostics directly, and the Operations team can keep the application-specific alerts out of the Operations Console. The downside is that you won’t be able to forward the performance and exception alerts through connectors since they don’t get raised in OM.
This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities is subject to the terms specified at http://www.microsoft.com/info/copyright.htm.
OM Community,
In this blog post, I will explain the changes made to the Operations Manager 2012 infrastructure topology. The purpose of this post is not to do a deep technical explanation on how some of these new features work but more of an overview around the new changes and how they may affect you. Over the next few months, we (Operations Manager Team) plan to blog additional technical details.
In previous versions, Operations Manager (2007, 2007 SP1, R2) had a parent-child topology, meaning that in a Management Group a Management Server called the Root Management Server (commonly known as the RMS) acted as a parent to one or more secondary Management Servers or Gateways. The RMS had many unique responsibilities in the Management Group (see below).
The RMS provides the following services:
The RMS also introduces the following customer challenges:
With these kinds of challenges, it was very important for most IT & Operations teams to ensure their RMS was highly available and easily recoverable in case of a disaster.
This left them with two options:
Unfortunately, both of these options created additional complexity and burdens for the IT & Operations teams. Windows clustering is complex to set up and requires additional shared storage. Patching a clustered RMS was cumbersome and prone to creating instability in the Management Group. Promoting a secondary Management Server was a manual process that required the person to run a specialized command line tool and then change multiple configuration files and registry keys on the other components like Reporting Server, Web Console, or other Management Servers. Depending on the customer’s SLA to the business, they would implement one or both of these solutions to ensure some level of availability in the Management Group.
Also, this single point of failure in the Management Group created a bottleneck that limits the scale-out numbers for how many Windows Agents, Unix Agents and consoles a single Management Group can support.
During product planning for OM12, we quickly identified this as one of our highest priorities. By removing the single point of failure we can provide our customers a much better story around High Availability and lower their costs of maintaining an Operations Manager infrastructure. Also, we can scale the Management Group out to support new OM12 features like Network Monitoring and Application Monitoring (APM).
After an in-depth investigation we decided to remove the Root Management Server role from an Operations Manager topology. As a result, we needed to figure out how to distribute the workloads the RMS performed. This boils down to three things.
In OM12, setup sets the SDK Service to start automatically on every Management Server during install. We support any SDK client connecting to any Management Server. At Beta, you will need to configure NLB on the SDK Service for automatic failover.
In order to federate the Config Service we needed to rewrite it almost completely. If you remember, in OM 2007 versions the RMS always required a huge amount of memory to function properly. One of the main reasons for this was the Config Service. Every time the Config Service starts, it reads the Operational Database and loads its view of the instance space into memory as XML. In larger Management Groups, this file can easily grow to over 6 GB. The Config Service uses this file to compare against the Operational Database to detect changes and issue new configuration to Health Services. Now that every Management Server will have a running, active Config Service, it is not reasonable to store this in memory any longer. Moving forward, the Config Service will store this data in a centralized database (Operational db) that all Config Services in the Management Group help keep up to date and use to detect configuration changes to the instance space. A fantastic benefit that came out of this design is a much faster startup of the Config Service. Once the database is initially created, on subsequent starts the Config Service does not need to rebuild this database from scratch and instead just maintains it. Therefore, it starts issuing configuration much sooner after a restart. This is a major improvement over OM 2007 versions, where in a large management group it could take up to an hour to start issuing configuration to agents.
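The change-detection step described above (compare a stored view of the instance space against the current state, then issue configuration only for what differs) can be illustrated with a minimal sketch. The dictionary shape, the `ConfigDelta` type and the function name are hypothetical; the actual Config Service schema is not described in this post.

```python
from typing import Dict, NamedTuple, Set


class ConfigDelta(NamedTuple):
    """Instance IDs that need new configuration issued (hypothetical type)."""
    added: Set[str]
    removed: Set[str]
    changed: Set[str]


def diff_instance_space(snapshot: Dict[str, str],
                        current: Dict[str, str]) -> ConfigDelta:
    """Compare a persisted snapshot with the current state.

    Both dictionaries map an instance ID to a value standing in for a hash
    of that instance's configuration (an assumption made for illustration).
    """
    snap_ids, cur_ids = set(snapshot), set(current)
    changed = {i for i in snap_ids & cur_ids if snapshot[i] != current[i]}
    return ConfigDelta(added=cur_ids - snap_ids,
                       removed=snap_ids - cur_ids,
                       changed=changed)
```

Because the snapshot persists between restarts, a service built this way only has to compute the delta on startup rather than rebuild its whole view, which is the reason given above for the faster startup.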
To distribute the RMS specific workloads to all management servers, we needed to develop a mechanism for each Health Service on the management server to function independently, while still having awareness of the workloads the other management servers are performing. This helps to ensure we do not get workflow duplication or missed workflows. To achieve this we added a new feature to OM12 called Resource Pool. Resource Pools are a collection of Health Services working together to manage instances assigned to the pool. Workflows targeted to the instances are loaded by the Health Service in the Resource Pool that ends up managing that instance. If one of the Health Services in the Resource pool were to fail, the other Health Services in the pool will pick up the work that the failed member was running. We also use Resource Pools to bring high availability to other product features like Networking and Unix monitoring. In a follow up blog post I will dive into far more detail on how resource pools work and how to tell where things are running.
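The failover behavior described above can be sketched as follows. This is a simplified model, assuming round-robin placement; the post does not describe the real distribution algorithm, and the function and server names are hypothetical.

```python
from typing import Dict, List


def assign_instances(instances: List[str], members: List[str]) -> Dict[str, str]:
    """Spread managed instances across pool members round-robin.

    Round-robin is an assumption for illustration; each instance ends up
    with exactly one owner, so no workflow is duplicated or missed.
    """
    if not members:
        raise ValueError("resource pool has no available members")
    ordered = sorted(members)
    return {inst: ordered[i % len(ordered)]
            for i, inst in enumerate(sorted(instances))}


def fail_over(assignment: Dict[str, str], failed: str,
              members: List[str]) -> Dict[str, str]:
    """Move only the failed member's instances to the surviving members."""
    survivors = sorted(m for m in members if m != failed)
    if not survivors:
        raise ValueError("no surviving members in the resource pool")
    result = dict(assignment)
    orphaned = sorted(i for i, owner in assignment.items() if owner == failed)
    for n, inst in enumerate(orphaned):
        result[inst] = survivors[n % len(survivors)]
    return result
```

For example, with instances `a` through `d` spread over `ms1` and `ms2`, a failure of `ms1` moves `a` and `c` to `ms2` while `b` and `d` stay where they were.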
To distribute the RMS specific workloads we create three resource pools by default.
Update April 2, 2012: Notifications are no longer managed by the All Management Servers Pool. There is a dedicated Notifications Resource Pool now which is described below.
Notice in the screen capture above we have a column called Membership and it is set to “Automatic” for the default pools. This means all management servers in the management group are automatically a member of these pools. In order to change this you need to open PowerShell and run a PowerShell command (see below).
Get-SCOMResourcePool -Name "AD Assignment Resource Pool" | Set-SCOMResourcePool -EnableAutomaticMembership $false
Now I can right click properties of the “AD Assignment Resource Pool” and modify the management server membership. Note: New management servers added to the Management Group will no longer be members of this resource pool automatically.
At this point you may be wondering about workflows targeted to the RMS that are outside of the OpsMgr product group’s control (other management packs from different Microsoft teams or third party vendors). In order not to break backwards compatibility, and to provide support for legacy management packs, we decided to leave the Root Management Server instance and add a special role to one of the management servers in the Management Group called the RMS Emulator. This RMS Emulator is only for backwards compatibility with legacy management packs and is in no way required for the management group to function correctly.
You can easily tell which management server is the RMS Emulator by opening the Console and navigating to the Management Servers view in the Administration space. We have added a new column called “RMS Emulator”. By default the first management server installed in the management group is the RMS Emulator. When upgrading to OM12 the former RMS is the RMS Emulator. Note: When upgrading from a secondary management server using the UpgradeManagementGroup switch, the RMS Emulator is the management server you are running the upgrade from. In a follow-up blog post we will dive into more detail on setup and upgrade changes.
We have provided PowerShell cmdlets to move the RMS Emulator from one management server to another in case the management server acting as the RMS Emulator were to fail.
• To identify the current RMS Emulator in PowerShell
Get-SCOMRMSEmulator
• Move to another Management Server
– First assign the new RMS Emulator management server to a variable
$MS = Get-SCOMManagementServer -Name <FQDN of Management Server>
Set-SCOMRMSEmulator $MS
• Delete the RMS Emulator
Remove-SCOMRMSEmulator
– Type “Y” to approve
– Run Get-SCOMRMSEmulator to validate it is removed. You should see a message that says the RMS Emulator Role not found.
• Add RMS Emulator role to the MG
– Run “Get-SCOMRMSEmulator” to verify it’s been created
A few things to keep in mind when planning your OM12 Management groups with the topology changes.
I hope this post has provided you with a lot of information to get you started on designing an Operations Manager 2012 topology. The next post in our series will be about the Setup and Upgrade changes.
Thanks
Rob Kuehfus | Program Manager | System Center
http://blogs.msdn.com/b/sergkanz/
See KB article
http://support.microsoft.com/kb/2592561
For those who already know me, it has been a couple of weeks since I relocated to the Seattle area and started working as a Program Manager on the Operations Manager Application Monitoring team, and this is my first post on this blog. For those who don’t know me, I am a new Program Manager on the OpsMgr team, and I previously worked at Microsoft supporting OpsMgr as a Premier Field Engineer.
The area of OpsMgr I am working on is Application Monitoring (or “Application Performance Monitoring”, or APM for short), the feature in the product that lets you monitor .NET applications and obtain rich insights into their health. Michael has already blogged about how we acquired a company called AVIcode, how this technology is being integrated in OM2012 and how the deployment and configuration are greatly simplified in this release.
We now have a single agent, a single set of databases, and the only channel used over the network is OM Channel. While Michael has already shown the user experience for this feature, here I want to go a bit deeper and look at the components and architecture “behind the GUI”.
So, first of all, you will have installed OM12 just like Kevin has been teaching you, right? Here’s a diagram which you might find useful to refer to as I go ahead and explain which new pieces you might see as you explore the system and learn the work that those various pieces do.
AGENT Machine
We now have a single agent package/installer. When we push an agent from a Management Server (or install manually), we are really installing two services now: the “usual” OpsMgr Health Service as well as the new "System Center Management APM" service.
Anyhow, this new service is installed but left disabled, so it stays “dormant” on most systems (much as the “ACS Forwarder” service does) and does nothing until we configure APM. This avoids any unnecessary load on systems where APM is never going to be used.
When you configure APM through our template, just as Michael has described, what happens behind the scenes is that a Management Pack is created and distributed to the appropriate agents. This MP consists of various things, including configuration for some generic rules and monitors as well as views that are specific to the application being configured. This set of pre-existing, generic rules and monitors will use the configuration to do the following for you (using new write action modules that have been specifically written in order to do this):
This way, you don’t need to perform any other configuration task, or take care of enabling the service yourself – just running the template wizard takes care of this. Once APM is loaded it uses this configuration to start monitoring.
So let’s say that you have enabled monitoring for your web application. The application itself (running inside a W3WP.exe, in IIS7) gets instrumented to load our “APM Agent” code.
In order for this to happen and depending on the configuration, you might need to restart IIS or recycle a specific application pool. This is of course something that can’t and won’t be done automatically – the Operations Team and the Application Owner should always be planning a maintenance window to do this. Anyway, to simplify the process, we’ll raise an Alert telling you that either of these actions is necessary, and the knowledge base in the Alert will provide a link to a Task to perform the IIS Reset or the App Pool Recycle.
APM Agent produces a couple of things:
In case we have also enabled the Client Monitoring feature, as a result of the added instrumentation we will also add some JavaScript into the pages returned to our real end users. This is shown in the diagram as “CSM”, and it is what allows returning information around the load times and exceptions being raised in the browser, as opposed to the server side. This is what enables a deep understanding of the end to end user experience, and breaking that down to the client, network and server side, as shown in the chart below:
Once the data is received, we use new Write Action Modules that have been written to allow the new data types to be inserted in the database, synchronized across OpsDB and DW, and groomed when necessary. As expected, the user can control data retention, grooming and frequency for these processes.
We only have our “familiar” OpsMgr databases: OpsDB and DW. All of the information previously stored by AVIcode in separate databases is now consolidated within the OpsMgr databases. This means we have a bunch of new tables in both OpsDB and DW, as well as some new synchronization and grooming mechanisms.
“Application Diagnostics” and “Application Advisor” consoles are now installed together with OpsMgr WebConsole. Why would I use Advisor and Diagnostics as opposed to OpsMgr Console, and what is the need for new consoles?
Although your mileage may vary, we found that most of the time developers may not install the Operations Console, and the Operations people might not need to delve into each and every occurrence of an exception that happened within an application’s code. Because Application Diagnostics and Advisor are web interfaces, developers can be given access to look directly at what they care most about, without completely entering the realm of Operations and without having to install a separate console.
The following article applies to SCOM 2012 BETA and may or may not apply to RC or RTM release. I’ll try to repro the issue in the upcoming releases to see if the behavior changed and provide updates if necessary.
I guess everyone is testing SCOM 2012 Beta right now and a lot of people are already blogging about their experience. I thought it’s time to do the same and share some experience I had with network discovery.
http://www.code4ward.net/main/Blog/tabid/70/EntryId/105/Troubleshooting-Network-Discovery-in-SCOM-2012.aspx
System Center Operations Manager 2007 SP1 and System Center Operations Manager 2007 R2 now support SQL Server 2005 SP4. Note: We will have the Supported Configuration and a KB article posted in the next few weeks to make this more official, but feel free to go ahead and install it.
For the most part, nothing special needs to be done when installing it on the Operational, Data Warehouse, and Audit Collection databases. But for the Operations Reporting Role you will need to do the following additional steps to complete the SP4 installation.
1. Open Internet Information Services (IIS) Manager (not 6.0) – found under Administrative Tools from the Start menu
Within IIS Manager:
Expand local machine connection to see App Pools and Sites
Select Application Pools
Find the app pool created by the Reporting Server installation, which has the Identity column’s value set to the domain account used for the DW Reader account.
Select that app pool and right click, selecting “Advanced Settings” from the context menu
Under the “Process Model” section, change the value for “Identity” from the domain account to “NetworkService”
Click “OK” to close the Advanced Settings dialog and save the changes
With that app pool still selected, click “Recycle” under the “Application Pool Tasks” section of the Actions area to the right
2. Run SQL2005 SP4 – it should now complete successfully. NOTE: At this point, if the Console were opened, Reporting would fail to load.
3. Within IIS Manager, reverse the previous process:
Find the app pool created by the Reporting Server installation, which has the Identity column’s value set to “NetworkService”
Under the “Process Model” section, change the value for “Identity” from “NetworkService” back to the original domain account
4. Open the Console and Reporting should load successfully
5. Verify that Reports work as expected
Thanks!
Rob Kuehfus | Program Manager | System Center Operations Manager
http://www.systemcentersolutions.com/blog/
I am a Microsoft System Center Operations Manager MVP and work for AKCSL, a Microsoft Gold Partner in the UK.
I’ve been working with Enterprise Management Systems since 1999, when I joined NetIQ to do implementation and training for their AppManager product in Enterprise Accounts throughout Europe before moving on to work with Operations Manager.
In between bouts of walking, sailing and photography I have even been known to do some work. I’ve been “in IT” for nearly 15 years and these days specialise in designing, implementing and customising solutions that leverage the Microsoft System Center suite. I’ve been using Operations Manager since 1999 when it was a Mission Critical Software \ NetIQ product and have enjoyed watching it evolve over the years into the most popular Windows monitoring solution on the market.
If you’d like more information on any of the System Center Products or demonstrations of functionality then please feel free to contact me.
The SCOM team is very happy to announce the release of Cumulative Update 5 for System Center Operations Manager 2007 R2.
Cumulative Update 5 for Operations Manager 2007 R2 resolves the following issues:
Cross Platform Cumulative Update 5 for Operations Manager 2007 R2 resolves the following issues:
Cross Platform Cumulative Update 5 for Operations Manager 2007 R2 adds the following features:
For additional information about this release, please see the CU5 KB article on TechNet.