August, 2011

  • System Center: Operations Manager Engineering Team Blog

    Operations Manager 2012 Beta - Usage Survey - Win an Xbox and Kinect

    Now that you have had a chance to test SCOM 2012, please provide feedback!

    Usage Survey

    Allow about 20 minutes to complete this survey.

    Complete the survey and qualify to win an Xbox & Kinect!

    Sweepstakes Official Rules

  • System Center: Operations Manager Engineering Team Blog

    Application Monitoring – Working with Alerts

    Our team has made a few posts around APM with Operations Manager 2012, how to get things running, how it works, and how to simulate errors for testing. Here I’m going to talk about the application centric alerts you will see in OM when you start using APM.

    There are two main consoles used for working with APM events, the Operations Console and the Application Diagnostics Console.  The Operations Console is where you do your alert triaging and this is the same as working with any other feature of Operations Manager. The Application Diagnostics Console, installed when you install the Web Console, provides deeper diagnostics for the application events that are collected.

    By default, performance thresholds are set quite high, because you want to make sure you find the major bottlenecks before you start looking deeper.  The first time you configure monitoring, use the defaults, then tighten the monitoring as you find and resolve the main issues within the application.  A good target for the performance threshold is a 5-8 second response time for web applications; this is the point at which users typically start to abandon pages because they perceive them as ‘slow’.

    Operations Console

    With the Operations console there are a couple of ways we raise alerts:

    Alerting Rules

    There is a rule for each type of event we alert on: Performance, Connectivity, Security and Application Failure.  We raise an individual alert when those types of events are detected in the monitored application.  These alerts do not affect the health state of the monitored application since a single performance or exception event doesn’t mean your application is unhealthy. 

    These alerts provide a deep dive into the issues that are happening with the monitored application. Performance alerts provide context around the slow calls and which tier is the root cause of the issue. Exception alerts tell you the type of exception raised, where it came from and the call stack that led up to it.  This is the information you need to know so that alerts can be handled correctly: was the root cause a ‘slow query’, ‘connection refused by host’, 'Invalid Logon’, etc. 

    Monitors

    Following the mantra that a single captured event does not make our application unhealthy, we have 3 monitors defined for the applications that monitor performance counters that get registered when the System Center Management APM service is installed:

    1. Average Response Time: monitors the .NET Apps/Avg. Request Time performance counter
    2. % Exception Events: monitors the .NET Apps/% Exception Events/sec performance counter
    3. % Performance Events: monitors the .NET Apps/% Performance Events/sec performance counter

    The % Exception Events and % Performance Events monitors are your indicators that the application’s reliability is on the decline.  If you are getting a high number of exception or performance events, these monitors will let you know and turn your application unhealthy since it’s time to dig into the individual performance and exception events to find the root cause.
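    The difference between a single event and a percentage-based monitor can be sketched in a few lines of Python. This is only an illustration of the idea, not how the monitors are implemented: the IntervalSample shape, the function names and the 15% threshold used in the note below are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntervalSample:
    """Counts for one sampling interval (hypothetical shape, not an OM type)."""
    total_requests: int
    exception_events: int


def percent_exception_events(samples: List[IntervalSample]) -> float:
    """Percentage of requests across all samples that produced an exception event."""
    total = sum(s.total_requests for s in samples)
    if total == 0:
        # No traffic means there is nothing to judge, so report 0%.
        return 0.0
    events = sum(s.exception_events for s in samples)
    return 100.0 * events / total


def health_state(samples: List[IntervalSample], threshold_pct: float) -> str:
    """Turn unhealthy only when the event percentage exceeds the threshold."""
    if percent_exception_events(samples) > threshold_pct:
        return "unhealthy"
    return "healthy"
```

    With an assumed threshold of 15%, one exception in 1000 requests leaves the application healthy, while 50 exceptions in 200 requests (25%) turns it unhealthy. The individual events would still be raised as alerts by the alerting rules either way.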

    Working with Alerts

    Alerts are only raised in the Operations Console, but the underlying events can be accessed through the Operations Console (alerting rules) or Application Diagnostics.

    Moving between consoles

    When server or client performance or exception alerts are raised you are given a description of the problem, KB around what the problem signifies and Alert Context that provides a closer look at the cause of the issue. 

    When working with the Alert Context there is a link in the top left corner that allows you to transition from the Operations Console to Application Diagnostics.  With Application Diagnostics you can dig deeper into the alert and look at not only the current event but also related events, similar events, event chains and a snapshot of the server performance at the time of the event.  These concepts are outlined in more detail in the Operations Guide.

    Controlling Alerts

    Finding that you are getting too many alerts in OM?  You can disable the alerting rules that I outlined above by un-checking Performance and Exception event monitoring in the template.  This will stop the alerts from being raised in OM, but they will continue to be logged to Application Diagnostics.

    The flow for working with alerts changes a bit when you do this.  Now you use the monitors in OM to be notified that a large percentage of performance or exception events is occurring, and you use Application Diagnostics to drill into the problems.  This works well if triaging is done solely by the application team: they can use Application Diagnostics directly, and the Operations team can keep the application-specific alerts out of the Operations Console.  The downside is that you won’t be able to forward the performance and exception alerts through connectors, since they don’t get raised in OM.

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.

     

  • System Center: Operations Manager Engineering Team Blog

    Topology changes in System Center 2012 Operations Manager (Overview)

    OM Community,

    In this blog post, I will explain the changes made to the Operations Manager 2012 infrastructure topology.  The purpose of this post is not to do a deep technical explanation on how some of these new features work but more of an overview around the new changes and how they may affect you.  Over the next few months, we (Operations Manager Team) plan to blog additional technical details.

    First things first (let’s review):

    In previous versions, Operations Manager (2007, 2007 SP1, R2) had a parent-child topology, meaning that in a Management Group a Management Server called the Root Management Server (commonly known as the RMS) acted as a parent to one or more secondary Management Servers or Gateways.  The RMS has many unique responsibilities in the Management Group (see below)

    The RMS provides the following services:

    • Console access
    • Role based access control
    • Distribution of configurations to agents
    • Connectors to other mgmt systems
    • Alert notifications
    • Health aggregation
    • Group Calculations
    • Availability
    • Dependency Monitor
    • DB Grooming
    • Enables model based mgmt

    The RMS also introduces the following customer challenges:

    • Performance and scalability bottleneck
    • Single point of failure (for RMS workloads)
    • High availability requires clustering

    With these kinds of challenges, it was very important for most IT & Operations teams to ensure their RMS was highly available and easily recoverable in case of a disaster.

    This left them with two options:

    1. Cluster the Root Management Server (see picture)
    2. Promote a secondary Management Server to the RMS

    Unfortunately, both of these options created additional complexity and burdens for the IT & Operations teams.  Windows clustering is complex to set up and requires additional shared storage.  Patching a clustered RMS was cumbersome and prone to creating instability in the Management Group.  Promoting a secondary Management Server was a manual process that required the person to run a specialized command line tool, then change multiple configuration files and registry keys on other components like the Reporting Server, Web Console, or other Management Servers.  Depending on the customer’s SLA to the business, they would implement one or both of these solutions to ensure some level of availability in the Management Group.

    Also, this single point of failure in the Management Group created a bottleneck that limits how many Windows Agents, Unix Agents and consoles a single Management Group can support.

    During product planning for OM12, we quickly identified this as one of our highest priorities.  By removing the single point of failure we can provide our customers a much better story around High Availability and lower their costs of maintaining an Operations Manager infrastructure.  Also, we can scale the Management Group out to support new OM12 features like Network Monitoring and Application Monitoring (APM).

    The Major Change (RMS Removal)

    After an in-depth investigation we decided to remove the Root Management Server role from the Operations Manager topology.  As a result, we needed to figure out how to distribute the workloads the RMS performed.  This boils down to three things.

    1. SDK Service - Make sure this service is running on all Management Servers and that any kind of SDK client (Console, Web Console, Connector, PowerShell) can connect to it.
    2. Configuration Service - This service is responsible for detecting and issuing new configuration to all Health Services in the Management Group.  Federate the Configuration Service to each Management Server so they all work together to keep the Management Group up to date.
    3. Health Service - Balance the RMS-specific workloads amongst all Management Servers in the Management Group and make sure the work is redistributed during a failure.

     

    SDK Service

    In OM12, setup sets this service to automatically start on every Management Server during install.  We support any SDK client connecting to any Management Server.  At Beta, you will need to configure NLB on the SDK Service for automatic failover.

    Configuration Service

    In order to federate the Config Service we needed to rewrite it almost completely.  If you remember, in OM 2007 versions the RMS always required a huge amount of memory to function properly.  One of the main reasons for this was the Config Service.  Every time the Config Service starts, it reads the Operational Database and loads its view of the instance space into memory as XML.  In larger Management Groups, this file can easily grow to over 6 GB.  The Config Service uses this file to compare against the Operational Database to detect changes and issue new configuration to Health Services.  Now that every Management Server will have a running, active Configuration Service, it is no longer reasonable to store this in memory.  Moving forward, the Config Service will store this data in a centralized database (Operational DB) that all Config Services in the Management Group keep up to date and use to detect configuration changes to the instance space.  A fantastic benefit that came out of this design is a much faster startup of the Config Service.  Once the database is initially created, on subsequent starts the Config Service does not need to rebuild it from scratch and instead just maintains it.  Therefore, it starts issuing configuration much sooner after a restart.  This is a major improvement over OM 2007 versions, where in a large management group it could take up to an hour to start issuing configuration to agents.
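    To make the change-detection idea a little more concrete, here is a minimal Python sketch of comparing a stored view of the instance space against the current one. It is only a conceptual model: the Snapshot type, the detect_changes function and the idea of one hash string per instance are assumptions for illustration and do not reflect the actual Config Service schema.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: instance id -> hash of that instance's configuration.
Snapshot = Dict[str, str]


def detect_changes(stored: Snapshot, current: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare the persisted view with the current one.

    Returns three sets of instance ids: added, removed and changed.
    """
    stored_ids = set(stored)
    current_ids = set(current)
    added = current_ids - stored_ids
    removed = stored_ids - current_ids
    changed = {i for i in stored_ids & current_ids if stored[i] != current[i]}
    return added, removed, changed
```

    Because the stored view survives a restart, a service that comes back up only has to work through the added, removed and changed sets instead of rebuilding its whole view first, which is the intuition behind the faster startup described above.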

    Health Service (Resource Pools)

    To distribute the RMS-specific workloads to all management servers, we needed to develop a mechanism for each Health Service on a management server to function independently, while still having awareness of the workloads the other management servers are performing.  This helps to ensure we do not get workflow duplication or missed workflows.  To achieve this we added a new feature to OM12 called Resource Pools.  A Resource Pool is a collection of Health Services working together to manage instances assigned to the pool.  Workflows targeted to the instances are loaded by the Health Service in the Resource Pool that ends up managing that instance.  If one of the Health Services in the Resource Pool were to fail, the other Health Services in the pool will pick up the work that the failed member was running.  We also use Resource Pools to bring high availability to other product features like Networking and Unix monitoring.  In a follow-up blog post I will dive into far more detail on how Resource Pools work and how to tell where things are running.
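    The failover behavior can be pictured with a small Python sketch. The round-robin policy and the assign and fail_over function names are assumptions chosen to keep the example short; the post does not describe how the pool actually decides which member manages which instance.

```python
from typing import Dict, List


def assign(instances: List[str], members: List[str]) -> Dict[str, str]:
    """Give every instance exactly one owner (round-robin, an illustrative policy)."""
    if not members:
        raise ValueError("resource pool has no available members")
    ordered = sorted(instances)
    return {inst: members[i % len(members)] for i, inst in enumerate(ordered)}


def fail_over(assignment: Dict[str, str], failed: str, members: List[str]) -> Dict[str, str]:
    """Move only the failed member's instances onto the surviving members."""
    survivors = [m for m in members if m != failed]
    if not survivors:
        raise ValueError("no surviving members to take over the work")
    orphaned = sorted(inst for inst, owner in assignment.items() if owner == failed)
    result = dict(assignment)
    for i, inst in enumerate(orphaned):
        result[inst] = survivors[i % len(survivors)]
    return result
```

    In this model every instance always has exactly one owner, so no workflow is duplicated or missed, and instances owned by healthy members stay where they are when another member fails.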

    To distribute the RMS specific workloads we create three resource pools by default.

    • All Management Servers Resource Pool – We have re-targeted most RMS-specific instances and workflows to this pool.  By default, if an instance does not have a “should manage” relationship set, the Configuration Service will assign it to this pool.

    Update April 2, 2012: Notifications are no longer managed by the All Management Servers Pool.  There is now a dedicated Notifications Resource Pool, which is described below.

    • Notifications Resource Pool – We have re-targeted the Alert Subscription Service instance to this pool.  The reason we did not use the “All Management Servers Resource Pool” was so you can easily remove the management servers from the pool that should not be participating in Notifications.  For example, you may have three management servers but only one SMS modem.  You would remove the other management servers from the pool so the Notifications workflows do not run where no modem is present.
    • AD Assignment Resource Pool – Again we have re-targeted the AD Integration workflows to this pool so you can more easily control the location around where the AD assignment workflows will be running.

    Notice in the screen capture above we have a column called Membership and it is set to “Automatic” for the default pools.  This means all management servers in the management group are automatically a member of these pools.  In order to change this you need to open PowerShell and run a PowerShell command (see below).

    Get-SCOMResourcePool -Name "AD Assignment Resource Pool" | Set-SCOMResourcePool -EnableAutomaticMembership $false

    Now I can right-click the “AD Assignment Resource Pool”, open its properties and modify the management server membership.  Note: New management servers added to the Management Group will no longer be members of this resource pool automatically.

    At this point you may be wondering about workflows targeted to the RMS that are outside of the OpsMgr product group’s control (other management packs from different Microsoft teams or third party vendors).  In order not to break backwards compatibility, and to provide support for legacy management packs, we decided to leave the Root Management Server instance and add a special role called the RMS Emulator to one of the management servers in the Management Group.  The RMS Emulator exists only for backwards compatibility with legacy management packs and is in no way required for the management group to function correctly.

    You can easily tell which management server is the RMS Emulator by opening the Console and navigating to the Management Servers view in the Administration space.  We have added a new column called “RMS Emulator”.  By default the first management server installed in the management group is the RMS Emulator.  When upgrading to OM12, the former RMS is the RMS Emulator.  Note: When upgrading from a secondary management server using the UpgradeManagementGroup switch, the RMS Emulator is the management server you are running the upgrade from.  In a follow-up blog post we will dive into more detail on setup and upgrade changes.

    We have provided PowerShell cmdlets to move the RMS Emulator from one management server to another in case the management server acting as the RMS Emulator were to fail.

    • To identify the current RMS Emulator in PowerShell

    Get-SCOMRMSEmulator

    • Move the RMS Emulator to another Management Server

    – First assign the new RMS Emulator management server to a variable

    $MS = Get-SCOMManagementServer -Name <FQDN of Management Server>

    Set-SCOMRMSEmulator $MS

    • Delete the RMS Emulator

    Remove-SCOMRMSEmulator

    – Type “Y” to approve

    – Run Get-SCOMRMSEmulator to validate it is removed. You should see a message that says the RMS Emulator role was not found.

    • Add the RMS Emulator role to the MG

    – First assign the new RMS Emulator management server to a variable

    $MS = Get-SCOMManagementServer -Name <FQDN of Management Server>

    Set-SCOMRMSEmulator $MS

    – Run Get-SCOMRMSEmulator to verify it has been created

    Design Considerations

    A few things to keep in mind when planning your OM12 Management groups with the topology changes.

    1. Due to the introduction of Resource Pools it is recommended that all management servers have no more than 5ms latency between them.  This means that if you are currently using management servers in multiple datacenters or sites, we recommend you move all management servers to a single data center and use Gateway servers at the other sites.
    2. Moving forward the Product Group recommendation will always be to have two management servers in a Management Group at all times.  By doing this you will always have High Availability for your management group and a much easier recovery during a disaster.

    I hope this post has provided you with a lot of information to get you started on designing an Operations Manager 2012 topology.  The next post in our series will be about the Setup and Upgrade changes.

    Thanks

    Rob Kuehfus | Program Manager | System Center

    Disclaimer

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.

  • System Center: Operations Manager Engineering Team Blog

    Meet Sergey Kanzhelev, developer on the Operations Manager Team

    http://blogs.msdn.com/b/sergkanz/

  • System Center: Operations Manager Engineering Team Blog

    Guidance, Tuning and Known Issues for the Exchange 2010 Management Pack for System Center Operations Manager 2007

    See KB article

    http://support.microsoft.com/kb/2592561

  • System Center: Operations Manager Engineering Team Blog

    Application Monitoring Architecture in OpsMgr 2012 Beta

    For those who already know me, it has been a couple of weeks since I relocated to the Seattle area and started working as a Program Manager on the Operations Manager Application Monitoring team, and this is my first post on this blog. For those who don’t know me, I am a new Program Manager on the OpsMgr team, and I previously supported OpsMgr at Microsoft as a Premier Field Engineer.

    The area of OpsMgr I am working on is Application Monitoring (or “Application Performance Monitoring”, or APM for short) – that is the feature in the product that allows you to monitor .NET Applications and obtain rich insights into their health. Michael has already blogged about how we acquired a company called AVIcode, how this technology is being integrated in OM2012 and how the deployment and configuration are greatly simplified in this release.

    We now have a single agent, a single set of databases, and the only channel used over the network is OM Channel. While Michael has already shown the user experience for this feature, here I want to go a bit deeper and look at the components and architecture “behind the GUI”.

    So, first of all, you will have installed OM12 just like Kevin has been teaching you, right? Here’s a diagram which you might find useful to refer to as I go ahead and explain which new pieces you might see as you explore the system and learn the work that those various pieces do.

    APM Architecture in OM2012

    AGENT Machine

    We now have a single agent package/installer. When we push an agent from a Management Server (or install manually), we are really installing two services now: the “usual” OpsMgr Health Service as well as the new "System Center Management APM" service.

    Anyhow, this new service is installed but left disabled, so it stays “dormant” on most systems (similar to what the “ACS Forwarder” service does) and does nothing until we configure APM. This avoids any unnecessary load on those systems where APM is never going to be used.

    When you configure APM through our Template just like Michael has described for you, what happens behind the scenes is that a Management Pack is created and distributed to the appropriate agents. This MP consists of various things, including configuration for some generic rules and monitors as well as views that are specific to the application being configured. This set of pre-existing, generic rules and monitors will use the configuration to do the following for you (using new write action modules that have been specifically written in order to do this):

    • Write (or update) the right configuration files that the “System Center Management APM” service needs
    • Set up the service for automatic start up, and enable it

    This way, you don’t need to perform any other configuration task, or take care of enabling the service yourself – just running the template wizard takes care of this. Once APM is loaded it uses this configuration to start monitoring.

    APM.Agent

    So let’s say that you have enabled monitoring for your web application. The application itself (running inside a W3WP.exe, in IIS7) gets instrumented to load our “APM Agent” code.

    In order for this to happen and depending on the configuration, you might need to restart IIS or recycle a specific application pool. This is of course something that can’t and won’t be done automatically – the Operations Team and the Application Owner should always be planning a maintenance window to do this. Anyway, to simplify the process, we’ll raise an Alert telling you that either of these actions is necessary, and the knowledge base in the Alert will provide a link to a Task to perform the IIS Reset or the App Pool Recycle.

    IIS Application Pools recycle is required Alert

     

    APM Agent produces a couple of things:

    • “Events” (“APM Events” in my diagram) that report about:
      • Application Exception Events (handled or unhandled by your code) that we are detecting
      • Performance Events – method calls in your monitored application that exceed the specified thresholds
      Both of the above types of events will also contain a snapshot of what the machine’s performance looked like around the time of the event, and 15 minutes earlier (we keep watching a few key counters in a sliding window so that when we generate an event, such a snapshot of the performance of the machine around that time is ready and can be quickly attached to the event)
    • “.NET Apps” Performance Counters presenting numerical information about exceptions and performance events as they are occurring
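    The sliding window of counters mentioned above can be illustrated with a short Python sketch. It is a simplified model rather than the agent's real code: the CounterWindow class, the make_event helper, the counter name in the note below and the 900-second (15 minute) window are assumptions used for the example.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Sample = Tuple[float, Dict[str, float]]  # (timestamp in seconds, counter values)


class CounterWindow:
    """Keeps only the counter samples that fall inside a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._samples: Deque[Sample] = deque()

    def record(self, timestamp: float, counters: Dict[str, float]) -> None:
        """Add a sample and drop everything older than the window."""
        self._samples.append((timestamp, dict(counters)))
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def snapshot(self) -> List[Sample]:
        """Copy of the samples currently in the window, oldest first."""
        return list(self._samples)


def make_event(kind: str, timestamp: float, window: CounterWindow) -> Dict[str, object]:
    """Build an event with the ready-made performance snapshot attached."""
    return {
        "kind": kind,
        "timestamp": timestamp,
        "performance_snapshot": window.snapshot(),
    }
```

    For example, with CounterWindow(900) and samples of a made-up "cpu" counter recorded at 0, 600 and 1000 seconds, the sample from 0 seconds has already aged out by the time an event is generated at 1000 seconds, so the attached snapshot holds the two most recent samples. Because old samples are trimmed as new ones arrive, the snapshot is available immediately when an event occurs.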

    In case we have also enabled the Client Monitoring feature, as a result of the added instrumentation we will also add some JavaScript into the pages returned to our real end users. This is shown in the diagram as “CSM”, and it is what allows returning information around the load times and exceptions being raised in the browser, as opposed to the server side. This is what enables a deep understanding of the end to end user experience, and breaking that down to the client, network and server side, as shown in the chart below:

    Measuring User Experience in the Browser

     

    MANAGEMENT SERVERS

    Once the data is received, we use new Write Action Modules that have been written to allow the new data types to be inserted in the database, synchronized across OpsDB and DW, and groomed when necessary. As expected, the user can control data retention, grooming and frequency for these processes.

    DATABASES

    We only have our “familiar” OpsMgr databases: OpsDB and DW – all of the information previously stored by AVIcode in separate databases is now consolidated within the OpsMgr databases. This means we have a bunch of new tables in both OpsDB and DW, as well as some new synchronization and grooming mechanisms. As expected, the user can control data retention, grooming and frequency for these processes.

    UI / CONSOLE

    “Application Diagnostics” and “Application Advisor” consoles are now installed together with OpsMgr WebConsole. Why would I use Advisor and Diagnostics as opposed to OpsMgr Console, and what is the need for new consoles?

    • Application Diagnostics organizes and links events across application components
    • Application Advisor provides rich, detailed reports highlighting the top issues within your applications and environment as a whole.

    Although your mileage may vary, we found that most of the time Developers may not install the Operations Console, and the Operations people might not need to delve into each and every occurrence of an Exception that happened within an application’s code. Because Application Diagnostics and Advisor are web interfaces, Developers can be given access to look directly at what they care most about, without completely entering the realm of Operations and without having to install a separate console.

    Disclaimer

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.

  • System Center: Operations Manager Engineering Team Blog

    Troubleshooting Network Discovery in SCOM 2012 by Stefan Koell (MVP)

    The following article applies to SCOM 2012 BETA and may or may not apply to RC or RTM release. I’ll try to repro the issue in the upcoming releases to see if the behavior changed and provide updates if necessary.

    I guess everyone is testing SCOM 2012 Beta right now and a lot of people are already blogging about their experience. I thought it’s time to do the same and share some experience I had with network discovery.

    http://www.code4ward.net/main/Blog/tabid/70/EntryId/105/Troubleshooting-Network-Discovery-in-SCOM-2012.aspx

     

  • System Center: Operations Manager Engineering Team Blog

    SCOM 2007 R2 and SP1 now support SQL Server 2005 SP4

    OM Community,

    System Center Operations Manager 2007 SP1 and System Center Operations Manager 2007 R2 now support SQL Server 2005 SP4.  Note: We will have the Supported Configuration and a KB article posted in the next few weeks to make this more official, but feel free to go ahead and install it.

    For the most part, nothing special needs to be done when installing SP4 on the Operational, Data Warehouse, and Audit Collection databases.  But for the Operations Reporting Role you will need to do the following additional steps to complete the SP4 installation.

    1. Open Internet Information Services (IIS) Manager (not 6.0) – found under Administrative Tools from Start menu
    Within IIS Manager:

    Expand local machine connection to see App Pools and Sites

    Select Application Pools

    Find the app pool created by the Reporting Server installation, which has the Identity column’s value set to the domain account used for the DW Reader account.

    Select that app pool and right click, selecting “Advanced Settings” from the context menu

    Under the “Process Model” section, change the value for “Identity” from the domain account to “NetworkService”

    Click “OK” to close the Advanced Settings dialog and save the changes

    With that app pool still selected, click “Recycle” under the “Application Pool Tasks” section of the Actions area to the right

    2. Run SQL2005 SP4 – it should now complete successfully
    NOTE: At this point, if the Console were opened, Reporting would fail to load


    3. Within IIS Manager, reverse the previous process:

    Expand local machine connection to see App Pools and Sites

    Select Application Pools

    Find the app pool created by the Reporting Server installation, which has the Identity column’s value set to “NetworkService”

    Select that app pool and right click, selecting “Advanced Settings” from the context menu

    Under the “Process Model” section, change the value for “Identity” from “NetworkService” back to the original domain account

    Click “OK” to close the Advanced Settings dialog and save the changes

    With that app pool still selected, click “Recycle” under the “Application Pool Tasks” section of the Actions area to the right

    4. Open the Console and Reporting should load successfully


    5. Verify that Reports work as expected

    Thanks!

    Rob Kuehfus | Program Manager | System Center Operations Manager

  • System Center: Operations Manager Engineering Team Blog

    Graham Davies - MVP

    http://www.systemcentersolutions.com/blog/

    I am a Microsoft System Center Operations Manager MVP and work for AKCSL, a Microsoft Gold Partner in the UK.

    I’ve been working with Enterprise Management Systems since 1999, when I joined NetIQ to do implementation and training for their AppManager product in Enterprise Accounts throughout Europe before moving on to work with Operations Manager.

    In between bouts of walking, sailing and photography I have even been known to do some work. I’ve been “in IT” for nearly 15 years and these days specialise in designing, implementing and customising solutions that leverage the Microsoft System Center suite. I’ve been using Operations Manager since 1999 when it was a Mission Critical Software \ NetIQ product and have enjoyed watching it evolve over the years into the most popular Windows monitoring solution on the market.

    If you’d like more information on any of the System Center Products or demonstrations of functionality then please feel free to contact me.

  • System Center: Operations Manager Engineering Team Blog

    Cumulative Update 5 for OpsMgr 2007 R2 is now available

    The SCOM team is very happy to announce the release of Cumulative Update 5 for System Center Operations Manager 2007 R2.

     Cumulative Update 5 for Operations Manager 2007 R2 resolves the following issues:

    • Restart of non-Operations Manager services when the agent is updated.
    • Updated ACS reports.
    • TCP Port Probe incorrectly reports negative ping latency.
    • MissingEvent Manual Reset Monitor does not work as expected.
    • Drill-through fails due to rsParameterTypeMismatch in the EnterpriseManagementChartControl.
    • ACS - Event log message is truncated or corrupted in SCDW.
    • UI hang caused by SDK locking.
    • ACS Filter fails for certain wildcard queries.
    • Edit Schedule button is disabled with SQL Server 2008 R2.
    • Web console is timing out while opening the left navigation tree.
    • Scheduled Reports view for Windows Server 2003 and SQL SRS 2005 SP3 CU9 - returns System.IndexOutOfRangeException: Index was outside the bounds of the array.
    • Signed MPs cannot be imported when new attributes are added to existing classes.

    Cross Platform Cumulative Update 5 for Operations Manager 2007 R2 resolves the following issues:

    • Performance data for LVM managed partitions is not available
    • Process monitor does not retain name if run via symbolic link
    • AIX with large number of processes crashes with bad alloc

    Cross Platform Cumulative Update 5 for Operations Manager 2007 R2 adds the following features:

    • Support for Red Hat 6

    For additional information about this release, please see the CU5 KB article on TechNet.

     
