• Tweaking SCOM 2012 Management Servers for large environments

     

    There are many articles on tweaking certain registry settings for SCOM agents, Gateways, and Management servers, for many reasons:  large deployments, custom 3rd party MP’s, and monitoring Exchange 2010, to name a few.  Matt Goedtel has a good list on his blog:  http://blogs.technet.com/b/mgoedtel/archive/2010/08/24/performance-optimizations-for-operations-manager-2007-r2.aspx

     

    The default settings in SCOM 2012 work for MOST environments, out of the box.  It is fairly rare to have to change these settings, and you should only do so once you understand each setting and why you’d be adjusting it.

     

    Below – I’d like to post some settings that I change on Management Servers, when monitoring very large environments.  What does “very large” mean?  Well, I’d characterize that as a management group with a very large agent count (>5000), or a very large instance space (lots of Management Packs deployed, both MS and 3rd party, and custom MP’s which don’t always behave well).  Perhaps you have a very large number of groups, or groups with complex expressions.  It could be that you are monitoring a large number of “agentless” items, such as Linux servers, or Network Devices, or URLs, etc.

    I stress – these settings are NOT designed to be changed for all SCOM deployments.  These will not make your SCOM deployment “run better” or “faster”.  These are simply commonly required changes for large scale deployments under specific scenarios.

     

    All management servers that host a large number of agentless objects, which results in the MS running a large number of workflows (network/URL/Linux/3rd party/VEEAM):  This is an ESE DB setting which controls how often ESE writes to disk.  A larger value will decrease disk IO caused by the SCOM healthservice, but increase ESE recovery time in the case of a healthservice crash.
    Key: 
        HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
    REG_DWORD Decimal Value: 
        Persistence Checkpoint Depth Maximum = 104857600
    SCOM 2012 default existing registry value = 20971520

    All management servers in a large management group:  This sets the maximum size of the healthservice internal state queue.  It should be equal to or larger than the number of monitor based workflows running in a healthservice.  Too small a value, or too many workflows, will cause state change loss.  http://blogs.msdn.com/b/rslaten/archive/2008/08/27/event-5206.aspx
    Key: 
        HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
    REG_DWORD Decimal Value: 
        State Queue Items = 20480
    SCOM 2012 default existing registry value: not present.  Value must be created.  Default code value = 10240

    All management servers, that participate in any resource pools, that run a large number of workflows:
    Key:
        HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\
    REG_DWORD Decimal Value: 
        PoolLeaseRequestPeriodSeconds = 600
        PoolNetworkLatencySeconds = 120
    SCOM 2012 existing registry value:  not present (must create PoolManager key and both values)  Default code values = 120 seconds (PoolLeaseRequestPeriodSeconds) and 30 seconds (PoolNetworkLatencySeconds)

    All management servers that participate in the All Management Servers resource pool, that have a large agent count or large number of groups:  This setting will slow down how often group calculation runs to find changes in group memberships.  Group calculation can be very expensive, especially with a large number of groups, large agent count, or complex group membership expressions.  Slowing this down will help keep groupcalc from consuming all the healthservice and database I/O.
    Key: 
        HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\
    REG_DWORD Decimal Value: 
        GroupCalcPollingIntervalMilliseconds = 900000
    SCOM 2012 existing registry value:  not present (must create value).  Default code value = 30000 (30 seconds)

    All management servers in a management group:  This helps with dataset maintenance, as the default timeout of 10 minutes is often too short.  Setting this to a longer value helps reduce the 31552 events you might see with standard database maintenance.  This is a very common issue.   http://blogs.technet.com/b/kevinholman/archive/2010/08/30/the-31552-event-or-why-is-my-data-warehouse-server-consuming-so-much-cpu.aspx
    Key:
        HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
    REG_DWORD Decimal Value:
        Command Timeout Seconds = 1200
    SCOM 2012 existing registry value: not present (must create "Data Warehouse" key and value)  Default in code value = 600 (10 minutes)

    All management servers in ANY management group.  This setting configures the SDK service to attempt a reconnection to SQL server upon disconnection, on a regular basis.  Without these settings, an extended SQL outage can cause a management server to never reconnect back to SQL when SQL comes back online after an outage.   Per:  http://support.microsoft.com/kb/2913046/en-us  All management servers in a management group should get the following:
    Key:
        HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\
    REG_DWORD Decimal Value:
        DALInitiateClearPool = 1
        DALInitiateClearPoolSeconds = 60
    SCOM 2012 existing registry value:   not present - code default - 30 seconds?
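
    The raw decimal values above are easier to sanity-check once they are converted to friendlier units.  The short Python sketch below only does that arithmetic.  It assumes the checkpoint depth is expressed in bytes, which this article does not state, and the helper names are made up for illustration.

```python
# Unit conversions for the recommended values listed above.
# Assumption: "Persistence Checkpoint Depth Maximum" is a byte count.

def bytes_to_mb(value):
    """Convert a byte count to mebibytes."""
    return value / (1024 * 1024)

def ms_to_minutes(value):
    """Convert milliseconds to minutes."""
    return value / 1000 / 60

def seconds_to_minutes(value):
    """Convert seconds to minutes."""
    return value / 60

if __name__ == "__main__":
    print("Checkpoint depth:", bytes_to_mb(104857600), "MB (default", bytes_to_mb(20971520), "MB)")
    print("GroupCalc interval:", ms_to_minutes(900000), "min (default", ms_to_minutes(30000), "min)")
    print("DW command timeout:", seconds_to_minutes(1200), "min")
    print("Pool lease request period:", seconds_to_minutes(600), "min")
```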

    To summarize:

    Registry Key                                                                    Reg DWORD Value Name                    Reg DWORD Decimal Value
    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\                Persistence Checkpoint Depth Maximum    104857600
    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\                State Queue Items                       20480
    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\    PoolLeaseRequestPeriodSeconds           600
    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\    PoolNetworkLatencySeconds               120
    HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\                       GroupCalcPollingIntervalMilliseconds    900000
    HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\        Command Timeout Seconds                 1200
    HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\                          DALInitiateClearPool                    1
    HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\                          DALInitiateClearPoolSeconds             60

     

    ****NOTE:

    On modifying the following:

        HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\
    REG_DWORD Decimal Value: 
        PoolLeaseRequestPeriodSeconds = 600
        PoolNetworkLatencySeconds = 120

    This should NOT be done unless you are guided to by Microsoft support, generally speaking.  If you make changes to this setting, the same change must be made on ALL management servers, otherwise the resource pools will constantly fail.  All management servers must have identical settings here.  If you add a management server in the future, this setting must be applied immediately if you modified it on other management servers, or you will see your resource pools constantly committing suicide and failing over to other management servers, reinitializing all workflows in a loop.   All the other settings in this article are generally beneficial.  This specific one for PoolManager should receive great scrutiny before changing, due to the risks.

     

     

    Below are some simple reg add statements you can run to make applying these settings easy:

    reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "State Queue Items" /t REG_DWORD /d 20480 /f
    reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "Persistence Checkpoint Depth Maximum" /t REG_DWORD /d 104857600 /f
    reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0" /v "GroupCalcPollingIntervalMilliseconds" /t REG_DWORD /d 900000 /f
    reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Command Timeout Seconds" /t REG_DWORD /d 1200 /f
    reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPool" /t REG_DWORD /d 1 /f
    reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPoolSeconds" /t REG_DWORD /d 60 /f
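
    If you keep these settings in a script rather than pasting them by hand, one option is to generate the reg add lines from a single table, so every management server gets exactly the same list.  The Python sketch below is one way to do that.  The function and variable names are hypothetical, and the PoolManager values are left out unless you explicitly ask for them, because of the warning in the note above.

```python
# Render the article's settings as "reg add" command lines.
# The names SETTINGS, POOL_MANAGER and build_commands are illustrative only.

PARAMETERS = r"HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters"
OPSMGR = r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0"
DAL = r"HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL"

SETTINGS = [
    (PARAMETERS, "State Queue Items", 20480),
    (PARAMETERS, "Persistence Checkpoint Depth Maximum", 104857600),
    (OPSMGR, "GroupCalcPollingIntervalMilliseconds", 900000),
    (OPSMGR + r"\Data Warehouse", "Command Timeout Seconds", 1200),
    (DAL, "DALInitiateClearPool", 1),
    (DAL, "DALInitiateClearPoolSeconds", 60),
]

# Only apply these when guided by Microsoft support, and then on ALL management servers.
POOL_MANAGER = [
    (PARAMETERS + r"\PoolManager", "PoolLeaseRequestPeriodSeconds", 600),
    (PARAMETERS + r"\PoolManager", "PoolNetworkLatencySeconds", 120),
]

def reg_add(key, name, value):
    """Format one REG_DWORD setting as a reg add statement."""
    return f'reg add "{key}" /v "{name}" /t REG_DWORD /d {value} /f'

def build_commands(include_pool_manager=False):
    """Return the list of reg add statements, optionally with the PoolManager values."""
    rows = SETTINGS + (POOL_MANAGER if include_pool_manager else [])
    return [reg_add(key, name, value) for key, name, value in rows]

if __name__ == "__main__":
    for line in build_commands():
        print(line)
```

    The output can be saved to a .cmd file and run from an elevated command prompt on each management server.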

  • WMI Leaks Memory on Windows Server 2012 R2 Domain Controller / DNS server roles – Hotfix available

     

    There was an issue when you monitored DNS server roles on Windows Server 2012 R2 servers.  The DNS PowerShell WMI provider would leak memory each time it was called.  When you monitor DNS, and leverage this WMI provider, you would see an aggressive memory leak occur in ONE of the WmiPrvSE.exe processes on the server. 

    This leak would continue until the WMI process reached around 500 to 600 MB of private bytes, at which point the WMI process would eventually become unresponsive and crash:

    Log Name:      Application
    Source:        Application Error
    Date:          6/2/2014 4:15:39 PM
    Event ID:      1000
    Task Category: (100)
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      DC01.opsmgr.net
    Description:
    Faulting application name: wmiprvse.exe, version: 6.3.9600.16384, time stamp: 0x5215f9c9
    Faulting module name: DnsServerPsProvider.dll, version: 6.3.9600.16384, time stamp: 0x5215e759
    Exception code: 0xc0000005
    Fault offset: 0x00000000000ef9d1
    Faulting process id: 0x16b4
    Faulting application start time: 0x01cf7c789301e26b
    Faulting application path: C:\Windows\system32\wbem\wmiprvse.exe
    Faulting module path: C:\Windows\System32\wbem\DnsServerPsProvider.dll
    Report Id: 0b622ace-ea9b-11e3-80ce-00155d0ad51b
    Faulting package full name:
    Faulting package-relative application ID:
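
    If you export Application Error events and want to pick out this specific crash, matching on the faulting application and module names is usually enough.  The Python sketch below shows one way to do that against the event description text.  The function names are hypothetical, and it assumes the description follows the layout shown above.

```python
import re

# Pull the faulting application and module names out of an Event ID 1000 description.
FAULT_FIELDS = re.compile(
    r"Faulting application name: (?P<app>[^,]+),.*?"
    r"Faulting module name: (?P<module>[^,]+),",
    re.DOTALL,
)

def parse_fault(description):
    """Return {'app': ..., 'module': ...} or None if the text does not match."""
    match = FAULT_FIELDS.search(description)
    if match is None:
        return None
    return {"app": match.group("app").strip(), "module": match.group("module").strip()}

def is_dns_provider_crash(description):
    """True when wmiprvse.exe crashed inside DnsServerPsProvider.dll."""
    fault = parse_fault(description)
    return (
        fault is not None
        and fault["app"].lower() == "wmiprvse.exe"
        and fault["module"].lower() == "dnsserverpsprovider.dll"
    )

# Description lines copied from the event above.
SAMPLE = """Faulting application name: wmiprvse.exe, version: 6.3.9600.16384, time stamp: 0x5215f9c9
Faulting module name: DnsServerPsProvider.dll, version: 6.3.9600.16384, time stamp: 0x5215e759
Exception code: 0xc0000005"""

if __name__ == "__main__":
    print(parse_fault(SAMPLE), is_dns_provider_crash(SAMPLE))
```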

    During this time just before the crash, SCOM management packs querying WMI might generate alerts, such as:

    Script Based Test Failed to Complete. 

    The error returned was: 'Object required' (0x1A8)

    Failed to convert to UTC time.
    The error returned was: 'No more threads can be created in the system.' (0x800700A4)

    Operations Manager failed to run a WMI query

    HRESULT: 0x800700a4
    Details: No more threads can be created in the system.

    Windows DNS - WMI Validation Failed

    Testing the WMI namespace root\MicrosoftDNS has failed twice in a row.

    HRESULT: 0x8004101d
    Details: Unexpected error

    If you monitor the WMI process private bytes memory utilization, you will see the leak quite clearly:

    image
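
    If you already collect private bytes samples for WmiPrvSE.exe, a very simple trend check can flag this pattern before the process crashes.  The Python sketch below is a rough illustration only:  the thresholds are arbitrary assumptions, the function name is hypothetical, and the sample numbers are synthetic, not measurements from a real server.

```python
def looks_like_leak(samples_mb, min_growth_mb=100.0, min_rising_fraction=0.9):
    """Flag a series of private-bytes samples (in MB) that climbs almost continuously.

    A series is flagged when it grew by at least min_growth_mb overall and
    at least min_rising_fraction of the sample-to-sample steps went up.
    """
    if len(samples_mb) < 3:
        return False
    steps = [later - earlier for earlier, later in zip(samples_mb, samples_mb[1:])]
    rising = sum(1 for step in steps if step > 0)
    growth = samples_mb[-1] - samples_mb[0]
    return growth >= min_growth_mb and rising / len(steps) >= min_rising_fraction

if __name__ == "__main__":
    leaking = [60, 120, 190, 260, 330, 410, 480, 560]   # synthetic: steady climb
    healthy = [60, 62, 59, 61, 60, 63]                   # synthetic: flat with noise
    print(looks_like_leak(leaking), looks_like_leak(healthy))
```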

     

    There is now a hotfix to address this issue!

    I recommend applying this hotfix as soon as possible to any DNS server or Domain Controller running the DNS server role.

     

    The hotfix/KB article for this specific issue is located at:

    http://support.microsoft.com/kb/2954185

     

    You can apply the hotfix in one of two very specific ways:

    Option 1:  Apply the May 2014 Windows Server Hotfix Rollup for WS2012R2 (2955164) which includes this fix: 

    http://support.microsoft.com/kb/2955164

    Option 2:  Apply the April 2014 Windows Server Hotfix Rollup for WS2012R2 (2919355) *and* then the specific hotfix for the issue (2954185)

    http://support.microsoft.com/kb/2919355

    http://support.microsoft.com/kb/2954185
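
    The two options boil down to a simple rule you can apply to an inventory of installed updates.  The Python sketch below expresses that rule.  It assumes you have already gathered the list of installed KB numbers for a server by some other means, and the function names are hypothetical.

```python
def normalize_kb(kb):
    """Turn 'KB2955164', 'kb2955164' or 2955164 into the bare number as a string."""
    text = str(kb).strip().upper()
    return text[2:] if text.startswith("KB") else text

def dns_wmi_leak_fix_present(installed_kbs):
    """Apply the article's rule: 2955164 alone, or 2919355 together with 2954185."""
    kbs = {normalize_kb(kb) for kb in installed_kbs}
    option_1 = "2955164" in kbs
    option_2 = {"2919355", "2954185"} <= kbs
    return option_1 or option_2

if __name__ == "__main__":
    print(dns_wmi_leak_fix_present(["KB2919355", "KB2954185"]))
```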

     

    And remember – I also recommend the following hotfix in addition – to resolve a problem with the agents failing on Windows Server 2012 R2 Domain Controllers:  http://blogs.technet.com/b/kevinholman/archive/2014/03/03/agents-on-windows-2012-r2-domain-controllers-can-stop-responding-or-heart-beating.aspx

     

    I have added both of these to my recommended SCOM Hotfix list:

    http://blogs.technet.com/b/kevinholman/archive/2009/01/27/which-hotfixes-should-i-apply.aspx

  • UR2 for SCOM 2012 R2 – Step by Step

     

    Sorry I am a bit behind in publishing this post.

    image

     

    KB Article:   http://support.microsoft.com/kb/2929891

    Download catalog site:  http://catalog.update.microsoft.com/v7/site/Search.aspx?q=2929891

     

    Key fixes:

    Issue 1 - This update rollup makes the stored procedure performance aggregate more robust against out-of-range values.
    Issue 2 - Adding multiple regular expressions (RegEx) to a group definition causes an SQL exception when the group is added or run.
    Issue 3 - Web applications fail when they are monitored by the System Center Operations Manager 2012 R2 APM agent.
    Issue 4 - Service Level Objectives (SLO) dashboards sometimes load in several seconds and sometimes take minutes to load. Additionally, the dashboard is empty after it loads in some cases.
    Issue 5 - Operations Manager Console crashes when you try to override the scope in the Authoring pane.
    Issue 6 - The System Center Operations Manager console is slow to load views if you are a member of a custom Operator role.
    Issue 7 - This update rollup includes a fix for the dashboard issue that was introduced in Update Rollup 1.
    Issue 8 - SQL Time Out Exceptions for State data (31552 events) occur when you create Data Warehouse workflows.
    Issue 9 - This update rollup includes a fix for the Event Data source.

    Xplat updates:
    Issue 1 - All IBM WebSphere application servers that run on Linux or AIX computers are not automatically discovered by the Management Pack for Java Enterprise Edition (JEE) if multiple application servers are defined in a single WebSphere profile.

     

    Let’s get started.

    From reading the KB article – the order of operations is:

     

    1. Install the update rollup package on the following server infrastructure:
      • Management servers
      • Gateway servers
      • Web console server role computers
      • Operations console role computers
    2. Apply SQL scripts (see installation information).
    3. Manually import the management packs.
    4. Update Agents

    Now, we need to add another step – if we are using Xplat monitoring, we need to update the Linux/Unix MP’s and agents.

           5.  Update Unix/Linux MP’s and Agents.

     

    1.  Management Servers

       image

    Since there is no RMS anymore, it doesn’t matter which management server I start with.  There is no need to begin with whichever server holds the RMSe role.  I simply make sure I only patch one management server at a time to allow for agent failover without overloading any single management server.

    I can apply this update manually via the MSP files, or I can use Windows Update.  I have 3 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

    The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location:

    image

     

    Then extract the contents:

    image

    Once I have the MSP files, I am ready to start applying the update to each server by role.

    ***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have System Administrator (SA) role to the database instances that host your OpsMgr databases.

    My first server is a management server, and the web console, and has the OpsMgr console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

    image

    This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update does not provide any feedback on whether it succeeded or failed.  You can check the application log for the MsiInstaller events for that:

    Log Name:      Application
    Source:        MsiInstaller
    Date:          6/2/2014 1:58:33 PM
    Event ID:      1035
    Task Category: None
    Level:         Information
    Keywords:      Classic
    User:          OPSMGR\kevinhol
    Computer:      SCOM01.opsmgr.net
    Description:
    Windows Installer reconfigured the product. Product Name: System Center Operations Manager 2012 Server. Product Version: 7.1.10226.1015. Product Language: 1033. Manufacturer: Microsoft Corporation. Reconfiguration success or error status: 0.
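
    Since the installer gives no feedback, the status at the end of this event is what tells you the result:  0 means success.  If you are checking several servers, the Python sketch below shows one way to pull the product name, version, and status out of the event description.  The function name is hypothetical, and it assumes the description follows the wording shown above.

```python
import re

# Fields of interest in an MsiInstaller Event ID 1035 description.
EVENT_1035 = re.compile(
    r"Product Name: (?P<name>.+?)\. "
    r"Product Version: (?P<version>[\d.]+)\. "
    r".*?Reconfiguration success or error status: (?P<status>\d+)",
    re.DOTALL,
)

def parse_reconfigure_event(description):
    """Return product name, version and whether the status code was 0 (success)."""
    match = EVENT_1035.search(description)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "version": match.group("version"),
        "succeeded": int(match.group("status")) == 0,
    }

# Description copied from the event above.
SAMPLE = (
    "Windows Installer reconfigured the product. "
    "Product Name: System Center Operations Manager 2012 Server. "
    "Product Version: 7.1.10226.1015. Product Language: 1033. "
    "Manufacturer: Microsoft Corporation. "
    "Reconfiguration success or error status: 0."
)

if __name__ == "__main__":
    print(parse_reconfigure_event(SAMPLE))
```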

     

    You can also spot check a couple DLL files for the file version attribute. 

    image

     

    Next up – run the Web Console update:

    image

    This runs much faster.   A quick file spot check:

    image

    Lastly – install the console update (make sure your console is closed):

    image

    A quick file spot check:

    image

     

    Secondary Management Servers:

    image

    I now move on to my secondary management servers, applying the server update, then the console update. 

    On this next management server, I will use Windows Update.  I check online, and make sure that I have configured Windows Update to give me updates for additional products:

    image29

    This shows me two applicable updates for this server:

    image

    I apply these updates (along with some additional Windows Server Updates I was missing), and reboot each management server, until all management servers are updated.

     

    Updating Gateways:

    image

    I can use Windows Update or manual installation.

    image

    The update launches a UI and quickly finishes.

    Then I will spot check the DLL’s:

    image

    That said – there is a long running bug in the gateway update.  The gateway update is NOT placing a very important file here – for agents.

    BUG:  In the \Program Files\System Center Operations Manager\Gateway\AgentManagement\ directories – we should be dropping an agent update MSP file for updating agents behind gateways, for x86 and amd64 agents.  However, the GW update does not include this.  If you want to push-deploy agents behind gateways, and need them to be fully up to date, you should copy the correct files from your updated management servers directories.
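
    Copying the files by hand works fine for one or two gateways.  If you have many, the copy can be scripted.  The Python sketch below is a minimal illustration, not a supported tool:  the function name is hypothetical, it assumes both AgentManagement folders contain x86 and amd64 subfolders, and it assumes the agent update files are the *.msp files in those subfolders.  Check the actual folder contents on your updated management server before relying on it.

```python
import shutil
from pathlib import Path

def copy_agent_patches(ms_agent_mgmt, gw_agent_mgmt, archs=("x86", "amd64")):
    """Copy *.msp files from a management server's AgentManagement folder
    into the matching architecture folders on a gateway.

    Both arguments are paths to an AgentManagement directory (assumed layout:
    one subfolder per architecture). Returns the list of files written.
    """
    copied = []
    for arch in archs:
        source_dir = Path(ms_agent_mgmt) / arch
        target_dir = Path(gw_agent_mgmt) / arch
        if not source_dir.is_dir():
            continue  # nothing to copy for this architecture
        target_dir.mkdir(parents=True, exist_ok=True)
        for patch in sorted(source_dir.glob("*.msp")):
            shutil.copy2(patch, target_dir / patch.name)
            copied.append(target_dir / patch.name)
    return copied
```

    Pass the AgentManagement folder of an updated management server as the first argument and the gateway's AgentManagement folder as the second.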

     

     

    2. Apply the SQL Script

     

    In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

    %SystemDrive%\Program Files\System Center 2012\Operations Manager\Server\SQL Script for Update Rollups

    image

    First – let’s run the script to update the OperationsManager database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file.  Make sure it is pointing to your OperationsManager database, then execute the script.

    image44

    Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

    You will see the following (or similar) output:

    image47  

    or

     image

    IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a large environment, you might have to run this several times, or even potentially shut down the services on your management servers, to break their connection to the databases, to get a successful run.

    Technical tidbit:  If you had previously run this script by applying it during the application of SCOM 2012 R2 UR1, this script is unchanged in UR2.  Therefore it does not have to be executed again during the UR2 deployment.  There is no harm in running it again, especially if you are not 100% sure it was run with success during the UR1 deployment, if applicable.  Always best to just run it again with the deployment of UR2.  However, if you have a large environment and it is difficult to get the script to execute with success, you might skip this step.  Again – only if you already applied UR1, and you are 100% sure it was run with success then.

     

    image

    Next, we have a new script in UR2 to run against the warehouse DB.  Do not skip this step under any circumstances.    From:

    %SystemDrive%\Program Files\System Center 2012\Operations Manager\Server\SQL Script for Update Rollups

    Open a SQL management studio query window, connect it to your OperationsManagerDW database, and then open the script file UR_Datawarehouse.sql.  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

    If you see a warning about line endings, choose Yes to continue.

      image

    Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

    You will see the following (or similar) output:

    image

     

    3. Manually import the management packs?

    image

    We have five updated MP’s to import  (MAYBE!).

    image

    The TFS MP bundles are only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

    The Advisor MP’s are only needed if you are using System Center Advisor services.

    However, the Image and Visualization libraries deal with Dashboard updates, and these need to be updated.

    I import all of these without issue.

     

     

    4.  Update Agents

    image

    There is a known issue in UR2 for agents – read carefully below:

    Agents should be placed into pending actions by this update (mine worked great):

    image

    If your agents are not placed into pending management – this is generally caused by not running the update from an elevated command prompt, or by having manually installed agents, which will not be placed into pending.

    You can approve these – which will result in a success message:

    image

    HOWEVER – this didn’t actually do any update.  You can see from the system event logs, that MOMAgentinstaller did run, but when we check the DLL versions, we can see they are not updated.

    What you need to do is REJECT any pending updates in the SCOM console – then run a REPAIR on your agents to get them to apply the update.  Alternatively – use a software distribution tool like Configuration Manager to apply agent updates where applicable.  Any agents that are manually installed (Remotely Manageable = No) will not be available for a repair, as always. 

    You can track running repairs in Pending Management:

    image

     

    Soon you should start to see PatchList getting filled in from the Agents By Version view under Operations Manager monitoring folder in the console:

    image

     

     

     

    5.  Update Unix/Linux MPs and Agents

    image

    Next up – I download and extract the updated Linux MP’s for SCOM 2012 R2 UR2

    http://www.microsoft.com/en-us/download/details.aspx?id=29696

    7.5.1021.0 is current at this time for SCOM 2012 R2 UR2. 

    ****Note – take GREAT care when downloading – that you select the correct download for R2.  You must scroll down in the list and select the MSI for 2012 R2:

    image50

     

    Download the MSI and run it.  It will extract the MP’s to C:\Program Files (x86)\System Center Management Packs\System Center 2012 R2 Management Packs for Unix and Linux\

    Update any MP’s you are already using.

    image

    You will likely observe VERY high CPU utilization of your management servers and database server during and immediately following these MP imports.  Give it plenty of time to complete the process of the import and MPB deployments.

    Next up – you would upgrade the agents on your monitored Unix/Linux computers.  You can now do this straight from the console:

    image

    image

    You can input credentials or use existing RunAs accounts if those have enough rights to perform this action.

     

     

     

    6.  Update the remaining deployed consoles

    image

    This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on the workstations of all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the UR2 update.

     

     

    Review:

    Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

    image

    Known issues:

    See the existing list of known issues documented in the KB article.

    1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop the services on the management servers, or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

    ------------------------------------------------------
    (1 row(s) affected)
    (1 row(s) affected)
    Msg 1205, Level 13, State 56, Line 1
    Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
    Msg 3727, Level 16, State 0, Line 1
    Could not drop constraint. See previous errors.
    --------------------------------------------------------
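
    If you drive the script from automation instead of SQL Management Studio, the "run it again until it completes" advice can be expressed as a small retry loop.  The Python sketch below is one possible shape for that, under several assumptions:  run_script is a hypothetical callable you supply that executes the script and returns its text output, errors are recognized by the "Msg <number>, Level" lines shown above, and only deadlock failures (Msg 1205) are retried.

```python
import re
import time

# Matches SQL error headers such as "Msg 1205, Level 13, State 56, Line 1".
SQL_ERROR = re.compile(r"^\s*Msg (\d+), Level \d+", re.MULTILINE)
DEADLOCK_VICTIM = 1205

def error_numbers(output):
    """Return the SQL message numbers found in the script output."""
    return [int(number) for number in SQL_ERROR.findall(output)]

def run_until_clean(run_script, max_attempts=5, wait_seconds=60, sleep=time.sleep):
    """Re-run the script while it fails with a deadlock; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        errors = error_numbers(run_script())
        if not errors:
            return attempt
        if DEADLOCK_VICTIM not in errors:
            # Anything other than a deadlock needs a human to look at it.
            raise RuntimeError(f"non-deadlock SQL errors: {errors}")
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise RuntimeError(f"still deadlocking after {max_attempts} attempts")

# Failure output copied from the known issue above.
SAMPLE_FAILURE = """(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors."""
```

    If the loop gives up, fall back to the manual steps described above, such as stopping the services on the management servers before trying again.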

    2.  Gateway Servers don’t get agent patch update files.  See body of this blog article for more details.

    3.  Agents don’t go into pending, or go into pending but the agent update doesn’t actually work.  This is a known issue and will be addressed in UR3.  For this release, simply use a “repair” to repair the agents that need the update, or use a software distribution mechanism to deploy the update.

  • Introducing Thing 1 and Thing 2

     

    image

     

    My blog has been silent for a bit lately.  This is because of the birth of my first children.  SCOM world - Meet Logan and Lexi Holman.  I’ll be taking some time off work to spend with them during the next few weeks as well.

    Logan has already been discussing a management pack to monitor his diaper status, and Lexi is gathering the requirements and having the necessary customer meetings.  We aren't in full agreement yet on what constitutes warning versus critical, so I’ll keep you up to date on the status.

  • Creating Groups of Health Service Watcher Objects based on other Groups

     

    It has been a well known requirement for most customers to be able to create groups of Windows Computers that also contain corresponding Health Service Watcher objects.  This was needed for Alert Notification subscriptions so that different teams could receive alert notifications filtered by groups, but also include alerts from the Watcher, such as Heartbeat failure and Computer Unreachable.  There are several articles on this but I will reference a very popular one, on Tim’s site:

    http://www.scom2k7.com/dynamic-computer-groups-that-send-heartbeat-alerts/

    Essentially, we needed to add an extra membership rule, to the XML, that would also add any Health Service Watcher objects that have a relationship to the Windows Computer objects already in the group.  We did this with the following XML:

    <MembershipRule>
      <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthServiceWatcher"]$</MonitoringClass>
      <RelationshipClass>$MPElement[Name="MicrosoftSystemCenterInstanceGroupLibrary!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
      <Expression>
        <Contains>
          <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
          <Expression>
            <Contained>
              <MonitoringClass>$MPElement[Name="Windows!Microsoft.Windows.Computer"]$</MonitoringClass>
              <Expression>
                <Contained>
                  <MonitoringClass>$Target/Id$</MonitoringClass>
                </Contained>
              </Expression>
            </Contained>
          </Expression>
        </Contains>
      </Expression>
    </MembershipRule>

    However, what if we ONLY want a group of Health Service Watcher objects, and NOT the Windows Computers?  BUT – we wish to base the HSW membership list on another group of Windows Computers.  This is useful if we want to create availability reports for a group of Windows Computers, but need to base the report on the availability of a specific up/down monitor, and not anything related to Windows Computer objects.

    Here is a code example of exactly that:

    In this sample – we will create a simple group of Windows Computers whose names start with “DB”.  Then – we will create another group only containing HSW objects, corresponding to the SQL computers group.

    <ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <Manifest>
        <Identity>
          <ID>grouptest</ID>
          <Version>1.0.0.8</Version>
        </Identity>
        <Name>grouptest</Name>
        <References>
          <Reference Alias="MSCIGL">
            <ID>Microsoft.SystemCenter.InstanceGroup.Library</ID>
            <Version>6.1.7221.0</Version>
            <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
          </Reference>
          <Reference Alias="SC">
            <ID>Microsoft.SystemCenter.Library</ID>
            <Version>6.1.7221.0</Version>
            <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
          </Reference>
          <Reference Alias="Windows">
            <ID>Microsoft.Windows.Library</ID>
            <Version>6.1.7221.0</Version>
            <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
          </Reference>
          <Reference Alias="Health">
            <ID>System.Health.Library</ID>
            <Version>6.1.7221.0</Version>
            <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
          </Reference>
          <Reference Alias="System">
            <ID>System.Library</ID>
            <Version>6.1.7221.0</Version>
            <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
          </Reference>
        </References>
      </Manifest>
      <TypeDefinitions>
        <EntityTypes>
          <ClassTypes>
            <ClassType ID="grouptest.compgroup" Accessibility="Internal" Abstract="false" Base="SC!Microsoft.SystemCenter.ComputerGroup" Hosted="false" Singleton="true" />
            <ClassType ID="grouptest.SQLWatchers" Accessibility="Internal" Abstract="false" Base="MSCIGL!Microsoft.SystemCenter.InstanceGroup" Hosted="false" Singleton="true" />
          </ClassTypes>
        </EntityTypes>
      </TypeDefinitions>
      <Monitoring>
        <Discoveries>
          <Discovery ID="grouptest.DiscoverSQLServersComputerGroup" Enabled="true" Target="grouptest.compgroup" ConfirmDelivery="true" Remotable="true" Priority="Normal">
            <Category>Discovery</Category>
            <DiscoveryTypes>
              <DiscoveryRelationship TypeID="SC!Microsoft.SystemCenter.ComputerGroupContainsComputer" />
            </DiscoveryTypes>
            <DataSource ID="GP" TypeID="SC!Microsoft.SystemCenter.GroupPopulator">
              <RuleId>$MPElement$</RuleId>
              <GroupInstanceId>$MPElement[Name="grouptest.compgroup"]$</GroupInstanceId>
              <MembershipRules>
                <MembershipRule>
                  <MonitoringClass>$MPElement[Name="Windows!Microsoft.Windows.Computer"]$</MonitoringClass>
                  <RelationshipClass>$MPElement[Name="SC!Microsoft.SystemCenter.ComputerGroupContainsComputer"]$</RelationshipClass>
                  <Expression>
                    <RegExExpression>
                      <ValueExpression>
                        <Property>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Property>
                      </ValueExpression>
                      <Operator>MatchesWildcard</Operator>
                      <Pattern>DB*</Pattern>
                    </RegExExpression>
                  </Expression>
                </MembershipRule>
              </MembershipRules>
            </DataSource>
          </Discovery>
          <Discovery ID="grouptest.DiscoverSQLWatchers" Enabled="true" Target="grouptest.SQLWatchers" ConfirmDelivery="true" Remotable="true" Priority="Normal">
            <Category>Discovery</Category>
            <DiscoveryTypes>
              <DiscoveryRelationship TypeID="MSCIGL!Microsoft.SystemCenter.InstanceGroupContainsEntities" />
            </DiscoveryTypes>
            <DataSource ID="GP" TypeID="SC!Microsoft.SystemCenter.GroupPopulator">
              <RuleId>$MPElement$</RuleId>
              <GroupInstanceId>$MPElement[Name="grouptest.SQLWatchers"]$</GroupInstanceId>
              <MembershipRules>
                <MembershipRule>
                  <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthServiceWatcher"]$</MonitoringClass>
                  <RelationshipClass>$MPElement[Name="MSCIGL!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
                  <Expression>
                    <Contains>
                      <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
                      <Expression>
                        <Contained>
                          <MonitoringClass>$MPElement[Name="grouptest.compgroup"]$</MonitoringClass>
                        </Contained>
                      </Expression>
                    </Contains>
                  </Expression>
                </MembershipRule>
              </MembershipRules>
            </DataSource>
          </Discovery>
        </Discoveries>
      </Monitoring>
      <LanguagePacks>
        <LanguagePack ID="ENU" IsDefault="true">
          <DisplayStrings>
            <DisplayString ElementID="grouptest">
              <Name>Group Test</Name>
              <Description />
            </DisplayString>
            <DisplayString ElementID="grouptest.compgroup">
              <Name>SQL Servers Computer Group</Name>
            </DisplayString>
            <DisplayString ElementID="grouptest.DiscoverSQLServersComputerGroup">
              <Name>Discovery for SQL Servers Computer Group</Name>
            </DisplayString>
            <DisplayString ElementID="grouptest.DiscoverSQLWatchers">
              <Name>Discovery for SQL Health Service Watchers Group</Name>
              <Description />
            </DisplayString>
            <DisplayString ElementID="grouptest.SQLWatchers">
              <Name>SQL Health Service Watchers Group</Name>
            </DisplayString>
          </DisplayStrings>
        </LanguagePack>
      </LanguagePacks>
    </ManagementPack>

     

    The key to this is the specific reference of the other group – shown here:

    <MembershipRules>
      <MembershipRule>
        <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthServiceWatcher"]$</MonitoringClass>
        <RelationshipClass>$MPElement[Name="MSCIGL!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
        <Expression>
          <Contains>
            <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
            <Expression>
              <Contained>
                <MonitoringClass>$MPElement[Name="grouptest.compgroup"]$</MonitoringClass>
              </Contained>
            </Expression>
          </Contains>
        </Expression>
      </MembershipRule>
    </MembershipRules>
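
    The nesting in that rule is easier to follow when it is read as a chain:  the member class, then each Contains/Contained step down to the referenced group.  The Python sketch below parses the rule with the standard library and prints that chain.  It is only a reading aid with hypothetical function names, and it is not how SCOM itself evaluates group membership.

```python
import re
import xml.etree.ElementTree as ET

# The membership rule from the example above.
RULE = """
<MembershipRule>
  <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthServiceWatcher"]$</MonitoringClass>
  <RelationshipClass>$MPElement[Name="MSCIGL!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
  <Expression>
    <Contains>
      <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
      <Expression>
        <Contained>
          <MonitoringClass>$MPElement[Name="grouptest.compgroup"]$</MonitoringClass>
        </Contained>
      </Expression>
    </Contains>
  </Expression>
</MembershipRule>
"""

def element_name(reference):
    """Extract the name inside $MPElement[Name="..."]$, or return the text unchanged."""
    match = re.search(r'Name="([^"]+)"', reference)
    return match.group(1) if match else reference

def containment_chain(rule):
    """Walk the nested Contains/Contained expressions of one MembershipRule element."""
    chain = [("Member", element_name(rule.findtext("MonitoringClass")))]
    node = rule.find("Expression")
    while node is not None:
        operator = node.find("Contains")
        if operator is None:
            operator = node.find("Contained")
        if operator is None:
            break
        chain.append((operator.tag, element_name(operator.findtext("MonitoringClass"))))
        node = operator.find("Expression")
    return chain

if __name__ == "__main__":
    for step, class_name in containment_chain(ET.fromstring(RULE)):
        print(step, class_name)
```

    Read top to bottom, the chain says:  add Health Service Watcher objects that contain a Health Service which is itself contained in the grouptest.compgroup group.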