Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.


Posts
  • OpsMgr MP Update: SQL MP version 6.3.173.1 ships

     

    This MP is not showing up in the MP Catalog just yet; however, the download link below does take you to the latest version of the MP.

    Get it at:  http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=10631

     

    This is a simple update; per the guide, it only fixes a bug in the SQL DB file free space monitors.  Previously, in 6.3.173.0, this monitor would create a lot of alert noise for specific databases even when free space was not really an issue.

  • Integrating VMM 2012 and OpsMgr 2012

     

    If you want to monitor your System Center Virtual Machine Manager installation, you don't just import VMM management packs like you would for other applications.  There is a process to connect VMM 2012 to OpsMgr 2012.

    This is covered here:  http://technet.microsoft.com/en-us/library/hh427287.aspx

     

    The primary reasons to integrate are to use the process to import the VMM management packs so that SCOM can monitor the VMM server and guests from a virtualization health perspective.  Additionally, you can enable PRO (Performance and Resource Optimization), where SCOM can provide advanced optimizations in VMM beyond the basic resource management provided out of the box in VMM.  You can also enable the ability to integrate maintenance mode in OpsMgr with actions in VMM.

     

    Let’s get started. 

     

    This assumes you have installed and set up OpsMgr 2012 and VMM 2012.  If not, here are some handy blogs to get those deployed quickly:

    http://blogs.technet.com/b/kevinholman/archive/2011/07/26/deploying-opsmgr-2012-a-quick-start-guide.aspx

    http://blogs.technet.com/b/kevinholman/archive/2011/09/30/scvmm-2012-quickstart-deployment-guide.aspx

     

     

    Following along at:    http://technet.microsoft.com/en-us/library/hh882396.aspx  we have some pre-reqs to cover:

    PowerShell 2.0 should be present on all servers; it is included by default, as the servers should be running Windows Server 2008 R2 SP1 or later.

    Install the OpsMgr Console on the SCVMM server.  This is needed for the SDK binaries used for SCOM connections.

    Ensure you have the IIS and SQL MP’s in place in SCOM, as SCVMM MP’s have dependencies.
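    To spot-check these pre-reqs from a PowerShell prompt, something like the following works. This is just a sketch – the MP name patterns below are assumptions, so match them against the actual names of the IIS and SQL MP's you imported:

    ```powershell
    # Check the PowerShell version on each server (should be 2.0 or later)
    $PSVersionTable.PSVersion

    # From the Operations Manager Shell, verify the IIS and SQL MP's are imported
    # (name patterns are assumptions - adjust to your imported MP names)
    Get-SCOMManagementPack |
        Where-Object { $_.Name -match 'SQLServer|InternetInformationServices' } |
        Select-Object Name, Version
    ```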

     

    Next, we need to open the VMM console – Settings > System Center Settings > Operations Manager Server.  Right click and choose properties.

    Enter in one of your management server names to provide the SDK connection to VMM.  Next – we will need two accounts.  One for SCVMM to connect to SCOM, and one for SCOM to connect to SCVMM. 

    When SCVMM connects to SCOM – it will need to have SCOM admin rights.  It needs this to be able to manage the MP import process, manage maintenance mode, create product connectors, etc.  The simplest path is to take the existing SCVMM service account, and make that a SCOM admin by placing it in the SCOM admins global group.  Alternatively – you can create a special “run as” account for this purpose, assign it SCOM admin rights, and use a specific dedicated account that is locked down for this purpose, to limit the number of people with access to this credential.  I will generally just use the SCVMM service account for simplicity.

    When SCOM connects to SCVMM, likewise it will need to have the Administrator role in SCVMM.  So we will need another account for this purpose, or we can use the SCVMM service account, or any other SCVMM administrative account.  I will generally also use the existing SCVMM service account for this purpose, as it should already be granted SCVMM admin rights.

    Going through the wizard – we will use the existing service account, and enable PRO and Maintenance mode.  Then we will input the account for SCOM>SCVMM.  This will create a run-as account in SCOM behind the scenes.  Click Finish, and away we go.  What is happening in the background is that we are creating product connectors and run-as accounts, and importing the management packs.  If you get an access denied at this point, make sure the person running the configuration is a SCVMM admin and a SCOM admin, and that the credentials we entered previously have the necessary rights.  If you just recently added an account to a global group – it might take a reboot of the SCVMM or SCOM server to ensure the service accounts pick up that new token of group membership.  This process will use considerable resources on your SCOM management server and database server while it is creating the resources in OpsMgr.

     

    When you are complete – you can go back in and see your configuration by right clicking Operations Manager Server and choose Properties:

    image

     

    image

     

    In OpsMgr – you will see a new Run As account:

    image

     

    And profile:

    image

     

    The one thing I don't like – is that this Run As account is set to “less secure”:

     

    image

     

    ***Note:  It is my opinion that this account should not be set to “less secure”, which will distribute the credential to all HealthServices in the management group.  I will research if we can limit the scope and distribution.  Less Secure is not a good option for most customers and I am not sure why the product chose this as a default.  This will cause a large number of alerts to be sent from all your SCOM agents where this credential cannot “Log on locally”.  Generally, your SCVMM service account does not need, nor will it have, “Log on Locally” rights to all managed agents.  If you see a ton of those alerts after configuring SCVMM integration – this is why.
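    If you want to tighten this up, the Set-SCOMRunAsDistribution cmdlet can switch the account to "more secure" distribution. This is an untested sketch – the account display name below is hypothetical, so substitute the name of the run-as account the integration actually created, and verify the parameters against Get-Help Set-SCOMRunAsDistribution first:

    ```powershell
    # Switch the SCVMM connection account to "more secure" and distribute it
    # only to the management servers (account name below is hypothetical)
    $acct    = Get-SCOMRunAsAccount -Name "VMM Connection Account"
    $servers = Get-SCOMManagementServer
    Set-SCOMRunAsDistribution -RunAsAccount $acct -MoreSecure -SecureDistribution $servers
    ```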

     

    We will also see many product connectors configured in SCOM.

    image

     

    Next step – we need to update the SCVMM MP’s to the current version – quickly.  The SCVMM MP’s version 3.0.6005.0 that shipped with RTM have many issues and need to be updated for a number of fixes.  At the time of this writing – the current version is 3.0.6019.0 available at:  http://www.microsoft.com/en-us/download/details.aspx?id=29679  This update is part of UR1 for System Center, which is detailed at:  http://support.microsoft.com/kb/2686249

    Following the MP guide – we do NOT just import these downloaded MP’s.  We first must ensure VMM is integrated with OpsMgr (what we just accomplished above) and then we need to update some items on the VMM server.  From http://technet.microsoft.com/en-us/library/hh882396.aspx we need to open the registry on the SCVMM server, and find the following key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Setup

    Edit the string value for “CompatibleMPVersion” from 3.0.6005.0 to 3.0.6019.0.  Then bounce the SCVMM service.

    On the VMM management server, open the management packs directory. By default, the location is C:\Program Files\Microsoft System Center 2012\Virtual Machine Manager\ManagementPacks.  Make a backup copy of these files in a different location so you have them.  Then copy over the extracted MP files from the MP update MSI you downloaded and installed.
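    The registry edit and file staging above can be sketched in PowerShell. The paths follow the defaults above; the SCVMMService service name and the extraction folder are assumptions – adjust both for your environment:

    ```powershell
    # Bump the compatible MP version and bounce the VMM service
    Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Setup' `
        -Name CompatibleMPVersion -Value '3.0.6019.0'
    Restart-Service SCVMMService

    # Back up the existing MP files, then copy in the extracted 3.0.6019.0 files
    $mpDir = 'C:\Program Files\Microsoft System Center 2012\Virtual Machine Manager\ManagementPacks'
    Copy-Item $mpDir "$mpDir.bak" -Recurse
    Copy-Item 'C:\Temp\VMM-MP-Update\*' $mpDir -Force   # extraction folder is hypothetical
    ```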

     

    At this step – we have two choices on how to continue.  The TechNet documentation has us manage the MP import by removing the OpsMgr integration, and reconfiguring it to handle the MP import.  However, the MP guide has different instructions – it tells us to simply import the new management packs manually using the SCOM console.  So – your choice.  For my example – I am going to remove and re-configure the integration, and let SCVMM handle the MP import.

     

    So – we must remove the integration we previously created.  In the SCVMM console – right click Operations Manager Server and choose Remove.

     

    image

    Confirm the operation and wait a bit for it to complete.  This leaves behind the MP’s and the run-as accounts; however, it does delete the product connectors for SCVMM.

    Then – click Properties and the configuration wizard should start.  Input the same info that we configured previously for the accounts.  Click finish.  Behind the scenes it is re-creating the SCVMM product connectors, and updating all the management packs.

     

    At this point, your SCVMM integration is up to date.  Technically, you could have updated the management packs FIRST – before handling the initial integration, however I wanted to walk through the MP update process for future updates.

    At this point – you can go into SCOM, and close any alerts with the following names, if they don't auto-close on their own:

      • Host has exceeded the supported maximum number of running VMs
      • VGS not installed

    To test integration – open the SCVMM properties of the OpsMgr connection – and click Test PRO.  You should see a test alert pop up in SCVMM, and a test alert in SCOM:

     

    image

  • OpsMgr 2012: Update Rollup 2 ships, and my experience installing it

     

    Originally I had not planned an update article on this, but I have been getting a lot of questions on it, so let’s just throw one out there.

     

    Cumulative Update Rollup 2 (UR2) for OpsMgr 2012 has shipped.  This, like other CU’s or Update Rollups, is *Cumulative*.  This means that you can apply UR2 directly to a SCOM 2012 deployment with no previous updates, OR you can apply UR2 to a SCOM 2012 UR1 level management group.

    Download: http://www.microsoft.com/en-us/download/details.aspx?id=30421

    KB link: http://support.microsoft.com/kb/2706783

    Here is a list of the fixes:

    • The Windows PowerShell module runspace configuration cache grows indefinitely. This causes memory usage to increase.
    • The Set-SCOMLicense cmdlet fails if the management group evaluation expiration time-out has expired.
    • The System Center Operations Manager agent may crash on Oracle Solaris root zones that are configured to use a range of dedicated CPUs.
    • The UNIX/Linux agent process provider may not enumerate running processes if a process has arguments that include non-ASCII characters. This prevents process/daemon monitoring.
    • The .rpm specification file for the agent for Red Hat Enterprise Linux does not specify the distribution.

     

     

    Let’s Roll:

    So – first – I download it. The hotfix file “SystemCenter2012OperationsManager-UR2-KB2731874-X86-X64-IA64-ENU.exe” is ONLY about 76 megabytes.

    NOTE:  Cumulative Updates/Update Rollups for SCOM 2012 have changed in a very fundamental way from the old Cumulative Updates in SCOM 2007.  No longer is there a bootstrapper program for the CU for SCOM 2012. Now – the package simply extracts the files to a directory.  We will then run each MSP file independently, based on whatever server roles are installed.  It will not detect your installed roles – you must handle this, and apply whichever updates are applicable.

     

    Next step – READ the documentation… understand all the steps required, and formulate the plan.

    I build my deployment plan based on the release notes in the KB article. My high level plan looks something like this:

    1. Backup the Operations and Warehouse databases, and all unsealed MP’s.
    2. Apply the hotfix to all the Management Servers
    3. Apply the hotfix to my Gateway Servers.
    4. Apply the hotfix to the OpsMgr Reporting server.
    5. Apply the hotfix to my Web Console server
    6. Apply the hotfix my dedicated consoles (Terminal servers, desktop admin machines, etc…)
    7. Import the management pack updates
    8. Agents: Apply the hotfix to my agents by approving them from pending.
    9. Agents: Update manually installed agents…. well, manually.
    10. Unix/Linux: Download and import the updated MP’s for Unix/Linux monitoring.
    11. Unix/Linux: Upgrade the Unix/Linux agent providers.

    Ok – looks like 11 easy steps. This order is not set in stone – it is a recommendation based on logical order, from the release notes.

    ****Requirement – as a required practice for a major update/hotfix, you should log on to your OpsMgr role servers using a domain user account that meets the following requirements:

    • OpsMgr administrator role
    • Member of the Local Administrators group on all OpsMgr role servers (RMS, MS, GW, Reporting)
    • SA (SysAdmin) privileges on the SQL server instances hosting the Operations DB and the Warehouse DB.

    These rights (especially the user account having SA priv on the DB instances) are often overlooked. These are the same rights required to install OpsMgr, and must be granted to apply major hotfixes and upgrades (like RTM>SP1, SP1>R2, etc…) Most of the time the issue I run into is that the OpsMgr admin logs on with his account which is an OpsMgr Administrator role on the OpsMgr servers, but his DBA’s do not allow him to have SA priv over the DB instances. This must be granted temporarily to his user account while performing the updates, then can be removed, just like for the initial installation of OpsMgr as documented HERE. At NO time do your service accounts for MSAA or SDK need SA (SysAdmin) priv to the DB instances…. unless you decide to log in as those accounts to perform an update (which I do not recommend).

    Ok, let’s get started.

     

    1. Backups. I run a fresh backup on my OpsDB and Warehouse DB’s – just in case something goes really wrong.

    I also will take a backup of all my unsealed MP’s. You can do the backup in PowerShell, here is an example which will backup all unsealed MP’s to a folder C:\mpbackup:

    Get-SCOMManagementPack | Where-Object {$_.Sealed -eq $false} | Export-SCOMManagementPack -Path C:\MPBackup

    We need to do this just in case we require restoring the environment for any reason.
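    A slightly fuller version of that one-liner, which creates the backup folder first if it doesn't already exist (the folder path is just my example):

    ```powershell
    # Create the backup folder if needed, then export all unsealed MP's to it
    New-Item -ItemType Directory -Path C:\MPBackup -Force | Out-Null
    Get-SCOMManagementPack |
        Where-Object { $_.Sealed -eq $false } |
        Export-SCOMManagementPack -Path C:\MPBackup
    ```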

     

    2. Apply the hotfix to the Management Servers.

    Pro Tip #1: Here is a tip that I have seen increase the success rate: Reboot your Management Server nodes before starting the update. This will free up any locked processes or WMI processes that are no longer working, and reduce the chances of a timeout for a service stopping, file getting updated, etc. This is not a requirement, just something to consider if you have had issues applying such a fix.

    Pro Tip #2: If you are running any SDK based connectors – it is a good idea to stop these first. Things like a Remedy product connector service, Alert Update Connector, Exchange Correlation Engine, etc… This will keep them from throwing errors like crazy when setup bounces the SDK services.

    I start by running the download, SystemCenter2012OperationsManager-UR2-KB2731874-X86-X64-IA64-ENU.exe. This is simply an extractor. You can run this anywhere, and provide a location for the update. No longer do we need to run this extractor on each server role. We can now simply extract these files to a network share, and update all server roles from there.

    Here are the files after extraction:

    image

     

    I start on my first management server, OMMS1. This has the Server role, Web Console Role, and Console installed. So I will need to run 3 MSP’s.

    I will start by running KB2731874-AMD64-Server.msp.  Right click the file and choose “apply” or call it from an elevated CMD.

    Pro Tip: Open an ELEVATED COMMAND PROMPT and run these MSP files from the command prompt. User Access Control (UAC) will still block a successful install in many cases, so you will experience greater success if you always run updates from an elevated CMD.  Even if you *think* you don’t have UAC enabled.

    Pro tip:  The “AMD64” files are for any 64bit server.  This is what you will always execute on a server role, since SCOM 2012 ONLY supports 64 bit servers for server roles.  The “i386” files were included only for 32bit agents, and where the console is installed on 32bit workstations.

     

    You will see a dialogue box like below:

     

    image

    image

    The server update goes pretty quickly, depending on how long your server takes to restart the OpsMgr services.  It bounced the DAS, Config, and SC Mgmt services during the update. 

    ***Note – you MIGHT get a message that a reboot is required, if any of the files to be updated are locked and cannot be replaced.  If so, I recommend rebooting before continuing.

    Next, I will update the web console by running KB2731874-AMD64-WebConsole.msp. This only takes a few seconds.

    Next, I will update the Console files. I need to make sure that I close any open consoles on this server, from my session or any other logged in sessions via RDP. I will apply KB2731874-AMD64-Console.msp. Again – this only takes a few seconds to run.

    Now – I am done updating all three installed server roles on this server. I can spot check to make sure the files got updated:

    Server role update: From \Program Files\System Center 2012\Operations Manager\Server

    image

    Web Console role update: From \Program Files\System Center 2012\Operations Manager\WebConsole\WebHost\bin

    image

    Console role update: From \Program Files\System Center 2012\Operations Manager\Console

    image
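    Rather than eyeballing each folder in Explorer, a quick PowerShell spot-check of the file versions works too. The path below is the default Server role folder from above; repeat for the WebConsole\WebHost\bin and Console folders:

    ```powershell
    # List DLL file versions in the Server role folder
    Get-ChildItem 'C:\Program Files\System Center 2012\Operations Manager\Server' -Filter *.dll |
        Select-Object Name, @{Name='FileVersion'; Expression={ $_.VersionInfo.FileVersion }}
    ```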

     

    Next – I am ready to update the rest of my management servers. I only have one other MS – OMMS2. This only runs the console and server roles, so only two MSP’s to apply.

    I apply each MSP, takes a couple minutes.

    Another spot-check to make is to verify that the Agent binaries get updated on each Management Server role, for Windows agents. These are located at: \Program Files\System Center 2012\Operations Manager\Server\AgentManagement\ in the AMD64 or i386 directory:

    Check to ensure that the CU dropped the appropriate agent update file, such as KB2731874-amd64-Agent.msp.

    image

     

    3. Apply the hotfix to my Gateway Servers.

    I don’t have any gateways in my lab right now – but this would be a very simple execution of the KB2731874-AMD64-Gateway.msp file.

     

    4. Apply the hotfix to the OpsMgr Reporting server.

    I have SCOM reporting and SSRS installed on a SQL server, DB01.

    I log on and apply KB2731874-AMD64-Reporting.msp. This runs in a few seconds.

    Spot check: From \Program Files\System Center 2012\Operations Manager\Reporting\Tools

    image

     

    5. Apply the hotfix to my Web Console server

    I already did this in step 2 – but I could have waited and done this now, or patched any dedicated Web Console servers.

     

    6. Apply the hotfix my dedicated consoles (Terminal servers, desktop machines, etc…)

    I need to apply this update anywhere the console is installed – including consoles installed on management servers. I have already updated the console on my management servers OMMS1 and OMMS2. Now I have a Terminal Server where I run the console – so I will patch this server now, which only runs the console. Make sure you close the console and close out any other user sessions – or you might require a reboot to finish the update, since the console files will be locked by an open console. This also includes any open PowerShell windows connected to the SDK. You will get a warning if setup detects any. You can run the same spot-check for the file version update as above. (Note – Help/About will not show the updated version in the console – this stays the same major version, so don’t go looking there.)

     

    7. Import the management pack updates

    Open a console – and import the files previously extracted:

    Microsoft.SystemCenter.DataWarehouse.Library.mp

    Microsoft.SystemCenter.Visualization.Library.mpb

    Microsoft.SystemCenter.WebApplicationSolutions.Library.mpb

    image

    These import in a few minutes without a hitch.

     

    8. Agents: Apply the hotfix to my agents by approving them from pending

    I open the console – Administration > Device Management > Pending Management, and see all my agents in pending that are not manually installed (Remotely Manageable = Yes)

    Let me stop and talk about how agents get into Pending. This is a ONE TIME operation, which occurs at the exact moment that you run the CU on a management server.  The CU runtime will look for all agents ASSIGNED to that management server that are Remotely Manageable (not manually installed), and will put ONLY those agents into pending at that time. We will NEVER go back and re-inspect to put old agents into pending, because there is no SCOM workflow that handles this.  It is done only by applying the update. If you don’t have agents in pending – you either aren't running the MSP from an elevated cmd, or you aren't running the update as a user that is a SCOM admin, a Local OS admin, AND has the SQL SysAdmin role on the databases.

     

    image

     

    I right click all mine – provide an account that has rights to deploy/install the update remotely, and kick it off.

    100% Success!

    How can I tell?  Open the Console, Monitoring > Operations Manager > Agent Details > Agents by Version (state view):

    image

    9. Agents: Update manually installed agents…. well, manually.

    I simply run the file KB2731874-AMD64-Agent.msp or KB2731874-i386-Agent.msp depending on which OS version (64bit or 32bit) I am updating. I can deploy these manually, or via software distribution packaging up each MSP and applying them to the correct OS by version/architecture.
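    To double-check the agent patch level across the whole management group afterward, a quick sketch from the Operations Manager Shell (Get-SCOMAgent is from the OperationsManager module):

    ```powershell
    # Group all Windows agents by reported version to find any stragglers
    Get-SCOMAgent |
        Group-Object Version |
        Sort-Object Count -Descending |
        Select-Object Count, Name
    ```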

    10. Unix/Linux: Download and import the updated MP’s for Unix/Linux monitoring

    The Unix/Linux update files are located in a separate download. This includes new versions of the management packs, and updated agent binaries. I go to the link in the KB article and grab the update file, Monitoring Pack for UNIX and Linux Operating Systems.msi. It is the typical MP installer/extractor program. Run setup and extract these files to your preferred location. I generally just recommend installing/extracting these to the default location on any workstation/server, and then copying the extracted files to your OpsMgr software/MP/Updates network share.

    Download link: http://www.microsoft.com/en-us/download/details.aspx?id=29696

    Next – import all the files for the versions of Unix/Linux agents that you support. The RTM version of the Unix/Linux MP’s was 7.3.2026.0. The UR1 version is 7.3.2097.0.  The UR2 version is 7.3.2119.0.

    ***Note – several of these new MP files are actually in the new .MPB format. This new format allows management packs to take advantage of the new schema extensions in System Center 2012, to be able to transfer binaries among other items – such as the agent binaries.

    image

     

    11. Unix/Linux: Upgrade the Unix/Linux agent providers

    Once you have imported the updated MP files for Unix/Linux agents, spot check that these files got updated on your management servers. Look in the following folder:

    \Program Files\System Center 2012\Operations Manager\Server\AgentManagement\UnixAgents\DownloadedKits

     

    image

     

    Notice the new update files are version 1.3.0-218. Also, it may take some time for this folder/files to show up, after importing the MP’s, as your database and management servers will be experiencing high CPU utilization from SDK and Config activities, and this is normal.  So be patient. You need to ensure that these files are updated on any members of your management server resource pool that monitors Unix/Linux agents before continuing. 

    *** Note – there is a minor issue with these Unix/Linux files. They should be copied over to this folder automatically, however they are not. You might have to restart the System Center Management service, after waiting for 10-20 minutes for the post import processes to complete. After bouncing the System Center Management service on each management server, the binaries should show up. HOWEVER – it does appear we are leaving behind the previous version files, so you will see the new 218 version and the previous 214 version from UR1 if you applied that.  This should be cleaned up in a future cumulative update.
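    The service bounce above can be done from PowerShell; the underlying Windows service name for "System Center Management" is HealthService:

    ```powershell
    # Restart the System Center Management service on this management server
    # (run locally on each MS, or wrap with Invoke-Command for remoting)
    Restart-Service HealthService
    ```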

     

    Now – to update my Unix/Linux agents. I can use the Update-SCXAgent cmdlet in PowerShell, or I can use the Administration wizard. I right click my version-214 agent, and choose Upgrade Agent.

    image

     

    image

    You can use an existing RunAs account profile if you have configured these with an agent upgrade account that has the necessary rights, or you can input new credentials using an account that has SSH privileges in order to remotely install software on the Unix/Linux agent machine. If all goes well – you get a success:

    image

    image

    Now – the update is complete.

    image

    The next step is to implement your test plan steps. You should build a test plan for any time you make a change to your OpsMgr environment. This might include scanning the event logs on the management servers for critical and warning events, looking for anything new or serious. Test that reporting is working, test your web console, check the database for any unreasonable growth, and run queries to see if anything looks bad from a most-common alerts, events, perf, and state perspective. Run a perfmon and ensure your baselines are steady, and that nothing is different on the database or management servers. If you utilize any product connectors – make sure they are functioning.
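    For the event log portion of that test plan, here is a hedged sketch that surfaces the noisiest recent errors and warnings on a management server:

    ```powershell
    # Summarize Operations Manager errors/warnings from the last 24 hours,
    # grouped by source and event ID, noisiest first
    Get-EventLog -LogName 'Operations Manager' -EntryType Error, Warning -After (Get-Date).AddDays(-1) |
        Group-Object Source, EventID |
        Sort-Object Count -Descending |
        Select-Object -First 20 Count, Name
    ```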

    The implementation of a solid test plan is very important to change management. Please don't overlook this step.

  • OU, Logical Processor, Physical Processor, and other class properties are not discovered with OpsMgr 2012 on Windows Server 2003

    You may notice on your Windows Server 2003 servers monitored with OpsMgr 2012, that some class properties on Windows Computer instances are not discovered.  Organizational Unit, IP Address, Logical Processor, Physical Processor, etc.

     

    image

     

    Windows Server 2003 machines need a hotfix for this to work, due to some changes that were apparently made to the Windows Computer discovery script in OpsMgr 2012.

     

    The hotfix is http://support.microsoft.com/kb/932370  “The number of physical hyperthreading-enabled processors or the number of physical multicore processors is incorrectly reported in Windows Server 2003”

    Once you apply this, changes are made to the Win32_ComputerSystem and Win32_Processor classes in WMI to make them similar to Windows Server 2008 and later.  Makes me think that the SCOM team isn't testing a whole lot with Windows Server 2003 anymore.
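    An easy way to check whether a given Windows Server 2003 box already has the fix: KB932370 adds the NumberOfCores and NumberOfLogicalProcessors properties to Win32_Processor, so query WMI and see if they come back empty:

    ```powershell
    # On the Windows Server 2003 machine - empty NumberOfCores /
    # NumberOfLogicalProcessors values suggest KB932370 is not applied
    Get-WmiObject Win32_Processor |
        Select-Object Name, NumberOfCores, NumberOfLogicalProcessors
    ```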

     

    After applying the hotfix and a reboot, the info comes in quickly.

    image

     

     

    Thanks to PFE Greg Davies for bringing this to my attention, and credit goes to Cameron Fuller and RobK for finding the solution.  Cameron has a quick way of using groups to find out who needs this hotfix quickly.

    http://www.systemcentercentral.com/BlogDetails/tabid/143/IndexID/94314/Default.aspx

    http://www.systemcentercentral.com/Forums/tabid/60/categoryid/4/indexid/93243/Default.aspx

    Apparently Daniele was recommending this ages ago – I am just catching up to these guys. 

    http://nocentdocent.wordpress.com/2011/09/17/are-your-windows-2003-vms-reporting-just-1-processor/

  • OpsMgr: MP Update: Exchange 2010 SP2 MP version 14.02.0247.005 (14.3.38.2 actual mp version) is released

     

    ****Note 14.3.38.2 has been pulled.  If you implemented this already, please see

    http://blogs.technet.com/b/exchange/archive/2012/06/28/mailboxes-on-a-database-are-quarantined-in-an-environment-with-system-center-operations-manager.aspx

     

     

     

     

    Version 14.02.0247.005 (14.3.38.2 MP version) of the Exchange 2010 MP has been released.  This is the SP2-compatible version of the MP that many have been expecting.

    Available here:  http://www.microsoft.com/en-us/download/details.aspx?id=692

     

    What’s new?  From the guide:

     

    • Resolved ObjectNotFoundExceptions in correlation engine   The SP1 version of the Correlation Engine could encounter ObjectNotFoundExceptions on a regular basis.  The number of exceptions of this type is significantly reduced in this update.
    • Reduced load on RMS/MS   A number of improvements were made to reduce the load of the Management Pack on the RMS/MS.  The following specific changes were made:
      • Reduced the number of read operations the Correlation Engine makes to the SDK to get entity and monitor states
      • Improved cache handling in Correlation Engine when Management Pack updates are applied
      • Increased correlation interval time from 1.5 minutes to 5 minutes
    • Reduced load due to discovery  The discovery interval was increased from 4 hours to 24 hours and improved handling of Domain Controller objects to decrease churn
    • Improved Database Copy Health monitoring  Replaced KHI: Database dismounted or service degraded with One Healthy Copy monitor to decrease load on RMS
    • Improved Performance monitoring  Non reporting Perf Instances are now enabled by default and some write operations were removed to decrease unnecessary writes to the database

     

    Doesn’t sound like too many major changes.  Hopefully this will help the MP scale better for very large Exchange 2010 environments and have less of an impact on the management group for all environments.

     

    My experience:

     

    One of the things the guide fails to tell you, is how to *upgrade* to this version of the Correlation Engine and management pack.  It is essentially the same process as installing new.  Just keep in mind if you had previously customized items or configuration, you will need to revisit those customizations.

     

    I download and install the correct MSI of the Exchange pack for my OS version (64bit), by executing Exchange2010ManagementPackForOpsMgr2007-x64.msi.

    As you step through the installer, you need to ensure you provide the correct locations if you had changed these previously.

    image

     

    The install/extraction begins…. it can hang for a long time on “Stopping Services”:

     

    image

     

    When complete:

     

    image

     

    Apparently this update is going to require a reboot of your management server where you run the Exchange correlation engine, so be prepared for that!

     

    image

     

    Log Name:      Application
    Source:        MsiInstaller
    Date:          6/18/2012 12:42:34 PM
    Event ID:      1038
    Task Category: None
    Level:         Information
    Keywords:      Classic
    User:          OPSMGR\kevinhol
    Computer:      OMMS2.opsmgr.net
    Description:
    Windows Installer requires a system restart. Product Name: Microsoft Exchange 2010 Management Pack for OpsMgr 2007. Product Version: 14.3.38.2. Product Language: 1033. Manufacturer: Microsoft Corporation. Type of System Restart: 2. Reason for Restart: 0.

    So, I reboot my management server that is running the correlation engine. 

     

    First thing I noticed – my service was reset from running as my SDK account back to Local System.  This is not documented behavior, so watch out if you were previously running your correlation engine under a defined account.
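    A quick way to spot this after the upgrade is to query the service's logon account. The service name filter below is an assumption – adjust it to match the actual correlation engine service name on your server:

    ```powershell
    # Show which account the Exchange correlation engine service runs as
    # (name filter is an assumption - verify against Get-Service output)
    Get-WmiObject Win32_Service -Filter "Name LIKE '%Correlation%'" |
        Select-Object Name, StartName, State
    ```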

    image

     

     

    Next – I go looking for the updated management packs, in the default folder:

    Before:

    image

    After:

    image

     

    So I import the two updated MP’s.  You can see that, for some reason, the download version and MP version don’t match for these MP’s.

    image

     

    We can also see the updated version in Programs and Features:

    Before:

    image

    After:

    image

     

     

    Next, I’ll have a peek at the actual correlation engine binary and config files:

     

    Before:

    image

    After:

    image

     

    Open up my Microsoft.Exchange.Monitoring.CorrelationEngine.exe.config file:

     

    Before:

    image

    After:

    image

     

    You can see we have changed the interval for the CE to connect to the SDK from 90 seconds to every 5 minutes.  Apparently we have also changed the number of days of logs kept from 7 to 30; these are kept at C:\Program Files\Microsoft\Exchange Server\v14\Logging\MonitoringLogs\Correlation

     

    Overall, pretty straightforward.

  • OpsMgr: MP Update: New Base OS MP 6.0.6972.0 Adds new cluster disks, changes free space monitoring, other fixes

    There is a new Base OS MP version 6.0.6972.0 available here:  http://www.microsoft.com/en-us/download/details.aspx?id=9296

     

    Be very careful updating to this new version – there are multiple changes and potential issues you should plan for and test with, that might impact your existing environments.  I will discuss them below.

     

    I previously wrote about the last MP update HERE and HERE.  Then I wrote about some issues in the MP’s with Logical Disk monitoring HERE.  Additionally, there were some problems with the network monitoring utilization scripts HERE.  All of these items have been addressed, at least in part, in this latest MP update.

     

    First – let’s cover the list of updates from the guide:

    Changes in This Update

    •    Updated the Cluster shared volume disk monitors so that alert severity corresponds to the monitor state.
    •    Fixed an issue where the performance by utilization report would fail to deploy with the message “too many arguments specified”.
    •    Updated the knowledge for the available MB monitor to refer to the Available MB counter.
    •    Added discovery and monitoring of clustered disks for Windows Server 2008 and above clusters.
    •    Added views for clustered disks.
    •    Aligned disk monitoring so that all disks (Logical Disks, Cluster Shared Volumes, Clustered disks) now have the same basic set of monitors.
    •    There are now separate monitors that measure available MB and %Free disk space for any disk (Logical Disk, Cluster Shared Volume, or Clustered disk).

    Note:  These monitors are disabled by default for Logical Disks, so you will need to enable them if you want to use them in place of the default Logical Disk monitor for free space.

    •    Updated display names for all disks to be consistent, regardless of the disk type.
    •    The monitors generate alerts when they are in an error state.  A warning state does not create an alert.
    •    The monitors have a roll-up monitor that also reflects disk state. This monitor does not alert by default. If you want to alert on both warning and error states, you can have the unit monitors alert on warning state and the roll-up monitor alert on error state.
    •    Fixed an issue where network adapter monitoring caused high CPU utilization on servers with multiple NICs.
    •    Updated the Total CPU Utilization Percentage monitor to run every 5 minutes and alert if it is three consecutive samples above the threshold.
    •    Updated the properties of the Operating System instances so that the path includes the server name it applies to so that this name will show up in alerts.
    •    Disabled the network bandwidth utilization monitors for Windows Server 2003.
    •    Updated the Cluster Shared Volume monitoring scripts so they do not log informational events.
    •    Quorum disks are now discovered by default.
    •    Mount point discovery is now disabled by default.

    Notes:  This version of the Management Pack consolidates disk monitoring for all types of disks, as mentioned above. However, for Logical Disks, the previous Logical Disk Free Space monitor, which uses a combination of Available MB and %Free space, is still enabled.  If you prefer to use the new monitors (Disk Free Space (MB) Low and Disk Free Space (%) Low), you must disable the Logical Disk Free Space monitor before enabling the new ones.
    The default thresholds for the Available MB monitor are unchanged: the warning threshold (which will not alert) is 500MB and the error threshold (which will alert) is 300MB. This will cause alerts to be generated for small disk volumes. Before enabling the new monitors, it is recommended to create a group of these small disks (using the disk size properties as criteria for the group) and override the threshold for Available MB.

    Ok, sounds good.  But what does all that mean to me?

     

    I will summarize the fundamental changes below:

     

    1.  Disk discovery and monitoring has changed.  We now will UNDISCOVER any “Logical Disks” that are hosted by a Windows Server 2008 R2 cluster, and REDISCOVER those as a new entity, of the “Cluster Disk” class.  This discovery only pertains to Windows Server 2008 R2 and later, it does not affect Server 2008 and older clusters.

     

    There are now THREE types of disks we will discover and monitor:

    • Logical Disks
    • Cluster Disks
    • Cluster Shared Volumes

    Logical Disks include disks that are not part of (or hosted by) a cluster: disks with a drive letter, and disks without a drive letter (which are discovered as mount points).

    Cluster Disks include any disk that is hosted by a Microsoft Cluster as a shared resource, but not a specific Cluster Shared Volume.

    Cluster Shared Volumes are a specific type of cluster disk, leveraged by Hyper-V clusters for the placement of virtual machines.

    For most customers, the impact is this: if you have placed any instance- or group-specific overrides on your cluster disks, those overrides will no longer apply, because these disks are going to be rediscovered as instances of a new class, “Cluster Disk”.  This new class has entirely different monitoring targeted at it, described below.

    However, this is a GOOD thing!  In the past, if you had a disk that was part of a cluster, it was undiscovered and rediscovered on each NODE when a failover occurred.  If you did overrides for the disk while it was on one node, your changes would no longer apply when it failed over to another node, because it was literally discovered as a different disk (a different BaseManagedEntity)!  This is now resolved – the disk will retain the same BaseManagedEntityId (its unique GUID under the covers in SCOM) as it moves from node to node.  It is also now “hosted” by the cluster, not by the Operating System class.

    I put together a state dashboard that demonstrates these different disk types:

     

    image

     

    There are also distinct views for these that ship inside the management pack:

    image

     

    Another point to make here is that the Mount Point discovery, which was enabled in all previous Base OS MP’s, is now DISABLED.  This means you will no longer discover mount points by default.  You can enable this via override if you want mount point discovery, or selectively enable it only for specific servers that you know host a mount point you wish to monitor.

    Our mount point discovery is a bit misleading.  We don’t only discover mount points; we actually use the mount point discovery to discover ANY disk that does not have a drive letter assigned.  For instance, you may have noticed that your Server 2008 R2 machines discovered a 100MB logical disk.

     

    image

     

    These 100MB disks are the System Reserved partition, which holds the boot loader and supports Bitlocker.  Once you upgrade to the new MP version, new mounted disks (non-clustered disks with no drive letter) will no longer be discovered, as this discovery is disabled by default.  This will NOT remove the previously discovered disks, however.  Neither will running Remove-DisabledMonitoringObject.  The reason that Remove-DisabledMonitoringObject does NOT remove these discovered disks is that it only removes objects when there is an explicit *override* for a discovery, disabling it.  If we change the default configuration of a discovery to disabled, the cmdlet has no impact.  So if you want to remove these from your management group, you simply need to add an explicit override disabling the mount point discovery, and THEN run the cmdlet.  Keep in mind – doing this will undiscover ALL your mounted disks, possibly including real mount points if you have those.  As there is ZERO value in discovering and monitoring these 100MB disks, I’d recommend disabling the mounted disk discovery with an explicit override, then creating instance-specific or group-specific overrides for the servers that DO host a mounted disk.
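    The cmdlet’s quirk described above can be sketched in a few lines (a simplified Python model, not the SCOM object model; the dictionary keys are invented for illustration):

```python
# Hypothetical model (invented dictionary keys, not the SCOM object
# model) of why Remove-DisabledMonitoringObject leaves these disks
# behind: the cmdlet only removes objects whose discovery is disabled
# by an explicit OVERRIDE, not by the discovery's default setting.
def cmdlet_removes_objects(discovery):
    # True only when an explicit override disables the discovery
    return discovery.get("override_enabled") is False

mount_point_discovery = {"default_enabled": False}    # new MP default: disabled
print(cmdlet_removes_objects(mount_point_discovery))  # False - objects remain

mount_point_discovery["override_enabled"] = False     # admin adds explicit override
print(cmdlet_removes_objects(mount_point_discovery))  # True - cmdlet removes them
```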

     

     

    2.  Logical Disk free space monitoring, along with Cluster Disk and Cluster Shared Volume monitoring has changed.  Here are the details:

    The default configuration of the “Logical Disk Free Space” monitor is largely UNCHANGED from MP version 6.0.6958.0, which I wrote about HERE.  This was done to create the lowest possible impact on you, the admin, who is using this monitor, likely already has many overrides in place, and has wired this alert into ticketing systems.  There were many complaints that this monitor (once it was modified to allow for consecutive samples) no longer generated alerts that contained % free space and MB free in the alert description.  This is still the case in this version – the monitor was not modified.  This monitor will also generate alerts for warning state AND critical state, which is NOT a good thing.  When a single monitor alerts on both warning and critical states, a *new* alert is *not* generated when the monitor changes from warning to critical; we simply modify the existing alert from warning to critical (if it exists in an open state).  This modification will NOT trigger a new notification subscription, nor will it route the alert to a connector subscription set with a filter for “critical” severity alerts, because the alert has already been inspected and watermarked.  For this reason I never recommend using three-state monitors that alert on both a warning and a critical state.

    However, another complaint we often got was that customers didn’t understand how this monitor worked: we inspect BOTH the % free threshold AND the MB free threshold, and BOTH conditions need to be met before we will change the state of the monitor and generate an alert.  This is a very good design, because it cuts out the majority of noise and remains flexible for disks of different sizes.  That said, many customers would say “I just want a simple monitor to alert on % free ONLY, or MB free ONLY…”, which was easier for them to understand.  Therefore, we have added THREE new monitors for disk space monitoring of logical disks.
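    The two-condition check can be sketched roughly like this (illustrative Python, not the actual MP script; all threshold values shown are examples, not the MP defaults):

```python
# Illustrative sketch of the classic Logical Disk Free Space logic:
# BOTH the % free AND the MB free thresholds must be breached before
# the monitor changes state. Threshold values here are examples only.
def free_space_state(total_mb, free_mb,
                     warn_pct=10, warn_mb=500,
                     error_pct=5, error_mb=300):
    pct_free = free_mb / total_mb * 100
    if pct_free < error_pct and free_mb < error_mb:
        return "critical"
    if pct_free < warn_pct and free_mb < warn_mb:
        return "warning"
    return "healthy"

# A large disk at 4% free still has ~80 GB free: the % threshold is
# breached, but the MB threshold is not, so the state stays healthy.
print(free_space_state(total_mb=2_000_000, free_mb=80_000))  # healthy
```

    This is why the combined check is quiet on big volumes: a 2TB disk at 4% free is nowhere near out of space in absolute terms.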

    These new monitors are disabled by default, to allow customers to choose whether they want to implement them.  What we have done is create two new unit monitors, one for % free and one for MB free, and place both of these under an aggregate rollup monitor.

     

    image

     

    If enabled, the customer can pick if they want only %, or only MB free, or both, via overrides.  These new Unit monitors also provide a richer alert description as seen below:

    The disk F: on computer computer1.domain.com is running out of disk space. The value that exceeded the threshold is 28 free Mbytes.

    The disk F: on computer computer1.domain.com is running out of disk space. The value that exceeded the threshold is 4% free space.

    Additionally, if the customer DOES want alerts on warning state for these monitors, they can enable this, and additionally enable alerting on the Aggregate rollup monitor above, to issue critical alerts only.  This way, you can have unique alerting for a warning state, but if any monitor is critical, we can roll up health and generate a NEW alert for critical state, which can be used to send a notification or send to a ticketing system.
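    The alerting split between the unit monitors and the rollup can be modeled like this (a simplified sketch, not how SCOM implements health rollup internally):

```python
# Illustrative Python model (not SCOM code) of the new monitor layout:
# two unit monitors (% free, MB free) roll up to a worst-of aggregate.
# Units are set to alert on warning; the rollup alerts only on critical,
# so a disk that worsens from warning to critical yields a brand-new
# alert from the rollup monitor.
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}

def rollup_state(unit_states):
    # Aggregate state is the worst of the unit states
    return max(unit_states, key=SEVERITY.get)

def alerts_raised(unit_states):
    raised = [("unit", s) for s in unit_states if s == "warning"]
    if SEVERITY[rollup_state(unit_states)] >= SEVERITY["critical"]:
        raised.append(("rollup", "critical"))
    return raised

print(alerts_raised(["warning", "healthy"]))   # [('unit', 'warning')]
print(alerts_raised(["critical", "healthy"]))  # [('rollup', 'critical')]
```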

    As you can see, a lot of thought went into this new design, trying to make the new format fit as many customer requested scenarios as possible.  You essentially have three options now:

     

    • Continue to use the existing Logical Disk Free space monitor that is provided and enabled in the management pack.
    • Enable and start using the newly designed Logical Disk free space monitors, based on your specific requirements.
    • Use my addendum MP which uses a single free space monitor that is similar to the old Base OS management packs, described and available HERE

     

    For Cluster Disks and Cluster Shared Volume disks, both are using the new format for free disk space monitoring:

     

    image

    image

     

    Based on this, I’d recommend considering and testing a move of your logical disk free space monitoring over to the new style as well, to have a consistent experience.  I welcome your feedback on this point.

     

    ***Note – if you enable the new Logical Disk free space monitors, the MB Free monitor will go into a critical state for any Logical disk that is under 2GB (non-system) or 500MB (system).  This means if you have any tiny disks, such as the 100MB Bitlocker disks, this monitor will alert on all of them, potentially creating a large number of alerts.  I’d recommend undiscovering those 100MB disks (see #1 above), or creating a dynamic group of disks in your override MP based on “size is less than a specific numerical size” and using that group to disable free space monitoring.

     

    3.  The previous “Cluster Shared Volume” MP, which was “Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.mp”, has a new display name of “Windows Server Cluster Disks Monitoring”.  The new classes for Cluster Disks mentioned above are included in this MP, so if you didn’t import it previously because you weren't using Hyper-V Cluster Shared Volumes, you need this MP now to discover and monitor clustered disks.

     

    4.  We have disabled the Network Utilization scripts by default on Server 2003, and fixed them for Server 2008 so that they consume fewer resources.  I wrote about this previously HERE.  This should now be addressed, so if you previously disabled these but want that counter for alerting or perf collection, you can consider enabling it.  It should REMAIN disabled for Windows 2003, as there is an issue with Netman.dll which can crash services.

     

    5.  The “Total CPU Utilization Percentage” monitor was changed.  In previous management packs, it would inspect the value every 2 minutes, and if the AVERAGE of 5 samples for “CPU Queue Length” AND “% Processor Time” were over their default thresholds, we would generate an alert.  Now we inspect the value every 5 minutes, and if the AVERAGE of 3 samples for both counters is over the thresholds, an alert is generated.  I am told this change was made on customer request, I assume to spread the evaluation over a longer time span… not really sure.  It seems fairly insignificant.
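    The new sampling behavior can be sketched like this (the 95% threshold is an assumption for illustration, and only the % Processor Time counter is modeled; the real monitor also evaluates CPU Queue Length):

```python
from collections import deque

# Rough model of the updated Total CPU Utilization Percentage monitor:
# a sample every 5 minutes, alerting only when the AVERAGE of the last
# 3 samples exceeds the threshold (previously: 5 samples every 2 min).
def make_cpu_check(samples=3, threshold=95):
    window = deque(maxlen=samples)
    def check(cpu_pct):
        window.append(cpu_pct)
        return len(window) == samples and sum(window) / samples > threshold
    return check

check = make_cpu_check()
readings = [99, 99, 80, 97, 99, 99]  # one low sample pulls the average down
print([check(r) for r in readings])  # [False, False, False, False, False, True]
```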

     

     

    Known Issues/Things to remember:

     

    1.  Which MP’s to import:  This MP update contains the following files:

    image

    Don’t import management packs that you don’t need or use. 

    Don’t import the BPA management pack if you don’t want to see alerts for this new feature.

    Don’t import the Microsoft.Windows.Server.Reports.mp if your back-end SQL is still running SQL 2005, this MP is supported on SQL 2008 and newer only.  It will cause your reporting to break if you import this MP and your management group leverages SQL 2005 on the back-end.

    DO import the Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.mp, because it contains the discovery and monitoring for Cluster Disks, not just Cluster Shared Volumes.  If you don’t import it, you will have no monitoring of clustered disks.

     

    2.  The knowledge for the Total CPU Utilization Percentage is incorrect – the monitor was updated to a default value of 3 samples but the knowledge still reflects 5 samples.

     

    3.  There are no free space perf collection rules for “Cluster Disks”.  We have multiple performance collection rules for Logical Disks and for Cluster Shared Volumes; however, there are none for the new Cluster Disks class.  If you want performance reports on free space, disk latency, idle time, etc., you will need to create these rules yourself.

     

    4.  Perf collection and disk monitoring for cluster disks and CSV’s only work when the resource group hosting the disks is on the same node that is hosting the cluster name (quorum) resource.  If the disk’s resource group is running on a different node than the cluster name, perf collection and monitoring will cease.

  • Some of the beta Private Cloud exams from MMS are scored and available

    Many took the beta exams at MMS 2012 this year, for the Private Cloud certifications.

     

    http://www.microsoft.com/learning/en/us/certification/cert-private-cloud-all.aspx

     

    The two exams that most people took were

    Exam 70-246:  Monitoring and Operating a Private Cloud with System Center 2012

    and

    Exam 70-247: Configuring and Deploying a Private Cloud with System Center 2012

     

    If you took these and want to see your status – you can get this from the Prometric site:

    http://www.register.prometric.com/Index.asp

    Log in with your email and password, and then choose Candidate History.

     

    I only had time to take the 246 exam, and it was VERY in depth… but it looks like this one is in the books.

     

    image

  • OpsMgr: AD Client monitoring - There are not enough GC’s available, and other troubleshooting issues

    Ran into this one recently during a Proof of Concept at a customer using the ADMP.

    You implement the AD Management pack, then optionally enable AD Client monitoring per the ADMP Guide.  Almost immediately, you start getting a lot of alerts with high repeat counts that state “There are not enough GC’s available”.

     

    image

     

    No problem – we likely need to adjust the number of GC’s expected for the site; the default is 3.  So you go find the monitor that you assume is running this script, by scoping the Authoring pane to “Active Directory Client Perspective” and viewing monitors.  What we find is that the monitor “Active Directory Global Catalog Availability” is DISABLED by default:

     

    image

     

    So why are we generating this alert?

     

    One concept in MP authoring is that multiple workflows (including multiple rules and monitors) can share a common datasource.  A datasource might be something as simple as a script and a timed schedule.  Multiple rules and monitors can then all call that same script, pass the same parameters, and a single run of the script on a schedule can “feed” all the rules and monitors with the data they are expecting.  This is exactly the case here.

    There is a datasource in the AD Client MP, “AD_Client_GC_Availability.DataSource”.  This is simply a script and a timer for the interval to run.  The disabled monitor “AD_Client_GC_Availability.Monitor” uses a MonitorType of “AD_Client_GC_Availability.Monitortype”, which references this datasource.

    However – the monitor is disabled – so this script datasource should not be executing.

    UNLESS there is some other rule or monitor executing it.  We can find a rule in the MP called “AD Client GC Availability PerformanceCollection”.  This rule is ENABLED and calls the same datasource, passing the same parameters to the script.  What this means is: if you are going to place an override on a monitor or rule, and that monitor or rule shares a script datasource with OTHER monitors or rules, you MUST override ALL the monitors and rules that share the datasource with the same values.  This ensures that we do not break cookdown.  Situations like this SHOULD be documented in any MP guide, but they are often overlooked.
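    The cookdown behavior described above can be modeled in a few lines (an illustrative sketch; the parameter names are invented, not the datasource's real configuration):

```python
# Illustrative sketch of cookdown: workflows (rules/monitors) that
# reference the same datasource with IDENTICAL parameters share a single
# script execution; an override on only one of them breaks cookdown and
# spawns an extra execution. Parameter names here are invented.
def count_script_runs(workflows):
    # workflows: list of (datasource_name, params_dict) pairs;
    # each unique (datasource, params) combination runs once
    return len({(ds, tuple(sorted(p.items()))) for ds, p in workflows})

shared = [
    ("AD_Client_GC_Availability.DataSource", {"GCs": 3, "IntervalSeconds": 300}),
    ("AD_Client_GC_Availability.DataSource", {"GCs": 3, "IntervalSeconds": 300}),
]
print(count_script_runs(shared))  # 1 - one script run feeds both workflows

broken = [
    ("AD_Client_GC_Availability.DataSource", {"GCs": 3, "IntervalSeconds": 300}),
    ("AD_Client_GC_Availability.DataSource", {"GCs": 5, "IntervalSeconds": 300}),
]
print(count_script_runs(broken))  # 2 - cookdown broken, two script runs
```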

     

    The easiest solution to this issue is to disable this performance collection rule, especially since it is creating alert noise.  Or, you can override this rule (and the aforementioned monitor) to use a more reasonable number of GC’s.  After configuring this, you might consider enabling the GC availability monitor, if this is something you want to monitor.

     

    image

     

    I just turned off the performance collection rule, since this wasn’t a valuable perf collection for the customer, and it gets rid of another running script.

     

     

    Here is another, similar example when you enable AD Client monitoring.  There is a monitor for “AD Client Connectivity”.  This monitor uses a shared script datasource, “AD_Client_Connectivity.Datasource”.  This datasource has a required parameter, “LogSuccessEvent”, expecting “True” or “False”.

    This monitor is set to a default value of “false”.  HOWEVER, you find the event logs of your AD Client machines flooded every minute with event ID 5000: “AD Client Connectivity : The script 'AD Client Connectivity' has completed in n seconds.”

    Log Name:      Operations Manager
    Source:        Health Service Script
    Date:          6/4/2012 4:10:42 PM
    Event ID:      5000
    Task Category: None
    Level:         Information
    Keywords:      Classic
    User:          N/A
    Computer:      RD01.opsmgr.net
    Description:
    AD Client Connectivity : The script 'AD Client Connectivity' has completed in 3 seconds.

    Why is this success event being logged, even when the monitor is set to false?  Again, the answer is multiple rules and monitors sharing a single datasource but passing different parameters.  And again: if you are going to make a change to ONE monitor or rule, you MUST make the same change on ALL that share the same datasource.

    After a little review of the XML – we can see that the following rules and monitors reference this shared datasource:

     

    Rules:

    AD Client AD Client LDAP Bind Time Collection

    AD Client AD Client LDAP Ping Time Collection

    AD Client ADSI Client Search Time Collection

    Monitors:

    AD Client Connectivity Monitor

     

    What we can easily see is that the “log success event” parameter on the perf collection rules is set to True!  What I generally recommend, in order to support cookdown and allow a shared datasource to work properly, is to configure all overrides on the rules and monitors the same.  If you need different settings, such as intervals, understand that these workflows will likely not cook down, and you will have additional simultaneous scripts running, which may impact performance where a script uses considerable resources.

     

    ***Note – after further review, there is a bug in the AD_Client_Connectivity.Monitortype which causes the script to log a success event on every run, no matter what you input on the AD_Client_Connectivity.Monitor.  The parameter mapping is broken: instead of passing “False”, it passes the interval in seconds.  I think it might be interesting to some how to troubleshoot this – so I will include my steps:

    Troubleshooting a SCOM VBscript using RegMon/ProcMon:

     

    I start by looking at the script itself, hopefully if it is simple enough I can figure out what they are doing.  To find the scripts on a SCOM agent, browse to the \Program Files\System Center Operations Manager\Agent\Health Service State\ folder.  In here there will be one or more “Monitoring Host Temporary Files xx” folders.  Search the top level “Health Service State” folder in Windows Explorer for the name of the file, or “*.vbs” and scroll around until you find the script you are looking for:

     

    image

     

    COPY THE SCRIPT elsewhere to look at it and review it.  Editing the script in place will not fix ANYTHING, because these folders are torn down and rebuilt (re-extracted from the management packs) on each launch of the MonitoringHost.exe process.

    I found in the script where the LogSuccessEvent gets evaluated:

    If CBool(GetTestParameter(SCRIPT_NAME, LOG_SUCCESS, bLogSuccess)) Then
      CreateEvent EVENT_ID_SUCCESS, EVENT_TYPE_INFORMATION, "The script '" & SCRIPT_NAME & _
                    "' has completed in " & DateDiff("s", dtStart, Now) & " seconds."
    End If

    So next I need to understand what this function is doing:  GetTestParameter(SCRIPT_NAME, LOG_SUCCESS, bLogSuccess)

    In that function, I can see that the script checks a specific location in the registry; if the value isn't there, it uses the params passed to the script by SCOM.  This is COOL, because it gives us the ability to manually override the script without having to change something in SCOM.  Normally I wouldn’t consider this a good thing, as it adds a lot of code to your scripts, but in this case it is fantastic, because the MP has a bug and is sealed, so we don’t have an easy way to fix a busted monitortype.
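    That fallback pattern can be modeled like this (a pure-Python stand-in, not the actual VBScript or registry API; the registry is a dict here, and the key path is abbreviated from the post):

```python
# Stand-in for the override pattern the AD MP script uses: check a
# registry location first, and fall back to the parameter SCOM passed
# in. The registry is modeled as a dict; the key path is abbreviated.
def get_test_parameter(registry, script_name, param_name, passed_value):
    key = "...\\Script\\AD Management Pack\\Tests\\" + script_name
    return registry.get(key, {}).get(param_name, passed_value)

reg = {"...\\Script\\AD Management Pack\\Tests\\AD Client Connectivity":
           {"LogSuccessEvent": "False"}}
# The registry override wins over whatever SCOM passed to the script:
print(get_test_parameter(reg, "AD Client Connectivity", "LogSuccessEvent", "True"))  # False
# With no registry override, the passed-in value is used:
print(get_test_parameter({}, "AD Client Connectivity", "LogSuccessEvent", "True"))   # True
```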

    I load up my trusty ProcMon  http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx

    I configure it so that only Registry access is enabled:

    image

    I configure a filter so that I only see registry access from my specific VB script by name, and where the path is only near the registry location that I care about:

    image

    I enable Auto-scroll, then wait for my script to run.  (Hint – to speed things up, you can run the script manually, or override the workflow that runs it to run very often for testing.)

    From looking at the script, and the ProcMon output – I can see that we are looking for a non-existent registry key:

    HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\{management group GUID}\S-1-5-18\Script\AD Management Pack\Tests\

    What I can tell is that the script is looking for these registry overrides in the following location:

    HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\{management group GUID}\S-1-5-18\Script\AD Management Pack\Tests\AD Client Connectivity\

    The following parameters are supported (you can see this from the script, or from the regmon output):

      • FailureThreshold
      • LDAPPingTimeout
      • BindThreshold
      • SearchThreshold
      • LogSuccessEvent

     

    Voila!  So I create a String Value for LogSuccessEvent = False

    image

     

    And I no longer see these success events logged on the AD Client Perspective machines.  Normally, I’d recommend leaving these events alone, as they are helpful for troubleshooting.  Only if they are filling your logs and blocking your ability to see other SCOM events should you need to turn them off.

  • Windows Server 2012: Red dashboard right after install? Run BPA

    Windows Server 2012 Server Manager has a dashboard view for all your servers you connect in Server Manager:

     

    image

     

    This dashboard shows all your connected servers and their roles.  It will also display red if any of them have outstanding alerts.  In the above graphic, you can see that I have issues with “Manageability”.  “Manageability” is a hyperlink; clicking it brings up:

     

    image

     

    I can see the issue is that BPA results are unavailable, because I haven't run a BPA scan yet.  We can run a BPA scan via the UI or via PowerShell.

    In the UI, select a server or role and scroll down to Best Practices Analyzer.  Choose Tasks, then Start BPA Scan.

     

    image

     

    To do the same thing in PowerShell, see:  http://technet.microsoft.com/en-us/library/hh831400.aspx

    Open a PowerShell prompt as an Administrator.  Run “Get-BPAModel|Invoke-BPAModel” 

     

    image

    Note:  This may throw a ton of errors as it tries to inspect the BPA model for roles and features that are not present on this server.

     

    Once this completes and we refresh Server Manager, I can see I still have issues; however, they are no longer about the BPA not having been run under Manageability – they have shifted to issues with the BPA results, and my Hyper-V role shows healthy:

     

    image

     

    Now we can go into the BPA results and determine whether we want to correct or ignore each one.  I am going to exclude the results about File Services not being installed:

     

     

    image

     

    Voila!  Healthy Dashboard:

     

    image

  • Windows Server 2012: Creating a NIC TEAM for Load Balancing and Failover

    One of the new features in Windows Server 2012 is Microsoft NIC Teaming.  In the past, NIC teaming was handled by the NIC vendor’s driver and management software.  This was often problematic, as many issues with advanced applications and roles were caused by NIC driver and teaming issues; Microsoft did not support NIC teaming for Hyper-V networks because of this.  That is no longer the case – NIC teaming is now part of the operating system.

     

    Here I have three network adapters in my Windows 2012 RC server:  One for server management, and two for Hyper-V:

     

    image

     

    To enable NIC Teaming – click the “Disabled” link next to NIC Teaming in Server Manager:

     

    image

     

    The NIC Teaming UI pops up. 

     

    image

     

    CTRL+click each NIC that you want in a team, then right-click (or select Tasks) and choose “Add to New Team”:

     

    image

     

    This allows you to name the team.  I called mine “Hyper-V Team”

     

    image

     

    You can also select “Additional Properties” if you want to configure some advanced settings.  The TechNet section on teaming is here:

    http://technet.microsoft.com/en-us/library/hh831648.aspx

     

    The defaults are “Switch Independent” (no advanced configuration necessary on the switch), “Address Hash” (enables load balancing and bandwidth aggregation), and Standby Adapter (if one is configured, one NIC is active and the other is used for failover only).  Even though I am using these with Hyper-V, I will use the defaults.  If this were a production deployment with a high-density server, I’d give much deeper consideration to these settings based on my requirements.
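    The “Address Hash” idea can be illustrated with a toy sketch (this is NOT the algorithm Windows uses; it just shows how hashing flow addresses spreads different flows across team members while keeping each flow on one NIC):

```python
import zlib

# Toy illustration of address-hash load distribution: a deterministic
# hash of a flow's endpoints picks which team member carries it, so
# different flows spread across the NICs, but any single flow always
# lands on the same NIC.
def pick_nic(src, dst, team_size=2):
    flow = f"{src}->{dst}".encode()
    return zlib.crc32(flow) % team_size

# The same flow always maps to the same NIC; other flows may map elsewhere.
print(pick_nic("10.0.0.1", "10.0.0.9"))
print(pick_nic("10.0.0.2", "10.0.0.9"))
```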

    Click OK, and our team config is complete!

     

    image

     

    If you look at the Network Connections, you will see a new (virtual) NIC:

     

    image

     

    You will see your Physical NIC’s only have a single binding to the Microsoft Network Adapter Multiplexor Protocol driver:

     

    image

     

    And the Virtual Team NIC has the normal bindings:

     

    image

     

    You can also configure NIC Teaming via PowerShell, and you can even configure NIC teaming on remote servers via Server Manager!

     

    image

     

     

    So simple.  So standardized.  I love it.

    For deeper technical information:  http://technet.microsoft.com/en-us/library/hh831648.aspx

  • OpsMgr 2007 R2 CU6 rollup hotfix ships – and my experience installing it

    Cumulative Update 6 (CU6) for OpsMgr 2007 R2 has shipped.

    image

     

    Our previous cumulative update (CU5) shipped back in July 2011, so this one has been a while in the making.

     

    The KB article describing the fixes, changes, and instructions:

    http://support.microsoft.com/kb/2626076

    Get it from the download Center:

    http://www.microsoft.com/en-us/download/details.aspx?id=29850

    List of all OpsMgr R2 Cumulative Updates:

    http://support.microsoft.com/kb/2453149

     

    Here are the high level fixes:

      Cumulative Update 6 for Operations Manager 2007 R2 resolves the following issues:

      • RMS promotion fails if NetworkName and PrincipalNames are not in sync for agents.
      • UI is limited to only 100 MB for the Memory Usage field in the wizard.
      • Additional OIDs in auth certificate are not processed correctly.
      • AEM creates duplicate computer objects in OpsMgr based on Agents NetBIOS name.
      • Cannot open reporting pane on OpsMgr 2007 R2 remote console.
      • Cannot view schedule for scheduled report.
      • ManagementServerConfigTool with the option "promoterms" fails because it stops polling the SDK Service.
      • OpsMgr reports are failing on Windows 7 with the error: "Cannot initialize report."
      • ACS events have "n/a" as their category in the ACS database.
      • Watch agentless monitoring listener to detect failure to respond.
      • SCOM SDK memory leak on cryptography keys and cryptography contexts.
      • After you click Edit Schedule, a message box appears, and you cannot save the change value.
      • Audit events can be lost when the AdtServer process crashes.

     

    Cumulative Update 6 for Operations Manager 2007 R2 resolves the following cross-platform issues:

    • The installation process for the IBM AIX 6.1 agent incorrectly checks for AIX 5.3 packages.
    • After a system restart, the OpsMgr agent for Solaris may start to run before local file systems are mounted.
    • On Red Hat Linux version 4 and SUSE Linux version 9, LVM disks are not discovered and cannot be monitored.
    • The OpsMgr agent for AIX does not report the arguments for monitored processes.
    • When Microsoft security update MS12-006 is installed on an OpsMgr management server, that management server can no longer communicate with the OpsMgr agent on any Linux or UNIX server.
    • On HP-UX, OpsMgr cannot discover and monitor a logical volume that is composed of more than 127 physical volumes.

     

    Cumulative Update 6 for Operations Manager 2007 R2 adds the following cross-platform features:

    • Support for IBM AIX 7.1 (POWER).
    • Support for Oracle Solaris 11 (x86 and SPARC).

     

    Note The new agents for IBM AIX 7.1 and Oracle Solaris 11 are included in Cumulative Update 6 for Operations Manager 2007 R2. You can download the management packs for these new operating system versions by going to the following Microsoft website:

    System Center Operations Manager 2007 R2 Cross Platform Monitoring Management Packs

     

     

     

    Let’s Roll:

    So – first – I download it. The hotfix is about 1000MB.

    Now – before your heart rate starts rising…. understand… this update combines the Cross Plat CU with the OpsMgr CU. (CU3, CU4, and CU5 did this as well) Aligning these is a very good thing – but it ends up increasing the size of the initial download. No worries though – I will demonstrate how to only have to copy specific files to lessen the impact of distributing this update to all your management servers and gateways, if copying a 1GB file around is a problem for you. Read about that here: http://blogs.technet.com/b/kevinholman/archive/2010/10/12/command-line-and-software-distribution-patching-scenarios-for-applying-an-opsmgr-cumulative-update.aspx

    Next step – READ the documentation… understand all the steps required, and formulate the plan.

    I build my deployment plan based on the release notes in the KB article. My high level plan looks something like this:

    1. Backup the Operations and Warehouse databases, and all unsealed MP’s.
    2. Apply the hotfix to the RMS
    3. Run the SQL script(s) update against the OpsDB AND Warehouse DB.
    4. Import the updated management packs provided.
    5. Apply the hotfix to all secondary Management Servers.
    6. Apply the hotfix to my Gateway Servers.
    7. Apply the hotfix to my agents by approving them from pending
8. Apply the hotfix to my dedicated consoles (Terminal servers, desktop machines, etc…)
    9. Apply the hotfix to my Web Console server
    10. Apply the hotfix to my Audit collection servers
    11. Update manually installed agents…. well, manually.

    Ok – looks like 11 easy steps. This order is not set in stone – it is a recommendation based on logical order, from the release notes. For instance – if you wanted to update ALL your infrastructure before touching any agent updates – that probably makes more sense and would be fine.

****Requirement – as a standard practice for a major update/hotfix, you should log on to your OpsMgr role servers using a domain user account that meets the following requirements:

    • OpsMgr administrator role
    • Member of the Local Administrators group on all OpsMgr role servers (RMS, MS, GW, Reporting)
    • SA (SysAdmin) privileges on the SQL server instances hosting the Operations DB and the Warehouse DB.

    These rights (especially the user account having SA priv on the DB instances) are often overlooked. These are the same rights required to install OpsMgr, and must be granted to apply major hotfixes and upgrades (like RTM>SP1, SP1>R2, etc…) Most of the time the issue I run into is that the OpsMgr admin logs on with his account which is an OpsMgr Administrator role on the OpsMgr servers, but his DBA’s do not allow him to have SA priv over the DB instances. This must be granted temporarily to his user account while performing the updates, then can be removed, just like for the initial installation of OpsMgr as documented HERE. At NO time do your service accounts for MSAA or SDK need SA (SysAdmin) priv to the DB instances…. unless you decide to log in as those accounts to perform an update (which I do not recommend).
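For reference, a DBA could grant (and later revoke) that temporary SA right with a couple of one-liners along these lines; the server, instance, and login names here are placeholders:

```powershell
# Hypothetical example - grant temporary sysadmin to the admin performing the update:
sqlcmd -S OPSDBSERVER\INSTANCE1 -E -Q "EXEC sp_addsrvrolemember 'DOMAIN\opsadmin', 'sysadmin'"

# ...apply the cumulative update, then remove the right again:
sqlcmd -S OPSDBSERVER\INSTANCE1 -E -Q "EXEC sp_dropsrvrolemember 'DOMAIN\opsadmin', 'sysadmin'"
```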

     

Ok, let’s get started.

    1. Backups. I run a fresh backup on my OpsDB and Warehouse DB’s – just in case something goes really wrong. Since I haven’t grabbed my RMS encryption key in a long while – I go ahead and make a backup of that too, just to make sure I have it somewhere.
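If you want to script the database backups, something like the following works; the server names and backup paths are placeholders for your environment:

```powershell
# Example full backups before patching (server/instance/path names are placeholders):
sqlcmd -S OPSDBSERVER -E -Q "BACKUP DATABASE OperationsManager TO DISK = 'D:\Backup\OperationsManager_preCU6.bak'"
sqlcmd -S DWSERVER -E -Q "BACKUP DATABASE OperationsManagerDW TO DISK = 'D:\Backup\OperationsManagerDW_preCU6.bak'"
```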

I also will take a backup of all my unsealed MP’s. You can do the backup in PowerShell from the Operations Manager Command Shell. Here is an example which will back up all unsealed MP’s to a folder C:\MPBackup (the folder must already exist):

Get-ManagementPack | where {$_.Sealed -eq $false} | Export-ManagementPack -Path C:\MPBackup

    We need to do this just in case we require restoring the environment for any reason.

     

    2. Apply the hotfix to the RMS.

    Tip #1: Here is a tip that I have seen increase the success rate: Reboot your RMS/RMS nodes before starting the update. This will free up any locked processes or WMI processes that are no longer working, and reduce the chances of a timeout for a service stopping, file getting updated, etc.

    Tip #2: If you are running any SDK based connectors – it is a good idea to stop these first. Things like a Remedy product connector service, Alert Update Connector, Exchange Correlation Engine, etc… This will keep them from throwing errors like crazy when setup bounces the SDK service.

Tip #3: If you are low on disk space, and you have previously installed prior R2 CU’s, you can uninstall those and make sure they are removed from the \Program Files (x86)\System Center 2007 R2 Hotfix Utility\ directory. This can free up a substantial amount of disk space; once a CU is applied, its installer files are no longer necessary.

    Tip #4: If you are running the Exchange Correlation Service for the Exchange 2010 MP, it might be a good idea to disable this service during the CU update. This service uses a lot of resources and would be best to keep it out of the picture for the CU process.

****Note: If applying this update to an RMS cluster – FIRST see: How to apply a SCOM hotfix to a clustered RMS

****Note: Many people struggle with OpsMgr hotfixes – for failing to follow instructions. When applying an OpsMgr hotfix – you need to copy the downloaded MSI file (such as SystemCenterOperationsManager2007-R2CU6-KB2626076-X86-X64-IA64-ENU.MSI) to EACH and EVERY Management server and Gateway. You need to INSTALL this hotfix installer utility on EACH Management Server and Gateway. Don’t try to just copy the update MSP files. This won’t work and you will fail to update some components. Common complaints are that the agents never go into pending actions, or the agent update files never get copied over to the \AgentManagement folders. In almost ALL cases, people were taking a shortcut and making assumptions. Don’t. Copy the 1GB file to each machine, then install the hotfix utility, then run the hotfix from the splash screen that comes up (this is a bootstrapper program), immediately after installing the downloaded MSI. The only acceptable alternative to this process – is to install/extract the 1GB MSI on a workstation, and then build a command line based package as described below. For memory limited test environments – the command line method is the way to go.

    Since my RMS is running Server 2008 R2 – I need to open an elevated command prompt to install any SCOM hotfixes. That is just how it is. So I launch that – and call the MSI I downloaded (SystemCenterOperationsManager2007-R2CU6-KB2626076-X86-X64-IA64-ENU.MSI). This will install the Hotfix Utility to the default location. I always recommend installing this hotfix utility to the default location. You can always uninstall the utility later to clean up disk space.

Tip: (This part may take a LONG TIME to complete if calling the 1GB file on a system with limited memory resources. This is because it must consume 1GB of RAM to open the file, temporarily. For production systems meeting the minimum supported 4GB, this probably won’t be as much of an issue. For virtualized labs and test environments where you are running very limited memory (1-2GB RAM), you will see this process take a considerable amount of time. On my 1GB memory virtualized management servers, it would not install at all. I upped them to 2GB and they took about 10-20 minutes to open and run the setup program. See the section at the end of this article **Command line install** for ideas on how to mitigate this issue if affected.)

    Eventually – a splash screen comes up:

    image

    I choose Run Server Update, and rock and roll. You MUST execute the update from this “Run Server Update” UI. NO OTHER METHOD will work.

It runs through with success, I click finish – then another setup kicks off. This is by design. There should be three actual setups running consecutively (one for the core update, one for the localization, and one for Xplat).

    You could see this potentially three times:

    image

     

    Then wait around 30 seconds for any post install processes to complete, and then click “Exit” on the splash screen.

    image

If you have trouble at this stage – you get error messages, or the installation rolls back – see the troubleshooting and known issues in the KB article and below in this post.

    If you are patching a clustered RMS – you can continue the process using the link posted above – and complete the second node.

    Now – it is time to validate the update applied correctly. I can see the following files got updated on the RMS in the standard install path: \Program Files\System Center Operations Manager 2007\

    image

    **note – this isn't all the files included in the hotfix package, just a spot check to make sure they are getting updated.

    Next I check my \AgentManagement folder. This is the folder that any agents will get updates from. I check the \x86, \AMD64, and \ia64 directories:

    image

It is good that our KB2626076 CU6 agent MSI’s got copied over. In this CU, the previous CU files are removed from these directories if they existed (they get moved to the root \Program Files\System Center Operations Manager 2007\AgentManagement folder).

     

    3. Time to run the SQL scripts. There are 2 scripts, located on the RMS, in the C:\Program Files (x86)\System Center 2007 R2 Hotfix Utility\KB2626076\SQLUpdate folder:

    • CU_Database.sql
    • CU_DataWarehouse.sql

    Let’s start with CU_Database.sql

I simply need to open this file with SQL Management Studio – or edit it with Notepad – copy the contents – and paste them into a query window that is connected to my Operations (OperationsManager) database. Executing it takes about a minute to complete in my lab, and it will return a list of rows updated.

    Next up – we now need to connect to the Warehouse database instance, and open a new query window against the OperationsManagerDW database. We will execute CU_DataWarehouse.sql which will return “Command(s) completed successfully”.

    DO NOT skip step number 3 above, and do not continue on until this is completed.
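If you prefer the command line over pasting into Management Studio, the same two scripts can be applied with sqlcmd; the server names below are placeholders, and note that each script targets a different database:

```powershell
# Run each CU SQL script against its correct database (server names are placeholders):
sqlcmd -S OPSDBSERVER -d OperationsManager -E -i "C:\Program Files (x86)\System Center 2007 R2 Hotfix Utility\KB2626076\SQLUpdate\CU_Database.sql"
sqlcmd -S DWSERVER -d OperationsManagerDW -E -i "C:\Program Files (x86)\System Center 2007 R2 Hotfix Utility\KB2626076\SQLUpdate\CU_DataWarehouse.sql"
```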

     

     

     

    4. Next up – import the MP updates. That's easy enough. They are located at C:\Program Files (x86)\System Center 2007 R2 Hotfix Utility\KB2626076\ManagementPacks\ and are named:

• Microsoft.SystemCenter.DataWarehouse.Report.Library.mp
    • Microsoft.SystemCenter.WebApplication.Library.mp
    • Microsoft.SystemCenter.WSManagement.Library.mp

    These will upgrade existing MP’s in your environment. They take a few minutes each to import.

     

    At this point – if you are using cross platform monitoring for Unix agents – you would upgrade the Xplat MP’s via a separate download. See the KB article for steps on this, and potentially upgrading your Unix agents if required.

    System Center Operations Manager 2007 R2 Cross Platform Monitoring Management Packs

    This download site contains the latest MP’s for Solaris/AIX which were updated/included for CU6.  The other Xplat MP’s have not been revved since the previous CU.

     

5. Time to apply the hotfix to my management servers. I have 1 secondary MS which is Windows Server 2008 R2 SP1. So I open an elevated command prompt to run the hotfix utility MSI.

    Again – I MUST RUN SystemCenterOperationsManager2007-R2CU6-KB2626076-X86-X64-IA64-ENU.MSI on each Management server. This installs the hotfix utility, which will then launch the splash screen.

Tip: (This part may take a LONG TIME to complete if calling the 1GB file on a system with limited memory resources. This is because it must consume 1GB of RAM to open the file, temporarily. For production systems meeting the minimum supported 4GB, this probably won’t be much of an issue. For virtualized labs and test environments where you are running very limited memory, you will see this process take a considerable amount of time. On my 1GB memory virtualized management servers, it would not install. I upped them to 2GB and they took about 10-20 minutes to open and run the setup program. See the section at the end of this article **Command line install** for ideas on how to mitigate this issue if affected.)

    ***On this server I could NOT get the splash screen (bootstrapper) to show up.  I was being lazy and did not run the full MSI from an elevated command prompt.  (GASP!)  Once I opened an elevated CMD, mapped a drive to my CU6 MSI, and ran it, it worked perfectly.  Key message – don’t be lazy – run these updates from an elevated CMD – because UAC will kill you!

Once the splash screen comes up I “Run Server Update”. These all install without issue (again – three setups run consecutively). I spot check the \AgentManagement directories and the DLL versions, and all look great. REMEMBER – you can certainly patch all your management servers at the same time; however, your agents WILL fail over during this time because we stop the MS HealthService during the update. Keep this in mind. It is best to update management servers one at a time to keep your agents from failing over to the RMS and overloading it, or causing massive heartbeat failures because they have nowhere to report to.

     

6. Next up – any Gateway machines here. Since my gateways all have limited memory, I don’t want to run the full 1GB MSI. I am running these from a command line which uses a LOT less resources. I build a local install package in my local C:\temp\ directory from my article at this LINK using the following command line modified for CU6:

    SetupUpdateOM.exe /x86msp:KB2626076-x86.msp /amd64msp:KB2626076-x64.msp /ia64msp:KB2626076-ia64.msp /x86locmsp:KB2626076-x86-ENU.msp /amd64locmsp:KB2626076-x64-ENU.msp /ia64locmsp:KB2626076-ia64-ENU.msp /Agent /noreboot

    I “Run Gateway Update” from the splash screen, and setup kicks off. It runs three separate installs and I see the following – 3 times:

image

    Remember to spot check your DLL versions and \AgentManagement directories. They both should be updated.

     

     

7. I check my Pending Management view in the Administration pane of the console – and sure enough – all the agents that are set to “Remotely Manageable = Yes” in the console show up here pending an agent update. I approve all my agents (generally we recommend patching no more than 200 agents at any given time).

    After the agents update – I need to do a quick spot check to see that they are patched and good – so I use the “Patchlist” column in the HealthService state view to see that. For creating a “Patchlist” view – see LINK

    image

     

    In the above view – I can see that it does show my CU6 applied – but it left some stuff about CU5 – which is unfortunate.  What actually happened – is that we tried to install 2 components – the KB2626076-x64-Agent.msp and then the KB2626076-x64-ENU-Agent.msp.  HOWEVER – the ENU components (localization stuff) never got installed.  The reason? 

    Log Name:      Application
    Source:        MsiInstaller
    Date:          5/18/2012 5:00:55 PM
    Event ID:      1038
    Task Category: None
    Level:         Information
    Keywords:      Classic
    User:          SYSTEM
    Computer:      SCOMRS.opsmgr.net
    Description:
    Windows Installer requires a system restart. Product Name: System Center Operations Manager 2007 R2 Agent. Product Version: 6.1.7221.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Type of System Restart: 2. Reason for Restart: 1.

This is the same issue that has been a bit of a pain for ages…. the first MSP (the agent update) requires a restart of the COMPUTER because of some process that was locked and not allowing all the binaries to be updated.  Therefore – Windows Installer blocks any more MSP’s until this reboot happens.  This isn't critical (missing the ENU components) but the agent is not fully patched until you reboot it, AND then you need to manually install the ENU components MSP, OR run a “repair”.  Once you reboot the monitored server, and run an agent repair from the console (or manually update the ENU components) the Patchlist looks correct:

    image

Frustrating – but this has been an issue ever since CU3/4, where the behavior was changed to stop the agent update whenever a restart was required.  In CU3/4, RestartManager (a component of Windows Installer) would try to stop other services and processes in order to continue with the update, which was WAY worse than this condition.

     

    8. I have a few dedicated consoles which need updating. One is a desktop machine and the other is my terminal server which multiple people use to connect to the management group. So – I kick off the installer – and just choose “Run Server Update” as well. I do a spot check of the DLL files – and see the following was updated on the terminal server:

    image

     

    I can also perform a “Help > About” in the console itself – this will now show the update version for your console:

    image

     

     

9. Next up – Web Consoles. I run mine on a standalone management server, which I have already patched with CU6. So – I will simply check the DLL files to ensure they got updated.

    From: \Program Files\System Center Operations Manager 2007\Web Console\bin

    image

     

    Additionally – there are some manual steps needed to secure the Web Console from a client side script vulnerability, per the KB Article  (you might have already done this in a previous CU):

    Update the Web.Config file on the Web Console server role computers

• To ensure that all cookies created by the web console cannot be accessed by client-side script, add the following configuration to the Web.Config file on each Web console server:

    <httpCookies httpOnlyCookies="true"/>

    • If the web console is configured to run under SSL, add the following configuration to ensure all cookies are encrypted:

    <httpCookies httpOnlyCookies="true" requireSSL="true"/>

Now – ONE of these lines needs to be added to your web.config file. Scroll down in that file until you see the <system.web> tag. You can add one of these on a new line IMMEDIATELY after the <system.web> line. Here is mine – before and after:

image     image

    Use the correct line based on your SSL configuration status for your web console. Reboot your web console server to pick up these changes.
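For placement, the edited file should end up looking roughly like this (the non-SSL variant shown; only the httpCookies line is new, and the rest of your existing settings stay as they are):

```xml
<configuration>
  <system.web>
    <httpCookies httpOnlyCookies="true"/>
    <!-- existing system.web settings continue below, unchanged -->
  </system.web>
</configuration>
```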

     

10. At this point – I update ACS components on any ACS-running Management servers that have already been patched with CU6 – but this time run the update and choose “Run ACS Server Update”.

    image

    After you update your collector…. you must run a SQL script that is included in the update.  This script will be run against your ACS database.  See the KB article for instructions.

     

11. Manually installed agents. I have a fair number of these… so I will update them manually, or set up a SCCM package to deploy the update. Most of the time you will have manually installed agents on servers behind firewalls, when you use AD integration for agent assignment, when you installed manually on DC’s, or as a troubleshooting step.

    Additional Activities:

    12. Since this particular environment I am updating is going from CU5 to CU6 – I need to import the latest cross platform management packs. If I am not using and don’t desire to use OpsMgr to monitor cross platform OS’s like SUSE, RedHat, and Solaris… then I can skip this step. However, if I do want to be fully up to date for Xplat monitoring – I need to ensure I have the latest Xplat MP’s available. The ones that are version .277 are current: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=18891

    Additionally there are some newer updated ones for RHEL6, Solaris, and AIX.

     

13. I need to update the ACS reports, if I am using ACS. We have included some new reports in CU6 which fix some reported issues with the reports. These can be found at:

C:\Program Files (x86)\System Center 2007 R2 Hotfix Utility\KB2626076\ACS\Reports

image

    You might have already updated these in a previous CU.  They started shipping in CU5.

    At this point I would browse to my Sql Reporting Services website that hosts my ACS reports, and import these RDL’s over the existing reports, or place them in a new folder for testing, and then move them later.

    Now – the update is complete.

    image

The next step is to implement your test plan steps. You should build a test plan for any time you make a change to your OpsMgr environment. This might include scanning the event logs on the RMS and all MS for critical and warning events, looking for anything new or serious. Test that reporting is working, check the database for any unreasonable growth, and run queries to see if anything looks bad from a most-common alerts, events, perf, and state perspective. Run perfmon – and ensure your baselines are steady – and nothing is different on the database, or RMS. If you utilize any product connectors – make sure they are functioning.

    The implementation of a solid test plan is very important to change management. Please don't overlook this step.

    *** Command line install option

In some situations, you might want to perform a command line installation of the update on your RMS/management servers. Most of the time – I don’t recommend this, because you generally need the feedback on whether each part was successful or not. However, there are situations where it is required.

One example is for users who have issues with the 1GB MSI file and getting the hotfix installer running, especially on limited memory systems. For those, you can use a command line option which avoids the issue.

    For additional command line options, including how to make a CU package smaller, and how to patch consoles, agents, etc…. see the KB article which contains some guidance, and the following post which contains command line package ideas from a previous CU:

    http://blogs.technet.com/b/kevinholman/archive/2010/10/12/command-line-and-software-distribution-patching-scenarios-for-applying-an-opsmgr-cumulative-update.aspx

    Known issues/Troubleshooting:

    1. New management packs cannot be edited in the authoring console after the Cumulative Update is installed
    When a new management pack is created after CU4, CU5, or CU6 is installed and then an attempt is made to edit the management pack in the Authoring console, the Authoring console cannot edit the management pack because it cannot find the latest version of the Microsoft.SystemCenter.Library Management Pack (build .61 for CU4 and build .81 for CU5 and CU6). This is resolved – please see: http://support.microsoft.com/kb/2590414

2. CU6 fails to apply. The SDK or Config service may not start after this, and CU6 fails on subsequent retries. The installation rolls back and you get a dialog box that the setup was interrupted before completion. There are two possible causes, each with a workaround. One is caused by a general timeout, the other is a .NET 2.0 issue due to a CRL response delay. Start with workaround “#1” and if that fails, try workaround “#2”. #2 is a fairly rare condition.

    Workaround #1:

The services are timing out while trying to start. Using http://support.microsoft.com/kb/922918 set the ServicesPipeTimeout entry for all services to 3 minutes (180000 milliseconds) and REBOOT the server. Then try to apply CU6. It should apply. You likely will see a few warning messages about failure to start the OMCFG service – just click OK and the setup will continue.
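The registry change from KB922918 can be made in one line from PowerShell (or with regedit, per the KB); remember the reboot afterward:

```powershell
# Per KB922918: raise the global service-start timeout to 3 minutes
# (180000 ms), then REBOOT the server before retrying the CU.
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control" `
    -Name "ServicesPipeTimeout" -Type DWord -Value 180000
```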

    Workaround #2:

Follow the steps that are outlined in Microsoft Knowledge Base article KB936707.

***Note: This hotfix likely will not be required. The hotfix is ONLY required if you are still running .NET 2.0 RTM, as it is included in .NET 2.0 SP1 and later. The hotfix does not resolve the issue by itself – it (or .NET 2.0 SP1 or later) simply ENABLES the use of a new tag in the XML which allows disabling of CRL checking. If your RMS is on Windows Server 2008 or 2008 R2 – you already have this hotfix included.

    ***Note: Once you have verified you have .NET 2.0 SP1 or later installed – you MUST perform the second step – which involves editing 2 application.exe.config files. The KB article is misleading in that it tells you to add this information as an entire section – which is incorrect – you must find the <runtime> section in your existing config files – and add a SINGLE new line to that existing section.

The config files are located on the RMS in the \Program Files\System Center Operations Manager 2007\ root directory. They will need to be edited for both the Config and SDK services on the affected RMS. The file names are:

    • Microsoft.Mom.Sdk.ServiceHost.exe.config
    • Microsoft.Mom.ConfigServiceHost.exe.config

    In between the EXISTING <runtime> and </runtime> lines – you need to ADD a NEW LINE with the following:

    <generatePublisherEvidence enabled="false"/>

This solution disables CRL checking for the specified executables, permanently.
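After the edit, each of the two config files should look roughly like this (only the generatePublisherEvidence line is new; your existing runtime entries stay exactly as they were):

```xml
<configuration>
  <runtime>
    <generatePublisherEvidence enabled="false"/>
    <!-- existing entries in the runtime section remain unchanged -->
  </runtime>
</configuration>
```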

3. Agent patchlist information incomplete, or CU6 patching failure. The agent Patchlist shows only parts of CU6, or older CU’s (CU5, CU4, CU3, CU2, CU1), or nothing; the CU6 localization ENU update is not showing in the patchlist. This appears to be related to the agents needing a reboot required by Windows Installer from a previous installation package. Once they are rebooted, and a repair initiated, the Patchlist column looks correct with the CU6 and CU6 ENU (localized) information. The correct and complete patchlist information will appear as below:

    System Center Operations Manager 2007 R2 Cumulative Update 6 (KB2626076); System Center Operations Manager 2007 R2 Cumulative Update 6 (KB2626076) - ENU Components

    If you apply Cumulative Update 3 or 4 for Operations Manager 2007 R2, the pushed agent may not display the update list correctly. This issue occurs because the agent updates in Cumulative Update 3/4 for Operations Manager 2007 R2 may require a restart operation and then a repair operation. If you do not restart these servers after you apply Cumulative Update 3/4 for Operations Manager 2007 R2, the agent updates in Cumulative Update 6 for Operations Manager 2007 R2 are not applied. However, the restart required state is set on these computers. Therefore, you have to restart these computers and then repair the agent to apply the updates in Cumulative Update 6 for Operations Manager 2007 R2.

  • OpsMgr 2012: Cumulative Update Rollup 1 (UR1) ships–and my experience installing it

     

    image

     

     

    Cumulative Update Rollup 1 (UR1) for OpsMgr 2012 has shipped.

    Download:  http://www.microsoft.com/en-us/download/details.aspx?id=29697

    KB link:  http://support.microsoft.com/kb/2686249

     

    Here is a list of the fixes:

    • Environment crashes in Operations Manager due to RoleInstanceStatusProbe module in AzureMP.
    • When multiple (2-3) consoles are running on the same computer under the same user account the consoles may crash.
    • Cannot start or stop tracing for Reporting and Web Console if they were installed to a standalone IIS server.
    • Connected Group alert viewing is not working but no error is given in console.
    • Task result - CompletedWithInfo not supported with the SDK2007 assemblies.
    • SeriesFactory and Microsoft.SystemCenter.Visualization.DatatoSeriesController need to be public to allow the controls extensibility and reuse.
    • WebConsole is not FIPS compliant out of the box.
    • Network Dashboard should overlay Availability when displaying health state.
    • Dashboards: Group picker does not show all groups in large environment.
    • IIS Discovery: prevent GetAdminSection from failing when framework version was detected incorrectly by IIS API.
    • Performance Counters do not show up in Application list view of AppDiagnostics.
    • Console crashes when state view with self-contained object class is opened.
    • PerformanceWidget displays stale 'last value' in the legend due to core data access DataModel.
    • Availability Report and "Computer Not Reachable" Monitor show incorrect data.
    • Agent install fails on Win8 Core due to dependency on .Net framework 2.0.
    • Web Services Availability Monitoring Wizard - Console crashes if wizard finishes before test has finished.
    • Several Powershell changes needed:
      • Changed License parameter in Get-SCOMAccessLicense to ShowLicense
      • Changed SCOMConnectorForTier cmdlets to SCOMTierConnector
      • Some formatting changes
    • Update Rollup 1 for System Center 2012 – Operations Manager resolves the following issues for UNIX and Linux monitoring:
      • Schannel error events are logged to the System Event Log on Operations Manager Management Servers and Gateways that manage UNIX/Linux agents.
      • On HP-UX, Operations Manager cannot discover and monitor a logical volume composed of more than 127 physical volumes
      • Upgrade of UNIX and Linux Agents fails when using Run As credentials in the Agent Upgrade Wizard or Update-SCXAgent PowerShell Cmdlet
    • Update Rollup 1 for System Center 2012 – Operations Manager adds the following feature:
      • Support for Oracle Solaris 11 (x86 and SPARC)

     

     

    Let’s Roll:

So – first – I download it. The hotfix is ONLY about 76 megabytes!!!  This is a HUGE improvement over the 1.2GB CU’s for SCOM 2007 – let’s hope it stays this way!

    The KB instructions show me that CU1 for SCOM 2012 has changed in a very fundamental way.  No longer is there a bootstrapper program for the CU.  Now – the package simply extracts the files to a directory.  We will then run each MSP file independently based on whatever server roles are installed. 
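Applying an individual MSP looks something like the following; the file name and path here are examples only, so substitute the actual MSP names extracted by the UR1 package for the roles installed on each server:

```powershell
# Sketch - apply one of the extracted UR1 role MSP files from an
# elevated prompt (file name/path are placeholders):
msiexec.exe /update "C:\Temp\UR1\KB2686249-AMD64-Server.msp" /norestart
```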

     

    Next step – READ the documentation… understand all the steps required, and formulate the plan.

    I build my deployment plan based on the release notes in the KB article. My high level plan looks something like this:

    1. Backup the Operations and Warehouse databases, and all unsealed MP’s.
    2. Apply the hotfix to all the Management Servers
    3. Apply the hotfix to my Gateway Servers.
    4. Apply the hotfix to the OpsMgr Reporting server.
    5. Apply the hotfix to my Web Console server
    6. Apply the hotfix my dedicated consoles (Terminal servers, desktop admin machines, etc…)
    7. Import the management pack updates
    8. Agents:  Apply the hotfix to my agents by approving them from pending.
    9. Agents:  Update manually installed agents…. well, manually.
    10. Unix/Linux:  Download and import the updated MP’s for Unix/Linux monitoring.
    11. Unix/Linux:  Upgrade the Unix/Linux agent providers.

     

    Ok – looks like 11 easy steps. This order is not set in stone – it is a recommendation based on logical order, from the release notes.

     

    ****Requirement – before applying a major update/hotfix, you should log on to your OpsMgr role servers using a domain user account that meets the following requirements:

    • OpsMgr administrator role
    • Member of the Local Administrators group on all OpsMgr role servers (RMS, MS, GW, Reporting)
    • SA (SysAdmin) privileges on the SQL server instances hosting the Operations DB and the Warehouse DB.

    These rights (especially the user account having SA priv on the DB instances) are often overlooked. These are the same rights required to install OpsMgr, and they must be granted to apply major hotfixes and upgrades (like RTM>SP1, SP1>R2, etc.). Most of the time, the issue I run into is that the OpsMgr admin logs on with his account, which has the OpsMgr Administrator role on the OpsMgr servers, but his DBAs do not allow him SA priv over the DB instances. This must be granted temporarily to his user account while performing the updates, and can then be removed – just like for the initial installation of OpsMgr as documented HERE. At NO time do your service accounts for MSAA or SDK need SA (SysAdmin) priv on the DB instances… unless you decide to log in as those accounts to perform an update (which I do not recommend).

     

    Ok, let’s get started.

    1.   Backups. I run a fresh backup on my OpsDB and Warehouse DB’s – just in case something goes really wrong.

    I also will take a backup of all my unsealed MP’s. You can do the backup in PowerShell, here is an example which will backup all unsealed MP’s to a folder C:\mpbackup:

    Get-SCOMManagementPack | where {$_.Sealed -eq $false} | Export-SCOMManagementPack -Path C:\MPBackup

    We need to do this just in case we require restoring the environment for any reason.
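    If you ever do need to restore, the exported XML files can simply be re-imported. Here is a minimal sketch, assuming the OperationsManager PowerShell module is loaded and the MP’s were exported to C:\MPBackup as above:

    ```powershell
    # Re-import every unsealed MP backup from the export folder.
    # C:\MPBackup matches the export example above.
    Get-ChildItem -Path C:\MPBackup -Filter *.xml |
        ForEach-Object { Import-SCOMManagementPack -Fullname $_.FullName }
    ```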

     

    2. Apply the hotfix to the Management Servers.

    Pro Tip #1:  Here is a tip that I have seen increase the success rate: Reboot your Management Server nodes before starting the update. This will free up any locked processes or WMI processes that are no longer working, and reduce the chances of a timeout for a service stopping, file getting updated, etc.  This is not a requirement, just something to consider if you have had issues applying such a fix.

    Pro Tip #2:   If you are running any SDK based connectors – it is a good idea to stop these first. Things like a Remedy product connector service, Alert Update Connector, Exchange Correlation Engine, etc… This will keep them from throwing errors like crazy when setup bounces the SDK services.

    I start by running the download, SystemCenter2012OperationsManager-CU1-KB2674695-X86-X64-IA64-ENU.exe.  This is simply an extractor.  You can run this anywhere, and provide a location for the update.  No longer do we need to run this extractor on each server role.  We can now simply extract these files to a network share, and update all server roles from there.

    Here are the files after extraction:

    image

     

     

    I start on my first management server, OMMS1.  This has the Server role, Web Console Role, and Console installed.  So I will need to run 3 MSP’s. 

    I will start by running KB2674695-AMD64-Server.msp.  Right click the file and choose “apply” or call it from an elevated CMD.

    Pro Tip: Open an ELEVATED COMMAND PROMPT and run these MSP files from the command prompt.  User Account Control (UAC) can still block a successful install in some cases, so you will experience greater success if you always run updates from an elevated CMD.
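    For example, from an elevated prompt you can apply the server MSP with verbose MSI logging for troubleshooting (a sketch – the log path is my own choice, not part of the official instructions):

    ```powershell
    # Apply the Server role update with a verbose MSI log
    msiexec.exe /p KB2674695-AMD64-Server.msp /L*v C:\Temp\CU1-Server.log
    ```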

     

    You will see a dialogue box like below:

    image

    image

    The server update goes pretty quickly.  It bounced the DAS, Config, and SC Mgmt services during the update.

     

    Next, I will update the web console by running KB2674695-AMD64-WebConsole.msp.  This only takes a few seconds.

     

    Next, I will update the Console files.  I need to make sure that I close any open consoles on this server, from my session or any other logged in sessions via RDP.  I will apply KB2674695-AMD64-Console.msp.  Again – this only takes a few seconds to run.

    Now – I am done updating all three installed server roles on this server.  I can spot check to make sure the files got updated:

    Server role update:  From \Program Files\System Center 2012\Operations Manager\Server

    image

    Web Console role update:  From \Program Files\System Center 2012\Operations Manager\WebConsole\WebHost\bin

    image

    Console role update:  From \Program Files\System Center 2012\Operations Manager\Console

    image

     

    Next – I am ready to update the rest of my management servers.  I only have one other MS – OMMS2.  This only runs the console and server roles, so only two MSP’s to apply.

    I apply each MSP, takes a couple minutes.

    Another spot-check to make is to verify that the Agent binaries got updated on each Management Server role, for Windows agents.  These are located at: \Program Files\System Center 2012\Operations Manager\Server\AgentManagement\ in the AMD64 or i386 directory:

    Check to ensure that the CU dropped the appropriate agent update file, such as KB2674695-amd64-Agent.msp.

    image

     

     

     

    3.  Apply the hotfix to my Gateway Servers.

    I don’t have any gateways in my lab right now – but this would be a very simple execution of the KB2674695-AMD64-Gateway.msp file.

     

    4.  Apply the hotfix to the OpsMgr Reporting server.

    I have SCOM reporting and SSRS installed on a SQL server, DB01. 

    I log on and apply KB2674695-AMD64-Reporting.msp.  This runs in a few seconds.

    Spot check:  From \Program Files\System Center 2012\Operations Manager\Reporting\Tools

    image

     

    5.  Apply the hotfix to my Web Console server

    I already did this in step 2 – but I could have waited and done this now, or patched any dedicated Web Console servers.

     

    6.  Apply the hotfix to my dedicated consoles (Terminal servers, desktop machines, etc…)

    I need to apply this update anywhere the console is installed – including consoles installed on management servers.  I have already updated the console patch on my management servers OMMS1 and OMMS2.  Now I have a Terminal Server where I run the console – so I will patch this server now, which only runs the console.  Make sure you close the console and close any other user sessions out – or you might require a reboot to finish the update for the console files which will be locked by an open console.  This also includes any open Powershell windows connected to the SDK.  You will get a warning if setup detects any.  You can run the same spotcheck for the file version update as above.  (Note – help/about will not show the updated version in the console – this stays the same major version, so don’t go looking there.)

     

    7.  Import the management pack updates

    Open a console – and import the files previously extracted:

    Microsoft.SystemCenter.DataWarehouse.Library.mp

    Microsoft.SystemCenter.Visualization.Library.mpb

    Microsoft.SystemCenter.WebApplicationSolutions.Library.mpb

    image

    These all import in a few minutes without a hitch.

     

    8.  Agents: Apply the hotfix to my agents by approving them from pending

    I open the console – Administration > Device Management > Pending Management, and see all my agents in pending that are not manually installed (Remotely Manageable = Yes)

    Let me stop and talk about how agents get into Pending.  This is a ONE TIME operation, which happens at the time you run the CU on a management server.  The CU runtime will look for all agents ASSIGNED to that management server that are Remotely Manageable (not manually installed), and will put ONLY those agents into pending at that time.  It will not ever go back and re-inspect to put old agents into pending, because there is no SCOM workflow that handles this – it happens only when the update is applied.  If you don’t have agents in pending – you either aren't running the MSP in an elevated fashion, or you aren't running the update as a user that has ALL of the following: the SCOM Administrator role, local OS admin rights, and the SQL SysAdmin role on the databases.
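    You can also check (and approve) the pending list from PowerShell. This is a hedged sketch using the OperationsManager module cmdlets – verify the parameter names against your cmdlet help before relying on it:

    ```powershell
    # List everything currently sitting in Pending Management
    Get-SCOMPendingManagement

    # Approve all pending actions, supplying an account that has
    # rights to install the update remotely
    Get-SCOMPendingManagement | Approve-SCOMPendingManagement -ActionAccount (Get-Credential)
    ```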

    image

     

    I right click all mine – provide an account that has rights to deploy/install the update remotely, and kick it off.

    100% Success!

    How can I tell?  Open the Console, Monitoring > Operations Manager > Agent Details > Agents by Version (state view):

    image
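    A quick PowerShell alternative to the state view, if you prefer (a sketch, assuming the OperationsManager module is loaded):

    ```powershell
    # Group agents by version so any stragglers stand out
    Get-SCOMAgent | Group-Object Version | Sort-Object Count -Descending |
        Format-Table Count, Name -AutoSize
    ```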

     

    9.  Agents: Update manually installed agents…. well, manually.

    I simply run the file KB2674695-AMD64-Agent.msp or KB2674695-i386-Agent.msp depending on which OS version (64bit or 32bit) I am updating.  I can deploy these manually, or via software distribution packaging up each MSP and applying them to the correct OS by version/architecture.
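    For a software distribution package, the MSP can be applied silently with standard msiexec switches (a sketch – the log path is my own choice):

    ```powershell
    # Silent, no-UI patch apply with verbose logging - 64-bit agent shown
    msiexec.exe /p KB2674695-AMD64-Agent.msp /qn /L*v C:\Windows\Temp\CU1-Agent.log
    ```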

     

    10.  Unix/Linux: Download and import the updated MP’s for Unix/Linux monitoring

    The Unix/Linux update files are located in a separate download, which includes new versions of the management packs and updated agent binaries.  I go to the link in the KB article and grab the update file, Monitoring Pack for UNIX and Linux Operating Systems.msi.  It is the typical MP installer/extractor program.  Run setup and extract these files to your preferred location.  I generally recommend installing/extracting these to the default location on any workstation/server, and then copying the extracted files to your OpsMgr software/MP/Updates network share.

    Download link:  http://www.microsoft.com/en-us/download/details.aspx?id=29696

    Next – import all the files for the versions of Unix/Linux Agents that you support.  The RTM version of the Unix/Linux MP’s was 7.3.2026.0.  The CU1 version is 7.3.2097.0.  This also includes a new MP to add support for Solaris 11.

    ***Note – several of these new MP files are actually in the new .MPB format.  This new format allows management packs to take advantage of the new schema extensions in System Center 2012, to be able to transfer binaries among other items – such as the agent binaries.

     

    image

     

    11.  Unix/Linux: Upgrade the Unix/Linux agent providers

    Once you have imported the updated MP files for Unix/Linux agents, spot check that these files got updated on your management servers.  Look in the following folder:

    \Program Files\System Center 2012\Operations Manager\Server\AgentManagement\UnixAgents\DownloadedKits

    image

     

    Notice the new update files are version 1.3.0-214.  Also, it may take some time for these files to show up after importing the MP’s, as your database and management servers will be experiencing high CPU utilization from SDK and Config activities – this is normal, so be patient.  You need to ensure that these files are updated on all members of any management server resource pool that monitors Unix/Linux agents before continuing.

    Now – to update my Unix/Linux agents.  I can use the Update-SCXAgent cmdlet in PowerShell, or I can use the Administration wizard.  I right click my version .206 agent, and choose Upgrade Agent.
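    If you prefer PowerShell, the same upgrade can be driven with the SCX cmdlets. A hedged sketch – the agent name below is hypothetical, and credentials come from your configured Run As accounts (or can be supplied explicitly via the cmdlet’s SSH credential parameter):

    ```powershell
    # Upgrade the provider on a specific Unix/Linux agent
    Get-SCXAgent -Name "lx1.opsmgr.net" | Update-SCXAgent
    ```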

     

    image

    image

    You can use an existing RunAs account profile if you have configured these with an agent upgrade account that has the necessary rights, or you can input new credentials using an account that has SSH privileges in order to remotely install software on the Unix/Linux agent machine.   If all goes well – you get a success:

    image

     

    image

     

     

    Now – the update is complete.

    image

     

     

     

     

    The next step is to implement your test plan. You should build a test plan for any time you make a change to your OpsMgr environment. This might include scanning the event logs on the Management Servers for critical and warning events, looking for anything new or serious.  Test that reporting is working, test your web console, check the database for any unreasonable growth, and run queries to see if anything looks bad from a most-common alerts, events, perf, and state perspective.  Run a perfmon and ensure your baselines are steady – that nothing is different on the database or management servers.  If you utilize any product connectors – make sure they are functioning.

    The implementation of a solid test plan is very important to change management. Please don't overlook this step.

  • OpsMgr 2012: Configure notifications

    Setting up notifications for email, IM, or command channels is almost identical to how this was configured in OpsMgr 2007 R2.  This article will just serve as a walk through to the process, such as immediately after deploying OpsMgr 2012.  The key difference here is that Notifications are now managed by a Resource Pool, instead of just depending on the RMS.

     

    Notifications in OpsMgr are made up of three primary components – the Channel, the Subscriber, and the Subscription.  The Channel is the mechanism we want to notify by, such as email.  The Subscriber is the person or distribution list we want to send to, and the Subscription is a definition of criteria around what should be sent.
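    All three pieces can also be created from PowerShell if you prefer to script the setup. A hedged sketch – the server name and addresses below are examples, not values from this walkthrough, and the Subscription itself is created with Add-SCOMNotificationSubscription, whose criteria options are more involved:

    ```powershell
    # Create an SMTP channel
    Add-SCOMNotificationChannel -Name "Default SMTP Channel" -Server "mail.opsmgr.net" -From "scomnotify@opsmgr.net"

    # Create a subscriber with an email address
    Add-SCOMNotificationSubscriber -Name "Kevin Holman" -DeviceList "kevinhol@opsmgr.net"
    ```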

     

    The SMTP Channel:

     

    We will first need to create the channel:  Under Administration pane > Notifications > Channels.  Right click and choose New channel > Email (SMTP)

    image

     

    Give your channel a name.  We might have multiple email channels – one for emails to our primary work mailboxes, maybe another with different formatting for sending email to cell phones and pager devices.  Let’s just call this one our “Default SMTP Channel”.

    image

     

    Click Add, and type in the FQDN of your SMTP server(s).  This can be an actual SMTP enabled mail server, or a load balanced virtual name.

    I am going to select “Windows Integrated” for my Authentication mechanism, since my mail server does not allow Anonymous connections.

    image

     

    For the Return Address – I have created an actual mail-enabled user to send email notifications through SCOM.  This does not necessarily need to be a real mail address – mostly that depends on your mail server security policies.

    image

     

    Next up is the email format.  We can customize this with very specific information that is relevant to how we want emails to look from SCOM.  I will just accept the defaults for now.  I can always come back and customize this one, or create additional channels with different formats later.

     

     

    The Subscriber:

    Next up – creating the subscriber.  Right Click “Subscribers” and choose “New Subscriber”

    This will default to show your domain account.  You can change this to whatever you like:

    image

    Next – we need to choose when Kevin wants to receive email notifications.  This is especially important for things like on call pager devices, or when people work shifts and only want to see emails during certain times.

    Next – we need to add an email address to the subscriber.  I will add my default work email:

    image

    Then select the Channel type, and the email address:

    image

     

    Additionally – you can configure a specific schedule for this specific address.  The previous schedule was for the subscriber itself, but a subscriber can have multiple addresses with different schedules if needed.  I will keep things simple and choose “Always send”.   Click Finish a couple times and your subscriber is set up.

     

    The Subscription:

     

    Now we create a new subscription – Right Click “Subscriptions” and choose New Subscription.

    Give your subscription a descriptive name that describes what it is and who it is to.  Like – “Messaging team – all critical email alerts”  Here is mine:

    image

     

    On the criteria screen – we have some very granular capabilities to scope this subscription.  My goal for this simple one is just to send me any new critical alert that comes into my environment:

     

    image

     

    Next we add the subscribers to the subscription:

     

    image

     

    We also need to choose which Channel we want to use for this subscription:

     

    image

     

    On this same screen – there is an option for delay aging:

    image

     

    What that does is allow you to have multiple alert subscriptions and – using the delay – create an escalation path if an alert is not modified in a way that takes it out of the notification criteria for these subscriptions.

    Click “Finish” and we are all set.  Behind the scenes – what happened is that all this information was actually written to a special management pack – the Microsoft.SystemCenter.Notifications.Internal MP.

    Let’s test our work.

     

    I have a test rule that generates a critical alert whenever a specific event is written to the event log.  Since I subscribed to all critical alerts – this should trigger my subscription and deliver an email:

     

    It worked!

    image

     

     

    Advanced configuration – setting up a Run As Account to authenticate to the SMTP server:

     

    Note – there is a Run-As Profile that ships with SCOM called the “Notification Account”.  If this is not configured, SCOM will try to authenticate to the Exchange server using the Management Server Action Account.  If this is not allowed to authenticate, you might need to configure this Run-As profile with a Run As Account.

    For instance – I disabled the ability for mail relay on my Exchange server.  When I do this – only mail enabled Exchange servers can connect to it.  Subsequent notifications fail to go through – and I will see two possible alerts in the console:

    Failed to send notification

    Notification subsystem failed to send notification over 'Smtp' protocol to 'kevinhol@opsmgr.net'. Rule id: Subscription02e8b6be_528d_407c_8edf_5f29dddaae6b

    Failed to send notification using server/device

    Notification subsystem failed to send notification using device/server 'ex10mb1.opsmgr.net' over 'Smtp' protocol to 'kevinhol@opsmgr.net'. Microsoft.EnterpriseManagement.HealthService.Modules.Notification.SmtpNotificationException: Mailbox unavailable. The server response was: 5.7.1 Client does not have permissions to send as this sender. Smtp status code 'MailboxUnavailable'. Rule id: Subscription02e8b6be_528d_407c_8edf_5f29dddaae6b

    In this case – I must configure the Run-As account with a credential that is able to authenticate properly with my Mail Server.  I already have a user account and mailbox set up:  OPSMGR\scomnotify

    Under Administration > Run As Configuration > Accounts – create a Run As Account.

    The account type will be “Windows” and give it a name that makes sense:

    image

    Input the user account credentials:

    image

    Choose “More Secure” and click Next, then Close.

     

    So – we have created our Run As Account – next we need to choose where to distribute it.  Account credential distribution is part of the “More Secure” option – we need to choose which Health Services will be allowed to use this credential.  In this case – we want to distribute the account to the management server pool in SCOM 2012 that handles notifications.

    Open the properties of our newly created action account, and select the Distribution tab:

    image

     

    Click “Add”, and in the Option field – change it to “Search by Resource Pool Name” and click Search:

    image

     

    Choose the Notifications Resource Pool, click Add, and OK:

     

    image

     

    Now we have created our Run As Account for notifications, and then distributed it to the Notifications Resource Pool (which contains all management servers dynamically)

    Next – we need to configure the Run As Profile – which will associate this account credential with the actual Notification workflows.

    Under Administration > Run As Configuration > Profiles, find the “Notification Account” profile.  Open the properties of this Profile.

    Under Run As Accounts – click Add:

    image

     

    Select our Notification Run As Account, and click OK

    image

    Then Save it.  This will update the Microsoft.SystemCenter.SecureReferenceOverride MP with these credentials and configurations for notification workflows.

    From this point forward – Whichever Management server in the Notifications Resource Pool that is currently responsible for handling notifications, will spawn a MonitoringHost.exe process under our credential that we configured:

    image

     

    This credential will be used to authenticate to the Exchange server to send SMTP notifications.  Now my email notifications are flowing smoothly once again!  If the current management server goes down, another management server in the Notifications Resource Pool will pick up this responsibility and spawn the process, and continue sending notifications. 

     

    High availability out of the box – one of the benefits of the improved SCOM 2012 architecture.

  • Orchestrator 2012: a quickstart deployment guide

    System Center Orchestrator 2012 is extremely easy to setup and deploy.  There are only a handful of prerequisites, and most can be handled by the setup installer routine.

     

    The TechNet documentation does an excellent job of detailing the system requirements and deployment process:

    http://technet.microsoft.com/en-us/library/hh420337.aspx

     

    The following document will cover a basic install of System Center Orchestrator 2012 at a generic customer.  This is to be used as a template only, for a customer to implement as their own pilot or POC deployment guide.  It is intended to be general in nature and will require the customer to modify it to suit their specific data and processes.

    SCORCH can be scaled to match the customer requirements. This document will cover a typical two server model, where all server roles are installed on a single VM, and utilize a remote database server or cluster.

    This is not an architecture guide or intended to be a design guide in any way.

    Definitions:

    SCORCH          System Center Orchestrator

    Server Names\Roles:

    SCORCH          Orchestrator 2012 role server

    • Management Server
    • Runbook Server
    • Orchestrator Web Service Server
    • Runbook Designer client application
    • Windows Server 2008 R2 SP1 Enterprise edition will be installed as the base OS for all platforms.
    • All servers will be a member of the AD domain.
    • SQL 2008 R2 ENT edition with SP1 will be the base standard for all database services. SCORCH only requires a SQL DB engine (locally or remote) in order to host SCORCH databases.

     

    High Level Deployment Process:

     

    1.  In AD, create the following accounts and groups, according to your naming convention:

    a.  DOMAIN\scorchsvc                       SCORCH Mgmt, Runbook, and Monitor Account

    b.  DOMAIN\ScorchUsers                 SCORCH users security global group

    2.  Add the domain user accounts for yourself and your team to the ScorchUsers group.

    3.  Install Windows Server 2008 R2 SP1 to all server role members.

    4.  Add the DOMAIN\scorchsvc account to the local administrators group on the SCORCH server.

    5.  Add the DOMAIN\ScorchUsers global group to the local administrators group on the SCORCH server.

    6.  Install the SCORCH Server.

     

    Prerequisites:

    1.  Install Windows Server 2008R2 SP1

    2.  Ensure server has a minimum of 1GB of RAM.

    3.  .Net 3.5SP1 is required. Setup will add this feature if not installed.

    4.  IIS7 (IIS Role) is required. Setup will add this role if not installed.

    5.  .Net 4.0 is required. This must be installed manually on Server 2008 R2 SP1. Download and install this prereq.

    6.  Install all available Windows Updates as a best practice.

    7.  Join all servers to domain.

    8.  Add the “DOMAIN\scorchsvc” domain account explicitly to the Local Administrators group on the SCORCH server.

    9.  Add the “DOMAIN\ScorchUsers” global group explicitly to the Local Administrators group on the SCORCH server.

     

    Step by step deployment guide:

    1.  Install SCORCH 2012:

    • Log on using your domain user account that is a member of the ScorchUsers group.
    • Run Setuporchestrator.exe
    • Click Install
    • Supply a name, org, and license key (if you have one) and click Next.
    • Accept the license agreement and click Next.
    • Check all boxes on the getting started screen, for:
      • Management Server
      • Runbook Server
      • Orchestration console and web service
      • Runbook Designer
    • On the Prerequisites screen, check the boxes to remediate any necessary prerequisites, and click Next when all prerequisites are installed.
    • Input the service account “scorchsvc” and input the password, domain, and click Test. Ensure this is a success and click Next.
    • Configure the database server. Type in the local computer name if you installed SQL on this SCORCH Server, or provide a remote SQL server (and instance if using a named instance) to which you have the “System Administrator” (SA) rights to in order to create the SCORCH database and assign permissions to it. Test the database connection and click Next.
    • Specify a new database, Orchestrator. Click Next.
    • Browse AD and select your domain global group for ScorchUsers. Click Next.
    • Accept defaults for the SCORCH Web service ports of 81 and 82, Click Next.
    • Accept default location for install and Click Next.
    • Select the appropriate options for Customer Experience and Error reporting. Click Next.
    • Click Install.
    • Setup will install all roles, create the Orchestrator database, and complete very quickly.

    2. Open the consoles.

    • Start > Microsoft System Center 2012 > Orchestrator
    • Open the Deployment Manager, Orchestration Console, and Runbook designer. Ensure all consoles open successfully.

     

    Post install procedures:

     

    1.  Let’s register and then deploy the Integration Packs that enable Orchestrator to connect to so many outside systems.

    Go to http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=28725 and download the toolkit, add-ons, and IP’s.

    • Make a directory on the local SCORCH server such as “C:\Integration Packs”
    • Copy to this directory, the downloaded IP’s, such as the following:
      • SC2012_Configuration_Manager_Integration_Pack.oip
      • SC2012_Data_Protection_Manager_Integration_Pack.oip
      • SC2012_Operations_Manager_Integration_Pack.oip
      • SC2012_Service_Manager_Integration_Pack.oip
      • SC2012_Virtual_Machine_Manager_Integration_Pack.oip
      • Configuration_Manager_2007_Integration_Pack.oip
      • Data_Protection_Manager_2010_Integration_Pack.oip
      • Operations_Manager_2007_Integration_Pack.oip
      • Service_Manager_2010_Integration_Pack.oip
      • Virtual_Machine_Manager_2008_Integration_Pack.oip
    • Open the Deployment Manager console
    • Expand “Orchestrator Management Server”
    • Right click “Integration Packs” and choose “Register IP with the Orchestrator Management Server”
    • Click Next, then “Add”.  Browse to “C:\Integration Packs” and select all of the OIP files you copied here.  You have to select one at a time and go back and click “Add” again to get them all.
    • Click Next, then Finish.  You have to accept the License Agreement for each IP. 
    • Now when you select “Integration Packs” you can see these 10 IP’s in the list.
    • Right Click “Integration Packs” again, this time choose “Deploy IP to Runbook server or Runbook Designer”.
    • Click Next, select all the available IP’s and click Next.
    • Type in the name of your Runbook server role name, and click Add.
    • On the scheduling screen – accept the default (which will deploy immediately) and click Next.
    • Click Finish.  Note the logging of each step in the Log entries section of the console.
    • Verify deployment by expanding “Runbook Servers” in the console.  Verify that each IP was deployed.
    • Open the Runbook Designer console.
    • Note that you now have these new IP’s available in the console for your workflows.

     

    Additionally – you can download more IP’s at:

    http://technet.microsoft.com/en-us/library/hh295851.aspx

    Such as the VMware VSphere IP, or the IBM Netcool IP.

    Additionally – check out Charles Joy’s blog on popular codeplex IP’s which have been updated for Orchestrator:

    http://blogs.technet.com/b/charlesjoy/

  • OpsMgr 2012: a quickstart deployment guide

    There is already a very good deployment guide posted on TechNet here:  http://technet.microsoft.com/en-us/library/hh457006.aspx  The TechNet deployment guide provides an excellent walkthrough of installing OpsMgr 2012 for the “all in one” scenario, where all roles are installed on a single server.  That is a very good method for doing simple functionality testing and lab exercises.

    The following article will cover a basic install of System Center Operations Manager 2012 as well.   The concept is to perform a limited deployment of OpsMgr, only utilizing as few servers as possible, but enough to demonstrate the new roles and capabilities in OM2012.  For this reason, this document will cover a deployment on 3 servers. A dedicated SQL server, and two management servers will be deployed.  This will allow us to show the benefits of the RMS removal, and the management server pools concepts.  This is to be used as a template only, for a customer to implement as their own pilot or POC, or customized deployment guide. It is intended to be general in nature and will require the customer to modify it to suit their specific data and processes.

    This also happens to be a very typical scenario for small environments for a production deployment.  This is not an architecture guide or intended to be a design guide in any way. This is provided "AS IS" with no warranties, and confers no rights. Use is subject to the terms specified in the Terms of Use.

    Definitions:

    • MS - Management Server
    • SRS - SQL reporting services

    Server Names\Roles:

    • DB01          SQL 2008 R2 Database Services, Reporting Services
    • OMMS1    Management Server, Web Console server
    • OMMS2    Management Server

     

    Windows Server 2008 R2 SP1 Enterprise edition will be installed as the base OS for all platforms.  All servers will be a member of the AD domain.

    SQL 2008 R2 ENT edition with SP1 will be the base standard for all database and SQL reporting services.  (Note:  SP1 is not technically required, however it is strongly recommended to always apply the latest *supported* SP and CU to SQL when deploying.)

     

     

    High Level Deployment Process:

    1.  In AD, create the following accounts and groups, according to your naming convention:

    • DOMAIN\OMAA                 OM Server action account
    • DOMAIN\OMDAS               OM Config and Data Access service account
    • DOMAIN\OMWRITE          OM Reporting Write account
    • DOMAIN\OMREAD            OM Reporting Read account
    • DOMAIN\SQLSVC               SQL 2008 service account
    • DOMAIN\SCOMAdmins   OM Administrators security group

    2.  Add the “OMAA” account and the “OMDAS” account to the “SCOMAdmins” global group.

    3.  Add the domain user accounts for yourself and your team to the “SCOMAdmins” group.
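    The account and group creation in steps 1–3 can also be scripted.  Here is a rough sketch using the ActiveDirectory PowerShell module (run on a domain controller or a machine with the RSAT AD tools installed; the names follow this guide’s examples – substitute your own naming convention):

    ```powershell
    # Sketch only - adjust names, OUs, and password policy to your environment
    Import-Module ActiveDirectory

    $pw = Read-Host -AsSecureString "Service account password"

    # Create the service accounts used in this guide
    "OMAA","OMDAS","OMWRITE","OMREAD","SQLSVC" | ForEach-Object {
        New-ADUser -Name $_ -SamAccountName $_ -AccountPassword $pw -Enabled $true
    }

    # Create the OM administrators group and add the required service accounts
    New-ADGroup -Name "SCOMAdmins" -GroupScope Global
    Add-ADGroupMember -Identity "SCOMAdmins" -Members "OMAA","OMDAS"
    ```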

    4.  Install Windows Server 2008 R2 SP1 to all server role servers.

    5.  Install Prerequisites and SQL 2008.

    6.  Install the Management Server and Database components.

    7.  Install the Reporting components.

    8.  Deploy agents.

    9.  Import management packs.

    10.  Set up security (roles and Run As accounts).

     

    Prerequisites:

    1.  Install Windows Server 2008 R2 SP1 to all servers.

    2.  Add the .NET 3.5.1 feature to Windows. Use the Server Manager UI, or use PowerShell:

    Open PowerShell (as an administrator) and run the following:

    Import-Module ServerManager
    Add-WindowsFeature NET-Framework-Core

    3.  Install .NET 4.0 to all servers

    4.  Install the Report Viewer controls to all Management Servers. Install them from http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=6442

    5.  Install all available Windows Updates.

    6.  Join all servers to domain.

    7.  Add the “SCOMAdmins” domain global group to the Local Administrators group on each server.
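    Step 7 can be scripted as well; run the following from an elevated prompt on each server (DOMAIN is a placeholder for your domain name):

    ```powershell
    # Add the SCOMAdmins domain group to the local Administrators group
    net localgroup Administrators "DOMAIN\SCOMAdmins" /add
    ```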

    8.  Install IIS on any management server that will also host a web console:

    Open PowerShell (as an administrator) and run the following:

    Import-Module ServerManager
    Add-WindowsFeature NET-Framework-Core,Web-Static-Content,Web-Default-Doc,Web-Dir-Browsing,Web-Http-Errors,Web-Http-Logging,Web-Request-Monitor,Web-Filtering,Web-Stat-Compression,Web-Mgmt-Console,Web-Metabase,Web-Asp-Net,Web-Windows-Auth -Restart

    9. Install SQL 2008 R2 to the DB server role

    • Setup is fairly straightforward. This document will not go into details and best practices for SQL configuration. Consult your DBA team to ensure your SQL deployment is configured for best practices according to your corporate standards.
    • Run setup, choose Installation > New Installation…
    • When prompted for feature selection, install ALL of the following:
      • Database Engine Services
      • Full-Text Search
      • Reporting Services
    • Optionally – consider adding the following to ease administration:
      • Business Intelligence Development Studio (for custom report development)
      • Management Tools – Basic and Complete (for running queries and configuring SQL services)
    • On the Instance configuration, choose a default instance, or a named instance. Default instances are fine for testing and labs. Production clustered instances of SQL will generally be a named instance. For the purposes of the POC, choose default instance to keep things simple.
    • On the Server configuration screen, set SQL Server Agent to Automatic. Click “Use the same account for all SQL Server Services”, and input the SQL service account and password we created earlier.
    • On the Collation Tab – make sure SQL_Latin1_General_CP1_CI_AS is selected, as that is the ONLY collation supported.
    • On the Account provisioning tab – add your personal domain user account or a group you already have set up for SQL admins. Alternatively, you can use the SCOMAdmins global group here. This will grant more rights than required to the accounts in that group, but is fine for the testing purposes of the POC.
    • On the Data Directories tab – set your drive letters correctly for your SQL databases, logs, TempDB, and backup.
    • On the Reporting Services Configuration – choose to install the native mode default configuration. This will install and configure SRS to be active on this server, and use the default DBengine present to house the reporting server databases. This is the simplest configuration. If you install Reporting Services on a stand-alone (no DBEngine) server, you will need to configure this manually.
    • Setup will complete.
    • Apply SQL 2008 R2 SP1.
    • The update is very straightforward. Accept the defaults and update all features. When complete, reboot the SQL server.

     

     

    Step by step deployment guide:

     

    1.  Install the Management Server role on OMMS1. You can also refer to: http://technet.microsoft.com/en-us/library/hh301922.aspx

    • Log on using your personal domain user account that is a member of the SCOMAdmins group.
    • Run Setup.exe
    • Click Install
    • Select the following, and then click Next:
      • Management Server
      • Operations Console
      • Web Console
    • Accept or change the default install path and click Next.
    • You might see an error from the Prerequisites here. If so – read each error and try to resolve it. Common errors:
      • Report Viewer controls are not installed. Install them from http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=6442
      • ISAPI/ASP.NET errors. This can happen if you install .NET 4.0 as part of your OS build, but then add the IIS role later. Simply run the following command to resolve, from an elevated command prompt: C:\Windows\Microsoft.NET\Framework64\v4.0.30319>aspnet_regiis.exe -i -enable
    • On the Proceed with Setup screen – click Next.
    • On the specify an installation screen – choose to create the first management server in a new management group.  Give your management group a name. Don’t use any special or Unicode characters, just simple text. Click Next.
    • On the Database Configuration screen, enter the name of your SQL database server and instance. In my case this is “DB01”. Leave the port at default unless you are using a special custom fixed port.  If necessary, change the database locations for the DB and log files. Leave the default size of 1000 MB for now. Click Next.
    • On the data warehouse database screen, input the servername, instance, and if necessary change path locations as on the previous screen. Click Next.
    • On the Web Console screen, choose the default web site, and leave SSL unchecked. If you have already set up SSL for your default website with a certificate, you can choose SSL.  Click Next.
    • On the Web Console authentication screen, choose Mixed authentication and click Next.
    • On the accounts screen, choose Domain Account for ALL services, and enter in the unique DOMAIN\OMAA, DOMAIN\OMDAS, DOMAIN\OMREAD, and DOMAIN\OMWRITE accounts we created previously. It is a best practice to use separate accounts for distinct roles in OpsMgr, although you can also just use the DOMAIN\OMDAS account for all SQL Database access roles to simplify your installation. Click Next.
    • Choose Yes or No to send Customer Experience and Error reports.
    • Click Install.
    • Close when complete.
    • The Management Server will be very busy (CPU) for several minutes after the installation completes. Before continuing it is best to give the Management Server time to complete all post install processes, complete discoveries, configuration, etc. 10 minutes is typically sufficient.

     

    2.  Install the second Management Server on OMMS2. You can also refer to: http://technet.microsoft.com/en-us/library/hh284673.aspx

    • Log on using your domain user account that is a member of the SCOMAdmins group.
    • Run Setup.exe
    • Click Install
    • Select the following, and then click Next:
      • Management Server
      • Operations Console
    • Accept or change the default install path and click Next.
    • Resolve any issues with prerequisites, and click Next.
    • Choose “Add a management server to an existing management group” and click Next.
    • Input the servername\instance hosting the Ops DB. Select the correct database from the drop down and click Next.
    • On the accounts screen, choose Domain Account for ALL services, and enter in the unique DOMAIN\OMAA, DOMAIN\OMDAS accounts we created previously. It is a best practice to use separate accounts for distinct roles in OpsMgr, although you can also just use the DOMAIN\OMDAS account for all SQL Database access roles to simplify your installation. Click Next.
    • Choose Yes or No to send Customer Experience and Error reports.
    • Click Install.
    • Close when complete.

     

    3.  Install OM12 Reporting on the SQL server. You can also refer to: http://technet.microsoft.com/en-us/library/hh298611.aspx

    • Log on using your domain user account that is a member of the SCOMAdmins group, and has System Administrator (SA) rights over the SQL instances.
    • Run Setup.exe. Click Install.
    • Select the following, and then click Next:
      • Reporting Server
    • Accept or change the default install path and click Next.
    • Resolve any issues with prerequisites, and click Next.
    • Type in the name of a management server, and click Next.
    • Choose the correct local SQL reporting instance and click Next.
    • Enter in the DOMAIN\OMREAD account when prompted. It is a best practice to use separate accounts for distinct roles in OpsMgr, although you can also just use the DOMAIN\OMDAS account for all SQL Database access roles to simplify your installation. Click Next.
    • Choose Yes or No to send ODR information to Microsoft. This is very important to assist Microsoft in getting good information to help improve the product.
    • Click Install.
    • Close when complete.

     

    4.  Deploy an agent to the SQL DB server.

     

    5.  Import management packs. Also refer to: http://technet.microsoft.com/en-us/library/hh212691.aspx

    • Using the console – you can import MP’s using the catalog, or directly importing from disk.  Note – some MP’s should only be imported from disk.
    • Import the Base OS and SQL MP’s at a minimum.

     

    6.  Create a dashboard view:

     

    7.  Manually grow your Database sizes and configure SQL

    • When we installed each database, we used the default of 1GB (1000MB). This is not a good setting for steady state as our databases will need to grow larger than that very soon.  We need to pre-grow these to allow for enough free space for maintenance operations, and to keep from having lots of auto-growth activities which impact performance during normal operations.
    • A good rule of thumb for most deployments of OpsMgr is to set the OpsDB to 30GB for the data file and 15GB for the transaction log file. This can be smaller for POC’s but generally you never want to have an OpsDB set less than 10GB/5GB.  Setting the transaction log to 50% of the DB size for the OpsDB is a good rule of thumb.
    • For the Warehouse – you will need to plan for the space you expect to need using the sizing tools available and pre-size this from time to time so that lots of autogrowths do not occur.
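    As a sketch, pre-growing the OpsDB per the rule of thumb above can be done with T-SQL via sqlcmd.exe on the SQL server. MOM_DATA and MOM_LOG are the default logical file names for the OperationsManager database – verify yours first, and adjust the sizes to your own plan:

    ```powershell
    # Verify the logical file names first
    sqlcmd -S . -d OperationsManager -Q "EXEC sp_helpfile"
    # Pre-grow the data file to 30GB and the transaction log to 15GB
    sqlcmd -S . -Q "ALTER DATABASE OperationsManager MODIFY FILE (NAME = 'MOM_DATA', SIZE = 30GB)"
    sqlcmd -S . -Q "ALTER DATABASE OperationsManager MODIFY FILE (NAME = 'MOM_LOG', SIZE = 15GB)"
    ```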

     

    8.  Continue with optional activities from the Quick Start guide on TechNet:

     

    9.  Enable Agent Proxy

    I prefer to simply enable agent proxy for all agents.  You can do this by running a script on a schedule – via a scheduled task, via Orchestrator, or embedded in a management pack.

    http://blogs.technet.com/b/kevinholman/archive/2010/11/09/how-to-set-agent-proxy-enabled-for-all-agents.aspx
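    The approach in that post boils down to a short OpsMgr 2012 PowerShell sketch:

    ```powershell
    Import-Module OperationsManager
    # Enable agent proxy for any agent that does not already have it enabled
    Get-SCOMAgent | Where-Object { $_.ProxyingEnabled.Value -eq $false } | Enable-SCOMAgentProxy
    ```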

  • OpsMgr: How to monitor non-Microsoft SQL databases in SCOM – an example using PostgreSQL

    OpsMgr has the capability to run a synthetic transaction, to query a remote database from a watcher node.  This can be used to simulate an application query to a back end database, and we have built in monitoring to set thresholds for:

    • Connection time
    • Query time
    • Fetch Time

    We will also auto-create three performance rules – so that you can collect these as performance data for short term investigation, or long term trending.

     

    In the console, on the Authoring pane, right click “Management Pack Templates” and choose the Add Monitoring Wizard.

    image

     

    Select the OLE DB Data Source, give it a name and create or select an existing management pack for your SQL synthetic transaction.

    On the Connection String page – typically you would select “Build” and choose from one of our existing built-in providers:

     

    image

     

    This is very simple for running queries against Microsoft SQL servers.  However, what if you need to query Oracle, or some open-source database?

     

    There is a pretty good article on setting this up for an Oracle database here:

    http://www.maartendamen.com/2010/09/monitor-an-oracle-database-with-a-scom-oledb-watcher/

     

    My customer recently asked me about running a synthetic transaction against a PostgreSQL open-source database, so that will be the subject of this article.  However – you can use this guidance for any database, as long as there is an OLE DB provider for Windows for that database.  The alternative would be to write a custom script that queries the database via a scripting-language provider, then use the output of the script (via a property bag or the event log) to drive a SCOM monitor.

     

    OK – let’s get started.

     

    The first step is to find a Windows OLE DB provider for your database.  Download it, and install it on your watcher node (the agent from which you want to run the queries).

    For PostgreSQL – I used a trial provider from http://www.pgoledb.com, however if you look around I am sure there are other open-source providers out there.

    Once you install the provider – you should test it to ensure your connections succeed.  Create an empty file with Notepad.exe on your watcher node’s desktop and name it SQL.txt.  Then rename it to sql.udl.  Opening this UDL file launches the OLE DB Data Link Properties tool, which will show you all your providers.  Notice my new provider for PostgreSQL:

    image

     

    Select your provider, and choose Next.

    Input the servername, port, authentication account, and default database you wish to query, and test the connection.  You MUST get this working before even attempting the OpsMgr OLE DB Wizard, because it will simply call on this provider.  Here is my example below:

     

    image

     

    Once it is a success – browse the “All” tab to see all the parameters allowed by your provider in a connection string:

    image

     

    The next step is to configure the OpsMgr Synthetic transaction. 

     

    In the “Build” Connection String setting for your OLE DB data source, the wizard will not list our custom provider unless it is installed on the same machine where you are running the console.  You could install your provider on your console machine, but I don’t recommend it.  The connection strings are very specific, and the SCOM wizard does not provide the correct ones in all cases.  Therefore – just pick the “Microsoft OLE DB Provider for SQL Server”, provide a server and database name, and make sure you check the box to use the Simple Authentication Run As Profile.

     

    image

    The reason we check the box for simple authentication is so the wizard will build the Run As profile and insert the username and password variables into the connection string.

     

    Now – on the next screen, highlight everything in the connection string, then copy and paste it into Notepad:

     

    Provider=SQLOLEDB;Server=SRV02;Database=postgres;User Id=$RunAs[Name="OleDbCheck_37d53320a37b48dda11eed3a00caa91f.SimpleAuthenticationAccount"]/UserName$;Password=$RunAs[Name="OleDbCheck_37d53320a37b48dda11eed3a00caa91f.SimpleAuthenticationAccount"]/Password$

     

    We need to modify this line to use the supported parameters of our provider.  You should be able to get this information from the provider documentation, from the Data Link Properties tool we used above, or from examples on the web.  In my case – I will use the provider documentation.

     

    Provider=PGNP.1;Initial Catalog=postgres;Extended Properties="PORT=5432";User ID=$RunAs[Name="OleDbCheck_37d53320a37b48dda11eed3a00caa91f.SimpleAuthenticationAccount"]/UserName$;Password=$RunAs[Name="OleDbCheck_37d53320a37b48dda11eed3a00caa91f.SimpleAuthenticationAccount"]/Password$

     

    In the example above – my provider uses a name of “PGNP.1”, the initial catalog is the database I want to query, and I specify the port.  I did not specify the server name, because my watcher node is the same computer that hosts the database, otherwise I would have a value for the server host name.
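    If you want to sanity-check a hand-edited connection string outside of SCOM, you can drive the same OLE DB provider through the ADODB COM object from PowerShell on the watcher node.  This is a sketch only – the catalog, port, and credentials below are placeholders:

    ```powershell
    # Test the provider with the exact connection string syntax (minus the RunAs variables)
    $conn = New-Object -ComObject ADODB.Connection
    $conn.Open('Provider=PGNP.1;Initial Catalog=postgres;Extended Properties="PORT=5432";User ID=postgres;Password=example')
    $conn.Close()
    ```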

    Once you have a well-formatted connection string, the next step is to input your test query and give the workflow a timeout, after which it will quit and kill the query:

     

    image

     

    Running a “Test” will fail – because the test is not run from the watcher node – it is run from the RMS, which does not have these special providers installed, so skip that.

     

    Configure alert thresholds for your expected query results:

    image

     

    Choose your watcher node and how often you want the query to run.  Don’t run these synthetic transactions too often; if you have a lot of them they can overwhelm the watcher node agent, or create a performance-impacting load on it.

     

    image

     

    You can now finish and create your transaction.  The watcher node will get instructions to download this management pack, and it will begin running the transaction.  You can inspect the progress in the console under Synthetic Transaction, OLE DB Data Source State:

     

    image

     

    Soon Health Explorer may show as critical:

     

    image

     

    This is because we haven’t yet configured the Run As account that simple authentication uses to gain access to the database.

     

    In the console, go to Administration > Run As Configuration > Accounts and create a Run As account.  Choose Simple Authentication and supply a name:

    image

     

    Provide a credential:

     

    image

     

    Always choose More Secure:

     

    image

     

    Under Accounts, open the properties of the account you just created.  Go to the Distribution tab – you need to allow your watcher node to use this credential by distributing it to the watcher:

     

    image

     

    image

     

    Now we need to associate the account we created with the profile that our synthetic transaction uses.  Select Profiles, and find the Simple Authentication profile whose name matches our OLE DB synthetic transaction:

     

    image

     

    Open the properties of this profile, and add our newly created account to it:

     

    image

     

    This will update the Secure Reference management pack, and the credential will flow down to our watcher node.  Subsequent attempts to monitor our database will pass this credential instead of trying to authenticate with the default agent action account (Local System).

    After a few minutes, you should see Health explorer clear up and show a successful connection:

     

    image

     

    If you want to validate that you are collecting performance data – right click your OLE DB Synthetic transaction in the monitoring pane > Open > Performance View:

     

     

    image

     

    image

     

    As you can see – as long as there is a Windows OLE DB provider for the agent to consume, we can synthetically query remote databases of any type, authenticate to them securely, and bring back good performance data to proactively surface query or connection performance issues, and react to outages immediately.

  • Deploying Unix/Linux Agents using OpsMgr 2012

    Microsoft started including Unix and Linux monitoring in OpsMgr directly in OpsMgr 2007 R2, which shipped in 2009.  Some significant updates have been made to this for OpsMgr 2012.  Primarily these updates are around:

    • Highly available monitoring via resource pools
    • Sudo elevation support, allowing a low-privilege account to elevate rights for specific workflows
    • SSH key authentication
    • New wizards for discovery, agent upgrade, and agent uninstallation
    • Additional PowerShell cmdlets
    • Performance and scalability improvements
    • New monitoring templates for common monitoring tasks

     

    This article will cover the discovery, agent deployment, and monitoring configuration of a Linux server in OpsMgr 2012.  I am going to run through this as a typical user would – and show some of the pitfalls if you don’t follow the exact order of configuration required.

     

    So what would anyone do first?  They’d naturally run a discovery, just like they do for Windows agents.  However – this will likely end up in frustration.  There are several steps that you need to configure FIRST, before deploying Unix/Linux agents.

     

    High Level Overview:

     

    The high level process is as follows:

    • Import Management Packs
    • Create a resource pool for monitoring Unix/Linux servers
    • Configure the Xplat certificates (export/import) for each management server in the pool.
    • Create and Configure Run As accounts for Unix/Linux.
    • Discover and deploy the agents

     

     

    Import Management Packs:

     

    The core Unix/Linux libraries are already imported when you install OpsMgr 2012, but not the detailed MP’s for each OS version.  These are on the installation media, in the \ManagementPacks directory.  Import the specific ones for the Unix or Linux Operating systems that you plan to monitor.

     

     

    Create a resource pool for monitoring Unix/Linux servers

    The FIRST step is to create a Unix/Linux monitoring resource pool.  In larger environments this pool will contain management servers dedicated to monitoring Unix/Linux systems; in smaller environments it may include existing management servers that also manage Windows agents or gateways.  Either way, it is a best practice to create a new resource pool for this purpose, as it eases administration and future scalability expansion.

    Under Administration, find Resource Pools in the console:

    image

     

    OpsMgr ships 3 resource pools by default:

    image

     

    Let’s create a new one by selecting “Create Resource Pool” from the task pane on the right, and call it “Unix Linux Monitoring Resource Pool”

     

    image

     

    Click Add and then click Search to display all management servers.  Select the Management servers that you want to perform Unix and Linux Monitoring.  If you only have 1 MS, this will be easy.  For high availability – you need at least two management servers in the pool.

     

    Add your management servers and create the pool.  In the actions pane – select “View Resource Pool Members” to verify membership.
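    As an alternative to the console, the pool can be created from the OpsMgr command shell.  A sketch using the pool name from this example:

    ```powershell
    Import-Module OperationsManager
    # Creates the pool with every management server as a member; trim the member list as needed
    New-SCOMResourcePool -DisplayName "Unix Linux Monitoring Resource Pool" -Member (Get-SCOMManagementServer)
    ```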

     

    image

     

     

    Configure the Xplat certificates (export/import) for each management server in the pool

    This process is documented here:  http://technet.microsoft.com/en-us/library/hh287152.aspx

    Operations Manager uses certificates to authenticate access to the computers it is managing. When the Discovery Wizard deploys an agent, it retrieves the certificate from the agent, signs the certificate, deploys the certificate back to the agent, and then restarts the agent.

    To configure high availability, each management server in the resource pool must have all the root certificates that are used to sign the certificates that are deployed to the agents on the UNIX and Linux computers. Otherwise, if a management server becomes unavailable, the other management servers would not be able to trust the certificates that were signed by the server that failed.

    We provide a tool to handle the certificates, named scxcertconfig.exe.  Essentially, you must log on to EACH management server that will be part of a Unix/Linux monitoring resource pool and export its SCX (cross-platform) certificate to a file share, then import the other servers’ certificates so they are all trusted.

    If you only have a SINGLE management server, or a single management server in your pool, you can skip this step, then perform it later if you ever add Management Servers to the Unix/Linux Monitoring resource pool.

     

    In this example – I have two management servers in my Unix/Linux resource pool, MS1 and MS2.  Open a command prompt on each MS, and export the cert:

    On MS1:

    C:\Program Files\System Center 2012\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS1.cer

    On MS2:

    C:\Program Files\System Center 2012\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS2.cer

    Once all certs are exported, you must IMPORT the other management server’s certificate:

    On MS1:

    C:\Program Files\System Center 2012\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS2.cer

    On MS2:

    C:\Program Files\System Center 2012\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS1.cer

    If you fail to perform the above steps – you will get errors when running the Linux agent deployment wizard later.

     

     

    Create and Configure Run As accounts for Unix/Linux

     

    Next up we need to create our run-as accounts for Linux monitoring.   This is documented here:  http://technet.microsoft.com/en-us/library/hh212926.aspx

     

    We need to select “UNIX/Linux Accounts” under administration, then “Create Run As Account” from the task pane.  This kicks off a special wizard for creating these accounts.

    image

     

    image

     

    Let’s create the Monitoring account first.  Give the monitoring account a display name, and click Next.

     

    image

     

    On the next screen, type in the credentials that you want to use for monitoring the Linux system(s).

     

    image

     

    On the above screen – you have two choices.  You can provide a privileged account for handling monitoring, or you can use an existing account on the Linux system(s) that is not privileged.  Then – you can specify whether or not you want this account to be able to leverage sudo elevation.  Since I am providing a privileged account in this case – I will tell it to not use elevation.

    On the next screen, always choose more secure:

    image

     

    Now – since we chose More Secure – we must choose the distribution of the Run As account.  Find your “Linux Monitoring Account” under the UNIX/Linux Accounts screen, and open the properties.  On the Distribution Security screen, click Add, then select "Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

     

    image

     

    We would repeat the above process as many times as necessary for the number of different accounts we need.  If all our Linux systems use the same credentials, then we need, at a minimum, ONE privileged monitoring account, and it can be associated with the three Run As profiles (covered in the next section).

    However, what would be more typical, if all our systems had the same credentials and passwords, is to use THREE Run As accounts:

    • One for unprivileged monitoring (do not use elevation)
    • One for privileged monitoring, using EITHER a privileged account (do not use elevation) OR an unprivileged account using sudo (use elevation)
    • One for agent maintenance, using EITHER a privileged account (do not use elevation) OR an unprivileged account using sudo (use elevation)

    For the purposes of this demo, I am just going to create a SINGLE priv Run As account (root) that I will use for all three scenarios.

     

    Next up – we must configure the Run As profiles.  This is covered here:  http://technet.microsoft.com/en-us/library/hh212926.aspx

     

    There are three profiles for Unix/Linux accounts:

     

    image

     

    The agent maintenance account is strictly for agent updates, uninstalls, and anything else that requires SSH.  It will always be associated with a privileged account that has access via SSH, created using the Run As account wizard above but selecting “Agent Maintenance Account” as the account type.  We won’t go into details on that here.

    The other two Profiles are used for Monitoring workflows.  These are:

    Unix/Linux Privileged account

    Unix/Linux Action Account

    The Privileged Account profile will always be associated with a Run As account like we created above that is privileged (root or similar), OR an unprivileged account that has been configured with elevation via sudo.  This is what any workflows that typically require elevated rights will execute as.

    The Action Account profile is what all your basic monitoring workflows will run as.  This will generally be associated with a Run As account like we created above, but one using a non-privileged user account on the Linux systems.

    ***A note on sudo elevated accounts:

    • sudo elevation must be passwordless.
    • requiretty must be disabled for the user.
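    A minimal illustrative sudoers fragment meeting those two requirements, assuming a monitoring user named scomuser (real deployments should restrict the allowed command list – see the TechNet wiki sample configurations linked below):

    ```text
    # /etc/sudoers fragment - example only; tighten the command list for production
    scomuser ALL=(root) NOPASSWD: ALL
    Defaults:scomuser !requiretty
    ```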

     

    For my example – I am keeping it very simple.  I created a single Run As account, of the Monitoring type, holding the privileged root account and password credential.  I will associate this Run As account with BOTH the Privileged and Action account profiles.  This makes all my workflows (both normal monitoring and elevated monitoring) run under this credential.  This is not recommended as a “lowest privilege” design, but is used in this example just to keep things simple.  Once we validate it is working, we can go back, change this configuration, and experiment with low-privilege and sudo-enabled elevation accounts, associating them independently.

    For more information on configuring sudo elevation for OpsMgr monitoring accounts, including some sample configurations for your sudoers files for each OS version:  http://social.technet.microsoft.com/wiki/contents/articles/7375.configuring-sudo-elevation-for-unix-and-linux-monitoring-with-system-center-2012-operations-manager.aspx

     

    I will start with the Unix/Linux Action Account profile.  Right click it – choose properties, and on the Run As Accounts screen, click Add, then select our “Linux Monitoring Account”.  Leave the default of “All Targeted Objects” and click OK, then save.

    Repeat this same process for the Unix/Linux Privileged Account profile.

    Repeat this same process for the Unix/Linux Agent Maintenance Account profile.

     

     

    Discover and deploy the agents

     

    Run the discovery wizard.

     

    image

     

    Click “Add”:

    image

     

    Here you will type in the FQDN of the Linux/Unix agent, its SSH port, and then choose “All Computers” as the discovery type.  (There is another discovery type for when you manually install the Unix/Linux agent – which is really just a simple provider – and then use a signed certificate to authenticate.)

     

    Now – click “Set Credentials”.  If you do not want to provide a root account here and would rather use SSH key authentication, that is now supported on this screen.  For this example – I will simply type in my root account in order to use SSH to discover and deploy the Linux agent.

     

    image

     

    Notice above that you can tell the wizard if the account is privileged or not.  Here is an explanation:

    • A privileged account is a user account that has root-level access, including access to security logs and read, write, and execute permissions for the directories in which the Operations Manager agent is installed.
    • An unprivileged account is a normal user account that does not have root-level access or special permissions. However, an unprivileged account allows monitoring of system processes and of performance data.

    If you have to discover only UNIX and Linux computers that already have an agent installed, rather than installing an agent, you can use an unprivileged user account on the UNIX or Linux computer. If you have to install an agent, you must use a privileged account. If you do not have a privileged account, you can elevate an unprivileged account to a privileged account provided that the su or sudo elevation program has been configured on the UNIX or Linux computer for the user account.

     

    So – if we had pre-installed the agent already – we could simply use an unprivileged account to authenticate and discover the system, bringing it into OpsMgr.

    Or – we could provide an unprivileged account that was allowed elevation via a pre-existing sudo configuration on the Linux server.

     

    image

     

    Click save.  On the next screen – select a resource pool.  We will choose the resource pool that we already created.

     

    image

     

    Click Discover, and the results will be displayed:

     

    image

     

    Check the box next to your discovered system – and deploy the agent.

     

    image

     

This will take some time to complete: the agent system is checked for the correct FQDN and SSL certificate, the management servers are inspected to ensure they all have trusted SCX certificates (which we exported/imported above), the connection is made over SSH, and the package is copied down and installed, after which the final certificate signing occurs.  If all of these checks pass, we get a success!

     

    There are several things that can fail at this point.  See the troubleshooting section at the end of this article.

     

     

    Monitoring Linux servers:

     

    Assuming we got all the way to this point with a successful discovery and agent installation, we need to verify that monitoring is working.  After an agent is deployed, the Run As accounts will start being used to run discoveries, and start monitoring.  Once enough time has passed for these, check in the Administration pane, under Unix/Linux Computers, and verify that the systems are not listed as “Unknown” but discovered as a specific version of the OS:

     

    image

     

Next – go to the Monitoring pane and select the “Unix/Linux Computers” view at the top.  Verify that your systems are present and that there is a green healthy check mark next to them:

     

    image

     

    Next – expand the Unix/Linux Computers folder in the left tree (near the bottom) and make sure we have discovered the individual objects, like Linux Server State, Linux Disk State, and Network Adapter state:

     

    image

     

    Run Health explorer on one of the discovered disks.  Remove the filter at the top to see all the monitors for the disk:

     

    image

     

    Close health explorer. 

    Select the Operating System Performance view.   Review the performance counters we collect out of the box for each monitored OS.

     

    image

     

    Out of the box – we discover and apply a default monitoring template to the following objects:

    • Operating System
    • Logical disk
    • Network Adapters

    Optionally, you can enable discoveries for:

    • Individual Logical Processors
    • Physical Disks

    I don’t recommend enabling additional discoveries unless you are sure that your monitoring requirements cannot be met without discovering these additional objects, as they will reduce the scalability of your environment.

     

    Out of the box – for an OS like RedHat Enterprise Linux 5 – here is a list of the monitors in place, and the object they target:

     

    image

     

There are also 50 rules enabled out of the box.  46 are performance collection rules for reporting, and 4 are event-based rules dealing with security.  Two are informational: one lets you know whenever a direct login is made using root credentials via SSH, the other whenever su elevation occurs in a user session.  The other two deal with failed SSH or su attempts.

     

    To get more out of your monitoring – you might have other services, processes, or log files that you need to monitor.  For that, we provide Authoring Templates with wizards to help you add additional monitoring, in the Authoring pane of the console under Management Pack templates:

     

    image

     

In the Reporting pane – we also offer a large number of reports you can leverage, or you can always create your own using our generic report templates, or custom ones designed in Visual Studio for SQL Server Reporting Services.

     

    image

     

As you can see, it is a fairly well-rounded solution, bringing Unix and Linux monitoring into a single pane of glass alongside your other systems, from the hardware, to the operating system, to the network layer, to the applications.

    Partners and 3rd party vendors also supply additional management packs which extend our Unix and Linux monitoring, to discover and provide detailed monitoring on non-Microsoft applications that run on these Unix and Linux systems.

     

     

    Troubleshooting:

     

    The majority of troubleshooting comes in the form of failed discovery/agent deployments.

     

Microsoft has written a wiki on this topic, which covers the majority of these and how to resolve them:

    http://social.technet.microsoft.com/wiki/contents/articles/4966.aspx

     

• For instance – if the DNS name that you provided does not match the DNS hostname on the Linux server or its SSL certificate, or if you failed to export/import the SCX certificates for multiple management servers in the pool, you might see:

     

    image

     

    Agent verification failed. Error detail: The server certificate on the destination computer (rh5501.opsmgr.net:1270) has the following errors:
    The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.

    The SSL certificate is signed by an unknown certificate authority.
    It is possible that:
    1. The destination certificate is signed by another certificate authority not trusted by the management server.
    2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection. The FQDN used for the connection is: rh5501.opsmgr.net.
    3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.


     

    The solution to these common issues is covered in the Wiki with links to the product documentation.

     

    • Perhaps – you failed to properly configure your Run As accounts and profiles.  You might see the following show as “Unknown” under administration:

     

    image

     

    Or you might see alerts in the console:

     

    Alert:  UNIX/Linux Run As profile association error event detected

    The account for the UNIX/Linux Action Run As profile associated with the workflow "Microsoft.Unix.AgentVersion.Discovery", running for instance "rh5501.opsmgr.net" with ID {9ADCED3D-B44B-3A82-769D-B0653BFE54F9} is not defined. The workflow has been unloaded. Please associate an account with the profile.

    This condition may have occurred because no UNIX/Linux Accounts have been configured for the Run As profile. The UNIX/Linux Run As profile used by this workflow must be configured to associate a Run As account with the target.

Either you failed to configure the Run As accounts, or failed to distribute them, or you chose a low-privileged account that is not properly configured for sudo on the Linux system.  Go back and double-check your work there.

     

If you want to check whether the agent was deployed to a RedHat system, you can run the following command in a shell session:

    image
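A minimal version of such a check, assuming the agent's RPM package is named "scx" (the name used by the OpsMgr UNIX/Linux agent RPMs; verify against your agent version):

```shell
# Query the RPM database for the SCX agent package on a RedHat system.
# "scx" is the package name used by the OpsMgr UNIX/Linux agent RPM.
rpm -q scx
```

If the agent is installed, this prints the installed package name and version; otherwise it reports that the package is not installed.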

  • OpsMgr: How to create a group of all Windows Computers that are NOT a member of another group

    This is a pretty common request, and I have been meaning to write up an example of this.

Suppose you have the following scenario:  You are monitoring 1000 Windows servers with OpsMgr.  In your management group, 100 servers are test/dev machines and 900 are production.  You need a simple way to treat these servers differently for overrides, notifications, incident creation, or even scoping your views.  You want to ensure that you don’t send critical pages or emails, or create incidents, for these lab/test/dev machines.

The challenge is – our notifications, views, and overrides don’t have an “exclude” function… a way to say “show me everything except alerts from these machines”.

     

    I will start by creating a group using the UI, for my Lab Computers group, based on OU.  This could be based on static membership, or anything else.

     

    image

    image

    Verify that I have the right Lab Computer members in that group:

    image

     

    Now – we need to create a group – which contains ALL OTHER computers in SCOM, that are not part of the lab group:

     

    image

     

The only criterion we will define here is that this group contains all Windows Computers.  (We will restrict the membership later in XML.)

     

    image

     

    Save the group and verify it contains ALL Windows Computers. 

    Save and export the management pack to XML.

    Edit the XML file using notepad or your XML editor of choice.

     

    Find the discovery for your Production Server Group.  If you used the UI to create the group, these will have a “UINameSpace<GUID>” name… so you will have to ensure you are choosing the right one by verifying this in the DisplayStrings section of the XML.
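For example, the DisplayStrings entry that ties the generated group ID back to the friendly name you typed in the UI looks like the following (the GUID is the one from my management pack; the display name here is illustrative):

```xml
<!-- In <LanguagePacks>/<LanguagePack>/<DisplayStrings>: the ElementID matches the
     generated UINameSpace group ID, and <Name> is the name you gave the group in the UI. -->
<DisplayString ElementID="UINameSpace1235cf5e76c84e458035a1c4ef8d73aa.Group">
  <Name>All Production Servers Group</Name>
  <Description></Description>
</DisplayString>
```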

    Here is what my default group discovery criteria looked like, for all Windows Computers:

     

          <Discovery ID="UINameSpace1235cf5e76c84e458035a1c4ef8d73aa.Group.DiscoveryRule" Enabled="true" Target="UINameSpace1235cf5e76c84e458035a1c4ef8d73aa.Group" ConfirmDelivery="false" Remotable="true" Priority="Normal">
            <Category>Discovery</Category>
            <DiscoveryTypes>
              <DiscoveryRelationship TypeID="MicrosoftSystemCenterInstanceGroupLibrary7585010!Microsoft.SystemCenter.InstanceGroupContainsEntities" />
            </DiscoveryTypes>
            <DataSource ID="GroupPopulationDataSource" TypeID="SystemCenter!Microsoft.SystemCenter.GroupPopulator">
              <RuleId>$MPElement$</RuleId>
              <GroupInstanceId>$MPElement[Name="UINameSpace1235cf5e76c84e458035a1c4ef8d73aa.Group"]$</GroupInstanceId>
              <MembershipRules>
                <MembershipRule>
                  <MonitoringClass>$MPElement[Name="MicrosoftWindowsLibrary7585010!Microsoft.Windows.Computer"]$</MonitoringClass>
                  <RelationshipClass>$MPElement[Name="MicrosoftSystemCenterInstanceGroupLibrary7585010!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
                </MembershipRule>
              </MembershipRules>
            </DataSource>
          </Discovery>

Right now – the membership rule simply states that if an object is a Windows Computer, it belongs in the group.

We need to add an expression which states “all Windows Computers that are NOT CONTAINED in the Lab Servers group”.  The <MembershipRule> section is where this filter belongs.

    Here is an example expression that will create this filter:

                     <Expression>
                        <NotContained>
                            <MonitoringClass>$MPElement[Name="UINameSpacebff9e11464de491f9620271507a2aeb8.Group"]$</MonitoringClass>
                        </NotContained>
                     </Expression>

    The key in the above expression is the <NotContained> tag.   You can use <Contains>, <NotContains>, <Contained>, and <NotContained> for similar expressions.

     

    Now – the group class ID above just happens to be the group class ID in my management pack (for Lab Servers).  You will need to change this to your own group class ID, which is defined in this management pack above, in the <ClassTypes> section.

    The full XML for this discovery would look like so:

          <Discovery ID="UINameSpace1235cf5e76c84e458035a1c4ef8d73aa.Group.DiscoveryRule" Enabled="true" Target="UINameSpace1235cf5e76c84e458035a1c4ef8d73aa.Group" ConfirmDelivery="false" Remotable="true" Priority="Normal">
            <Category>Discovery</Category>
            <DiscoveryTypes>
              <DiscoveryRelationship TypeID="MicrosoftSystemCenterInstanceGroupLibrary7585010!Microsoft.SystemCenter.InstanceGroupContainsEntities" />
            </DiscoveryTypes>
            <DataSource ID="GroupPopulationDataSource" TypeID="SystemCenter!Microsoft.SystemCenter.GroupPopulator">
              <RuleId>$MPElement$</RuleId>
              <GroupInstanceId>$MPElement[Name="UINameSpace1235cf5e76c84e458035a1c4ef8d73aa.Group"]$</GroupInstanceId>
              <MembershipRules>
                <MembershipRule>
                  <MonitoringClass>$MPElement[Name="MicrosoftWindowsLibrary7585010!Microsoft.Windows.Computer"]$</MonitoringClass>
                  <RelationshipClass>$MPElement[Name="MicrosoftSystemCenterInstanceGroupLibrary7585010!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
                     <Expression>
                        <NotContained>
                            <MonitoringClass>$MPElement[Name="UINameSpacebff9e11464de491f9620271507a2aeb8.Group"]$</MonitoringClass>
                        </NotContained>
                     </Expression>
                </MembershipRule>
              </MembershipRules>
            </DataSource>
          </Discovery>

     

    You can save this MP edit, then import your management pack.

    You will now see that you have a group of all Windows Computers, except those that are members of the Lab Computers group:

     

    Lab group:

    image

    Production group:

    image

     

    Since my lab group is dynamic based on OU, as servers are moved in or out of that OU, the Production group will also be dynamically updated.

     

I can now use my production group to scope and filter console views, user roles, notifications, and overrides.

  • Test/Demo OpsMgr 2012 network monitoring with Jalasoft’s network device simulator

Testing the functionality of the network monitoring components of OpsMgr is pretty simple when you have access to a lot of switch, router, and firewall equipment, and can get past the SNMP strings and access lists.  But what if you are working in a lab and don’t have full access to all that equipment?

    Simple – use the Jalasoft Xian SNMP Simulator V2 from http://www.jalasoft.com

     

    The installation is simple, just run setup.  Soon you have the UI:

    image

     

The trick is to install this application on some server or desktop that isn’t your OpsMgr server.  Then you will need to add an additional IP address for each network device you want to simulate (if you want multiple devices running at the same time).

    image

     

    Then – in DNS – add a static A record and a reverse PTR record for each IP, with a name:

     

    image

     

Once you have these working (verify with a ping and ping -a), you can configure Jalasoft.

    Under Repository – select the device class, then the model, and click LOAD.

    image

     

Under “Loaded dumps” – right-click the category and model you want to fire up, and choose “Simulate Device”.

    Pick the IP address for this device that you created a DNS entry for, and click OK.

     

    image

     

Now right-click the device, and choose START.  It will switch from “stopped” to “running”.

     

At this point the device is up and running with SNMP.  It isn’t a full-blown simulator that supports telnet or runs a fake OS; it simply responds to SNMP like the actual device.  Now you can follow my walk-thru of discovering this network device using the name or IP:

    http://blogs.technet.com/b/kevinholman/archive/2011/07/21/opsmgr-2012-discovering-a-network-device.aspx

     

    image

     

    image

  • Security in Operations Manager – some perspectives and typical customer scenarios

    The concept of this article is to give you a general overview of the security model, and rights required, to deploy and operate System Center Operations Manager.  For each section, I will link to more detailed documents and articles.

     

    What do I need to know?

     

When I have a discussion with customers around security, it is always different due to differing customer environments and policy requirements.  Here are the typical discussion topics:

    Considering the Enterprise Monitoring team, do they:

    • Own full rights to their SQL servers, or is SQL managed and secured by a different team?
    • Have local admin rights on all monitored servers in the environment, or no rights?
• Have an account with domain admin privileges?
    • Have the ability to create service accounts in the domain?  Containers?
    • What is the password policy for service accounts?
    • How many people on the team will be OpsMgr admins?
    • Do they want to restrict OpsMgr Operators to only see specific servers, by group, or application?
    • Do they want to help elevate Operators to have access to run specific tasks, or restrict them from others?
    • Do we need to use a special account to configure notification access to their SMTP server?
    • Do we need to monitor applications that have their own security access list, and are fairly restricted?  (SQL, Active Directory, SharePoint, etc)
    • Do they even HAVE an “enterprise monitoring team”?  Smile

     

     

    Can you break it down into simpler terms for me?

    First off – there is good documentation available on OpsMgr security requirements, in our Security Guide, Design Guide, and Deployment guide:

    http://technet.microsoft.com/en-us/systemcenter/om/bb498235.aspx

     

    I like to break down a security conversation on OpsMgr around the following topics:

    • Accounts and groups needed before installing the infrastructure
    • Rights needed to install, deploy, and update the infrastructure
    • Rights needed to deploy agents
    • Rights required for specific roles and applications (Run As accounts)
    • Rights and configurations for scoping consoles for operators

     

     

    Accounts and groups needed before installing the infrastructure

    This can really be as simple or as complex as your environment requires it to be.  At a minimum, I like to recommend the following:

    • DOMAIN\OMAdmins            OM Administrators security group
    • DOMAIN\OMAA                      OM Server action account
    • DOMAIN\OMDAS                    OM Config and Data Access service account

    The domain group is really necessary to clearly define who has admin access to OpsMgr, and will be granted local admin access to the infrastructure (SCOM) servers.

The OM Server action account needs no special rights in Active Directory; it is just a domain user account that will be used to run processes on the management servers.

The OM Config/DAS/SDK account is used to run processes on the management server and to access the OpsMgr database.

     

    Additionally, you might consider the following accounts, which will be needed for specific scenarios:

    • DOMAIN\OMWRITE            OM Reporting Write account
    • DOMAIN\OMREAD              OM Reporting Read account
    • DOMAIN\SQLSVC                 SQL 2008 service account
• DOMAIN\NOTIFY                  A domain account with rights to send mail to the SMTP server

The OM Write and Read accounts are used for reporting – controlling access to the data warehouse database and used for report execution, deployment, and other tasks.  Many customers consider just leveraging the above DAS/SDK account for both of these roles when installing reporting, if maintaining multiple service accounts is too painful for the organization.

    The SQL Service account is typical when installing SQL server and running the SQL DB and Agent service under a domain account.  Normally this would be handled and controlled by a dedicated SQL team.  If the OM team owns their own SQL servers, then this should be part of the planning process.

    The Notification account is only needed if you have a secured SMTP relay server in your environment.  By default the OpsMgr server will send notifications using the Management Server Action Account (OMAA).  If this account does not have rights to send mail – then you have to come up with a relay solution, such as granting those rights to the OM server action account, or using a specific notification RunAs account for this, or configuring relay access via some other means.

     

     

    Rights needed to install, deploy, and update the infrastructure

    I break down a lot of the granular security rights for each role and account at the following article:

    http://blogs.technet.com/b/kevinholman/archive/2008/04/15/opsmgr-security-account-rights-mapping-what-accounts-need-what-privileges.aspx

     

    For the initial deployment, I like to recommend that a SCOM admin’s domain account be placed into the DOMAIN\OMAdmins group.  Then – assign this group to have the following rights:

• Local administrative rights to the OS on each infrastructure server (management servers, gateways, and SQL servers)
• SQL sysadmin (SA) role rights on all SQL servers that will host the Operations or Data Warehouse databases.

    That’s it.

Generally, configuring each OpsMgr management server or gateway is no issue.  However, getting local admin access to the SQL server’s OS, and SA role rights in SQL, can be challenging.  There are no real workarounds to this in OpsMgr 2007 R2; these rights are required for the OpsMgr setup routine to create these databases and set permissions on them.  Sometimes you will see a SQL team push back on this requirement, because they do not normally grant local admin or the SA role to ANY application owner.  However, these rights are required, so you will need to work that out internally.  What I recommend is to grant the specific OM admin’s domain user account these rights *temporarily* during the installation, and then remove them completely once the install is complete.  Setup will automatically grant the required rights to each service account supplied – so the individual account that performed the installation no longer needs this high level of access.  This is also why I do not recommend performing installations of software using a service account.  If you later remove or downgrade the rights of a service account in SQL – you might break something.  It is best to just let setup handle those required rights, and add/remove nothing to them.

     

    Rights needed to deploy agents

Agents get deployed in one of four typical ways:

    • Push installed from the console
    • Deployed via software distribution, like using Configuration Manager
    • Pre-installed in an image, using AD integration.
    • Manually installed, by a local administrator running the agent MSI.

In order to push-install agents from the console across the network to the monitored server, very specific requirements exist.  I discuss these requirements in detail here:

    http://blogs.technet.com/b/kevinholman/archive/2007/12/12/agent-discovery-and-push-troubleshooting-in-opsmgr-2007.aspx

One of the main items – is that in order to install software on a remote machine, the SCOM admin must have local admin rights on that remote OS, *OR* be able to leverage an account that does have these rights.  You will see in the below screenshot – when performing the initial discovery, we can use the Management Server Action Account (provided it has local admin rights on all managed servers) or type in an account credential that has local admin rights:

     

    image

Most customers I work with will type in a credential here, such as their personal administrative account, in order to push the agent out.  The alternative is to configure the OpsMgr Management Server Action Account to have local admin rights on all monitored servers.

One distinction here is Domain Controllers.  Most of the time – OpsMgr administrators do not have Domain Admin rights in AD.  This also means they will not have any local admin rights on your Domain Controllers, so pushing the agent to any Domain Controllers can be a challenge.  I discuss that in deeper detail here:

    http://blogs.technet.com/b/kevinholman/archive/2009/02/20/getting-and-keeping-the-scom-agent-on-a-domain-controller-how-do-you-do-it.aspx

     

    Rights required for specific roles and applications (Run As accounts)

When an agent starts up – it needs to run as a given credential.  Whatever account the agent processes run under needs the ability to access most data sources on the monitored system: the event log, Perfmon, WMI, the ability to run scripts, read log files, etc.  The account that the agent runs these default processes under is called the “Default Agent Action Account”.  In most cases, I stress to my customers to run the agent as “Local System”, which also happens to be the default configuration when you deploy an agent.  Local System is a good, secure, low-maintenance strategy that will give your agent the access it needs to monitor the system.

Sometimes, you will run across scenarios where Local System does NOT have enough rights to access specific objects or applications.  For instance, this is common when monitoring Microsoft SQL Server.  SQL Server maintains its own security access list to the SQL instance and databases, and does not directly leverage local administrative group membership.  It is possible (and sometimes common) to see SQL administrators limiting “Local System” from having any access, or enough access, to SQL.  In these cases, the agent and management pack cannot do their job.  Luckily, these scenarios are almost always fully covered in the management pack guide for applications where this might be common.  I give a much deeper dive into this concept, using the SQL Server scenario, here:

    http://blogs.technet.com/b/kevinholman/archive/2010/09/08/configuring-run-as-accounts-and-profiles-in-r2-a-sql-management-pack-example.aspx

    There are many examples of this for specific applications, and workflows, such as Active Directory MP, SQL, SharePoint, etc.

     

    Rights and configurations for scoping consoles for operators

Once you have OpsMgr up and running – one of the things you will want to do for your users is to scope the console for them based on what they need to see or have access to.  This is a good thing, because it lets them focus on the applications or servers that they care about, without seeing the massive number of issues across the environment.  We integrate with Active Directory here, so you can reuse existing global groups – for example, scoping a SQL team to only see SQL computers and alerts.  Here is an example, where I create a scoped Operator user role called “SQL Admins”, and use the AD global group for the existing SQL team:

    image

     

    Once I create this – I can scope to the Groups, Tasks, and Views that I want to deliver to my teams:

     

    image

    image

    image

     

    This results in a very different experience for my SQL team – where they only see computers, SQL instances, databases, and alerts from SQL:

    image

     

Another item for a security discussion – is Tasks.  Tasks allow you to elevate an operator’s rights in OpsMgr.  For instance, you can have a first-level helpdesk person watching the console or receiving the first-line notification of an issue… say, the SQL Agent service is stopped on a server.  With tasks – you can allow that operational team to run a command in the OpsMgr console, from the actual alert that was generated, to restart the service.  But what if that operator has NO local admin rights on the monitored server?  No problem – tasks can be created to run under the default action account, or a pre-defined Run As account, which DOES have rights to start the service. 

     

    image

     

Alternatively – if a task requires rights above and beyond what the Run As account or default agent action account has – you can allow the admin/operator to type in one-time-use credentials to execute the task under.  So we cover both scenarios.

For this reason, take care in choosing which tasks operators are allowed to run – the default behavior is a potential elevation of their privileges, since a task can execute under a pre-defined credential such as Local System or a SQL Run As account.

    image

  • OpsMgr: Network utilization scripts in BaseOS MP version 6.0.6958.0 may cause high CPU utilization and service crashes on Server 2003

    Recently I discussed some of the changes in the Base OS MP version 6.0.6958.0

    OpsMgr- MP Update- New Base OS MP 6.0.6958.0 adds Cluster Shared Volume monitoring, BPA, new rep

     

    One of the changes in this newer version of the MP is the addition of a new datasource module, which runs a script to output the Network Adapter Utilization.  The name of the datasource is “Microsoft.Windows.Server.2008.NetworkAdapter.BandwidthUsed.ModuleType”.   This datasource module uses the timed script property bag provider, along with a generic mapper condition detection.  The script name is:  “Microsoft.Windows.Server.NetwokAdapter.BandwidthUsed.ModuleType.vbs”

     

There are 3 rules and 3 monitors for each OS (2003 and 2008) that utilize this datasource:

    • Rules:
      • 2008
        • Microsoft.Windows.Server.2008.NetworkAdapter.PercentBandwidthUsedReads.Collection (Percent Bandwidth Used Read)
        • Microsoft.Windows.Server.2008.NetworkAdapter.PercentBandwidthUsedWrites.Collection (Percent Bandwidth Used Write)
        • Microsoft.Windows.Server.2008.NetworkAdapter.PercentBandwidthUsedTotal.Collection (Percent Bandwidth Used Total)
      • 2003
        • Microsoft.Windows.Server.2003.NetworkAdapter.PercentBandwidthUsedReads.Collection (Percent Bandwidth Used Read)
        • Microsoft.Windows.Server.2003.NetworkAdapter.PercentBandwidthUsedWrites.Collection (Percent Bandwidth Used Write)
        • Microsoft.Windows.Server.2003.NetworkAdapter.PercentBandwidthUsedTotal.Collection (Percent Bandwidth Used Total)
    • Monitors:
      • 2008
        • Microsoft.Windows.Server.2008.NetworkAdapter.PercentBandwidthUsedReads (Percent Bandwidth Used Read)
        • Microsoft.Windows.Server.2008.NetworkAdapter.PercentBandwidthUsedWrites (Percent Bandwidth Used Write)
        • Microsoft.Windows.Server.2008.NetworkAdapter.PercentBandwidthUsedTotal (Percent Bandwidth Used Total)
      • 2003
        • Microsoft.Windows.Server.2003.NetworkAdapter.PercentBandwidthUsedReads (Percent Bandwidth Used Read)
        • Microsoft.Windows.Server.2003.NetworkAdapter.PercentBandwidthUsedWrites (Percent Bandwidth Used Write)
        • Microsoft.Windows.Server.2003.NetworkAdapter.PercentBandwidthUsedTotal (Percent Bandwidth Used Total)

     

    Only the “Total” rules and monitors are enabled by default; the Read/Write workflows are disabled out of the box by design.

    The good:

     

    This new functionality is cool because it allows us to monitor total network utilization as a percentage of the total available bandwidth (the “total pipe”), report on it, and view the data in the console:

     

    image

     

     

    The issue:

     

    Since there is no direct perfmon data to collect this, the information must be collected via script.  I wrote about how to write this yourself HERE.
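    For reference, the calculation itself is simple: sample the adapter’s bytes/sec counters, convert to bits, and express that as a percentage of the adapter’s rated bandwidth.  Here is a minimal sketch in Python (illustrative only – the actual MP datasource is a VBScript querying WMI performance classes, and the function name here is mine):

    ```python
    def percent_bandwidth_used(bytes_per_sec: float, current_bandwidth_bps: float) -> float:
        """Approximate the MP's calculation: bytes/sec converted to bits/sec,
        expressed as a percentage of the adapter's rated bandwidth (bits/sec)."""
        if current_bandwidth_bps <= 0:
            return 0.0  # guard for adapters that report no bandwidth
        return (bytes_per_sec * 8) / current_bandwidth_bps * 100

    # Example: 12.5 MB/s of traffic on a 1 Gbps adapter = 10% utilization
    print(percent_bandwidth_used(12_500_000, 1_000_000_000))  # → 10.0
    ```

    The “Read” and “Write” workflows use the received/sent counters individually, while “Total” uses their sum.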

    There are 4 known issues with this script in the current Base OS MP, which can cause problems in some environments:

     

    1.  When the script executes – it consumes a high amount of CPU (WMIPrvse.exe process) for a few seconds.

    2.  The script does not support cookdown, so it runs a cscript.exe process and an instance of the script for EACH and every network adapter in your system (physical or virtual).  This makes the CPU consumption even higher, especially for systems with a large number of network adapters (such as Hyper-V servers).

    3.  The script does not handle teamed network adapters well.  Teaming implementations are manufacturer/driver dependent and often lack the WMI classes the script expects, so you will see “invalid class” errors on each script execution.

    4.  On some Windows 2003 servers, people have reported this script eventually causes a fault in netman.dll, and this can subsequently cause some additional critical services to fault/stop.

    Event Type:        Error
    Event Source:    Application Error
    Event Category:                (100)
    Event ID:              1000
    Date:                     16/10/2011
    Time:                     4:41:09 AM
    User:                     N/A
    Computer:          WSMSG7104C02
    Description:
    Faulting application svchost.exe, version 5.2.3790.3959, faulting module netman.dll, version 5.2.3790.3959, fault address 0x0000000000008d4f.

     

     

     

    From a CPU perspective – below is an example Hyper-V server with multiple NICs.  I set the rule and monitor which use this script to run every 30 seconds for demonstration purposes (they run every 5 minutes by default).

    image

     

    You can see WMI (and the total CPU) spiking every 30 seconds.

    After disabling all the rules and monitors which utilize this data source, we see the following from the same server:

    image

     

     

    Based on these issues, I’d recommend disabling these rules AND monitors for Windows 2003 and Windows 2008.  Their performance impact outweighs the usefulness of the data they provide.

     

     

    To disable these monitors and rules:

     

    Open the Authoring pane of the console.

    Highlight “Monitors” in the left pane.

     

    In the top line – click “Scope” until you see the “Scope Management Pack Objects” window pop up:

    image

     

    In the Look For box – type “Network”:

     

    image

     

    Tick the boxes next to “Windows Server 2003 Network Adapter” and “Windows Server 2008 Network Adapter” and click OK.

     

    image

     

    Now you will see a scoped view of only the monitors that target the Windows Server network adapter classes.  Expand Windows Server 2003 Network Adapter > Entity Health > Performance:

    image

     

    You can see that Read and Write monitors are already disabled out of the box.  You need to add a new override to disable the “Total” monitor.  Set enabled = false and save it to your Base OS Override MP for Windows 2003.

     

    Now, repeat this for the Server 2008 monitor for “Percent Bandwidth Used Total”.

     

    After disabling the two monitors that run this script – we also need to disable the rules that also share this script.  Highlight Rules in the left pane.

    Again – the read/write rules are disabled out of the box, so you only need to create two overrides: one for the Server 2003 “Percent Bandwidth Used Total” rule, and one for the Server 2008 rule of the same name:

     

    image
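    For reference, the same override expressed directly in MP XML would look roughly like the fragment below (a sketch only – the override ID and the “Windows!” reference alias are illustrative; the rule and class IDs come from the sealed Base OS MP):

    ```xml
    <RulePropertyOverride ID="Custom.Disable.NetworkAdapter.PercentBandwidthUsedTotal"
                          Context="Windows!Microsoft.Windows.Server.2008.NetworkAdapter"
                          Enforced="false"
                          Rule="Windows!Microsoft.Windows.Server.2008.NetworkAdapter.PercentBandwidthUsedTotal.Collection"
                          Property="Enabled">
      <Value>false</Value>
    </RulePropertyOverride>
    ```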

  • OpsMgr: Logical Disk free space alerts don’t show percent and MB free values in the alert description

     

    NOTE:  This article has been updated – please see the latest version at:

      http://blogs.technet.com/b/kevinholman/archive/2014/02/05/opsmgr-logical-disk-free-space-alerts-don-t-show-percent-and-mb-free-values-in-the-alert-description-updated-for-server-2012.aspx

     

     

    I recently wrote about the new Base OS Monitoring Packs that shipped, adding many new features and fixes for monitoring the OS.  You can read more about that new release HERE.   While this MP update contained many fixes and new features which are VERY beneficial in making alerts more actionable by controlling “false positives”, some of these modifications introduced a negative side effect.

    One of the areas this new MP focused on was changing many of the “average threshold” monitors to “consecutive sample” monitors.  This helps control noise when a performance value fluctuates briefly, or when a counter spikes tremendously for a very short time and skews the average.  So for the most part – changing these over to consecutive samples is a good thing.  That said, one of the changes made was to the Logical Disk free space monitors, for both Windows Server 2003 and 2008 disks.
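    To see why this change matters, here is a minimal sketch (illustrative only – not the MP’s actual implementation) contrasting the two evaluation styles.  A single momentary spike pushes an average past the threshold, while consecutive-sample logic only fires when the condition persists:

    ```python
    def breaches_average(samples, threshold):
        """Average-threshold style: alert if the mean of the samples exceeds the threshold."""
        return sum(samples) / len(samples) > threshold

    def breaches_consecutive(samples, threshold, count):
        """Consecutive-samples style: alert only if `count` samples in a row exceed the threshold."""
        run = 0
        for s in samples:
            run = run + 1 if s > threshold else 0
            if run >= count:
                return True
        return False

    # One momentary spike to 100% among otherwise-quiet samples:
    samples = [5, 5, 100, 5, 5]
    print(breaches_average(samples, 20))         # → True  (the spike skews the mean to 24)
    print(breaches_consecutive(samples, 20, 3))  # → False (the condition never persists)
    ```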

    The script used to monitor logical disk free space in previous versions of the Monitoring Pack would output two additional propertybags: free space in MB and free space in percent.  These values could easily be added to the alert description, alert context, and health explorer, so the consumer of the alert in a notification knew precisely how much space was left for each and every alert generated.  Here are some examples of how it looked previously:

     

    image

    image

    image

     

    Now – when the new MP shipped – this script was changed to support the new consecutive samples monitortype, and was completely re-written.  When it was rewritten, the script no longer returned these propertybags, so they were removed from the alert description, alert context, and health explorer.  The current MP (6.0.6958.0) looks like this:

    image

    The monitor still works perfectly as designed, and you are alerted when thresholds that you set are breached.  The only negative side effect is the loss of information in the alert description.

    Several customers have indicated that they preferred to have these values back in the alert description.  The only real way to handle this scenario, until the signed and sealed MP gets updated at some point in the future, is to disable the built in monitor, and enable a new monitor with an alert description that you like.

    I have written two addendum MP’s attached at the bottom of this article, which do exactly that – I created two new monitors (essentially the same monitors from the previous older version of the Base OS MP’s) and included two overrides which disable the existing monitors from the sealed MP’s.  These two new monitors are essentially exact copies of the monitors before they got updated.  They run once per hour and have all the default settings from the previous monitors.

    With the addendum MP imported – health explorer looks like the following:

    image

    Note the new name for the addendum monitor, and the fact that the existing “Logical Disk Free Space” monitor is unloaded as it is disabled via override.

     

    These addendum MP’s for Windows Server 2003 and Windows Server 2008 each simply include a script datasource, monitortype, and monitor to use instead of the items in the current sealed Base OS MP’s.  These addendum MP’s are unsealed, so you have two options:

    1. Leave them unsealed, and use them as-is.  This allows you to be able to tweak the monitor names, alert descriptions, and any other settings further.
    2. Seal the MP’s with your own key (recommended) after making any adjustments that you desire.  This will be necessary in order to create overrides for existing groups in other MP’s should you desire to use those.

     

    One caveat to understand: any overrides you have created on the existing Base OS free space monitors will have to be re-created on these new ones.  There is no easy workaround for that.

    Let me know if you have any issues using these addendum MP’s (which are provided as a sample only) and I will try to address them.

     

    Credits – to Larry Mosley at Microsoft for doing most of the initial heavy lifting writing the workaround MP.

    Another approach:  Daniele Grandini has authored a different solution to this issue.  What he has done, is to add diagnostics to the existing sealed Logical Disk Free space monitors, which will add the actual disk free space in MB and % to Health explorer, so console users can have this information in real time as they use alert/health explorer to troubleshoot a free space issue.  His solution will not be able to add these values to the alert description to be sent in an email notification/pager/ticket, but for those companies that use the console and health explorer, it is a more graceful solution in that you don’t have to re-engineer all your existing overrides, and you still get the benefit of having consecutive samples.  It is worth a look:  http://nocentdocent.wordpress.com/2011/11/19/opsmgr-logical-disk-free-space-alerts-dont-show-percent-and-mb-free-values-in-the-alert-description/comment-page-1/#comment-1018

  • OpsMgr 2012: New feature – the Agent Control Panel applet

    One of the many new features in OpsMgr 2012 is the Agent control panel applet.

     

    image

     

    While this applet has several key features, the primary use is the ability to add and remove management groups from the agent/server.  While this was possible in OM 2007, you had to run the MSI or “Modify” the agent setup in Control Panel > Programs and features.  Now – we no longer have to run through agent setup – and can configure this via a simple applet.

     

    Opening this applet, we are presented with the following:

     

    image

     

    As you can see – adding or removing a Management Group is a snap.

     

    You can also EDIT an existing management group – but all we can change is the default agent action account.  This is handy if you have deployed agents as Local System, but your server owners want workflows to execute under a specific domain credential whose password the server owner controls.  This is a useful way to avoid configuring Run As accounts in a highly distributed environment.

     

    image

     

    One thing to note here – this applet demonstrates that Active Directory Integration is enabled by default.  Yes – that is correct – the Health Service defaults to enabled for AD integration, even if you pushed your agents out via the console, or manually installed.  This is because you could be multi-homed with a non-AD integrated management group, and another AD integrated management group.  If you NEVER wanted your agents to participate in any AD integrated management groups – you could consider disabling this.  I will discuss some command line methods to control this functionality below.  Note:  This UI simply controls the following registry value:

    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\ConnectorManager\EnableADIntegration         (0 = disabled, 1 [default] = enabled)
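    For example, to pre-disable AD integration at deployment time, you could set the value with a .reg fragment like the one below (you will likely need to restart the Health Service for the change to take effect):

    ```reg
    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\HealthService\Parameters\ConnectorManager]
    "EnableADIntegration"=dword:00000000
    ```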

     

     

    So – let’s summarize. 

     

    The key features of the OpsMgr agent applet are:

    1. Ability to add and remove management groups on an agent
    2. Ability to change default agent action account on the agent itself
    3. Ability to turn on/off AD integration

     

     

    Command line/script examples:

    Additionally – the deployment of this applet gives us command line/script control of these features.  The applet is deployed via a DLL that is installed/registered upon agent deployment.  This DLL can also be deployed/registered manually if required. 

    You can register it with “Regsvr32 AgentConfigManager.dll”, as documented in the SDK reference here:    http://msdn.microsoft.com/en-us/library/hh329076.aspx

    Additionally – you can script add/remove of management groups:  http://msdn.microsoft.com/en-us/library/hh329017.aspx

    Or output/monitor management groups, configurations, etc:  http://msdn.microsoft.com/en-us/library/hh352628.aspx

    Detailed documentation on these SDK methods is available in the 2012 reference library:  http://msdn.microsoft.com/en-us/library/hh329031.aspx
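    As a quick example, once the DLL is registered you can drive these SDK methods from PowerShell on the agent via the COM object it exposes (run elevated on the agent itself; the management group name and server below are placeholders):

    ```powershell
    # Bind to the agent configuration COM object registered by AgentConfigManager.dll
    $cfg = New-Object -ComObject 'AgentConfigManager.MgmtSvcCfg'

    # List the management groups this agent currently reports to
    $cfg.GetManagementGroups() | Format-List

    # Add a management group (placeholders - use your own MG name, management server, and port)
    $cfg.AddManagementGroup('MG1', 'scom01.contoso.com', 5723)

    # Tell the Health Service to reload its configuration so the change takes effect
    $cfg.ReloadConfiguration()
    ```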

  • Dude – unseal my MP!

    image

     

    Tim McFadden of Microsoft PFE has done it again.  He continually writes handy tools and solutions to help the OpsMgr admin.  His Scheduled Maintenance Mode solution is wildly popular.  His latest tool published is something I will use all the time:

     

    http://www.scom2k7.com/mp2xmlpro-management-pack-conversion-tool/

     

    MP2XMLPRO is a GUI tool which makes unsealing an MP super easy.  Now, many SCOM admins never need to unseal an MP…. but if you ever want to run a DIFF of an old MP against a new MP, crack open MP’s so you can troubleshoot a datasource or a script, or bulk dump a big list of MP’s, this tool does it all.

    It runs in three modes:

    Unseal a single MP file

    Unseal all MP’s in a directory

    Connect to an RMS and unseal/export all MP’s to a directory.

  • OpsMgr: MP Update: New Base OS MP 6.0.6958.0 ships.

    Recently I discussed the release of Base OS MP 6.0.6957.0, which added many new features to the base OS MP’s.  We received feedback on some issues with those new features, and we are shipping an updated version of the MP to resolve the majority of the reported issues.  See my previous post describing the new features here:

    http://blogs.technet.com/b/kevinholman/archive/2011/09/30/opsmgr-new-base-os-mp-6-0-6956-0-adds-cluster-shared-volume-monitoring-bpa-and-many-changes.aspx

     

    Get the new version 6.0.6958.0 from the download center:  http://www.microsoft.com/download/en/details.aspx?id=9296

     

    What’s new?

     

    • Disabled BPA Rules by default.

    The Best Practices Analyzer monitor now ships disabled out of the box.  Because most customers do not fully adhere to the best practices for specific server roles, this monitor would generate a significant amount of alert noise in most customer environments, so it has been changed to disabled by default.  You can enable it if you would like to compare your server roles against the built-in Server 2008 R2 BPA and receive alerts.

    • Added appropriate SQL Stored Procedures credentials

    The reports we shipped in the new Microsoft.Windows.Server.Reports.mp contained two stored procedures which previously required manual intervention to assign permissions.  This has been resolved.

    ***Note – this MP with these new reports was designed for SQL2008 reporting environments only.  It will fail to deploy on SQL 2005 SCOM infrastructures.  If you are using SQL 2005 for a backend for OpsMgr databases and reporting, either upgrade to SQL 2008 or later, or do not import this MP.  If you have already imported this MP, delete it.  It is not supported for SQL 2005.

    • Updated Knowledge for Logical Disks

    The knowledge for the logical disk free space monitors was updated to reflect the new default values.

    • Updated Overrides for Logical Disks

    In the previous release (6.0.6957.0) of this MP, some of your previous overrides would not apply.  This has been resolved in the current version of the MP.

    • Fixed %Idle time sorting in the utilization report.