
Applying an OpsMgr hotfix to a RMS Cluster node? Some things to be aware of.

 

When you apply a SCOM hotfix to an RMS cluster, you need to be aware of some issues and their workarounds.  This is something I have seen several times in the field.

 

On any server or agent, the hotfix installer will stop any discovered OpsMgr services, including the SDK, Config, and HealthService.  This part is normal.  It does this in order to update the files (DLLs) that are part of the hotfix payload, and it starts the services again when complete.  This all works well, except on RMS clusters.

 

The reason is that the hotfix installer is not fully cluster aware.

In an RMS cluster, the passive node will have these three services stopped, and the services will be set to Manual startup.  On the active node, the OpsMgr services are also set to Manual startup, but they are running, because the Cluster service now controls them.  This is how a clustered service works: we should never stop a clustered service in Service Control Manager. Instead, we should take the resource offline in Cluster Administrator.
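As a rough illustration of the state described above, the following Python sketch models what each node should look like and flags anything that deviates. The short service names (`HealthService`, `OMSDK`, `OMCFG`) and the `ServiceState` type are assumptions made for this example; confirm the actual service names on your own servers.

```python
from dataclasses import dataclass

# Assumed short names for the HealthService, SDK, and Config services.
OPSMGR_SERVICES = ("HealthService", "OMSDK", "OMCFG")


@dataclass(frozen=True)
class ServiceState:
    name: str
    startup: str   # "Manual" or "Automatic"
    running: bool


def deviations(role: str, states: list[ServiceState]) -> list[str]:
    """Describe services that differ from the expected clustered state.

    role is "active" or "passive". On both nodes the startup type should be
    Manual; the services should be running only on the active node.
    """
    if role not in ("active", "passive"):
        raise ValueError(f"unknown role: {role}")
    should_run = role == "active"
    problems = []
    for s in states:
        if s.startup != "Manual":
            problems.append(f"{s.name}: startup is {s.startup}, expected Manual")
        if s.running != should_run:
            expected = "running" if should_run else "stopped"
            problems.append(f"{s.name}: expected {expected}")
    return problems
```

How the `ServiceState` values are collected (for example from the Services console) is left out on purpose; the sketch only captures the rule that both nodes stay on Manual and only the active node runs the services.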

 

So I have two options: I can apply the hotfix to the active node, or to the passive node.

 

If I choose the active node, the hotfix installer will try to stop all the OpsMgr services, and this will cause the Cluster service to try to restart them, or eventually fail them over to the passive node, depending on your cluster configuration settings.  Therefore, it is probably best to patch the passive node first, ensure the hotfix applied correctly, then move the cluster group and OpsMgr RMS group over to the freshly hotfixed node, and go patch the other one (now passive).

This works, but is not 100% smooth.  When we apply the hotfix to the passive node, the hotfix installer will try to start the services at the end of the process, even though they were not running previously.  We do NOT want these services trying to run on the passive node, since it does not own the cluster disk resources. The services will start, but cannot do anything but log errors.

You will also see an error reporting that the HealthService could not start.  It appears that this service fails because it cannot access the disk resource, but the SDK and Config services WILL start.

What is worse, the hotfix installer changes the startup type of these services to Automatic, which means they will keep trying to run on the passive node across reboots.
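One way to undo this from a command prompt is with `sc.exe`, where `demand` is the keyword for the Manual startup type. The Python sketch below only builds the argument lists and does not run anything; the service names passed in are assumptions for illustration, and the function name `revert_commands` is hypothetical.

```python
def revert_commands(changed: dict[str, bool]) -> list[list[str]]:
    """Build sc.exe argument lists for services the installer left misconfigured.

    changed maps a service name to True if that service is currently running
    on the passive node.
    """
    commands = []
    for name, running in changed.items():
        if running:
            # Stop the service first; it should not run on the passive node.
            commands.append(["sc", "stop", name])
        # "start=" and "demand" are separate tokens, matching "sc config X start= demand".
        commands.append(["sc", "config", name, "start=", "demand"])
    return commands


if __name__ == "__main__":
    # Assumed service names; check your own server before using them.
    for cmd in revert_commands({"OMSDK": True, "OMCFG": True, "HealthService": False}):
        print(" ".join(cmd))
```

Printing the commands first, rather than executing them, makes it easy to review what would change on the node before touching anything.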

 

So, the guidance I have for RMS clusters is:

  1. Patch the passive node (we will call this Node 2).
  2. Click OK on the HealthService start failure error.
  3. Ensure the hotfix applied by inspecting the DLL version(s) as documented in the KB.
  4. Stop the running SDK and Config services on the passive node.
  5. Set any OpsMgr services that were changed to Automatic BACK to Manual.
  6. Move the cluster resource groups over to the freshly patched Node 2.
  7. On Node 1 (now passive), apply the hotfix, and repeat the steps starting at Step 2 above.
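Step 3 above comes down to comparing dotted file versions, which is easy to get wrong if the versions are compared as plain strings. A minimal Python sketch, assuming you have already read the installed versions from the file properties; the file name and version numbers in the usage lines are hypothetical placeholders, not values from any real KB.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted file version such as '6.0.6278.56' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def outdated_files(installed: dict[str, str], expected: dict[str, str]) -> list[str]:
    """Return files whose installed version is missing or older than the KB lists."""
    stale = []
    for filename, wanted in expected.items():
        found = installed.get(filename)
        if found is None or parse_version(found) < parse_version(wanted):
            stale.append(filename)
    return stale


if __name__ == "__main__":
    # Hypothetical file name and versions, for illustration only.
    expected = {"Example.dll": "6.0.6278.56"}
    installed = {"Example.dll": "6.0.6278.0"}
    print(outdated_files(installed, expected))
```

Comparing integer tuples rather than strings matters because a build such as 10000 sorts before 6278 as text but after it numerically.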

NOTE:  This is only applicable to OpsMgr-specific hotfixes.  For OS hotfixes, you would follow your standard clustered OS hotfix routine.

Published Wednesday, February 25, 2009 2:20 AM by kevinhol

Comments

# re: Applying an OpsMgr hotfix to a RMS Cluster node? Some things to be aware of.

Friday, March 20, 2009 8:20 AM by Peter

Hi

Great article. It's a problem that gets very little coverage.

My experience with OpsMgr hotfix installation in a cluster environment is to take the RMS cluster offline (3 services), install the hotfix on the active node, check the Automatic/Manual service issue, verify that the services are still stopped (otherwise stop them), and verify that the RMS cluster is still offline. Then install the hotfix on the passive node and make the same checks. Finally, take the cluster online and look in Event Viewer for errors. After that, the MS, GW, and manually installed agents can be updated and agents approved.

# re: Applying an OpsMgr hotfix to a RMS Cluster node? Some things to be aware of.

Friday, March 20, 2009 9:50 AM by kevinhol

The things I don't like about that are:

1.  Your method experiences downtime for SCOM.  We should be able to patch SCOM with no downtime if we have a clustered RMS.  By patching the passive node in all cases, this is closer to how we patch the OS in a clustered situation.

2.  I don't like taking cluster resources online by forcing the services to start.  This can have unpredictable results, and can potentially cause the cluster to fail to start on the node you are patching and fail over to the other node.  I played with that process, and that is how I came up with the process I documented.

Not saying yours won't work; it will. It just seems like it doesn't have any pros over always patching the passive node?  The only pro I can see is that your way always ensures there is only one Config and SDK service running at any given time.

# re: Applying an OpsMgr hotfix to a RMS Cluster node? Some things to be aware of.

Wednesday, March 25, 2009 7:32 AM by Peter

I certainly agree with your downtime issue. But as OpsMgr is not cluster aware, the primary goal is not to keep it up the whole time, but to ensure that the cluster is functional after hotfix patching. I will try your scenario and see how it works; if the result is the same, you are right: no downtime. It's nice to discuss this, because there isn't a lot of information about hotfixes on clusters out there, and there are no recommendations from Microsoft.

# re: Applying an OpsMgr hotfix to a RMS Cluster node? Some things to be aware of.

Wednesday, April 08, 2009 12:08 PM by brent flesner

Patch the active node and pause the passive node in the cluster so that SCOM does not fail over.  Then move SCOM to the other node and repeat the same process.

# re: Patching active node

Wednesday, April 08, 2009 12:28 PM by kevinhol

Why patch the active node?  I am curious.  The concept behind clustering is no downtime for the application, even while patching.  If you patch the active node, this is not the case.  What is the benefit of patching the active node over the steps I outlined in the article?

# re: Applying an OpsMgr hotfix to a RMS Cluster node? Some things to be aware of.

Tuesday, April 28, 2009 7:23 AM by Peter

I have been testing three methods of applying Operations Manager 2007 hotfixes to a clustered RMS.

The first, mentioned by Derek, is to patch the active node and pause the passive node in the meantime.

The second method is Kevin's way, which follows the standard patching approach of patching the passive node first.

The third method, which I used to use, is to patch the active node, but with the clustered RMS taken offline first. Seen from an Operations Manager administrator's perspective, the easiest way is Kevin's. There is no downtime, and no OpsMgr Windows services interfere with the three OpsMgr cluster services, because the cluster controls them.

From an Operations Manager user's perspective, Kevin's method is also best because there is no downtime (or only a little during the move). I judged this by looking in the event logs; I have not tested how these three patching methods affect the Operations Manager 2007 DB or other places. Maybe somebody will.
