SCVMM & Failover Cluster Host Fails with Error 10429 – Older version of VMM Server installed


Recently, I had the unfortunate task of moving the System Center VMM database off a SQL cluster to a stand-alone instance of SQL Server 2008 SP1.  This sounds extremely simple, though unfortunately it isn’t quite as simple as it could be.  In this blog, I thought I would share my experience in case anyone runs into a similar issue.  I’m happy to report, though, that the steps below cleared the issue and all systems are working as desired.

Setup

  • 7-node Failover Cluster using Windows Server 2008 R2
  • 2 Cluster Shared Volumes (CSVs) attached to the cluster
  • SCVMM 2008 R2

How to migrate VMM Database

The first step is to detach/offline the Virtual Machine Manager database, VirtualManager.  This is easily accomplished using SQL Server Management Studio and is outlined in the following MSDN article.  Once this is done, you can copy the database files to a shared location that your target host can access.  Copy the MDF and LDF files from the shared location to your new database server, then import (attach) the database.

To recap, the steps are:

  1. Backup your VMM Database
  2. Detach your Database
  3. Copy to a “Shared” location
  4. Import the database (and log)
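
The four steps above can be sketched from a command prompt with sqlcmd and robocopy.  This is only an illustrative outline – the server names, shares, and file paths are placeholders, not values from my environment – and remember to stop the VMM service before detaching:

```powershell
# 1. Back up the VMM database on the old (clustered) SQL instance
sqlcmd -S OLDSQLCLUSTER -Q "BACKUP DATABASE VirtualManager TO DISK = 'C:\Backup\VirtualManager.bak'"

# 2. Detach the database (requires exclusive access - stop the VMM service first)
sqlcmd -S OLDSQLCLUSTER -Q "EXEC sp_detach_db 'VirtualManager'"

# 3. Copy the MDF/LDF files to a shared location, then to the new server
robocopy \\OLDSQLCLUSTER\Data \\FILESERVER\Share VirtualManager.mdf VirtualManager_log.ldf
robocopy \\FILESERVER\Share \\NEWSQL\C$\Data VirtualManager.mdf VirtualManager_log.ldf

# 4. Attach (import) the database and log on the new stand-alone instance
sqlcmd -S NEWSQL -Q "CREATE DATABASE VirtualManager ON (FILENAME = 'C:\Data\VirtualManager.mdf'), (FILENAME = 'C:\Data\VirtualManager_log.ldf') FOR ATTACH"
```

Attaching with CREATE DATABASE … FOR ATTACH is the supported replacement for the deprecated sp_attach_db.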

This is where it gets a bit “nastier” than it has to be, in my opinion.  You are now required to uninstall SCVMM and select the “Retain Data” option.  Yes, I said uninstall.  You will then re-install VMM on the server (the very one you just removed it from), and on the database selection screen choose to use the existing database.  Do not select “Create New Database,” as that option is for a new install.

This process, albeit much more complicated than I believe it has to be, worked with no problems.

NOTE:  There is supposedly a registry key one can set to avoid this uninstall/re-install hammer of sorts.  If that works for you, let me know, or next time I will try it and see what happens.

Hosts Need Attention – Update Agent

This is where the fun began, for me at least.  I re-installed the server with no problem and then opened the VMM Administration console.  Upon opening the console, I was immediately greeted by hosts that were not happy with anything: they were all in a “Needs Attention” status.  If you right-click on a host, you will notice that you can now select Update Agent.  For 6 out of the 7 hosts, this process worked flawlessly with no extra work at all.  Then there was the 7th…

Error 10429:  Older version of the VMM server is installed on {Server FQDN} | {IP}

On the seventh server, the agent update failed each and every time with error 10429, stating that an older version of the VMM server is installed.  This was funny to me, because this was the exact same build I had installed previously.  Nonetheless, the server was simply unwilling to cooperate and continued this odd behavior of being “unmanageable.”  The host stayed in a non-responsive state.

 

[screenshot: the VMM job failing with error 10429]

The error message returned by the VMM job under “Recommended Action” was unfortunately neither helpful nor “actionable.”  The recommended action, to remove the physical host and manually install the VMM agent, wasn’t possible, for the following reasons:

  1. The “Remove Host” UI option and cmdlet were unavailable
  2. Attempting to install the VMM agent on the host returned an error stating that the VMM agent couldn’t be installed manually

Wow… this is ugly.  I can’t manage the server, though in Failover Cluster Manager all looks good and the guest VMs are running fine.  Stretching the brain… In the Event Viewer on the host, you see the following:

Error    6/12/2010 7:02:20 PM    Service Control Manager    7024    None
Information    6/12/2010 7:02:20 PM    Service Control Manager    7036    None
Information    6/12/2010 7:02:20 PM    Service Control Manager    7036    None
Information    6/12/2010 7:02:20 PM    Service Control Manager    7045    None

The information messages just tell you that the VMM agent installation is being attempted, while the error message description says the following:

The VMMAgentInstaller service terminated with service-specific error %%-2146041839.
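
Service-specific codes like this are easier to search for once converted to hex.  A quick conversion (here in PowerShell) shows the underlying HRESULT form:

```powershell
# Format the signed service-specific error as two's-complement hex
'0x{0:X8}' -f -2146041839   # 0x80160011
```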

Fix It:  Let’s just get this thing back to working order and worry about the cause later…

Many of you will be in much the same position I was in at the moment this problem occurs.  The service points (VMs) are online and happy, but the physical host isn’t in a manageable state.  You don’t care why, nor do you want to spend a significant amount of time gathering the debug logs for VMM – you just want it working as it was prior to initiating the migration of the database.  There is no more frustrating problem than troubleshooting something that was avoidable, such as moving a database that was perfectly happy where it was.

Let’s just fix this thing…

NOTE:  Doing the following in production isn’t suggested without taking all precautions, such as taking backups, and following the typical “change management” process your company has in place.  I hope you don’t hold me responsible, as I will plead the 5th, and I will share that this is the real world – you want it fixed.  Period.  All lawyers have *not* approved this message. 

  1. On your workstation or the physical host, open Failover Cluster Manager
  2. Locate a guest VM running on the unmanageable host, right-click on it, select Live Migrate (or Quick Migrate), and select a target host
  3. Repeat step 2 until all VM guests are off the server
  4. When all VMs are off the physical host, right-click on the node in Failover Cluster Manager and select More Actions, Evict
  5. After the eviction, verify all services are running fine in the Cluster (See Below)
  6. Open the VMM Administrator console and locate the evicted, now non-clustered, physical host (you might need to refresh)
  7. Right-click on the host and select Remove Host (yes, this option is no longer unavailable)
  8. In Failover Cluster Manager, right-click on the cluster, select Add Node, and enter the name of the physical host that was evicted
  9. Run the validation tests and add the node back to the cluster
  10. In the VMM Administrator console, refresh the cluster
  11. After the refresh completes, right-click and select to add the node to the cluster
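
For those who would rather script the evict/re-add portion than click through the consoles, the steps above sketch out roughly as follows in PowerShell.  Treat this as an outline only – HVCLUSTER, HVNODE6/HVNODE7, and VMMSERVER are placeholder names, and the VMM snap-in must be installed where you run it:

```powershell
Import-Module FailoverClusters
Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager   # VMM 2008 R2 cmdlets

# Drain the guest VM groups off the broken node onto a healthy one
Get-ClusterGroup -Cluster HVCLUSTER | Where-Object { $_.OwnerNode -eq 'HVNODE7' } |
    Move-ClusterVirtualMachineRole -Node HVNODE6

# Evict the node from the cluster
Remove-ClusterNode -Cluster HVCLUSTER -Name HVNODE7 -Force

# Remove the now stand-alone host from VMM
Get-VMMServer VMMSERVER | Out-Null
Get-VMHost | Where-Object { $_.Name -eq 'HVNODE7.contoso.com' } | Remove-VMHost

# Re-join the node to the cluster; refresh the cluster in VMM afterward
Add-ClusterNode -Cluster HVCLUSTER -Name HVNODE7
```

The validation tests in step 9 can likewise be run with Test-Cluster before calling Add-ClusterNode.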

[screenshot]

After these 11 steps, your server is now manageable and ready.  Go ahead and Live Migrate your guest VMs back to the server and verify service for those guests.

Summary

This isn’t the cleanest scenario by any stretch of the imagination, and it is much more of a hammer than I like.  However, you effectively have a server that is unmanageable while all services are happy; that isn’t satisfactory, and the situation needs to be fixed.  These steps led me back to the promised land and should do the same for you… just be patient.  It is required.  In this blog post, you learned how to migrate the VMM database to another server, watched one host get upset with that decision, and saw how you can prevail and still show that host who is in charge. 

Enjoy!

Thanks,

-Chris



Comments
  • I'm pretty sure that using some of the steps in the following blog post would have made this whole operation much easier.

    blogs.technet.com/.../how-to-configure-vmm-2008-to-run-with-a-domain-service-account-using-a-remote-sql-database.aspx

  • Hey there-

    I'm not sure if you are correct, but I'm not going to say you are incorrect.  However, I was moving the database from a SQL cluster to a local instance of SQL (e.g. localhost), and for this I was using a domain account for running the service.  If you could be specific about which steps you think could be eliminated, I would love to understand better.

    The point of the post is I found *zero* helpful data for 10429 and the fact that the built in recommendation failed.  Thus, the only method I found to actually get this server back online is to follow these steps.  

    Thanks,

    -Chris

  • I recently ran into virtually the same scenario on a simple SCVMM server upgrade (in other words, not a SQL database move, just upgrading the VMM version). In my Hyper-V cluster, 4 of the 5 host servers upgraded the agent perfectly, no problems. The 5th one, though, gave me fits. First the WinRM service hung at Stopping and the agent installation failed. I found the PID and did taskkill to stop the WinRM service (thanks to this: grinding-it-out.blogspot.com/.../oddity-with-hyper-v-and-virtual-machine.html), then restarted the service and did winrm quickconfig on the host. Still no joy on the VMM console.

    I found this post and was prepared to go through this nightmare, although one PFE at Microsoft advised against it (he said a node evicted from a cluster often cannot be re-added without rebuild). I decided to try something a little simpler first. I went to the host to do a manual uninstall of the agent (using these instructions, as it's Server Core: blogs.technet.com/.../install-and-uninstall-vmm-agent-on-windows-2008-server-core.aspx) and discovered the Uninstall registry key didn't exist for Virtual Machine Manager. Ergo, my uninstall of the old agent version was successful; it just didn't reinstall the new version when I tried to push the upgrade from the VMM console. So I manually installed the new version of the agent on the host, did a Refresh on my VMM console, and voilà! I have a working, manageable Hyper-V cluster in SCVMM again without having to do anything scary (I didn't even have to restart the server, which is nice, since I couldn't put it into maintenance mode). So that's certainly worth a try for anyone who finds themselves in a similar pickle.
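
The gentler route described in the comment above boils down to roughly the following on the affected host.  This is a sketch only – the agent MSI path varies by VMM version and media layout, so treat it as a placeholder:

```powershell
# 1. If WinRM is hung at Stopping, find its PID, kill it, and restart it
sc.exe queryex WinRM          # note the PID in the output
taskkill /F /PID <pid>        # substitute the PID noted above
net start WinRM
winrm quickconfig

# 2. Check whether the old agent is still registered before reinstalling
reg query "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall" /s /f "Virtual Machine Manager"

# 3. If it is gone, install the matching agent version manually from the VMM media
msiexec /i "D:\amd64\msi\Agent\vmmAgent.msi"
```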

  • Amazing - I had major issues where one of the hosts in our cluster failed to install a VMM agent update. I couldn't remove the host in VMM because the agent update had failed and the host was no longer responding. After a lot of searching, these steps worked perfectly and all is well - THANK YOU!!

  • Hey Dave- I'm glad it helped you! -Chris