From here: Microsoft and Marathon Technologies Expand Relationship to Provide Fault Tolerant Computing to More Businesses – this is great news!
I have to say, going forward into the ‘next generation’ of virtualisation platforms, with Citrix XenServer and VI4 / vSphere from VMware, I was a little bit jealous of one of their features, namely Fault Tolerance. Fault Tolerance, when it comes to Virtual Machines, basically works like this: you have a VM for which you want Zero downtime, so you say, well, let’s make it Highly Available, using Hyper-V Failover Clustering, VMware HA, or the like. That’s great, but with traditional HA, should you lose the physical box that the VM is running on, the VM goes down too and reboots on another physical node. It comes back quickly, but that isn’t Zero downtime.
So, what do you do if you need Zero downtime for a VM? Well, VMware seemed to have it nailed with the upcoming vSphere, and there is a great demo video of VMware Fault Tolerance already on the web. Basically, you select a VM that you want to make ‘Fault Tolerant’, and that VM, utilising VMotion technology, is replicated to another physical host and kept in sync (yet not actually live and accessible by users), again using VMotion. Should physical node 1 go down, the VM on it goes down too, but the identical VM on physical node 2 springs into life and takes over, with no loss of data. Cool stuff. Not only that, but because that VM is ‘Fault Tolerant’, it needs another replica somewhere to remain that way, so a new replica is created on another available physical node! Providing you have enough physical kit, you’ll always have 2 copies of the same VM: one active, and one ready to become active.
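To make the difference between traditional HA and Fault Tolerance concrete, here is a toy Python model of the behaviour described above. It is only a sketch of the idea, not how VMware or Marathon actually implement it: the `Cluster` class, its fields and the `fail_host` method are all made up for illustration, and "keeping in sync" is reduced to simply remembering which host holds the replica.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Cluster:
    """Toy model of one protected VM on a set of physical hosts (hypothetical)."""
    hosts: Set[str]                   # healthy physical hosts
    primary: str                      # host running the live VM
    secondary: Optional[str] = None   # host holding the in-sync replica (FT only)
    downtime_events: int = 0          # times the VM had to be rebooted

    def _spare(self) -> Optional[str]:
        # Pick any healthy host that is not already running the live VM.
        return next((h for h in sorted(self.hosts) if h != self.primary), None)

    def fail_host(self, host: str, fault_tolerant: bool) -> None:
        self.hosts.discard(host)
        if fault_tolerant and host == self.secondary:
            # Lost only the replica: the live VM is untouched, build a new replica.
            self.secondary = self._spare()
            return
        if host != self.primary:
            return  # an unrelated host failed, nothing to do
        if fault_tolerant and self.secondary is not None:
            # Fault Tolerance: the replica takes over with no reboot,
            # then a fresh replica is created on another host if one exists.
            self.primary = self.secondary
            self.secondary = self._spare()
        else:
            # Traditional HA: the VM reboots on a surviving host (downtime).
            if not self.hosts:
                raise RuntimeError("no healthy hosts left to restart the VM")
            self.downtime_events += 1
            self.primary = sorted(self.hosts)[0]


# Same failure, two protection modes.
ha = Cluster(hosts={"node1", "node2", "node3"}, primary="node1")
ha.fail_host("node1", fault_tolerant=False)

ft = Cluster(hosts={"node1", "node2", "node3"}, primary="node1", secondary="node2")
ft.fail_host("node1", fault_tolerant=True)

print("HA reboots:", ha.downtime_events, "now on", ha.primary)
print("FT reboots:", ft.downtime_events, "now on", ft.primary, "replica on", ft.secondary)
```

In this model the HA VM records one reboot when its host dies, while the FT VM records none and immediately gets a new replica on the remaining spare node, which is the ‘always 2 copies’ behaviour described above.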
So, when I saw this, I thought, very cool feature indeed. Then I found that one of our Partners, Marathon Technologies, provides the same functionality for Citrix XenServer 5 (in some ways, as Scott alludes to here, it may actually be slightly more powerful under the covers, but it’s a bit too early to say yet). There is a great video of that, below:
You can view a bigger version of this video here. As you can see, it’s a great explanation and a simple solution to work with, yet the results are excellent, with a number of ‘levels’ working under the covers.
The great thing is that we’ve announced an agreement with Marathon to bring these technologies to the Microsoft Hyper-V platform, starting with the R2 wave of technologies. This is great news, and a huge boost for customers. With Windows Server 2008 R2, out of the box, for free, customers will have Live Migration, built on Failover Clustering. For planned downtime, manual Live Migrations will keep systems up and running with no interruption. Should a physical box go down unplanned, however, the VMs will reboot on another node. Should this not be enough for customers, Marathon’s everRun will bring the Zero downtime that is desired.
Great stuff – and you can read all about it here.
You can already provide similar functionality by clustering one or more services or applications at the guest level and hosting the guests on clustered physical hosts - the caveat being that you can only provide this fault tolerance to cluster aware apps and services, of course.
In a nutshell, I can see that Marathon fills a gap where fault tolerance must be provided to VMs hosting apps and services which are not cluster aware.
I agree with you. Clustering at the guest level (providing you have an OS with clustering capability) is a cheaper way of providing a strong level of HA; however, the value-add that Marathon will provide on top will help a number of organisations reach their desired level of availability.
The technology isn't without its flaws though - if the guest OS that's being protected suffers, say, a blue screen, so does the clone. Plus, I believe you can only clone a VM with 1 vCPU, and there is a performance hit involved. How many mission critical VMs have 1 vCPU? Exchange, SQL? Nope!
This technology, whether it's Marathon's, or VMware's, needs to develop further to accelerate adoption.