James O'Neill's blog

Windows Platform, Virtualization and PowerShell with a little Photography for good measure.

Drilling into ‘reasons for not switching to Hyper-V’


InformationWeek published an article last week, “9 Reasons why enterprises shouldn’t switch to hyper-v”. The author is Elias Khnaser; this is his website and this is the company he works for. A few people have taken him to task over it, including Aidan. I’ve covered all the points he made, most of which seem to have come from VMware’s bumper book of FUD, but I wanted to start with one point which I hadn’t seen before.

Live migration. Elias talked of “an infrastructure that would cause me to spend more time in front of my management console waiting for live migration to migrate 40 VMs from one host to another, ONE AT A TIME.” and claimed it “would take an administrator double or triple the time it would an ESX admin just to move VMs from host to host”. Posting a comment to the original piece, he went off the deep end replying to Justin’s comments, saying “Live Migration you can migrate 40 VMs if nothing is happening? Listen, I really have no time to sit here trying to educate you as a reply like this on the live migration is just a mockery. Son, Hyper-v supports 1 live VM migration at a time.” Now this does at least start with a fact: Hyper-V only allows one VM to be in flight on a given node at any moment; but you can issue one command and it moves all the Hyper-V VMs between nodes. Here’s the PowerShell command that does it.
Get-ClusterNode -Name grommit-r2 | Get-ClusterGroup |
  Where-Object { Get-ClusterResource -InputObject $_ |
                 Where-Object { $_.ResourceType -like "Virtual Machine*" } } |
    Move-ClusterVirtualMachineRole -Node wallace-r2
The video shows it in action with 2 VMs, but it could just as easily be 200. The only people who would “spend more time in front of [a] management console” are those who are not up to speed with Windows clustering. System Center will sequence moves for you as well.

But does it matter whether the VMs are migrated in series or in parallel? If you have a mesh of network connections between cluster nodes you could be copying to two nodes over two networks with the parallel method, but if you don’t (and most clusters don’t) then n simultaneous copies each run at 1/n the speed of a single copy. Surely if you have 40 VMs and they take a minute each to move, it takes 40 minutes either way… right? Well, no. Let’s use some rounded numbers for illustration only: say 55 seconds of the minute is spent on the initial copy of memory, 4 seconds on the second-pass copy of the memory pages which changed during those 55 seconds, and 1 second on the third-pass copy and handshaking. Then Hyper-V moves on to the next VM, and the process repeats 40 times.

What happens with 40 copies in parallel? Somewhere in the 37th minute the first-pass copies complete – and none of the VMs has moved to its new node yet. Now: if 4 seconds’ worth of pages changed in 55 seconds – that’s about 7% of all the pages – what percentage will have changed in 36 minutes? Some pages won’t change from hour to hour and others change from second to second; how many actually change in 55 seconds, or 36 minutes, or any other length of time depends on the work being done at that point and on the memory size, and will be enormously variable. But the extreme points are clear: (a) in the very best case no memory changes, and the parallel copy takes as long as the sequential one; in every other case it takes longer. (b) In the worst case the second pass has to copy everything – and when that happens the migration never completes.
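The arithmetic above can be put in a toy model. This is an illustration only, using the rounded numbers from the paragraph (55 s first pass, ~7% of pages dirtied per 55 s of copying, 1 s of handshaking) and assuming the dirty rate stays constant:

```python
# Toy model of serial vs parallel live migration. Illustrative numbers only:
# a constant fraction of memory pages is assumed to be dirtied per second
# of elapsed copy time, and all VMs share one migration network.

def serial_time(n, first=55.0, dirty_rate=0.07 / 55, final=1.0):
    """Each VM migrates in turn at full link speed."""
    second = first * (dirty_rate * first)  # pages dirtied during the first pass
    return n * (first + second + final)

def parallel_time(n, first=55.0, dirty_rate=0.07 / 55, final=1.0):
    """All VMs copy at once, so every pass runs n times slower."""
    pass1 = first * n                      # link shared n ways
    dirty = min(1.0, dirty_rate * pass1)   # fraction dirtied during pass 1
    # If dirty reaches 1.0 the second pass recopies everything and, in
    # reality, the migration would never converge (point (b) above).
    pass2 = first * dirty * n
    return pass1 + pass2 + final

print(serial_time(40) / 60, "min serial")    # ~40 minutes for 40 VMs
print(parallel_time(40) / 60, "min parallel, and still not converged")
```

With one VM the two come out identical; with 40, the parallel first pass alone takes ~37 minutes, by which point (in this model) every page has been dirtied.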

Breadth of OS support. In Microsoft-speak, “supported” means a support incident can go all the way to issuing a hot-fix if need be. “Not supported” doesn’t mean non-cooperation if you need help – but the support people can’t make the same guarantee of a resolution. By that definition we don’t “support” any other company’s software – they provide the hot-fixes, not us – but we do have arrangements with some vendors so that a customer can open a support case and have it handed on to Microsoft, or handed on by Microsoft, as a single incident. We have those arrangements with Novell for SUSE Linux and with Red Hat for RHEL, and it’s reasonable to think we are negotiating arrangements for more platforms; those who know what is likely to be announced won’t identify the platforms in question, to avoid prejudicing the process. In VMware-speak, “supported” has a different meaning. In their terms NT4 is “supported”. NT4 works on Hyper-V, but without hot-fixes for NT4 it’s not “supported”. If NT4 is supported on VMware and not on Hyper-V, exactly how is a customer better off? Comparisons using different definitions of “support” are meaningless. “Such-and-such an OS works on ESX/vSphere but fails on Hyper-V” or “Vendor X works with VMware but not with Microsoft” lets the customer say “so what?” or “that’s a deal-breaker”.

Security. Was it Hyper-V that had the vulnerability which let VMs break out into the host partition? No, that was VMware. Elias commented that "You had some time to patch before the exploit hit all your servers", which makes me worry about his understanding of network worms. He also brings up the discredited disk-footprint argument; that is based on the fallacy that every megabyte of code is equally prone to vulnerabilities. Jeff sank that one months ago, and pretty comprehensively – the patch record shows that a little code from VMware has had more flaws than a lot of code from Microsoft.

Memory over-commit. VMware's advice is: don't do it. Deceiving a virtualized OS about the amount of memory at its disposal means it makes bad decisions about what to bring into memory, with the virtualization layer paging blindly – not knowing what needs to be in memory and what doesn’t. That means you must size your hardware for more disk operations, and still accept worse performance. Elias writes about using oversubscription “to power-on VMs when a host experiences hardware failure”. In other words, the VMs fail over to another host which is already at capacity, and oversubscription magically makes the extra capacity you need. We’d design things with a node’s worth of unused memory (and CPU, network and disk IOps) in the other node[s] of the cluster. VMware will cite their ability to share memory pages, but this doesn’t scale well to very large memory systems (more pages to compare), and for it to work you must not have [1] large amounts of data in memory in the VMs (the data will be different in each), or [2] OSes which support entry-point randomization (Vista, Win7, Server 2008/2008-R2), or [3] heterogeneous operating systems. Back in March 2008 I showed how a Hyper-V solution was more cost-effective if you spent some of the extra cost of buying VMware on memory – in fact I showed the maths underneath it, and how under limited circumstances VMware could come out better. Advocates for VMware [Elias included] say buying VMware buys greater VM density: the same amount spent on RAM buys even greater density. The VMware case is always based on a fixed amount of memory in the server: as I said back then, either you want to run [a number of] VMs on the box, or the budget per box is [a number]. Who ever yelled "Screw the budget, screw the workload, keep the memory constant!"? The flaw in that argument is more pronounced now than when I first pointed it out, as the amount of RAM you get for the price of VMware has increased.
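The "spend the licence money on RAM" argument can be sketched in a few lines. Every number here is hypothetical (licence price, RAM price, density gain from over-commit are assumptions for illustration, not real quotes):

```python
# Hypothetical comparison: spend a per-host budget on a hypervisor licence
# plus over-commit, or spend the same money on extra RAM instead.
# All figures are assumed for illustration only.

license_cost = 2500   # $ per host for the paid hypervisor (assumed)
ram_cost_gb  = 50     # $ per GB of server RAM (assumed)
base_ram_gb  = 32     # RAM the host ships with
vm_ram_gb    = 2      # memory assigned to each VM
overcommit   = 1.25   # density gain claimed for over-commit (assumed)

# Option A: pay the licence and over-commit the base RAM
vms_with_license = int(base_ram_gb / vm_ram_gb * overcommit)

# Option B: put the licence money into RAM, with no over-commit
extra_ram_gb = license_cost / ram_cost_gb
vms_with_ram = int((base_ram_gb + extra_ram_gb) / vm_ram_gb)

print(vms_with_license, "VMs with licence + over-commit")
print(vms_with_ram, "VMs with the same money spent on RAM")
```

With these particular numbers the RAM option wins comfortably; the real comparison obviously shifts with prices and workloads, which is the point of the "limited circumstances" caveat above.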

Hot add memory. Hyper-V only does hot-add of disk, not memory. Some guest OSes won’t support it at all. Is it an operation which justifies the extra cost of VMware?

Priority restart. Elias describes a situation where all the domain controllers / DNS servers are on one host. In my days in Microsoft Consulting Services, reviewing designs customers had in front of them, I would have condemned a design which did that, and asked some tough questions of whoever proposed it. It takes scripting (or very conservative start-up timeouts) in Hyper-V to manage this. I don’t know enough about the feature in VMware to say whether it sequences things based not just on the OS running, but on all the services being ready to respond.
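The scripting in question is essentially "start a tier, wait until its services answer, start the next tier". A minimal sketch of that loop, where `start_vm` and `is_service_ready` are hypothetical stand-ins for whatever WMI/PowerShell calls you would actually use (they are not a real Hyper-V API):

```python
# Sketch of scripted priority restart: bring up tiers of VMs in order,
# waiting for each tier's services to respond before starting the next.
# start_vm / is_service_ready are hypothetical callbacks, not real APIs.

import time

def start_in_order(tiers, start_vm, is_service_ready, timeout=300, poll=5):
    """tiers is a list of lists of VM names, in start order."""
    for tier in tiers:
        for vm in tier:
            start_vm(vm)
        deadline = time.time() + timeout
        # Block until every VM in this tier reports its services ready
        while not all(is_service_ready(vm) for vm in tier):
            if time.time() > deadline:
                raise TimeoutError(f"tier {tier} not ready in {timeout}s")
            time.sleep(poll)

# Example: domain controllers first, then the servers that depend on them
# start_in_order([["DC1", "DC2"], ["APP1", "APP2"]], start_vm, is_service_ready)
```

The point of polling service readiness rather than using fixed timeouts is exactly the distinction drawn above: "the OS running" is not the same as "all the services being ready to respond".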

Fault tolerance. VMware can offer parallel running – with serious restrictions. Hyper-V needs third-party products (Marathon) to match that. What this saves is the downtime to restart the VM after an unforeseen hardware failure. It’s no help with software failures: if the app crashes, or the OS in the VM crashes, then both instances crash identically. Clustering at the application level is the only way to guarantee high levels of service: how else do you cope with patching the OS in the VM, or the application itself?

Maturity. If a new competitor shows up in your market, you tell people how long you have been around. But what is the advantage in VMware’s case? Shouldn’t age give rise to wisdom – the kind of wisdom which stops you shipping updates which cause high-availability VMs to reboot unexpectedly, or shipping beta time-bomb code in a release product? It’s an interesting debating point whether VMware had that wisdom and lost it – if so, they have passed through maturity and reached senility.

Third-party vendor support. Here’s a photo, from a meet-the-suppliers event one of our customers put on, where they had us next to VMware. Notice we’ve got System Center Virtual Machine Manager on our stand, running in a VM, managing two other Hyper-V hosts which happen to be clustered; the lack of traffic at the VMware stand lets us see they weren’t showing any software. A full demo of our latest and greatest needs three laptops – and theirs? Well, the choice of hardware is a bit limiting. There is a huge range of management products to augment Windows – indeed the whole reason for bringing System Center in is that it manages hardware, virtualization (including VMware) and virtualized workloads. When Elias talks of third-party vendors I think he means people like him – and that would mean he’s saying you should buy VMware because that’s what he sells.



Comments
  • Nice follow up James!  And thanks for the link :)

  • So are you making a challenge to all of the researchers out there to see if they can find a guest escape vulnerability in Hyper-V?  Really?  Do you want to go there?  

    Help me out with the patch record thing too.  Do you (MS) really want to go to the patch record argument?  Seriously?  

  • "The [vendor] doth protest too much, methinks"?

    Back to Hyper-V RTM, I recall similar arguments regarding Quick Migration--sort of a deflection when it came to a VMware feature that Hyper-V lacked.  That's not at all to say that your points are without merit... but I think it really boils down to this--are customers finding value in those features or not?  For example--are customers finding value in using memory over-commit?  Of course some are.  Hot-add memory?  I think for the future vision of a truly dynamic infrastructure that adjusts itself based on load, this is a necessity--very much like how a single OS install will manage/page memory to individual apps based on need and activity.

    You have some good points to make.  In my opinion, it would be better to make those dispassionately, and also simply concede/articulate vision where there's an area that Microsoft can meet its customers needs more fully--and why/how they should still consider the Hyper-V solution today.  We know that if, for example, Microsoft introduces memory over-commit in a future version, it will surely be brought up as a feature where Hyper-V now matches VMware (much like Live Migration's introduction in R2).

  • Maybe it's just my command of the English language, but "Memory over-commit. VMware's advice is don't do it" – I don't read that at all in the document, and you stating it simply looks like shading the truth and works against credibility.  Be honest with the reader.  Otherwise, it looks just like another "Quick Migration is as good as Live Migration" argument.

  • @anonymous. YES. Do you think researchers have not tried to find a guest escape vulnerability? Am I happy to take Microsoft's patch record over the life of Hyper-V and go up against VMware? You bet I am. [Microsoft's patch record c. 2002? Not so much]

    @Ryan, actually I agree with you. Hyper-V itself is free and you have to pay for VMware: so anyone doing due diligence should look at what VMware offers and ask if it is worth the money (or, if you prefer, look at the free product and ask if it is up to the job). I'm happy with the proportion who will say "no it isn't", but only the customer can say if they are in the group who say "yes it is, for us".

    @Kent

    In the section "Memory performance Best practices" at the top of Page 6 it says clearly

    "Make sure the host has more physical memory than the total amount of memory that will be used by ESX plus the sum of the working set sizes that will be used by all the virtual machines running at any one time."

    I don't think we ever said Quick Migration was as good as Live Migration. If you can plan the move you can schedule a couple of minutes of downtime - and when we lacked live migration we had to keep pointing that out. Customers told us they saw the logic of that but they still felt a lot happier knowing they could live migrate.

  • James, there is no product called 'VMware', as you well know -- would you say 'Microsoft' when referring to a product? VMware, the company, published a suite of products, of which some, including VMware Server but most importantly ESXi, are in fact free [zero-cost]. vSphere is decidedly not free, but just like Windows 2008 R2, comes in a variety of levels and price points which offer various feature sets.

    No reasonable person would ever conclude that Hyper-V's OS support comes even remotely close to that of vSphere/ESXi. For instance, our organization has numerous production VMs running FreeBSD and Solaris/OpenSolaris, with VMware Tools running happily in those VMs. We couldn't do the same with Hyper-V.

    Memory overcommit is a real, useful feature -- in a limited set of circumstances (for us, mostly desktop virtualization). There are certainly cons to using it, but it's nice to have the option.

    Although Live Migration has mostly caught up with vMotion (and doesn't come with the big price tag that vMotion does), Hyper-V is again playing catch-up because there's no equivalent feature to Storage vMotion. We use that regularly to move VMs among different back-end datastores for reasons of policy, performance tiering, maintenance and the like.

    Hyper-V's storage options are also more limited. While vSphere/ESXi can use just about any NFSv3 filer, and also just about any iSCSI or Fibre Channel SAN, Hyper-V doesn't support any NAS storage (even CIFS/SMB2) as a clustered back end, and requires an iSCSI implementation that supports persistent reservations. This can affect costs for clustering, since very few free/low-cost iSCSI implementations support the required features. OpenSolaris does, one of FreeBSD's iSCSI targets does, and a few others, but OpenFiler and the like do not [yet]. Likewise, customers may already have a fast, robust NAS infrastructure in place, but they can't use it for Hyper-V. Contrary to FUD I've seen elsewhere, NFS is indistinguishable in terms of speed from iSCSI in almost all cases (and faster in some); our organization is about to switch to 10GbE NFS for even faster storage.

    None of this implies that Hyper-V isn't a decent product with some compelling features, and the initial price tag is almost always (but not universally) less than vSphere, depending on size and shape of the deployment and features. But it's very silly, for proponents of either  technology, to ignore the real pros and cons on each. Actually, it's even worse to actually be a proponent of either; it's like advocating a green hammer vs. a blue mallet. Both offer roughly analogous features, but they are *not* the same, and arguments devolve to discussing the color anyway ;)

  • @Zarafa.

    First off, thanks: As a set of reasons for why VMware was right for you (but that might not apply to everyone) it's one of the more intelligent things I've read. To answer specifics ...

    Yes, I occasionally say "VMware" to stand for products, much as people used to say "Novell" for "NetWare"; it's sloppy of me and I try not to do it.

    We are making extensions available to all the Linux distros, but we have nothing to say right now on FreeBSD and Solaris.

    Memory overcommit - the one place where it does work is if you are running virtual desktops with XP (not Vista or 7) and small amounts of memory, to maximize the number of shareable pages (when memory-constrained, it's the same core stuff which will be kept in memory on all of them). Often a terminal server will do that job better, but not always. Run a FreeBSD VM, a Solaris VM, a couple of Linux versions, a Windows Server 2003, 2008 and 2008 R2, give each a moderate-sized server app, allocate each multiple GB of memory with the total assigned twice what is present, and the picture isn't so rosy at all.

    I'm pretty sure Windows Server will support more disk controllers than any of the products from VMware, but we have a long track record of being uncomfortable with stuff being on filers.

    I like the green hammer / blue mallet analogy and will probably use it in discussions in the future.

  • James,

    The root concern on Live Migration isn't that you can't execute all of the Live Migrations at the same time, it's that these will execute in serial rather than in parallel. Regardless of how the calculations are done, a parallel operation is always going to be faster than a serial one.

    Also, I don't think that video helps in illustrating how "simple" it is to execute all of these move commands at once. When you compare this to the competitor's alternative of invoking multiple moves simultaneously through a GUI, this video doesn't really help your cause.

  • James,

    I work for VMware and am one of the people responsible for our performance white papers.

    You incorrectly state that VMware recommends against memory over-commit.  It is foolish for you to make this statement, supported by unknown text in our performance white paper, when so much of our literature demonstrates the phenomenal value of this wonderful feature.

    If you think any of the language in this document supports your position, please quote the specific text.  I urge you to put that comment in a blog entry of its own.  I am sure that your interpretation of the text will receive comments from many of your customers.

    I will say again: we absolutely do not recommend against memory over-commit.  We love it, our customers love it, and you guys will, too, once you have provided similar functionality in Hyper-V.

    Thank you,

    Scott

  • Are you freaking joking? Here is a "photo" of a random event proving some kind of vendor support advantage for Microsoft?

  • @Matt - it was just a proof that we could stick our solution on the hardware we had in the cupboard - nothing more.

    @Paul.  A parallel transfer is not automatically quicker than a serial one. Think of ordinary file copies, if you are copying 10 files in paralell you get closer to using the full network bandwidth : but you get the same effect with a multi-treaded copy moving each file in series. If the content is changing then ... see above.


    @Scott - OK another blog post coming up.

  • James,

    I understand your argument on parallel vs serial. Only testing would tell if one is better than the other. I think that your article doesn't address, or tries to deflect from, some of the root concerns of the original author. e.g.

    1) The author is just trying to say that there are some things missing still from Live Migration. I think your response highlights this (e.g. can't do multiple without scripting and lack of parallel support which was the author's original point)

    2) Regardless of how you define support, you have to acknowledge that VMware "supports" more operating systems than Hyper-V. That was the original author's point.

    3) I happen to agree with some of the original author's points here. However just wanted to say that do we really want to compare security models by highlighting isolated incidents? This can work both ways. Also the types of incidents that come up is reflective of the type of product, features, how it is being used and how long it has been out in the market. Additionally you cite a blog article as "sinking" the footprint myth. A blog article doesn't make something fact. See comments in that particular article.

    4) Original author is just trying to say that this is a function that customers value and do use. Like the Microsoft articles that came out before Live Migration trying to convince the world that VMotion wasn't required, I expect this to continue until Hyper-V has this functionality.

    5) Where supported, I think this is valuable. Not enough on its own to justify.

    6) Again, point is being made that Hyper-V doesn't support this natively. Yes lots of things can be done via scripting as you showed in item 1. But many people want things to just work out of the box.

    7) Another "Quick Migration" scenario. Hyper-V doesn't have it, so dismiss the need. Like Overcommit, many people see this as a valuable function.

    8) Don't think isolated incidents help. This goes both ways. Maturity delivers: functional maturity, features that are built in and don't need to be scripted, a mature support and delivery organization, reliability, solution breadth. It doesn't make a product infallible.

    9) I don't think you would have put the picture up there if you weren't trying to cause readers to make the connection between no people and vendor support. Like Matt's comment, I don't think a picture at some random event proves anything.

    You do make some valuable points, but let's not lose sight of some of the simple truths that the original author was trying to get across.

  • Paul.

    1. If we're missing anything it is in the UI. Remember, with parallel, after 90% of the process is done none of the VMs is on its new host. If I'm moving VMs because I have warning of a hardware failure, I'd rather have 90% of my VMs already moved when 90% of the time has elapsed.

    2. See what I said higher up the comments. Elias's post was just FUD: "Your OS will probably be OK on VMware, not on Microsoft". There will be specific OSes which are better supported on one or the other.

    3. Again, the original article was FUD: VMware will have better security than Microsoft because their OS is smaller/dedicated to virtualization rather than general-purpose/larger. [No one cares about the size of the management OS partition; whenever VMware people bring up footprint it is for this FUD]

    Count the vulnerabilities, compare the number of system reboots for patches, count the number of times that patches have introduced regressions, and Microsoft is doing a better job than VMware. "Do we really want to compare security models by highlighting security incidents?" The alternative is a comparison which weighs the theoretical benefits of an architecture but ignores the track record of the only implementation of that architecture.

    4. I don't think we ever said live migration wasn't required. We did say it was oversold - based on market research on the number of customers whose processes needed live migration. What that didn't show up was the number of customers who just couldn't feel comfortable without it. We could say they didn't actually need it until we were blue in the face, and we might even have been right... but in the back of their minds there was always the worry that a short service interruption might end up causing a problem.

    If you choose to buy virtualization, the question is always "where does the value come from?" Different people will have different views of each feature - you can expect me to say "I think most customers will think that's just 'chrome'", and VMware fans will say we've got that wrong every time, and every feature "is like live migration". I don't think the lack of any of the features listed here creates the same degree of worry as the lack of live migration did. [see the follow-up article on dynamic memory and over-commit too]

    5. A-ha! we agree :-)  

    6. It's an extra feature. And it might make a worthwhile difference if you are stuck with a bad design.

    7. Fault tolerance only cuts out the reboot time after a hardware failure; it doesn't cope with application crashes or guest OS crashes. VMware want you to believe that a Microsoft stack under your VMs is flaky, but that a more complicated Microsoft stack inside a VM is much more reliable than your hardware.

    I believe it only works with a single CPU, so you can't use it for heavy workloads - and if the workload is that critical you need to cluster at the application level.

    8. Maturity would mean not shipping updates which broke customers' systems. It's another FUD claim.

    9. Maybe the photo wasn't a good call on my part; I'm sure a time could have been found when our stand was the quiet one. My point was we had a Hyper-V demo going with SCVMM, clustering and live migration on 3 laptops; VMware couldn't bring a demo because of their hardware demands. It's just anecdotal evidence - it wasn't supposed to be proof.
