Thanks very much for all of the passionate feedback and input on my original technical comparison article, VMware or Microsoft? Comparing vSphere 5.5 and Windows Server 2012 R2 Hyper-V At-A-Glance!  This has definitely been one of the most widely read and commented-on articles I've published to date, with well over 30,000 views of this single comparison article alone.

Thank you for also sharing my comparison throughout your communities to help IT Pros evaluate the capabilities of each virtualization and Private Cloud platform in terms of their environment needs!


Some Misconceptions …

However, I've also seen some areas of misconception building in the online community … Apparently, certain readers have read my original comparison article as some sort of "attack" on VMware solutions.  Quite the contrary – as I called out in the summary of my original article, customers have a lot to gain from both solution offerings.  But when evaluating and deciding on the appropriate solution, I've found that IT Pros also want to dive into the details to better understand potential limitations and additional considerations, and then decide whether each one matters in their environment – simple checklists don't do the trick.

This article is a follow-up to my original article with the intention of addressing the common misconceptions I’ve seen repeated in the online community after publishing my comparison.  I’ve bucketed each area of misconception in its own section below, with additional technical commentary based on my experience with both solution offerings in an attempt to correct these misconceptions and provide a bit more background than I was able to provide in the original comparison article.

As always, please feel free to leave additional comments below with any points-of-view that you may have! One of the great things about social collaboration is that we expand the scope of our own experiences by learning from the experiences of others.

Bias? Sure ... but not Unfair

OK - I do currently work for Microsoft as a technical evangelist, and obviously I believe that we have some awesome technical capabilities in our solution stack.  After all, Windows Server 2012 Hyper-V is the virtualization fabric used in our global Windows Azure datacenters, which consist of hundreds of thousands of physical hosts providing cloud services across 89 countries and territories worldwide, arguably making Hyper-V one of the world's most highly scaled and battle-tested hypervisors.  You are also reading a Microsoft TechNet blog.  As a result, I'm sure there's some bias in my conclusions - but then, don't we all have biases to some extent?

Throughout my career, I've implemented a roughly equal mix of VMware and Windows network environments.  Over the last few years, my work has been primarily oriented towards Private Cloud, Public Cloud and Hybrid Cloud environments that virtualize Windows and Linux workloads.  As those of you who have attended my technical events can attest, I am upfront about technical areas where some Microsoft solutions currently have gaps that I've seen impact customers and projects.  And, to be fair, I will also be the first to point out limitations in other products that I've seen impact customers and projects.

  • Technical accuracy of the information that I share is my top priority.

My conclusions are based on my own field experiences – experiences as an infrastructure consultant in designing and implementing datacenter solutions for hundreds of customer organizations and, most recently, experiences as a technical Microsoft employee collaborating with thousands of the best IT Pros in the industry on their virtualization scenarios across broad customer segments in the United States. 

Of course, you may have different experiences with your customer organizations, and I certainly respect and appreciate all of your points-of-view.  Hopefully, you can respect my points-of-view as well, without characterizing them as FUD, incorrect, dishonest or unfair simply because they don’t agree with your own thoughts. At the end of the day, all of our technical points-of-view are good input to customers making technology decisions, and it’s up to them to evaluate which of those points resonates best with their needs.

Technical Accuracy

In drawing my conclusions in my original comparison article, I’ve compared my past experiences to the capabilities in the latest version of each virtualization stack, and then tested each comparison area in a lab environment for technical accuracy.  Of course, I’m human – so, if you feel that I've overlooked an important consideration, feel free to share your experiences and I’ll be happy to research and adjust as needed.

It's important to note that, to date, none of the online discussions have pointed out any technical inaccuracies in my comparisons.  Rather, they've generally acknowledged the technical details I'm providing and argued about whether those details are a real concern to their customers.  And that's fine … I openly stated in my original article that each reader should evaluate the areas that are most important to them, and I fully expect each of you, as experienced IT Pros, to determine whether my considerations in each relevant area impact your environment and decisions.

To that end, for each comparison area in my original article, I've provided my conclusions based on my experience in the field and in the lab, with comments and linked resources where I've seen significant advantages, disadvantages or additional considerations impact the projects I've been assigned to.  These comments and resources provide important additional detail to evaluate, because oftentimes a particular technical capability is not a simple Yes or No answer and requires deeper consideration.

Operations Management and Monitoring

In my original comparison article, this topic related to the enterprise-grade operations monitoring and management of virtualization hosts, guest VMs and application workloads provided by Microsoft System Center 2012 R2 and VMware vCOps.  Most enterprise customers that I've worked with have expressed the need for end-to-end monitoring of the virtualization fabric, the VM containers and the applications running inside each VM, so that they have full 360-degree visibility when managing their workloads.  I felt it was important to note when an additional investment is necessary to deliver these capabilities on the VMware stack.  Note that System Center 2012 R2 includes these capabilities and is not specific to managing and monitoring only Hyper-V environments – many of my customers are using System Center 2012 R2 Operations Manager with Veeam's Management Pack for VMware, because they felt it fit their needs and budget better than adding vCOps to their environment.
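For those interested in that route, here's a rough PowerShell sketch of pulling a third-party management pack, such as Veeam's, into Operations Manager; the server name and file paths below are purely hypothetical assumptions for illustration:

    # Connect to the management group and import the vendor-supplied
    # management pack files (server name and paths are assumptions).
    Import-Module OperationsManager
    New-SCOMManagementGroupConnection -ComputerName "scom01.contoso.local"
    Get-ChildItem "C:\MPs\Veeam" -Filter *.mp |
        ForEach-Object { Import-SCOMManagementPack -Fullname $_.FullName }

    # Confirm the packs are now present in the management group.
    Get-SCOMManagementPack | Where-Object { $_.DisplayName -like "*Veeam*" } |
        Select-Object DisplayName, Version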

Some of you have pointed out that VMware vCenter includes core management and monitoring capabilities for the hypervisor that don't require vCOps, and that's true – the same is also true of Windows Server 2012 R2 and Hyper-V.  While both hypervisor platforms, of course, include core management and monitoring capabilities, this comparison area was focused on looking beyond those basics.  I've updated the description of this comparison topic in my original article to better clarify that point.

And this brings me to ...

System Center 2012 R2 Licensing

Many of you have commented that System Center 2012 R2 is not a free-of-charge product, and you are correct.  However, the comparison article clearly states that I am comparing Windows Server 2012 R2 Datacenter + System Center 2012 R2 Datacenter with vSphere 5.5 Enterprise Plus + vCenter Server 5.5.  I selected these product configurations to compare, because they represent the most common product configurations that I’ve seen in the field in customer environments.  I made my comments around additional VMware costs for the topics called out in my comparisons, because these capabilities are not included in the compared product configurations and require additional licensed products. 

In terms of System Center 2012 R2 licensing – please note that a single System Center 2012 R2 Datacenter edition license covers a virtualization host with up to two physical processors for ALL of the System Center 2012 R2 capabilities I've called out, and thus does not require additional licensing over and above the compared product configurations.  Some of those commenting online may not have been aware that System Center licensing changed in the 2012 product release to roll all of the management capabilities into a single product SKU.
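As a simple worked example under that model: a Hyper-V host with four physical processors would need two System Center 2012 R2 Datacenter licenses (each covering up to two processors), and those same licenses span Operations Manager, Virtual Machine Manager and every other System Center component used to manage that host.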

Dynamic Memory Management and VMware Transparent Page Sharing

Talk about an area of heated debate! Here's an expanded version of my conclusions in this comparison area of the original article: it's been my experience that enterprise customers optimize their production datacenters for performance, and as such, when using modern servers (Nehalem and beyond) and modern operating systems, these customers leave Large Pages enabled at its default setting to maximize memory performance. This experience also appears to be supported by VMware's recent whitepaper, ESX Memory Resource Management: Transparent Page Sharing, which states:

"In fact, most guest workloads perform better when ESX is configured to use large pages."

This whitepaper then goes on to confirm that ...

"However, as a result of using large pages, virtual machines might potentially be deprived from transparent page sharing and its benefits of saving memory."

Now, to be fair, if an ESX host is placed under memory pressure where memory overcommitment is at risk, then ESX selects a subset of large pages to convert into small pages so that TPS can help reduce memory pressure.  The whitepaper concludes ...

"Therefore, when a virtual machine is using large pages, transparent page sharing will become effective only when the memory state of ESX falls below high. Even though the virtual machine may contain shareable content in the form of small pages, these will be considered for sharing only at this time."

By the way, it's been my experience that once these large pages are converted to small pages due to memory pressure, the ESX host does not revert to accessing them as large pages when memory pressure returns to a normalized state - not until the ESX host is restarted - potentially penalizing memory performance for the affected VMs in the meantime.

As for my customers ... they generally don't run their datacenters at memory allocation levels where they're routinely risking memory overcommitment - instead, they manage their environments for best performance. As a result, it's been my experience that they have generally seen little to no real memory efficiency gained via Transparent Page Sharing on vSphere when using modern server hardware and operating systems.

In addition to the whitepaper above, you may find the article linked at the bottom of this section helpful for understanding the considerations around Large Pages and TPS on modern server hardware and software, so that you can draw your own conclusions.
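On the Hyper-V side of this comparison, memory efficiency comes from Dynamic Memory rather than page sharing. As a minimal illustration (the VM name and memory values below are assumptions, not recommendations), Dynamic Memory is enabled per-VM with PowerShell:

    # Enable Dynamic Memory on a guest so the host can add and reclaim
    # memory based on actual demand. Illustrative values only; the VM must
    # be powered off to change its startup memory.
    Set-VM -Name "APP01" -DynamicMemory `
        -MemoryStartupBytes 2GB -MemoryMinimumBytes 1GB -MemoryMaximumBytes 8GB

    # Review the resulting memory configuration for the VM.
    Get-VMMemory -VMName "APP01" |
        Select-Object VMName, DynamicMemoryEnabled, Startup, Minimum, Maximum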

Unlimited Concurrent Live Migrations

Another area of heated feedback ... With modern enterprise hardware, I'm seeing customers concurrently Live Migrate significantly more running VMs on Windows Server 2012 R2 Hyper-V than the 4-to-8 concurrent vMotions at which vSphere 5.5 is currently capped.  The community has called out some good points about the CPU and network overhead of performing Live Migrations with other virtualization solutions, overhead that could limit the number of concurrent operations even if the hypervisor does not impose hard-coded limits.

However, Windows Server 2012 R2 Hyper-V can perform Live Migration over SMB with RDMA when using 10GbE or faster NICs that support RDMA.  This offloads much of the CPU overhead onto the NIC and transfers VM memory state between hosts extremely quickly.  In fact, after conducting a set of extreme lab tests designed to push server hardware to its limits, we were amazed to find that with Windows Server 2012 R2 the hardware-imposed limit is now actually the server's internal memory bus: with Live Migration over RDMA, we're able to max out the internal memory bus when using three InfiniBand adapters per server for Live Migration. Clearly, in configurations using modern high-speed server NICs with RDMA support, CPU and network are not the hardware bottlenecks to be concerned with, and the ability to raise the caps on Hyper-V Live Migrations and Live Storage Migrations can be advantageous to customers – particularly in scale-up virtualization host environments.
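For reference, here's a minimal PowerShell sketch of the host-side settings involved; the concurrency limits and migration subnet below are illustrative assumptions, not recommended values:

    # Enable Live Migration, raise the concurrency caps and switch the
    # transport to SMB so that RDMA-capable NICs (SMB Direct) are used.
    Enable-VMMigration
    Set-VMHost -MaximumVirtualMachineMigrations 16 `
               -MaximumStorageMigrations 8 `
               -VirtualMachineMigrationPerformanceOption SMB

    # Restrict Live Migration traffic to the RDMA-capable subnet (example).
    Add-VMMigrationNetwork "10.0.10.0/24"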

Microsoft Clustering

I agree with many of the community comments that, in the “old days”, Microsoft Clustering could be quite complex.  I've personally designed, built and managed Windows Server clusters starting with the very first "Wolfpack" release in Windows NT Advanced Server, and I'm well aware of the technical nuances that existed in prior days.  However, significant improvements have been made in the Windows Server 2012-era to simplify robust clustering configurations considerably.  In my field experiences, enterprise customers leverage Microsoft Clustering extensively for their mission-critical Windows Server workloads – such as Microsoft Exchange and Microsoft SQL Server.
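As a small illustration of how far that simplification has come (node, cluster and address values below are assumptions), a validated Windows Server 2012 R2 failover cluster can be stood up with a couple of PowerShell commands:

    # Run the built-in validation tests against the candidate nodes,
    # then create the cluster with a static administrative IP address.
    Import-Module FailoverClusters
    Test-Cluster -Node "NODE1", "NODE2"
    New-Cluster -Name "APPCLUSTER" -Node "NODE1", "NODE2" -StaticAddress "10.0.0.50"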

Some people in the community have attempted to discredit the value of Microsoft Clustering by incorrectly characterizing some Microsoft workloads, such as Exchange DAGs, as examples of modern enterprise applications that don't use Microsoft Clustering.  Quite the contrary – Exchange DAG configurations are built on top of Microsoft Clustering.  Some may be confusing Microsoft Clustering with the need to provide shared cluster storage resources – they are separate concepts, as some clustered application configurations require shared storage while others, such as Exchange DAGs and SQL Server AlwaysOn Availability Groups, do not.

By leveraging Microsoft Clustering in a virtualized environment, customers benefit from the ability to scale-out application workloads and provide application-aware resiliency with super-fast failover times.  As a result, I view the limitations imposed by vSphere on Microsoft Clustering workloads as significant for enterprise customers to consider if they are virtualizing cluster-aware applications.

VMware Fault Tolerance (FT)

OK - I may be somewhat critical of VMware FT.  In my experience, I have never had a customer productively use VMware FT after they understood the limitations it imposes.  For my customers, the level of availability targeted by FT is something they've considered for their most important mission-critical apps – however, the limitations imposed also significantly restrict their ability to scale up, scale out and manage those applications.  For these reasons, every customer of mine that considered VMware FT ultimately chose not to use it.

If you have different experiences - feel free to share! I’d be interested to hear the specific scenarios where you’ve had first-hand experience leveraging VMware FT in a production environment.  Don’t get me wrong – FT is certainly an admirable engineering feat, but the limitations have made the practical use of VMware FT improbable for my customers.

VMware Virtual SAN (VSAN)

Several of you have claimed that I unfairly characterized VMware VSAN as an "experimental feature" in my original article. Actually, I was quoting VMware's own product documentation, which calls VSAN an "experimental feature" – check for yourself; here is the link to the quoted VMware documentation from my original article.  Enough said ...

If you're interested in storage virtualization solutions that can be used in production environments today, be sure to evaluate Storage Spaces, the SAN-like storage virtualization solution included in Windows Server 2012 and 2012 R2 that runs on commodity hardware, using the resources linked below.

BTW – in R2, we've also optimized Storage Spaces and SMB 3.0 Scale-Out File Servers (SoFS) for many different IO block sizes and IO patterns, as well as for high-speed 10GbE or faster adapters with RDMA support.  We are now seeing disk IO performance that not only compares well with traditional iSCSI SANs, but also rivals the disk IO throughput of more expensive Fibre Channel solutions.
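To give a feel for what that looks like in practice, here's a minimal PowerShell sketch (pool, share and group names are assumptions) of building a mirrored Storage Space and exposing it as a continuously available SMB 3.0 share; in a real deployment these steps would typically run on a clustered Scale-Out File Server:

    # Pool the available physical disks and carve out a mirrored space.
    $disks = Get-PhysicalDisk -CanPool $true
    New-StoragePool -FriendlyName "Pool1" `
        -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $disks
    New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "VMData" `
        -ResiliencySettingName Mirror -UseMaximumSize

    # After initializing and formatting the new disk as V:, publish it as a
    # continuously available SMB 3.0 share for the Hyper-V hosts.
    New-SmbShare -Name "VMStore" -Path "V:\Shares\VMStore" `
        -FullAccess "CONTOSO\Hyper-V-Hosts" -ContinuouslyAvailable $true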

Dynamic Storage Balancing and VMware Storage DRS

Actually, in my opinion, VMware Storage DRS was a great idea back before automated storage tiering became a commonly available feature on enterprise storage solutions.  The most common use case that I saw back then for Storage DRS was optimizing ongoing VM placement across different classes of storage based on IO needs. Of course, Storage DRS can still be used productively to distribute the initial storage placement of new VM workloads - something that Microsoft System Center 2012 R2 also addresses via its included Intelligent Placement feature set.

However ... in my recent conversations with customers, they're now looking for a more granular level of ongoing storage optimization – at the block level, rather than the VM level.  So they're either already investing in SANs that provide automated storage tiering between SSD and HDD, or they're leveraging software capabilities that provide automated block-level storage tiering across SSD and HDD – such as the automated tiering built into Windows Server 2012 R2 Storage Spaces.  So it's not that Storage DRS is a bad technology – it just doesn't meet the storage load-balancing needs of the customers I'm talking with today.  If you have older storage that doesn't have automated tiering capabilities, be sure to check out Storage Spaces and automated storage tiering before you upgrade your SAN again.
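For anyone who wants to see what that tiering looks like, here's a minimal PowerShell sketch (pool name, tier names and sizes are assumptions) of creating a tiered Storage Space in Windows Server 2012 R2, where frequently accessed blocks are automatically promoted to the SSD tier:

    # Define an SSD tier and an HDD tier in an existing pool, then create a
    # virtual disk that spans both tiers (illustrative sizes only).
    $ssd = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SSDTier" -MediaType SSD
    $hdd = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "HDDTier" -MediaType HDD
    New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "TieredVMs" `
        -StorageTiers $ssd, $hdd -StorageTierSizes 100GB, 900GB -ResiliencySettingName Simple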

Network Virtualization - Microsoft HNV and VMware NSX

Windows Server 2012 R2 + System Center 2012 R2 implement a full Software-Defined Networking (SDN) stack with Hyper-V Network Virtualization (HNV).  HNV offers capabilities that are comparable to the separately licensed VMware NSX product. Arguably, NSX has the potential to support multi-hypervisor environments, whereas today the HNV solution is closely integrated with the Hyper-V hypervisor.  However, NSX is also a separate product that is not included in vSphere, which is why I didn't compare the two technologies in depth in my original article.
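To make the comparison a bit more concrete, here's a minimal sketch of the kind of HNV policy records that System Center VMM maintains on each Hyper-V host under the covers; every address, ID and name below is a made-up assumption for illustration only:

    # Register the host's provider address (PA) on the physical network, then
    # map a VM's customer address (CA) into virtual subnet 5001 using NVGRE
    # encapsulation. VMM normally creates and distributes these records.
    New-NetVirtualizationProviderAddress -InterfaceIndex 12 `
        -ProviderAddress "192.168.1.10" -PrefixLength 24
    New-NetVirtualizationLookupRecord -CustomerAddress "10.0.0.5" `
        -ProviderAddress "192.168.1.10" -VirtualSubnetID 5001 `
        -MACAddress "00155D010203" -Rule "TranslationMethodEncap" -VMName "VM01"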

A full comparison of network virtualization stacks could easily be a whole series of articles in itself – and I’ll definitely consider this for the future. For now, here’s a link to a great recorded session that will help you get started with Hyper-V Network Virtualization:

We Live in Exciting Times!

I hope this article has helped to clear up some of the common misconceptions I’ve seen floating around online after publishing my original comparison article.

All-in-all – these are very exciting times for us all as infrastructure professionals! Understanding the advantages and considerations for each solution is certainly important to ensure that we are presenting and designing the best solutions for customers. I look forward to your continued comments and feedback!

I also look forward to hearing from you all as you consider pursuit of the MCSE Private Cloud certification to extend your technical skillset to include the Microsoft virtualization and Private Cloud platform!

Additional resources in which you may be interested ...

See you in the clouds!

Keith