Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

Hyper-V, Live Migration, and the upgrade to 10 gigabit Ethernet


 

My lab consists of two Dell Precision T7500 workstations, each configured with 96GB of RAM.  These are the two nodes in a Hyper-V 2012 cluster.  They mount cluster shared volumes via iSCSI from a third Dell Precision workstation; some of the volumes are on SSDs, and some are on SAS RAID-based disks.

One of the things I have experienced is that when I want to patch the hosts, I pause the node and drain the roles.  This kicks off a live migration of all the VMs on Node1 to Node2, which can take a substantial amount of time, as these VMs are consuming around 80GB of memory.
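For reference, the same pause-and-drain can be kicked off from PowerShell instead of Failover Cluster Manager (a quick sketch; the node name is just an example):

# Pause Node1 and drain its roles, live migrating the VMs to the other node
Suspend-ClusterNode -Name "Node1" -Drain

# After patching and rebooting, resume the node and fail the roles back
Resume-ClusterNode -Name "Node1" -Failback Immediate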


 

When performing a full live migration of these 18 VMs across a single 1GbE connection, the Ethernet link was 100% saturated, and the migration took exactly 13 minutes and 15 seconds.

 

I recently got a couple of 10 gigabit Ethernet cards for my lab environment.  I scored an awesome deal on eBay: 10 cards for $250, or $25 for each Dell/Broadcom 10GbE card!  The problem I have now is that the cheapest 10GbE switch on the market is $850, and there is no way I am paying that for my lab.  The good news is that these cards, just like 1GbE cards, support auto MDI/MDI-X detection on direct connections, so you can form an old-school "crossover" link using a standard patch cable.  I did order a CAT6A cable just to be safe.

Once I installed and configured the new 10GbE cards, I set them up in the cluster as a Live Migration network:

[Screenshots: the new 10GbE network in Failover Cluster Manager, configured and selected for Live Migration]
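As a side note, if you would rather script the live migration network selection than click through the GUI, something along these lines should work (a sketch; "LM-10GbE" is a placeholder for whatever the 10GbE cluster network is named in your cluster):

# Prefer the 10GbE network for live migration by excluding every other cluster network
$lmNetwork = Get-ClusterNetwork -Name "LM-10GbE"
$excluded  = (Get-ClusterNetwork | Where-Object { $_.Id -ne $lmNetwork.Id }).Id -join ";"
Get-ClusterResourceType -Name "Virtual Machine" | Set-ClusterParameter -Name MigrationExcludeNetworks -Value $excluded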

 

 

The same live migration over 10GbE took 65 SECONDS!

 

 

In summary -

 

1GbE live migration, 18 VMs: 13 minutes 15 seconds.

10GbE live migration, 18 VMs: 65 seconds.

In my case, I can drastically decrease live migration time, at minimal cost, by using a direct 10GbE connection between the two hosts in the cluster; going from 795 seconds down to 65 is roughly a 12x improvement, in line with the 10x jump in link speed.  Aidan Finn, MVP, has a post with similar results:  http://www.aidanfinn.com/?p=12228

 

 

Next up, I wanted to create a "converged" network by carving my 10GbE NIC up into multiple virtual NICs: connect it to a Hyper-V virtual switch, and then create virtual network adapters on that switch.  Aidan has a good write-up on the concept here:  http://www.aidanfinn.com/?p=12588

 

Here is a graphic that shows the concept from his blog:

[Diagram: converged network concept, from Aidan Finn's blog]

 

The supported network configuration guide for Hyper-V clusters is located here:

http://technet.microsoft.com/en-us/library/ff428137(v=WS.10).aspx

 

Typically, in the past you would see four NICs: one each for management, cluster, live migration, and virtual machines.  The common alternative is to use a single 10GbE NIC (or two in a highly available team), create virtual network adapters on a Hyper-V switch, and use QoS weighting to carve up the bandwidth.  In my case, I have a dedicated NIC for management (the parent partition/OS) and a dedicated NIC for Hyper-V virtual machines.  I want to connect the 10GbE NIC to a Hyper-V virtual switch and then create two virtual network adapters: one for Live Migration and one for Cluster/CSV communication.

 

We will be using the QoS guidelines posted at:  http://technet.microsoft.com/en-us/library/jj735302.aspx

John Savill has also done a nice quick walkthrough of a similar configuration:  http://savilltech.com/blog/2013/06/13/new-video-on-networking-for-windows-server-2012-hyper-v-clusters/

 

When I start, my current network configuration looks like this:

[Screenshot: network adapter configuration before the converged setup]

 

We will be attaching the 10GbE network adapter to a new Hyper-V switch, creating two virtual network adapters on it, and then applying QoS to each in order to ensure that both channels get sufficient bandwidth in the event of contention on the network.

 

Open PowerShell.

To get a list of the names of each NIC:

Get-NetAdapter

To create the new switch, with bandwidth weighting mode:

New-VMSwitch "ConvergedSwitch" -NetAdapterName "10GBE NIC" -MinimumBandwidthMode Weight -AllowManagementOS $false

To see our new virtual switch:

Get-VMSwitch

 

You will also see this in Hyper-V manager:

 

[Screenshot: the ConvergedSwitch virtual switch in Hyper-V Manager]

 

Next up, create a virtual NIC in the management operating system for Live Migration, and connect it to the new virtual switch:

Add-VMNetworkAdapter -ManagementOS -Name "LM" -SwitchName "ConvergedSwitch"

Create a virtual NIC in the management operating system for Cluster/CSV communications, and connect it to the new virtual switch:

Add-VMNetworkAdapter -ManagementOS -Name "Cluster" -SwitchName "ConvergedSwitch"

View the new virtual network adapters in PowerShell:

Get-VMNetworkAdapter -All

View them in the OS:

[Screenshot: the new virtual network adapters in Network Connections]

 

Assign a minimum bandwidth weight to provide QoS for both virtual NICs, with the heavier weighting on Live Migration in case of contention on the network:

Set-VMNetworkAdapter -ManagementOS -Name "LM" -MinimumBandwidthWeight 90
Set-VMNetworkAdapter -ManagementOS -Name "Cluster" -MinimumBandwidthWeight 10

Set the weights so that the total across all VM network adapters on the switch equals 100.  The configuration above will (roughly) allow ~90% for the LM network and ~10% for the Cluster network.

To view the bandwidth settings of each virtual NIC:

Get-VMNetworkAdapter -All | fl

 

At this point, I need to assign IP address information to each virtual NIC, and then repeat this configuration on all nodes in my cluster.
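For example (a sketch; the subnets are placeholders, and the vNICs show up in the OS as "vEthernet (LM)" and "vEthernet (Cluster)"):

# Assign static IP addresses to the new host vNICs (example addresses)
New-NetIPAddress -InterfaceAlias "vEthernet (LM)" -IPAddress 10.10.1.1 -PrefixLength 24
New-NetIPAddress -InterfaceAlias "vEthernet (Cluster)" -IPAddress 10.10.2.1 -PrefixLength 24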

 

After this step is completed and you confirm that the nodes can ping each other's new interfaces, you can configure the networks in Failover Cluster Manager.  Rename each network appropriately, and configure the Live Migration and cluster communication settings (a PowerShell sketch follows the screenshots below):

 

 

[Screenshot: the renamed cluster networks in Failover Cluster Manager]

 

[Screenshot: cluster communication settings for the Live Migration network]

In the picture above, I don't allow cluster communication on the Live Migration network, but this is optional; you certainly can allow it as a fallback if the primary cluster network fails.

 

 

[Screenshots: cluster communication settings for the remaining networks, and the Live Migration network selection]
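If you prefer PowerShell over the Failover Cluster Manager dialogs, here is a sketch of the same renaming and role assignment (the subnets match the example addresses used above; Role 0 = no cluster communication, 1 = cluster only, 3 = cluster and client):

# Rename the cluster networks based on their subnets
(Get-ClusterNetwork | Where-Object { $_.Address -eq "10.10.1.0" }).Name = "LM"
(Get-ClusterNetwork | Where-Object { $_.Address -eq "10.10.2.0" }).Name = "Cluster"

# LM network: no cluster communication; Cluster network: cluster communication only
(Get-ClusterNetwork -Name "LM").Role = 0
(Get-ClusterNetwork -Name "Cluster").Role = 1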

 

 

Test Live Migration and ensure performance and communications are working properly.
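One quick way to do that from PowerShell (a sketch; the VM and node names are examples):

# Live migrate a single clustered VM to the other node and time the operation
Measure-Command { Move-ClusterVirtualMachineRole -Name "DC01" -Node "Node2" -MigrationType Live }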

 

In Summary – here is all the PowerShell used:

Get-NetAdapter
New-VMSwitch "ConvergedSwitch" -NetAdapterName "10GBE NIC" -MinimumBandwidthMode Weight -AllowManagementOS $false
Get-VMSwitch
Add-VMNetworkAdapter -ManagementOS -Name "LM" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Cluster" -SwitchName "ConvergedSwitch"
Get-VMNetworkAdapter -All | fl
Set-VMNetworkAdapter -ManagementOS -Name "LM" -MinimumBandwidthWeight 90
Set-VMNetworkAdapter -ManagementOS -Name "Cluster" -MinimumBandwidthWeight 10

 

This configuration worked.  HOWEVER, it did expose a limitation.  I noticed that using vNICs I was only able to sustain about 3 Gb/s on the same live migrations, where I was achieving nearly 10 Gb/s before.  This is because RSS is not exposed to virtual NICs in the host/management partition, which own the live migration networks.  When using these virtual NICs to transfer a data stream from host to host, you will see a single CPU core pegged as it manages the traffic.
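You can see the difference by comparing RSS on the physical adapter versus the host vNIC (a sketch; the adapter names match my environment):

# RSS is available and enabled on the physical 10GbE NIC...
Get-NetAdapterRss -Name "10GBE NIC" | fl Name, Enabled, MaxProcessors

# ...while RSS is not usable on the host vNIC carrying the live migration traffic,
# so its receive processing lands on a single core
Get-NetAdapterRss -Name "vEthernet (LM)" -ErrorAction SilentlyContinue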

 

Here is the maximum traffic that could be sent using this configuration on my server:

 

 

[Screenshot: maximum throughput achieved over the converged vNIC configuration]

 

 

Below you will see the single core that was pegged during the live migration:

 

[Screenshot: a single CPU core pegged during the live migration]

 

 

If you are using a converged network design, this might still be acceptable, since some of the bandwidth will be needed for all the VMs on the host, some for management and client access traffic, and some for CSV and cluster communications.  However, if you want a design with high-speed live migrations, you should plan on physical NICs for Live Migration and for CSV (for the cases of redirected I/O).  These can use teaming for redundancy, but it is better to use SMB Multichannel in Server 2012 R2, where live migration can leverage advanced SMB features such as Multichannel and RDMA (SMB Direct).
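For what it is worth, on Windows Server 2012 R2 hosts you can switch the live migration transport to SMB so that capable NICs get Multichannel and RDMA (a sketch, assuming R2):

# Windows Server 2012 R2: use SMB as the live migration transport
Set-VMHost -VirtualMachineMigrationPerformanceOption SMB

# During a migration, verify that multiple SMB connections are in use
Get-SmbMultichannelConnection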

Comments
  • As always fantastic post, I think I now have test lab envy. Sudden urge to up the RAM in my IBM x3650 m2 and search ebay for a second box so I can play around with the live migration functionality.

  • Thanks.  They are cheaper than you might think.  You can find the T7500's with nice dual XEON's on ebay for under $1000 or right at it.  I think I paid $1000 for one and $850 for the other.  Then RAM for these was about $650 for the 96GB... a couple SSD's, and you have a sweet setup for under $2000 per box.  These were added over time... I try to buy one lab server per year or so.

  • Great content Kevin. I really need to switch to some Xeon lab servers, thanks for the heads up on the T7500's. Ebay search starts in 3,2,1.....

  • When looking for lab servers - one thing to keep in mind, is that memory expandability is often related to number of CPU's.  For instance, the T7500 has the additional memory slots on the second CPU riser, and maps memory to that CPU.  So in order to maximize the memory, you have to ensure you get a dual CPU config.  Adding an additional CPU later can cost more than the whole server.

  • Kevin:

    Now that MS has ditched the TechNet subscriptions, what options are there for testing out software?

    --Tracy

  • I think Marnix summed it up pretty well.  Sorry.... don't have better news:

    thoughtsonopsmgr.blogspot.com/.../a-farewell-to-old-friend.html

  • Thanks for the links, Kevin.  I look forward to the day when we get vRSS for management OS vNICs.  Then we might see the all-virtual converged network getting better Live Migration results.

  • I too have lab envy. I am looking for the best option for the storage for a lab.

    How many disks and raid volumes do you have in the third workstation?

    How are you connecting your hosts to the storage? Is gigabit enough for 18 VM?

  • @Merlus -

    My disks presented to the cluster in my lab are all on a single "storage server".  It is essentially composed of 3 SSD drives connected to the Intel motherboard SATA controller, and then a RAID0 array composed of 4 15k SAS disks.  Then I created an iSCSI disk on each of the 4 "drives" (3 SSD's, one spinning disk array).  Each cluster node mounts the iSCSI volumes, which are presented as ClusterSharedVolumes.  I put my critical VM's on the SSD's, and all the ancillary VM's on the larger RAID0 array.  If I could afford some 512GB SSD drives, I'd do away with all the spinning disks.  With Hyper-V storage migration, it makes adding/removing/changing/upgrading disks really easy.  If you didn't want to use iSCSI, you could easily create a single node Scale Out File Server and do the same thing without the complexity of iSCSI, which I am planning on transitioning to once I upgrade everything to WS 2012 R2.  Yes, gigabit is FINE for the storage connection.  With 35 VM's I never come close to saturating it.  The latency on the network is near zero.  You'd have to start all 18 VM's at the same time to even see the network become a bottleneck.  

  • Hi Kevin

    I have one query: if a particular service is running under a specific service account and that service fails, will SCOM be able to restart the service?

  • Or will it only restart a service that is running under the Local System or Local Service account?

  • @ Kevin have you re-created this lab setup with 2012 R2? With vRSS I'd like to hear how much it will saturate now.

  • @ Kevin can you give some more specific info on the 10gb ethernet card that you bought?

  • Dell Broadcom RK-375

  • Hi there. I'm just wondering how you got the network cards so cheap? Were they second hand?
