Cloud scale multitenant networking stack

Cloud scale architectures require highly scalable core infrastructure services. For instance, gateways that provide Network Address Translation or Site-to-Site VPN capabilities for the datacenter must be able to support a large number of tenants. This blog post provides a high-level overview of a key innovation in Windows Server 2012 R2, the multitenant networking stack.

Introduction

Windows Server 2012 introduced Hyper-V Network Virtualization. It extends the concept of server virtualization to physical networks. With network virtualization, virtual networks can be created and run on top of a shared physical network. Each virtual network has the illusion that it runs on a dedicated network. Combined with site-to-site VPN technologies, Hyper-V Network Virtualization enables an enterprise to move workloads to a virtual network hosted in a service provider’s datacenter and connect its on-premises network to the virtual network. This is wonderful.

To route traffic between an on-premises network and a virtual network, a gateway is needed. Traditionally, a service provider has to dedicate a virtual machine, configured as the gateway, to each tenant. This gateway can’t serve any other tenant because the TCP/IP stack in a traditional OS can’t isolate one tenant’s IP addresses and routes from another tenant’s. As more enterprises migrate their workloads to a service provider’s datacenter, having a gateway VM per tenant can get expensive from both capital expenditure (CAPEX) and operating expenditure (OPEX) perspectives. This needs improvement.

And what an improvement we made in Windows Server 2012 R2! We built a networking stack that supports 100 tenants, who can bring their own IPs and routing policies, and each tenant is isolated from the others.

Multitenant Networking Stack

To explain the new networking stack, we must start by defining a compartment. Conceptually, a compartment is a logical container within the TCP/IP stack. Windows can add one or more IP interfaces to each compartment. Each IP interface can have one or more IP addresses and routes. One compartment is completely isolated from another. In other words, the same IP addresses and the same or conflicting routes can be configured in two different compartments. So it should be no surprise that one compartment is assigned to one tenant. Therefore, a compartment is 1:1 mapped to a VM Network in SCVMM, 1:1 mapped to a Routing Domain in PowerShell, and 1:1 mapped to a virtual network, a term we’ve used frequently in all of our published content, including this blog.
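The isolation property can be illustrated with a small model. The sketch below is not how Windows implements compartments; it is a toy Python model, and the class name `Compartment`, its methods, and the second next hop `10.0.2.254` are all made up for illustration. It only shows that two containers with separate address and route tables can hold identical addresses and conflicting routes without interfering with each other.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class Compartment:
    """Toy model of one compartment: its own addresses and its own routes."""
    name: str
    addresses: set = field(default_factory=set)
    routes: dict = field(default_factory=dict)  # destination prefix -> next hop

    def add_address(self, address: str) -> None:
        self.addresses.add(ip_address(address))

    def add_route(self, prefix: str, next_hop: str) -> None:
        self.routes[ip_network(prefix)] = ip_address(next_hop)

    def lookup(self, destination: str):
        """Longest-prefix match using only this compartment's routes."""
        dest = ip_address(destination)
        matches = [prefix for prefix in self.routes if dest in prefix]
        if not matches:
            return None
        best = max(matches, key=lambda prefix: prefix.prefixlen)
        return self.routes[best]


contoso = Compartment("Contoso")
fabrikam = Compartment("Fabrikam")

# Both tenants bring the same gateway address and the same prefix,
# but with different (hypothetical) next hops. Nothing collides.
contoso.add_address("10.0.2.2")
fabrikam.add_address("10.0.2.2")
contoso.add_route("10.0.1.0/24", "10.0.2.1")
fabrikam.add_route("10.0.1.0/24", "10.0.2.254")

print(contoso.lookup("10.0.1.101"))   # resolved from Contoso's table only
print(fabrikam.lookup("10.0.1.101"))  # resolved from Fabrikam's table only
```

Because each lookup consults only its own compartment's table, the same destination resolves to a different next hop per tenant, which is the behavior a single shared routing table could not provide.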

You may notice a special compartment in the figure below, the Default Compartment. The default compartment exists perpetually in the TCP/IP stack. You can’t remove it. In contrast, you can add or remove a tenant compartment dynamically as you add or remove a tenant. All existing network services written for Windows Server 2012 or earlier can only access IP resources in the default compartment in Windows Server 2012 R2. But the new multitenant network services can interface with the tenant compartments and serve each tenant separately and independently. We implemented a number of such multitenant network services in Windows Server 2012 R2, e.g. NAT, Site-to-Site VPN, VPN access, and BGP.

Now that there are multiple compartments and each has its own IP interfaces, how is an incoming packet delivered to the compartment it is destined for? Remember, if the packet arrives from a virtual network, it must carry a virtual subnet ID (VSID) in the packet header. Because this VSID is unique and assigned to only one tenant, Windows sends the packet to the tenant’s compartment based on this VSID. If the packet arrives from a physical network, it doesn’t have a VSID, so it is simply sent to the default compartment. In the other direction, when a multitenant network service sends a packet, it knows which compartment the packet is to be sent from. If the packet is sent from a tenant compartment, the same VSID that is used to classify incoming packets is stamped on the outgoing packet. If the packet is sent from the default compartment, it is not tagged.
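The receive and send rules just described can be written down as a short sketch. This is a toy Python model rather than the Windows implementation: the class `MultitenantStack`, its method names, and the dictionary-based packet are hypothetical. The VSID 8111 and compartment ID 4 are borrowed from the NorthWind example later in this post, and the handling of an unknown VSID (a `KeyError` here) is an assumption, since the post does not describe that case.

```python
DEFAULT_COMPARTMENT = 1  # the default compartment always exists


class MultitenantStack:
    """Toy demultiplexer: maps a virtual subnet ID (VSID) to a compartment."""

    def __init__(self):
        self._vsid_to_compartment = {}
        self._compartment_to_vsid = {}

    def add_tenant(self, vsid: int, compartment_id: int) -> None:
        self._vsid_to_compartment[vsid] = compartment_id
        self._compartment_to_vsid[compartment_id] = vsid

    def receive(self, packet: dict) -> int:
        """Return the compartment that should get an incoming packet."""
        vsid = packet.get("vsid")
        if vsid is None:
            # No VSID: the packet came from the physical network.
            return DEFAULT_COMPARTMENT
        # Assumption: an unknown VSID simply raises KeyError in this sketch.
        return self._vsid_to_compartment[vsid]

    def send(self, compartment_id: int, payload: str) -> dict:
        """Build an outgoing packet, stamping the VSID for tenant traffic."""
        packet = {"payload": payload}
        if compartment_id != DEFAULT_COMPARTMENT:
            packet["vsid"] = self._compartment_to_vsid[compartment_id]
        return packet


stack = MultitenantStack()
stack.add_tenant(vsid=8111, compartment_id=4)

print(stack.receive({"vsid": 8111, "payload": "from tenant"}))  # tenant compartment
print(stack.receive({"payload": "from physical network"}))      # default compartment
print(stack.send(4, "reply to tenant"))                         # tagged with the VSID
print(stack.send(DEFAULT_COMPARTMENT, "to physical network"))   # untagged
```

The two dictionaries keep the mapping symmetric, so the VSID stamped on an outgoing packet is always the one that selects the same compartment on the way in.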

So now you have the foundation of the multitenant gateway.

[Figure: MultitenantNetworkStack]

Compartment Management

We strongly recommend that you use System Center Virtual Machine Manager (SCVMM) to deploy and manage our gateway, which runs in a virtual machine. SCVMM configures tenant compartments transparently for you so you don’t have to go through tedious configurations. But you may want to know what really happens under the cover. To help you understand, we’ll go over a simple exercise. We’ll introduce some new PowerShell cmdlets so that you can do the exercise yourself.

First, you may already wonder how Windows adds or removes a tenant compartment. For that, you need to configure the Hyper-V host. More specifically, you need to run the following cmdlets against the gateway’s “internal” NIC. A gateway must have multiple NICs. The “internal” NIC is the one that connects the gateway to the tenant’s virtual network. The gateway probably has other NICs to connect to the physical network, the management network, or the cluster network.

Set-VMNetworkAdapterIsolation -VMName "MultitenantGW" -VMNetworkAdapterName "Internal" `
    -IsolationMode NativeVirtualSubnet -MultiTenantStack On

Add-VMNetworkAdapterRoutingDomainMapping -VMName "MultitenantGW" -VMNetworkAdapterName "Internal" `
    -RoutingDomainID "{12345678-1000-2000-3000-123456780005}" -RoutingDomainName "NorthWind" `
    -IsolationID 8111 -IsolationName "NorthWindGw"

The first cmdlet simply tells the gateway (named “MultitenantGW”) that it’s going to support multiple tenants and that the way to differentiate one tenant from another is NativeVirtualSubnet, i.e. the VSID in the packet. This cmdlet only needs to be run once, but it must be run before you add any tenant, which is exactly what the second cmdlet does. The second cmdlet tells the gateway to create a tenant compartment for NorthWind and to send any packet tagged with VSID 8111 to this tenant compartment. Needless to say, if the Hyper-V switch receives a packet tagged with 8111 from the gateway, it’ll send the packet to NorthWind’s virtual network for routing.

You don’t need to shut down the gateway VM for the above configurations. It’d be unacceptable if you had to disrupt other tenants, which are connected through the same gateway, in order to onboard a new tenant, wouldn’t it?

Now let’s take a look from inside the gateway VM. From an elevated PowerShell prompt on the gateway, run the new cmdlet Get-NetCompartment. You’ll find that tenant NorthWind now has its own compartment, with compartment ID 4. (On this setup, the gateway also hosts two other tenants, Contoso and Fabrikam.)

PS C:\> Get-NetCompartment


CompartmentId          : 1
CompartmentDescription : Default Compartment
CompartmentGuid        : {b1062982-2b18-4b4f-b3d5-a78ddb9cdd49}

CompartmentId          : 2
CompartmentDescription : Contoso
CompartmentGuid        : {12345678-1000-2000-3000-123456780001}

CompartmentId          : 3
CompartmentDescription : Fabrikam
CompartmentGuid        : {12345678-1000-2000-3000-123456780002}

CompartmentId          : 4
CompartmentDescription : NorthWind
CompartmentGuid        : {12345678-1000-2000-3000-123456780005}
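Later steps address a tenant by its compartment ID, so when scripting it can help to look that ID up by name. The sketch below is one possible way to do it, assuming the list-style output above has been captured as plain text; the function name `parse_compartments` is made up for illustration, and the sample is a shortened copy of the output shown here.

```python
def parse_compartments(text: str) -> dict:
    """Map CompartmentDescription -> CompartmentId from list-style output."""
    mapping = {}
    current_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "name : value" pair
        key, value = key.strip(), value.strip()
        if key == "CompartmentId":
            current_id = int(value)
        elif key == "CompartmentDescription" and current_id is not None:
            mapping[value] = current_id
    return mapping


# Shortened copy of the Get-NetCompartment output from this post.
sample = """
CompartmentId          : 1
CompartmentDescription : Default Compartment
CompartmentGuid        : {b1062982-2b18-4b4f-b3d5-a78ddb9cdd49}

CompartmentId          : 4
CompartmentDescription : NorthWind
CompartmentGuid        : {12345678-1000-2000-3000-123456780005}
"""

compartments = parse_compartments(sample)
print(compartments["NorthWind"])  # 4
```

The parser relies on CompartmentId appearing before CompartmentDescription within each record, as it does in the output above.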

You can enumerate the IP interfaces in the new tenant compartment with Get-NetIPInterface. Note that if you use PowerShell to examine a tenant’s configuration, e.g. IP addresses or IP routes, you must specify the new flag, -IncludeAllCompartments. Remember, NorthWind is in Compartment 4, and NorthWindGW, the interface alias, got its name from the configuration on the Hyper-V host when you ran Add-VMNetworkAdapterRoutingDomainMapping.

PS C:\> Get-NetIPInterface -IncludeAllCompartments -CompartmentId 4

ifIndex InterfaceAlias                  AddressFamily NlMtu(Bytes) InterfaceMetric Dhcp     ConnectionState Polic
------- --------------                  ------------- ------------ --------------- ----     --------------- -----
53      NorthWindGW                     IPv6                  1500               5 Disabled Connected       Activ
52      Loopback Pseudo-Interface 4     IPv6            4294967295              50 Disabled Connected       Activ
53      NorthWindGW                     IPv4                  1500               5 Disabled Connected       Activ
52      Loopback Pseudo-Interface 4     IPv4            4294967295              50 Disabled Connected       Activ

Now that you know the interface alias and the interface index of the IP interface in the compartment you just created for NorthWind, you can add IP addresses and routes for the tenant. (You can refer to New-NetIPAddress and New-NetRoute for details.)

PS C:\> New-NetIPAddress -InterfaceAlias NorthWindGW -IPAddress 10.0.2.2 -PrefixLength 24


IPAddress         : 10.0.2.2
InterfaceIndex    : 53
InterfaceAlias    : NorthWindGW
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Tentative
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 10.0.2.2
InterfaceIndex    : 53
InterfaceAlias    : NorthWindGW
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Invalid
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : PersistentStore



PS C:\> New-NetRoute -InterfaceAlias NorthWindGW -DestinationPrefix 10.0.1.0/24 -NextHop 10.0.2.1

ifIndex DestinationPrefix                              NextHop                                  RouteMetric Polic
------- -----------------                              -------                                  ----------- -----
53      10.0.1.0/24                                    10.0.2.1                                         256 Activ
53      10.0.1.0/24                                    10.0.2.1                                         256 Persi

That’s it! You just added a tenant on the gateway and configured the gateway IP, 10.0.2.2, for the tenant’s virtual network, which presumably has a virtual subnet 10.0.1.0/24.

To remove a tenant compartment, simply run the following cmdlet on the Hyper-V host.

Remove-VMNetworkAdapterRoutingDomainMapping -VMName "MultitenantGW" -VMNetworkAdapterName "Internal" `
    -RoutingDomainID "{12345678-1000-2000-3000-123456780005}"

Once again, the above exercise is intended to show you what happens in Windows when you onboard a tenant on a multitenant gateway. If you use SCVMM, it does all this work for you.

Ping and Diagnostics

While SCVMM makes multitenant gateway deployment and management easy for you, from time to time you may need to roll up your sleeves and do some troubleshooting yourself. Diagnostics is a big topic. We can’t cover all the cases and details in this blog, but you can read New Networking Diagnostics with PowerShell in Windows Server 2012 R2, which includes a few new cmdlets for virtualized environments. Here, we’ll just show how you can check connectivity and collect TCP/IP traces for a tenant on the multitenant gateway.

Everyone, or at least every admin, loves ping. It’s simple and elegant. So when it comes to testing connectivity on a gateway, you will probably start with ping. However, to test connectivity for a tenant on a multitenant gateway, you must know which compartment the tenant is in. The example below shows how you can ping a destination on NorthWind’s virtual network. NorthWind is in Compartment 4, so you must specify /c 4 in the command.

PS C:\> ping /c 4 10.0.1.101

Pinging 10.0.1.101 with 32 bytes of data:
Reply from 10.0.1.101: bytes=32 time=2ms TTL=127
Reply from 10.0.1.101: bytes=32 time<1ms TTL=127
Reply from 10.0.1.101: bytes=32 time=1ms TTL=127
Reply from 10.0.1.101: bytes=32 time<1ms TTL=127

Ping statistics for 10.0.1.101:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 2ms, Average = 0ms

Another diagnostics tool that advanced admins use is ETW tracing. It is inherently harder than before to trace events and capture packets through the multitenant networking stack because multiple tenants are hosted in the stack, and packets and events for different tenants may overlap and conflict. To address this problem, we implemented a new filter, RoutingDomain, in Windows Server 2012 R2 that allows you to trace events and packets per tenant.

netsh trace start provider=Microsoft-Windows-TCPIP providerFilter=Yes RoutingDomain="{12345678-1000-2000-3000-123456780005}"

Try it out and let us know what you think.

Summary

The Hyper-V Network Virtualization gateway is a key component in Microsoft’s Software-defined Networking solution. It enables connectivity between a virtual network and a physical network. The new multitenant networking stack in the R2 release adds scale to the gateway with the least overhead. Instead of spinning up one gateway per tenant, a service provider can consolidate multiple tenant gateways on a single virtual machine to reduce CAPEX and OPEX.

If you have any comments we would love to hear them!

Charley Wen, Program Manager, Windows Core Networking

Leave a Comment
  • Hi - Thanks for the catchy article! Quick one: Okay. Gateway is configured and can route traffic for both the customers - how does Customer VMs talk to this gateway? Do they need to use the IP Address of the gateway inside the VM or PA network? Thanks! Margarita