Microsoft Enterprise Platforms Support: Windows Server Core Team
The Windows Server 2008 Failover Clustering feature provides high availability for services and applications. To keep those applications and services highly available, the cluster service running on each node must function at the highest level possible. Redundant and reliable communications connectivity among all the nodes in a cluster plays a large role in keeping the cluster functioning smoothly. Properly configured connectivity within a failover cluster not only gives clients access to highly available services but also provides the connectivity the cluster requires for its own internal communications.
The following sections discuss Windows Server 2008 Failover Clustering networking features and functionality, and provide the information needed to understand failover cluster networking and to properly implement it.
Windows Server 2008 Failover Cluster networking features
Windows Server 2008 Failover Clustering introduces new networking capabilities that are a major shift away from the way things have been done in legacy clusters (Windows 2000\2003 and NT 4.0). Some of these take advantage of the new networking features that are included as part of the operating system and others are a result of feedback that has been received from customers. The new features include:
New cluster network driver architecture
The legacy cluster network driver (clusnet.sys) has been replaced with a new NDIS level driver called the Microsoft Failover Cluster Virtual Adapter (netft.sys). Whereas the legacy cluster network driver was listed as a Non-Plug and Play Driver, the new fault tolerant adapter actually appears as a network adapter when hidden devices are displayed in the Device Manager snap-in (Figure 1).
Figure 1: Device Manager Snap-in
The driver information is shown in Figure 2.
Figure 2: Microsoft Failover Cluster Virtual Adapter driver
The cluster adapter is also listed in the output of an ipconfig /all command on each node (Figure 3).
Figure 3: Microsoft Failover Cluster Virtual Adapter configuration information
The Failover Cluster Virtual Adapter is assigned a Media Access Control (MAC) address that is based on the MAC address of the first enumerated (by NDIS) physical NIC in the cluster node (Figure 4) and uses an APIPA (Automatic Private Internet Protocol Addressing) address.
Figure 4: Microsoft Failover Cluster Virtual Adapter MAC address
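To make the APIPA point concrete, here is a minimal Python sketch that checks whether an address falls inside the APIPA range (169.254.0.0/16). The function name and the sample addresses are illustrative assumptions, not values taken from the figures above.

```python
import ipaddress

# APIPA (Automatic Private IP Addressing) uses the IPv4 link-local block.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True if the address is an IPv4 address in the APIPA range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip in APIPA_RANGE


# Hypothetical addresses for illustration only.
print(is_apipa("169.254.1.31"))   # an address of the kind the virtual adapter uses
print(is_apipa("192.168.0.31"))   # an ordinary private address on a physical NIC
```

A check like this can help when reading ipconfig /all output: an APIPA address on the virtual adapter is expected, whereas an APIPA address on a physical NIC usually points to a DHCP problem.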
The goal of the new driver model is to sustain TCP/IP connectivity between two or more systems despite the failure of any component in the network path. This goal can be achieved provided at least one alternate physical path is available. In other words, a network component failure (NIC, router, switch, hub, etc…) should not cause inter-node cluster communications to break down, and communication should continue making progress in a timely manner (i.e. it may have a slower response but it will still exist) as long as an alternate physical route (link) is still available. If cluster communications cannot proceed on one network, the switchover to another cluster-enabled network is automatic. This is one of the primary reasons that each cluster node must have multiple network adapters available to support cluster communications and each one should be connected to different switches.
The failover cluster virtual adapter is implemented as an NDIS miniport adapter that pairs an internally constructed virtual route with each network found in a cluster node. The physical network adapters are exposed at the IP layer on each node. The NETFT driver transfers packets (cluster communications) on the virtual adapter by tunneling through the best available route in its internal routing table (Figure 5).
Figure 5: NetFT traffic flow diagram
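The switchover behavior described above can be pictured with a small Python sketch. This is only a conceptual model: the Route class, the pick_route function and the rule "first route in the list that is still up" are assumptions made for illustration, and they do not describe how NETFT actually ranks routes internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    """A hypothetical entry in a routing table: one path between two nodes."""
    local: str
    remote: str
    up: bool = True


def pick_route(routes: List[Route]) -> Optional[Route]:
    """Return the first route that is still up, or None if every route has failed.

    Assumption: list order stands in for 'best available'.
    """
    for route in routes:
        if route.up:
            return route
    return None


routes = [
    Route("192.168.0.31", "192.168.0.32"),
    Route("172.16.0.31", "172.16.0.32"),
]

print(pick_route(routes))   # the first network carries cluster traffic
routes[0].up = False        # simulate a NIC, switch or cable failure
print(pick_route(routes))   # traffic moves to the remaining network
```

The point of the model is that cluster communication only stops when no route at all is left, which is why each node needs more than one cluster-enabled network.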
Here is an example to illustrate this concept. A 2-Node cluster is connected to three networks that each node has in common (Public, Cluster and iSCSI). The output of an ipconfig /all command from one of the nodes is shown in Figure 6.
Figure 6: Example Cluster Node IP configuration
Note: Do not be concerned with the name ‘Microsoft Virtual Machine Bus Network Adapter’ as these examples were derived from cluster nodes running as Guests in Hyper-V.
The Microsoft Failover Cluster Virtual Adapter configuration information for each node is shown in Figure 7. Keep in mind that the default port for cluster communication is still TCP\UDP 3343.
Figure 7: Node Failover Cluster Virtual Adapter configuration information
When the cluster service starts, and a node either Forms or Joins a cluster, NETFT, along with other components, is responsible for determining the node’s network configuration and connectivity with other nodes in the cluster. One of the first actions is establishing connectivity with the Microsoft Failover Cluster Virtual Adapter on all nodes in the cluster. Figure 8 shows an example of this in the cluster log.
Figure 8: Microsoft Failover Cluster Virtual Adapter information exchange
Note: You can see in Figure 8 that the endpoint pairs consist of both IPv4 and IPv6 addresses. The NETFT adapter prefers to use IPv6 and therefore will choose the IPv6 addresses for each end point to use.
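The IPv6 preference in the note can be expressed as a short Python sketch. The preferred_endpoint function and the sample addresses are hypothetical; the sketch only mirrors the stated rule (use IPv6 when it is available) and is not taken from the cluster service itself.

```python
import ipaddress


def preferred_endpoint(addresses):
    """Pick an IPv6 address if one is present, otherwise fall back to IPv4."""
    if not addresses:
        raise ValueError("at least one address is required")
    parsed = [ipaddress.ip_address(a) for a in addresses]
    ipv6 = [a for a in parsed if a.version == 6]
    chosen = ipv6[0] if ipv6 else parsed[0]
    return str(chosen)


# Hypothetical endpoint pair for one node: one IPv4 and one IPv6 address.
print(preferred_endpoint(["169.254.1.31", "fe80::1"]))
# With only IPv4 available, the IPv4 address is used.
print(preferred_endpoint(["169.254.1.31"]))
```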
As the cluster service startup continues, and the node either Forms or Joins a cluster, routing information is added to NETFT. Using the three networks mentioned previously, Figure 9 shows each route being added to a cluster.
Route between 220.127.116.11 and 18.104.22.168
Route between 192.168.0.31 and 192.168.0.32
Route between 172.16.0.31 and 172.16.0.32
Figure 9: Routes discovered and added to NETFT
Each ‘real’ route is added to the ‘virtual’ routes associated with the virtual adapter (NETFT). Again, note the preference for NETFT to use IPv6 as the protocol of choice.
The capability to place cluster nodes on different, routed networks in support of Multi-Site Clusters
Beginning with Windows Server 2008 failover clustering, individual cluster nodes can be located on separate, routed networks. This requires that resources that depend on IP Address resources (e.g. Network Name resources) implement OR logic, since it is unlikely that every cluster node will have a direct local connection to every network the cluster is aware of. With an OR dependency, a Network Name resource can come online as long as at least one of the IP Address resources it depends on is online, which allows services\applications to fail over to remote nodes. Here is an example (Figure 10) of the dependencies for the cluster name on a machine connected to two different networks.
Figure 10: Cluster Network Name resource with an OR dependency
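The difference between an OR dependency and the older AND behavior can be sketched in a few lines of Python. The resource names and the two helper functions are made up for illustration; they model the dependency rule only, not the cluster resource API.

```python
def online_with_or(ip_resources):
    """OR dependency: one online IP Address resource is enough."""
    return any(ip_resources.values())


def online_with_and(ip_resources):
    """AND dependency: every IP Address resource must be online."""
    return all(ip_resources.values())


# Hypothetical state on a node in site B: it has no connection to site A's
# network, so that IP Address resource cannot come online there.
ip_resources = {
    "IP Address (site A network)": False,
    "IP Address (site B network)": True,
}

print(online_with_or(ip_resources))    # Network Name can come online
print(online_with_and(ip_resources))   # Network Name would stay offline
```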
By default, only the IP addresses associated with a Network Name resource that actually come online are dynamically registered in DNS (if configured for dynamic updates). If the preferred behavior is to register all IP addresses that a Network Name depends on, then a private property of the Network Name resource must be modified. This private property is called RegisterAllProvidersIP (Figure 11). If this property is set equal to 1, all IP addresses will be registered in DNS and the DNS server will return the list of IP addresses associated with the A-Record to the client.
Figure 11: Parameters for a Network Name resource
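As a rough illustration of what RegisterAllProvidersIP changes, the following Python sketch computes which addresses would end up in DNS under each setting. The function, its arguments and the sample addresses are assumptions for illustration; only the property name and its two behaviors come from the text above.

```python
def addresses_to_register(ip_resources, register_all_providers_ip=0):
    """Return the addresses that would be registered in DNS.

    ip_resources maps each IP address the Network Name depends on to a
    boolean saying whether that IP Address resource is currently online.
    """
    if register_all_providers_ip == 1:
        # Every address the Network Name depends on is registered.
        return sorted(ip_resources)
    # Default: only the addresses that actually came online are registered.
    return sorted(addr for addr, online in ip_resources.items() if online)


# Hypothetical multi-site Network Name with one address per site.
ip_resources = {"192.168.0.40": True, "172.16.0.40": False}

print(addresses_to_register(ip_resources))      # default behavior
print(addresses_to_register(ip_resources, 1))   # RegisterAllProvidersIP = 1
```

When all addresses are registered, clients receive several A-Records and need to be able to try more than one of them, so this setting is worth testing against the client applications in use.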
Since cluster nodes can be located on different, routed networks, and the communication mechanisms have been changed to use reliable session protocols implemented over UDP (unicast), the networking requirements for Geographically Dispersed (Multi-Site) Clusters have changed. In previous versions of Microsoft clustering, all cluster nodes had to be located on the same network. This required that ‘stretched’ VLANs be implemented when configuring multi-site clusters. Beginning with Windows Server 2008, this is no longer required in all scenarios.
Support for DHCP assigned IP addresses
Beginning with Windows Server 2008 Failover Clustering, cluster IP address resources can obtain their addressing from DHCP servers as well as via static entries. If the cluster nodes themselves have at least one NIC that is configured to obtain an IP address from a DHCP server, then the default behavior will be to obtain an IP address automatically for all cluster IP address resources. The new ‘wizard-based’ processes in Failover Clustering understand the network configuration and will only ask for static addressing information when required. If the cluster node has statically assigned IP addresses, the cluster IP address resources will have to be configured with static IP addresses as well. Cluster IP address resource IP assignment follows the configuration of the physical node and each specific interface on the node. Even if the nodes are configured to obtain their IP addresses from a DHCP server, individual IP address resources can be changed to static addresses (Figure 12).
Figure 12: Changing DHCP assigned to Static IP address
Improvements to the cluster ‘heartbeat’ mechanism
The cluster ‘heartbeat’, or health checking mechanism, has changed in Windows Server 2008. While still using port 3343, it is no longer a broadcast communication. It is now unicast in nature and uses a Request-Reply type process. This provides for higher security and more reliable packet accountability. Using the Microsoft Network Monitor protocol analyzer to capture communications between nodes in a cluster, the ‘heartbeat’ mechanism can be seen (Figure 13).
Figure 13: Network Monitor capture
A typical frame is shown in Figure 14.
Figure 14: Heartbeat frame from a Network Monitor capture
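For readers who have not worked with unicast request-reply traffic over UDP, here is a minimal Python sketch of the pattern, run entirely on the loopback interface. The payload format ("request:1" and "reply:1") is invented for illustration and is not the real heartbeat frame layout; the sketch also binds to ephemeral ports rather than 3343 so that it can run on any machine.

```python
import socket


def heartbeat_round_trip(timeout=1.0):
    """Send one unicast UDP request on loopback and return the reply."""
    responder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    requester = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for a free port; the cluster itself uses 3343.
        responder.bind(("127.0.0.1", 0))
        requester.bind(("127.0.0.1", 0))
        responder.settimeout(timeout)
        requester.settimeout(timeout)

        # Request: addressed to exactly one peer (unicast, not broadcast).
        requester.sendto(b"request:1", responder.getsockname())

        # The peer answers the sender directly, echoing the sequence number.
        data, sender = responder.recvfrom(64)
        sequence = data.split(b":")[1]
        responder.sendto(b"reply:" + sequence, sender)

        reply, _ = requester.recvfrom(64)
        return reply
    finally:
        responder.close()
        requester.close()


print(heartbeat_round_trip())
```

Because every request is matched by a reply carrying the same sequence number, the sender can account for each packet individually, which is the accountability benefit mentioned above.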
There are properties of the cluster that address the heartbeat mechanism; these include SameSubnetDelay, CrossSubnetDelay, SameSubnetThreshold, and CrossSubnetThreshold (Figure 16).
Figure 16: Properties affecting the cluster heartbeat mechanism
The default configuration (shown here) means the cluster service will wait 5.0 seconds (one heartbeat sent every second for five seconds) before considering a cluster node to be unreachable and regrouping to update the view of the cluster. The limits on these settings are shown in Figure 17. Make changes to the appropriate settings depending on the scenario. The CrossSubnetDelay and CrossSubnetThreshold settings are typically used in multi-site scenarios where WAN links may exhibit higher than normal latency.
Figure 17: Heartbeat Configuration Settings
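The relationship between the delay and threshold settings is simple arithmetic, sketched below in Python. The sketch assumes the delay properties are expressed in milliseconds, which is consistent with the default of one heartbeat per second; the function name and the non-default values are hypothetical, and any real change should stay within the limits shown in Figure 17.

```python
def seconds_until_unreachable(delay_ms, threshold):
    """Time without a heartbeat reply before a node is considered unreachable.

    delay_ms:  interval between heartbeats (assumed to be in milliseconds)
    threshold: number of consecutive missed heartbeats that are tolerated
    """
    return delay_ms * threshold / 1000.0


# Default described above: one heartbeat per second, five misses tolerated.
print(seconds_until_unreachable(1000, 5))   # 5.0

# Hypothetical multi-site tuning: CrossSubnetDelay doubled for a slow WAN link.
print(seconds_until_unreachable(2000, 5))   # 10.0
```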
These settings allow for the heartbeat mechanism to be more ‘tolerant’ of networking delays. Modifying these settings, while a worthwhile test as part of a troubleshooting procedure (discussed later), should not be used as a substitute for identifying and correcting network connection delays.
Support for IPv6
Since the Windows Server 2008 OS supports IPv6, the cluster service needs to support this functionality as well. This includes being able to support IPv6 IP Address resources and IPv4 IP Address resources either alone or in combination in a cluster. Clustering also supports IPv6 Tunnel Addresses. As previously noted, inter-node cluster communications by default use IPv6. For more information on IPv6, please review the following:
Microsoft Internet Protocol Version 6
In the next segment, I will discuss Implementing networks in support of Failover Clusters (Part 2). See ya then.
Chuck Timon Senior Support Escalation Engineer Microsoft Enterprise Platforms Support
config question concerning live migration(LM),csv and heartbeat(HB) networks if you don't mind
I have a hp c7000 with flex-10's and I'm trying to come up with the most reliable and easiest configuration (to ease deployment of 30-40 2-4 node clusters). My configuration is to run the flex-10's in active/active with two 10g nics teamed for combined bandwidth of 20g they'll also be FT. Storage is all FC. I want to run LM,CSV and HB on this one 20g path, does that sound ok? Cluster validation throws warnings but passes so technically I'm good but I just want an opinion from MS
I could not jump to the link you provided above for IPv6, but I was able to hit modified address ...
Also I was curious if ipv6 should be disabled if we are not yet ready for ipv6 from our networking standpoint.
ichoudhury: I fixed the link, thanks for the heads up. No, you do not need to disable IPV6. Cluster uses IPV6 under the covers to determine best route(s) between cluster nodes.
tonyr08: Config sounds fine. What's the validation warnings you are seeing?
I notice that you say that the cluster uses IPv6 under the covers for communication. I have been attempting to create a three node R2 cluster. I have assigned static IPv4 addresses as well as static FD00::/64 addresses to all NICs(8 total per machine, 4 dedicated to iSCSI, 2 dedicated to cluster, 1 dedicated to Hyper-V, and 1 shared between Hyper-V and Cluster).. Everything passes validation with the exception of a warning that my iSCSI SAN NICs have addresses on the same subnet(acceptable for MPIO).
I have opened up the firewall rules that the clustering service installation creates to all static IPv4 and IPv6 addresses on the nodes.
Still, the "forming cluster" step stalls and fails with a timeout error. Firewall logging reveals that attempts are being made to connect between nodes using the FE80:: IPv6 address of the cluster virtual adapter on the nodes. From node1's FE80:: to node2's FE80::, etc...
I have even gone as far as using netsh to assign a fixed FD00:: address to the virtual adapter and opening the firewalls to that but it still uses the FE80:: address.
If it is indeed tunneling, why do I see these connection attempts? Can you tell me if the FE80:: address assigned to the virtual adapter is constant or does it randomize like is required in RFC 3041? If it is constant, I may be able to work with that.
Opening up the firewalls to all FE80::/64 addresses is a least desirable action since it is likely that other less controlled machines will be located on the same FE80:: link.
When two different IP V4 address such as 192.168.1.11 / 255.255.255.0 and 10.10.1.11 / 255.255.255.0 are associated to a network adapter, only 192.168.1.0 is proposed when creating the cluster or a client endpoint, whatever the order is in the IP V4 properties, whatever the gateway is (192.168.1.1 or 10.10.1.1).
Any workaround to have 10.10.1.0 proposed ?
My Failover Cluster Virtual Adapter has the MAC address of my iSCSI card (NDIS 0) instead of my cluster adapter. Is there anything I can do?
"When two different IP V4 address such as 192.168.1.11 / 255.255.255.0 and 10.10.1.11 / 255.255.255.0 are associated to a network adapter, only 192.168.1.0 is proposed when creating the cluster or a client endpoint, whatever the order is in the IP V4 properties, whatever the gateway is (192.168.1.1 or 10.10.1.1).
Any workaround to have 10.10.1.0 proposed ? "
I have found in a lab that using the 10.10.1.0/24 subnet causes a problem on the network adapter: in Server Manager that network will not show its IP address, but will instead state (Multiple IP Addresses) even when there is only one address, 10.10.1.254, set on the adapter. As soon as I change the address out of the 10.10.1.0/24 subnet the problem is solved. Also, with the 10.10.1.254 IP address on that adapter, the interface in RRAS shows an APIPA address 169.x.x.x rather than the 10.10.1.254 associated with the interface. Again, I have found that if you change to 10.11.1.0/24 or another IP subnet it works fine.
HI Cluster Guru,
I have a question regarding the cluster's behavior:
If the Public Network is down for, let's say, 8 hours due to maintenance of the connected switch or the firewall, what is the cluster's reaction to this situation? Do all resources turn to a failed state or remain intact? Or something else?
Hi can you advise me if dedicated hardware is required for heartbeat in a multisite cluster deployment for SQL2008R2? Or is this routed over the private LAN as it is unicast not broadcast?
What is the specification of tunnel between the cluster nodes' Microsoft Failover Cluster Virtual Adapters as we don't see any TCP connection between the nodes on the firewall in the Multisubnet MNS with Fileshare witness cluster.