GD Bloggers

This is the blog site for Microsoft Global Delivery Communities, focused on sharing technical knowledge about devices, apps and cloud.

Microsoft Windows Multi-Site Failover Cluster Best Practices


Windows Server 2012 Multi-Site Failover Cluster is one of the high availability and disaster recovery solutions available for Windows Server. Although installing a Multi-Site Failover Cluster is straightforward and similar to installing a single-site Failover Cluster, there is no consolidated public documentation that describes Multi-Site Failover Cluster in detail with best practices or implementation recommendations. In this post I cover some Microsoft Windows Multi-Site Failover Cluster basics, along with design and implementation best practices drawn from my practical experience.

Multi-Site Failover Cluster Basics:

A Microsoft Multi-Site Failover Cluster is basically a group of cluster nodes distributed across multiple sites. The cluster nodes in each site are connected to local SAN storage in the same site, while replication between the SAN storage in each site is handled by a SAN-to-SAN replication technology, which can be hardware-based or software-based.

Multi-Site Failover Cluster can be used for SQL Multi-Site Cluster or Hyper-V Multi-Site Cluster.


Multi-Site Failover Cluster Challenges:

There are some challenges with Multi-Site Failover Cluster as follows:

+ A Multi-Site Failover Cluster depends fully on SAN-to-SAN replication, which in most cases is owned by the storage team. If SAN-to-SAN replication is not configured properly, the Multi-Site Failover Cluster will not function, especially when failing over to the disaster recovery site.

+ A Multi-Site Failover Cluster depends on multiple hardware components, such as server hardware, Host Bus Adapters (HBA), and Multi-Path IO (MPIO). If driver versions or firmware levels do not follow the hardware vendor's recommendations for Multi-Site Failover Cluster, the cluster may show unexpected behavior.

When to Choose Multi-Site Failover Cluster?

Despite these challenges, some scenarios still require a Multi-Site Failover Cluster. They can be consolidated into the two main scenarios below:

+ Automatic failover is required (you need to think about how to automate storage failover)

o To reduce downtime as much as possible.

o To provide faster disaster recovery.

+ Protection against the loss of an entire location is required

o When the application does not have a native replication technology like those Exchange provides (for example, Cluster Continuous Replication in Exchange 2007 or Database Availability Groups in Exchange 2010 and later).

o When the application does not support SQL Server AlwaysOn for its backend database, as with SharePoint 2010 or System Center Configuration Manager 2012.

Multi-Site Failover Cluster Best Practices:

Best Practices for Multi-Site Failover Cluster can be consolidated into five different areas as follows:

Design & Implementation Best Practices:

+ Be sure that the customer already has a SAN-to-SAN replication technology in place, because a new investment in this area can be very expensive.

+ Involve the storage team while designing and implementing the Multi-Site Cluster solution, to mitigate any risk related to whether the existing hardware and storage are supported with Windows Failover Cluster and whether they are ready for the Multi-Site Failover Cluster storage requirements.

+ Share all storage requirements for the implementation with the customer's storage team early, as it always takes the storage team time to prepare the storage for a Multi-Site Failover Cluster.

+ As in a normal Failover Cluster implementation, run the cluster validation and be sure that no errors are reported before continuing with the cluster installation.

Hardware & Storage Best Practices:

+ Be sure that the existing hardware (SAN, servers, HBAs, network, etc.) supports Windows Server 2012 clustering, not only from the Microsoft side but from the hardware and storage vendor side as well.

+ Follow the hardware vendor's recommendations regarding the driver versions and firmware levels required for Windows Failover Cluster.

+ The SAN storage vendor (or the customer's storage team) should own and be fully responsible for the SAN-to-SAN replication, which is a core component of a Multi-Site Failover Cluster.

+ It is very important to verify SAN-to-SAN replication and to run a failover simulation while testing the implementation.

Network Best Practices:

+ Discuss the network architecture with the customer and find out whether stretched VLANs across sites can be provided; stretched VLANs reduce the complexity of a Multi-Site Failover Cluster compared with using different VLANs in each site.

+ Share all networking requirements for the implementation with the customer's network team early, especially if you are going to change the network design for the required VLANs.

+ Consider encryption over the WAN.

Quorum Best Practices:

+ Use the Node and File Share Majority quorum mode, with a File Share Witness (FSW), especially for an even number of cluster nodes.

+ Host the FSW in a third site that has a direct connection to both cluster sites.

+ Avoid hosting the FSW on a cluster node or in a virtual machine in the same cluster.
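
The reason a witness matters for an even number of nodes can be illustrated with a little vote arithmetic. The Python sketch below is a simplified model, not the cluster service's actual algorithm: it assumes one vote per node plus one vote for the witness and a strict-majority rule, and it ignores dynamic quorum adjustments. The function names and the witness_side parameter are hypothetical and chosen for illustration only.

```python
from typing import Dict, Optional


def has_quorum(votes_reachable: int, total_votes: int) -> bool:
    # Strict majority: more than half of all configured votes.
    return votes_reachable > total_votes // 2


def split_outcome(site_a_nodes: int, site_b_nodes: int,
                  witness_side: Optional[str] = None) -> Dict[str, bool]:
    """Report which site keeps quorum when the inter-site link fails.

    witness_side is "A" or "B" for the site that still holds the
    third-site witness, or None when no witness is configured.
    """
    has_witness = witness_side is not None
    total = site_a_nodes + site_b_nodes + (1 if has_witness else 0)
    votes_a = site_a_nodes + (1 if witness_side == "A" else 0)
    votes_b = site_b_nodes + (1 if witness_side == "B" else 0)
    return {"A": has_quorum(votes_a, total), "B": has_quorum(votes_b, total)}
```

In this model, split_outcome(2, 2) leaves neither site with a majority (2 of 4 votes each), so the whole cluster would stop. With split_outcome(2, 2, "A"), site A holds 3 of 5 votes and stays online, which is why the witness belongs in a third site that both cluster sites can reach.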

Hyper-V VM Configuration Best Practices:

+ In a Hyper-V Multi-Site Failover Cluster, configure the failover order for each virtual machine so that it fails over to Hyper-V hosts in the same site first, and only then to cluster nodes in the secondary site.

+ Be sure that all Multi-Site Failover Cluster nodes are configured as possible owners for each highly available virtual machine.
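
The "same site first" ordering can be sketched as a small sort. The Python example below is only an illustration of the ordering logic: the node names, site labels, and the function preferred_owner_order are hypothetical, and the resulting list would still have to be applied to each virtual machine as its preferred owners, for example in Failover Cluster Manager.

```python
from typing import Dict, List


def preferred_owner_order(node_sites: Dict[str, str], home_site: str) -> List[str]:
    """Return every node, with nodes in home_site first, then the rest.

    All nodes stay in the list, so each remains a possible owner;
    only the order in which they are tried changes.
    """
    # The key sorts same-site nodes (False) before other-site nodes (True),
    # then alphabetically by node name for a stable, predictable order.
    return sorted(node_sites, key=lambda node: (node_sites[node] != home_site, node))


# Hypothetical four-node cluster stretched across two sites.
nodes = {"HV-A1": "SiteA", "HV-B1": "SiteB", "HV-A2": "SiteA", "HV-B2": "SiteB"}
order_for_site_a_vm = preferred_owner_order(nodes, "SiteA")
```

For a virtual machine whose home is SiteA, the sketch yields HV-A1, HV-A2, HV-B1, HV-B2: local hosts are tried first and the secondary site is used only afterwards.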

References:

The references below cover most of the valuable Microsoft documentation and videos related to Multi-Site Failover Cluster.

+ Designing for a Clustered Service or Application in a Multi-Site Failover Cluster: http://technet.microsoft.com/en-us/library/dd197430.aspx

+ Setting up a Clustered Service or Application in a Multi-Site Failover Cluster – Checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx

+ Requirements and Recommendations for a Multi-Site Failover Cluster: http://technet.microsoft.com/en-us/library/dd197575(v=ws.10).aspx

+ Hyper-V Multi-Site Failover Cluster Video: http://technet.microsoft.com/en-us/video/tdbe11-failover-clustering-amp-hyper-v-multi-site-disaster-recovery.aspx

Conclusion:

In conclusion, a Windows Server Multi-Site Failover Cluster can provide powerful high availability and disaster recovery in a single solution. To design and implement a functional Multi-Site Failover Cluster, it is very important to consider the challenges, fulfill all Windows Server Multi-Site Cluster requirements, and follow the best practices and recommendations above.

If you have best practices or recommendations from your own experience that could be added to the list above, please share them so I can evaluate them and add them to the post (your name will appear beside them).

Comments
  • Meged, could you suggest SAN vendors/storage which are supporting Hyper-V 2012 Multisite Clustering with CSV onsite?

  • Hi Alex,

    You can find the list of hardware supported for Windows Server 2012 (and likewise for 2012 R2) in the catalog below:

    www.windowsservercatalog.com At the same time, you should check with the storage vendor about the configurations it supports for Windows Server 2012 clustering, especially multi-site clusters.

    If there is no investment in SAN storage and Fibre Channel, then you should think about Cluster Continuous Availability, which can reduce the cost and complexity of the cluster a lot. More information can be found here: blogs.technet.com/.../windows-server-2012-continuous-availability-file-server-feature.aspx

  • good

  • Hi Maged Ezzat, thanks for your great document. Sorry, I am a newbie in clustering. My situation is that I have two servers with two SAN storage systems in one site. Is SAN-to-SAN replication used the same way for my solution? I just wonder how the two SANs are presented to the servers. Thank you!

  • Hi Fung,
    It does not matter whether the two SAN storage systems are in the same site or in different sites. The question, however, is why you need to distribute your cluster contents across multiple SAN storage systems in the same datacenter, considering the complexity this adds, when the SAN itself should be redundant and highly available.
    Regarding how LUNs from different SAN storage systems are presented to the servers: if the hardware vendor can provide layer 2 networking and a virtual SAN, the LUN on each SAN can be virtualized and presented as a single LUN to Failover Clustering. This needs to be discussed with the SAN vendor to check the virtual SAN capabilities. The other, traditional way is that a LUN from SAN1 is presented to the cluster while the same LUN is replicated to SAN2, and in case of disaster the SAN administrator intervenes to switch to the SAN2 LUN. Again, this needs to be verified with the SAN storage vendor regarding support for Windows Server 2012 R2 multi-site clusters, and specifically for the architecture of multiple SANs with SAN-to-SAN replication.

    I hope this answers your questions.

  • full disclosure: I am the Mgr (not System Admin or DBA)
    We have Windows 2012 VM servers (3 clusters in total: one 4-node cluster controlling AGP1,2,3,4; one 2-node cluster controlling AGP 5,6; and one 4-node cluster controlling the Stage environment AGS') running SQL 2012. Our Quorum is a NAS using CIFS (the same Quorum for all clusters). We are experiencing an issue where we lose connectivity to the SQL_Quorum. This loss of connectivity causes a fail-over.
    The SQL servers have Layer2 connectivity to the NAS (no firewall)
    We find the following errors in the logs:
    Event 1177, Failover Clustering
    Event 1135 , Failover Clustering
    Event 1564, Failover Clustering

    After dealing with this issue for the past couple of months, my support teams are not sure what to check next.
    We have a change scheduled to update the NIC drivers on the ESXi host servers, but do you have any additional suggestions on why we continue to lose connectivity to the Quorum?
    Also, is this NAS using CIFS the best way to configure the Quorum?
