Continuing on from my Introduction and Fabric posts, we’re now going to start playing with (learning) Failover Clustering for our evaluation of Windows Server and System Center.
Note: This is NOT best practice deployment advice. This is “notes from the field”. This is how to get by with what we have.
And if I only had a single laptop and needed to learn about Failover Clustering, this is what I’d do:
If you’ve been following along, and have completed all the steps in my Introduction post and Fabric post, you have a laptop (or two, or three) dual booting into Windows Server with the Hyper-V role enabled. They are domain joined and have internet access.
If you know what “Wolfpack” was, then feel free to skip this part (you’re as old as me)
High Availability of virtual machines (protection against unplanned downtime) was introduced with Windows Server 2003 R2 running Virtual Server 2005 SP1 and improved with Windows Server 2008 and Hyper-V.
Planned downtime of the hosts was initially handled with Quick Migration of the virtual machines (save, move, restore) back in the Virtual Server days and the first release of Hyper-V, and then improved with Live Migration (move a running VM between hosts with no downtime, using shared storage) in Windows Server 2008 R2. Windows Server 2012 gave us shared-nothing live migration (move a running VM onto another host with nothing but a network cable in common). Windows Server 2012 R2 improves Live Migration further still.
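To make that concrete, here's roughly what a shared-nothing live migration looks like in PowerShell on a 2012 / 2012 R2 Hyper-V host. The VM name, destination host and path below are made-up examples, not anything from our lab:

# Move a running VM, including its storage, to another Hyper-V host
# (shared-nothing live migration - the only thing in common is the network).
# "VM01", "HV02" and the destination path are example names only.
Move-VM -Name "VM01" -DestinationHost "HV02" -IncludeStorage -DestinationStoragePath "D:\VMs\VM01"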
The picture here shows Failover Clustering at its simplest. Both nodes have access to some shared storage, are communicating with each other over a heartbeat network, and present the clustered resources to the clients over the LAN.
To get your head around this, think Active-Passive (one node doing the work, with the other one waiting). Node 1 is running the mission critical workload. The data is on the shared storage. The workload publishes its name and IP address onto the LAN. The clients resolve the name to the IP address and get access to the data (via Node 1).
In the event of an unplanned failure of Node 1 (pulling the power cord), the remaining node detects the failure (no heartbeat). Node 2 will take ownership of the Shared Storage, will spin up the workload and will publish the name and IP address onto the network. Clients continue to access the clustered resource via Node 2. This can take a short time as the workload needs to start on the surviving node.
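If you want to watch this happen, the Failover Clustering PowerShell module will show you the nodes and which one currently owns each clustered role. Just a quick sketch, run from any node of an existing cluster:

# List the nodes in the cluster and their current state
Get-ClusterNode

# Show which node currently owns each clustered role (group)
Get-ClusterGroup | Select-Object Name, OwnerNode, State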
Planned downtime is easier (scheduled maintenance of one of the nodes). We simply "move" the clustered resources off that node onto the other, perform the maintenance and move them back. This can have zero impact on the workload.
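Again as a rough PowerShell sketch (the node and role names here are invented for illustration):

# Drain all clustered roles off Node1 before maintenance (Windows Server 2012 onwards)
Suspend-ClusterNode -Name "Node1" -Drain

# ...patch and reboot Node1, then bring it back and let the roles fail back
Resume-ClusterNode -Name "Node1" -Failback Immediate

# Or move a single clustered role by hand
Move-ClusterGroup -Name "FileServer1" -Node "Node2"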
Each node can be actively doing more than one thing while passively waiting for the others to fail (File & Print, SQL, Exchange, etc.). In a two-node configuration, each node has to have enough resources to run all the workloads (to cope with a failure of the other node).
As you add more nodes into the solution, you “normally” end up with a hot standby node (a server doing nothing but waiting). And as we can now have up to 64 nodes in a cluster, we “normally” have multiple standby nodes.
But also note that building a Failover Cluster out of virtual machines is a best practice. These are known as Guest Clusters; they address the issue of an operating system failure inside one of the VMs and allow for scheduled maintenance (patching) of the guests.
So, for test & dev only (NOT for production):
The easiest way to do this is to use Windows Server 2012 R2, as it introduces the Shared VHDX feature. The other option is to use an iSCSI Target, which I'll mention in part 2 of this post.
In production, each guest cluster node would be on a separate physical host and the shared VHDX file(s) on shared storage. We are going to “cheat” and get this working on a single laptop:
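As a taster, here's a rough sketch of the Shared VHDX plumbing in PowerShell on the 2012 R2 host. The VM names and the path are made-up examples, and note that Shared VHDX normally expects the file to sit on a Cluster Shared Volume or a Scale-Out File Server share; the single-laptop workaround is part of what Part 2 will cover:

# Create a VHDX that will be shared between the two guest cluster nodes.
# In production this file lives on a Cluster Shared Volume or Scale-Out SMB share;
# here the path is just an example on the laptop.
New-VHD -Path "D:\VMs\Shared\GuestClusterDisk1.vhdx" -SizeBytes 20GB -Dynamic

# Attach the same VHDX to both guest cluster VMs with sharing enabled
Add-VMHardDiskDrive -VMName "GuestNode1" -Path "D:\VMs\Shared\GuestClusterDisk1.vhdx" -ShareVirtualDisk
Add-VMHardDiskDrive -VMName "GuestNode2" -Path "D:\VMs\Shared\GuestClusterDisk1.vhdx" -ShareVirtualDisk

Inside the guests it's then the usual Failover Clustering story: install the feature, validate, and build the cluster. More on that in Part 2.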
I’ll continue this post in Part 2 later on.
Meantime, here’s some related “bedtime reading”:
Failover Clustering Overview
Deploy a Guest Cluster Using a Shared Virtual Hard Disk
Windows Server 2012 R2 Storage: Step-by-step with Storage Spaces, SMB Scale-Out and Shared VHDX (Physical)
Part 1 in this series is here: Introduction
Part 2 in this series is here: Fabric
Part 3 in this series is here: Failover Clustering (part 1)
Part 4 in this series is here: Failover Clustering (part 2)