A blog by Jose Barreto, a member of the File Server team at Microsoft.
All messages posted to this blog are provided "AS IS" with no warranties, and confer no rights.
Information on unreleased products is subject to change without notice.
Dates related to unreleased products are estimates and are subject to change without notice.
The contents of this site are personal opinions and might not represent the Microsoft Corporation view.
The information contained in this blog represents my view on the issues discussed as of the date of publication.
You should not consider older, out-of-date posts to reflect my current thoughts and opinions.
© Copyright 2004-2012 by Jose Barreto. All rights reserved.
Increasing availability is a key concern with computer systems. With all the consolidation and virtualization efforts under way, you need to make sure your services are always up and running, even when some components fail. However, it’s usually hard to understand the details of what it takes to make systems highly available (or continuously available). And there are so many options…
In this blog post, I will describe four principles that cover the different requirements for Availability: Redundancy, Entanglement, Awareness and Persistence. They apply to different types of services and I’ll provide some examples related to the most common server roles, including DHCP, DNS, Active Directory, Hyper-V, IIS, Remote Desktop Services, SQL Server, Exchange Server, and obviously File Services (I am in the “File Server and Clustering” team, after all). Every service employs different strategies to implement these “REAP Principles” but they all must implement them in some fashion to increase availability.
Note: A certain familiarity with common Windows Server roles and services is assumed here. If you are not familiar with the meaning of DHCP, DNS or Active Directory, this post is not intended for you. If that’s the case, you might want to do some reading on those topics before moving forward here.
Redundancy – There is more than one of everything
Availability starts with redundancy. In order to provide the ability to survive failures, you must have multiple instances of everything that can possibly fail in that system. That means multiple servers, multiple networks, multiple power supplies, multiple storage devices. You should be seeing everything (at least) doubled in your configuration. Whatever is not redundant is commonly labeled a “Single Point of Failure”.
Redundancy is not cheap, though. By definition, it will increase the cost of your infrastructure. So it’s an investment that can only be justified when there is understanding of the risks and needs associated with service disruption, which should be balanced with the cost of higher availability. Sadly, that understanding sometimes only comes after a catastrophic event (such as data loss or an extended outage).
Ideally, you would have a redundant instance that is as capable as your primary one. That would make your system work as well after the failure as it did before. It might be acceptable, though, to have a redundant component that is less capable. In that case, you’ll be in a degraded (although functional) state after a failure, while the original part is being replaced. Also keep in mind that, these days, redundancy in the cloud might be a viable option.
For this principle, there’s really not much variance per type of Windows Server role. You basically need to make sure that you have multiple servers providing the service, and make sure the other principles are applied.
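As a rough illustration of why doubling components helps, the sketch below estimates the availability of a set of redundant instances. It assumes that instances fail independently and that the service stays up as long as at least one instance is up. Both are simplifying assumptions that are not part of this post, and the function name is only illustrative.

```python
def combined_availability(instance_availability: float, instances: int) -> float:
    """Availability of a service that is up while at least one instance is up.

    Assumes independent failures, which real deployments only approximate
    (shared power, shared networks and shared software bugs break independence).
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("need at least one instance")
    # The service is down only when every instance is down at the same time.
    all_down = (1.0 - instance_availability) ** instances
    return 1.0 - all_down


for n in (1, 2, 3):
    print(n, "instance(s):", round(combined_availability(0.99, n), 6))
```

Under these assumptions, each extra instance multiplies the chance of a total outage by the failure probability of a single instance. A single point of failure elsewhere in the system caps the result no matter how many servers you add.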
Entanglement – Achieving shared state via spooky action at a distance
Having redundant equipment is required but certainly not sufficient to provide increased availability. Once any meaningful computer system is up and running, it is constantly gathering information and keeping track of it. If you have multiple instances running, they must be “entangled” somehow. That means that the current state of the system should be shared across the multiple instances so it can survive the loss of any individual component without losing that state. It will typically include some complex “spooky action at a distance”, as Einstein famously said of Quantum Mechanics.
A common way to do it is using a database (like SQL Server) to store your state. Every transaction performed by a set of web servers, for instance, could be stored in a common database and any web server can be quickly reprovisioned and connected to the database again. In a similar fashion, you can use Active Directory as a data store, as it’s done by services like DFS Namespaces and Exchange Server (for user mailbox information). Even a file server could serve a similar purpose, providing a location to store files that can be changed at any time and accessed by a set of web servers. If you lose a web server, you can quickly reprovision it and point it to the shared file server.
If using SQL Server to store the shared state, you must also abide by the Redundancy principle by using multiple SQL Servers, which must be entangled as well. One common way to do it is using shared storage. You can wire these servers to a Fibre Channel SAN or an iSCSI SAN or even a file server to store the data. Failover clustering in Windows Server (used by certain deployments of Hyper-V, File Servers and SQL Server, just to name a few) leverages shared storage as a common mechanism for entanglement.
Peeling the onion further, you will need multiple heads of those storage systems and they must also be entangled. Redundancy at the storage layer is commonly achieved by sharing physical disks and writing the data to multiple places. Most SANs have the option of using dual controllers that are connected to a shared set of disks. Every piece of data is stored synchronously to at least two disks (sometimes more). These SANs can tolerate the failure of individual controllers or disks, preserving their shared state without any disruption. In Windows Server 2012, Clustered Storage Spaces provides a simple solution for shared storage for a set of Windows Servers using only Shared SAS disks, without the need for a SAN.
There are other strategies for Entanglement that do not require shared storage, depending on how much and how frequently the state changes. If you have a web site with only static files, you could maintain shared state by simply provisioning multiple IIS servers with the exact same files. Whenever you lose one, simply replace it. For instance, Windows Azure and Virtual Machine Manager provide mechanisms to quickly add/remove instances of web servers in this fashion through the use of a service template.
If the shared state changes, which is often the case for most web sites, you could go up a notch by regularly copying updated files to the servers. You could have a central location with the current version of the shared state (a remote file server, for instance) plus a process to regularly send full updates to any of the nodes every day (either pushed from the central store or pulled by the servers). This is not very efficient for large amounts of data updated frequently, but could be enough if the total amount of data is small or it changes very infrequently. Examples of this strategy include SQL Server Snapshot Replication, DNS full zone transfers or a simple script using ROBOCOPY to copy files on a daily schedule.
In most cases, however, it’s best to employ a mechanism that can cope with more frequently changing state. Going up the scale you could have a system that sends data to its peers every hour or every few minutes, being careful to send only the data that has changed instead of the full set. That is the case for DNS incremental zone transfers, Active Directory Replication, many types of SQL Server Replication, SQL Server Log Shipping, Asynchronous SQL Server Mirroring (High-Performance Mode), SQL Server AlwaysOn Availability Groups (asynchronous-commit mode), DFS Replication and Hyper-V Replica. These models provide systems that are loosely converging, but do not achieve up-to-the-second coherent shared state. However, that is good enough for some scenarios.
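The difference between sending the full set and sending only what changed can be sketched with a small in-memory example. This is not how any of the products above implement replication. It is a minimal illustration that compares content hashes to find changed files, and the file names and contents are made up.

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to a SHA-256 digest of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def incremental_sync(source: dict[str, bytes], replica: dict[str, bytes]) -> list[str]:
    """Bring the replica in line with the source, touching only what differs.

    Returns the sorted names of files that were copied or deleted.
    """
    src, dst = manifest(source), manifest(replica)
    changed = [name for name in src if dst.get(name) != src[name]]
    removed = [name for name in dst if name not in src]
    for name in changed:
        replica[name] = source[name]
    for name in removed:
        del replica[name]
    return sorted(changed + removed)


# Hypothetical web content: one file edited, one added, one removed.
source = {"index.html": b"<h1>v2</h1>", "logo.png": b"\x89PNG", "new.css": b"body{}"}
replica = {"index.html": b"<h1>v1</h1>", "logo.png": b"\x89PNG", "old.js": b"//"}

touched = incremental_sync(source, replica)
print("transferred or removed:", touched)
print("second pass:", incremental_sync(source, replica))
```

The unchanged file is never transferred, and a second pass right after the first has nothing to do. That is the property that makes frequent incremental runs affordable.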
At the high end of replication and right before actual shared storage, you have synchronous replication. This provides the ability to update the information on every entangled system before considering the shared state actually changed. This might slow down the overall performance of the system, especially when the connectivity between the peers suffers from latency. However, there’s something to be said of just having a set of nodes with local storage that achieve a coherent shared state using only software. Common examples here include a few types of SAN replication, Exchange Server (Database Availability Groups), Synchronous SQL Mirroring (High Safety Mode) and SQL Server AlwaysOn Availability Groups (synchronous-commit mode).
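To make the "update every entangled system before considering the state changed" idea concrete, here is a minimal sketch of a synchronous write. The class names are hypothetical and the replicas are simple in-memory dictionaries. Real products use far more elaborate protocols and often handle an unreachable replica differently, for instance by dropping it from the synchronized set.

```python
class Replica:
    """An in-memory stand-in for one node's local storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.online = True

    def apply(self, key: str, value: str) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class SynchronousStore:
    """Commits a write only after every replica has applied it."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas

    def write(self, key: str, value: str) -> None:
        undo = []  # (replica, had_key, old_value) for replicas already updated
        try:
            for replica in self.replicas:
                had_key = key in replica.data
                old_value = replica.data.get(key)
                replica.apply(key, value)
                undo.append((replica, had_key, old_value))
        except ConnectionError:
            # Roll back so that all replicas still agree, then report failure.
            for replica, had_key, old_value in undo:
                if had_key:
                    replica.data[key] = old_value
                else:
                    del replica.data[key]
            raise


node_a, node_b = Replica("node-a"), Replica("node-b")
store = SynchronousStore([node_a, node_b])
store.write("mailbox:42", "v1")

node_b.online = False
try:
    store.write("mailbox:42", "v2")
except ConnectionError as exc:
    print("write rejected:", exc)

print(node_a.data == node_b.data)
```

The caller waits for the slowest replica on every write, which is where the latency cost mentioned above comes from. In exchange, any surviving node holds the complete committed state.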
As you can see, the Entanglement principle can be addressed in a number of different ways depending on the service. Many services, like File Server and SQL Server, provide multiple mechanisms to deal with it, with varying degrees of cost, complexity, performance and coherence.
Awareness – Telling if Schrödinger's servers are alive or not
Your work is not done after you have a redundant entangled system. In order to provide clients with seamless access to your service, you must implement some method to find one of the many sources for the service. The awareness principle refers to how your clients will discover the location of the access points for your service, ideally with a mechanism to do it quickly while avoiding any failed instances. There are a few different ways to achieve it, including manual configuration, broadcast, DNS, load balancers, or a service-specific method.
One simple method is to statically configure each client with the name or IP Address of two or more instances of the service. This method is effective if the configuration of the service is not expected to change. If it ever does change, you would need to reconfigure each client. A common example here is how static DNS is configured: you simply specify the IP address of your preferred DNS server and also the IP address of an alternate DNS server in case the preferred one fails.
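The preferred/alternate pattern boils down to trying a fixed list in order. The sketch below shows that logic with a fake transport so it runs without a network. The addresses come from a documentation range and the function names are hypothetical.

```python
from typing import Callable, Sequence


def query_with_fallback(servers: Sequence[str], query: Callable[[str], str]) -> str:
    """Try each statically configured server in order until one answers."""
    errors = []
    for server in servers:
        try:
            return query(server)
        except OSError as exc:  # covers timeouts and refused connections
            errors.append(f"{server}: {exc}")
    raise ConnectionError("no configured server answered: " + "; ".join(errors))


# Hypothetical static configuration: preferred server first, alternate second.
CONFIGURED_DNS = ["192.0.2.10", "192.0.2.11"]
DOWN = {"192.0.2.10"}  # pretend the preferred server has failed


def fake_query(server: str) -> str:
    """Stand-in for a real network request."""
    if server in DOWN:
        raise TimeoutError("no response")
    return f"answer from {server}"


print(query_with_fallback(CONFIGURED_DNS, fake_query))
```

The client only learns about a failure by waiting for the preferred server to time out, and the list itself never changes unless someone reconfigures the client.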
Another common mechanism is to broadcast a request for the service and wait for a response. This mechanism works only if there’s someone in your local network capable of providing an answer. There’s also a concern about the legitimacy of the response, since a rogue system on the network might be used to provide a malicious version of the service. Common examples here include DHCP service requests and Wireless Access Point discovery. It is fairly common to use one service to provide awareness for others. For instance, once you access your Wireless Access Point, you get DHCP service. Once you get DHCP service, you get your DNS configuration from it.
As you know, the most common use for a DNS server is to map a network name to an IP address (using an A, AAAA or CNAME DNS record). That in itself implements a certain level of this awareness principle. DNS can also associate multiple IP addresses with a single name, effectively providing a mechanism to give you a list of servers that provide a specific service. That list is provided by the DNS server in a round robin fashion, so it even includes a certain level of load balancing as part of it. Clients looking for Web Servers and File Servers commonly use this mechanism alone for finding the many devices providing a service.
DNS also provides a different type of record specifically designed for providing service awareness. This is implemented as SRV (Service) records, which not only offer the name and IP address of a host providing a service, but can decorate it with information about priority, weight and port number where the service is provided. This is a simple but remarkably effective way to provide service awareness through DNS, which is effectively a mandatory infrastructure service these days. Active Directory is the best example of using SRV records, using DNS to allow clients to learn information about the location of Domain Controllers and services provided by them, including details about Active Directory site topology.
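A client that receives several SRV records has to decide which target to try first. The sketch below orders records by priority (lower values first) and uses the weight for a random pick within the same priority. It is a simplified version of the selection described in RFC 2782, not the exact algorithm, and the host names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower values are tried first
    weight: int    # relative share among records with the same priority
    port: int
    target: str


def order_srv(records: list[SrvRecord], rng: Optional[random.Random] = None) -> list[SrvRecord]:
    """Return the records in the order a client would try them."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.choice(group)  # all weights zero: plain random pick
            else:
                pick = rng.choices(group, weights=weights, k=1)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Hypothetical records for a directory service: two preferred hosts, one backup.
records = [
    SrvRecord(priority=10, weight=100, port=389, target="dc3.example.com"),
    SrvRecord(priority=0, weight=75, port=389, target="dc1.example.com"),
    SrvRecord(priority=0, weight=25, port=389, target="dc2.example.com"),
]

ordered = order_srv(records, random.Random(7))
for record in ordered:
    print(record.priority, record.weight, f"{record.target}:{record.port}")
```

The backup host always comes last, while the two preferred hosts share first place roughly in proportion to their weights across many clients.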
Windows Server failover clustering includes the ability to perform dynamic DNS registrations when creating clustered services. Each cluster role (formerly known as a cluster group) can include a Network Name resource which is registered with DNS when the service is started. Multiple IP addresses can be registered for a given cluster Network Name if the server has multiple interfaces. In Windows Server 2012, a single cluster role can be active on multiple nodes (that’s the case of a Scale-Out File Server) and the new Distributed Network Name implements this as a DNS name with multiple IP addresses (at least one from each node).
DNS does have a few limitations. The main one is the fact that the clients will cache the name/IP information for some time, as specified in the TTL (time to live) for the record. If the service is reconfigured and new addresses or service records are published, DNS clients might take some time to become aware of the change. You can reduce the TTL, but that has a performance impact, causing DNS clients to query the server more frequently. There is no mechanism in DNS to have a server proactively tell a client that a published record has changed. Another issue with DNS is that it provides no method to tell if the service is actually being provided at the moment or even if the server ever functioned properly. It is up to the client to attempt communication and handle failures. Last but not least, DNS cannot help with intelligently balancing clients based on the current load of a server.
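The caching behavior is easy to see in a small simulation. The sketch below is a toy client-side cache with a fake clock and a fake zone, so the names, addresses and TTL value are all hypothetical. It is not how the Windows DNS client is implemented.

```python
import time
from typing import Callable


class TtlCache:
    """Caches lookups until the TTL supplied with each answer expires."""

    def __init__(
        self,
        resolve: Callable[[str], tuple[float, list[str]]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve  # returns (ttl_seconds, addresses)
        self._clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def lookup(self, name: str) -> list[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached and cached[0] > now:
            return cached[1]  # still within the TTL: the server is not asked
        ttl, addresses = self._resolve(name)
        self._entries[name] = (now + ttl, addresses)
        return addresses


# Fake zone and fake clock so the example is deterministic.
zone = {"files.example.com": (300.0, ["192.0.2.20"])}
now = [0.0]
cache = TtlCache(lambda name: zone[name], clock=lambda: now[0])

first = cache.lookup("files.example.com")
zone["files.example.com"] = (300.0, ["192.0.2.21"])  # the service moves

now[0] = 120.0
stale = cache.lookup("files.example.com")  # TTL not expired: old address

now[0] = 301.0
fresh = cache.lookup("files.example.com")  # TTL expired: new address

print(first, stale, fresh)
```

Between the change and the TTL expiry the client keeps using the old address, and nothing on the server side can shorten that window after the fact.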
Load balancers are the next step in providing awareness. These are network devices that function as an intelligent router of traffic based on a set of rules. If you point your clients to the IP address of the load balancer, that device can intelligently forward the requests to a set of servers. As the name implies, load balancers typically distribute the clients across the servers and can even detect if a certain server is unresponsive, dynamically taking it out of the list. Another concern here is affinity, which is an optimization that consistently forwards a given client to the same server. Since these devices can become a single point of failure, the redundancy principle must be applied here. The most common solution is to have two load balancers in combination with two records in DNS.
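The three behaviors described here (distribution, removal of unresponsive servers and affinity) can be sketched in a few lines. This is a toy model with hypothetical server and client names. Real load balancers run their own health probes and use more sophisticated rules.

```python
class LoadBalancer:
    """Round-robin distribution with health tracking and client affinity."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0
        self._affinity: dict[str, str] = {}  # client id -> server

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def route(self, client_id: str) -> str:
        # Affinity: keep a client on its server while that server is healthy.
        pinned = self._affinity.get(client_id)
        if pinned in self.healthy:
            return pinned
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # Round robin, skipping servers that were taken out of the list.
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next % len(self.servers)]
            self._next += 1
            if candidate in self.healthy:
                self._affinity[client_id] = candidate
                return candidate
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["web1", "web2", "web3"])
print(lb.route("alice"), lb.route("bob"), lb.route("alice"))

lb.mark_down("web1")  # for example, after a failed health probe
print(lb.route("alice"))
```

When a client's pinned server fails, the client is moved to a healthy one and pinned there. Whether its session survives the move depends on the Entanglement principle, not on the load balancer.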
SQL Server again uses multiple mechanisms for implementing this principle. DNS name resolution is common, either statically or dynamically using failover clustering Network Name resources. That name is then used as part of the client configuration known as a “Connection String”. Typically, this string will provide the name of a single server providing the SQL Service, along with the database name and authentication details. For instance, a typical connection string would be: "Server=SQLSERV1A; Database=DB301; Integrated Security=True;". For SQL Mirroring, there is a mechanism to provide a second server name in the connection string itself. Here’s an example: "Server=SQLSERV1A; Failover_Partner=SQLSRV1B; Database=DB301; Integrated Security=True;".
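To show how the second server name travels inside the string, the sketch below splits the example above into key/value pairs and lists the servers a client could try. The parser is deliberately simplified (no quoted values or escaped semicolons) and the helper names are hypothetical. Real SQL Server client libraries do this parsing internally.

```python
def parse_connection_string(text: str) -> dict[str, str]:
    """Split "Key=Value; Key=Value;" into a dictionary with lowercase keys."""
    settings = {}
    for part in text.split(";"):
        if not part.strip():
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = part.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def candidate_servers(settings: dict[str, str]) -> list[str]:
    """Primary server first, then the failover partner if one is configured."""
    names = [settings.get("server"), settings.get("failover_partner")]
    return [name for name in names if name]


single = "Server=SQLSERV1A; Database=DB301; Integrated Security=True;"
mirrored = "Server=SQLSERV1A; Failover_Partner=SQLSRV1B; Database=DB301; Integrated Security=True;"

print(candidate_servers(parse_connection_string(single)))
print(candidate_servers(parse_connection_string(mirrored)))
```

With the mirrored string the client carries its own list of two servers, which is the same static-configuration idea described earlier, scoped to a single application.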
Other services provide a specific layer of Awareness, implementing a broker service or client access layer. This is the case of DFS (Distributed File System), which simplifies access to multiple file servers using a unified namespace mechanism. In a similar way, SharePoint web front end servers will abstract the fact that multiple content databases live behind a specific SharePoint farm or site collection. SharePoint Server 2013 goes one step further by implementing a Request Manager service that can even be configured as a Web Server farm placed in front of the main SharePoint web front end farm, with the purpose of routing and throttling incoming requests to improve both performance and availability.
Exchange Server Client Access Servers will query Active Directory to find which Mailbox Server or Database Availability Group contains the mailbox for an incoming client. Remote Desktop Connection Broker (formerly known as Terminal Services Session Broker) is used to provide users with access to Remote Desktop services across a set of servers. All these broker services can typically handle a fair amount of load balancing and be aware of the state of the services behind them. Since these can become single points of failure, they are typically placed behind DNS round robin and/or load balancers.
Persistence – The one that is the most adaptable to change will survive
Now that you have redundant entangled services and clients are aware of them, here comes the greatest challenge in availability: persisting the service in the event of a failure. There are three basic steps to make it happen: server failure detection, failing over to a surviving server (if required) and client reconnection (if required).
Detecting the failure is the first step. It requires a mechanism for aliveness checks, which can be performed by the servers themselves, by a witness service, by the clients accessing the services or a combination of these. For instance, Windows Server failover clustering makes cluster nodes check each other (through network checks), in an effort to determine when a node becomes unresponsive.
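A common way to implement aliveness checks is a heartbeat with a timeout. The sketch below shows only that general idea. The class name and the five-second timeout are hypothetical, and this is not the actual algorithm or the settings used by Windows Server failover clustering.

```python
class HeartbeatMonitor:
    """Flags nodes whose last heartbeat is older than a timeout."""

    def __init__(self, nodes: list[str], timeout_seconds: float) -> None:
        self.timeout = timeout_seconds
        self.last_seen = {node: 0.0 for node in nodes}

    def record_heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def unresponsive(self, now: float) -> list[str]:
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(["node1", "node2"], timeout_seconds=5.0)

# Both nodes report at t=1; after that, only node1 keeps reporting.
monitor.record_heartbeat("node1", now=1.0)
monitor.record_heartbeat("node2", now=1.0)
monitor.record_heartbeat("node1", now=4.0)
monitor.record_heartbeat("node1", now=7.0)

print(monitor.unresponsive(now=5.0))
print(monitor.unresponsive(now=8.0))
```

The timeout is a trade-off: a short one detects failures quickly but risks declaring a busy or briefly disconnected node dead, while a long one delays the failover.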
Once a failure is detected, for services that work in an active/passive fashion (only one server provides the service and the other remains on standby), a failover is required. This can only be safely achieved automatically if the entanglement is done via Shared Storage or Synchronous Replication, which means that the data from the server that is lost is properly persisted. If using other entanglement methods (like backups or asynchronous replication), an IT Administrator typically has to manually intervene to make sure the proper state is restored before failing over the service. For all active/active solutions, with multiple servers providing the same service all the time, a failover is not required.
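The rule in this paragraph can be written down as a small decision function. The names below are hypothetical and the categories simply mirror the entanglement methods discussed earlier in this post.

```python
from enum import Enum


class Entanglement(Enum):
    SHARED_STORAGE = "shared storage"
    SYNCHRONOUS_REPLICATION = "synchronous replication"
    ASYNCHRONOUS_REPLICATION = "asynchronous replication"
    BACKUP = "backup"


# Methods where the state of the lost server is known to be fully persisted.
SAFE_FOR_AUTOMATIC_FAILOVER = {
    Entanglement.SHARED_STORAGE,
    Entanglement.SYNCHRONOUS_REPLICATION,
}


def failover_action(active_active: bool, entanglement: Entanglement) -> str:
    """Decide what has to happen after a server failure is detected."""
    if active_active:
        return "no failover: surviving servers keep providing the service"
    if entanglement in SAFE_FOR_AUTOMATIC_FAILOVER:
        return "automatic failover to the standby server"
    return "manual failover after an administrator restores the state"


for method in Entanglement:
    print(f"active/passive with {method.value}: {failover_action(False, method)}")
print("active/active:", failover_action(True, Entanglement.ASYNCHRONOUS_REPLICATION))
```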
Finally, the client might need to reconnect to the service. If the server being used by the client has failed, many services will lose their connections and require intervention. In an ideal scenario, the client will automatically detect (or be notified of) the server failure. Then, because it is aware of other instances of the service, it will automatically connect to a surviving instance, restoring the exact same client state before the failure. This is how Windows Server 2012 implements failover of file servers through a process called SMB 3.0 Continuous Availability, available for both Classic and Scale-Out file server clusters. The file server cluster goes one step further, providing a Witness Service that will proactively notify SMB 3.0 clients of a server failure and point them to an alternate server, even before current pending requests to the failed server time out.
File servers might also leverage a combination of DFS Namespaces and DFS Replication that will automatically recover from a failed server situation, with some potential side effects. While the file client will find an alternative file server via DFS Namespaces, the connection state will be lost and need to be reestablished. Another persistence mechanism in the file server is the Offline Files option (also known as Client Side Caching) commonly used with the Folder Redirection feature. This allows you to keep working on local storage while your file server is unavailable, synchronizing again when the server comes back.
For other services, like SQL Server, the client will surface an error to the application indicating that a failover has occurred and the connection has been lost. If the application is properly coded to handle that situation, the end user will be shielded from the error message because the application will simply reconnect to the SQL Server using either the same name (in the case of another server taking over that name) or a Failover Partner name (in case of SQL Server Mirroring) or another instance of SQL Server (in case of more complex log shipping or replication scenarios).
Clients of Web Servers and other load balanced workloads without any persistent state might be able to simply retry an operation in case of a failure. This might happen automatically or require the end-user to retry the operation manually. This might also be the case of a web front end layer that communicates with a web services layer. Again, a savvy programmer could code that front end server to automatically retry web services requests, if they are idempotent.
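A retry wrapper of the kind a front end could use might look like the sketch below. The function name, the number of attempts and the exponential delay are all assumptions chosen for illustration, and the wrapper is only safe for operations that are idempotent, meaning that repeating them does not change the result.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_idempotent(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry an idempotent operation after connection failures.

    Waits base_delay, then twice that, and so on between attempts.
    The last failure is re-raised so the caller can surface it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Simulated web service call that fails twice (as if during a failover).
calls = {"count": 0}


def flaky_request() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server unavailable")
    return "ok"


result = retry_idempotent(flaky_request, sleep=lambda seconds: None)
print(result, "after", calls["count"], "calls")
```

Retrying a request that is not idempotent (for example, one that submits an order) could apply it twice if the first attempt actually reached the server before the connection dropped.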
Another interesting example of client persistence is provided by an Outlook client connecting to an Exchange Server. As we mentioned, Exchange Servers implement a sophisticated method of synchronous replication of mailbox databases between servers, plus a Client Access layer that brokers connections to the right set of mailbox servers. On top of that, the Outlook client will simply continue to work from its cache (using only local storage) if for any reason the server becomes unavailable. Whenever the server comes back online, the client will transparently reconnect and synchronize. The entire process is automated, without any action required during or after the failure from either end users or IT Administrators.
Samples of how services implement the REAP principles
Now that you have the principles down, let’s look at how the main services we mentioned implement them.
I hope this post helped you understand the principles behind increasing server availability.
As a final note, please take into consideration that not all services require the highest possible level of availability. This might be an easier decision for certain services like DHCP, DNS and Active Directory, where the additional cost is relatively small and the benefits are sizable. You might want to think twice when increasing the availability of a large backup server, where some hours of down time might be acceptable and the cost of duplicating the infrastructure is significantly higher.
Depending on how much availability your service level agreement states, you might need different types of solutions. We generally measure availability in “nines”, as described in the table below:
You should consider your overall requirements and the related infrastructure investments that would give you the most “nines” per dollar.
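The arithmetic behind the “nines” is straightforward: each additional nine divides the allowed downtime by ten. The sketch below computes the yearly downtime budget for one to five nines, assuming a 365-day year. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime for an availability with the given number of nines.

    Two nines means 99%, three nines means 99.9%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in range(1, 6):
    availability = 100 * (1 - 10.0 ** -nines)
    minutes = downtime_minutes_per_year(nines)
    print(
        f"{nines} nine(s): {availability:.3f}% available, "
        f"about {minutes:,.1f} minutes ({minutes / 60:,.2f} hours) of downtime per year"
    )
```

Three nines leaves roughly 8.76 hours of downtime per year, while five nines leaves a little over five minutes, which helps explain why each extra nine tends to cost considerably more than the previous one.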