Abstract: With the introduction of new features in Lync Server 2013, IT administrators and partners can provide users with a rich unified communications experience that is highly resilient to single points of failure. However, failures can and do happen, so the product provides a set of recovery services that allow for minimal data loss and swift restoration of services in the event of a server or datacenter outage.

High Availability

A system is considered highly available if it can tolerate the loss of one or more of its subcomponents and still provide service. Lync Server 2013 achieves high availability through a number of methods, two in particular: the replicated distribution of user sets and user data across multiple Front End servers in a pool, and the redundancy of Back End servers through SQL Mirroring (preferred) or SQL Clustering**. Together, these two functions ensure that no single Front End or Back End server (including its storage) is a single point of failure for the entire system.

Because these features depend on multiple servers within a single pool to keep both Front End and Back End components available, Standard Edition, which is an all-in-one instance of Lync Server, offers no high availability. By default it is a single point of failure: the mechanisms for replicating data and automatically recovering all services are not available if the server fails. When a Standard Edition server fails, effectively all the Front Ends and Back Ends in that pool fail with it.

Variations in deployment configuration determine where the solution provides automated resiliency and where it needs manual intervention. If manual intervention is required, the solution as a whole cannot be considered “highly available”, because users experience a service interruption until an administrator carries out the required manual process. Consideration should also be given to server maintenance and to other server roles: maintenance on a Lync Server in a two Front End pool eliminates high availability for the duration of the maintenance window, and without a SQL Witness there can be no automated failover and failback for the Back End SQL Mirror.

** While SQL Clustering is now supported by Lync Server 2013, SQL Mirroring, which can be configured and managed by Lync Server 2013, is the preferred solution. For more on SQL Clustering support, see Database Software Support.

 

Lync Server Edition | Configuration | High Availability
--- | --- | ---
Standard | Single Server | None
Standard | Paired SE pools (in data center) | Automatic for Resiliency Mode*
Enterprise | Single FE, Single BE | None
Enterprise | Single FE, Paired BEs (SQL Mirror) | None
Enterprise | Single FE, Paired BEs (SQL Mirror) + Witness | Automated Backend Failover only
Enterprise | Two FEs, Single BE | HA for Lync only w/o BE failure
Enterprise | Two FEs, Paired BEs (SQL Mirroring) | HA for Lync only w/o BE failure
Enterprise | Two FEs, Paired BEs (SQL Mirror) + Witness or SQL Cluster | Full HA during non-maintenance
Enterprise | Three+ FEs, Paired BEs (SQL Mirror) + Witness or SQL Cluster | Full HA

Table 1 - High Availability matrix by Deployment Configuration

*The Lync client will eventually use a backup registrar for voice if so configured, but some delay should be expected between losing the connection to the home pool and a successful retry. See Planning for Central Site Resiliency in the TechNet Library for more information on configuring registrar intervals.
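Table 1 can also be read as a small decision procedure. The Python sketch below is one way to encode it. The function name `ha_level` and its parameters are illustrative only and are not part of any Lync Server API. Two combinations that Table 1 does not list (a single Front End with a SQL Cluster, and three or more Front Ends without automated Back End failover) are handled by assumption, as marked in the comments.

```python
BACK_ENDS = ("single", "mirror", "mirror+witness", "cluster")


def ha_level(edition, front_ends=1, back_end="single", paired=False):
    """Return the 'High Availability' entry of Table 1 for a deployment.

    edition:    "standard" or "enterprise"
    front_ends: number of Front End servers in the pool (Enterprise only)
    back_end:   one of BACK_ENDS (Enterprise only)
    paired:     True when two Standard Edition pools are paired
    """
    if edition == "standard":
        return "Automatic for Resiliency Mode" if paired else "None"
    if edition != "enterprise":
        raise ValueError(f"unknown edition: {edition!r}")
    if back_end not in BACK_ENDS:
        raise ValueError(f"unknown back end: {back_end!r}")
    if front_ends < 1:
        raise ValueError("a pool needs at least one Front End server")

    # Automated Back End failover needs a mirror with a witness, or a cluster.
    # Assumption: a single FE with a SQL Cluster is not listed in Table 1 and
    # is treated here like a mirror with a witness.
    automated_be = back_end in ("mirror+witness", "cluster")

    if front_ends == 1:
        return "Automated Backend Failover only" if automated_be else "None"
    if not automated_be:
        # Assumption: Table 1 lists this result for two FEs only; three or
        # more FEs without automated BE failover are treated the same way.
        return "HA for Lync only w/o BE failure"
    if front_ends == 2:
        return "Full HA during non-maintenance"
    return "Full HA"


if __name__ == "__main__":
    print(ha_level("enterprise", front_ends=2, back_end="mirror+witness"))
    print(ha_level("enterprise", front_ends=3, back_end="cluster"))
```

Encoding the matrix this way makes the two conditions for full HA explicit: at least three Front Ends, so that one can be taken down for maintenance, and a Back End that can fail over without an administrator.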

 

Disaster Recovery

Service outages are still possible when a hardware failure affects an entire pool (Standard Edition server failure, network appliance, server rack, etc.) or when there is a location-based problem (such as a network outage in a particular datacenter). In these cases, re-establishing services quickly and with minimal data loss is the focus of disaster recovery planning. Lync Server 2013 supports two disaster recovery functions via “pool pairing”: site resiliency and pool failover.

In site resiliency, users of one Lync 2013 pool can be configured to connect automatically to a backup pool for resiliency mode services (a subset of full production features) when their own pool is unavailable. The period of unavailability that triggers this is now configurable for both basic SIP connectivity and Voice services (see the footnote to Table 1 for more). In pool failover, users are manually moved from one pool to another (failover) and then back (failback). While neither failover nor failback is automated, Lync Server 2013 introduces data replication between paired pools during regular (non-disaster) service. This real-time persistent data replication enables a faster recovery of services with minimal risk of data loss in the event of a site (datacenter) failure.

Lync Server Edition | Configuration | Recovery Enabled
--- | --- | ---
Standard | Paired SE Pools | Site Resiliency (Automated) and Pool Failover (RTO/RPO of 30min after manual initiation)
Enterprise | Paired EE Pools | Site Resiliency (Automated) and Pool Failover (RTO/RPO of 30min after manual initiation)

Table 2 – Site Resiliency by Product Edition


Note: While SQL Clustering is now supported, Metropolitan Site Resiliency remains unsupported for Lync Server 2013. All the nodes in a SQL Cluster serving a Lync pool, as well as the associated Front End servers, should be deployed within the same physical site as represented in Topology Builder.

 

Configuration and Considerations

From an overall solution standpoint, there are “best practices” for pairing pools, such as pairing only like with like: the same edition (EE pools with EE pools, SE pools with SE pools), the same platform (hardware with hardware, virtual with virtual), and so on. It is also recommended to pair pools within the same geographic region to mitigate performance problems across WANs. When a set of users is failed over from one pool to another, their conferences are hosted on the new pool until they are failed back. If the failover is from one continent to another, all users joining a conference, even if local to each other, will traverse the WAN to reach the conference hosted in the failover pool. Since elements like Call Admission Control settings and Direct Inward Dial (DID) numbers are tied to pools and are not easily transferred from one region to another (such as from North America to Europe), even an organization with robust WAN links should consider such a deployment carefully.
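As a rough illustration, the pairing recommendations above can be expressed as a pre-deployment check. This is a minimal Python sketch under stated assumptions: the `Pool` record, its fields, and the `pairing_warnings` function are hypothetical names invented for this example, and nothing here queries a real Lync topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """Hypothetical description of a pool, for illustration only."""
    name: str
    edition: str   # "SE" or "EE"
    platform: str  # "hardware" or "virtual"
    region: str    # e.g. "North America"


def pairing_warnings(a: Pool, b: Pool) -> list:
    """Return the pairing recommendations that the pair (a, b) does not follow."""
    warnings = []
    if a.edition != b.edition:
        warnings.append("editions differ: pair EE with EE and SE with SE")
    if a.platform != b.platform:
        warnings.append("platforms differ: pair hardware with hardware, virtual with virtual")
    if a.region != b.region:
        warnings.append("regions differ: conferences, CAC settings and DIDs are affected after failover")
    return warnings


if __name__ == "__main__":
    na = Pool("pool-na", "EE", "hardware", "North America")
    eu = Pool("pool-eu", "EE", "virtual", "Europe")
    for warning in pairing_warnings(na, eu):
        print(warning)
```

An empty result means the pair follows all three recommendations; a non-empty result is a prompt to review the design, not proof that the pairing is unsupported.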

Finally, there are often users homed on Survivable Branch Appliances (SBAs), typically in remote office locations. As in Lync 2010, SBAs can be paired to a Lync pool in 2013 for failover. In the event of an SBA failure, the clients of users homed on that SBA can redirect to the Lync pool for many services:

Users | Configuration | Resiliency Achieved
--- | --- | ---
Homed on SBA | SBA paired with pool, both functional | All Services
Homed on SBA | SBA paired with pool, pool fails | Resiliency Mode
Homed on SBA | SBA paired with pool, SBA fails | All Services

Table 3 – Resiliency with SBAs

*While SBAs can be paired to a Lync pool, they are not capable of utilizing pool failover services. So if an SBA is paired with Pool A, which is in turn paired with Pool B, and Pool A fails, users of Pool A will redirect to the backup registrar on Pool B, but SBA users will not.
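The difference between SBA users and pool users in this scenario can be sketched as two small lookup functions. This Python example is only a model of Table 3 and the footnote above. The function names and return values are hypothetical, and the cases where both an SBA and its pool, or both paired pools, are down are not covered by the text and are filled in by assumption.

```python
def sba_user_service(sba_up, pool_up):
    """Return (registrar, service level) for a user homed on an SBA."""
    if sba_up and pool_up:
        return ("SBA", "All Services")
    if sba_up:
        # The paired pool is down. The SBA does not follow that pool's own
        # pairing, so its users stay on the SBA in resiliency mode.
        return ("SBA", "Resiliency Mode")
    if pool_up:
        # The SBA is down, so clients redirect to the paired pool.
        return ("paired pool", "All Services")
    # Assumption: this case is not described in Table 3.
    return (None, "No Service")


def pool_user_service(home_pool_up, backup_pool_up):
    """Return (registrar, service level) for a user homed on a paired pool."""
    if home_pool_up:
        return ("home pool", "All Services")
    if backup_pool_up:
        # Site resiliency: automatic redirect to the backup registrar.
        return ("backup pool", "Resiliency Mode")
    # Assumption: this case is not described in the text.
    return (None, "No Service")


if __name__ == "__main__":
    # Pool A fails while Pool B and the SBA are still up.
    print("Pool A user:", pool_user_service(False, True))
    print("SBA user:   ", sba_user_service(True, False))
```

In the footnote's scenario both kinds of user end up in resiliency mode, but only the Pool A users move to Pool B; the SBA users remain registered on the SBA.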