Lync Server 2013 Preview introduces new high availability and disaster recovery features that are scalable and cost-effective for your organization’s mission-critical business needs. The two primary new features are the pairing of pools in geographically dispersed sites to provide disaster recovery, and support for mirroring to improve Back End Server high availability.

Author: Chris Dragich, Microsoft Senior Technical Writer

Publication date: August 29, 2012

Product version: Lync Server 2013 Preview

Lync Server 2013 Preview introduces significant new high availability and disaster recovery features. To enable disaster recovery, you can pair Front End pools at different sites. In the event of a disaster, you can then fail over users from the affected pool to the other pool, with minimal recovery time. Lync Server 2013 Preview also introduces support for server mirroring as an improved Back End database high-availability solution. This article provides an overview of these new features.

Disaster Recovery in Lync Server 2013 Preview

One of the most important and most requested new features in Lync Server 2013 Preview is a new scalable and cost-effective disaster recovery solution. If you have multiple data centers, you can deploy multiple Front End pools in separate sites to provide continuation of service in the event that one entire pool or site goes down.

To deploy disaster recovery, designate pairs of Front End pools across geographically dispersed sites. Each site contains a Front End pool that is paired with a corresponding Front End pool in another site. Both pools in a pair are active, and the new Lync Server Backup Service replicates data to keep the pools synchronized. In addition to providing disaster recovery, the two paired pools serve as backup Registrars for each other.

Figure 1. A pair of Front End pools, located at separate geographical sites.

There is no restriction on the distance between two data centers that have paired pools. We recommend using two data centers in the same world region that are connected by high-speed links but are far enough apart that a single disaster is unlikely to hit both at the same time. Pairing data centers across world regions is possible, but could incur higher data loss because of latency in data replication.

When planning which pools to pair, follow these best practices:

  • Pair Enterprise Edition pools only with other Enterprise Edition pools, and Standard Edition pools only with other Standard Edition pools.
  • Pair physical pools only with other physical pools, and pair virtual pools only with other virtual pools.

Each pool in a pair should have the capacity to serve all users from both pools in the event of a disaster.
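
The pairing guidelines above can be expressed as a simple planning check. The following is a minimal sketch, not a Lync Server tool: the Pool record, its fields, and the pairing_problems function are hypothetical names introduced here for illustration, and the user counts in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """Hypothetical planning record for a Front End pool (not a Lync object)."""
    name: str
    edition: str      # "Enterprise" or "Standard"
    virtual: bool     # True if the pool runs on virtual machines
    homed_users: int  # users normally homed on this pool
    capacity: int     # users the pool can serve at once


def pairing_problems(a: Pool, b: Pool) -> list[str]:
    """Return the pairing recommendations that the pair (a, b) does not meet."""
    problems = []
    if a.edition != b.edition:
        problems.append("editions differ")
    if a.virtual != b.virtual:
        problems.append("physical pool paired with virtual pool")
    # After a failover, one pool serves the users of both pools.
    combined = a.homed_users + b.homed_users
    for pool in (a, b):
        if pool.capacity < combined:
            problems.append(f"{pool.name} cannot serve all {combined} users")
    return problems
```

For example, pairing an Enterprise Edition pool that homes 20,000 users with one that homes 25,000 users passes the check only if each pool can serve 45,000 users; an empty list means the pair meets all three recommendations.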

The solution also supports the Central Management store. If one pool in a pair contains the Central Management store, a backup Central Management store database is created in the backup pool, and Central Management store services are installed in both pools. At any point in time, one of the two Central Management store databases is the active master, and the other is a standby. The Backup Service replicates the content from the active master to the standby.

Failover and Failback

Disaster recovery procedures, both failover and failback, are manual. If there is a disaster, the administrator manually invokes the failover procedures using Lync Server Management Shell cmdlets.

Recovery Time for Failover and Failback

For pool failover and pool failback, the engineering target for recovery time objective (RTO) is 30 minutes. This is the time required for the failover process to happen, after administrators have determined there was a disaster and initiated the failover procedures. It does not include the time for administrators to assess the situation and make a decision, nor does it include the time for users to sign in again after failover is complete.

For pool failover and pool failback, the engineering target for recovery point objective (RPO) is 30 minutes. This is the amount of data, measured in time, that could be lost in a disaster because of replication latency in the Backup Service. For example, if a pool goes down at 10:00 A.M., and the RPO is 30 minutes, data written to the pool between 9:30 A.M. and 10:00 A.M. might not have replicated to the backup pool, and would be lost.
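
The RPO example above is simple date arithmetic. The sketch below reproduces it; the function name at_risk_window is hypothetical, and the calendar date is arbitrary because only the time of day matters.

```python
from datetime import datetime, timedelta


def at_risk_window(failure_time: datetime, rpo: timedelta) -> tuple[datetime, datetime]:
    """Return (start, end) of the period whose writes may not have replicated."""
    return failure_time - rpo, failure_time


# Pool goes down at 10:00 A.M. with a 30-minute RPO (date chosen arbitrarily).
start, end = at_risk_window(datetime(2012, 8, 29, 10, 0), timedelta(minutes=30))
print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))  # 09:30 to 10:00
```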

The RTO and RPO numbers assume that the two data centers are located within the same world region with high-speed, low-latency transport between the two sites. These numbers are measured for a pool with 40,000 concurrently active users and 200,000 users enabled for Lync. They also assume the failover occurs during a time of average usage, with no backlog in data replication.

User Experience During Failover

If a pool fails and failover is invoked, all users of the affected pool are forced to sign out and then sign in to the backup pool. For a brief period during the failover process, users who sign in to the backup pool may be in resiliency mode. After the failover is complete, all users can get all services from the backup pool.

Most user sessions are disrupted when a pool fails, and the user must re-establish those sessions after failover to continue. The exceptions are peer-to-peer voice and video calls, which should continue uninterrupted because of Lync Server’s voice resiliency features.

Users are not rehomed during failover or failback. Users who are homed on a pool that fails are temporarily serviced by the backup pool. When the home pool is restored, an administrator can fail back users to their original home pool.
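
The distinction between a user's home pool and the pool that currently serves the user can be illustrated with a toy model. This is a sketch only: PoolPair, its methods, and the user and pool names are hypothetical and do not correspond to Lync Server objects or cmdlets.

```python
class PoolPair:
    """Toy model of two paired Front End pools (names are hypothetical)."""

    def __init__(self, pool_a: str, pool_b: str) -> None:
        self.partner = {pool_a: pool_b, pool_b: pool_a}
        self.failed_over = set()  # pools whose users are served by the partner

    def fail_over(self, pool: str) -> None:
        self.failed_over.add(pool)

    def fail_back(self, pool: str) -> None:
        self.failed_over.discard(pool)

    def serving_pool(self, home_pool: str) -> str:
        """Return the pool that currently serves users homed on home_pool."""
        if home_pool in self.failed_over:
            return self.partner[home_pool]
        return home_pool


home = {"alice": "PoolA", "bob": "PoolB"}  # home pool assignments never change
pair = PoolPair("PoolA", "PoolB")

pair.fail_over("PoolA")
during = {user: pair.serving_pool(pool) for user, pool in home.items()}

pair.fail_back("PoolA")
after = {user: pair.serving_pool(pool) for user, pool in home.items()}
```

In this model, failover and failback change only which pool answers for a home pool. The home dictionary is never modified, which mirrors the statement that users are not rehomed.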

Backup Service

Backup Service is a new feature in Lync Server 2013 Preview, designed to support the disaster recovery solution. It is installed on a Front End pool only when you pair the pool with another Front End pool.

The Backup Service is active on one Front End Server in each pool, with the other Front End Servers in the pool serving as standbys. It uses a cookie-based replication process.

When user or conference data in Pool A is updated, the active Front End Server in Pool A sends these updates to Pool B.

When Pool B receives the changes, it imports them and then sends an acknowledgement cookie to Pool A.

When Pool A receives the cookie, it checks to see whether there have been any new changes made in Pool A since that data was sent. If there are new changes, Pool A immediately sends these latest changes to Pool B. Pool B imports the changes and sends another cookie. This way, there is a constant pipeline of data replication during times of frequent changes.

If Pool A receives a cookie and there have not been new changes, then the Backup Service in Pool A waits for a sync interval duration before again checking for changes. The default value for the sync interval is two minutes.

Additionally, when the Backup-CsPool or Invoke-CsPoolFailover cmdlets are run, they trigger the Backup Service to check for changes and send them to the paired pool.

The same process is simultaneously running to replicate changes from Pool B to Pool A as well.
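
The replication steps above can be sketched for one direction (Pool A to Pool B) as a small simulation. This is an illustrative model under stated assumptions, not the Backup Service implementation: PoolStore and run_replication are hypothetical names, the cookie is modeled as a plain sequence number because the article does not describe its contents, and updates that arrive while a batch is in flight are injected through a parameter instead of real concurrency.

```python
SYNC_INTERVAL_SECONDS = 120  # default sync interval from the text: two minutes


class PoolStore:
    """Toy model of one pool's user and conference data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}     # all data held by this pool
        self.pending = {}  # local changes not yet sent to the paired pool

    def update(self, key, value):
        self.data[key] = value
        self.pending[key] = value

    def export_changes(self):
        """Hand over all pending changes and clear the pending set."""
        batch, self.pending = self.pending, {}
        return batch

    def import_changes(self, batch, cookie):
        """Apply a batch from the paired pool and acknowledge with the cookie."""
        self.data.update(batch)
        return cookie


def run_replication(source, target, arriving_during_send=()):
    """Send batches until a cookie arrives and the source has no new changes.

    arriving_during_send[i] holds updates that land on the source while
    batch i is in flight. Returns the number of batches sent.
    """
    arrivals = list(arriving_during_send)
    sent = 0
    while source.pending:
        batch = source.export_changes()
        if sent < len(arrivals):  # simulate a busy pool
            for key, value in arrivals[sent].items():
                source.update(key, value)
        target.import_changes(batch, cookie=sent + 1)  # acknowledgement
        sent += 1
    # Nothing pending: the real service would now wait for the sync
    # interval (SYNC_INTERVAL_SECONDS) before checking for changes again.
    return sent


pool_a, pool_b = PoolStore("PoolA"), PoolStore("PoolB")
pool_a.update("alice/contacts", ["bob"])
batches = run_replication(pool_a, pool_b,
                          arriving_during_send=[{"conf/42": "agenda.docx"}])
```

Here the update that arrives during the first send is shipped immediately in a second batch, without waiting for the sync interval, which is the pipelining behavior described above. Running the same function with the arguments swapped would model replication from Pool B to Pool A.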

Mirroring Support for Back End Server High Availability

Lync Server 2013 Preview also adds support for the use of synchronous mirroring for your Back End databases. Setting up mirroring is optional, and is fully supported by Topology Builder.

When you deploy server mirroring, all Lync Server databases in the pool are mirrored, including the Central Management store, if it is located in this pool, as well as the Response Group application database and the Call Park application database, provided those applications are running in the pool.

With SQL mirroring, you do not need to use shared storage for the servers. Each server keeps its copy of the databases in local storage.

You may choose to deploy SQL mirroring with or without a witness. We recommend using a witness because it enables automatic failover of the Back End Server. Otherwise, an administrator must manually invoke failover. Note: even if a witness is deployed, an administrator can manually invoke Back End Server failover, if necessary.

If you use a witness, you can use a single witness for multiple pairs of Back End Servers. There is no strict 1:1 correspondence between witnesses and pairs of Back End Servers. Deployments that use a single witness for multiple pairs of Back End Servers are slightly less resilient than topologies with a separate witness for each Back End Server pair.

Recovery Time for Automatic Back End Server Failover

For automatic Back End failover, the engineering target for both recovery time objective (RTO) and recovery point objective (RPO) is 5 minutes. Because SQL mirroring is synchronous, we do not anticipate data loss during Back End Server failures, except on rare occasions when both the Front End Servers and the Back End Server go down simultaneously while data is being moved between the servers.

Summary

The new disaster recovery feature addresses one of the most frequent requests from users of past versions of Lync Server. It is a scalable solution with engineering targets of 30 minutes for both RTO and RPO. Additionally, the new server mirroring support removes the previous dependency on SQL clustering and SAN-based shared storage solutions for Back End Server high availability.

Keywords: disaster recovery; high availability; HADR; pool pairing; backup service; mirroring