As everyone knows by now, with the release of Lync Server 2013 we introduced much improved chat functionality, named Persistent Chat (pChat). Not only is it much improved, but it is also an actual role within the topology (no longer a standalone application). One thing that I get asked about a lot regarding pChat is High Availability and Disaster Recovery (HA-DR). Hopefully, with this two-part article, I can help answer some of those questions.
Before we get into how to configure HA-DR, let’s first look at the basic architecture. Our design guidelines state that each pChat pool can have up to 8 servers (4 Active, 4 Standby), with each of the 4 Active servers supporting 20,000 concurrent users. Based on our default user model, this totals 80,000 concurrent users per pChat deployment, although up to 150,000 users can be provisioned. Having multiple Persistent Chat Server pools does not give you more scale: you can still have only 80,000 concurrently connected users across all your Persistent Chat Server pools. The primary reason for supporting multiple Persistent Chat Server pools is to address regulatory concerns.
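To make the capacity math concrete, here is a minimal Python sketch of the numbers quoted above. The function and constant names are my own for illustration (they are not part of any Lync tooling), and the figures assume the default user model.

```python
# Figures taken from the design guidelines quoted above (default user model).
# All names below are hypothetical and exist only for this illustration.
MAX_ACTIVE_SERVERS = 4            # active servers per pChat pool
USERS_PER_ACTIVE_SERVER = 20_000  # concurrent users per active server
MAX_CONCURRENT_USERS = MAX_ACTIVE_SERVERS * USERS_PER_ACTIVE_SERVER


def pool_capacity(active_servers: int) -> int:
    """Concurrent users one pool supports with this many active servers."""
    if not 0 <= active_servers <= MAX_ACTIVE_SERVERS:
        raise ValueError("a pool supports at most 4 active servers")
    return active_servers * USERS_PER_ACTIVE_SERVER


def deployment_capacity(active_per_pool: list[int]) -> int:
    """Concurrent users across all pools; extra pools do not add scale."""
    total = sum(pool_capacity(n) for n in active_per_pool)
    return min(total, MAX_CONCURRENT_USERS)
```

For example, `deployment_capacity([4])` and `deployment_capacity([4, 4])` both come out to 80,000, which reflects the point that a second pool does not raise the concurrent-user ceiling.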
Some important points to keep in mind with pChat HA-DR:
Figure 1: Lync Server 2013 pChat Architecture
pChat High Availability (Single Datacenter)
Let’s start by looking at pChat High Availability within a single or centralized datacenter (see Figure 2 below). I have broken out all the services into specific layers. The services contribute to our overall HA solution as follows:
Figure 2: Lync Server 2013 pChat HA (Single Datacenter)
** Production pChat deployments should have a dedicated SQL Instance**
pChat Disaster Recovery (Multiple Datacenters)
Figure 3: Lync Server 2013 pChat Disaster Recovery (Multiple Datacenters)
All of our discussions about pChat Disaster Recovery are going to reference the Topology Diagram in Figure 3. As we did previously, let’s break each service out individually, but this time as it pertains to Disaster Recovery.
Figure 4: Stretched Pool configuration with Low Bandwidth\High Latency between datacenters
b. Stretched Pool configuration with High Bandwidth\Low Latency between datacenters – active pChat servers should be distributed evenly between datacenters. Note that there are 2 active pChat servers in each datacenter.
Figure 5: Stretched Pool configuration with High Bandwidth\Low Latency between datacenters
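The even split described for the High Bandwidth\Low Latency case can be sketched in a few lines of Python. This is only an illustration of the layout: the server names are hypothetical, and in a real deployment you would mark the active servers through the Lync management tools rather than a script like this.

```python
def pick_active_servers(
    servers_by_datacenter: dict[str, list[str]], total_active: int = 4
) -> dict[str, list[str]]:
    """Split the active pChat servers evenly across the datacenters."""
    datacenters = list(servers_by_datacenter)
    if not datacenters:
        raise ValueError("at least one datacenter is required")
    if total_active % len(datacenters) != 0:
        raise ValueError("active servers cannot be split evenly")
    per_datacenter = total_active // len(datacenters)

    active = {}
    for name in datacenters:
        servers = servers_by_datacenter[name]
        if len(servers) < per_datacenter:
            raise ValueError(f"{name} does not have enough servers")
        # The first servers listed become active; the rest stay on standby.
        active[name] = servers[:per_datacenter]
    return active


# Hypothetical stretched pool: 8 servers, 4 in each datacenter.
stretched_pool = {
    "DatacenterA": ["pchat-a1", "pchat-a2", "pchat-a3", "pchat-a4"],
    "DatacenterB": ["pchat-b1", "pchat-b2", "pchat-b3", "pchat-b4"],
}
active_layout = pick_active_servers(stretched_pool)
```

With the two-datacenter pool above, `active_layout` holds 2 active servers in each datacenter, matching the layout in Figure 5, and the remaining 4 servers act as standbys.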
3. Backend Services – for pChat DR, SQL log shipping is used to replicate backend data from one datacenter to the other. This provides an up-to-date replica of our MGC and MGCCOMP DBs across datacenters.
In part 2 of this article, I am going to focus heavily on different DR failure scenarios, including complete site outages (all services), pChat Server failover, FE Pool failure, setting active pChat servers, and failback. This will include both the individual steps for each scenario and a detailed description of why we are performing each step. Stay tuned…
Understand HADR in Lync Server 2013 - link
Planning Front End Pool Pairing - link
SQL Log Shipping - link link
pChat Capacity Planning (User model) - link
pChat Architecture - link