Exchange Server 2013 Cumulative Update 1 (CU1) has been released and is now available for download! CU1 is the first release to use the servicing model introduced with Exchange 2013. It includes new features and functionality, as well as bug fixes, several of them in the area of high availability. The announcement post on the Exchange Team Blog already has some great information on what’s new in CU1, but I wanted to augment that announcement with some additional details. Below are some of the high availability-related changes in CU1. This is by no means an exhaustive list, just some of the changes that we have made.
I first wrote about this issue back in June 2011. In this issue, the system displays an incorrect warning message when you use a non-Exchange server as your witness server, even when you have configured everything correctly. The issue was eventually fixed in Exchange 2010 Service Pack 2 RU5, but the fix didn’t make its way into Exchange 2013 RTM. Fortunately, it did make its way into CU1.
Exchange 2013 continues the innovation introduced in Exchange 2010 by including functionality that allows the system to self-recover from failures that affect resiliency or redundancy. In addition to the Exchange 2010 self-recovery behaviors, Exchange 2013 RTM includes additional behaviors for long I/O times, excessive memory consumption by the Microsoft Exchange Replication service (MSExchangeRepl.exe), and severe cases where threads can't be scheduled. For example, every 30 seconds the Exchange Replication service sends a heartbeat to the crimson channel, because the crimson channel is a required component for normal operations. If this heartbeat fails, which indicates that the crimson channel is inaccessible for some reason, the Exchange Replication service self-recovers by forcibly rebooting the server, thereby triggering a server failover.
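To make the timing of the crimson channel heartbeat concrete, here is a minimal Python model. It is an illustration, not Exchange code: it assumes the first heartbeat happens at time zero and that a single failed heartbeat is enough to trigger the forced reboot, and the function name reboot_time is hypothetical.

```python
HEARTBEAT_INTERVAL_SECONDS = 30  # heartbeat period from the description above


def reboot_time(heartbeat_results):
    """Return the elapsed seconds at which this model forces a reboot, or None.

    heartbeat_results: booleans, one per heartbeat, in order; True means the
    crimson channel answered. Assumptions (not from Exchange): the first
    heartbeat happens at t = 0, and one failed heartbeat triggers recovery.
    """
    for index, reachable in enumerate(heartbeat_results):
        if not reachable:
            # Heartbeat number `index` failed: recover by rebooting, which in
            # a DAG causes the server's active databases to fail over.
            return index * HEARTBEAT_INTERVAL_SECONDS
    return None  # every heartbeat succeeded, so no recovery action


# Two good heartbeats, then a failure on the third (at t = 60 s).
print(reboot_time([True, True, False, True]))
```

In this model, the worst-case delay between the crimson channel becoming unreachable and the recovery action is one heartbeat interval.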
In addition to the behaviors in Exchange 2013 RTM, CU1 includes new behaviors:
Automatic reseed, or AutoReseed, is a feature that replaces what is normally an administrator-driven action in response to a disk failure, a database corruption event, or any other issue that requires reseeding a database copy. When properly configured, AutoReseed is designed to automatically restore database redundancy after a disk failure by using spare disks that have been provisioned on the system.
CU1 includes numerous fixes to AutoReseed, including fixes for issues in which AutoReseed did not correctly detect spare disks or did not use the spare disks it had detected. In addition, the following enhancements have been made to AutoReseed:
As a result of these and other changes, the AutoReseed workflow has changed in CU1. The primary input condition for the AutoReseed workflow is still a database copy that has been in a Failed and Suspended (F&S) state for 15 consecutive minutes. When that condition is detected, the following AutoReseed workflow is initiated:
Once all retries are exhausted, the workflow stops. If the database copy is still F&S after 3 days, the workflow state is reset and the workflow starts again from Step 1. This reset/resume behavior is useful (and intentional) because it can take a few days to replace a failed disk, controller, or other component.
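The two timing rules above (the 15-consecutive-minute trigger and the 3-day reset) can be sketched as a small Python model. This is not how Exchange implements the workflow: the state label FailedAndSuspended, the function names, and the list of timestamped status samples are hypothetical, and the model assumes the 3-day wait is measured from the moment the workflow stopped. The individual workflow steps are deliberately left out.

```python
from datetime import datetime, timedelta

# Thresholds taken from the description above.
TRIGGER_WINDOW = timedelta(minutes=15)  # consecutive F&S time before AutoReseed starts
RESET_DELAY = timedelta(days=3)         # wait before a stopped workflow restarts at Step 1

FAILED_AND_SUSPENDED = "FailedAndSuspended"  # hypothetical state label for this model


def should_start_autoreseed(samples, now):
    """Return True if the copy has been F&S continuously for at least TRIGGER_WINDOW.

    samples: list of (timestamp, state) tuples sorted by timestamp.
    Any non-F&S sample breaks the run, so only the latest unbroken run counts.
    """
    run_start = None
    for timestamp, state in samples:
        if state == FAILED_AND_SUSPENDED:
            if run_start is None:
                run_start = timestamp  # beginning of the current consecutive run
        else:
            run_start = None  # copy left F&S; the 15-minute clock starts over
    return run_start is not None and now - run_start >= TRIGGER_WINDOW


def should_reset_workflow(stopped_at, still_failed_and_suspended, now):
    """Return True if a stopped workflow should be reset and started again from Step 1.

    Assumption: the 3-day wait is measured from when the workflow stopped.
    """
    return still_failed_and_suspended and now - stopped_at >= RESET_DELAY


# The copy went F&S at 12:05 and is still F&S at 12:20, so the trigger fires.
t0 = datetime(2013, 4, 2, 12, 0)
history = [(t0, "Healthy"), (t0 + timedelta(minutes=5), FAILED_AND_SUSPENDED)]
print(should_start_autoreseed(history, t0 + timedelta(minutes=20)))
```

Treating any healthy sample as a break in the run is what makes the 15 minutes "consecutive" in this model; a copy that flaps between states never accumulates enough time to start the workflow.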
The Update-MailboxDatabaseCopy cmdlet includes some new parameters in CU1 that are designed to help automate seeding operations. These parameters include:
The Set-DatabaseAvailabilityGroup cmdlet includes a new parameter named SkipDagValidation. It is used to bypass the validation of the DAG's quorum model and the health check on the DAG's witness during certain DAG configuration operations. We introduced this parameter because it is useful to us in Exchange Online. Although it is also enabled for on-premises use, it won’t be of much use in on-premises environments; I’m pointing it out only because it is available there.
The Get-ServerHealth and Get-HealthReport cmdlets are used to get and process raw health set data from Managed Availability, the new monitoring and recovery framework used by the various components within Exchange. Get-ServerHealth can be used to view the various health sets and their current status. In Exchange 2013 RTM, the Get-HealthReport cmdlet consumed results from Get-ServerHealth to produce a summary rollup of health. However, that implementation made it very slow and inefficient.
With CU1, instead of requiring you to pipe Get-ServerHealth to Get-HealthReport, Get-HealthReport can report the consolidated results on its own, and it now takes an Identity parameter that lets you specify a server instead of using InputObject/InputEntries. Get-HealthReport also includes a new HealthSet parameter, which returns the health state for a group of monitors. However, to use a rollup group, you must pipe a list of server names to Get-HealthReport. Because Get-HealthReport -Identity does not accept an array of names, we recommend simply getting the list of DAG members and piping that list to Get-HealthReport. For example, to display a rollup summary of transport health on the members of a DAG, run:
(Get-DatabaseAvailabilityGroup DAG1).Servers | Get-HealthReport -RollupGroup -HealthSet HubTransport
CU1 also adds two parameters to Get-ServerHealth:
Best Copy and Server Selection (BCSS) is the algorithm used by Active Manager in Exchange 2013 to select the best database copy to activate in response to a failover or a target-less switchover. In CU1, the Primary Active Manager (PAM) keeps track of the number of active databases per server so that BCSS can honor the value of MaximumActiveDatabases, if it is configured. The server holding the PAM role keeps this count in memory. When the PAM role moves, or when the Exchange Replication service is restarted on the PAM, the count is rebuilt from the cluster database.
This change allows Active Manager to exclude servers that are already hosting their maximum number of active databases when it determines potential candidates for activation. Before this change, Active Manager did not check whether a potential candidate server was already at its configured active database limit. If such a server was selected, activation failed during the mount attempt, and a new server had to be selected (if one was available). This change avoids that scenario.
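The candidate exclusion can be pictured with a short Python model. It illustrates the idea only and is not Active Manager's implementation: the Server class, the function names, and the representation of the cluster database as a simple database-to-server mapping are hypothetical, and the model covers just the filtering step, not how BCSS ranks the remaining copies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Server:
    name: str
    # None models "MaximumActiveDatabases is not configured" (no limit).
    max_active_databases: Optional[int] = None


def rebuild_active_counts(active_server_by_database: Dict[str, str]) -> Counter:
    """Rebuild the in-memory per-server count of active databases.

    Models the PAM rebuilding its count when the role moves or the
    Replication service restarts; the mapping stands in for the cluster database.
    """
    return Counter(active_server_by_database.values())


def eligible_candidates(candidates: List[Server], active_counts: Counter) -> List[Server]:
    """Drop candidates that already host their configured maximum of active databases."""
    return [
        server
        for server in candidates
        if server.max_active_databases is None
        or active_counts[server.name] < server.max_active_databases  # Counter yields 0 if absent
    ]


servers = [Server("EX1", 2), Server("EX2"), Server("EX3", 1)]
counts = rebuild_active_counts({"DB1": "EX1", "DB2": "EX1", "DB3": "EX3"})
print([s.name for s in eligible_candidates(servers, counts)])
```

Because the filter runs before a server is chosen, a server at its limit is never picked in this model, so the failed-mount-and-reselect path described above cannot occur.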
Of course, there are other changes in CU1 besides the ones above, so be sure to read the Release Notes and other relevant documentation when everything is released.
Great changes, Scott. Thank you for the update.
Thanks Scott, your blog updates help make these updates and their impacts more understandable!
Any update on an Edge Transport server for Exchange 2013?
The Edge Transport server is not currently available in Microsoft Exchange Server 2013.
Thanks a lot, Scott. Are there more updates in CU1 for CAS?