This is the official blog of the Exchange Server Product Group. All content here is considered authoritative and supported by Microsoft, unless otherwise specified.
Every second on every Exchange 2013 server, Managed Availability polls and analyzes hundreds of health metrics. If something is found to be wrong, most of the time it will be fixed automatically. But of course there will always be issues that Managed Availability won’t be able to fix on its own. In those cases, Managed Availability will escalate the issue to an administrator by means of event logging, and perhaps alerting if System Center Operations Manager is used in tandem with Exchange 2013. When an administrator needs to get involved and investigate the issue, they can begin by using the Get-HealthReport and Get-ServerHealth cmdlets.
Start with Get-HealthReport to find out the status of every Health Set on the server:
Get-HealthReport -Identity <ServerName>
This will result in the following output (truncated for brevity):
In the above example, you can see that the ECP (Exchange Control Panel) Health Set is Unhealthy. Based on the value for MonitorCount, you can also see that the ECP Health Set relies on two Monitors. Let's find out whether both of those Monitors are Unhealthy.
The next step would be to use Get-ServerHealth to determine which of the ECP Health Set Monitors are in an unhealthy state.
Get-ServerHealth -Identity <ServerName> -HealthSet ECP
This results in the following output:
As you can see above, both Monitors are Unhealthy. As an aside, if you pipe the above command to Format-List, you can get even more information about these Monitors.
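When many Health Sets are involved, it can help to export the Get-ServerHealth results (for example by piping them to Export-Csv) and list only the Unhealthy Monitors per Health Set. The sketch below is in Python and assumes the CSV has columns named Name, HealthSetName and AlertValue; verify those column names against your own export before relying on it.

```python
import csv


def unhealthy_monitors(csv_lines):
    """Group Unhealthy Monitors by Health Set.

    csv_lines: lines of a Get-ServerHealth export made with Export-Csv.
    Assumes columns named Name, HealthSetName and AlertValue.
    """
    # Export-Csv in Windows PowerShell may prepend a "#TYPE ..." line; skip it.
    data = (line for line in csv_lines if not line.startswith("#TYPE"))
    by_health_set = {}
    for row in csv.DictReader(data):
        if row["AlertValue"] == "Unhealthy":
            by_health_set.setdefault(row["HealthSetName"], []).append(row["Name"])
    return by_health_set


if __name__ == "__main__":
    # Hypothetical export shaped like the ECP example above.
    sample = [
        '"Name","HealthSetName","AlertValue"',
        '"EacSelfTestMonitor","ECP","Unhealthy"',
        '"EacDeepTestMonitor","ECP","Unhealthy"',
        '"ExampleMonitor","ExampleHealthSet","Healthy"',
    ]
    print(unhealthy_monitors(sample))
```

To read a real export, pass an open file object, for example `open("ServerHealth.csv", newline="")` (the file name is only an illustration).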
Most Monitors are one of these four types:
The EacSelfTestMonitor Probes along the "1" path, while the EacDeepTestMonitor Probes along the "4" path. Since both are unhealthy, it indicates that the problem lies on the Mailbox server in either the protocol stack or the store. It could also be a problem with a dependency, such as Active Directory, which is common when multiple Health Sets are unhealthy. In this case, the Troubleshooting ECP Health Set topic would be the best resource to help diagnose and resolve this issue.
Abram Jackson
Program Manager, Exchange Server
Beginning with Exchange 2007, the Exchange database has had an internal table called EventHistory. This table has been used to track the events on which several of the assistants are based, and for other short-term internal record keeping. The way to query the table hasn't been publicized before, but it has a number of uses:
Events are kept in the EventHistory table for up to 7 days by default. You can check what your retention period is for all databases by running:
Get-MailboxDatabase | fl Name,Event*

Name                        : MainDB
EventHistoryRetentionPeriod : 7.00:00:00
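The EventHistoryRetentionPeriod value is shown in .NET TimeSpan notation ([d.]hh:mm:ss), so 7.00:00:00 means seven days. As a small Python sketch, assuming only that format, you can compute the earliest CreateTime that should still be present before choosing the date for your query:

```python
from datetime import datetime, timedelta


def parse_timespan(value):
    """Parse a .NET TimeSpan string of the form [d.]hh:mm:ss[.fffffff]."""
    days = 0
    head = value.split(":", 1)[0]
    if "." in head:  # a day component is present, e.g. "7.00:00:00"
        day_part, value = value.split(".", 1)
        days = int(day_part)
    hours, minutes, seconds = value.split(":")
    return timedelta(days=days, hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


def oldest_retained(now, retention):
    """Earliest CreateTime that should still be in the EventHistory table."""
    return now - parse_timespan(retention)


print(oldest_retained(datetime(2013, 1, 28, 21, 46), "7.00:00:00"))
# 2013-01-21 21:46:00
```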
There are a number of approaches to querying the table. Let’s start with a script (please review my caveats before actually running the script) and review the data that is displayed. The script is:
Add-PSSnapin Microsoft.Exchange.Management.Powershell.Support
$db = (Get-Mailbox <user alias>).Database
$mb = (Get-Mailbox <user alias>).ExchangeGuid
Get-DatabaseEvent $db -MailboxGuid $mb -ResultSize Unlimited | ? {$_.DocumentId -ne 0 -and $_.CreateTime -ge "<mm/dd/yyyy>"} | fl > c:\temp\EventHistory.txt
For CreateTime, specify the day of the event you are looking for. By default, a maximum of 7 days is tracked. Depending on the date range selected and the activity in the mailbox, the resulting file size starts at about 5 KB, and I have seen it rise to nearly 1 GB. You can also replace "| fl > c:\temp\EventHistory.txt" with "| Export-Csv c:\temp\EventHistory.csv". I am using the FL output because it is easier for illustration purposes.
Inside the EventHistory.txt file will be events like this one (this one is a bulk delete of emails using OWA):
Counter          : 15328155
CreateTime       : 1/28/2013 9:46:16 PM
ItemType         : MAPI_MESSAGE
EventName        : ObjectMoved
Flags            : None
MailboxGuid      : d05f83c1-255c-42ae-b74f-1ac3329b306a
ObjectClass      : IPM.Note
ItemEntryId      : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0700BC9C9BA42124CD4F896E8915C86B2BD00000006027C20000BC9C9BA42124CD4F896E8915C86B2BD0000041B6E6570000
ParentEntryId    : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0100BC9C9BA42124CD4F896E8915C86B2BD00000006027C20000
OldItemEntryId   : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0700BC9C9BA42124CD4F896E8915C86B2BD00000006027BF0000BC9C9BA42124CD4F896E8915C86B2BD0000041B6D6260000
OldParentEntryId : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0100BC9C9BA42124CD4F896E8915C86B2BD00000006027BF0000
ItemCount        : 0
UnreadItemCount  : 0
ExtendedFlags    : 2147483648
ClientCategory   : WebServices
PrincipalName    : Contoso\TestUser
PrincipalSid     : S-1-5-21-915020002-1829042167-1583638127-1930
Database         : Mailbox Database 1858470524
DocumentId       : 10876
The EventName shows what was done with the object. End-user deletes are listed as moves, because when you delete an item it is moved to either Deleted Items or the Recoverable Items subtree.
The ItemEntryId is the most important field here because it ties directly to the item you need to locate. The subject and other human-readable properties are not included in this table. The ItemEntryId is the database engine's way of uniquely identifying each item, and you can use it to search the mailbox in MFCMAPI and get properties like Subject, From, To, etc.
Flags will often show values like SearchFolder. Many events flagged as being related to search folders or folders are not going to be interesting to your investigations. If you are researching the fate of a deleted item they can be ignored.
ClientCategory is the type of client that requested the operation. In this case, WebServices means that OWA was used to remove the item as part of a bulk operation against an Exchange 2010 mailbox. If the item had been deleted individually, Exchange 2010 would list OWA here. ClientCategory is tracked a little differently in Exchange 2013; you should see OWA for all end-user deletes made through that client.
PrincipalName and PrincipalSid give you the identity of the account that was passed to the information store when the operation was requested. At the time of writing these are not displayed by Exchange 2013.
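If you saved the history with Export-Csv instead of FL, a short script can apply the guidance above in bulk: skip events flagged as search folder activity and count moves and deletes by client and principal. This Python sketch assumes the CSV column names match the field names in the sample record above; keep in mind that ObjectMoved also covers ordinary folder moves, so treat the counts as a starting point for review.

```python
import csv
from collections import Counter

# Event names that typically represent a delete (deletes show up as moves).
DELETE_EVENTS = {"ObjectMoved", "ObjectDeleted"}


def summarize_deletes(csv_lines):
    """Count move/delete events by (EventName, ClientCategory, PrincipalName).

    csv_lines: lines of an EventHistory export made with Export-Csv.
    Rows whose Flags mention SearchFolder are skipped.
    """
    data = (line for line in csv_lines if not line.startswith("#TYPE"))
    summary = Counter()
    for row in csv.DictReader(data):
        if "SearchFolder" in (row.get("Flags") or ""):
            continue
        if row.get("EventName") in DELETE_EVENTS:
            key = (row["EventName"], row.get("ClientCategory") or "",
                   row.get("PrincipalName") or "")
            summary[key] += 1
    return summary
```

For example, `summarize_deletes(open(r"c:\temp\EventHistory.csv", newline="")).most_common()` lists the busiest combinations first.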
So – we have an output file. What do we do with it? The easy uses for the file (once it is imported into your favorite data analysis tool) at this time are:
In our output, the ItemEntryId is not immediately useful. To find out what the ItemEntryId in each record actually refers to, we need to use MFCMAPI (steps related to MFCMAPI are at the end of this blog). Once you are in MFCMAPI, go to the Tools menu, select "Entry ID" and then "Open given entry ID". In the dialog that appears, paste the ItemEntryId or the OldItemEntryId that you want to investigate. When you click OK, MFCMAPI will take you to the item you specified (if it is still in the mailbox). There you will see the Subject, From, To, creation date and other meaningful properties. You will also see a property called PR_ENTRYID, which is the MAPI name for ItemEntryId. This field is our link between the representation of the data in our PowerShell cmdlet and the more human-readable presentation in MFCMAPI.
Pulling ItemEntryIDs from the PowerShell output and looking them up one at a time in MFCMAPI may be a little too tedious for most Exchange administrators. If you have more than a handful of items you want to check (to see if they are useful and meaningful) it will take a long time to locate them all.
The alternative is to start in MFCMAPI. If you can find the item you want there by looking at the subject line, date or other properties, you can use the content of the PR_ENTRYID field in MFCMAPI to modify the Get-DatabaseEvent query to pull up the history for just that item. To do this, either you need access to a restored copy of the mailbox in a lab, or the item of interest must still be in the mailbox (possibly in Deleted Items or Recoverable Items). Here is a sample of how the Get-DatabaseEvent cmdlet would be used if you have the PR_ENTRYID:
$id = "000000008CFDF3C2BA873648866A1C17D0E3F1AB0700BC9C9BA42124CD4F896E8915C86B2BD00000006027C20000BC9C9BA42124CD4F896E8915C86B2BD0000041B6E6570000"
Get-DatabaseEvent $db -MailboxGuid $mb -ResultSize Unlimited | ? {$_.ItemEntryId -eq $id -or $_.OldItemEntryId -eq $id} | Export-Csv c:\temp\SingleItemEventHistory.csv
Sometimes I have not been able to locate an item using this technique. If that happens, it is useful to note that the PR_ENTRYID contains the ID of the mailbox, the folder and the item. For example, here is the PR_ENTRYID of an item in the Inbox followed by the PR_ENTRYID of the Inbox itself:
000000006064986ABA58DF40A86C0C67E716264807004885B50069B1D04994374C02417D45A100000000324E00003DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000 - item in the Inbox
000000006064986ABA58DF40A86C0C67E716264801004885B50069B1D04994374C02417D45A100000000324E0000 - Inbox folder
For the sake of comparison here are the PR_ENTRYIDs of two more folders in the same mailbox:
000000006064986ABA58DF40A86C0C67E716264801004885B50069B1D04994374C02417D45A10000000032510000 - Deleted Items folder
000000006064986ABA58DF40A86C0C67E716264801004885B50069B1D04994374C02417D45A100000000324B0000 - IPM_SUBTREE folder
From this you should be able to get an idea of how the field is divided up by looking at where the repeated digits end. For the purpose of tracking down an individual item that may be in a different folder (because of multiple moves) we want to be able to isolate the portion of the PR_ENTRYID that is specific to the item and modify our PowerShell statement appropriately. The final statement would look like this:
$itemPart = "3DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000"
Get-DatabaseEvent $db -MailboxGuid $mb -ResultSize Unlimited | ? {$_.ItemEntryId -like "*$itemPart" -or $_.OldItemEntryId -like "*$itemPart"} | Export-Csv c:\temp\SingleItemEventHistory.csv
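The split can also be done mechanically. The Python sketch below assumes the 70-byte long-term message EntryID layout that the sample IDs above follow (140 hex digits: 4 bytes of flags, a 16-byte provider UID, a 2-byte entry type, then a 24-byte folder part and a 24-byte item part) and returns the trailing item part as a pattern for -like. Note that in the sample record earlier, the item parts of ItemEntryId and OldItemEntryId differ, so a move can change this portion too; treat a miss as inconclusive.

```python
def split_message_entry_id(entry_id):
    """Split a hex message PR_ENTRYID into its parts.

    Assumed layout (140 hex digits):
      flags (4 bytes) | provider UID (16) | entry type (2)
      | folder part: GUID (16) + counter (6) + pad (2)
      | item part:   GUID (16) + counter (6) + pad (2)
    """
    hex_id = "".join(entry_id.split()).upper()  # drop wrapped whitespace
    if len(hex_id) != 140:
        raise ValueError(f"expected 140 hex digits, got {len(hex_id)}")
    return {
        "flags": hex_id[0:8],
        "provider": hex_id[8:40],
        "type": hex_id[40:44],
        "folder": hex_id[44:92],
        "item": hex_id[92:140],
    }


def item_like_pattern(entry_id):
    """Build a -like pattern that matches the item regardless of folder."""
    return "*" + split_message_entry_id(entry_id)["item"]


inbox_item = (
    "000000006064986ABA58DF40A86C0C67E716264807004885B50069B1D04994374C02417D45A1"
    "00000000324E00003DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000"
)
print(item_like_pattern(inbox_item))
# *3DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000
```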
At this point, if we still can't find the item we want, our last options are to remove the -MailboxGuid parameter (meaning we will search all mailboxes in the database, which is a very expensive operation; please review the caveats) or to search other databases in the organization (databases containing delegates of the current user would be the ones to start with). If the data still can't be found, either an error was made in the query or the records are no longer present. If the records are present, you should see all actions taken on the item recently.
Caveats:
You can make the operation less expensive by lowering the number of records returned by Get-DatabaseEvent. We are already specifying the database and mailbox to look in. You can also add the -EventNames and -StartCounter parameters. The latter might be a little tricky: StartCounter is an internal number that is specific to this table in the current database, so you probably won't know what counter value to use until you have already run a query and noted the counter values. This means StartCounter is mostly useful for reducing the impact of your second and subsequent queries of the same table in the same database.
Assuming you know a relevant StartCounter value here is an example of doing this:
Get-DatabaseEvent $db -MailboxGuid $mb -EventNames ObjectModified,ObjectDeleted -StartCounter 15328155 -ResultSize Unlimited | ? {$_.DocumentId -ne 0 -and $_.CreateTime -ge "<mm/dd/yyyy>"} | Export-Csv c:\temp\EventHistory.csv
The example above searches a mailbox on a particular database for the event types specified and ignores any rows with a lower counter value than specified. This smaller dataset is then passed to the PowerShell pipeline for additional filtering and is ultimately saved to a CSV file that you can import into your favorite analysis tool. If you prefer to conduct your analysis in PowerShell you also have the option of assigning the result of Get-DatabaseEvent to a PowerShell variable (just remember the variable and the PowerShell session will consume memory proportional to the resultset returned).
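Because StartCounter values only become known after a first query, one way to keep track of them is to read the highest Counter from your previous CSV export and use it as the -StartCounter for the next run. A minimal Python sketch, assuming the export has a Counter column as in the sample record above; since rows below the start counter are skipped, the row carrying this exact value may be returned again.

```python
import csv


def next_start_counter(csv_lines):
    """Return the highest Counter seen in a previous EventHistory export.

    csv_lines: lines of an export made with Export-Csv.
    Returns None when the export has no rows.
    """
    data = (line for line in csv_lines if not line.startswith("#TYPE"))
    counters = [int(row["Counter"]) for row in csv.DictReader(data)
                if row.get("Counter")]
    return max(counters) if counters else None
```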
So how do you find the PR_ENTRYIDs I mentioned above in MFCMAPI?
You can download MFCMAPI from https://mfcmapi.codeplex.com.
1. We need an Outlook profile for the mailbox we are searching. That profile should NOT be configured for Cached mode. If you are doing this from your machine make sure you have Full Access to the mailbox of the user. You can then create a profile for that specific user.
2. Once you have the profile, open MFCMAPI and log on.
3. Select the profile you created for Step 1. You will see a screen like this one:
4. Double-click the mailbox which will open a window showing you the mailbox details.
5. If you already know the ItemEntryID you want to open and inspect you can locate it with this menu option:
6. If you don't have the ItemEntryId, expand Root Container, Recoverable Items and Top of Information Store. If you are trying to locate details on a deleted item, look in the Deleted Items folder and the Recoverable Items folder (and its subfolders).
7. Double-click Deleted Items to open a window that looks like this one:
8. Click the item to fill in the lower half of the window with the properties
9. Locate the PR_ENTRYID property and double-click it.
10. The Binary box contains the value of the PR_ENTRYID field that you can use to search the EventHistory table in the Store. If you locate this value with MFCMAPI first you can use it to limit the search as I described above. If you don’t have this value you can pull the full history and use the ItemEntryIDs as a basis to search MFCMAPI.
Thanks to Jesse Tedoff for the idea!
Chris Pollitt
Since the initial release of Log Parser Studio (LPS) there have been over 30,000 downloads and thousands of customers use the tool on a daily basis. In Exchange support many of our engineers use the tool to solve real world issues every day and in turn share with our customers, empowering them to solve the same issues themselves moving forward. LPS is still an active work in progress; based on both engineer and customer feedback many improvements have been made with multiple features added during the last year. Below is a short list of new features:
For those who create their own queries this is a real time-saver. We can now import from multiple XML files simultaneously only choosing the queries we wish to import from multiple query libraries or XML files.
The existing feature allowing searching of queries in the library is now context aware meaning if you have a completed query in the query window, the search option searches that query. If you are in the library it searches the library and so on. This allows drilling down into existing query results without having to run a new query if all you want to do is narrow down existing result sets.
All LP 2.2 input and output formats now have preliminary support in LPS. Each format has its own property window containing all known LP 2.2 settings, which can be modified to your liking.
Custom parser support was added for almost all Exchange logs. These are covered by the EEL and EELX log formats included in LPS, which cover Exchange logs from Exchange 2003 through Exchange 2013.
I can't tell you how many times another engineer or I spent lots of time creating the perfect query for a particular issue we were troubleshooting, forgot to save it in the heat of the moment, and lost all that work. No longer! We now have the capability to log every query that is executed to a text file (Query.log). What makes this so valuable is that if you ran it, you can retrieve it.
There are now over 170 queries in the library including new sample queries for Exchange 2013.
You can now export any query as a standalone PowerShell script. The only requirement is that Log Parser 2.2 is installed on the machine you run it on; LPS itself is not required. There are some limitations, but you can essentially use LPS as a query editor/test bed for PowerShell scripts that run Log Parser queries for you!
You can now submit a request to cancel a running query, which in many cases will stop it.
There are now 23 keyboard shortcuts. Be sure to check these out, as they will save you lots of time. To display the shortcuts, press CTRL+K or select Help > Keyboard Shortcuts.
There are literally hundreds of improvements and features, far too many to list here, so be sure to check out our blog series with existing and upcoming tutorials, deep dives and more. If you are installing LPS for the first time, you'll surely want to review the getting started series:
If you are already familiar with LPS and are installing this latest version, you'll want to check out the upgrade blog post here:
Additional LPS articles can be found here:
http://blogs.technet.com/b/karywa/
LPS doesn't require an install so just extract to the folder of your choice and run LPS.EXE. If you have the previous version of LPS and you have added your own custom queries to the library, be sure to export those queries as a backup before running the newest version. See the "Upgrading to LPS V2" blog post above when upgrading.
Kary Wall
We in the Exchange product group get this question from time to time. The first thing we ask in response is always, “What was the customer impact?” In some cases, there is customer impact; these may indicate bugs that we are motivated to fix. However, in most cases there was no customer impact: a service restarted, but no one noticed. We have learned while operating the world’s largest Exchange deployment that it is fantastic when something is fixed before customers even notice. This is so desirable that we are willing to have a few extra service restarts as long as no customers are impacted.
You can see this same philosophy at work in our approach to database failovers since Exchange 2007. The mantra we have come to repeat is, “Stuff breaks, but the user experience doesn’t!” User experience is our number one priority at all times. Individual service uptime on a server is a less important goal, as long as the user experience remains satisfactory.
However, there are cases where Managed Availability cannot fix the problem. In cases like these, Exchange provides a huge amount of information about what the problem might be. Hundreds of things are checked and tested every minute. Usually, Get-HealthReport and Get-ServerHealth will be sufficient to find the problem, but this blog post will walk you through getting the full details from an automatic recovery action to the results of all the probes by:
Every time Managed Availability takes a recovery action, such as restarting a service or failing over a database, it logs an event in the Microsoft-Exchange-ManagedAvailability/RecoveryActionResults crimson channel. Event 500 indicates that a recovery action has begun, and event 501 indicates that the action has completed. These events can be viewed in the MMC Event Viewer, but we usually find it more useful to use PowerShell. All of these Managed Availability recovery actions can be collected in PowerShell with a simple command:
$RecoveryActionResultsEvents = Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults
We can use the events in this format, but it is easier to work with the event properties if we use PowerShell’s native XML format:
$RecoveryActionResultsXML = ($RecoveryActionResultsEvents | Foreach-object -Process {[XML]$_.toXml()}).event.userData.eventXml
Some of the useful properties for this Recovery Action event are:
So for example, if you wanted to know why MSExchangeRepl was restarted on your server around 9:30PM, you could run a command like this:
$RecoveryActionResultsXML | Where-Object {$_.State -eq "Finished" -and $_.ResourceName -eq "MSExchangeRepl" -and $_.EndTime -like "2013-06-12T21*"} | ft -AutoSize StartTime,RequestorName
StartTime                    RequestorName
---------                    -------------
2013-05-12T21:49:18.2113618Z ServiceHealthMSExchangeReplEndpointRestart
The RequestorName property indicates the name of the Responder that took the action. In this case, it was ServiceHealthMSExchangeReplEndpointRestart. Often, the responder name will give you an indication of the problem. Other times, you will want more details.
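If you save the raw event XML (for example the output of each event's ToXml() method) to analyze elsewhere, the same filter can be applied offline. This Python sketch assumes each event carries an EventXML element under UserData whose child elements are named like the properties used above (State, ResourceName, EndTime, StartTime, RequestorName); that layout is inferred from the PowerShell commands, and the sample in the test is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def event_fields(event_xml):
    """Flatten the children of an event's EventXML element into a dict."""
    root = ET.fromstring(event_xml)
    for element in root.iter():
        if local_name(element.tag).lower() == "eventxml":
            return {local_name(child.tag): (child.text or "") for child in element}
    return {}


def finished_actions(events, resource, end_time_prefix):
    """Same test as the Where-Object filter above, applied to saved XML."""
    matches = []
    for event_xml in events:
        fields = event_fields(event_xml)
        if (fields.get("State") == "Finished"
                and fields.get("ResourceName") == resource
                and fields.get("EndTime", "").startswith(end_time_prefix)):
            matches.append((fields.get("StartTime"), fields.get("RequestorName")))
    return matches
```

The `startswith` test plays the role of the trailing `*` in the PowerShell `-like` pattern.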
Monitors are the central part of Managed Availability. They are the primary means, through Get-ServerHealth and Get-HealthReport, by which an administrator can learn the health of a server. Recall that a Health Set is a grouping of related Monitors. This is why much of our troubleshooting documentation is focused on these objects. It will often be useful to know what Monitors and Health Sets are repeatedly unhealthy in your environment.
Every time the Health Manager service starts, it logs events to the Microsoft-Exchange-ActiveMonitoring/ResponderDefinition crimson channel. We can use these events to look up the properties of the Responder identified by the RequestorName property in the last step. First, we need to collect the Responders that are defined:
$DefinedResponders = (Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/ResponderDefinition | % {[xml]$_.toXml()}).event.userData.eventXml
One of these Responder Definitions will match the Recovery Action’s RequestorName. The Monitor that controls the Responder we are interested in is defined by the AlertMask property of that Definition. Here are some of the useful Responder Definition properties:
To get the Monitor for the ServiceHealthMSExchangeReplEndpointRestart Responder, you run:
$DefinedResponders | ? {$_.Name -eq "ServiceHealthMSExchangeReplEndpointRestart"} | ft -a Name,AlertMask
Name                                       AlertMask
----                                       ---------
ServiceHealthMSExchangeReplEndpointRestart ServiceHealthMSExchangeReplEndpointMonitor
Many Monitor names will give you an idea of what to look for. In this case, the ServiceHealthMSExchangeReplEndpointMonitor Monitor does not tell you much more than the Responder name did. The TechNet article on Troubleshooting DataProtection Health Set lists this Monitor and suggests running Test-ReplicationHealth. However, you can also get the exact error messages of the Probes for this Monitor with a couple more commands.
Remember that Monitors have their definitions written to the Microsoft-Exchange-ActiveMonitoring/MonitorDefinition crimson channel, so you can get them in the same way as the Responder definitions in the last step. You can run:
$DefinedMonitors = (Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/MonitorDefinition | % {[xml]$_.toXml()}).event.userData.eventXml
Some useful properties of a Monitor definition are:
To get the SampleMask for the identified Monitor, you can run:
($DefinedMonitors | ? {$_.Name -eq 'ServiceHealthMSExchangeReplEndpointMonitor'}).SampleMask
ServiceHealthMSExchangeReplEndpointProbe
Now that we know what Probes to look for, we can search the Probes’ definition channel. Useful properties for Probe Definitions are:
To get definitions of this Monitor’s Probes, you can run:
(Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/ProbeDefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ? {$_.Name -like "ServiceHealthMSExchangeReplEndpointProbe*"} | ft -a Name, TargetResource
Name                                                   TargetResource
----                                                   --------------
ServiceHealthMSExchangeReplEndpointProbe/ServerLocator MSExchangeRepl
ServiceHealthMSExchangeReplEndpointProbe/RPC           MSExchangeRepl
ServiceHealthMSExchangeReplEndpointProbe/TCP           MSExchangeRepl
Remember, not all Monitors use synthetic transactions via Probes. See this blog post for the other ways Monitors collect their information.
This Monitor has three Probes that can cause it to become Unhealthy. Each Probe's name starts with the Monitor's SampleMask and is followed by a suffix that differentiates it. When you get the Probe Results in the next step, each Probe's TargetResource will also be appended to its ResultName.
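The lookups in the last few steps form a chain: Responder, then its AlertMask Monitor, then the Probes matching that Monitor's SampleMask. A Python sketch of the chain, assuming the definitions have been exported as dicts with the Name, AlertMask, SampleMask and TargetResource properties used above, and assuming ResultName is the Probe name plus "/" plus TargetResource (inferred from the ResultName used in the next step):

```python
def probe_result_names(responder_name, responders, monitors, probes):
    """Follow Responder -> Monitor -> Probes and build Probe ResultNames."""
    responder = next((r for r in responders if r["Name"] == responder_name), None)
    if responder is None:
        raise LookupError(f"no Responder named {responder_name}")
    # The Responder's AlertMask names the Monitor that drives it.
    monitor = next((m for m in monitors if m["Name"] == responder["AlertMask"]), None)
    if monitor is None:
        raise LookupError(f"no Monitor named {responder['AlertMask']}")
    # Probe names start with the Monitor's SampleMask (the -like "...*" test above).
    return [f"{p['Name']}/{p['TargetResource']}"
            for p in probes if p["Name"].startswith(monitor["SampleMask"])]
```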
We now know all the Probes that could have failed, but we don't yet know which one did or why.
There are many Probes and they execute often, so the channel where their results are logged (Microsoft-Exchange-ActiveMonitoring/ProbeResult) generates a lot of data. Often only a few hours of data will be available, but the Probes we are interested in will probably have a few hundred Result entries in that window. Here are some of the Probe Result properties you may be interested in for troubleshooting:
Some Probes may use some of the other available fields to provide additional data about failures.
We can use XPath to filter the large number of events down to just the ones we are interested in: those with the ResultName we identified in the last step and with a ResultType of 4, indicating that they failed:
$replEndpointProbeResults = (Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath "*[UserData[EventXML[ResultName='ServiceHealthMSExchangeReplEndpointProbe/RPC/MSExchangeRepl'][ResultType='4']]]" | % {[XML]$_.toXml()}).event.userData.eventXml
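To check all three Probes at once, the -FilterXPath string can be generated for several ResultNames. This Python sketch follows the shape of the query above and joins the ResultName tests with `or`; that extension is an assumption based on the XPath subset the Windows event log accepts, so test the generated string before relying on it. Names containing an apostrophe are rejected because they would break the quoting.

```python
def probe_failure_xpath(result_names, result_type=4):
    """Build a -FilterXPath query for failed results of one or more Probes."""
    if not result_names:
        raise ValueError("need at least one ResultName")
    if any("'" in name for name in result_names):
        raise ValueError("ResultName values must not contain apostrophes")
    names = " or ".join(f"ResultName='{name}'" for name in result_names)
    return f"*[UserData[EventXML[({names})][ResultType='{result_type}']]]"


print(probe_failure_xpath([
    "ServiceHealthMSExchangeReplEndpointProbe/RPC/MSExchangeRepl",
    "ServiceHealthMSExchangeReplEndpointProbe/TCP/MSExchangeRepl",
]))
```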
To get a nice graphical view of the Probe’s errors, you can run:
$replEndpointProbeResults | select -Property *Time,Result*,Error*,*Context,State* | Out-GridView
In this case, the full error message for both Probe Results suggests making sure the MSExchangeRepl service is running. This actually is the problem, as for this scenario I restarted the service manually.
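With a few hundred Result entries, it can be easier to group failures by error text than to read them one by one. This Python sketch assumes each Probe Result has been exported as a dict with Error and ExecutionStartTime keys (assumed property names; match them to what your export contains) and that times are ISO 8601 strings in a single format, so string comparison orders them.

```python
def summarize_errors(probe_results):
    """Group failed Probe Results by error text, with counts and time range."""
    summary = {}
    for result in probe_results:
        error = (result.get("Error") or "").strip()
        when = result.get("ExecutionStartTime") or ""
        entry = summary.setdefault(error, {"count": 0, "first": when, "last": when})
        entry["count"] += 1
        entry["first"] = min(entry["first"], when)
        entry["last"] = max(entry["last"], when)
    return summary
```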
This article has taken a detailed look at the incredible amount of information you have access to about the health of Exchange servers. Hopefully, you will not often need it! In most cases, the alerts will be enough notification and the included cmdlets will be sufficient for investigation.
Managed Availability is built and hardened at scale, and we continuously analyze these same events collected in this article so that we can either fix root causes or write Responders to fix more problems before users are impacted. In those cases where you do need to investigate a problem in detail, we hope this post is a good starting point.
We just released a downloadable PDF version of the Exchange Server 2013 Architecture Poster. This is the poster that we handed out at the Office booth and in various Exchange 2013 breakout sessions last week at TechEd North America 2013 in New Orleans, LA. We’ll also be handing out printed copies of the poster at TechEd Europe 2013 in Madrid, Spain in a couple of weeks.
While we cannot provide printed copies for everyone, you can download the PDF file and take it to your favorite printer/copy center, and have them print it for you. It is designed to be printed in 36” x 24” format.
This poster highlights the significantly updated and modernized architecture in Exchange 2013, including new technologies such as Managed Availability, the new storage and high availability features, and integration with SharePoint and Lync. In addition, it illustrates the new transport architecture in Exchange 2013.
A zoom.it version of the poster can be found at http://zoom.it/BuoF.
We welcome your feedback on the poster. If you have any, please feel free to send it to eapf@microsoft.com.
Scott Schnoll
It is no secret that if you are an Exchange/Office 365 administrator, you will have to troubleshoot Outlook connectivity at some point. Whether you use Exchange Online, on-premises Exchange, or some combination of both, you will inevitably run into an issue with Outlook performance, connectivity, profile corruption, or some other unknown Outlook disease before retirement.
To assist you with these issues, we have released a Guided Walk Through (GWT) for troubleshooting Outlook Connectivity issues in Office 365. There are a couple of ways to access the troubleshooter. You can access it directly at: http://aka.ms/outlookconnectivity
In addition, it will be embedded in various Outlook connectivity technical resources such as the following:
The purpose of this walk through is to assist you in resolving these complex issues by focusing on the scoping and steps used to isolate and resolve problems. Therefore the walk through starts by focusing on commonly encountered symptoms related to Outlook connectivity.
Consider that there might not be a single solution, but a combination of factors contributing to the problem. Following the walk through will allow you to isolate and remedy the most common causes of Outlook connectivity issues to Office 365.
This walk through is not meant to replace all of the data that helps you understand Outlook connectivity issues, but rather to quickly give you the steps you need to find the solution. The walk through covers all versions of Office 365.
I wanted to thank the people who helped make this a reality. Here are the parties involved (that I am aware of):
Exchange/Outlook support:
Documentation / content creation teams:
Nagesh Mahadev
Over the past year, we have discussed the architectural changes introduced in Exchange Server 2013. I wrote about the reduction in complexity that the new server role architecture introduces, as well as one of the new capabilities introduced in Exchange 2013, Managed Availability's recovery-oriented computing. However, we haven't been clear about other architectural changes that have shaped decisions we've made about the Exchange 2013 product, such as the decision to reduce the number of databases supported per server from 100 to 50. There were three main reasons for this:
Let me explain each of these in more detail.
Exchange 2013 includes fundamental changes to the search and store components and to how data is processed and rendered.
The old content indexing service was replaced with Search Foundation. Search Foundation is an actively developed search platform that is used across the Office Server products. Search Foundation allows us to have notification-driven content indexing, which improves indexing performance; in addition, we now annotate messages during transport, significantly reducing the number of times a message must be indexed.
The monolithic store.exe process was re-written; store is now written in managed code and there are now at least three processes that make up the Information Store service: The Microsoft Exchange Replication service, the Information Store service process controller, and the Information store worker process. By utilizing the worker process model, each database is now isolated from every other database (e.g., a database crashing due to a malformed message will not bring down the rest of the databases on the server).
In addition, there is a core shift in the server role architecture such that the protocol responsible for servicing a user’s request is the protocol instance that is local to the user’s active mailbox database copy. This means that the Mailbox server role now performs more work when compared to its Exchange 2010 counterpart.
The end result is that with the server architecture changes we introduced in Exchange 2013, search, store, and the protocols typically can be CPU and memory bound, as opposed to disk IO or capacity bound.
As discussed in our server sizing guidance, we are big fans of commodity server hardware. Office 365 is designed to run on commodity hardware that leverages 2 processor sockets and 12 disks; we do not leverage external storage chassis, as this increases the operational complexity in the environment. Our Exchange 2013 Mailbox servers in Office 365 have fewer than 50 database copies per server.
The last reason as to why we limited support to 50 databases per-server is that we did not have actual deployments at any scale to validate that store, search, the protocols, and Managed Availability could handle 100 databases per-server. Automation and lab testing can only take you so far; the lack of real world usage was one of the key reasons why we chose to limit the database count.
The Exchange Product Group takes pride in the feedback mechanisms we have invested in with the Exchange community. Since the release of Exchange 2013, we’ve received an inordinate amount of feedback regarding the reduction in supported databases per server. The driving response has been “we currently deploy more than 50 databases per server in Exchange 2010; this change means we will need to deploy more servers, which increases our capital expenditures significantly.” Rest assured, that is not the message we want to send with Exchange 2013. It is true that Exchange 2013 uses more CPU and memory than its predecessors; this is due to the architecture changes we’ve made, as well as the changes we’ve made to reduce disk IO so that you can deploy more mailboxes per disk. But we do not want to see architectures artificially limited by the supported databases-per-server constraint.
Over the last several months, we’ve been working to resolve our concerns and improve our test matrices to validate support for more databases per server.
As a result of the work done by the Mailbox Intelligence and Operations teams, I am pleased to announce that when Exchange Server 2013 RTM Cumulative Update 2 (CU2) releases, we are increasing the number of supported databases per server back to 100. Both the Exchange 2013 Server Role Requirements Calculator and our sizing guidance will be updated to reflect this architectural change in tandem with CU2’s release. CU2 will release later this summer.
As always, we continue to identify ways to better serve your needs through our regular servicing releases. We hope you find this architectural change useful. Please keep the feedback coming; we are listening.
Ross Smith IV Principal Program Manager Exchange Customer Experience
As customers move their organizations into the Cloud or choose to coexist, they need to ensure that the basic functionality users have grown accustomed to continues to work. While some of you will move all of your users in a cutover fashion, which reduces complexity, others will choose a more gradual approach. This troubleshooter is for administrators who have chosen the hybrid approach.
Are you seeing the hash marks in your hybrid Exchange environment as depicted below and want to get rid of them? Then this troubleshooter is for you.
We focused this troubleshooter on Free Busy because it is the most commonly used “feature set” in a hybrid deployment. If you resolve issues with Free Busy lookups, many of the other potential issues with your hybrid deployment will be resolved as well.
A Hybrid Deployment consists of an on-premises Exchange server environment that has at least one Exchange 2010 or Exchange 2013 server. In this environment there is also a DirSync (Directory Synchronization) server, and in many cases, a deployment of ADFS (Active Directory Federation Services) to provide single sign-on capabilities to the users.
The idea of the hybrid environment is to allow two separate organizations (Exchange Online and Exchange On-Premises) to feel like one organization. To accomplish this, we rely on a token authorization process that is made possible through a combination of Organization Relationships and Federation Trusts with the Microsoft Federation Gateway.
When this is configured properly, you can do basic things like redirect OWA requests to their proper destination, see “MailTips” for a user, and of course the most common feature, view availability information for another user cross-premises.
To read more about Hybrid Deployments click here.
If you prefer to avoid running into issues in the first place, deploy using the Hybrid Configuration Wizard and the Exchange Deployment Assistant. These tools are designed to get you into an optimal hybrid configuration, which should limit the number of issues you face. However, with all of the moving parts involved and the many variations in on-premises deployments, you could still run into issues.
When working with customers and engineers, we have found that the troubleshooting steps that need to be followed are not very clear. There is confusion about which steps apply when Free Busy works in one direction (Cloud to on-premises) but not in the other (on-premises to Cloud). While searching Bing for answers can definitely lead to a solution, we believe the troubleshooter gets you there faster by targeting solutions at your specific symptom.
The troubleshooter can be found here or at the following simple URL: http://aka.ms/hybridfreebusy
Thanks to Charlotte Raymundo, Nagesh Mahadev, Edgar Quevedo, John Chappelle, Geoffrey Crisp, Star Li and Chen Jiang for their help in creation and review of this troubleshooter.
Timothy Heeney
TechEd North America 2013 happens next week in New Orleans, Louisiana. This year, there are several Exchange and Office 365 break-out sessions and hands-on labs for IT pros and developers, including sessions on Exchange 2013 high availability, virtualization, hybrid deployments, managed availability, retention, archiving & eDiscovery, DLP, site mailboxes, modern public folders, transport, unified messaging, Outlook Web App, EWS, and more!
Recorded sessions are now available on Channel 9. Use the links below to view a session, or head over to TechEd North America 2013 on Channel 9 for more, including the keynote presentation by Brad Anderson.
You can use the Schedule Builder on the TechEd web site to select the sessions you want to attend and sync session info with your Outlook calendar (and have the info handy on your mobile device). For more info, head over to the TechEd North America 2013 web site.
If you’re attending, swing by the Microsoft Office booths to meet Exchange, SharePoint & Office team folks. We’d love to hear from you and answer your Exchange-related questions.
Also check out the following posts from our friends in the Office team:
We look forward to seeing you in New Orleans next week!
Exchange Team
Update 6/13/13: we added a known issue with transport rules to the blog post below.
Today the Exchange CXP team released Update Rollup 1 for Exchange Server 2010 SP3 to the Download Center.
Note: Some of the following KB articles may not be available at the time of publishing this post.
This update contains fixes for a number of customer-reported and internally found issues. For more details, including a list of fixes included in this update, see KB 2803727. We would like to specifically call out the following fixes which are included in this release:
For DST changes, see Daylight Saving Time Help and Support Center (microsoft.com/time).
You cannot install or uninstall Update Rollup 1 for Exchange Server 2010 SP3 on the double-byte character set (DBCS) version of Windows Server 2012 if the language preference for non-Unicode programs is set to the default language. To work around this issue, you must first change this setting. To do this, follow these steps:
After you successfully install or uninstall Update Rollup 1, revert this language setting, as appropriate.
We have identified the cause of this problem and plan to resolve it in a future rollup, but did not want to further delay the release of RU1 for customers who are not impacted by it.
We have identified an issue where messages get stuck in the poison queue and the transport service continually crashes after this rollup is applied.
We have gathered enough information to determine the cause. Specifically, the issue is caused by a transport rule (disclaimer) attempting to append a disclaimer to the end of HTML-formatted messages. When this occurs, messages are placed in the poison queue and the transport service crashes with an exception. We are investing resources to develop a code fix. In the meantime, you can work around the issue by disabling or reconfiguring the disclaimer transport rule.
A question that is often asked of Support in regard to legacy Public Folders is whether they're replicating and how much progress they're making. The most common scenario arises when the administrator is adding a new Public Folder database to the organization and replicating a large amount of data to it. What commonly happens is that the administrator calls Support and says:
The database on the old server is 300GB, but the new database is only 150GB! How can I tell what still needs to be replicated? Is it still progressing??
You can raise diagnostic logging for public folders, but reading the events to see which folders are replicating is tedious. Most administrators want a more detailed way of estimating the progress of replication than comparing file sizes. They also want to avoid checking all the individual replication events.
There are a number of ways to monitor replication progress so that one can make an educated guess as to how long a particular environment will take to complete an operation. In this post, I'm going to provide a detailed example of one approach to estimating the progress of replication by comparing item counts between different public folder stores.
To get the item counts in an Exchange 2003 public folder database, you can use PFDAVAdmin. The process is outlined in this previous EHLO blog post. For what we're doing below, you'll need the DisplayName, FolderPath, and the total number of items in the folder. The rest of the fields aren't necessary.
To get the item counts on an Exchange 2007 server, use the following (remember that there is only one public folder database per server):
Get-PublicFolderStatistics -Server <servername> | Export-Csv c:\file1.txt
To get the item counts on an Exchange 2010 server, you use:
Get-PublicFolderStatistics -Server <servername> -ResultSize unlimited | Export-Csv c:\file1.txt
There are some very important caveats to this whole procedure. The things you need to watch out for are:
For the actual comparison, you can use any number of products. For this post, I have chosen Microsoft Access to demonstrate the process of comparing the CSV files from the different servers. There are some limitations to my approach:
An outline of the process is:
Assumptions for the steps below:
If your file differs from what is expected, you will have to modify the steps as you go along.
Here are the steps for conducting the comparison:
1. Create a new blank Microsoft Access database in a location that has more than double the size of your CSV files available as free space.
2. By default, the Export-Csv cmdlet includes the .NET type information in the first line of the CSV output. Because this line will interfere with the import, we'll need to remove it. Open each CSV file in Notepad (this can take a while for larger files) and remove the line highlighted below. In this example, the line starting with “AdminDisplayName” would become the topmost line of the file. Once the top line is deleted, save and close the file.
Figure 1
TIP You can avoid this step by including the -NoTypeInformation switch when using the Export-CSV cmdlet, which filters out the .NET object type information from the CSV output. For details, see Using the Export-Csv cmdlet on TechNet. (Thanks to #MSExchange MVP @SteveGoodman for the tip!)
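You can also do this clean-up outside Notepad. The Python sketch below is an illustration only and is not part of the Access procedure. It assumes the export contains FolderPath and ItemCount columns (the Get-PublicFolderStatistics properties used later in this post). If the file starts with a “#TYPE” line because -NoTypeInformation was not used, the sketch skips that line. The function name load_item_counts is hypothetical.

```python
import csv
import io


def load_item_counts(csv_text):
    """Return {FolderPath: ItemCount} from Export-Csv output.

    Assumes FolderPath and ItemCount columns. A leading "#TYPE ..." line
    (written by Export-Csv when -NoTypeInformation is omitted) is skipped.
    """
    stream = io.StringIO(csv_text)
    first_line = stream.readline()
    if not first_line.startswith("#TYPE"):
        stream.seek(0)  # no type line: rewind so the header row is read
    reader = csv.DictReader(stream)
    return {row["FolderPath"]: int(row["ItemCount"]) for row in reader}


# Reading an export from disk (adjust the encoding to match your export):
# with open(r"c:\file1.txt", newline="", encoding="utf-8-sig") as f:
#     counts = load_item_counts(f.read())
```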
3. Import the CSV file to a new table:
Figure 2
4. In the wizard that starts specify the file is delimited as shown and then click Next.
Figure 3
5. Tell the wizard that the text qualifier is the double quote (character 34 in ASCII), the delimiter is the comma and that the “First Row Contains Field Names” as shown in Figure 4.
Note: It is possible that you will receive a warning when you click “First Row Contains Field Names”. If any of the field names violate the rules for a field name Access will display a warning. Don’t panic. Access will replace the non-conforming names with ones it considers appropriate (typically Field1, Field2, etc.). You can change the names if you wish on the Advanced screen.
Figure 4
6. Switch to Advanced view (click the Advanced button highlighted in Figure 4) so that we can change the data type of the FolderPath field. In Access 2010 and older, the data type needs to be changed from Text to Memo; in Access 2013, it needs to be changed from Short Text to Long Text. While you are in this window, you can exclude columns that are not needed by placing a checkmark in the Skip column. In this post, we are only going to use the FolderPath, Name, and ItemCount fields. You can also exclude fields earlier in the process by specifying which fields are exported when you run Export-Csv. The following screenshots show the Advanced properties window.
Figure 5a: Access 2010 and older
Figure 5b: Access 2013
Note: If you think you will be doing this frequently you can use the Save As button to save your settings. The settings will be saved inside the Access database and can then be selected during future imports by clicking on the Specs button.
7. Click OK on the Advanced dialog and then click Finish in the wizard.
8. When prompted to save the Import steps click Close. If you think you will be repeating this process in the future feel free to explore saving the import steps.
9. Access will import the data into a table. By default, the table will have the same name as the source CSV file. The files used in creating this post were called 2007PF_120301 and 2010PF_120301. If there are any import errors, they will be saved in a separate table; take a moment to examine them. The most common error is a truncated field. If that field is FolderPath, it will affect the comparisons later. If there are other problems, you will have to troubleshoot what is wrong with the highlighted lines (typically there should be no import errors as long as FolderPath is set as a Memo field).
10. Go back to Step 2 to import the second file that will be used in the comparison.
11. Now a query must be run to determine whether any FolderPath value exceeds 255 characters. Fields longer than 255 characters cannot be used for a join in an Access query. If we have values that exceed 255 characters in this field, we will need to exclude them from the comparison. Additional work to split a long path across multiple fields can be done, but that is left as an exercise for any Access-savvy readers.
12. To get started, select the options highlighted in yellow in Figure 6:
Figure 6
13. Highlight the table where we want to check the length of the folderpath field as shown in Figure 7. Once you have selected the table click Add and then Close:
Figure 7
14. Switch to SQL view as shown in Figure 8:
Figure 8
15. Replace the default SELECT statement with one like the following (make sure you substitute your own table name for 2007PF_120301 in the example):
SELECT Len([FolderPath]) AS Expr1, [2007PF_120301].FolderPath FROM [2007PF_120301] WHERE (((Len([FolderPath]))>254));
Note: Be sure the semi-colon is the last character in the statement.
16. Run the query using the red “!” as shown in Figure 9:
Figure 9
Figure 10
17. If the result is a single empty row (as shown in Figure 10) then skip down to step 19. If the result is at least one row then go back to SQL view (as shown in Figure 8) and change the statement to look like this one (as before please make sure 2007PF_120301 is replaced with the table name actually being used in your database):
SELECT [2007PF_120301].FolderPath, [2007PF_120301].ItemCount, [2007PF_120301].Name, [2007PF_120301].Identity INTO [2007PF_120301_trimmed] FROM [2007PF_120301] WHERE (((Len([FolderPath]))<255));
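The length checks in steps 15 and 17 can be mirrored in Python. This is an illustrative sketch, not a replacement for the queries. It assumes the item counts are already in a dictionary that maps each FolderPath to its ItemCount. Like the queries, it sets aside paths of 255 characters or more. The function name split_by_path_length is hypothetical.

```python
# Access Text fields (used for the join) hold at most 255 characters.
MAX_JOIN_LENGTH = 255


def split_by_path_length(counts, limit=MAX_JOIN_LENGTH):
    """Split {FolderPath: ItemCount} into (kept, too_long).

    kept mirrors the "<255" condition of the trimming query; too_long
    mirrors the ">254" check and lists folders excluded from the join.
    """
    kept = {path: n for path, n in counts.items() if len(path) < limit}
    too_long = {path: n for path, n in counts.items() if len(path) >= limit}
    return kept, too_long
```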
18. You will get a prompt like the one in Figure 11 when you run the query. Select Yes:
Figure 11
19. When it is done, repeat steps 11-18 for the other CSV file that was imported as part of the comparison. Once you have done steps 11-18 for both files, advance to step 20.
20. Originally, FolderPath was imported as a Memo field (Long Text if using Access 2013). However, we cannot join Memo fields in a query, so we need to convert them to a Text field with a length of 255.
If the query in step 16 returned one or more rows, this step and the subsequent steps will all be carried out on the table specified in the INTO clause of the SQL statement (in this post, that table is named 2007PF_120301_trimmed).
If you were able to skip steps 17 and 18 this step and the subsequent steps will be carried out on the table you imported (2007PF_120301 in this example).
Open the table in Design view by right-clicking on it and selecting Design View as shown in Figure 12. If you select the wrong tables for the subsequent steps you will get a lot of unwanted duplicates in your final comparison output.
Figure 12
21. Change FolderPath from Memo to Text as shown in Figure 13. If you are using Access 2013, change it from Long Text to Short Text.
Figure 13
22. With the FolderPath field highlighted, look at the lower part of the Design window, where the properties of the currently selected field are displayed. Change the field size of FolderPath to 255 characters as shown in Figure 14.
Figure 14
23. Save the table and close its Design view. You will be prompted as shown in Figure 15. Don’t panic. All the FolderPath values should be shorter than the 255 characters specified in the properties of the table. The dialog is just a standard warning from Access; no data should be truncated (the earlier queries should have seen to that). Click Yes and repeat steps 20-23 for the other table being used in this comparison. If you make a mistake here, remember that you still have your original CSV files and can always fix the mistake by removing the tables and redoing the import.
Figure 15
24. We have been on a bit of a journey to prepare the tables. Now for the comparison. Create a new query (as shown in Figure 6) and highlight both tables that have had FolderPath shortened to 255 characters, as shown in Figure 16. Once they are highlighted, click Add and then Close.
Figure 16
25. Drag FolderPath from the table that is the source of your replication to FolderPath in the other table. The result will look like Figure 17.
Figure 17
26. In the top half of the Query Design window we have the tables with their fields listed. In the bottom half we have the query grid. You can make fields appear in the grid in 3 ways:
For this step we need to add:
27. Go to an empty column in the grid. We need to enter an expression that tells us the difference between the two item counts. Type the following into the column (be sure to use the table names from your own database, not my example):
Expr1: Abs([2007PF_120301_trimmed].[itemcount]-[2010pf_120301_trimmed].[itemcount])
Note: After steps 25-27 the final result should look like Figure 18. The equivalent SQL looks like this:
SELECT [2007PF_120301_trimmed].FolderPath, [2007PF_120301_trimmed].ItemCount, [2010PF_120301_trimmed].ItemCount, Abs([2007PF_120301_TRIMMED].[ItemCount]-[2010PF_120301_TRIMMED].[ItemCount]) AS Expr1 FROM [2007PF_120301_trimmed] INNER JOIN [2010PF_120301_trimmed] ON [2007PF_120301_trimmed].FolderPath = [2010PF_120301_trimmed].FolderPath;
Figure 18
28. Run the query using the red “!” shown in Figure 9. The results will show all the folders that exist in BOTH public folder databases, the item count in each database, and the difference between them. I like the difference reported as a positive number, but you might prefer to remove the absolute value function.
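If you prefer a script to Access, the join in steps 24-28 can be sketched in Python. This is a different technique from the procedure above, offered only as an illustration. It assumes two dictionaries, one per public folder store, that map FolderPath to ItemCount. It joins them on FolderPath (the equivalent of the INNER JOIN) and reports the absolute difference, like the Expr1 column. The function name compare_item_counts and the sample counts are hypothetical.

```python
def compare_item_counts(source, target):
    """Inner-join two {FolderPath: ItemCount} maps on FolderPath.

    Returns (folder_path, source_count, target_count, abs_difference)
    tuples, sorted by folder path, for folders present in both stores.
    """
    rows = []
    for path in sorted(source.keys() & target.keys()):
        source_count, target_count = source[path], target[path]
        rows.append((path, source_count, target_count,
                     abs(source_count - target_count)))
    return rows


# Made-up counts for illustration only.
old_store = {"/Sales": 120, "/Sales/Q1": 40, "/HR": 15}
new_store = {"/Sales": 80, "/Sales/Q1": 40, "/Legal": 7}
for row in compare_item_counts(old_store, new_store):
    print(row)
```

As in the Access query, folders that exist in only one store are simply left out of the result.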
There is more that can be done with this. You can use Access to run a Find Unmatched query to find all items from one table that are not in the other table (thus locating folders that have an instance in one database, but not the other). You can experiment with different join types in the query, and you can deal with FolderPath values longer than a single Text field can accommodate. These and any other additional functionality you desire are left as an exercise for the reader. I hope this provides you with a process for comparing the item counts between two public folder stores (just remember the caveats at the top of the article).
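The Find Unmatched idea can be sketched in Python as well. This is illustrative only and again assumes one dictionary per store that maps FolderPath to ItemCount. A set difference on the keys lists the folders that have an instance in one store but not the other. The function name unmatched_folders and the sample data are hypothetical.

```python
def unmatched_folders(source, target):
    """Folder paths present in source but absent from target, sorted."""
    return sorted(source.keys() - target.keys())


# Made-up counts for illustration only.
old_store = {"/Sales": 120, "/Sales/Q1": 40, "/HR": 15}
new_store = {"/Sales": 80, "/Sales/Q1": 40, "/Legal": 7}
print(unmatched_folders(old_store, new_store))  # ['/HR']: no instance on the new store
print(unmatched_folders(new_store, old_store))  # ['/Legal']: only on the new store
```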
Thanks to Bill Long for reviewing my caveats and Oscar Goco for reviewing my steps with Access.
Change log for the Exchange Server 2013 Server Role Requirements calculator.
Today, we have released an updated version of the Exchange 2013 Server Role Requirements Calculator that addresses several issues found since its initial release. You can view what changes have been made, or download the update directly.
In addition, we are releasing an updated version of the Exchange 2010 Server Role Requirements Calculator as well. You can view what changes have been made, or download the update directly.
With the recent releases of Exchange Server 2013 RTM CU1, Exchange 2013 sizing guidance, the Exchange 2013 Server Role Requirements Calculator, and the updated Exchange 2013 Deployment Assistant, on-premises customers now have the tools they need to begin designing and performing migrations to Exchange Server 2013. Many of you have introduced Exchange 2013 RTM CU1 into your test environments alongside Exchange 2010 SP3 and/or Exchange 2007 SP3 RU10, and are readying yourselves for the production migrations.
There's one particular Exchange 2010 design choice some customers made that could throw a monkey wrench into your upgrade plans to Exchange 2013, and we want to walk you through how to mitigate it so you can move forward. If you're still in the design or deployment phase of Exchange Server 2010, we recommend you continue reading this article so you can make some intelligent design choices which will benefit you when you migrate to Exchange 2013 or later.
In Exchange 2010, all Outlook clients in the most typical configurations will utilize MAPI/RPC or Outlook Anywhere (RPC over HTTPS) connections to a Client Access Server. The MAPI/RPC clients connect to the CAS Array Object FQDN (also known as the RPC endpoint) for Mailbox access and the HTTPS based clients connect to the Outlook Anywhere hostname (also known as the RPC proxy endpoint) for all Mailbox and Public Folder access. In addition to these primary connections, other HTTPS based workloads such as EAS, ECP, OAB, and EWS may be sharing the same FQDN as Outlook Anywhere. In some environments you may also be sharing the same FQDN with POP/IMAP based clients and using it as an SMTP endpoint for internal mail submissions.
In Exchange 2010, the recommendation was to utilize split DNS and ensure that the CAS Array Object FQDN was only resolvable via DNS by internal clients. External clients should never be able to resolve the CAS Array Object FQDN. This was covered previously in item #4 of Demystifying the CAS Array Object - Part 2. If you put those two design rules together you come to the conclusion your ClientAccessArray FQDN used by the mailbox database RpcClientAccessServer property should have been an internal-only unique FQDN not utilized by any workload besides MAPI/RPC clients.
Take the following chart as an example of what a suggested configuration in a split DNS configuration would have looked like.
If you do not utilize split DNS, then a suggested configuration may have been as follows.
In speaking with our Premier Field Engineers and MCS consultants, we learned that some of our customers did not choose to use a unique ClientAccessArray FQDN. This design choice may manifest itself in one of two ways. The MAPI/RPC and HTTPS workloads may both utilize the mail.contoso.com FQDN internally and externally, or a unique external FQDN of mail.contoso.com is used while internal MAPI/RPC and HTTPS workloads share mail-int.contoso.com. The shared FQDN in either situation is ambiguous because we can't look at it and immediately understand the workload type that's using it. Perhaps we were not clear enough in our original guidance, or customers felt fewer names would help reduce overall design complexity since everything appeared to work with this configuration.
Take a look at the figure below and the FQDNs in use for some of the different workloads. Shown are EWS, ECP, OWA, CAS Array Object, and Outlook Anywhere External Hostname. The yellow arrow specifically points out the CAS Array Object, the value used as the RpcClientAccessServer for Exchange 2010 mailbox databases, and seen in the Server field of an Outlook profile for an Exchange 2010 mailbox.
An Exchange 2010 deployment with a single ambiguous URL for all workloads.
Let us pause for a moment to visualize what we have talked about so far. If we were to compare an Exchange 2010 environment using ambiguous URLs to one not using ambiguous URLs, it would look like the following diagrams. Notice the first diagram below uses the same FQDN for Outlook MAPI/RPC based traffic and HTTPS based traffic.
If we were to then look at an environment not utilizing ambiguous URLs, we see the clients utilize unique FQDNs for MAPI/RPC based traffic and HTTPS based traffic. In addition, the FQDN utilized for MAPI/RPC based traffic is only resolvable via internal DNS.
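To check whether your own deployment matches the ambiguous-URL pattern, compare the CAS Array Object FQDN with the hostnames used by your HTTPS workloads. The Python sketch below is illustrative only. It assumes you have already collected those values into a dictionary, for example the RpcClientAccessServer property from Get-MailboxDatabase and the ExternalHostname from Get-OutlookAnywhere. The function name and the hostname outlook-int.contoso.com are hypothetical.

```python
def shared_with_rpc_endpoint(rpc_fqdn, https_fqdns):
    """Return the HTTPS workloads whose FQDN equals the MAPI/RPC endpoint.

    rpc_fqdn:    CAS Array Object FQDN (RpcClientAccessServer value).
    https_fqdns: {workload name: FQDN} for OWA, EWS, ECP, Outlook Anywhere, ...
    A non-empty result means the FQDN is ambiguous in the sense described above.
    """
    def normalize(name):
        # DNS names are case-insensitive; ignore surrounding spaces and a trailing dot.
        return name.strip().lower().rstrip(".")

    rpc = normalize(rpc_fqdn)
    return sorted(w for w, fqdn in https_fqdns.items() if normalize(fqdn) == rpc)


workloads = {
    "OutlookAnywhere": "mail.contoso.com",
    "OWA": "Mail.Contoso.com",
    "EWS": "mail.contoso.com",
}
print(shared_with_rpc_endpoint("mail.contoso.com", workloads))         # ambiguous
print(shared_with_rpc_endpoint("outlook-int.contoso.com", workloads))  # unique name
```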
If your environment does not look like the one above using ambiguous URLs, then you can go hit the coffee shop for a while or play some XBOX 360. Tell your boss we gave the okay. If your environment does look similar to the first example using ambiguous URLs or you are in the planning stages for Exchange 2010, then please read on as we need you to perform some extra steps when migrating to Exchange 2013.
While this may be working for you today, it certainly will not work tomorrow if you migrate to Exchange 2013. In this scenario, where both the MAPI/RPC and HTTP workloads use the same FQDN, you cannot successfully move the FQDN to CAS 2013 without breaking your MAPI/RPC client connectivity entirely. I repeat: your MAPI/RPC clients will start failing to connect via MAPI/RPC once their DNS cache expires after the shared FQDN is moved to CAS 2013. The MAPI/RPC clients will fail to connect because CAS 2013 does not know how to handle direct MAPI/RPC connections; in Exchange 2013, all Windows-based Outlook clients use MAPI over an RPC over HTTPS connection. Your Outlook clients may successfully fall back to HTTPS, but only if Outlook Anywhere is currently enabled for Exchange 2010 when the MAPI/RPC connection attempt fails. The guidance that follows will help you address this scenario.
First off, if you are still in the planning stages of Exchange 2010 you need to take our warning to heart and immediately change your design to use a specific internal-only FQDN for MAPI/RPC clients. If you are in the middle of a 2010 deployment using an Ambiguous URL I recommend you change your ClientAccessArray FQDN to a unique name and update the mailbox database RpcClientAccessServer values on all Exchange 2010 mailbox databases accordingly. Fixing this item mid-migration to Exchange 2010 or even in your fully migrated environment will ensure any newly created or manually repaired Outlook profiles are protected, but it will not automatically fix existing Outlook clients with the old value in the server field.
While not necessary as long as you go through our mitigation steps below, any existing Outlook profiles could be manually repaired to reflect the new value. If you are curious why a manual repair is necessary you can refer to items #5 and #6 in Demystifying the CAS Array Object - Part 2. Again, forcing this update is not necessary if you follow our mitigation steps later in this article. However, if you were to choose to update some specific Outlook profiles we suggest you perform those steps in your test environment first to make sure you have the process down correctly.
Additionally as we previously discussed in item #3 of Demystifying the CAS Array Object – Part 1, the ClientAccessArray FQDN is not needed in your SSL certificate as it is not being used for HTTPS based traffic. Because of this, the only thing you would need to do is create a new internal DNS record, update your ClientAccessArray FQDN, and finally update your Exchange 2010 Mailbox Database RpcClientAccessServer values. It bears repeating that you do not have to get a new SSL certificate only to fix an Ambiguous URL situation.
Our suggestion is to implement Outlook Anywhere internally for all users prior to introducing Exchange Server 2013 to the environment.
Many of our customers have already moved to Outlook Anywhere internally for all Windows Outlook clients. In fact, those of you reading this with OA in use internally are good to proceed to the coffee shop or go play XBOX 360 with the other folks if you’d like to.
Now for the rest of you… sit a little closer. Go ahead and fill in, there are plenty of seats in the front row like usual.
In Exchange Server 2013, all Windows Outlook clients operate in Outlook Anywhere mode internally. By following these mitigation steps, you will be one step ahead of where you will end up after your migration to Exchange Server 2013 anyway.
If you do not have Outlook Anywhere enabled at all in your environment, please see Enable Outlook Anywhere on TechNet for steps on how to enable it in Exchange 2010. If your company does not wish to provide external access for Outlook Anywhere, that is OK. Simply enabling Outlook Anywhere will not provide remote access unless you also publish the /rpc virtual directory to the Internet.
We suggest that customers, especially very large ones, consider enabling Kerberos authentication to avoid potential performance issues with the default NTLM authentication. Information on how to configure Kerberos authentication for Exchange Server 2010 can be found here on TechNet; the steps for Exchange Server 2013 are similar, and we will have documentation for them in the near future. However, please keep in mind that Kerberos authentication with Outlook Anywhere is only supported with Windows Vista or later.
By default with Outlook Anywhere enabled in the environment your clients prefer RPC/TCP connections when on Fast Networks as seen below.
The trick we use to force Outlook Anywhere to also be used internally is via Autodiscover. Using Autodiscover we can make Windows Outlook clients prefer RPC/HTTPS on both Fast and Slow networks as seen here.
The method used to make clients always prefer HTTPS is configuring the OutlookProviderFlags option via the Set-OutlookProvider cmdlet. The following commands are executed from the Exchange 2010 Management Shell.
Set-OutlookProvider EXPR -OutlookProviderFlags:ServerExclusiveConnect
Set-OutlookProvider EXCH -OutlookProviderFlags:ServerExclusiveConnect
If for any reason you need to put the configuration back to its default settings, issue the following commands and clients will no longer prefer HTTP on Fast Networks.
Set-OutlookProvider EXPR -OutlookProviderFlags:None
Set-OutlookProvider EXCH -OutlookProviderFlags:None
You can prepare to introduce Exchange Server 2013 to your environment once all of your Windows Outlook clients are preferring HTTP on both fast and slow networks and are connecting through mail.contoso.com for RPC over HTTPS connections.
There are a small number of things we would like to call out as you plan this migration to enable Outlook Anywhere for all internal clients.
First, your front-end infrastructure (CAS 2013, load balancer, etc.) must be ready to immediately handle the full production load of Windows Outlook clients when you re-point the mail.contoso.com FQDN in DNS.
Second, if your Exchange 2010 Client Access Servers were not scaled for 100% Outlook Anywhere connections then performance should be monitored when OA is enabled and all clients are moved from MAPI/RPC based to HTTPS based workloads. You should be ready to scale out your CAS 2010 infrastructure if necessary to mitigate any possible performance issues.
Lastly, Windows Outlook clients older than Outlook 2007 are not supported through CAS 2013, even if their mailbox is on an older Exchange version. All Windows Outlook clients connecting through CAS 2013 must be at least the minimum versions supported by Exchange 2013. Unsupported clients, such as Outlook 2003, do not support Autodiscover and would have to be manually configured with a new MAPI/RPC-specific endpoint to ensure they continue communicating with Exchange 2010 until the client can be updated and the mailbox migrated to Exchange 2013.
Note: The easiest way to confirm what major/minor version of Outlook you have is to look at the version of OUTLOOK.EXE and EMSMDB32.DLL via Windows Explorer or to run an inventory report through Microsoft System Center Configuration Manager or similar software. The minimum version numbers Exchange Server 2013 supports for on-premises deployments are provided below.
To visualize the mitigation steps from start to finish, we need to compare the environment between phases.
First, the upper area of the below diagram depicts the start state of the environment with internal Windows Outlook clients utilizing MAPI/RPC and ambiguous URLs for their HTTPS based workloads. The lower area of the diagram depicts the same environment, but we have now forced Outlook Anywhere to be used by internal Windows Outlook clients. This change has forced all mailbox and public folder access traffic over HTTPS through the mail.contoso.com Outlook Anywhere FQDN.
We now have all Windows Outlook clients utilizing Outlook Anywhere internally by leveraging Autodiscover to force the preference of HTTPS. Now that all Windows Outlook traffic is routed through mail.contoso.com via HTTPS, the ambiguous URL problem has been mitigated. However, you may have other applications integrating with Exchange that are unable to utilize Outlook Anywhere and/or Autodiscover. These applications will also be affected if you update the mail.contoso.com DNS entry to point at Exchange 2013. Before moving on to the second step, it may be most efficient to add a HOSTS file entry on the servers hosting these external applications to force resolution of mail.contoso.com to the Layer-7 Load Balancer used by Exchange 2010. This should allow you to temporarily keep routing traffic from external applications that need to talk only to Exchange 2010 via MAPI/RPC while you work on updating those applications to be Outlook Anywhere compatible, which they will need to be before they can ever connect to Exchange 2013.
Having dealt with both the Windows Outlook clients and the third-party applications that cannot utilize Outlook Anywhere, we can now move on to the second step, which is executed when you are ready to introduce Exchange 2013 to the environment.
The diagram below starts by showing where we finished after executing step one. The lower area of the diagram shows that we have updated DNS to point the mail.contoso.com entry to the IP of the new Exchange 2013 load balancer configuration. Because of the HOSTS entry we made, our application server continues talking to the old Layer-7 load balancer for its MAPI over RPC/TCP connections. Exchange 2013 CAS now receives all client traffic and proxies traffic for users still on Exchange 2010 back to the Exchange 2010 CAS infrastructure. The redundant CAS was removed from the diagram to simplify the view and show only the traffic flow.
In summary, we hope those of you in this unique configuration will be able to smoothly migrate from Exchange 2010 to Exchange 2013 now that you have these mitigation steps. Some of you may identify other potential methods and wonder why we are offering only a single mitigation approach. We investigated many methods, but this approach came back every time as the most straightforward to implement, maintain, and support. Given the potential complexity of this change, we invite you to ask follow-up questions at the following Exchange Server Forum, where we can often interact with you better than the comments format allows.
Exchange Server Forum: Exchange Server 2013 – Setup, Deployment, Updates, and Migration
Brian Day Senior Program Manager Exchange Customer Experience
In Exchange 2010, we introduced Retention Tags, a Messaging Records Management (MRM) feature that allows you to manage the email lifecycle. You can use retention policies to retain mailbox data for as long as it’s required to meet business or regulatory requirements, and delete items older than the specified period.
One of the design goals for MRM 2.0 was to simplify administration compared to Managed Folders, the MRM feature introduced in Exchange 2007, and to give users more flexibility. By applying a Personal Tag to a folder, users can have different retention settings apply to items in that folder than the default tag applied to the entire mailbox (known as a Default Policy Tag). Similarly, users can apply a different tag to a subfolder than the one applied to the parent folder. Users can also apply a Personal Tag to individual items, allowing them the freedom to organize messages based on their work habits and preferences, rather than forcing them to move messages, based on the retention requirement, to an admin-controlled Managed Folder.
You can still use Managed Folders in Exchange 2010, but they’re not available in Exchange 2013.
For a comparison of Retention Tags with Managed Folders and migration details, see Migrate Managed Folders.
If you like the Managed Folders approach of being able to create a folder in the user’s mailbox and configure a retention setting for that folder, you can use Exchange Web Services (EWS) to accomplish something similar, with some caveats mentioned later in this post. You can write your own code or even a PowerShell script to create a folder in the user’s mailbox and apply a Personal Tag to it. There are scripts available on the interwebs, including some code samples on MSDN to accomplish this. For example:
Note: The above scripts are examples for your reference. They’re not written or tested by the Exchange product group.
We frequently get questions about whether this is supported by Microsoft. Short answer: Yes. Exchange Web Services (EWS) is a supported and documented API, which allows ISVs and customers to create custom solutions for Exchange.
When using EWS in your code or PowerShell script to apply a Personal Tag to a folder, it’s important to consider the following:
For Developers
For IT Pros
If using EWS to apply a Personal Tag to custom folders helps you meet your business requirements, absolutely! However, do note and consider the following:
Provisioning custom folders with different retention settings (by applying Personal Tags) may help you meet your organization’s retention requirements. As an IT Pro, make sure you understand the above and follow the best practices.
Bharat Suneja
The Microsoft Exchange Server 2013 Management Pack (SCOM MP) is now live!
As I discussed in my Managed Availability article, the key difference between this management pack and previous releases is that our health logic is now built into Exchange, as opposed to the management pack. This means updates to Exchange 2013 (like our cumulative updates) will include changes to the probes, monitors, and responders. Any issues that Managed Availability cannot solve are bubbled up to SCOM via an event monitor.
You can download the management pack via Microsoft Download Center at http://www.microsoft.com/en-us/download/details.aspx?id=39039.
You can also view the following documentation:
More information can be found at the SCOM team’s blog - http://blogs.technet.com/b/momteam/archive/2013/05/14/exchange-2013-management-pack-released.aspx.
It’s been a long road, but the initial release of the Exchange 2013 Server Role Requirements Calculator is here. No, that isn’t a mistake, the calculator has been rebranded. Yes, this is no longer a Mailbox server role calculator; this calculator includes recommendations on sizing Client Access servers too! Originally, marketing wanted to brand it as the Microsoft Exchange Server 2013 Client Access and Mailbox Server Roles Theoretical Capacity Planning Calculator, On-Premises Edition. Wow, that’s a mouthful and reminds me of this branding parody. Thankfully, I vetoed that name (you’re welcome!).
The calculator supports the architectural changes made possible with Exchange 2013:
As with Exchange 2010, the recommendation in Exchange 2013 is to deploy multi-role servers. There are very few reasons you would need to deploy dedicated Client Access servers (CAS); CPU constraints, use of Windows Network Load Balancing in small deployments (even with our architectural changes in client connectivity, we still do not recommend Windows NLB for any large deployments), and certificate management are a few examples that may justify dedicated CAS.
When deploying multi-role servers, the calculator will take into account the impact that the CAS role has and make recommendations for sizing the entire server’s memory and CPU. So when you see the CPU utilization value, this will include the impact both roles have!
When deploying dedicated server roles, the calculator will recommend the minimum number of Client Access processor cores and memory per server, as well as the minimum number of CAS you should deploy in each datacenter.
Now that the Mailbox server role includes additional components like transport, it only makes sense to include transport sizing in the calculator. This release does just that and factors in message queue expiration and Safety Net hold time when calculating the database size. The calculator even makes a recommendation on where to deploy the mail.que database: either on the system disk or on a dedicated disk!
Exchange 2010 introduced the concept of 1 database per JBOD volume when deploying multiple database copies. However, this architecture did not ensure that the drive was utilized effectively across all three dimensions: throughput, IO, and capacity. Typically, the system was balanced from an IO and capacity perspective, but throughput was where we saw an imbalance, because during reseeds only a portion of the target disk’s total capable throughput was utilized. In addition, the capacity of 7.2K disks continues to increase, with 4TB disks now available, which affects our ability to remain balanced along that dimension. Exchange 2013 also includes a 33% reduction in IO compared to Exchange 2010. Naturally, the concept of 1 database per JBOD volume needed to evolve. As a result, Exchange 2013 made several architectural changes in the store process, ESE, and HA architecture to support multiple databases per JBOD volume. If you would like more information, please see Scott’s excellent TechEd session in a few weeks on Exchange 2013 High Availability and Site Resilience or the High Availability and Site Resilience topic on TechNet.
By default, the calculator will recommend multiple databases per JBOD volume. This architecture is supported for single datacenter deployments and for multi-datacenter deployments when there is copy and/or server symmetry. The calculator supports highly available database copies and lagged database copies with this volume architecture type. The distribution algorithm will lay out the copies appropriately, as well as generate the deployment scripts correctly to support AutoReseed.
The calculator has been improved in several ways for high availability architectures:
Over the years, a few, but vocal, members of the community have requested that I add more mailbox tiers to the calculator. As many of you know, I rarely recommend sizing multiple mailbox tiers, as that simply adds operational complexity, and I am all about removing complexity in your messaging environments. While I haven’t specifically added additional mailbox tiers, I have added the ability for you to define a percentage of the mailbox tier population that should have the IO and Megacycle Multiplication Factors applied. In a way, this allows you to define up to eight different mailbox tiers.
I’ve received a number of questions regarding processor sizing in the calculator. People are comparing the Exchange 2010 Mailbox Server Role Requirements Calculator output with the Exchange 2013 Server Role Requirements Calculator. As mentioned in our Exchange 2013 Performance Sizing article, the megacycle guidance in Exchange 2013 leverages a new server baseline; therefore, you cannot directly compare the output from the Exchange 2010 calculator with the Exchange 2013 calculator.
There are many other minor improvements sprinkled throughout the calculator. We hope you enjoy this initial release. All of this work wouldn’t have occurred without the efforts of Jeff Mealiffe (for without our sizing guidance there would be no calculator!), David Mosier (VBA scripting guru and the master of crafting the distribution worksheet), and Jon Gollogy (deployment scripting master).
As always, we welcome feedback. Please report any issues you may encounter while using the calculator by emailing strgcalc AT microsoft DOT com.
Update 7/15/2013: we have made a few updates to the blog post below to adjust the instructions for the release of version 2 of the script.
Prior to Exchange 2007, there were two primary methods of implementing automated resource scheduling: Direct Booking and the AutoAccept Agent (a store event sink released as a web download for Exchange 2003). In Exchange 2007, we changed how automated resource scheduling is implemented. The AutoAccept Agent is no longer supported, and the Direct Booking method, technically an Outlook function, has been replaced with a server-side calendar booking function called the Resource Booking Attendant.
Note: There are various terms associated with this new Resource Booking function, such as: Calendar Processing, Automatic Resource Booking, Calendar Attendant Processing, Automated Processing and Resource Booking Assistant. We will be using the “Resource Booking Attendant” nomenclature for this article.
While the Direct Booking method for resource scheduling can indeed work on Exchange Server 2007/2010/2013, we strongly recommend that you disable Direct Booking for resource mailboxes and use the Resource Booking Attendant instead. Specifically, we are referring to the “AutoAccept” Automated Processing feature of the Resource Booking Attendant, which can be enabled for a mailbox after it has been migrated to Exchange 2007 or later and upgraded to a Resource Mailbox.
Note: The published resource mailbox upgrade guidance on TechNet specifies disabling Direct Booking in the resource mailbox while it is still on Exchange 2003, moving the mailbox, and then enabling the AutoAccept functionality via the Resource Booking Attendant. This order of steps can introduce an unnecessarily long period during which the resource mailbox is without automated scheduling capabilities.
We are currently working to update that guidance to reflect moving the mailbox first and only then disabling the Direct Booking functionality, after which the AutoAccept functionality via the Resource Booking Attendant can be enabled immediately. This will shorten the period during which the mailbox is without automated resource scheduling capabilities.
This conversion process to resource mailboxes utilizing the Resource Booking Attendant is sometimes an honest oversight or even deliberately ignored when migrating away from Exchange 2003 due to Direct Booking’s ability to continue to work with newer versions of Exchange, even Exchange Online. This will often result in resource mailboxes (or even user mailboxes!) with Direct Booking functionality remaining in place long after Exchange 2003 is ancient history in the environment.
There are issues that can arise from leaving Direct Booking enabled, from simple administrative burden scenarios all the way to major calendaring issues. Additionally, Resource Booking Attendant offers advantages over Direct Booking functionality:
How does one validate if Direct Booking settings are enabled on mailboxes in the organization, especially if mailboxes had previously been hosted on Exchange 2003?
Figure 1: Checking Direct Booking settings in Microsoft Outlook 2010
Unfortunately, the manual steps involve assigning permissions to all mailboxes, creating MAPI profiles for each mailbox, logging into each mailbox, checking Tools > Options > Calendar > Resource Scheduling, noting which of the three Direct Booking checkboxes are checked, clicking OK/Cancel a few times, and logging out of the mailbox. Whew! That can be a major undertaking even for a small to midsize company that has more than a handful of mailboxes! Having staff perform this type of activity manually can be a costly and tedious endeavor. Once you have discovered which mailboxes have the Direct Booking settings enabled, you would then have to repeat this entire process to disable these settings, unless you removed them at the time of discovery.
Having an automated method to discover, track, and even disable Direct Booking settings would be nice, right?
Using Exchange Web Services (EWS) and PowerShell, we can automate the discovery of Direct Booking settings that are enabled, track the results, and even disable them! We wrote Remove-DirectBooking.ps1, a sample script, to do exactly that and even more to aid in automating this manual effort.
After you've downloaded it, rename the file and remove the .txt extension.
IMPORTANT: The previously uploaded script had the last line truncated to Stop-Tran (instead of Stop-Transcript). We've uploaded an updated version to TechNet Gallery. If you downloaded the previous version of the script, please download the updated version. Alternatively, you can open the previously downloaded version in Notepad or other text editor and correct the last line to Stop-Transcript.
Let’s break down the major tasks the PowerShell script does:
Uses EWS Application Impersonation to tap into a mailbox (or set of mailboxes) and read the three MAPI properties where the Direct Booking settings are stored. It does this by accessing the localfreebusy item sitting in the NON_IPM_SUBTREE\FreeBusy Data folder, which resides in the root of the Information Store in the mailbox. The three MAPI properties and their equivalent Outlook settings the script looks at are:
These three properties contain Boolean values mirroring the Resource Scheduling checkboxes found in Outlook (see Figure 1 above).
Note: It is important to understand that by default the script runs in a read-only mode. Additional command line switches are available to run the script to disable Direct Booking settings.
Here are a couple of example scenarios that illustrate how to use the script to discover and remove enabled Direct Booking settings.
You've recently migrated from Exchange 2003 to Exchange 2010 and would like to disable Direct Booking for your company’s conference room mailboxes as well as any user mailboxes that may have Direct Booking settings enabled. The administrator’s logged in account has Application Impersonation rights and the View-Only Recipients RBAC role assigned.
.\Remove-DirectBooking.ps1 -Identity * -UseDefaultCredentials
Figure 2: Output file containing list of mailboxes with Direct Booking enabled
.\Remove-DirectBooking.ps1 -InputFile '.\Remove-DirectBooking_<timestamp>.txt' -UseDefaultCredentials -RemoveDirectBooking
Figure 3: Reviewing runtime log file in Excel
Note: The Direct Booking Removed? column now shows Yes where applicable, but the three Direct Booking settings columns still show “Yes”; this is because we record those three values before removal. If you were to run the script again in read-only mode against the same input file, those columns would show N/A, since no Direct Booking settings would remain enabled. The Resource Room?, AutoAccept Enabled?, and Conflict Detected columns all show N/A regardless, because they are not relevant when disabling the Direct Booking settings.
You're an administrator who's new to an organization. You know that they migrated from Exchange 2003 to Exchange 2007 in the distant past and are currently implementing Exchange 2010, having already migrated some users to it. You have no idea which resource mailboxes, or even user mailboxes, may be using Direct Booking, and you would like to discover who has which Direct Booking settings enabled. You would then like to selectively choose which mailboxes to pilot for Direct Booking removal before taking action on the majority of the mailboxes found.
Here's how you would accomplish this using the Remove-DirectBooking.ps1 script:
.\Remove-DirectBooking.ps1 -Identity *
Figure 4: Reviewing the Remove-DirectBooking_<timestamp>.csv in Excel
.\Remove-DirectBooking.ps1 -InputFile '.\' -RemoveDirectBooking
Please see the script’s help section (via “get-help .\remove-DirectBooking.ps1 -full”) for full information on all the available parameters. Here are some additional options that may be useful in certain scenarios:
The script does have several prerequisites and caveats to ensure proper operation and meaningful results:
Discovering and removing Direct Booking settings manually can be a tedious and costly process, but you can automate it with the existing PowerShell and EWS capabilities in Microsoft Exchange Server 2007, 2010, and 2013. With careful use, the Remove-DirectBooking.ps1 script can be a valuable tool to help Exchange administrators maintain automated resource scheduling capabilities in their Microsoft Exchange environments.
Your feedback and comments are welcome.
Thank you to Brian Day and Nino Bilic for their guidance in content review, and to our customers (you know who you are) for piloting the script.
Seth Brandes & Dan Smith
Since the release to manufacturing (RTM) of Exchange 2013, you have been waiting for our sizing and capacity planning guidance. This is the first official release of our guidance in this area, and updates to our TechNet content will follow in a future milestone.
As we continue to learn more from our own internal deployments of Exchange 2013, as well as from customer feedback, you will see further updates to our sizing and capacity planning guidance in two forms: changes to the numbers mentioned in this document, as well as further guidance on specific areas not covered here. Let us know what you think we are missing and we will do our best to respond with better information over time.
Historically, the Exchange Server product group has used various sources of data to produce sizing guidance. Typically, this data would come from scale tests run early in the product development cycle, and we would then fine-tune that guidance with observations from production deployments closer to final release. Production deployments have included Exchange Dogfood (our internal pre-release deployment that hosts the Exchange team and various other groups at Microsoft), Microsoft IT’s corporate Exchange deployment, and various early adopter programs.
For Exchange 2013, our guidance is primarily based on observations from the Exchange Dogfood deployment. Dogfood hosts some of the most demanding Exchange users at Microsoft, with extreme messaging profiles and many client sessions per user across multiple client types. Many users in the Dogfood deployment send and receive more than 500 messages per day, and typically have multiple Outlook clients and multiple mobile devices simultaneously connected and active. This allows our guidance to be somewhat conservative, taking into account additional overhead from client types that we don’t regularly see in our internal deployments as well as client mixes that might be different from what's considered “normal” at Microsoft.
Does this mean that you should take this conservative guidance and adjust the recommendations such that you deploy less hardware? Absolutely not. One of the many things we have learned from operating our own very high-scale service is that availability and reliability are very dependent on having capacity available to deal with those unexpected peaks.
Sizing is both a science and an art form. Attempting to apply too much science to the process (trying to get too accurate) usually results in not having enough extra capacity available to deal with peaks, and in the end, results in a poor user experience and decreased system availability. On the other hand, there does need to be some science involved in the process, otherwise it’s very challenging to have a predictable and repeatable methodology for sizing deployments. We strive to achieve the right balance here.
From a sizing and performance perspective, there are a number of advantages with the new Exchange 2013 architecture. As many of you are aware, a couple of years ago we began recommending multi-role deployment for Exchange 2010 (combining the Mailbox, Hub Transport, and Client Access Server (CAS) roles on a single server) as a great way to take advantage of hardware resources on modern servers, as well as a way to simplify capacity planning and deployment. These same advantages apply to the Exchange 2013 Mailbox role as well. We like to think of the services running on the Mailbox role as providing a balanced utilization of resources rather than having a set of services on a role that are very disk intensive, and a set of services on another role that are very CPU intensive.
Another example to consider for the Mailbox role is cache effectiveness. Software developers use in-memory caching to prevent having to use higher-latency methods to retrieve data (like LDAP queries, RPCs, or disk reads). In the Exchange 2007/2010 architecture, processing for operations related to a particular user could occur on many servers throughout the topology. One CAS might be handling Outlook Web App for that user, while another (or more than one) CAS might be handling Exchange ActiveSync connections, and even more CAS might be processing Outlook Anywhere RPC proxy load for that same user. It’s even possible that the set of servers handling that load could be changing on a regular basis. Any data associated with that user stored in a cache would become useless (effectively a waste of memory) as soon as those connections moved to other servers. In the Exchange 2013 architecture, all workload processing for a given user occurs on the Mailbox server hosting the active copy of that user’s mailbox. Therefore, cache utilization is much more effective.
The new CAS role has some nice benefits as well. Given that the role is totally stateless from a user perspective, it becomes very easy to scale up and down as demands change by simply adding or removing servers from the topology. Compared to the CAS role in prior releases, hardware utilization is dramatically reduced, meaning that fewer CAS role machines will be required. Additionally, it may make sense for many customers to consider a multi-role deployment in which CAS and Mailbox are co-located. This allows further simplification of capacity planning and deployment, and also increases the number of available CAS, which has a positive effect on service availability. Look for a follow-up post on the benefits of a multi-role deployment soon.
Sizing an Exchange deployment has six major phases, and I will go through each of them in this post in some detail.
The primary input to all of the calculations that you will perform later is the average user profile of the deployment, where the user profile is defined as the sum of total messages sent and total messages received per-user, per-workday (on average). Many organizations have quite a bit of variability in user profiles. For example, a segment of users might be considered “Information Workers” and spend a good part of their day in their mailbox sending and reading mail, while another segment of users might be more focused on other tasks and use email infrequently. Sizing for these segments of users can be accomplished by either looking at the entire system using weighted averages, or by breaking up the sizing process to align with the various segments of users. In general it’s certainly easier to size the whole system as a unit, but there may be specific requirements (like the use of certain 3rd party tools or devices) which will significantly impact the sizing calculation for one or more of the user segments, and it can be very difficult to apply sizing factors to a user segment while attempting to size the entire solution as a unit.
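As a rough illustration of the weighted-average approach, the sketch below blends per-segment profiles into a single messages-per-user-per-day figure. The segment sizes and profiles are hypothetical inputs chosen for the example, not recommendations, and the function name is our own.

```python
def weighted_profile(segments):
    """Population-weighted average of messages sent + received per user per workday.

    segments: list of (user_count, messages_per_day) tuples (hypothetical inputs).
    """
    total_users = sum(count for count, _ in segments)
    if total_users == 0:
        raise ValueError("at least one user is required")
    # Total daily messages across all segments, divided by total users.
    total_messages = sum(count * profile for count, profile in segments)
    return total_messages / total_users


# Hypothetical organization: 8,000 information workers at 200 messages/day
# and 2,000 light email users at 50 messages/day.
segments = [(8000, 200), (2000, 50)]
print(weighted_profile(segments))  # 170.0
```

If one segment carries requirements that change its sizing factors (for example, a particular 3rd party device), size that segment separately rather than folding it into the blended average.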
The obvious question in your mind is how to go get this user profile information. If you are starting with an existing Exchange deployment, there are a number of options that can be used, assuming that you aren’t the elusive Exchange admin who actually tracks statistics like this on an ongoing basis. If you are using Exchange 2007 or earlier, you can utilize the Exchange Profile Analyzer (EPA) tool, which will provide overall user profile statistics for your Exchange organization as well as detailed per-user statistics if required. If you are on Exchange 2010, the EPA tool is not an option for you. One potential option is to evaluate message traffic using performance counters to come up with user profile averages on a per-server basis. This can be done by monitoring the MSExchangeIS\Messages Submitted/sec and MSExchangeIS\Messages Delivered/sec counters during peak average periods and extrapolating the recorded data to represent daily per-user averages. I will cover this methodology in a future blog post, as it will take a fair amount of explanation. Another option is to use message tracking logs to generate these statistics. This could be done via some crafty custom PowerShell scripting, or you could look for scripts that attempt to do this work for you already. One of our own consultants points to an example on his blog.
Typical user profiles range from 50-500 messages per-user/per-day, and we provide guidance for those profiles. When in doubt, round up.
The other important piece of profile information for sizing is the average message size seen in the deployment. This can be obtained from EPA, or from the other mentioned methods (via transport performance counters, or via message tracking logs). Within Microsoft, we typically see average message sizes of around 75KB, but we certainly have worked with customers that have much higher average message sizes. This can vary greatly by industry, and by region.
Just as we recommended for Exchange 2010, the right way to start with sizing calculations for Exchange 2013 is with the Mailbox role. In fact, those of you who have sized deployments for Exchange 2010 will find many similarities with the methodology discussed here.
Throughout this article, we will be referring to an example deployment. The deployment is for a relatively large organization with the following attributes:
The first thing you need to determine is your high availability model, i.e., how you will meet the availability requirements that you determined earlier. This likely includes multiple database copies in one or more Database Availability Groups, which will have an impact on storage capacity and IOPS requirements. The TechNet documentation on this topic provides some background on the capabilities of Exchange 2013 and should be reviewed as part of the sizing process.
At a minimum, you need to be able to answer the following questions:
Once you have an understanding of how you will meet your high availability requirements, you should know the number of database copies and sites that will be deployed. Given this, you can begin to evaluate capacity requirements. At a basic level, you can think of capacity requirements as consisting of storage for mailbox data (primarily based on mailbox storage quotas), storage for database log files, storage for content indexing files, and overhead for growth. Every copy of a mailbox database is a multiplier on top of these basic storage requirements. As a simplistic example, if I were planning for 500 mailboxes of 1GB each, the storage for mailbox data would be 500GB, and I would then need to apply various factors to that value to determine the per-copy storage requirement. From there, if I needed 3 copies of the data for high availability, I would multiply by 3 to obtain the overall capacity requirement for the solution (all servers). In reality, the storage requirements for Exchange are far more complex, as you will see below.
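The simplistic example reduces to a short calculation. This sketch covers mailbox data only (mailbox count times quota, then times the number of copies) and deliberately leaves out the whitespace, recoverable items, log, content index, and growth factors discussed in the sections that follow; the function name is illustrative.

```python
def simplistic_capacity_gb(mailboxes, quota_gb, copies):
    """Mailbox-data-only capacity: returns (per-copy GB, total GB across all copies).

    No overhead factors are applied here; a real sizing exercise adds them.
    """
    per_copy_gb = mailboxes * quota_gb   # raw mailbox data for one database copy
    total_gb = per_copy_gb * copies      # every copy multiplies the requirement
    return per_copy_gb, total_gb


per_copy, total = simplistic_capacity_gb(500, 1, 3)
print(per_copy, total)  # 500 1500
```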
To determine the actual size of a mailbox on disk, we must consider 3 factors: the mailbox storage quota, database white space, and recoverable items.
The mailbox storage quota is what most people think of as the “size of the mailbox” – it’s the user perceived size of their mailbox and represents the maximum amount of data that the user can store in their mailbox on the server. While this certainly represents the majority of space utilization for Exchange databases, it’s not the only factor we have to size for.
Database whitespace is the amount of space in the mailbox database file that has been allocated on disk but doesn’t contain any in-use database pages. Think of it as available space to grow into. As content is deleted out of mailbox databases and eventually removed from the mailbox recoverable items, the database pages that contained that content become whitespace. We recommend planning for whitespace size equal to 1 day’s worth of messaging content.
Estimated Database Whitespace per Mailbox = per-user daily message profile x average message size
This means that a user with the 200 message/day profile and an average message size of 75KB would be expected to consume the following whitespace:
200 messages/day x 75KB = 14.65MB
When items are deleted from a mailbox, they are really “soft-deleted” and moved temporarily to the recoverable items folder for the duration of the deleted item retention period. Like Exchange 2010, Exchange 2013 has a feature known as single item recovery which will prevent purging data from the recoverable items folder prior to reaching the deleted item retention window. When this is enabled, we expect to see a 1.2 percent increase in mailbox size for a 14 day deleted item retention window. Additionally, we expect to see a 3 percent increase in the size of the mailbox for calendar item version logging which is enabled by default. Given that a mailbox will eventually reach a steady state where the amount of new content will be approximately equal to the amount of deleted content in order to remain under quota, we would expect the size of the items in the recoverable items folder to eventually equal the size of new content sent & received during the retention window. This means that the overall size of the recoverable items folder can be calculated as follows:
Recoverable Items Folder Size = (per-user daily message profile x average message size x deleted item retention window) + (mailbox quota size x 0.012) + (mailbox quota size x 0.03)
If we carry our example forward with the 200 message/day profile, a 75KB average message size, a deleted item retention window of 14 days, and a mailbox quota of 10GB, the expected recoverable items folder size would be:
(200 messages/day x 75KB x 14 days) + (10GB x 0.012) + (10GB x 0.03) = 210,000KB + 125,829.12KB + 314,572.8KB = 635.16MB
Given the results from these calculations, we can sum up the mailbox capacity factors to get our estimated mailbox size on disk:
Mailbox Size on disk = 10GB mailbox quota + 14.65MB database whitespace + 635.16MB Recoverable Items Folder = 10.63GB
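The three capacity factors above can be rerun for other profiles with a few lines of Python. This is a minimal sketch, not a Microsoft tool: the function names are illustrative, and it assumes binary units (1GB = 1024MB = 1,048,576KB), which is what the worked numbers above use.

```python
# Binary units, matching the worked example (1GB = 1024MB, 1MB = 1024KB).
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024


def whitespace_kb(messages_per_day: int, avg_message_kb: float) -> float:
    """Whitespace: plan for one day's worth of messaging content."""
    return messages_per_day * avg_message_kb


def recoverable_items_kb(messages_per_day: int, avg_message_kb: float,
                         retention_days: int, quota_gb: float) -> float:
    """Deleted-item churn over the retention window, plus 1.2% of quota for
    single item recovery and 3% of quota for calendar version logging."""
    quota_kb = quota_gb * KB_PER_GB
    churn_kb = messages_per_day * avg_message_kb * retention_days
    return churn_kb + quota_kb * 0.012 + quota_kb * 0.03


def mailbox_size_on_disk_gb(messages_per_day: int, avg_message_kb: float,
                            retention_days: int, quota_gb: float) -> float:
    """Quota + whitespace + recoverable items, in GB."""
    total_kb = (quota_gb * KB_PER_GB
                + whitespace_kb(messages_per_day, avg_message_kb)
                + recoverable_items_kb(messages_per_day, avg_message_kb,
                                       retention_days, quota_gb))
    return total_kb / KB_PER_GB


# 200 messages/day, 75KB average message, 14-day retention, 10GB quota
ws_mb = whitespace_kb(200, 75) / KB_PER_MB
ri_mb = recoverable_items_kb(200, 75, 14, 10) / KB_PER_MB
size_gb = mailbox_size_on_disk_gb(200, 75, 14, 10)
print(f"whitespace {ws_mb:.2f}MB, recoverable items {ri_mb:.2f}MB, "
      f"on disk {size_gb:.2f}GB")
```

With the example inputs it reproduces the 14.65MB, 635.16MB, and 10.63GB figures above.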
The space required for files related to the content indexing process can be estimated as 20% of the database size.
Per-Database Content Indexing Space = database size x 0.20
In addition, you must size for one additional content index (i.e., an additional 20% of one of the mailbox databases on the volume) so that content indexing maintenance tasks (specifically the master merge process) can complete. The best way to express the master merge space requirement is to take the average database file size across all databases on a volume and add one database’s worth of disk consumption when determining the per-volume content indexing space requirement:
Per-Volume Content Indexing Space = (average database size x (databases on the volume + 1) x 0.20)
As a simple example, if we had 2 mailbox databases on a single volume and each database consumed 100GB of space, we would compute the per-volume content indexing space requirement like this:
100GB database size x (2 databases + 1) x 0.20 = 60GB
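The per-volume formula translates directly into a helper (the name is hypothetical); the 0.20 factor is the 20% estimate from the text, and the extra database accounts for master merge.

```python
def per_volume_ci_space_gb(avg_db_size_gb: float, dbs_on_volume: int) -> float:
    """Content index space for a volume: 20% of each database's size, plus one
    extra database's worth of index space for the master merge."""
    return avg_db_size_gb * (dbs_on_volume + 1) * 0.20


# Two 100GB databases on one volume
ci_gb = per_volume_ci_space_gb(100, 2)
print(f"{ci_gb:.0f}GB")
```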
The amount of space required for ESE transaction log files can be computed using the same method as Exchange 2010. You can find details on the process in the Exchange 2010 TechNet guidance. To summarize the process, you must first determine the base guideline for number of transaction logs generated per-user, per-day, using the following table. As in Exchange 2010, log files are 1MB in size, making the math for log capacity quite straightforward.
Once you have the appropriate value from the table which represents guidance for a 75KB average message size, you may need to adjust the value based on differences in the target average message size. Every time you double the average message size, you must increase the logs generated per day by an additional factor of 1.9. For example:
Transaction logs at 200 messages/day with 150KB average message size = 40 logs/day (at 75KB average message size) x 1.9 = 76
Transaction logs at 200 messages/day with 300KB average message size = 40 logs/day (at 75KB average message size) x (1.9 x 2) = 152
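Taken literally, the two examples say that one doubling of the 75KB baseline multiplies the table value by 1.9, and two doublings multiply it by 1.9 x 2. The sketch below encodes exactly that reading. As an assumption on our part, it rejects message sizes that are not 75KB x 2^k rather than guessing at an interpolation; consult the Exchange 2010 TechNet guidance before applying it to other sizes.

```python
import math

BASE_MESSAGE_KB = 75    # the log-generation table assumes a 75KB average message
DOUBLING_FACTOR = 1.9


def adjusted_logs_per_day(base_logs: float, avg_message_kb: float) -> float:
    """Scale the table's logs/user/day for a larger average message size.

    Literal reading of the worked examples: one doubling -> x 1.9,
    two doublings -> x (1.9 x 2). Only exact doublings of 75KB are accepted.
    """
    ratio = avg_message_kb / BASE_MESSAGE_KB
    if ratio < 1:
        raise ValueError("sizes below the 75KB baseline are not covered here")
    doublings = math.log2(ratio)
    if not doublings.is_integer():
        raise ValueError("sketch only handles 75KB x 2^k message sizes")
    if doublings == 0:
        return base_logs
    return base_logs * DOUBLING_FACTOR * doublings


logs_150kb = adjusted_logs_per_day(40, 150)
logs_300kb = adjusted_logs_per_day(40, 300)
print(logs_150kb, logs_300kb)
```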
While daily log volume is interesting, it doesn’t represent the entire requirement for log capacity. If traditional backups are being used, logs will remain on disk for the interval between full backups. When mailboxes are moved, that volume of change to the target database will result in a significant increase in the amount of logs generated during the day. In a solution where Exchange native data protection is in use (i.e., you aren’t using traditional backups), logs will not be truncated if a mailbox database copy is failed or if an entire server is unreachable unless an administrator intervenes. There are many factors to consider when sizing for required log capacity, and it is certainly worth spending some time in the Exchange 2010 TechNet guidance mentioned earlier to fully understand these factors before proceeding. For our example scenario, we can estimate the log space required per database by assuming 65 users per database. We will also assume that 1% of our users are moved per week in a single day, and that we will allocate enough space to support 3 days of logs in the case of failed copies or servers.
Log Capacity to Support 3 Days of Truncation Failure = (65 mailboxes/database x 40 logs/day x 1MB log size) x 3 days = 7.62GB
Log Capacity to Support 1% mailbox moves per week = 65 mailboxes/database x 0.01 x 10.63GB mailbox size = 6.91GB
Total Log Capacity Required per Database = 7.62GB + 6.91GB = 14.53GB
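The same log capacity arithmetic as a small, hypothetical Python helper, assuming 1MB log files and binary units (1GB = 1024MB). As in the formula above, each moved mailbox is counted at its full on-disk size.

```python
MB_PER_GB = 1024  # binary units, as in the worked example


def log_capacity_per_db_gb(users_per_db: int, logs_per_user_day: float,
                           truncation_days: int, weekly_move_fraction: float,
                           mailbox_size_gb: float, log_size_mb: float = 1.0):
    """Return (truncation-failure GB, mailbox-move GB, total GB) per database."""
    truncation_gb = (users_per_db * logs_per_user_day * log_size_mb
                     * truncation_days) / MB_PER_GB
    # Each moved mailbox is counted at its full on-disk size
    moves_gb = users_per_db * weekly_move_fraction * mailbox_size_gb
    return truncation_gb, moves_gb, truncation_gb + moves_gb


trunc_gb, moves_gb, total_gb = log_capacity_per_db_gb(65, 40, 3, 0.01, 10.63)
print(f"{trunc_gb:.2f}GB + {moves_gb:.2f}GB = {total_gb:.2f}GB")
```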
The easiest way to think about sizing for storage capacity without having a calculator tool available is to make some assumptions up front about the servers and storage that will be used. Within the product group, we are big fans of 2U commodity server platforms with ~12 large form-factor drive bays in the chassis. This allows for a 2 drive RAID array for the operating system, Exchange install path, transport queue database, and other ancillary files, and ~10 remaining drives to use as mailbox database storage in a JBOD direct attached storage configuration with no RAID. Fill this server up with 4TB SATA or midline SAS drives, and you have a fantastic Exchange 2013 server. If you need even more storage, it’s quite easy to add an additional shelf of drives to the solution.
Using the large deployment example and thinking about how we might size this on the commodity server platform, we can consider a server scaling unit that has a total of 24 large form-factor drive bays containing 4TB midline SAS drives. We will use 2 of those drives for the OS & Exchange, and the remaining drive bays will be used for Exchange mailbox database capacity. Let’s use 12 of those drive bays for databases – that leaves 10 remaining drive bays that could contain spares or remain empty. For this sizing exercise, let’s also plan for 4 databases per drive. Each of those drives has a formatted capacity of ~3725GB. The first step in figuring out the number of mailboxes per database is to look at overall capacity requirements for the mailboxes, content indexes, and required free space (which we will set to 5%).
To calculate the maximum amount of space available for mailboxes, let’s apply a formula (note that this doesn’t consider space for logs – we will make sure that the volume will have enough space for logs later in the process). First, we can remove our required free space from the available storage on the drive:
Available Space (excluding required free space) = Formatted capacity of the drive x (1 – free space)
Then we can remove the space required for content indexing. As discussed above, the space required for content indexing will be 20% of the database size, with an additional 20% of one database for content indexing maintenance tasks. Given the additional 20% requirement, we can’t model the overall space requirement as a simple 20% of the remaining space on the volume. Instead we need to compute a new percentage that takes the number of databases per-volume into consideration.
Now we can remove the space for content indexing from our available space on the volume:
And we can then divide by the number of databases per-volume to get our maximum database size:
In our example scenario, we would obtain the following result:
Given this value, we can then calculate our maximum users per database (from a capacity perspective, as this may change when we evaluate the IO requirements):
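The formulas for these steps are not shown here, but they can be reconstructed from the definitions above. The following is one such reconstruction, offered as a sketch: it assumes ~3725GB formatted per drive, 5% free space, 4 databases per volume, and the 10.63GB mailbox size from earlier. The content indexing overhead is written as (databases per volume + 1) x 0.20 / databases per volume of the total database space, which is algebraically equivalent to the per-volume formula given earlier. The function name is illustrative.

```python
import math


def max_db_size_gb(formatted_gb: float, free_space_fraction: float,
                   dbs_per_volume: int) -> float:
    """Largest database that fits once free space and content indexing
    (20% per database plus one extra database for master merge) are reserved."""
    available_gb = formatted_gb * (1 - free_space_fraction)
    # Content index overhead as a fraction of total database space on the volume
    ci_fraction = (dbs_per_volume + 1) * 0.20 / dbs_per_volume
    db_space_gb = available_gb / (1 + ci_fraction)
    return db_space_gb / dbs_per_volume


db_gb = max_db_size_gb(3725, 0.05, 4)
users_per_db = math.floor(db_gb / 10.63)  # 10.63GB mailbox size on disk
print(f"max database {db_gb:.2f}GB -> {users_per_db} users per database")
```

This yields a maximum database size of about 707.75GB, or 66 users per database, the figure used below.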
Let’s see if that number is actually reasonable given our 4 copy configuration. We are going to use 16-node DAGs for this deployment to take full advantage of the scalability and high-availability benefits of large DAGs. While we have many drives available on our selected hardware platform, we will be limited by the maximum of 50 database copies per-server in Exchange 2013. Considering this maximum and our desire to have 4 databases per volume, we can calculate the maximum number of drives for mailbox database usage as:
With 12 database volumes and 4 database copies per-volume, we will have 48 total database copies per server.
With 66 users per database and 100,000 total users, we end up with the following required DAG count for the user population:
In this very large deployment, we are using a DAG as a unit of scale or “building block” (i.e., we perform capacity planning based on the number of DAGs required to meet demand, and we deploy an entire DAG when we need additional capacity), so we don’t intend to deploy a partial DAG. If we round up to 8 DAGs we can compute our final users per database count:
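The drive, DAG, and users-per-database arithmetic can be sketched as follows. The constants come from the example scenario; rounding up at each step (whole databases, whole servers, whole DAGs) is our assumption about how the numbers were derived, though it lands on the same 8 DAGs. The final users-per-database value comes out at roughly 65.1, which the text uses as 65.

```python
import math

MAX_COPIES_PER_SERVER = 50   # Exchange 2013 limit cited above
DBS_PER_VOLUME = 4
COPIES_PER_DB = 4
NODES_PER_DAG = 16
TOTAL_USERS = 100_000

# Whole volumes only: 50 // 4 = 12 volumes, giving 48 copies per server
volumes_per_server = MAX_COPIES_PER_SERVER // DBS_PER_VOLUME
copies_per_server = volumes_per_server * DBS_PER_VOLUME


def dags_required(users_per_db: int) -> int:
    databases = math.ceil(TOTAL_USERS / users_per_db)
    servers = math.ceil(databases * COPIES_PER_DB / copies_per_server)
    return math.ceil(servers / NODES_PER_DAG)   # deploy whole DAGs only


dags = dags_required(66)
databases_deployed = dags * NODES_PER_DAG * copies_per_server // COPIES_PER_DB
users_per_db_final = TOTAL_USERS / databases_deployed
print(dags, databases_deployed, round(users_per_db_final, 1))
```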
With 65 users per-database, that means we will expect to consume the following space for mailbox databases:
Estimated Database Size = 65 users x 10.63GB = 690.95GB
Database Consumption / Volume = 690.95GB x 4 databases = 2763.8GB
Using the formula mentioned earlier, we can compute our estimated content index consumption as well:
690.95GB database size x (4 databases + 1) x 0.20 = 690.95GB
You’ll recall that we computed transaction log space requirements earlier, and it turns out that we magically computed those values with the assumption that we would have 65 users per-database. What a pleasant coincidence! So we will need 14.53GB of space for transaction logs per-database, or to get a more useful result:
Log Space Required / Volume = 14.53GB x 4 databases = 58.12GB
To sum it up, we can estimate our total per-volume space utilization and make sure that we have plenty of room on our target 4TB drives:
Looks like our database volumes are sized perfectly!
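As a quick check, the per-volume sum can be computed as below. This sketch uses the figures derived above (65 users per database, 10.63GB mailboxes, 14.53GB of logs per database, and a ~3725GB formatted drive with 5% reserved as free space); the variable names are illustrative.

```python
USERS_PER_DB = 65
MAILBOX_GB = 10.63        # mailbox size on disk
DBS_PER_VOLUME = 4
LOG_GB_PER_DB = 14.53     # log capacity per database
FORMATTED_GB = 3725       # ~formatted capacity of a 4TB drive
FREE_SPACE = 0.05

db_gb = USERS_PER_DB * MAILBOX_GB
databases_gb = db_gb * DBS_PER_VOLUME
# 20% per database plus one extra database's worth for master merge
content_index_gb = db_gb * (DBS_PER_VOLUME + 1) * 0.20
logs_gb = LOG_GB_PER_DB * DBS_PER_VOLUME
used_gb = databases_gb + content_index_gb + logs_gb
usable_gb = FORMATTED_GB * (1 - FREE_SPACE)
print(f"{used_gb:.2f}GB of {usable_gb:.2f}GB usable ({FORMATTED_GB}GB formatted)")
```

The total comes to roughly 3512.87GB, which fits within the ~3538.75GB left after free space is set aside.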
To determine the IOPS requirements for a database, we look at the number of users hosted on the database and consider the guidance provided in the following table to compute total required IOPS when the database is active or passive.
For example, with 50 users in a database, with an average message profile of 200, we would expect that database to require 50 x 0.134 = 6.7 transactional IOPS when the database is active, and 50 x 0.134 = 6.7 transactional IOPS when the database is passive. Don’t forget to consider database placement which will impact the number of databases with IOPS requirements on a given storage volume (which could be a single JBOD drive or might be a more complex storage configuration).
Going back to our example scenario, we can evaluate the IOPS requirement of the solution, recalling that the average user profile in that deployment is the 200 message/day profile. We have 65 users per database and 4 databases per JBOD drive, so we can estimate our IOPS requirement in worst-case (all databases active) as:
65 mailboxes x 4 databases per-drive x 0.134 IOPS/mailbox at 200 messages/day profile = ~34.84 IOPS per drive
Midline SAS drives typically provide ~57.5 random IOPS (based on our own internal observations and benchmark tests), so we are well within design constraints when thinking about IOPS requirements.
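The worst-case per-drive IOPS estimate can be expressed as a hypothetical helper, using the 0.134 IOPS per mailbox value for the 200 message/day profile and the ~57.5 IOPS midline SAS observation above:

```python
MIDLINE_SAS_IOPS = 57.5  # approximate random IOPS per midline SAS drive


def drive_iops(users_per_db: int, dbs_per_drive: int,
               iops_per_mailbox: float) -> float:
    """Worst case: every database on the drive is active."""
    return users_per_db * dbs_per_drive * iops_per_mailbox


required_iops = drive_iops(65, 4, 0.134)   # 200 message/day profile
headroom = MIDLINE_SAS_IOPS - required_iops
print(f"{required_iops:.2f} IOPS required, {headroom:.2f} IOPS headroom")
```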
While IOPS requirements are usually the primary storage throughput concern when designing an Exchange solution, it is possible to run up against bandwidth limitations with various types of storage subsystems. The IOPS sizing guidance above is looking specifically at transactional (somewhat random) IOPS and is ignoring the sequential IO portion of the workload. One place that sequential IO becomes a concern is with storage solutions that are running a large amount of sequential IO through a common channel. A common example of this type of load is the ongoing background database maintenance (BDM) which runs continuously on Exchange mailbox databases. While this BDM workload might not be significant for a few databases stored on a JBOD drive, it may become a concern if all of the mailbox database volumes are presented through a common iSCSI or Fibre Channel interface. In that case, the bandwidth of that common channel must be considered to ensure that the solution doesn’t bottleneck due to these IO patterns.
In Exchange 2013, we expect to consume approximately 1MB/sec/database copy for BDM, which is a significant reduction from Exchange 2010. This makes it practical to store multiple mailbox databases on the same JBOD drive spindle, and will also help to avoid bottlenecks on networked storage deployments such as iSCSI. This bandwidth utilization is in addition to bandwidth consumed by the transactional IO activity associated with user and system workload processes, as well as storage bandwidth consumed by the log replication and replay process in a DAG.
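To put the BDM figure in context for the example design, the sketch below multiplies ~1MB/sec per database copy by 4 copies per drive and by 48 copies per server. Treating all 48 copies as sharing one channel is an assumption for illustration (the example design uses local JBOD drives), and the result excludes transactional IO and log replication traffic. The helper name is hypothetical.

```python
BDM_MB_PER_SEC_PER_COPY = 1.0   # approximate Exchange 2013 figure from the text


def bdm_bandwidth_mb_s(db_copies: int) -> float:
    """Sequential BDM bandwidth only; transactional IO and log
    replication/replay bandwidth come on top of this."""
    return db_copies * BDM_MB_PER_SEC_PER_COPY


per_drive = bdm_bandwidth_mb_s(4)        # 4 database copies per JBOD drive
per_server = bdm_bandwidth_mb_s(12 * 4)  # 12 volumes x 4 copies = 48 copies
print(f"{per_drive:.0f}MB/s per drive, {per_server:.0f}MB/s per server")
```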
Since transport components (with the exception of the front-end transport component on the CAS role) are now part of the Mailbox role, we have included CPU and memory requirements for transport with the general Mailbox role requirements described later. Transport also has storage requirements associated with the queue database. These requirements, much like I described earlier for mailbox storage, consist of capacity factors and IO throughput factors.
Transport storage capacity is driven by two needs: queuing (including shadow queuing) and Safety Net (which is the replacement for transport dumpster in this release). You can think of the transport storage capacity requirement as the sum of message content on disk in a worst-case scenario, consisting of three elements:
Of course, all three of these factors are also impacted by shadow queuing in which a redundant copy of all messages is stored on another server. At this point, it would be a good idea to review the TechNet documentation on Transport High Availability if you aren’t familiar with the mechanics of shadow queuing and Safety Net.
In order to figure out the messages per day that you expect to run through the system, you can look at the user count and messaging profile. Simply multiplying these together will give you a total daily mail volume, but it will be a bit higher than necessary since it is double counting messages that are sent within the organization (i.e. a message sent to a coworker will count towards the profile of the sending user as well as the profile of the receiving user, but it’s really just one message traversing the system). The simplest way to deal with that would be to ignore this fact and oversize transport, which will provide additional capacity for unexpected peaks in message traffic. An alternative way to determine daily message flow would be to evaluate performance counters within your existing messaging system.
To determine the maximum size of the transport database, we can look at the entire system as a unit and then come up with a per-server value.
Overall Daily Message Traffic = number of users x message profile
Overall Transport DB Size = average message size x overall daily message traffic x (1 + (percentage of messages queued x maximum queue days) + Safety Net hold days) x 2 copies for high availability
Let’s use the 100,000 user sizing example again and size the transport database using the simple method.
Overall Transport DB Size = 75KB x (100,000 users x 200 messages/day) x (1 + (50% x 2 maximum queue days) + 2 Safety Net hold days) x 2 copies = 11,444GB
In our example scenario, we have 8 DAGs, each containing 16 nodes, and we are designing to handle double node failures in each DAG. This means that in a worst-case failure event we would have 112 servers online with 2 failed servers in each DAG. We can use this value to determine a per-server transport DB size:
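The overall transport DB formula, plus one straightforward way to turn it into a per-server figure (dividing by the 112 servers that survive the worst-case failure), is sketched below. It assumes binary units (1GB = 1,048,576KB), which reproduces the 11,444GB figure above; the per-server result comes to roughly 102GB. Function names are illustrative.

```python
KB_PER_GB = 1024 * 1024  # binary units, as in the worked example


def transport_db_size_gb(users: int, msgs_per_day: int, avg_message_kb: float,
                         pct_queued: float, max_queue_days: float,
                         safety_net_days: float, copies: int = 2) -> float:
    """Overall transport DB size across the whole system, in GB."""
    daily_messages = users * msgs_per_day
    multiplier = 1 + pct_queued * max_queue_days + safety_net_days
    return avg_message_kb * daily_messages * multiplier * copies / KB_PER_GB


overall_gb = transport_db_size_gb(100_000, 200, 75, 0.5, 2, 2)
# 8 DAGs x (16 nodes - 2 failed nodes) = 112 surviving servers
servers_online = 8 * (16 - 2)
per_server_gb = overall_gb / servers_online
print(f"{overall_gb:,.0f}GB overall, {per_server_gb:.1f}GB per server")
```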
Sizing for transport IO throughput requirements is actually quite simple. Transport has taken advantage of many of the IO reduction changes to the ESE database that have been made in recent Exchange releases. As a result, the number of IOPS required to support transport is significantly lower. In the internal deployment we used to produce this sizing guidance, we see approximately 1 DB write IO per message and virtually no DB read IO, with an average message size of ~75KB. We expect that as average message size increases, the amount of transport IO required to support delivery and queuing would increase. We do not currently have specific guidance on what that curve looks like, but it is an area of active investigation. In the meantime, our best practices guidance for the transport database is to leave it in the Exchange install path (likely on the OS drive) and ensure that the drive supporting that directory path is using a protected write cache disk controller, set to 100% write cache if the controller allows optimization of read/write cache settings. The write cache allows transport database log IO to become effectively “free” and allows transport to handle a much higher level of throughput.
Once we have our storage requirements figured out, we can move on to thinking about CPU. CPU sizing for the Mailbox role is done in terms of megacycles. A megacycle is a unit of processing work equal to one million CPU cycles. In very simplistic terms, you could think of a 1 MHz CPU performing a megacycle of work every second. Given the guidance provided below for megacycles required for active and passive users at peak, you can estimate the required processor configuration to meet the demands of an Exchange workload. Following are our recommendations on the estimated required megacycles for the various user profiles.
The second column represents the estimated megacycles required on the Mailbox role server hosting the active copy of a user’s mailbox database. In a DAG configuration, the required megacycles for the user on each server hosting passive copies of that database can be found in the fourth column. If the solution is going to include multi-role (Mailbox+CAS) servers, use the value in the third column rather than the second, as it includes the additional CPU requirements for the CAS role.
It is important to note that while many years ago you could make an assumption that a 500 MHz processor could perform roughly double the work per unit of time as a 250 MHz processor, clock speeds are no longer a reliable indicator of performance. The internal architecture of modern processors is different enough between manufacturers as well as within product lines of a single manufacturer that it requires an additional normalization step to determine the available processing power for a particular CPU. We recommend using the SPECint_rate2006 benchmark from the Standard Performance Evaluation Corporation.
The baseline system used to generate this guidance was a Hewlett-Packard DL380p Gen8 server containing Intel Xeon E5-2650 2 GHz processors. The baseline system SPECint_rate2006 score is 540, or 33.75 per-core, given that the benchmarked server was configured with a total of 16 physical processor cores. Please note that this is a different baseline system than what was used to generate our Exchange 2010 guidance, so any tools or calculators that make assumptions based on the 2010 baseline system would not provide accurate results for sizing an Exchange 2013 solution.
Using the same general methodology we have recommended in prior releases, you can determine the estimated available Exchange workload megacycles available on a different processor through the following process:
Using the example HP platform with E5-2630 processors mentioned previously, we would calculate the following result:
~2,123 megacycles per core x 12 processor cores = 25,479 available megacycles per-server
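The normalization steps themselves are not listed above. A formulation consistent with the baseline figures given (2 GHz cores scoring 33.75 SPECint_rate2006 per core) scales the baseline clock speed by the ratio of per-core scores, then multiplies by the core count. In the sketch below, the input of ~35.83 per core is an assumption back-calculated from the ~2,123 megacycles per core used later in this article for the E5-2630 platform; for a real sizing exercise, use the published SPECint_rate2006 result for your server divided by its core count.

```python
BASELINE_MHZ = 2000                   # E5-2650 2 GHz baseline cores
BASELINE_SPECINT_PER_CORE = 540 / 16  # 33.75


def megacycles_per_core(specint_rate_per_core: float) -> float:
    """Scale the baseline clock speed by the ratio of per-core scores."""
    return specint_rate_per_core * BASELINE_MHZ / BASELINE_SPECINT_PER_CORE


def megacycles_per_server(specint_rate_per_core: float, cores: int) -> float:
    return megacycles_per_core(specint_rate_per_core) * cores


# Hypothetical input: ~35.83 per core, back-calculated from the ~2,123
# megacycles per core this article uses for the E5-2630 platform.
per_core = megacycles_per_core(35.83)
per_server = megacycles_per_server(35.83, 12)
print(f"{per_core:.0f} per core, {per_server:.0f} per server")
```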
Keep in mind that a good Exchange design should never plan to run servers at 100% of CPU capacity. In general, 80% CPU utilization in a failure scenario is a reasonable target for most customers. Because that high utilization is expected only during a failure scenario, servers in a highly available Exchange solution will often run with relatively low CPU utilization during normal operation. Additionally, there may be very good reasons to target a lower maximum CPU utilization, particularly in cases where unanticipated spikes in load may result in acute capacity issues.
Going back to the example I used previously of 100,000 users with the 200 message/day profile, we can estimate the total required megacycles for the deployment. We know that there will be 4 database copies in the deployment, and that will help to calculate the passive megacycles required. We also know that this deployment will be using multi-role (Mailbox+CAS) servers. Given this information, we can calculate megacycle requirements as follows:
100,000 users x ((11.69 mcycles per active mailbox) + (3 passive copies x 2.74 mcycles per passive mailbox)) = 1,991,000 total mcycles required
You could then take that number and attempt to come up with a required server count. I would argue that it’s actually a much better practice to come up with a server count based on high availability requirements (taking into account how many component failures your design can handle in order to meet business requirements) and then ensure that those servers can meet CPU requirements in a worst-case failure scenario. You will either meet CPU requirements without any additional changes (if your server count is bound on another aspect of the sizing process), or you will adjust the server count (scale out), or you will adjust the server specification (scale up).
Continuing with our hypothetical example, if we knew that the high availability requirements for the design of the 100,000 user example resulted in a maximum of 16 databases being active at any time out of 48 total database copies per server, and we know that there are 65 users per database, we can determine the per-server CPU requirements for the deployment.
(16 databases x 65 mailboxes x 11.69 mcycles per active mailbox) + (32 databases x 65 mailboxes x 2.74 mcycles per passive mailbox) = 12157.6 + 5699.2 = 17,856.8 mcycles per server
Using the processor configuration mentioned in the megacycle normalization section (E5-2630 2.3 GHz processors on an HP DL380p Gen8), we know that we have 25,479 available mcycles on the server, so we would estimate a peak average CPU in worst-case failure of:
17,857 / 25,479 = 70.1%
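Both CPU calculations can be sketched as two small functions, using the multi-role (Mailbox+CAS) values for the 200 message/day profile quoted above (11.69 active, 2.74 passive) and the 25,479 available megacycles per server. The names are hypothetical.

```python
def total_mcycles(users: int, active_mc: float, passive_mc: float,
                  copies: int) -> float:
    """Deployment-wide: one active copy plus (copies - 1) passive copies per user."""
    return users * (active_mc + (copies - 1) * passive_mc)


def per_server_mcycles(active_dbs: int, passive_dbs: int, users_per_db: int,
                       active_mc: float, passive_mc: float) -> float:
    """Worst-case per-server load from active and passive database copies."""
    return (active_dbs * users_per_db * active_mc
            + passive_dbs * users_per_db * passive_mc)


# Multi-role (Mailbox+CAS) figures for the 200 message/day profile
total_mc = total_mcycles(100_000, 11.69, 2.74, 4)
per_server_mc = per_server_mcycles(16, 32, 65, 11.69, 2.74)
utilization = per_server_mc / 25_479   # available megacycles per server
print(f"{total_mc:,.0f} total, {per_server_mc:,.1f} per server, "
      f"{utilization:.1%} peak in worst-case failure")
```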
That is below our guidance of 80% maximum CPU utilization (in a worst-case failure scenario), so we would not consider the servers to be CPU bound in the design. In fact, we could consider a cheaper, lower-performance CPU that brings the peak average CPU in a worst-case failure closer to 80%, reducing the cost of the overall solution.
To calculate memory per server, you will need to know the per-server user count (both active and passive users) as well as determine whether you will run the Mailbox role in isolation or deploy multi-role servers (Mailbox+CAS). Keep in mind that regardless of whether you deploy roles in isolation or deploy multi-role servers, the minimum amount of RAM on any Exchange 2013 server is 8GB.
Memory on the Mailbox role is used for many purposes. As in prior releases, a significant amount of memory is used for ESE database cache and plays a large part in the reduction of disk IO in Exchange 2013. The new content indexing technology in Exchange 2013 also uses a large amount of memory. The remaining large consumers of memory are the various Exchange services that provide either transactional services to end-users or handle background processing of data. While each of these individual services may not use a significant amount of memory, the combined footprint of all Exchange services can be quite large.
Following is our recommended amount of memory for the Mailbox role on a per mailbox basis that we expect to be used at peak.
To determine the amount of memory that should be provisioned on a server, take the number of active mailboxes per-server in a worst-case failure and multiply by the value associated with the expected user profile. From there, round up to a value that makes sense from a purchasing perspective (i.e. it may be cheaper to configure 128GB of RAM compared to a smaller amount of RAM depending on slot options and memory module costs).
Mailbox Memory per-server = (worst-case active database copies per-server x users per-database x memory per-active mailbox)
For example, on a server with 48 database copies (16 active in worst-case failure), 65 users per-database, expecting the 200 profile, we would recommend:
16 x 65 x 48MB = 48.75GB, round up to 64GB
It’s important to note that the content indexing technology included with Exchange 2013 uses a relatively large amount of memory to allow both indexing and query processing to occur very quickly. This memory usage scales with the number of items indexed, meaning that as the number of total items stored on a Mailbox role server increases (for both active and passive copies), memory requirements for the content indexing processes will increase as well. In general, the memory sizing guidance presented here assumes that approximately 15% of the memory on the system will be available for the content indexing processes. With a 75KB average message size, this accommodates mailbox sizes from 3GB at the 50 message profile up to 32GB at the 500 message profile without adjusting the memory sizing. If your deployment will have an extremely small average message size or an extremely large average mailbox size, you may need to add additional memory to accommodate the content indexing processes.
Multi-role server deployments will have an additional memory requirement beyond the amounts specified above. CAS memory is computed as a base memory requirement for the CAS components (2GB) plus additional memory that scales based on the expected workload. This overall CAS memory requirement on a multi-role server can be computed using the following formula:
Essentially this is 2GB of memory for the base requirement, plus 2GB of memory for each processor core (or fractional processor core) serving active load at peak in a worst-case failure scenario. Reusing the example scenario, if I have 16 active databases per-server in a worst-case failure and my processor is providing 2123 mcycles per-core, I would need:
If we add that to the memory requirement for the Mailbox role calculated above, our total memory requirement for the multi-role server would be:
48.75GB for Mailbox + 5.12GB for CAS = 53.87GB, round up to 64GB
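The following sketch combines the Mailbox and CAS memory calculations. The CAS formula is a reconstruction, not the published one: it assumes the CAS share is 37.5% of the Mailbox-only active megacycles (8.5 for the 200 message/day profile, as used in the CAS CPU sizing) divided by ~2,123 megacycles per core, which reproduces the 5.12GB figure. The purchasable memory sizes are hypothetical, and the result is also checked against the 8GB minimum and the 16GB database cache minimum for 48 copies.

```python
# Binary units: 1GB = 1024MB.
MB_PER_GB = 1024
# Hypothetical purchasable memory configurations; substitute your platform's options.
PURCHASABLE_GB = (16, 32, 64, 96, 128)


def mailbox_memory_gb(active_dbs: int, users_per_db: int,
                      mb_per_active_mailbox: float) -> float:
    """Worst-case active mailboxes x per-mailbox memory for the profile."""
    return active_dbs * users_per_db * mb_per_active_mailbox / MB_PER_GB


def multi_role_cas_memory_gb(active_dbs: int, users_per_db: int,
                             mailbox_active_mcycles: float,
                             mcycles_per_core: float,
                             cas_ratio: float = 0.375) -> float:
    """Assumed reconstruction: 2GB base + 2GB per (fractional) core of CAS load,
    where CAS load is 37.5% of the Mailbox-only active megacycles."""
    cas_cores = (active_dbs * users_per_db * mailbox_active_mcycles
                 * cas_ratio / mcycles_per_core)
    return 2 + 2 * cas_cores


def round_up_to_purchasable(required_gb: float) -> int:
    return next(size for size in PURCHASABLE_GB if size >= required_gb)


mailbox_gb = mailbox_memory_gb(16, 65, 48)            # 200 message/day profile
cas_gb = multi_role_cas_memory_gb(16, 65, 8.5, 2123)
total_gb = mailbox_gb + cas_gb
# Never go below the 16GB database-cache minimum for 48 copies
# (nor the 8GB floor for any Exchange 2013 server).
provisioned_gb = round_up_to_purchasable(max(total_gb, 16, 8))
print(f"{mailbox_gb:.2f} + {cas_gb:.2f} = {total_gb:.2f}GB -> {provisioned_gb}GB")
```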
Regardless of whether you are considering a multi-role or a split-role deployment, it is important to ensure that each server has a minimum amount of memory for efficient use of the database cache. There are some scenarios that will produce a relatively small memory requirement from the memory calculations described above. We recommend comparing the per-server memory requirement you have calculated with the following table to ensure you meet the minimum database cache requirements. The guidance is based on total database copies per-server (both active and passive). If the value shown in this table is higher than your calculated per-server memory requirement, adjust your per-server memory requirement to meet the minimum listed in the table.
In our example scenario, we are deploying 48 database copies per-server, so the minimum physical memory to provide necessary database cache would be 16GB. Since our computed memory requirement based on per-user guidance including memory for the CAS role (53.87GB) was higher than the minimum of 16GB, we don’t need to make any further adjustments to accommodate database cache needs.
With the new architecture of Exchange, Unified Messaging is now installed and ready to be used on every Mailbox and CAS server. The CPU and memory guidance provided here assumes some moderate UM utilization. In a deployment with significant UM utilization with very high call concurrency, additional sizing may need to be performed to provide the best possible user experience. As in Exchange 2010, we recommend using a 100 concurrent call per-server limit as the maximum possible UM concurrency, and scaling out the deployment if your sizing becomes bound by this limit. Additionally, voicemail transcription is a very CPU-intensive operation, and by design it will only transcribe messages when there is enough available CPU on the machine. Each voicemail message requires 1 CPU core for the duration of the transcription operation, and if that amount of CPU cannot be obtained, transcription will be skipped. In deployments that anticipate a high amount of voicemail transcription concurrency, server configurations may need to be adjusted to increase CPU resources, or the number of users per server may need to be scaled back to allow for more available CPU for voicemail transcription operations.
In the case where you are going to place the Mailbox and CAS roles on separate servers, the process of sizing CAS is relatively straightforward. CAS sizing is primarily focused on CPU and memory requirements. There is some disk IO for logging purposes, but it is not significant enough to warrant specific sizing guidance.
CAS CPU is sized as a ratio from Mailbox role CPU. Specifically, we need 37.5% of the megacycles used to support active users on the Mailbox role. You could think of this as a 3:8 ratio (CAS CPU to active Mailbox CPU) compared to the 3:4 ratio we recommended in Exchange 2010. One way to compute this would be to look at the total active user megacycles required for the solution, take 37.5% of that, and then determine the required CAS server count based on high availability requirements and multi-site design constraints. For example, consider the 100,000 user example using the 200 message/day profile:
Total CAS Required Mcycles = 100,000 users x 8.5 mcycles x 0.375 = 318,750 mcycles
Assuming that we want to target a maximum CPU utilization of 80% and the servers we plan to deploy have 25,479 available megacycles, we can compute the required number of servers quite easily:
Obviously we would need to then consider whether the 16 required servers meet our high availability requirements considering the maximum CAS server failures that we must design for given business requirements, as well as the site configuration where some of the CAS servers may be in different sites handling different portions of the workload. Since we specified in our example scenario that we want to survive a double failure in the single site, we would increase our 16 CAS to 18 such that we could sustain 2 CAS failures and still handle the workload.
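The CAS server count can be sketched as below, assuming the 25,479 available megacycles per server from the normalization example, an 80% CPU ceiling, and 2 tolerated failures; the function name is illustrative. With these inputs it gives 318,750 megacycles, 16 servers, and 18 after adding failure headroom.

```python
import math


def cas_server_count(users: int, mailbox_active_mcycles: float,
                     mcycles_per_server: float, max_cpu: float = 0.80,
                     failures_tolerated: int = 2, cas_ratio: float = 0.375):
    """Return (CAS megacycles, servers at max_cpu, servers incl. failures)."""
    required = users * mailbox_active_mcycles * cas_ratio
    base = math.ceil(required / (mcycles_per_server * max_cpu))
    return required, base, base + failures_tolerated


cas_mcycles, cas_base, cas_total = cas_server_count(100_000, 8.5, 25_479)
print(cas_mcycles, cas_base, cas_total)
```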
To size memory, we will use the same formula that was used for Exchange 2010:
Per-Server CAS Memory = 2GB + 2GB per physical processor core
Using the example scenario we have been using, we can calculate the per-server CAS memory requirement as:
In this example, 20.77GB would be the guidance for required CAS memory, but you would need to round up to the next available (or highest performing) memory configuration for the server platform: perhaps 24GB.
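Plugging all 12 cores of the example platform into the formula would give 26GB, not 20.77GB. One reading that does reproduce the 20.77GB figure (our inference, not something stated above) is to count only the cores the CAS workload actually uses on each server: 318,750 megacycles spread over the 16 servers, at 25,479 / 12 = 2,123.25 megacycles per core, is about 9.38 cores per server. The Python sketch below shows both; the list of available memory configurations is hypothetical.

```python
CORES_PER_SERVER = 12           # example platform core count
SERVER_MCYCLES = 25_479         # available megacycles per server
CAS_REQUIRED_MCYCLES = 318_750  # total CAS megacycles from the CPU example
CAS_SERVERS_FOR_LOAD = 16       # servers before adding failure headroom

# Hypothetical memory configurations offered by the server platform, in GB.
MEMORY_CONFIGS_GB = [16, 24, 32, 48, 64]


def cas_memory_gb(cores: float) -> float:
    """Per-server CAS memory = 2GB + 2GB per processor core."""
    return 2 + 2 * cores


def round_up_to_config(required_gb: float) -> int:
    """Smallest available memory configuration that meets the requirement."""
    return min(c for c in MEMORY_CONFIGS_GB if c >= required_gb)


mcycles_per_core = SERVER_MCYCLES / CORES_PER_SERVER  # 2,123.25
# Assumption: count only the cores the CAS load uses on each of the 16 servers.
cores_in_use = CAS_REQUIRED_MCYCLES / CAS_SERVERS_FOR_LOAD / mcycles_per_core

if __name__ == "__main__":
    print(round(cores_in_use, 2))                           # 9.38
    print(round(cas_memory_gb(cores_in_use), 2))            # 20.77
    print(round_up_to_config(cas_memory_gb(cores_in_use)))  # 24
    print(cas_memory_gb(CORES_PER_SERVER))                  # 26 (all 12 cores)
```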
Active Directory sizing remains the same as it was for Exchange 2010. As we gain more experience with production deployments we may adjust this in the future. For Exchange 2013, we recommend deploying a ratio of 1 Active Directory global catalog processor core for every 8 Mailbox role processor cores handling active load, assuming 64-bit global catalog servers:
If we revisit our example scenario, we can easily calculate the number of GC cores required.
Assuming that my Active Directory GCs are also deployed on the same server hardware configuration as my CAS & Mailbox role servers in the example scenario with 12 processor cores, then my GC server count would be:
In order to sustain double failures, we would need to add 2 more GCs to this calculation, which would take us to 7 GC servers for the deployment.
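The GC arithmetic is not spelled out above, so the sketch below is one reconstruction under stated assumptions: active Mailbox megacycles are taken as 100,000 users x 8.5 megacycles (the same per-user figure used for CAS), converted to cores at 25,479 / 12 megacycles per core. Under those assumptions it arrives at the 5 GC servers implied above, and 7 with double-failure headroom.

```python
import math

USERS = 100_000
ACTIVE_MCYCLES_PER_USER = 8.5   # assumption: same per-user figure as the CAS example
MCYCLES_PER_CORE = 25_479 / 12  # 2,123.25 on the 12-core example platform
MAILBOX_CORES_PER_GC_CORE = 8   # 1 GC core per 8 active Mailbox cores
CORES_PER_GC_SERVER = 12        # GCs use the same 12-core hardware
TOLERATED_FAILURES = 2          # survive a double failure

active_mailbox_cores = USERS * ACTIVE_MCYCLES_PER_USER / MCYCLES_PER_CORE  # ~400.3
gc_cores = active_mailbox_cores / MAILBOX_CORES_PER_GC_CORE                 # ~50.0
gc_servers = math.ceil(gc_cores / CORES_PER_GC_SERVER)                      # 5
gc_servers_with_failures = gc_servers + TOLERATED_FAILURES                  # 7

if __name__ == "__main__":
    print(round(active_mailbox_cores, 1), round(gc_cores, 1),
          gc_servers, gc_servers_with_failures)
```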
As a best practice, we recommend sizing memory on the global catalog servers such that the entire NTDS.DIT database file can be contained in RAM. This will provide optimal query performance and a much better end-user experience for Exchange workloads.
Turn it off. While modern implementations of simultaneous multithreading (SMT), also known as hyperthreading, can absolutely improve CPU throughput for most applications, the benefits to Exchange 2013 do not outweigh the negative impacts. Enabling hyperthreading can significantly increase memory utilization on Exchange servers because of the way the .NET server garbage collector allocates heaps. The server garbage collector looks at the total number of logical processors when an application starts up and allocates a heap per logical processor. This means that the startup memory usage of one of our services that uses the server garbage collector will be close to double with hyperthreading turned on compared to when it is turned off. This significant increase in memory, along with an analysis of the actual CPU throughput increase for Exchange 2013 workloads in internal lab tests, has led us to a best practice recommendation that hyperthreading should be disabled for all Exchange 2013 servers.
There’s an important caveat to this recommendation for customers who are virtualizing Exchange. Since the number of logical processors visible to a virtual machine is determined by the number of virtual CPUs allocated in the virtual machine configuration, hyperthreading will not have the same impact on memory utilization described above. It’s certainly acceptable to enable hyperthreading on physical hardware that is hosting Exchange virtual machines, but make sure that any capacity planning calculations for that hardware are based purely on physical CPUs. Follow the best practice recommendations of your hypervisor vendor on whether or not to enable hyperthreading. Note that the extra logical CPUs that are added when hyperthreading is enabled must not be considered when allocating virtual machine resources during the sizing and deployment process. For example, on a physical host running Hyper-V with 40 physical processor cores and hyperthreading enabled, 80 logical processor cores will be visible to the root operating system. If your Exchange design required 16-core servers, you could place only 2 Exchange VMs on the physical host: those 2 VMs would consume 32 physical processor cores, leaving too few physical cores to host another 16-core VM (32 + 16 = 48, which is greater than 40).
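The VM placement rule in the example is integer division over physical cores only. A short Python sketch (function name hypothetical) contrasting it with the incorrect logical-processor count:

```python
def max_vms_per_host(physical_cores: int, vm_cores: int) -> int:
    """VMs that fit when sizing against physical cores only.

    Logical processors added by hyperthreading are deliberately ignored.
    """
    return physical_cores // vm_cores


PHYSICAL_CORES = 40
LOGICAL_CORES = 80  # visible to the root OS with hyperthreading enabled
VM_CORES = 16

if __name__ == "__main__":
    fits = max_vms_per_host(PHYSICAL_CORES, VM_CORES)
    print(fits)                              # 2
    print(PHYSICAL_CORES - fits * VM_CORES)  # 8 physical cores left over
    # Incorrect: sizing against logical processors would suggest 5 VMs.
    print(max_vms_per_host(LOGICAL_CORES, VM_CORES))  # 5
```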
Now that you have digested all of this guidance, you are probably thinking about how much more of a pain it will be to size a deployment compared to using the Mailbox Role Requirements Calculator for Exchange 2010. UPDATE: You can now read about and download the calculator from here.
Hopefully that leaves you with enough information to begin to properly size your Exchange 2013 deployments. If you have further questions, you can obviously post comments here, but I’d also encourage you to consider attending one of the upcoming TechEd events. I’ll be at TechEd North America as well as TechEd Europe with a session specifically on this topic, and would be happy to answer your questions in person, either in the session or at the “Ask the Experts” event. Recordings of those sessions will also be posted to MSDN Channel9 after the events have concluded.
Jeff Mealiffe Principal Program Manager Lead Exchange Customer Experience
Update 6/5/2013: We have updated the blog post to add the link to the first TechNet document on public folder Hybrid scenarios.
“You mean… this is really happening?”
Last November we gave you a teaser about public folders in the new Exchange. We explained how public folders were given a lot of attention to bring their architecture up-to-date, and as a result of this work they would take advantage of the other excellent engineering work put into Exchange mailbox databases over the years. Many of you have given the new public folders a try in Exchange Online and Exchange Server 2013 in your on-premises environments. At this time we would like to give you a bit more detail surrounding the Exchange Online public folder feature set so you can start planning what makes sense for your environment. So, yes, we really meant our beloved public folders were coming to Exchange Online!
We are still putting the finishing touches on some of our migration documentation for on-premises Exchange Server environments to Exchange Online. We know there is a lot of interest in this documentation and we are making sure it is as easy to follow as possible. We will update this article with links to the content when more documentation becomes available on TechNet. The following two articles are available now.
Before we cover the migration process at a high level (and very deeply in those TechNet articles!), we want to make sure everyone clearly understands the following important points.
Public Folder migrations to Exchange Online should not be performed unless all of your users are located in Exchange Online, and/or all of your on-premises users are on Exchange Server 2013.
Public folder migrations are a cutover migration. You cannot have some public folders on-premises and some public folders in Exchange Online. There will be a small window of public folder access downtime required when the migration is completed and all public folder connections are moved from on-premises to Exchange Online.
Public folder migrations are entirely PowerShell based at this time. Once the migration has completed you can then perform your public folder management in the tool of your choice, EAC or PowerShell.
In the TechNet content we walk you through exactly how to use PowerShell and some scripts provided by the product group to help automate the analysis and content location mapping in Exchange 2013 or Exchange Online. The migration process is similar whether you are doing an on-premises to on-premises migration, or an on-premises to Exchange Online migration with the latter having a couple more twists. Both scenarios will include a few major steps you will go through to migrate your legacy public folder infrastructure. Again, the following section is meant to be an overview and not a complete rendering of what the more detailed step-by-step TechNet documentation contains. Consider this section an appetizer to get you thinking about your migration and what potential caveats may or may not affect you. The information below is tailored more to an Exchange Online migration, but our on-premises customers will also be facing many of the same steps and considerations.
(Size limits pertain to Exchange Online)
Now that we have given you an idea of what the migration process will be, let us talk about the feature itself. Starting with the new Office 365, customers of Exchange Online will be able to store, free of charge, approximately 1.25 terabytes of public folder data in the cloud. Yes, you read that right: over a terabyte. Your tenant will be allowed to create up to fifty (50) public folder mailboxes, each with a 25 GB quota (50 x 25 GB = 1,250 GB). However, when operating in a hybrid environment, public folders can exist only on-premises or in Exchange Online.
Once you complete the migration of public folders to Exchange Online, the on-premises public folder infrastructure will have its hierarchy locked to prevent user connections, and its content frozen at that point in time. By locking the on-premises content, we provide you with a way to roll back a migration from Exchange Online if you deem it necessary. However, as mentioned before, a rollback can result in data loss, because no changes made while using the Exchange Online public folder infrastructure are copied back on-premises.
We will support on-premises Exchange Server 2013 users accessing Exchange Online public folders. We will also support Exchange Online users accessing on-premises public folders if you choose to keep your public folder infrastructure local. The table below shows which users can access which public folder infrastructures. Please note that in a hybrid deployment, on-premises users must be on Exchange 2013 if you want them to access Exchange Online public folders. It also bears repeating that public folders can only exist in one location, on-premises or in Exchange Online. You cannot have two different public folder infrastructures in use at once.
Yes
No
When your public folder content migration is complete, or you create public folders for the very first time, you will not have to worry about managing many aspects of public folders in Exchange Online. As you previously read, public folders in Exchange Server 2013 and Exchange Online are now stored within a new mailbox type in the mailbox database. Our on-premises customers will have to create public folder mailboxes, monitor their usage, create new public folder mailboxes when necessary, and split content across different public folder mailboxes as their content grows over time. In Exchange Online we will automatically perform the public folder mailbox management, so you can focus your time on managing the actual public folders and their content. If we were to peek behind the Exchange Online curtain, we would see two automated processes running at all times to make everything happen:
Let’s go through each one of them, shall we?
This process actively monitors your public folder mailbox quota usage. Its goal is to ensure that you do not inadvertently fill a public folder mailbox, which would stop it from accepting new content for any public folder within it.
When a public folder mailbox reaches the Issue Warning Quota value of 24.5 GB, this process is automatically triggered to redistribute where your public folders currently reside. This may result in Exchange Online simply moving some public folders from the nearly-filled public folder mailbox to another pre-existing public folder mailbox holding less content. However, if there are no public folder mailboxes with enough free space to move public folders into, Exchange Online will automatically create a new public folder mailbox and move some of your public folders into the newly created public folder mailbox. The end result will be all public folder mailboxes being below the Issue Warning Quota.
Public folder moves from one public folder mailbox to another are an online move process, similar to normal mailbox moves. Although the move is performed online, your users may experience a slight disruption in accessing one or more public folders during the completion phase of the move. Any mail destined for mail-enabled public folders being moved is temporarily queued and then delivered once the move request completes.
In case the curious amongst you are wondering, we do not currently prevent customers from lowering the public folder mailbox quota values, even though there is no reason you should do that. However, you are prevented from configuring quota values larger than 25 GB.
Let us take a moment to visualize this process, as a picture is worth a thousand words. In the first scenario below, a customer currently has two public folder mailboxes, PFMBX-001 and PFMBX-002. PFMBX-001 contains three public folders, while PFMBX-002 contains only one public folder. PFMBX-001 has gone over the IssueWarningQuota value of 24.5 GB and currently contains 24.6 GB of content. When the automatic split process runs in this environment, it sees there is plenty of space available in PFMBX-002 and moves a public folder from PFMBX-001 into PFMBX-002. In this example, the final result is two public folder mailboxes with a similar amount of data in each of them. Depending on the size of your folders, this process may move a single large public folder or numerous small public folders. The example shows a single folder being moved.
Scenario 1: Auto split process shuffles public folders from one public folder mailbox to another one.
In a second scenario below, a customer has a single public folder mailbox, PFMBX-001, containing three public folders. PFMBX-001 has gone over the IssueWarningQuota value of 24.5 GB and contains 24.6 GB of content. When the split process runs in this environment, it sees there are no other public folder mailboxes available to move public folders into. As a result, the process creates a new empty public folder mailbox, PFMBX-002, and moves some public folders into it; the final result is two public folder mailboxes with a similar amount of data in each of them. Again, in this example we are showing a single public folder being moved, but the process may determine it has to move many smaller public folders.
Scenario 2: Auto split process must create a new empty public folder mailbox before moving a public folder.
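To make the two scenarios concrete, here is a toy Python model of the split decision. It is illustrative only: the actual Exchange Online process, its folder-selection criteria and its naming are not documented in this post, so PFMailbox, auto_split, the "most balanced move" rule and the folder sizes are all hypothetical. It does follow the behavior described above: move folders into the emptiest existing mailbox with room (Scenario 1), or create a new mailbox when none has room (Scenario 2), until the source is below the 24.5 GB Issue Warning Quota.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ISSUE_WARNING_QUOTA_GB = 24.5


@dataclass
class PFMailbox:
    name: str
    folders: dict[str, float] = field(default_factory=dict)  # folder -> GB

    @property
    def used(self) -> float:
        return sum(self.folders.values())


def auto_split(mailboxes: list[PFMailbox]) -> list[PFMailbox]:
    """Toy model: move folders out of any mailbox above the warning quota."""
    for source in list(mailboxes):
        # A single oversized folder cannot be split, so stop at one folder.
        while source.used > ISSUE_WARNING_QUOTA_GB and len(source.folders) > 1:
            smallest = min(source.folders.values())
            candidates = [m for m in mailboxes if m is not source
                          and m.used + smallest <= ISSUE_WARNING_QUOTA_GB]
            if candidates:  # Scenario 1: reuse the emptiest mailbox with room
                target = min(candidates, key=lambda m: m.used)
            else:           # Scenario 2: create a new empty mailbox
                target = PFMailbox(f"PFMBX-{len(mailboxes) + 1:03d}")
                mailboxes.append(target)
            fitting = {f: s for f, s in source.folders.items()
                       if target.used + s <= ISSUE_WARNING_QUOTA_GB}
            if not fitting:
                break
            # Prefer the move that leaves source and target most evenly balanced.
            folder = min(fitting, key=lambda f: abs(
                (source.used - fitting[f]) - (target.used + fitting[f])))
            target.folders[folder] = source.folders.pop(folder)
    return mailboxes


if __name__ == "__main__":
    # Hypothetical sizes; PFMBX-001 holds 24.6 GB across three folders.
    s1 = auto_split([PFMailbox("PFMBX-001", {"A": 10.0, "B": 8.0, "C": 6.6}),
                     PFMailbox("PFMBX-002", {"D": 5.0})])
    s2 = auto_split([PFMailbox("PFMBX-001", {"A": 10.0, "B": 8.0, "C": 6.6})])
    for mbx in s1 + s2:
        print(mbx.name, sorted(mbx.folders), round(mbx.used, 1))
```

In both runs PFMBX-001 ends at 14.6 GB, with folder A moved to PFMBX-002. Balancing the two mailboxes after each move is just one plausible heuristic; a real implementation could weigh other factors.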
One noteworthy limit in Exchange Online is that no single public folder can be larger than 25 GB, because the underlying public folder mailbox has a 25 GB quota. To give you an idea of how much data that is, 25 GB is roughly 350,000 items of 75 KB each, or 525,000 items of 50 KB each. In most cases this volume of data can easily be split amongst multiple public folders to keep any single folder from coming anywhere near the 25 GB limit.
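As a quick sanity check of those item counts, assuming binary units (1 KB = 1,024 bytes, 1 GB = 1,024^3 bytes), both work out to slightly over 25 GB, so treat them as approximations:

```python
BYTES_PER_KB = 1024
BYTES_PER_GB = 1024 ** 3


def total_gb(item_count: int, item_size_kb: int) -> float:
    """Total size in GB (binary units) of item_count items of item_size_kb each."""
    return item_count * item_size_kb * BYTES_PER_KB / BYTES_PER_GB


if __name__ == "__main__":
    print(round(total_gb(350_000, 75), 2))  # 25.03
    print(round(total_gb(525_000, 50), 2))  # 25.03
```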
Our migration documentation will also suggest that if you currently have a single public folder larger than 15 GB, you try to reduce it to under 15 GB before the migration, either by deleting old content or by splitting it into multiple smaller public folders. When we say a single public folder over 15 GB, we mean exactly that: the size excludes any child folders. Child folders are not counted toward the 15 GB suggestion because a child public folder may reside in a different public folder mailbox if necessary. The reason for this suggestion is two-fold. First, it helps prevent you from triggering the automated split process as soon as your migration takes place if you were to migrate very large public folders from on-premises. Second, content moved from Exchange 2007/2010 to Exchange Online may result in the reported space used by a single public folder increasing by 30%. The increase is due to a more accurate method used by Exchange Server 2013 to calculate space used within a mailbox database compared to earlier versions of Exchange Server. If you were to migrate a single massive public folder residing in on-premises Exchange Server 2007/2010 to Exchange Online, this space recalculation may push the single public folder over the 25 GB quota. We want to help you avoid this situation, because it would only be noticed once you were well into the data copy portion of the migration, and you would lose time having to redo the process all over again.
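The 15 GB suggestion and the possible 30% growth can be combined into a simple pre-migration check. A minimal Python sketch, assuming the full 30% increase applies and using hypothetical folder names and sizes (each size is the folder's own content, excluding child folders):

```python
QUOTA_GB = 25.0          # public folder mailbox quota in Exchange Online
SUGGESTED_MAX_GB = 15.0  # suggested pre-migration size for a single folder
REPORTED_GROWTH = 1.30   # reported size may grow by 30% after migration


def premigration_check(own_size_gb: float) -> str:
    """Classify one folder by its own content size (child folders excluded)."""
    if own_size_gb * REPORTED_GROWTH > QUOTA_GB:
        return "may exceed the 25 GB quota after migration"
    if own_size_gb > SUGGESTED_MAX_GB:
        return "reduce below 15 GB before migrating"
    return "ok"


if __name__ == "__main__":
    # Hypothetical folder sizes in GB, excluding child folders.
    for name, size in {"Sales": 12.0, "Archive": 17.0, "Projects": 20.0}.items():
        print(name, premigration_check(size))
```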
If you have a particular business requirement that does not allow you to reduce the size of this single massive public folder in one of the ways previously suggested, we recommend that you retain your entire public folder infrastructure on-premises instead of moving it to Exchange Online, as we cannot increase the public folder mailbox quota beyond 25 GB.
The second automated process helps maintain the best possible user experience when accessing public folders in Exchange Online. Exchange Online will actively monitor how many hierarchy connections are being spread across all of your public folder mailboxes. If this value goes over a pre-determined number, we will automatically create a new public folder mailbox. Creating the additional public folder mailbox will reduce the number of hierarchy connections accessing each public folder mailbox by scaling the user connections out across a larger number of public folder mailboxes. If you are a customer who has a small amount of public folder content in Exchange Online, yet you have an extremely large number of active users, then you may see the system create additional public folder mailboxes regardless of your content size.
Ready for another example? In this example we will use low values for explanatory purposes. Let us pretend in Exchange Online we did not want more than two hundred active hierarchy connections per public folder mailbox. The diagram below shows nine hundred users making nine hundred active hierarchy connections across four public folder mailboxes. This scenario will work out to approximately 225 active hierarchy connections per public folder mailbox, as the Client Access Servers spread the hierarchy connections across all available public folder mailboxes in the customer’s environment. When Exchange Online monitoring determines the desired number of two hundred active hierarchy connections per public folder mailbox has been exceeded, PFMBX-005 is automatically created. Immediately after creating PFMBX-005, Exchange Online will force a hierarchy sync to PFMBX-005, ensuring it has the most up-to-date information available regarding public folder structure and permissions before allowing it to accept client hierarchy connections. The end result in this example is that we now have five public folder mailboxes accepting nine hundred active hierarchy connections, for an average of 180 connections per public folder mailbox, ensuring all active users have the best interactive experience possible.
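The scale-out arithmetic in this example can be sketched as follows. The 200-connection threshold is the illustrative value from the example, not the actual Exchange Online limit, and the rule that the mailbox count never shrinks is our simplifying assumption.

```python
import math


def pf_mailboxes_needed(active_connections: int, max_per_mailbox: int,
                        current: int) -> int:
    """Mailbox count that keeps the average load at or below the threshold.

    Assumption: the model never scales in, so the current count is a floor.
    """
    return max(current, math.ceil(active_connections / max_per_mailbox))


if __name__ == "__main__":
    # Illustrative values from the example: 900 connections, 200 per mailbox.
    print(900 / 4)                           # 225.0 before scaling out
    needed = pf_mailboxes_needed(900, 200, 4)
    print(needed, 900 / needed)              # 5 180.0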
Scenario 3: Auto split process creates a new public folder mailbox to scale out active hierarchy connections.
Once you begin utilizing the Exchange Online public folder infrastructure we are confident this built-in automation will help our customers focus on doing what they do best, which is running their business. Let us take care of the infrastructure for you so you have more time to spend on your other projects.
In summary, we are extremely excited to deliver public folders in the new Exchange Online to you, our customers. We believe you will find the migration process from on-premises to Exchange Online fairly straightforward, and our backend automation will relieve you of having to manage many aspects of the feature. We really hope you enjoy using public folders with Exchange Online as much as we enjoyed creating them for you.
Special thanks to the entire Public Folder Feature Crew, Nino Bilic, Tim Heeney, Ross Smith IV and Andrea Fowler for contributing to and validating this data.
Microsoft Remote Connectivity Analyzer is a web-based tool that provides administrators and end users with the ability to run connectivity diagnostics for our servers to test common issues with Microsoft Exchange, Lync and Office 365. The tool started as Microsoft Exchange Server Remote Connectivity Analyzer, and based on your feedback we've continued to add functionality to test connectivity with Lync and Office 365, and made other enhancements such as tests for Outlook Anywhere, Exchange Web Services, outbound SMTP, Office 365 Single Sign-On test, support for 10 additional languages and an improved captcha experience.
We're excited to announce Message Analyzer, a brand new addition to the Remote Connectivity Analyzer. Message Analyzer makes reading email headers less painful.
Figure 1: The new Message Analyzer tab in RCA
SMTP message headers contain a wealth of information which allows you to determine the origins of a message and how it made its way through one or more SMTP servers to its destination. To use Message Analyzer, all you need to do is copy message headers from a message and paste them in the Message Analyzer tab on the RCA web site.
Figure 2: Paste message headers in the Message Analyzer
Trying to locate message headers in Outlook 2010 and later? See Hey Outlook 2010, where are my message headers?
Here's a quick look at what you can do with Message Analyzer.
View the most important properties and total delivery time at a quick glance.
Figure 3: View the most important header properties and delivery time
Analyze the Received headers and display the longest delays, making it easy to discover the sources of message transfer delays.
Figure 4: Quickly detect where the longest message transfer delays occurred
Sort all headers by header name or value.
Figure 5: Sort message headers
Quickly collapse the sections that you don’t need.
All processing is done in your browser, and no private information is shared with Microsoft.
Useful for any header, whether generated by Exchange, Office 365, or any other RFC standard SMTP server or agent.
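Message Analyzer does its processing in the browser, and its implementation is not described here. As a rough illustration of the idea behind the delay analysis, the Python sketch below uses the standard library email module to read Received headers (each hop prepends one, so the oldest is at the bottom), takes the timestamp after the last semicolon in each, and ranks the per-hop delays. The function names and the sample headers are hypothetical.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical headers; the newest Received header is at the top.
SAMPLE = (
    "Received: from mx2.example.net by mx3.example.net; Mon, 3 Jun 2013 10:02:10 -0700\n"
    "Received: from mx1.example.com by mx2.example.net; Mon, 3 Jun 2013 10:00:10 -0700\n"
    "Received: from client.example.com by mx1.example.com; Mon, 3 Jun 2013 10:00:00 -0700\n"
    "Subject: Test message\n"
)


def received_times(raw_headers: str) -> list:
    """(header, timestamp) pairs for each Received header, oldest hop first."""
    msg = message_from_string(raw_headers)
    hops = []
    # Each server prepends its Received header, so reverse to get oldest first.
    for header in reversed(msg.get_all("Received", [])):
        # The date-time follows the last semicolon in a Received header.
        _, _, date_text = header.rpartition(";")
        try:
            hops.append((header, parsedate_to_datetime(date_text.strip())))
        except (TypeError, ValueError):
            continue  # skip headers without a parseable timestamp
    return hops


def longest_delays(raw_headers: str) -> list:
    """(seconds, header) for each hop, longest delay first."""
    hops = received_times(raw_headers)
    delays = [((t - prev_t).total_seconds(), hdr)
              for (_, prev_t), (hdr, t) in zip(hops, hops[1:])]
    return sorted(delays, key=lambda d: d[0], reverse=True)


def total_delivery_seconds(raw_headers: str) -> float:
    hops = received_times(raw_headers)
    return (hops[-1][1] - hops[0][1]).total_seconds() if len(hops) > 1 else 0.0


if __name__ == "__main__":
    print(total_delivery_seconds(SAMPLE))  # 130.0
    for seconds, hop in longest_delays(SAMPLE):
        print(seconds, hop)
```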
Note: we consider this feature to be in beta for the moment. Please send us feedback and we’ll continue to make improvements.
Check out this update to the RCA at testconnectivity.microsoft.com (short URL: aka.ms/rca).
Stephen Griffin & Scott Landry On behalf of the entire MCA/RCA team Follow the team on Twitter - @ExRCA
In support, we frequently receive backup-related calls for Exchange 2010 databases. Some common issues we hear from our customers are:
It is critical to understand how backups and log truncation work in Exchange 2010. If you haven't already done so, check out our three-part blog series by Jesse Tedoff on backups and log truncation in Exchange 2010, Everything You Need to Know About Exchange Backups*.
When troubleshooting backups in Exchange 2010 we are interested in two writers – the Exchange Information Store Writer (utilized for active copy backups) and the Exchange Replica Writer (utilized for passive copy backups). The writers are responsible for providing the metadata information for databases to the VSS Requestor (aka the backup software). The VSS Provider is the component that creates and maintains shadow copies. At the end of successful backups, when the Volume Shadow Copy Service signals backup is complete, the writers initiate post-backup steps which include updating the database header and performing log truncation. (For more details, see Exchange VSS Writers on MSDN.)
As explained above, it is the responsibility of the VSS Requestor to get metadata information from the Exchange writers, and at the end of a successful backup, the VSS service signals backup complete to the Exchange writers so they can perform post-backup operations.
The purpose of this blog is to discuss the VSSTester script, its functionality and how it can help diagnose backup problems.
The script has two major functions:
The script can be run on any DAG configuration. You can use it to troubleshoot mailbox and public folder database backup issues. Databases and log files can be on regular drives or mount points, and a mix of the two also works!
Let us discuss in detail the two main functionalities of the script.
What is Diskshadow and why do we utilize it in VSSTester script?
Diskshadow.exe is a command-line tool built into the Windows Server 2008 family of operating systems, as well as Windows Server 2012. Diskshadow is an in-box VSS requestor used to test the functionality provided by the Volume Shadow Copy Service (VSS). For more details on Diskshadow, please visit:
http://technet.microsoft.com/en-us/library/ee221016(v=ws.10).aspx
http://blogs.technet.com/b/josebda/archive/2007/11/30/diskshadow-the-new-in-box-vss-requester-in-windows-server-2008.aspx
The best part about Diskshadow is that it includes a script mode for automating tasks, and the VSSTester script uses this feature. The shadow copy done by Diskshadow is a read-only snapshot of the entire volume at a given point in time.
For more details on how a shadow copy is created, please visit the following link: http://technet.microsoft.com/en-us/library/ee923636(v=ws.10).aspx
During the course of this blog post, I will be using the term “Diskshadow backup”. It is very important to understand that the term “backup” is relative here. Diskshadow uses the VSS service and gets the appropriate writer to be used for the snapshot. The writer provides the metadata information of the database/log files to Diskshadow, which then uses the VSS Provider to create a shadow copy.
After a successful shadow copy/snapshot of databases and log files, the VSS Provider signals an end-backup to the Exchange writers. To Exchange, this looks like a full backup has been performed on the database. The key point to understand here is that NO data is actually transferred to a device, tape, etc. This is only a test! You will see events in the application logs that usually show up when you take a regular backup, but NO data is actually backed up. Diskshadow has simply run all the backup APIs through the backup process without transferring any data.
The VSS Provider will take a snapshot of all the databases and logs (if present) on the volume. We will be taking a mirrored snapshot of the entire volume at the point in time when Diskshadow was run, so anything on the volume will be part of the snapshot. During the Diskshadow backup, we will be using either the Information Store Writer (for active copy backup) or the Replica Writer (for passive copy backup) to provide the metadata information for the database.
When you use the VSSTester script, it prompts you to select a database on which to perform the Diskshadow backup. When we take a snapshot of the volume, all other databases on the same drive (if present) will be part of the snapshot, but post-backup operations will happen only on the selected database. This is because we use either the Information Store Writer (active copy backup) or the Replica Writer (passive copy backup) associated with the selected database. Database headers are updated based on the VSS Requestor's interaction with the Exchange writer that was used, which in turn leads to log truncation. Hence only the header of the selected database is updated, and only that database's logs are purged, without being backed up.
This functionality is useful in almost all of the scenarios I described at the start of this blog post. In addition to those, another scenario that is not related to backups sometimes arises:
In the scenario mentioned above (and, by the way, if you have that problem, please go here), Exchange administrators would like to avoid causing a service outage by dismounting the database, removing log files and remounting the database. Another downside to manually removing the log files is breaking replication if the database has replicas across Database Availability Group members.
If you are willing to forgo a backup of the log files you can use the Diskshadow functionality of the script to trigger the backup APIs and tell Exchange to truncate the log files. The truncation commands will replicate to the other database copies and purge log files there as well. If successful, the net result is that the database will not go offline for lack of disk space on the log drive, but you will not have the security of retaining those log files for a future restore.
Let me demonstrate the Diskshadow functionality of the script.
The script can be downloaded from the TechNet Gallery here.
The script initializes and gives us the following options.
We select option 1 to test backup using the built-in Diskshadow function.
If the path does not exist, the script will create the folder for you.
We gather the server name and verify that it is an Exchange 2010 server. The script checks the VSS writer status on the local machine. If any of the writers is not in a “Stable” state, the script exits. You will need to restart the service associated with the writer to return it to a stable state (the Replication service for the Replica Writer, or the Information Store service for the Exchange Writer).
The script then gets a list of databases present on the local server and displays each database's name, whether it is mounted, and which server holds its active copy. You will have to enter the number of the database you want to select.
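VSSTester performs this writer check itself. If you want to reproduce it by hand, the output of vssadmin list writers lists each writer's name and state; the Python sketch below flags any writer whose state is not Stable. The sample text is an abbreviated illustration of that output's layout (not captured from a real server), and the parsing assumes the "Writer name:" and "State: [n] ..." line formats. MSExchangeRepl and MSExchangeIS are the short names of the Replication and Information Store services mentioned above.

```python
import re

# Abbreviated, illustrative excerpt in the layout printed by
# "vssadmin list writers"; Writer Id / Instance Id lines are omitted.
SAMPLE_OUTPUT = """\
Writer name: 'Microsoft Exchange Replica Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Exchange Writer'
   State: [8] Failed
   Last error: Retryable error
"""

# Service to restart for each Exchange writer.
SERVICE_FOR_WRITER = {
    "Microsoft Exchange Replica Writer": "MSExchangeRepl",
    "Microsoft Exchange Writer": "MSExchangeIS",
}


def unstable_writers(vssadmin_output: str) -> list:
    """Return (writer name, state) for every writer not reporting Stable."""
    unstable, name = [], None
    for line in vssadmin_output.splitlines():
        line = line.strip()
        m = re.match(r"Writer name: '(.+)'$", line)
        if m:
            name = m.group(1)
            continue
        m = re.match(r"State: \[\d+\] (.+)$", line)
        if m and name is not None and m.group(1) != "Stable":
            unstable.append((name, m.group(1)))
    return unstable


if __name__ == "__main__":
    for writer, state in unstable_writers(SAMPLE_OUTPUT):
        service = SERVICE_FOR_WRITER.get(writer, "its service")
        print(f"{writer} is {state}; restart {service}")
```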
Note: If the user does not provide an input, the script will automatically select the last database in the list.
In my case, I selected database mdb5. The number to enter would be 8.
The next important check is ensuring that the database’s replicas (if present) are healthy. If we detect that one of the copies is not healthy, the script will exit mentioning that the database copies need to be in healthy status before running the script.
The script next detects the location of the database file and log files. We create the Diskshadow configuration file on the fly every time a database is selected. This configuration file is saved to the location you specified earlier for configuration and output files (c:\vsstesterlogs in the example screenshots in this post). In this case the log files are on a mount point and the database file is on a regular volume. The script adds the appropriate volumes to the Diskshadow file.
The script will then prompt you to provide the drive letters to expose the snapshots. A common question that arises is, do I need to initialize the drive before I specify a drive letter? The answer is no!
You will be specifying a drive letter that is currently not in use, so Diskshadow will create a virtual drive and expose the snapshot. Remember, the virtual drive that exposes the shadow copy is a read-only volume, because the shadow copy itself is read-only. If the database and logs are on the same mount point or drive, only one drive letter is required to expose the snapshot; otherwise you will need to provide two different drive letters, one to expose the database snapshot and another for the log files.
When you select the option to perform the Diskshadow backup, the script automatically collects diagnostic logs, ExTRA traces and VSS traces. Verbose logging is also turned on for Diskshadow. Everything the script does is also logged to a transcript log, saved in the output files directory (c:\vsstesterlogs in this example).
Note: If you are performing a passive copy backup, ExTRA tracing will also be enabled on the active node. At the end of the script, we turn off ExTRA tracing on the active node, and its trace file is automatically moved to the passive node. The active node ETL will be placed in the logs folder you specified at the start of the script.
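As an illustration of the kind of Diskshadow script that gets generated, here is a hypothetical Python sketch that writes one for a database volume and a log volume, using one snapshot and one drive letter when both are on the same volume. It is not the VSSTester script's actual output: in particular, VSSTester also restricts the backup to the writer for the selected database, which this sketch omits. The command sequence (set context persistent, begin backup, add volume, create, end backup, expose) uses standard Diskshadow commands.

```python
def build_diskshadow_script(db_volume: str, log_volume: str,
                            expose_letters: list) -> str:
    """Build a Diskshadow script that snapshots the database and log volumes.

    If both live on the same volume, only one snapshot and one drive letter
    are needed; otherwise two unused drive letters are required.
    """
    volumes = [db_volume]
    if log_volume.upper() != db_volume.upper():
        volumes.append(log_volume)
    if len(expose_letters) < len(volumes):
        raise ValueError("need one unused drive letter per snapshotted volume")

    lines = ["set verbose on", "set context persistent", "begin backup"]
    lines += [f"add volume {vol} alias vol{i}" for i, vol in enumerate(volumes)]
    lines += ["create", "end backup"]
    # Exposed shadow copies are read-only drives.
    lines += [f"expose %vol{i}% {letter}"
              for i, letter in enumerate(expose_letters[:len(volumes)])]
    lines.append("exit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_diskshadow_script("D:", "E:", ["P:", "Q:"]))
```

You would save the output to a file and run it with diskshadow /s <file>, optionally adding /l <file> to capture the output. Because end backup signals a completed backup to the writers, the same caveats about truncating logs without a real backup apply.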
Now, the main Diskshadow function will execute.
In the screenshots below we have excluded all other writers on the system that are associated with all other databases on the node (that are mounted or be replicas) and we are ONLY utilizing the writer associated with the selected database. This node hosts the passive copy of the database MDB5. Hence, the writer utilized will be associated with the Replication service aka the Microsoft Exchange Replica Writer.
In the screenshot below, you can see that the VSS provider has taken a successful snapshot of the database and signaled end backup to the Replica Writer.
Now that we have taken a successful snapshot of the database and log files, all the logging that was turned on is turned off, and the log files are consolidated in the logs folder you specified at the start of the script. The script also checks the VSS writer status after the backup is complete.
When the snapshot operation is complete, you will be prompted for an option to either remove the snapshot or leave the snapshots exposed in Windows Explorer.
I selected the option to remove the snapshot, so the script invokes Diskshadow again to delete the snapshot created earlier.
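Removing the snapshot can be sketched the same way. This hypothetical helper (not part of the VSSTester script) emits Diskshadow's `delete shadows exposed <drive>` command for each drive letter that was used to expose a snapshot; verify the syntax against the Diskshadow documentation before relying on it.

```python
def build_cleanup_script(exposed_drives):
    """Hypothetical sketch: Diskshadow commands that delete exposed snapshots.

    exposed_drives: the drive letters passed to 'expose', e.g. ["X:", "Y:"].
    """
    if not exposed_drives:
        raise ValueError("no exposed drive letters given")
    lines = ["set verbose on"]
    # One delete command per drive letter that exposed a snapshot.
    lines += [f"delete shadows exposed {drive}" for drive in exposed_drives]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_cleanup_script(["X:", "Y:"]))
```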
Let us discuss in detail exposing and removing snapshot functionality:
Note: It is highly recommended to take a full backup of the database using your regular backup software after utilizing Diskshadow.
After this, the script collects the application and system logs, filtered to cover only the period from when you started the script to the present. The transcript log is also stopped. The logs are saved as text files in the output folder you specified earlier (c:\vsstesterlogs in this example).
The most reliable way to verify that log truncation takes place is to capture the log sequence before and after the backup. So, before running the script, I ran eseutil /ml ENN (where ENN is the log generation prefix associated with the database).
After the backup, I ran the same command again and could see:
We can clearly see a difference in the start of the sequence, meaning log truncation has occurred for the database. One more verification is to check the database header: it has been updated to the most recent time, when Diskshadow was run.
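The same check can be expressed as a comparison of the lowest log generation before and after the backup. This minimal Python sketch swaps in a simpler input than eseutil /ml output: it takes the file names in the log directory, assuming the usual naming of a three-character prefix (such as E01) followed by an eight-digit hexadecimal generation, and it ignores the current log (E01.log) and non-log files. The function names are hypothetical.

```python
import re

# <prefix><8 hex digit generation>.log, e.g. E0100000038.log
LOG_NAME = re.compile(r"^(E[0-9A-F]{2})([0-9A-F]{8})\.log$", re.IGNORECASE)


def lowest_generation(file_names, prefix):
    """Return the lowest log generation for `prefix`, or None if no logs match."""
    generations = [
        int(m.group(2), 16)
        for m in map(LOG_NAME.match, file_names)
        if m and m.group(1).upper() == prefix.upper()
    ]
    return min(generations) if generations else None


def truncation_occurred(before, after, prefix):
    """True if the oldest remaining log generation moved forward after the backup.

    Returns False when either listing has no generation-numbered logs.
    """
    first_before = lowest_generation(before, prefix)
    first_after = lowest_generation(after, prefix)
    if first_before is None or first_after is None:
        return False
    return first_after > first_before
```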
If the script finished successfully:
Let us now look into the other major functionality of the script.
Use this if you do not want to test backup using Diskshadow and you just want to collect diagnostic logs for troubleshooting backup issues.
You can collect the diagnostic logs and have them ready before calling Microsoft Support, which saves a lot of time in the support incident because you can provide the files at the beginning of the case.
This time we will be selecting option 2 to enable logging.
Selecting this option does the majority of the things that the script did earlier, EXCEPT Diskshadow of course!
After checking the writer status, you can select the database to back up. All the logging is enabled as before (diagnostic logging, ExTRA, VSS tracing). Remember that even though you still select one database, diagnostic logging, ExTRA tracing and VSS tracing are not database specific; they are turned on at the server level. So when you use the script to troubleshoot backup issues, you can select any one database on the server and the script will turn on the appropriate logging for the server.
After the logging is turned on and traces enabled, you will see:
Now start your regular backup. After the backup completes or fails, come back to the PowerShell window where the script is running and press ENTER to stop the data collection. The script then disables the diagnostic logging and tracing that was turned on earlier. If needed, it also copies the diagnostic logs from the active node for that database copy.
The script again checks the writer status after the backup, then collects the application and system logs and stops the transcript log.
At this point, in order to troubleshoot the issue, you can open a case with Microsoft Support and upload the logs.
I hope this script helps you better understand the core concepts of Exchange 2010 backups and troubleshoot backup issues! You can use Diskshadow to test the Volume Shadow Copy Service and check whether the Exchange writers are performing as intended. If Diskshadow completes successfully without any errors and you are still experiencing issues with your backup software, you may need to contact the backup vendor to troubleshoot the issue further.
Your feedback and comments are most welcome.
Special thanks to Michael Barta for his contribution to the script, Theo Browning and Jesse Tedoff for reviewing the content.
Muralidharan Natarajan
These days, some customers are deploying Exchange databases and log files on advanced format (4K) drives. Although these drives use a physical sector size of 4096 bytes, many vendors emulate 512-byte sectors in order to maintain backward compatibility with applications and operating systems. This is known as 512-byte emulation (512e). Windows Server 2008 and Windows Server 2008 R2 support native 512-byte and 512-byte emulated advanced format drives. Windows Server 2012 supports drives of all sector sizes. The sector size presented to applications and the operating system, and how applications respond to it, directly affects data integrity and performance.
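The three sector formats described above can be told apart from the two values Windows reports for a volume: the logical sector size (what applications see) and the physical sector size. Here is a small Python sketch of that classification; the function name and the labels "512n" and "4Kn" are illustrative shorthand, not terms used by Exchange in this post.

```python
def classify_sector_format(logical, physical):
    """Classify a volume from its logical and physical bytes per sector.

    Returns "512n" (native 512), "512e" (512-byte emulation on a 4K drive),
    "4Kn" (native 4K) or "unknown".
    """
    formats = {
        (512, 512): "512n",
        (512, 4096): "512e",
        (4096, 4096): "4Kn",  # 4K native: not supported for any version of Exchange
    }
    return formats.get((logical, physical), "unknown")
```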
For more information on sector sizes see the following links:
When deploying an Exchange 2010 Database Availability Group (DAG), the sector sizes of the volumes hosting the databases and log files must be the same across all nodes within the DAG. This requirement is outlined in Understanding Storage Configuration.
Support requires that all copies of a database reside on the same physical disk type. For example, it is not a supported configuration to host one copy of a given database on a 512-byte sector disk and another copy of that same database on a 512e disk. Also be aware that 4-kilobyte (KB) sector disks are not supported for any version of Microsoft Exchange and 512e disks are not supported for any version of Exchange prior to Exchange Server 2010 SP1.
Recently, we have noted that some customers have experienced issues with log file replication and replay as the result of sector size mismatch. These issues occur when:
This mismatch can cause one or more database copies in a DAG to fail, as illustrated below. In my example environment, I have a three-member DAG with a single database that resides on a volume labeled Z that is replicated between each member.
[PS] C:\>Get-MailboxDatabaseCopyStatus *
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Mounted    0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         1           3/19/2013 10:27:50 AM  Healthy
SectorTest\MBX-3  Healthy    0         1           3/19/2013 10:27:50 AM  Healthy
If I use FSUTIL to query the Z volume on each DAG member, we can see that the volume currently has 512 logical bytes per sector and 512 physical bytes per sector. Thus, the volume is currently seen by the operating system as having a native 512-byte sector size.
On MBX-1:
C:\>fsutil fsinfo ntfsinfo z:
NTFS Volume Serial Number : 0x18d0bc1dd0bbfed6
Version : 3.1
Number Sectors : 0x000000000fdfe7ff
Total Clusters : 0x0000000001fbfcff
Free Clusters : 0x0000000001fb842c
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Physical Sector : 512
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x0000000000040000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x0000000000000002
Mft Zone Start : 0x00000000000c0040
Mft Zone End : 0x00000000000cc840
RM Identifier: EF486117-9094-11E2-BF55-00155D006BA1
On MBX-3:
NTFS Volume Serial Number : 0x0ad44aafd44a9d37
Version : 3.1
Number Sectors : 0x000000000fdfe7ff
Total Clusters : 0x0000000001fbfcff
Free Clusters : 0x0000000001fad281
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Physical Sector : 512
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x0000000000040000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x0000000000000002
Mft Zone Start : 0x00000000000c0000
Mft Zone End : 0x00000000000cc820
RM Identifier: B9B00E32-90B2-11E2-94E9-00155D006BA3
But what happens if the way storage is seen on MBX-3 changes, so that the volume now reflects a 512e sector size? This can happen when upgrading storage drivers, upgrading firmware, or presenting new storage that implements advanced format.
NTFS Volume Serial Number : 0x0ad44aafd44a9d37
Version : 3.1
Number Sectors : 0x000000000fdfe7ff
Total Clusters : 0x0000000001fbfcff
Free Clusters : 0x0000000001fad2e7
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x0000000000040000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x0000000000000002
Mft Zone Start : 0x00000000000c0040
Mft Zone End : 0x00000000000cc840
RM Identifier: B9B00E32-90B2-11E2-94E9-00155D006BA3
When reviewing the database copy status, notice that the copy assigned to MBX-3 has failed.
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Mounted    0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         0           3/19/2013 11:13:05 AM  Healthy
SectorTest\MBX-3  Failed     0         8           3/19/2013 11:13:05 AM  Healthy
The full details of the copy status of MBX-3 can be reviewed to display the detailed error:
[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\MBX-3 | fl
RunspaceId : 5f4bb58b-39fb-4e3e-b001-f8445890f80a
Identity : SectorTest\MBX-3
Name : SectorTest\MBX-3
DatabaseName : SectorTest
Status : Failed
MailboxServer : MBX-3
ActiveDatabaseCopy : mbx-1
ActivationSuspended : False
ActionInitiator : Service
ErrorMessage : The log copier was unable to continue processing for database 'SectorTest\MBX-3' because an error occurred on the target server: Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after a short delay.
ErrorEventId : 2152
ExtendedErrorInfo :
SuspendComment :
SinglePageRestore : 0
ContentIndexState : Healthy
ContentIndexErrorMessage :
CopyQueueLength : 0
ReplayQueueLength : 7
LatestAvailableLogTime : 3/19/2013 11:13:05 AM
LastCopyNotificationedLogTime : 3/19/2013 11:13:05 AM
LastCopiedLogTime : 3/19/2013 11:13:05 AM
LastInspectedLogTime : 3/19/2013 11:13:05 AM
LastReplayedLogTime : 3/19/2013 10:24:24 AM
LastLogGenerated : 53
LastLogCopyNotified : 53
LastLogCopied : 53
LastLogInspected : 53
LastLogReplayed : 46
LogsReplayedSinceInstanceStart : 0
LogsCopiedSinceInstanceStart : 0
LatestFullBackupTime :
LatestIncrementalBackupTime :
LatestDifferentialBackupTime :
LatestCopyBackupTime :
SnapshotBackup :
SnapshotLatestFullBackup :
SnapshotLatestIncrementalBackup :
SnapshotLatestDifferentialBackup :
SnapshotLatestCopyBackup :
LogReplayQueueIncreasing : False
LogCopyQueueIncreasing : False
OutstandingDumpsterRequests : {}
OutgoingConnections :
IncomingLogCopyingNetwork :
SeedingNetwork :
ActiveCopy : False
Using the Exchange Server Error Code Look-up tool (ERR.EXE), we can verify the definition of the error code -546.
D:\Utilities\ERR>err -546
# for decimal -546 / hex 0xfffffdde
JET_errLogSectorSizeMismatch    esent98.h
# /* the log file sector size does not match the current
# volume's sector size */
# 1 matches found for "-546"
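ERR.EXE lists both a decimal and a hex form because ESE error codes are negative 32-bit integers; the hex value is simply the two's-complement representation of -546. A quick check in Python (the helper names are hypothetical):

```python
def to_uint32_hex(code):
    """Return the 32-bit two's-complement hex form of a (negative) error code."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def from_uint32_hex(text):
    """Convert a 32-bit hex value back to a signed integer."""
    value = int(text, 16)
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    print(to_uint32_hex(-546))            # 0xfffffdde
    print(from_uint32_hex("0xfffffdde"))  # -546
```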
In addition, the Application event log may contain the following entries:
Log Name: Application
Source: MSExchangeRepl
Date: 3/19/2013 11:14:58 AM
Event ID: 2152
Task Category: Service
Level: Error
User: N/A
Computer: MBX-3.exchange.msft
Description:
The log copier was unable to continue processing for database 'SectorTest\MBX-3' because an error occured on the target server: Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after a short delay.
Why does this issue occur? Each log file records in its header the sector size of the disk on which it was created. For example, this is the header of a log file on MBX-1, which has a native 512-byte sector size:
Z:\SectorTest>eseutil /ml E0100000001.log
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
Base name: E01
Log file: E0100000001.log
lGeneration: 1 (0x1)
Checkpoint: (0x38,FFFF,FFFF)
creation time: 03/19/2013 09:40:14
prev gen time: 00/00/1900 00:00:00
Format LGVersion: (7.3704.16.2)
Engine LGVersion: (7.3704.16.2)
Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
Env SystemPath: z:\SectorTest\
Env LogFilePath: z:\SectorTest\
Env Log Sec size: 512 (matches)
Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
( off, 1227, 61350, 16384, 61350, 2048, 2048, 44204)
Using Reserved Log File: false
Circular Logging Flag (current file): off
Circular Logging Flag (past files): off
Checkpoint at log creation time: (0x1,8,0)
Last Lgpos: (0x1,A,0)
Number of database page references: 0
Integrity check passed for log file: E0100000001.log
Operation completed successfully in 0.62 seconds.
The sector size that is chosen is determined through one of two methods:
In theory, since the sector size of disks should not be changing across nodes and the sector size of all disks must match, this should not cause a problem. In our example, and in some customer environments, these sector sizes are actually changing. Since most of these databases already exist, the existing sector size of the log stream is utilized, which in turn causes a mismatch between DAG members.
When a mismatch occurs, the issue only prevents the successful use of block mode replication. It does not affect file mode replication. Block mode replication was introduced in Exchange 2010 Service Pack 1. For more information on block mode replication, see New High Availability and Site Resilience Functionality in Exchange 2010 SP1.
Why does this only affect block mode replication? Locations within a log file are referenced by a log file position (LGPOS), which is a combination of the log generation, the sector, and the offset within that sector. For example, in the previous header dump you can see that the Last Lgpos is (0x1,A,0), which happens to be the last log file position within that log. Let us say we were creating a block for block mode replication within log file generation 0x1A, at sector 8, offset 1; this would be reflected as an LGPOS of (0x1A,8,1). When this block is transmitted to a host with an advanced format (4096-byte sector) disk, the log position would have to be translated: on that disk, the same position would be (0x1A,1,1). As you can see, significant problems could arise if incorrect positions within a log file were written to or read from.
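The translation can be made concrete with a simplified model in which a position's byte offset within the log is sector × sector size + offset. This model is an assumption for illustration, not a description of ESE's on-disk format, but it reproduces the example above: (0x1A,8,1) at 512 bytes per sector and (0x1A,1,1) at 4096 bytes per sector both point at byte 4097. The `LgPos` type and the function names are hypothetical.

```python
from typing import NamedTuple


class LgPos(NamedTuple):
    generation: int
    sector: int
    offset: int


def byte_offset(pos, sector_size):
    """Byte position inside the log file under the simplified model."""
    return pos.sector * sector_size + pos.offset


def translate(pos, old_sector_size, new_sector_size):
    """Re-express a position recorded with one sector size in another."""
    sector, offset = divmod(byte_offset(pos, old_sector_size), new_sector_size)
    return LgPos(pos.generation, sector, offset)


if __name__ == "__main__":
    pos = LgPos(0x1A, 8, 1)
    print(translate(pos, 512, 4096))  # LgPos(generation=26, sector=1, offset=1)
    # Read without translation on a 4K volume, the same triple lands far away:
    print(byte_offset(pos, 4096))     # 32769 instead of 4097
```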
How do I go about correcting this condition? To fix this condition, first ensure that the same sector sizes exist on all disks across all nodes that host Exchange data, and then reset the log stream.
The following steps show how to do this with minimal downtime.
Ensure that Exchange 2010 Service Pack 2 or later is installed on all DAG members.
Note: Exchange 2010 Service Pack 1 and earlier do not support 512e volumes.
Restart the Microsoft Exchange Replication service on each member using the Shell: Restart-Service MSExchangeRepl
Validate that all copies of databases across DAG members are healthy at this time:
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Mounted    0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         0           3/19/2013 12:28:34 PM  Healthy
SectorTest\MBX-3  Healthy    0         0           3/19/2013 12:28:34 PM  Healthy
Apply the appropriate hotfix for Windows Server 2008 or Windows Server 2008 R2 and Advanced Format Disks. Windows Server 2012 does not require a hotfix.
Repeat the procedure that caused the disk sector size to change. For example, if the issue arose as a result of upgrading drivers and firmware on a host, use your maintenance mode procedures to complete the driver and firmware upgrade on all hosts.
Note: If your installation does not allow you to use the same sector sizes across all DAG members, the implementation is not supported.
Utilize FSUTIL to ensure that the sector sizes match across all hosts for the log and database volumes.
On MBX-1:
NTFS Volume Serial Number : 0x18d0bc1dd0bbfed6
Version : 3.1
Number Sectors : 0x000000000fdfe7ff
Total Clusters : 0x0000000001fbfcff
Free Clusters : 0x0000000001fac6e6
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x0000000000040000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x0000000000000002
Mft Zone Start : 0x00000000000c0040
Mft Zone End : 0x00000000000cc840
RM Identifier: EF486117-9094-11E2-BF55-00155D006BA1
On MBX-2:
NTFS Volume Serial Number : 0xfa6a794c6a790723
Version : 3.1
Number Sectors : 0x000000000fdfe7ff
Total Clusters : 0x0000000001fbfcff
Free Clusters : 0x0000000001fac86f
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x0000000000040000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x0000000000000002
Mft Zone Start : 0x00000000000c0040
Mft Zone End : 0x00000000000cc840
RM Identifier: 5F18A2FC-909E-11E2-8599-00155D006BA2
On MBX-3:
NTFS Volume Serial Number : 0x0ad44aafd44a9d37
Version : 3.1
Number Sectors : 0x000000000fdfe7ff
Total Clusters : 0x0000000001fbfcff
Free Clusters : 0x0000000001fabfd6
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x0000000000040000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x0000000000000002
Mft Zone Start : 0x00000000000c0040
Mft Zone End : 0x00000000000cc840
RM Identifier: B9B00E32-90B2-11E2-94E9-00155D006BA3
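Comparing FSUTIL output by eye across many DAG members is error-prone. As a sketch, the two relevant fields can be pulled out of saved `fsutil fsinfo ntfsinfo` output and compared per host. The function names are hypothetical, and if your logs and databases are on separate volumes you would run the comparison once per volume.

```python
import re

LOGICAL = re.compile(r"Bytes Per Sector\s*:\s*(\d+)")
PHYSICAL = re.compile(r"Bytes Per Physical Sector\s*:\s*(\d+)")


def sector_sizes(ntfsinfo):
    """Return (logical, physical) bytes per sector from fsutil ntfsinfo text."""
    logical, physical = LOGICAL.search(ntfsinfo), PHYSICAL.search(ntfsinfo)
    if logical is None or physical is None:
        raise ValueError("sector size fields not found in fsutil output")
    return int(logical.group(1)), int(physical.group(1))


def sector_sizes_match(outputs_by_host):
    """True if every host reports the same logical and physical sector size."""
    sizes = {host: sector_sizes(text) for host, text in outputs_by_host.items()}
    return len(set(sizes.values())) == 1
```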
At this point, the DAG should be stable, and replication should be occurring as expected between databases using file mode. In order to restore block mode replication and fully recognize the new disk sector sizes, the log stream must be reset.
IMPORTANT: Please note the following about resetting the log stream:
You can use the following steps to reset the log stream:
Validate the existence of a replay queue:
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Mounted    0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         0           3/19/2013 1:34:37 PM   Healthy
SectorTest\MBX-3  Healthy    0         138         3/19/2013 1:34:37 PM   Healthy
Set the replay and truncation lag time values to 0 on all database copies. This ensures that logs replay to current while the databases remain online. In this example, MBX-3 is a lagged database copy. When the configuration change is detected, log replay occurs, allowing the lagged copy to eventually catch up. Note that, depending on the replay lag time, it could take several hours before you can proceed to the next steps.
[PS] C:\>Set-MailboxDatabaseCopy SectorTest\MBX-3 -ReplayLagTime 0.0:0:0 -TruncationLagTime 0.0:0:0
Validate that the replay queue has caught up and is near zero.
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Mounted    0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         0           3/19/2013 1:34:37 PM   Healthy
SectorTest\MBX-3  Healthy    0         0           3/19/2013 1:34:37 PM   Healthy
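When many databases are involved, the "near zero" check can be scripted against saved Get-MailboxDatabaseCopyStatus output. This sketch assumes the default table layout, printed one row per line, where the first column is Database\Server and the third and fourth columns are the copy and replay queue lengths. Parsing formatted table text is fragile, and the function names and threshold parameter are hypothetical.

```python
def parse_copy_status(table):
    """Parse rows of Get-MailboxDatabaseCopyStatus table output."""
    rows = []
    for line in table.splitlines():
        tokens = line.split()
        # Data rows start with "<Database>\<Server>"; headers and dashes do not.
        if len(tokens) < 5 or "\\" not in tokens[0]:
            continue
        rows.append({
            "name": tokens[0],
            "status": tokens[1],
            "copy_queue": int(tokens[2]),
            "replay_queue": int(tokens[3]),
            "content_index": tokens[-1],
        })
    return rows


def replay_caught_up(rows, threshold=0):
    """True if every copy's replay queue is at or below `threshold`."""
    return bool(rows) and all(r["replay_queue"] <= threshold for r in rows)
```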
Dismount the database.
CAUTION: Dismounting the database will cause a client interruption, which will continue until the database is mounted.
[PS] C:\>Dismount-Database SectorTest
Confirm
Are you sure you want to perform this action?
Dismounting database "SectorTest". This may result in reduced availability for mailboxes in the database.
[Y] Yes [A] Yes to All [N] No [L] No to All [?] Help (default is "Y"): y
[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Dismounted 0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         0           3/25/2013 5:41:54 AM   Healthy
SectorTest\MBX-3  Healthy    0         0           3/25/2013 5:41:54 AM   Healthy
On each DAG member hosting a database copy, open a command prompt and navigate to the log file directory. Execute eseutil /r ENN to perform a soft recovery. This step is necessary to ensure that all log files are played into all copies.
Z:\SectorTest>eseutil /r e01
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating RECOVERY mode...
Logfile base name: e01
Log files: <current directory>
System files: <current directory>
Performing soft recovery...
Restore Status (% complete)
0    10   20   30   40   50   60   70   80   90  100
|----|----|----|----|----|----|----|----|----|----|
...................................................
Operation completed successfully in 0.203 seconds.
On each DAG member hosting a database copy, open a command prompt and navigate to the database directory. Run eseutil /mh <EDB> against the database to dump the header. You must validate that the following information is correct on all database copies:
Here is example output of a full /mh dump followed by a comparison of the data across our three sample copies.
Z:\SectorTest>eseutil /mh SectorTest.edb
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
Database: SectorTest.edb
DATABASE HEADER:
Checksum Information:
Expected Checksum: 0x010f4400
Actual Checksum: 0x010f4400
Fields:
File Type: Database
Checksum: 0x10f4400
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,17
Engine ulVersion: 0x620,17
Created ulVersion: 0x620,17
DB Signature: Create time:03/19/2013 09:40:15 Rand:11009066 Computer:
cbDbPage: 32768
dbtime: 601018 (0x92bba)
State: Clean Shutdown
Log Required: 0-0 (0x0-0x0)
Log Committed: 0-0 (0x0-0x0)
Log Recovering: 0 (0x0)
GenMax Creation: 00/00/1900 00:00:00
Shadowed: Yes
Last Objid: 3350
Scrub Dbtime: 0 (0x0)
Scrub Date: 00/00/1900 00:00:00
Repair Count: 0
Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
Last Consistent: (0x138,3FB,1A4) 03/19/2013 13:44:11
Last Attach: (0x111,9,86) 03/19/2013 13:42:29
Last Detach: (0x138,3FB,1A4) 03/19/2013 13:44:11
Dbid: 1
Log Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
OS Version: (6.1.7601 SP 1 NLS ffffffff.ffffffff)
Previous Full Backup:
Log Gen: 0-0 (0x0-0x0)
Mark: (0x0,0,0)
Mark: 00/00/1900 00:00:00
Previous Incremental Backup:
Log Gen: 0-0 (0x0-0x0)
Mark: (0x0,0,0)
Mark: 00/00/1900 00:00:00
Previous Copy Backup:
Log Gen: 0-0 (0x0-0x0)
Mark: (0x0,0,0)
Mark: 00/00/1900 00:00:00
Previous Differential Backup:
Log Gen: 0-0 (0x0-0x0)
Mark: (0x0,0,0)
Mark: 00/00/1900 00:00:00
Current Full Backup:
Log Gen: 0-0 (0x0-0x0)
Mark: (0x0,0,0)
Mark: 00/00/1900 00:00:00
Current Shadow copy backup:
Log Gen: 0-0 (0x0-0x0)
Mark: (0x0,0,0)
Mark: 00/00/1900 00:00:00
cpgUpgrade55Format: 0
cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0
ECC Fix Success Count: none
Old ECC Fix Success Count: none
ECC Fix Error Count: none
Old ECC Fix Error Count: none
Bad Checksum Error Count: none
Old bad Checksum Error Count: none
Last checksum finish Date: 03/19/2013 13:11:36
Current checksum start Date: 00/00/1900 00:00:00
Current checksum page: 0
Operation completed successfully in 0.47 seconds.
MBX-1:
State: Clean Shutdown
Last Consistent: (0x138,3FB,1A4) 03/19/2013 13:44:11
Last Detach: (0x138,3FB,1A4) 03/19/2013 13:44:11
MBX-2:
State: Clean Shutdown
Last Consistent: (0x138,3FB,1A4) 03/19/2013 13:44:12
Last Detach: (0x138,3FB,1A4) 03/19/2013 13:44:12
MBX-3:
State: Clean Shutdown
Last Consistent: (0x138,3FB,1A4) 03/19/2013 13:44:13
Last Detach: (0x138,3FB,1A4) 03/19/2013 13:44:13
In this case, the values match across all copies, so you can proceed with the next steps.
If the values do not match across copies for any reason, do not continue; contact Microsoft Support.
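The header comparison can also be scripted. One inference from the example: the copies are treated as matching even though their timestamps differ by a second or two, so this sketch compares only the State and the log positions (LGPOS) of Last Consistent and Last Detach, reading one field per line. That choice and the function names are assumptions; when in doubt, stop and contact Microsoft Support as advised.

```python
import re

LGPOS = re.compile(r"\(0x[0-9A-Fa-f]+,[0-9A-Fa-f]+,[0-9A-Fa-f]+\)")


def header_key(summary):
    """Extract (State, Last Consistent LGPOS, Last Detach LGPOS) from eseutil /mh text."""
    fields = {}
    for line in summary.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    consistent = LGPOS.search(fields["Last Consistent"])
    detach = LGPOS.search(fields["Last Detach"])
    if consistent is None or detach is None:
        raise ValueError("LGPOS not found in header fields")
    return fields["State"], consistent.group(0), detach.group(0)


def copies_match(summaries_by_host):
    """True if all copies are in Clean Shutdown with identical LGPOS values."""
    keys = {header_key(text) for text in summaries_by_host.values()}
    return len(keys) == 1 and next(iter(keys))[0] == "Clean Shutdown"
```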
Reset the log file generation for the database.
Note: Use Get-MailboxDatabaseCopyStatus to record database locations and status prior to performing this activity.
Locate the log file directory for each ACTIVE (DISMOUNTED) database. Remove all log files from this directory first. Failure to remove log files from the ACTIVE (DISMOUNTED) database may result in the Replication service recopying log files, a failure of this procedure, and subsequent need to reseed all database copies.
IMPORTANT: If log files are located in the same location as the database and catalog data folder, take care not to remove the database or the catalog data folder.
In our example MBX-1 hosts the ACTIVE (DISMOUNTED) copy.
[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Dismounted 0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         0           3/25/2013 5:41:54 AM   Healthy
SectorTest\MBX-3  Healthy    0         0           3/25/2013 5:41:54 AM   Healthy
Locate the log file directory for each PASSIVE database copy. Remove all log files from this directory. Failure to remove all log files could cause this procedure to fail and require you to reseed this or all database copies. If log files are located in the same location as the database and catalog data folder, take care not to remove the database or the catalog data folder.
In our example MBX-2 and MBX-3 host the passive database copies.
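Because a careless wildcard can delete the database or the catalog folder, it helps to select the log files by name. This hedged Python sketch matches only files named `<prefix>.log` or `<prefix>` followed by eight hex digits and `.log` (for example E01.log and E0100000138.log), and it defaults to a dry run that only lists what it would delete. It deliberately leaves everything else alone, including .edb files, the catalog folder, and the checkpoint (.chk) and reserved log files; whether those should also be removed is not covered by this sketch. The function names are hypothetical.

```python
import re
from pathlib import Path


def select_log_files(names, prefix):
    """Return the names that are transaction logs for `prefix` (e.g. "E01")."""
    pattern = re.compile(rf"^{re.escape(prefix)}([0-9A-F]{{8}})?\.log$", re.IGNORECASE)
    return sorted(name for name in names if pattern.match(name))


def remove_log_files(log_dir, prefix, dry_run=True):
    """List (and, only if dry_run is False, delete) log files in log_dir.

    Only regular files are considered, so folders such as the catalog are skipped.
    """
    directory = Path(log_dir)
    files = [p.name for p in directory.iterdir() if p.is_file()]
    selected = select_log_files(files, prefix)
    if not dry_run:
        for name in selected:
            (directory / name).unlink()
    return selected
```

Run it with the default `dry_run=True` first and review the returned list before deleting anything.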
Mount the database using Mount-Database <DBNAME>, and verify it has mounted.
[PS] C:\>Mount-Database SectorTest
[PS] C:\>Get-MailboxDatabaseCopyStatus *
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Mounted    0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         1           3/25/2013 5:57:28 AM   Healthy
SectorTest\MBX-3  Healthy    0         1           3/25/2013 5:57:28 AM   Healthy
Suspend and resume all passive database copies.
Note: The error on suspending the active database copy is expected.
[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Suspend-MailboxDatabaseCopy
The suspend operation can't proceed because database 'SectorTest' on Exchange Mailbox server 'MBX-1' is the active mailbox database copy.
+ CategoryInfo : InvalidOperation: (SectorTest\MBX-1:DatabaseCopyIdParameter) [Suspend-MailboxDatabaseCopy], InvalidOperationException
+ FullyQualifiedErrorId : 5083D28B,Microsoft.Exchange.Management.SystemConfigurationTasks.SuspendDatabaseCopy
+ PSComputerName : mbx-1.exchange.msft
Note: The error on resuming the active database copy is expected.
[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Resume-MailboxDatabaseCopy
WARNING: The Resume operation won't have an effect on database replication because database 'SectorTest' hosted on server 'MBX-1' is the active mailbox database.
Validate replication health.
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Mounted    0         0                                  Healthy
SectorTest\MBX-2  Healthy    0         0           3/19/2013 1:56:12 PM   Healthy
SectorTest\MBX-3  Healthy    0         0           3/19/2013 1:56:12 PM   Healthy
Using Set-MailboxDatabaseCopy, reconfigure any replay lag or truncation lag time on the database copy. This example implements a 7-day replay lag time.
Set-MailboxDatabaseCopy -Identity SectorTest\MBX-3 -ReplayLagTime 7.0:0:0
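The lag values passed to Set-MailboxDatabaseCopy in this post use a days.hours:minutes:seconds time span format, so 7.0:0:0 is seven days and 0.0:0:0 is zero. A minimal Python parser for that simple form, as a sketch: it ignores fractional seconds and the other variants a .NET TimeSpan string may take, and the function name is hypothetical.

```python
import re
from datetime import timedelta

# [days.]hours:minutes:seconds, e.g. "7.0:0:0" or "0:30:00"
TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d+):(\d+):(\d+)$")


def parse_lag_time(text):
    """Parse a simple d.h:m:s time span string into a timedelta."""
    match = TIMESPAN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported time span format: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```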
Repeat the previous steps for all databases in the DAG including those databases that have a single copy.
IMPORTANT: DO NOT proceed to the next step until all databases have been reset.
Enable block mode replication. Using Registry Editor, navigate to HKLM\Software\Microsoft\ExchangeServer\V14\Replay, and then remove the DisableGranularReplication DWORD value.
Restart the replication service on each DAG member.
Restart-Service MSExchangeREPL
Validate database health using Get-MailboxDatabaseCopyStatus.
Name              Status     CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                             Length    Length                             State
----              ------     --------- ----------- --------------------   ------------
SectorTest\MBX-1  Healthy    0         0           3/19/2013 2:25:56 PM   Healthy
SectorTest\MBX-2  Mounted    0         0                                  Healthy
SectorTest\MBX-3  Healthy    0         230         3/19/2013 2:25:56 PM   Healthy
Dump the header of a log file and verify that the new sector size is reflected in the log file stream. To do this, open a command prompt and navigate to the log file directory for the database on the active node. Run eseutil /ml against any log in the directory, and verify that the Env Log Sec size field reports 4096 (matches).
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
Base name: E01
Log file: E0100000001.log
lGeneration: 1 (0x1)
Checkpoint: (0x17B,FFFF,FFFF)
creation time: 03/19/2013 13:56:11
prev gen time: 00/00/1900 00:00:00
Format LGVersion: (7.3704.16.2)
Engine LGVersion: (7.3704.16.2)
Signature: Create time:03/19/2013 13:56:11 Rand:2996669 Computer:
Env SystemPath: z:\SectorTest\
Env LogFilePath: z:\SectorTest\
Env Log Sec size: 4096 (matches)
Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
( off, 1227, 61350, 16384, 61350, 2048, 256, 44204)
Using Reserved Log File: false
Circular Logging Flag (current file): off
Circular Logging Flag (past files): off
Checkpoint at log creation time: (0x1,1,0)
Last Lgpos: (0x1,2,0)
Number of database page references: 0
Integrity check passed for log file: E0100000001.log
Operation completed successfully in 0.250 seconds.
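This final check can also be automated across databases. The sketch reads the `Env Log Sec size` line from saved eseutil /ml output; only the "(matches)" form appears in this post, so any other parenthesized text is simply treated as not matching. The function names and the default expected size of 4096 are assumptions for this example.

```python
import re

LOG_SECTOR = re.compile(r"Env Log Sec size:\s*(\d+)\s*\(([^)]*)\)")


def log_sector_size(dump):
    """Return (sector size, matches flag) from eseutil /ml output."""
    match = LOG_SECTOR.search(dump)
    if match is None:
        raise ValueError("'Env Log Sec size' not found in eseutil /ml output")
    return int(match.group(1)), match.group(2).strip() == "matches"


def log_stream_reset_ok(dump, expected=4096):
    """True if the log header reports the expected sector size and '(matches)'."""
    size, matches = log_sector_size(dump)
    return matches and size == expected
```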
If the above steps have been completed successfully, and the log file sequence recognizes a 4096 sector size, then this issue has been resolved.
This guidance was validated in the following configurations:
Tim McMichael
We’re happy to announce updates to the Exchange Server 2013 Deployment Assistant!
We’ve updated the Deployment Assistant to include the following new scenarios:
These new scenarios provide step-by-step guidance about how to upgrade your existing Exchange 2007 or Exchange 2010 organizations to benefit from the improvements and new features of Exchange 2013. Plus, Exchange 2007 organizations can now configure a hybrid deployment with Office 365 using Exchange 2013 instead of Exchange 2010 SP3 in their on-premises organization.
And, there’s more on the way! We’re also working hard on additional scenarios, such as upgrading from a mixed Exchange Server 2007/2010 organization to Exchange 2013 and configuring Exchange 2013-based hybrid for Exchange 2010 organizations. Keep checking back here for release announcements.
In case you're not familiar with it, the Exchange Server 2013 Deployment Assistant is a web-based tool that helps you deploy Exchange 2013 in your on-premises organization, configure a hybrid deployment between your on-premises organization and Office 365, or migrate to Office 365. The tool asks you a small set of simple questions and then, based on your answers, creates a customized checklist with instructions to deploy or configure Exchange 2013. Instead of trying to find what you need in the Exchange library, the Deployment Assistant gives you exactly the right information you need to complete your task. Supported on most major browsers, the Deployment Assistant is your one-stop shop for deploying Exchange 2013.
Figure 1: The updated Exchange 2013 Deployment Assistant
And for those organizations that still need to deploy Exchange 2010 or are interested in configuring an Exchange 2010-based hybrid deployment with Office 365, you can continue to access the Exchange Server 2010 Deployment Assistant at http://technet.microsoft.com/exdeploy2010 (short URL: aka.ms/eda2010).
Do you have a deployment success story about the Deployment Assistant? Do you have suggestions on how to improve the tool? We would love your feedback and comments! Feel free to leave a comment here, or send an email to edafdbk@microsoft.com directly or via the 'Feedback' link located in the header of every page of the Deployment Assistant.
Happy deploying!
The Deployment Assistant Team
A few years back, a very detailed blog post was released on Troubleshooting Exchange 2007 Store Log/Database growth issues.
We wanted to revisit this topic with Exchange 2010 in mind. While the troubleshooting steps needed are virtually the same, we thought it would be useful to condense the steps a bit, make a few updates and provide links to a few newer KB articles.
The following steps walk through the approach that Microsoft Support would likely use if you called for assistance with this issue. They also provide some insight into what we are looking for and why. This is not a complete list of every possible troubleshooting step, because some causes are simply seen much less often than others.
Another thing to note is that these steps are typically used when we see “rapid” or unexpected growth in the database file on disk or in the number of transaction logs being generated. For example, an Administrator notices that a transaction log drive is close to running out of space, even though it had several GB free the day before. Looking through historical records, the Administrator sees that approximately 2 to 3 GB of logs have been backed up daily for several months, but the server is now generating 2 to 3 GB of logs per hour. That is a clear red flag for the log creation rate. The same principle applies to the database in scenarios where rapid log growth is associated with new content creation.
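One rough way to put numbers on a comparison like this is sketched below in Python. It is an illustration only, assuming the 1 MB transaction log file size used by Exchange 2007 and Exchange 2010, and assuming that file modification times approximate when each log was generated. The function names are hypothetical.

```python
import os
import time

LOG_FILE_MB = 1  # Exchange 2007/2010 transaction log files are 1 MB each


def log_mb_per_hour(log_dir, window_hours=1.0, now=None):
    """Approximate log generation rate (MB/hour) from *.log files modified in the window."""
    now = time.time() if now is None else now
    cutoff = now - window_hours * 3600
    recent = 0
    for name in os.listdir(log_dir):
        if not name.lower().endswith(".log"):
            continue  # skip the checkpoint file and any other non-.log files
        if os.path.getmtime(os.path.join(log_dir, name)) >= cutoff:
            recent += 1
    return recent * LOG_FILE_MB / window_hours


def growth_factor(current_mb_per_hour, baseline_gb_per_day):
    """How many times faster than the historical daily baseline logs are being generated."""
    baseline_mb_per_hour = baseline_gb_per_day * 1024 / 24
    return current_mb_per_hour / baseline_mb_per_hour
```

With the figures from the example above, a baseline of 2.5 GB per day is roughly 107 MB per hour, so generating 2.5 GB (2,560 MB) per hour gives growth_factor(2560, 2.5) of about 24.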
In other cases, the database size or the number of transaction log files may increase because of some other problem on the server. For example, if backups have been failing for a few days, the log files are not being purged, so the log file disk starts to fill up and appears to hold more logs than usual. In this example, the cause isn't rapid log growth; the growth is a symptom of the failing backups that are responsible for purging the logs, and those backups must be fixed. Similarly, if retention settings have been modified or online maintenance has not been completing, the database will begin to grow on disk and consume free space. These scenarios and a few others are also discussed in the “Proactive monitoring and mitigation efforts” section of the previously published blog.
It should be noted that in some cases the database size may expand rapidly without a correspondingly rapid rate of log growth. (When rapid log growth comes from new content creation, we would expect the database and the transaction logs to grow rapidly together.) This is often referred to as database “bloat” or a database “space leak”. Troubleshooting this specific issue can be more invasive, as you can see from some of the analysis steps listed here (taking databases offline, various kinds of dumps, etc.), so it may be better to engage Microsoft Support if a reason for the growth cannot be found.
Once you have established that the rate of growth for the database and transaction log files is abnormal, begin troubleshooting with the following steps. Note that some steps can be done out of order, but the list below provides general guidance based on our experience in support.
Use Exchange User Monitor (Exmon) server side to determine if a specific user is causing the log growth problems.
If the user shown in Exmon is a ?, this indicates a HUB/Transport related problem generating the logs. Query the message tracking logs using the Message Tracking Log tool in the Exchange Management Console's Toolbox to check for any large messages that might be running through the system. See #15 for a PowerShell script to accomplish the same task.
With Exchange 2007 Service Pack 2 Rollup Update 2 and higher, you can use KB972705 to troubleshoot abnormal database or log growth by adding the described registry values. The registry values monitor RPC activity and, when the thresholds are exceeded, log an event with details about the activity and the user that caused it. (These registry values are not currently available in Exchange Server 2010.)
Check the application log on the server for excessive ExCDO warning events related to appointments (for example, 8230 or 8264 events). If recurring meeting events are found, try to regenerate the calendar data server side via a process called POOF. See http://blogs.msdn.com/stephen_griffin/archive/2007/02/21/poof-your-calender-really.aspx for more information on what this is.
Event Type: Warning
Event Source: EXCDO
Event Category: General
Event ID: 8230
Description: An inconsistency was detected in username@domain.com: /Calendar/<calendar item> .EML. The calendar is being repaired. If other errors occur with this calendar, please view the calendar using Microsoft Outlook Web Access. If a problem persists, please recreate the calendar or the containing mailbox.

Event Type: Warning
Event ID : 8264
Category : General
Source : EXCDO
Type : Warning
Message : The recurring appointment expansion in mailbox <someone's address> has taken too long. The free/busy information for this calendar may be inaccurate. This may be the result of many very old recurring appointments. To correct this, please remove them or change their start date to a more recent date.
Important: If 8230 events are consistently seen on an Exchange server, have the user delete and recreate the affected appointment to remove any corruption.
Collect and parse the IIS log files from the CAS servers used by the affected Mailbox server. You can use Log Parser Studio to easily parse IIS log files. In these logs, look for repeated user account sync attempts and other suspicious activity. For example, a user with an abnormally high number of sync attempts and errors would be a red flag. If a user is found and suspected to be a cause of the growth, you can follow the suggestions given in steps 5 and 6.
Once Log Parser Studio is launched, you will see convenient tabs to search per protocol:
Some example queries for this issue would be:
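Log Parser Studio is the tool suggested here. If you would rather script a quick first pass, the following Python sketch (not one of the Log Parser Studio queries) counts ActiveSync requests and HTTP error responses per user. It assumes W3C-format IIS logs whose #Fields header includes cs-uri-stem, cs-username, and sc-status; treating any status of 400 or above as an error is an assumption made for illustration.

```python
from collections import Counter


def activesync_activity(lines):
    """Count ActiveSync requests and error responses (HTTP status >= 400) per user.

    lines: an iterable of lines from a W3C-format IIS log file.
    Returns two Counters: requests per cs-username and errors per cs-username.
    """
    fields = []
    requests, errors = Counter(), Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()  # column names for the rows that follow
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blank lines, or rows before any #Fields header
        row = dict(zip(fields, line.split(" ")))
        if not row.get("cs-uri-stem", "").lower().startswith("/microsoft-server-activesync"):
            continue
        user = row.get("cs-username", "-")
        requests[user] += 1
        if int(row.get("sc-status", "0")) >= 400:
            errors[user] += 1
    return requests, errors
```

For example, open one of the CAS IIS log files, pass the file object to activesync_activity(), and compare requests.most_common(10) with the error counts; a user who stands out in both is a candidate for the follow-up steps.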
If a suspected user is found via Exmon, the event logs, KB972705, or parsing the IIS log files, then do one of the following:
Set-CasMailbox -Identity <Username> -MapiEnabled $False
If closing the client or devices, or killing their sessions, seems to stop the log growth, then do the following to see whether the problem is related to the OST or the Outlook profile:
Have the user launch Outlook while holding down the Ctrl key, which prompts whether to start Outlook in safe mode. If launching Outlook in safe mode resolves the log growth issue, concentrate on which add-ins could be contributing to the problem.
For a mobile device, consider a full resync or a new sync profile. Also check for any messages in the drafts folder or outbox on the device. A corrupted meeting or calendar entry is commonly found to be causing the issue with the device as well.
If you can gain access to the user's machine, do one of the following:
1. Launch Outlook to confirm the log file growth issue on the server.
2. If log growth is confirmed, do one of the following:
3. Follow the Running Process Explorer instructions in the article below to dump the DLLs that are running within the Outlook process. Name the file username.txt. This helps check for any third-party Outlook add-ins that may be causing the excessive log growth. 970920 Using Process Explorer to List dlls Running Under the Outlook.exe Process http://support.microsoft.com/kb/970920
4. Check the Sync Issues folder for any errors that might be occurring.
Let’s attempt to narrow this down further to see if the problem is truly in the OST or something possibly Outlook Profile related:
If the problem recurs after renaming the OST, recreate the user's profile to see whether the problem is profile related.
Ask Questions:
Ensure that file-level antivirus exclusions are set correctly for both files and processes per http://technet.microsoft.com/en-us/library/bb332342(v=exchg.141).aspx
If Exmon and the above methods do not provide the data necessary to find the root cause, collect a portion of the Store transaction log files (100 would be a good start) during the problem period and parse them following the directions in http://blogs.msdn.com/scottos/archive/2007/11/07/remix-using-powershell-to-parse-ese-transaction-logs.aspx to look for possible patterns, such as high counts for IPM.Appointment. This gives you a high-level overview of whether something is looping or a high rate of messages is being sent. Note: This tool may or may not provide any benefit depending on the data that is stored in the log files, but it sometimes shows MIME-encoded data that will help with your investigation.
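The linked post uses PowerShell. As a rough illustration of the same idea, the Python sketch below scans the collected log files for message class strings such as IPM.Appointment and tallies them. It is a crude byte-level string search, not an ESE log parser, and it assumes the class names appear either as 8-bit text or as UTF-16LE text somewhere in the log data. Run it against the collected copies of the logs.

```python
import re
from collections import Counter
from pathlib import Path

# Message classes such as "IPM.Appointment", stored as 8-bit text or as UTF-16LE.
ASCII_CLASS = re.compile(rb"IPM\.[A-Za-z0-9.]+")
UTF16_CLASS = re.compile(rb"I\x00P\x00M\x00\.\x00(?:[A-Za-z0-9.]\x00)+")


def message_class_counts(paths):
    """Tally message-class strings found in the raw bytes of the given log files."""
    counts = Counter()
    for path in paths:
        data = Path(path).read_bytes()
        for match in ASCII_CLASS.finditer(data):
            counts[match.group().decode("ascii")] += 1
        for match in UTF16_CLASS.finditer(data):
            counts[match.group().decode("utf-16-le")] += 1
    return counts
```

For example, pass the result of Path(...).glob("*.log") for the folder holding the copied logs and review most_common(10); an unusually dominant class such as IPM.Appointment is the kind of pattern the step above describes.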
If nothing is found by parsing the transaction log files, check for a rogue, corrupted, or large message in transit:
1. Check current queues against all HUB Transport Servers for stuck or queued messages:
get-exchangeserver | where {$_.IsHubTransportServer -eq "true"} | Get-Queue | where {$_.Deliverytype -eq "MapiDelivery"} | Select-Object Identity, NextHopDomain, Status, MessageCount | export-csv HubQueues.csv
Review queues for any that are in retry or have a lot of messages queued:
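A minimal Python sketch of that review, run against the HubQueues.csv file produced above, is shown below. The threshold of 100 messages is an arbitrary illustrative value, not an Exchange default, and the sketch assumes the column names match the properties selected in the command.

```python
import csv


def problem_queues(csv_text, max_messages=100):
    """List queues that are in Retry or hold more than max_messages messages.

    max_messages=100 is an arbitrary illustrative threshold, not an Exchange default.
    Returns (Identity, Status, MessageCount) tuples, largest queues first.
    """
    # Export-Csv without -NoTypeInformation writes a "#TYPE ..." line before the header.
    lines = [line for line in csv_text.splitlines() if not line.startswith("#TYPE")]
    flagged = []
    for row in csv.DictReader(lines):
        count = int(row["MessageCount"] or 0)
        if row["Status"] == "Retry" or count > max_messages:
            flagged.append((row["Identity"], row["Status"], count))
    return sorted(flagged, key=lambda q: q[2], reverse=True)
```

For example, read HubQueues.csv into a string (an encoding of "utf-8-sig" handles ASCII or UTF-8 output) and pass it to problem_queues(); the queues at the top of the result are the first ones to investigate.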
Export out message sizes in MB in all Hub Transport queues to see if any large messages are being sent through the queues:
get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -resultsize unlimited | sort-object -property size -descending | Select-Object Identity,Subject,status,LastError,RetryCount,queue,@{Name="Message Size MB";expression={$_.size.toMB()}} | export-csv HubMessages.csv
Export out message sizes in Bytes in all Hub Transport queues:
get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -resultsize unlimited | Select-Object Identity,Subject,status,LastError,RetryCount,queue,size | sort-object -property size -descending | export-csv HubMessages.csv
2. Check users' Outbox folders for any large, looping, or stranded messages that might be affecting overall log growth.
get-mailbox -ResultSize Unlimited| Get-MailboxFolderStatistics -folderscope Outbox | Sort-Object Foldersize -Descending | select-object identity,name,foldertype,itemsinfolder,@{Name="FolderSize MB";expression={$_.folderSize.toMB()}} | export-csv OutboxItems.csv
Note: This does not return information for messages waiting in the Outbox of users running Outlook in cached mode, because those items are held locally in the OST.
Use the MSExchangeIS Client\Jet Log Record Bytes/sec and MSExchangeIS Client\RPC Operations/sec Perfmon counters to see whether a particular client protocol may be generating excessive logs. If a particular protocol is found to be higher than the others for a sustained period of time, consider shutting down the service hosting that protocol. For example, if Exchange Outlook Web Access appears to be the protocol generating the log growth, stop the World Wide Web Publishing Service (W3SVC) to confirm that the log growth stops. If it does, collecting the IIS logs from the CAS/MBX Exchange servers involved will provide insight into what action the user was performing that caused this to occur.
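If the Perfmon data is captured to a log file, it can be converted to CSV with relog (for example, relog perf.blg -f csv -o perf.csv) and summarized. The Python sketch below is an illustration under those assumptions: it averages Jet Log Record Bytes/sec for each MSExchangeIS Client instance so the heaviest protocol stands out. The file names are hypothetical, and the column matching assumes the standard \\Server\Object(Instance)\Counter header format with instance names that contain no parentheses.

```python
import csv
import re
from statistics import mean

# Matches headers such as "\\MBX1\MSExchangeIS Client(OWA)\Jet Log Record Bytes/sec".
COUNTER = re.compile(
    r"\\MSExchangeIS Client\((?P<instance>[^)]*)\)\\Jet Log Record Bytes/sec$",
    re.IGNORECASE,
)


def log_bytes_by_client(rows):
    """Average Jet Log Record Bytes/sec per MSExchangeIS Client instance.

    rows: an iterable of lines from a relog-generated CSV file.
    """
    reader = csv.reader(rows)
    header = next(reader)
    columns = {}
    for index, name in enumerate(header):
        match = COUNTER.search(name)
        if match:
            columns[index] = match.group("instance")
    samples = {instance: [] for instance in columns.values()}
    for row in reader:
        for index, instance in columns.items():
            value = row[index].strip() if index < len(row) else ""
            if value:  # blank cells (missing samples) are skipped
                samples[instance].append(float(value))
    return {instance: mean(values) for instance, values in samples.items() if values}
```

For example, open perf.csv with newline="" and pass the file object to log_bytes_by_client(); max(averages, key=averages.get) then names the client instance writing the most log data on average over the capture.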
Run the following command from the Exchange Management Shell to export current user operation rates:
To export to CSV File:
get-logonstatistics |select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount, progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending| export-csv LogonStats.csv
To view realtime data:
get-logonstatistics |select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount, progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending| ft
Key things to look for:
In the example below, the Administrator account was flooding the testuser mailbox with email. Notice that two users are active here: one is the Administrator submitting all of the messages, and the other is an entry whose Windows2000Account is a HUB server (DOMAIN\E12-HUB$) with an Identity of testuser. The HUB server entry also has *no* UserName, which is a giveaway that it is the server delivering the messages. This can give you a better understanding of which parties are involved in these high operation rates.
UserName : Administrator
Windows2000Account : DOMAIN\Administrator
Identity : /o=First Organization/ou=First Administrative Group/cn=Recipients/cn=Administrator
MessagingOperationCount : 1724
OtherOperationCount : 384
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 576
TotalOperationCount : 2684

UserName :
Windows2000Account : DOMAIN\E12-HUB$
Identity : /o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=testuser
MessagingOperationCount : 630
OtherOperationCount : 361
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 0
TotalOperationCount : 1091
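The reasoning above can be applied to the LogonStats.csv export. The following Python sketch is illustrative and assumes the CSV columns match the properties selected in the command above. It lists entries with an empty UserName whose Windows2000Account is a computer account (ending in $), like the DOMAIN\E12-HUB$ entry in the example, which points at a server delivering mail to the Identity shown. Skipping the "#TYPE" line assumes Export-Csv was run without -NoTypeInformation, as in the command above.

```python
import csv


def server_logons(csv_text):
    """Return (Windows2000Account, Identity, TotalOperationCount) for logons with no
    UserName whose account is a computer account (name ending in '$')."""
    # Skip the "#TYPE ..." line that Export-Csv writes by default.
    lines = [line for line in csv_text.splitlines() if not line.startswith("#TYPE")]
    hits = []
    for raw in csv.DictReader(lines):
        row = {key.lower(): value for key, value in raw.items()}  # tolerate header casing
        account = row["windows2000account"]
        if not row["username"].strip() and account.endswith("$"):
            hits.append((account, row["identity"], int(row["totaloperationcount"])))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

The Identity on each returned entry is the mailbox receiving the traffic; the busiest human account in the same export is then a likely sender to check.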
Enable Perfmon/Perfwiz logging on the server. Collect data through the problem times and then review it for any irregular activity. You can find Perfwiz for Exchange 2007/2010 data collection here: http://blogs.technet.com/b/mikelag/archive/2010/07/09/exchange-2007-2010-performance-data-collection-script.aspx
Run ExTRA (Exchange Troubleshooting Assistant) via the Toolbox in the Exchange Management Console to look for any functions (via FCL logging) that may be consuming excessive time within the store process. This needs to be launched during the problem period. http://blogs.technet.com/mikelag/archive/2008/08/21/using-extra-to-find-long-running-transactions-inside-store.aspx shows how to use FCL logging only, but it is best to include Perfmon, Exmon, and FCL logging via this tool to capture the most data. The steps shown are valid for Exchange 2007 and Exchange 2010.
Export out Message tracking log data from affected MBX server.
Download the ExLogGrowthCollector script and place it on the MBX server that experienced the issue. Run ExLogGrowthCollector.ps1 from the Exchange Management Shell. Enter the MBX server name that you would like to trace and the start and end times, and then click the Collect Logs button.
Note: This script exports all mail traffic to/from the specified mailbox server across all HUB servers between the times specified. This helps provide insight into any large or looping messages that might have been sent and could have caused the log growth issue.
Alternatively, copy and paste the following script into Notepad, save it as msgtrackexport.ps1, and then run it on the affected Mailbox server. Open the resulting CSV file in Excel for review. This is similar to the GUI version, but requires manual editing to get it to work.
# Export tracking log data from the affected server between the specified start/end times
Write-Host "Script to export out Mailbox Tracking Log Information"
Write-Host "#####################################################"
Write-Host
$server = Read-Host "Enter Mailbox server Name"
$start = Read-Host "Enter start date and time in the format of MM/DD/YYYY hh:mmAM"
$end = Read-Host "Enter end date and time in the format of MM/DD/YYYY hh:mmPM"
$fqdn = $(Get-ExchangeServer $server).fqdn
Write-Host "Writing data out to csv file..... "
# Search the Hub Transport servers plus the mailbox server itself, keep only entries involving
# the mailbox server, and put the largest messages first
Get-ExchangeServer | where {$_.IsHubTransportServer -eq "True" -or $_.name -eq "$server"} |
    Get-MessageTrackingLog -ResultSize Unlimited -Start $start -End $end |
    where {$_.ServerHostname -eq $server -or $_.ClientHostname -eq $server -or $_.ClientHostname -eq $fqdn} |
    Sort-Object TotalBytes -Descending |
    Export-Csv MsgTrack.csv -NoType
Write-Host "Completed!! You can now open the MsgTrack.csv file in Excel for review"
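Once MsgTrack.csv has been produced, grouping rows by sender and subject is one quick way to surface a looping or unusually large message. The Python sketch below is an illustration, assuming the Sender, MessageSubject, and TotalBytes columns from Get-MessageTrackingLog are present in the export. Each message typically produces several tracking events (for example RECEIVE and DELIVER), so the counts are event counts, not message counts.

```python
import csv
import io
from collections import defaultdict


def top_message_groups(csv_text, top=10):
    """Group tracking events by (Sender, MessageSubject) and rank the groups by total bytes.

    Returns (sender, subject, event_count, total_bytes) tuples, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # (sender, subject) -> [events, bytes]
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Sender"], row["MessageSubject"])
        totals[key][0] += 1
        totals[key][1] += int(row["TotalBytes"] or 0)
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(sender, subject, events, size)
            for (sender, subject), (events, size) in ranked[:top]]
```

A sender and subject pair with a very high event count points toward a looping message, while a small count with a large byte total points toward a single oversized message.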
You can also use the Process Tracking Log Tool at http://blogs.technet.com/b/exchange/archive/2011/10/21/updated-process-tracking-log-ptl-tool-for-use-with-exchange-2007-and-exchange-2010.aspx to provide some very useful reports.
Save a copy of the application and system logs from the affected server and review them for any events that could contribute to this problem.
Enable IIS extended logging for the CAS and Mailbox server roles to add the sc-bytes and cs-bytes fields, which track large messages sent via IIS protocols and also reveal usage patterns (Additional Details).
Get a process dump of the store process during the time of the log growth. (Use this as a last measure once all prior activities have been exhausted and before calling Microsoft for assistance. These issues are sometimes intermittent, and the sooner you can obtain data from the server, the better, as this helps provide Microsoft with information on what the underlying cause might be.)
procdump -mp -s 120 -n 2 store.exe d:\DebugData
Open a case with Microsoft Product Support Services to have this data analyzed.
2814847 - Rapid growth in transaction logs, CPU use, and memory consumption in Exchange Server 2010 when a user syncs a mailbox by using an iOS 6.1 or 6.1.1-based device
2621266 - An Exchange Server 2010 database store grows unexpectedly large
996191 - Troubleshooting Fast Growing Transaction Logs on Microsoft Exchange 2000 Server and Exchange Server 2003
Kevin Carker (based on a blog post written by Mike Lagase)