Exchange Team Blog

  • Managed Availability and Server Health

    Every second on every Exchange 2013 server, Managed Availability polls and analyzes hundreds of health metrics.  If something is found to be wrong, most of the time it will be fixed automatically.  But of course there will always be issues that Managed Availability won’t be able to fix on its own.  In those cases, Managed Availability will escalate the issue to an administrator by means of event logging, and perhaps alerting if System Center Operations Manager is used in tandem with Exchange 2013. When an administrator needs to get involved and investigate the issue, they can begin by using the Get-HealthReport and Get-ServerHealth cmdlets.

    Server Health Summary

    Start with Get-HealthReport to find out the status of every Health Set on the server:

Get-HealthReport -Identity <ServerName>

    This will result in the following output (truncated for brevity):

Server  State         HealthSet       AlertValue LastTransitionTime MonitorCount
------  -----         ---------       ---------- ------------------ ------------
Server1 NotApplicable AD              Healthy    5/21/2013 12:23    14
Server1 NotApplicable ECP             Unhealthy  5/26/2013 15:40    2
Server1 NotApplicable EventAssistants Healthy    5/29/2013 17:51    40
Server1 NotApplicable Monitoring      Healthy    5/29/2013 17:21    9

In the above example, you can see that the ECP (Exchange Control Panel) Health Set is Unhealthy. And based on the value for MonitorCount, you can also see that the ECP Health Set relies on two Monitors. Let's find out whether both of those Monitors are Unhealthy.
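On a busy server the list of Health Sets is long, so it can help to surface only the problem entries. A minimal sketch, assuming a hypothetical server name (Server1 is a placeholder):

```powershell
# Show only Health Sets that are not currently Healthy (sketch; Server1 is a placeholder)
Get-HealthReport -Identity Server1 |
    Where-Object { $_.AlertValue -ne "Healthy" } |
    Sort-Object LastTransitionTime |
    Format-Table Server, HealthSet, AlertValue, LastTransitionTime, MonitorCount -AutoSize
```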

    Monitor Health

    The next step would be to use Get-ServerHealth to determine which of the ECP Health Set Monitors are in an unhealthy state.

Get-ServerHealth -Identity <ServerName> -HealthSet ECP

    This results in the following output:

Server  State         Name               TargetResource HealthSetName AlertValue ServerComponent
------  -----         ----               -------------- ------------- ---------- ---------------
Server1 NotApplicable EacSelfTestMonitor                ECP           Unhealthy  None
Server1 NotApplicable EacDeepTestMonitor                ECP           Unhealthy  None


    As you can see above, both Monitors are Unhealthy.  As an aside, if you pipe the above command to Format-List, you can get even more information about these Monitors.
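For example, a sketch of that Format-List variation, filtered to just the unhealthy Monitors (this mirrors the command above; Server1 is a placeholder):

```powershell
# Sketch: full property detail for only the unhealthy ECP Monitors
Get-ServerHealth -Identity Server1 -HealthSet ECP |
    Where-Object { $_.AlertValue -eq "Unhealthy" } |
    Format-List
```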

    Troubleshooting Monitors

Most Monitors are one of these four types:

[diagram showing the four Monitor types and the paths their Probes test]
    The EacSelfTestMonitor Probes along the "1" path, while the EacDeepTestMonitor Probes along the "4" path. Since both are unhealthy, it indicates that the problem lies on the Mailbox server in either the protocol stack or the store. It could also be a problem with a dependency, such as Active Directory, which is common when multiple Health Sets are unhealthy. In this case, the Troubleshooting ECP Health Set topic would be the best resource to help diagnose and resolve this issue.

    Abram Jackson

    Program Manager, Exchange Server

  • Adventures in querying the EventHistory table

Beginning with Exchange 2007, the Exchange database has had an internal table called EventHistory. This table has been used to track the events on which several of the assistants are based, and for other short-term internal record keeping. The way to query the table hasn't been publicized before, but it has a number of uses:

    • It may tell you the fate of a deleted item (for situations where Audit logging or store tracing was not in place at the time of the delete)
    • It can list the accounts that have recently touched a mailbox
    • It can show you the clients that have touched a mailbox

    Events are kept in the EventHistory table for up to 7 days by default.  You can check what your retention period is for all databases by running:

    Get-mailboxdatabase | fl name,event*
    Name                        : MainDB
    EventHistoryRetentionPeriod : 7.00:00:00

    There are a number of approaches to querying the table.  Let’s start with a script (please review my caveats before actually running the script) and review the data that is displayed.  The script is:

    Add-PSSnapin Microsoft.Exchange.Management.Powershell.Support
    $db = (get-mailbox <user alias>).database
    $mb=(get-mailbox <user alias>).exchangeguid
Get-DatabaseEvent $db -MailboxGuid $mb -resultsize unlimited | ? {$_.documentid -ne 0 -and $_.CreateTime -ge "<mm/dd/yyyy>"} | fl > c:\temp\EventHistory.txt

    For the CreateTime specify the day of the event you are looking for.  By default a maximum of 7 days are tracked.  Depending on the date range selected and the activity in the mailbox the resulting file size starts at about 5KB and I have seen it rise to nearly 1GB.  You can also replace the “| fl > c:\temp\EventHistory.txt” with “| export-csv c:\temp\EventHistory.csv”.  I am using the FL output because it is easier for illustration purposes.

    Inside the EventHistory.txt file will be events like this one (this one is a bulk delete of emails using OWA):

    Counter          : 15328155
    CreateTime       : 1/28/2013 9:46:16 PM
    ItemType         : MAPI_MESSAGE
    EventName        : ObjectMoved
    Flags            : None
    MailboxGuid      : d05f83c1-255c-42ae-b74f-1ac3329b306a
    ObjectClass      : IPM.Note
ItemEntryId      : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0700BC9C9BA42124CD4F896E8915C86B2BD00000006027C20000BC9C9BA42124CD4F896E8915C86B2BD0000041B6E6570000
ParentEntryId    : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0100BC9C9BA42124CD4F896E8915C86B2BD00000006027C20000
OldItemEntryId   : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0700BC9C9BA42124CD4F896E8915C86B2BD00000006027BF0000BC9C9BA42124CD4F896E8915C86B2BD0000041B6D6260000
OldParentEntryId : 000000008CFDF3C2BA873648866A1C17D0E3F1AB0100BC9C9BA42124CD4F896E8915C86B2BD00000006027BF0000
    ItemCount        : 0
    UnreadItemCount  : 0
    ExtendedFlags    : 2147483648
ClientCategory   : WebServices
PrincipalName    : Contoso\TestUser
PrincipalSid     : S-1-5-21-915020002-1829042167-1583638127-1930
    Database         : Mailbox Database 1858470524
    DocumentId       : 10876

The EventName shows what was done with the object. End-user deletes will be listed as moves. When you delete an item, it is moved either to Deleted Items or to the Recoverable Items subtree.

Note the ItemEntryId above, because it ties directly to the item you need to locate. The subject and other human-readable properties are not included in this table. The ItemEntryId is the database engine's way of uniquely identifying each item. You can use it to search the mailbox in MFCMAPI and get properties like Subject, From, To, etc.

    • The ParentEntryID is the folder in which the item presently resides.
    • The OldItemEntryID is the previous ItemEntryID before the item was deleted.
    • The OldParentEntryID is the folder it used to reside in.

    Flags will often show values like SearchFolder.  Many events flagged as being related to search folders or folders are not going to be interesting to your investigations.  If you are researching the fate of a deleted item they can be ignored.

ClientCategory is the type of client that requested the operation. In this case, WebServices means that OWA was used to remove the item as part of a bulk operation conducted against a 2010 mailbox. If it was deleted individually, then Exchange 2010 would list OWA here. The way ClientCategories are tracked in Exchange 2013 is a little different; you should see OWA for all end-user deletes through that tool.

    PrincipalName and PrincipalSid give you the identity of the account that was passed to the information store when the operation was requested.  At the time of writing these are not displayed by Exchange 2013.

    So – we have an output file.  What do we do with it?  The easy uses for the file (once it is imported into your favorite data analysis tool) at this time are:

    • List of all accounts that have caused an event to be logged in the time period you specified
    • Get a summary of operations (deletes, moves, new items, etc.) conducted on the days you specified
    • Get a list of client types that have changed something in the mailbox
    • Search the records returned for a particular ItemEntryID
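If you used the Export-Csv variation of the query, each of these summaries falls out of a short Group-Object pipeline. A sketch, assuming the CSV path from the earlier example:

```powershell
# Sketch: summarize an exported EventHistory CSV (path matches the earlier example)
$events = Import-Csv C:\temp\EventHistory.csv

# Accounts that caused events in the window
$events | Group-Object PrincipalName | Sort-Object Count -Descending | ft Name, Count -AutoSize

# Summary of operations (deletes, moves, new items, etc.)
$events | Group-Object EventName | Sort-Object Count -Descending | ft Name, Count -AutoSize

# Client types that changed something in the mailbox
$events | Group-Object ClientCategory | ft Name, Count -AutoSize
```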

In our output the ItemEntryId is not immediately useful. To find out what the ItemEntryId in each record actually refers to, we need to use MFCMAPI (steps related to MFCMAPI are at the end of this blog). Once you are in MFCMAPI, go to the Tools menu, select "Entry ID" and then "Open given entry ID". In the dialog that appears, paste in the ItemEntryId or the OldItemEntryId that you want to investigate. When you click OK, MFCMAPI will take you to the item you specified (if it is still in the mailbox). Once MFCMAPI takes you to the mail item, you will see the Subject, From, To, Creation date and other meaningful properties. You will also see there is a property called PR_ENTRYID. PR_ENTRYID is the MAPI name for ItemEntryId. This field is our link between the representation of the data in our PowerShell cmdlet and the more human-readable presentation in MFCMAPI.

    Pulling ItemEntryIDs from the PowerShell output and looking them up one at a time in MFCMAPI may be a little too tedious for most Exchange administrators.  If you have more than a handful of items you want to check (to see if they are useful and meaningful) it will take a long time to locate them all. 

    The alternative is to start in MFCMAPI.  If you can find the item you want there by looking at the subject line, date or other properties you can use the content of the PR_ENTRYID field in MFCMAPI to modify the Get-DatabaseEvent query to pull up the history for just that item.  To do this you need access to either a restored copy of the mailbox in a lab or the item of interest must still be in the mailbox (possibly in deleted items or recoverable items).  Here is a sample of how the get-databaseevent cmdlet would be used if you have the PR_ENTRYID:

Get-DatabaseEvent $db -MailboxGuid $mb -resultsize unlimited | ? {$_.ItemEntryID -eq "000000008CFDF3C2BA873648866A1C17D0E3F1AB0700BC9C9BA42124CD4F896E8915C86B2BD00000006027C20000BC9C9BA42124CD4F896E8915C86B2BD0000041B6E6570000" -or $_.OldItemEntryId -eq "000000008CFDF3C2BA873648866A1C17D0E3F1AB0700BC9C9BA42124CD4F896E8915C86B2BD00000006027C20000BC9C9BA42124CD4F896E8915C86B2BD0000041B6E6570000"} | export-csv c:\temp\SingleItemEventHistory.txt

    Sometimes I have not been able to locate an item using this technique.  If that happens it is useful to note that the PR_ENTRYID contains the ID of the mailbox, the folder and the item.  For example here is the PR_ENTRYID of an item in the Inbox followed by the PR_ENTRYID of the Inbox itself:

000000006064986ABA58DF40A86C0C67E716264807004885B50069B1D04994374C02417D45A100000000324E00003DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000
    000000006064986ABA58DF40A86C0C67E716264801004885B50069B1D04994374C02417D45A100000000324E0000

    For the sake of comparison here are the PR_ENTRYIDs of two more folders in the same mailbox:

    000000006064986ABA58DF40A86C0C67E716264801004885B50069B1D04994374C02417D45A10000000032510000 - deleted items folder
    000000006064986ABA58DF40A86C0C67E716264801004885B50069B1D04994374C02417D45A100000000324B0000 - ipm_subtree folder

    From this you should be able to get an idea of how the field is divided up by looking at where the repeated digits end.  For the purpose of tracking down an individual item that may be in a different folder (because of multiple moves) we want to be able to isolate the portion of the PR_ENTRYID that is specific to the item and modify our PowerShell statement appropriately.  The final statement would look like this:
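One way to isolate the item-specific portion without counting digits by hand is to use the length of the containing folder's PR_ENTRYID. This is an assumption based on the samples above (the entry ID layout is not a documented format), so treat it as a sketch:

```powershell
# Sketch: derive the item-specific tail of a PR_ENTRYID
# (layout assumption based on the samples above, not a documented format)
$itemEntryId   = "000000006064986ABA58DF40A86C0C67E716264807004885B50069B1D04994374C02417D45A100000000324E00003DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000"
$folderEntryId = "000000006064986ABA58DF40A86C0C67E716264801004885B50069B1D04994374C02417D45A100000000324E0000"

# Everything past the folder-length prefix is specific to the item
$itemPart = $itemEntryId.Substring($folderEntryId.Length)

Get-DatabaseEvent $db -MailboxGuid $mb -resultsize unlimited |
    ? { $_.ItemEntryId -like "*$itemPart" -or $_.OldItemEntryId -like "*$itemPart" } |
    export-csv c:\temp\SingleItemEventHistory.txt
```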

Get-DatabaseEvent $db -MailboxGuid $mb -resultsize unlimited | ? {$_.ItemEntryID -like "*3DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000" -or $_.OldItemEntryId -like "*3DEF8F7FFC1E3448B9D276F022E0E42D0000396D1B280000"} | export-csv c:\temp\SingleItemEventHistory.txt

At this point, if we still can't find the item, our last chances are to remove the -MailboxGuid parameter from the query (meaning we will search all mailboxes in the database; this is a very expensive operation, so please review the caveats) or to search other databases in the organization (databases containing delegates of the current user would be the ones to start with). If the data still can't be found, you have either made an error or the records are no longer present. If the records are present, you should see all actions taken on the item recently.

    Caveats:

    • At the time of writing Exchange 2013 is not reporting the account information in the EventHistory records.  You can use the technique – you just won’t get any account names or SIDs from it.
    • You can change the length of time items stay in the EventHistory table with Set-MailboxDatabase -EventHistoryRetentionPeriod.  You can choose a period from 1 second up to 30 days.  I don’t recommend setting a time that is too short as I have not tested how Event based assistants would react to that.  For the full syntax of Set-MailboxDatabase please check the TechNet article for your Exchange version.
    • If you choose to direct your output to a variable instead of a text file you should make sure you are running the PowerShell cmdlets from a workstation with the management tools installed.  The variable (and the PowerShell session) are likely to consume a substantial amount of memory. 
    • These queries of the EventHistory table are expensive to run.  Use good judgment in when you choose to run them based on the demands of your environment.  In the labs I use all these queries take a second or two, but on a busy server with large databases  you can easily be looking at 20-30 minutes per query.  There will also be an I/O impact, but I don’t have a way to estimate that for you in advance.

    You can make the operation less expensive by lowering the number of records returned by Get-DatabaseEvent.  We are already including the database and mailbox to look for.  You can also add the EventNames and the StartCounter.  The latter of these might be a little tricky.  The StartCounter is an internal number that is specific to this table in the current database.  You probably won’t know what counter value to use until you have already run a query and noted the counter values.  This means StartCounter is mostly useful for reducing the impact of your second and subsequent queries of the same table in the same database.

    Assuming you know a relevant StartCounter value here is an example of doing this:

Get-DatabaseEvent $db -MailboxGuid $mb -EventNames objectmodified, objectdeleted -StartCounter 15328155 -resultsize unlimited | ? {$_.documentid -ne 0 -and $_.CreateTime -ge "<mm/dd/yyyy>"} | fl > c:\temp\EventHistory.txt

The example above searches a mailbox on a particular database for the event types specified and ignores any rows with a lower counter value than specified. This smaller dataset is then passed to the PowerShell pipeline for additional filtering and is ultimately saved to a file (use Export-Csv instead if you want a CSV you can import into your favorite analysis tool). If you prefer to conduct your analysis in PowerShell, you also have the option of assigning the result of Get-DatabaseEvent to a PowerShell variable (just remember that the variable and the PowerShell session will consume memory proportional to the result set returned).
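Since you usually won't know a useful StartCounter until after a first pass, one approach is to record the highest counter you have already seen and resume from there. A sketch:

```powershell
# Sketch: remember the highest Counter from a first query so follow-up
# queries can skip records you have already examined
$results = Get-DatabaseEvent $db -MailboxGuid $mb -resultsize unlimited |
    ? { $_.documentid -ne 0 }

$nextStart = ($results | Measure-Object -Property Counter -Maximum).Maximum + 1

# Subsequent, cheaper query starts where the last one left off
Get-DatabaseEvent $db -MailboxGuid $mb -StartCounter $nextStart -resultsize unlimited
```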

    So how do you find the PR_ENTRYIDs I mentioned above in MFCMAPI?

    You can download MFCMAPI from https://mfcmapi.codeplex.com.

    1. We need an Outlook profile for the mailbox we are searching.  That profile should NOT be configured for Cached mode.  If you are doing this from your machine make sure you have Full Access to the mailbox of the user.  You can then create a profile for that specific user.

2. Once you have the profile, open MFCMAPI and log on.

[screenshot]

3. Select the profile you created in Step 1. You will see a screen like this one:

[screenshot]

    4. Double-click the mailbox which will open a window showing you the mailbox details.

    5. If you already know the ItemEntryID you want to open and inspect you can locate it with this menu option:

[screenshot]

6. If you don’t have the ItemEntryId, expand the Root Container, Recoverable Items, and Top of Information Store. If you are trying to locate details on a deleted item, look in the Deleted Items folder and the Recoverable Items folder (and its subfolders).

[screenshot]

    7. Double-click Deleted Items to open a window that looks like this one:

[screenshot]

8. Click the item to fill in the lower half of the window with the properties.

9. Locate the PR_ENTRYID property and double-click it.

[screenshot]

    10. The Binary box contains the value of the PR_ENTRYID field that you can use to search the EventHistory table in the Store.  If you locate this value with MFCMAPI first you can use it to limit the search as I described above.  If you don’t have this value you can pull the full history and use the ItemEntryIDs as a basis to search MFCMAPI.

    Thanks to Jesse Tedoff for the idea!

    Chris Pollitt

  • Log Parser Studio 2.0 is now available

    Since the initial release of Log Parser Studio (LPS) there have been over 30,000 downloads and thousands of customers use the tool on a daily basis. In Exchange support many of our engineers use the tool to solve real world issues every day and in turn share with our customers, empowering them to solve the same issues themselves moving forward. LPS is still an active work in progress; based on both engineer and customer feedback many improvements have been made with multiple features added during the last year. Below is a short list of new features:

    Improved import/export functionality

For those who create their own queries, this is a real time-saver. You can now import from multiple XML files or query libraries simultaneously, choosing only the queries you wish to import.

    Search Query Results

The existing feature that allows searching of queries in the library is now context-aware: if you have a completed query in the query window, the search option searches that query; if you are in the library, it searches the library, and so on. This allows drilling down into existing query results without having to run a new query when all you want to do is narrow down existing result sets.

    Input/Output Format Support

All LP 2.2 input and output formats have preliminary support in LPS. Each format has its own property window containing all known LP 2.2 settings, which can be modified to your liking.

    Exchange Extensible Logging Support

Custom parser support was added for almost all Exchange logs. These are covered by the EEL and EELX log formats included in LPS, which cover Exchange logs from Exchange 2003 through Exchange 2013.

    Query Logging

I can't tell you how many times I or another engineer spent a long time crafting the perfect query for a particular issue we were troubleshooting, only to forget to save the query in the heat of the moment and lose all that work. No longer! LPS now has the capability to log every query that is executed to a text file (Query.log). What makes this so valuable is that if you ran it, you can retrieve it.

    Queries

    There are now over 170 queries in the library including new sample queries for Exchange 2013.

[screenshot]

[screenshot]

    PowerShell Export

You can now export any query as a standalone PowerShell script. The only requirement, of course, is that Log Parser 2.2 is installed on the machine you run the script on; LPS itself is not required. There are some limitations, but you can essentially use LPS as a query editor/test bed for PowerShell scripts that run Log Parser queries for you!

[screenshot]

    Query Cancellation

The ability to submit a cancellation request for a running query has been added, which will stop the query in many cases.

    Keyboard Shortcuts

There are now 23 keyboard shortcuts. Be sure to check these out, as they will save you lots of time. To display the shortcuts, use CTRL+K or Help > Keyboard Shortcuts.

There are literally hundreds of improvements and features, far too many to list here, so be sure to check out our blog series with existing and upcoming tutorials, deep dives and more. If you are installing LPS for the first time, you'll surely want to review the getting started series:

    If you are already familiar with LPS and are installing this latest version, you'll want to check out the upgrade blog post here:

    Additional LPS articles can be found here:

    http://blogs.technet.com/b/karywa/

LPS doesn't require an install, so just extract it to the folder of your choice and run LPS.EXE. If you have the previous version of LPS and have added your own custom queries to the library, be sure to export those queries as a backup before running the newest version. See the "Upgrading to LPS V2" blog post above when upgrading.

    Kary Wall

  • What Did Managed Availability Just Do To This Service?

    We in the Exchange product group get this question from time to time. The first thing we ask in response is always, “What was the customer impact?” In some cases, there is customer impact; these may indicate bugs that we are motivated to fix. However, in most cases there was no customer impact: a service restarted, but no one noticed. We have learned while operating the world’s largest Exchange deployment that it is fantastic when something is fixed before customers even notice. This is so desirable that we are willing to have a few extra service restarts as long as no customers are impacted.

    You can see this same philosophy at work in our approach to database failovers since Exchange 2007. The mantra we have come to repeat is, “Stuff breaks, but the user experience doesn’t!” User experience is our number one priority at all times. Individual service uptime on a server is a less important goal, as long as the user experience remains satisfactory.

However, there are cases where Managed Availability cannot fix the problem. In cases like these, Exchange provides a huge amount of information about what the problem might be. Hundreds of things are checked and tested every minute. Usually, Get-HealthReport and Get-ServerHealth will be sufficient to find the problem, but this blog post will walk you through getting the full details, from an automatic recovery action down to the results of all the Probes, by:

    1. Finding the Managed Availability Recovery Actions that have been executed for a given service.
    2. Determining the Monitor that triggered the Responder.
    3. Retrieving the Probes that the Monitor uses.
    4. Viewing any error messages from the Probes.

    Finding Recovery Actions

Every time Managed Availability takes a recovery action, such as restarting a service or failing over a database, it logs an event in the Microsoft-Exchange-ManagedAvailability/RecoveryActionResults crimson channel. Event 500 indicates that a recovery action has begun. Event 501 indicates that the action has completed. These can be collected via the MMC Event Viewer, but we usually find it more useful to use PowerShell. All of these Managed Availability recovery actions can be collected in PowerShell with a simple command:

$RecoveryActionResultsEvents = Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults

    We can use the events in this format, but it is easier to work with the event properties if we use PowerShell’s native XML format:

    $RecoveryActionResultsXML = ($RecoveryActionResultsEvents | Foreach-object -Process {[XML]$_.toXml()}).event.userData.eventXml

    Some of the useful properties for this Recovery Action event are:

    • Id: The action that was taken. Common values are RestartService, RecycleAppPool, ComponentOffline, or ServerFailover.
    • State: Whether the action has started (event 500) or finished (event 501).
    • ResourceName: The object that was affected by the action. This will be the name of a service for RestartService actions, or the name of a server for server-level actions.
    • EndTime: The time the action completed.
    • Result: Whether the action succeeded or not.
    • RequestorName: The name of the Responder that took the action.
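With the events in XML form, a quick frequency count of finished actions often points straight at the noisiest Responder. A sketch building on the variable above:

```powershell
# Sketch: which Responders have taken the most recovery actions on this server?
$RecoveryActionResultsXML |
    Where-Object { $_.State -eq "Finished" } |
    Group-Object RequestorName |
    Sort-Object Count -Descending |
    Format-Table Count, Name -AutoSize
```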

    So for example, if you wanted to know why MSExchangeRepl was restarted on your server around 9:30PM, you could run a command like this:

$RecoveryActionResultsXML | Where-Object {$_.State -eq "Finished" -and $_.ResourceName -eq "MSExchangeRepl" -and $_.EndTime -like "2013-06-12T21*"} | ft -AutoSize StartTime,RequestorName

    This results in the following output:

StartTime                    RequestorName
---------                    -------------
2013-06-12T21:49:18.2113618Z ServiceHealthMSExchangeReplEndpointRestart

    The RequestorName property indicates the name of the Responder that took the action. In this case, it was ServiceHealthMSExchangeReplEndpointRestart. Often, the responder name will give you an indication of the problem. Other times, you will want more details.

    Finding the Monitor that Triggers a Responder

    Monitors are the central part of Managed Availability. They are the primary means, through Get-ServerHealth and Get-HealthReport, by which an administrator can learn the health of a server. Recall that a Health Set is a grouping of related Monitors. This is why much of our troubleshooting documentation is focused on these objects. It will often be useful to know what Monitors and Health Sets are repeatedly unhealthy in your environment.

Every time the Health Manager service starts, it logs events to the Microsoft-Exchange-ActiveMonitoring/ResponderDefinition crimson channel, which we can use to get the properties of the Responders we found in the last step by the RequestorName property. First, we need to collect the Responders that are defined:

$DefinedResponders = (Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/ResponderDefinition | % {[xml]$_.toXml()}).event.userData.eventXml

    One of these Responder Definitions will match the Recovery Action’s RequestorName. The Monitor that controls the Responder we are interested in is defined by the AlertMask property of that Definition. Here are some of the useful Responder Definition properties:

    • TypeName: The full code name of the recovery action that will be taken when this Responder executes.
    • Name: The name of the Responder.
    • TargetResource: The object this Responder will act on.
    • AlertMask: The Monitor for this Responder.
    • WaitIntervalSeconds: The minimum amount of time to wait before this Responder can be executed again. There are other forms of throttling that will also affect this Responder.

    To get the Monitor for the ServiceHealthMSExchangeReplEndpointRestart Responder, you run:

$DefinedResponders | ? {$_.Name -eq "ServiceHealthMSExchangeReplEndpointRestart"} | ft -a Name,AlertMask

    This results in the following output:

Name                                       AlertMask
----                                       ---------
ServiceHealthMSExchangeReplEndpointRestart ServiceHealthMSExchangeReplEndpointMonitor

Many Monitor names will give you an idea of what to look for. In this case, the ServiceHealthMSExchangeReplEndpointMonitor Monitor does not tell you much more than the Responder name did. The TechNet article on troubleshooting the DataProtection Health Set lists this Monitor and suggests running Test-ReplicationHealth. However, you can also get the exact error messages of the Probes for this Monitor with a couple more commands.

    Finding the Probes for a Monitor

Remember that Monitors have their definitions written to the Microsoft-Exchange-ActiveMonitoring/MonitorDefinition crimson channel. Thus, you can get these in a similar way as the Responder definitions in the last step. You can run:

$DefinedMonitors = (Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/MonitorDefinition | % {[xml]$_.toXml()}).event.userData.eventXml

    Some useful properties of a Monitor definition are:

    • Name: The name of this Monitor. This is the same name reported by Get-ServerHealth.
    • ServiceName: The name of the Health Set for this Monitor.
    • SampleMask: The substring that all Probes for this Monitor will have in their names.
    • IsHaImpacting: Whether this Monitor should be included when HaImpactingOnly is specified by Get-ServerHealth or Get-HealthReport.

    To get the SampleMask for the identified Monitor, you can run:

($DefinedMonitors | ? {$_.Name -eq 'ServiceHealthMSExchangeReplEndpointMonitor'}).SampleMask

    This results in the following output:

    ServiceHealthMSExchangeReplEndpointProbe


    Now that we know what Probes to look for, we can search the Probes’ definition channel. Useful properties for Probe Definitions are:

    • Name: The name of the Probe. This will begin with the SampleMask of the Probe’s Monitor.
    • ServiceName: The Health Set for this Probe.
    • TargetResource: The object this Probe is validating. This is appended to the name of the Probe when it is executed to become a Probe Result's ResultName.
    • RecurrenceIntervalSeconds: How often this Probe executes.
    • TimeoutSeconds: How long this Probe should wait before failing.

    To get definitions of this Monitor’s Probes, you can run:

(Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/ProbeDefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ? {$_.Name -like "ServiceHealthMSExchangeReplEndpointProbe*"} | ft -a Name, TargetResource

    This results in the following output:

Name                                                   TargetResource
----                                                   --------------
ServiceHealthMSExchangeReplEndpointProbe/ServerLocator MSExchangeRepl
ServiceHealthMSExchangeReplEndpointProbe/RPC           MSExchangeRepl
ServiceHealthMSExchangeReplEndpointProbe/TCP           MSExchangeRepl

    Remember, not all Monitors use synthetic transactions via Probes. See this blog post for the other ways Monitors collect their information.

This Monitor has three Probes that can cause it to become Unhealthy. You'll see that each Probe's name begins with the Monitor's SampleMask and is then differentiated. When getting the Probe Results in the next step, the Probes will also have the TargetResource in their ResultName.

Now we know all the Probes that could have failed, but we don't yet know which did, or why.

    Getting Probe Error Messages

There are many Probes and they execute often, so the channel where they are logged (Microsoft-Exchange-ActiveMonitoring/ProbeResult) generates a lot of data. There will often be only a few hours of data, but the Probes we are interested in will probably have a few hundred Result entries. Here are some of the Probe Result properties you may be interested in for troubleshooting:

    • ServiceName: The Health Set of this Probe.
    • ResultName: The Name of this Probe, including the Monitor’s SampleMask, an identifier of the code this Probe executes, and the resource it verifies. The target resource is appended to the Probe’s name we found in the previous step. In this example, we append /MSExchangeRepl to get ServiceHealthMSExchangeReplEndpointProbe/RPC/MSExchangeRepl.
    • Error: The error returned by this Probe, if it failed.
    • Exception: The callstack of the error, if it failed.
    • ResultType: An integer that indicates one of these values:
      • 1: Timeout
      • 2: Poisoned
      • 3: Succeeded
      • 4: Failed
      • 5: Quarantined
      • 6: Rejected
    • ExecutionStartTime: When the Probe started.
    • ExecutionEndTime: When the Probe completed.
    • ExecutionContext: Additional information about the Probe’s execution.
    • FailureContext: Additional information about the Probe’s failure.

    Some Probes may use some of the other available fields to provide additional data about failures.

    We can use XPath to filter the large number of events down to just the ones we are interested in: those with the ResultName we identified in the last step and a ResultType of 4, indicating that they failed:

    $replEndpointProbeResults = (Get-WinEvent -ComputerName <Server> -LogName Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath "*[UserData[EventXML[ResultName='ServiceHealthMSExchangeReplEndpointProbe/RPC/MSExchangeRepl'][ResultType='4']]]" | % {[XML]$_.ToXml()}).event.userData.eventXml

    To get a nice graphical view of the Probe’s errors, you can run:

    $replEndpointProbeResults | select -Property *Time,Result*,Error*,*Context,State* | Out-GridView

    image

    In this case, the full error message for both Probe Results suggests making sure the MSExchangeRepl service is running. This is indeed the problem: to create this scenario, I restarted the service manually.
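    Since ResultType is logged as an integer, it can help to translate it to a friendly name when reviewing results. Here is a minimal sketch, using the $replEndpointProbeResults variable captured above and the value list from earlier in this post:

    ```powershell
    # Map the ResultType integers to friendly names for readability
    $resultTypeNames = @{
        1 = 'Timeout';     2 = 'Poisoned';    3 = 'Succeeded'
        4 = 'Failed';      5 = 'Quarantined'; 6 = 'Rejected'
    }

    $replEndpointProbeResults |
        Select-Object ExecutionStartTime, Error,
            @{Name = 'Result'; Expression = { $resultTypeNames[[int]$_.ResultType] }} |
        Out-GridView
    ```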

    Summary

    This article took a detailed look at the incredible amount of information available about the health of your Exchange servers.  Hopefully, you will not often need it! In most cases, the alerts will provide enough notification and the included cmdlets will be sufficient for investigation.

    Managed Availability is built and hardened at scale, and we continuously analyze the same events described in this article so that we can either fix root causes or write Responders that fix more problems before users are impacted. In those cases where you do need to investigate a problem in detail, we hope this post is a good starting point.

    Abram Jackson

  • Exchange Server 2013 Architecture Poster PDF Download Available

    ExchangePoster_Final_052313

    We just released a downloadable PDF version of the Exchange Server 2013 Architecture Poster. This is the poster that we handed out at the Office booth and in various Exchange 2013 breakout sessions last week at TechEd North America 2013 in New Orleans, LA.  We’ll also be handing out printed copies of the poster at TechEd Europe 2013 in Madrid, Spain in a couple of weeks.

    While we cannot provide printed copies for everyone, you can download the PDF file and take it to your favorite printer/copy center, and have them print it for you.  It is designed to be printed in 36” x 24” format.

    This poster highlights the significantly updated and modernized architecture in Exchange 2013, as well as new technologies such as Managed Availability, the new storage and high availability features, and integration with SharePoint and Lync.  In addition, it illustrates the new transport architecture in Exchange 2013.

    A zoom.it version of the poster can be found at http://zoom.it/BuoF.

    We welcome your feedback on the poster.  If you have any, please feel free to send it to  eapf@microsoft.com.

    Scott Schnoll

  • Outlook Connectivity to Office 365 Troubleshooter Now Available

    It is no secret that if you are an Exchange/Office 365 administrator you will no doubt have to troubleshoot Outlook connectivity at some point. Whether you use Exchange Online, on-premises, or some combination of both, you will inevitably have an issue with Outlook performance, connectivity, profile corruption, or some other unknown Outlook disease before retirement.

    To assist you with these issues, we have released a Guided Walk Through (GWT) for troubleshooting Outlook Connectivity issues in Office 365.  There are a couple of ways to access the troubleshooter.  You can access it directly at: http://aka.ms/outlookconnectivity

    In addition, it will be embedded in various Outlook connectivity technical resources such as the following:

    The purpose of this walk through is to assist you in resolving these complex issues by focusing on the scoping and steps used to isolate and resolve problems. Therefore the walk through starts by focusing on commonly encountered symptoms related to Outlook connectivity.

    image

    Consider that there might not be a single solution, but a combination of factors contributing to the problem. Following the walk through will allow you to isolate and remedy the most common causes of Outlook connectivity issues to Office 365.

    This walk through is not meant to replace all of the data that helps you understand Outlook connectivity issues, but rather to quickly give you the steps you need to find a solution. The walk through covers all versions of Office 365.

    I wanted to thank the people who helped make this a reality. Here are the parties involved (that I am aware of):

    Exchange/Outlook support:

    • Kevyn Pietsch
    • Timothy Heeney
    • Nitin Shukla
    • Nagesh Mahadev
    • Jeff Miller
    • Jon Bradley
    • Jeremy Hayes

    Documentation / content creation teams:

    • Charlotte Raymundo
    • Serdar Soysal
    • Geoffrey Crisp
    • Star Li
    • Chen Jiang

    Nagesh Mahadev

  • Per-Server Database Limits Explained

    Over the past year, we have discussed the architectural changes that have been introduced in Exchange Server 2013. I wrote about the reduction in complexity that the new server role architecture introduces, as well as one of the new capabilities introduced in Exchange 2013: Managed Availability’s recovery-oriented computing. However, we haven’t been clear on other architectural changes that have shaped decisions we’ve made about the Exchange 2013 product. One example is the decision to reduce the number of databases supported per server from 100 to 50. There were three main reasons for this:

    1. Server architecture changes
    2. Use of commodity hardware
    3. Testing

    Let me explain each of these in more detail.

    Server Architecture

    Exchange 2013 includes fundamental changes to the search and store components, and to how data is processed and rendered.

    The old content indexing service was replaced with Search Foundation. Search Foundation is an actively developed search platform that is used across the Office Server products. Search Foundation allows us to have notification-driven content indexing, which improves indexing performance; in addition, messages are now annotated during transport, significantly reducing the number of times a message must be indexed.

    The monolithic store.exe process was re-written; the store is now written in managed code, and at least three processes make up the Information Store service: the Microsoft Exchange Replication service, the Information Store service process controller, and the Information Store worker process. By utilizing the worker process model, each database is now isolated from every other database (e.g., a database crashing due to a malformed message will not bring down the rest of the databases on the server).

    In addition, there is a core shift in the server role architecture such that the protocol responsible for servicing a user’s request is the protocol instance that is local to the user’s active mailbox database copy. This means that the Mailbox server role now performs more work when compared to its Exchange 2010 counterpart.

    The end result is that with the server architecture changes we introduced in Exchange 2013, search, store, and the protocols typically can be CPU and memory bound, as opposed to disk IO or capacity bound.

    Commodity Hardware

    As discussed in our server sizing guidance, we are big fans of commodity server hardware. Office 365 is designed to run on commodity hardware that leverages 2 processor sockets and 12 disks – we do not leverage external storage chassis as this increases the operational complexity in the environment. Our Exchange 2013 Mailbox servers have less than 50 database copies per-server in Office 365.

    Testing

    The last reason as to why we limited support to 50 databases per-server is that we did not have actual deployments at any scale to validate that store, search, the protocols, and Managed Availability could handle 100 databases per-server. Automation and lab testing can only take you so far; the lack of real world usage was one of the key reasons why we chose to limit the database count.

    Moving Forward

    The Exchange Product Group takes pride in the feedback mechanisms we have invested in with the Exchange community. Since the release of Exchange 2013, we’ve received an inordinate amount of feedback regarding the reduction in supported databases per-server. The driving response has been “we currently deploy more than 50 databases per-server in Exchange 2010; with this change, this means we will need to deploy more servers, which increases our capital expenditures significantly.” Rest assured, that is not the message we want with Exchange 2013. It is true that Exchange 2013 utilizes more CPU and memory than its predecessors – this is due to the architecture changes we’ve made, as well as the changes we’ve made to reduce disk IO, so that you can deploy more mailboxes per disk. But we do not want to see architectures artificially limited by the supported databases per-server constraint.

    Over the last several months, we’ve been working to resolve our concerns and improve our test matrices to validate support for more databases per server.

    As a result of the work done by the Mailbox Intelligence team and Operations teams, I am pleased to announce that when Exchange Server 2013 RTM Cumulative Update 2 (CU2) releases we are increasing the number of databases per-server back to 100. Both the Exchange 2013 Server Role Calculator and our sizing guidance will be updated to include this architectural change in tandem with CU2’s release.  CU2 will release later this summer.

    As always, we continue to identify ways to better serve your needs through our regular servicing releases. We hope you find this architectural change useful. Please keep the feedback coming, we are listening.

    Ross Smith IV
    Principal Program Manager
    Exchange Customer Experience

  • The Hybrid Free Busy Troubleshooter Now Available

    As customers move their organization into the Cloud, or choose to coexist, there is a need to ensure that the basic functionality users have grown accustomed to continues to work. While some of you will move all of your users in a cutover fashion, which reduces complexity, others will choose a more gradual approach. This troubleshooter is for administrators who have chosen the hybrid approach.

    Are you seeing the hash marks in your hybrid Exchange environment as depicted below and want to get rid of them? Then this troubleshooter is for you.

    image

    The reason we focused on a troubleshooter for Free Busy is because it is the most commonly used “feature set” in a hybrid deployment. If you were to resolve issues with Free Busy lookups, many of the other potential issues you have with your hybrid deployment would be resolved as well.

    What is a Hybrid Deployment?

    A Hybrid Deployment consists of an on-premises Exchange server environment that has at least one Exchange 2010 or Exchange 2013 server. In this environment there is also a DirSync (Directory Synchronization) server, and in many cases, a deployment of ADFS (Active Directory Federation Services) to provide single sign-on capabilities to the users.

    The idea of the hybrid environment is to allow two separate organizations (Exchange Online and Exchange On-Premises) to feel like one organization. To accomplish this, we rely on a token authorization process that is made possible through a combination of Organization Relationships and Federation Trusts with the Microsoft Federation Gateway.

    When this is configured properly, you can do basic things like redirect OWA requests to their proper destination, see “MailTips” for a user, and of course the most common feature, view availability information for another user cross-premises.

    To read more about Hybrid Deployments click here.

    This sounds hard to configure. How can I avoid issues?

    If you are the type that does not like running into issues, you can attempt to avoid them: all you have to do is deploy using the Hybrid Configuration Wizard and the Exchange Deployment Assistant. These tools have been designed to get you into an optimal hybrid configuration, which should limit the number of issues you face. However, with all of the moving parts involved, and the numerous variants in on-premises deployments, you could still run into issues.

    You may ask, “Why do I need a troubleshooter? I use Bing or I get Scroogled.”

    When working with customers and engineers, we have found that the troubleshooting steps that need to be followed are not very clear. There is confusion on what steps are applicable when free busy works in one direction (Cloud to on-premises), but not in the other (on-premises to Cloud). While searching Bing for answers can definitely lead to a solution, we believe we can be more expedient by using the troubleshooter to target solutions at your specific symptom.

    The troubleshooter can be found here or at the following simple URL: http://aka.ms/hybridfreebusy

    Thanks to Charlotte Raymundo, Nagesh Mahadev, Edgar Quevedo, John Chappelle, Geoffrey Crisp, Star Li and Chen Jiang for their help in creation and review of this troubleshooter.

    Timothy Heeney

  • Exchange at TechEd North America 2013

    TechEd North America 2013 happens next week in New Orleans, Louisiana. This year, there are several Exchange and Office 365 break-out sessions and hands-on labs for IT pros and developers, including sessions on Exchange 2013 high availability, virtualization, hybrid deployments, managed availability, retention, archiving & eDiscovery, DLP, site mailboxes, modern public folders, transport, unified messaging, Outlook Web App, EWS, and more!

    Recorded sessions are now available on Channel 9. Use the links below to view a session, or head over to TechEd North America 2013 on Channel 9 for more, including the keynote presentation by Brad Anderson.

    Monday, June 3, 2013
    1:15 PM-2:30 PM OUC-B206 - A Look Inside Microsoft Office 365 Alistair Speirs
    OUC-B215 - Understanding Compliance in Microsoft Exchange, SharePoint, and Office Bharat Suneja
    3:00 PM-4:15 PM SES-B205 - Overview of eDiscovery across the Microsoft Office Platform Georgiana Badea
    OUC-B202 - Choosing the Right Cloud Service Alexander Bradley & Danny Burlage
    OUC-B315 - Microsoft Exchange Server 2013 Managed Availability Ross Smith IV
    3:00 PM-4:15 PM OUC-B313 - Microsoft Exchange Server 2013 Client Access Server Role Greg Taylor
    4:45 PM-6:00 PM OUC-B334 - Migration and Coexistence with Microsoft Lync Server 2013 Justin Morris
    Tuesday, June 4, 2013
    8:30 AM-9:45 AM OUC-B211 - Overview of Microsoft Office 365 Identity Management Paul Andrew
    OUC-B314 - Microsoft Exchange Server 2013 High Availability and Site Resilience Scott Schnoll
    OUC-B327 - Microsoft Lync Hybrid Scenarios Abi Maggu
    10:15 AM-11:30 AM OUC-B203 - Collaborating with the New Microsoft Office Web Apps Amanda Lefebvre, Nick Simons & Dan Zarzar
    OUC-B305 - Enterprise Network Requirements for Microsoft Lync Server 2013 Bryan Nyce
    OUC-B317 - Microsoft Exchange Server 2013 Sizing Jeff Mealiffe
    1:30 PM-2:45 PM OUC-B201 - Become a Microsoft Office Ninja in 60 Minutes Tal Kryzpow
    OUC-B319 - Microsoft Exchange Server 2013 Transport Architecture Ross Smith IV
    OUC-B333 - Lap Around the Microsoft Lync 2013 Developer Platform Girija Bhagavatula & Albert Kooiman
    3:15 PM-4:30 PM OUC-B208 - Deploying Microsoft Office? Begin Here! Jill Maguire & Curtis Sawin
    OUC-B304 - Developing Mobile Apps with Microsoft Exchange Web Services Paul Robichaux
    OUC-B324 - Planning and Deploying Your Enterprise Voice Geoff Clark
    OUC-B326 - Virtualization in Microsoft Exchange Server 2013 Jeff Mealiffe
    5:00 PM-6:15 PM OUC-B217 - Microsoft Office 365 Pro Plus Adoption and Change Management Brent Whichel
    OUC-B307 - Get Moving with Your Mailbox! Jaap Wesselius
    OUC-B332 - Planning and Deploying Conferencing in Microsoft Lync Server 2013 Scott Johnson & Andrew Sniderman
    OUC-B316 - Microsoft Exchange Server 2013 On-Premises Upgrade and Coexistence Robert Gillies
    Wednesday June 5, 2013
    8:30 AM-9:45 AM OUC-B209 - Microsoft Office 365 for Education: Overview and Upgrades Jim Lucey
    OUC-B311 - Microsoft Exchange Hybrid Deployment and Migration On Your Terms Neil Axelrod
    OUC-B405 - Deep Dive into New Unified Communications Web API of Lync 2013 Girija Bhagavatula & Albert Kooiman
    10:15 AM-11:30 AM OUC-B328 - Planning and Deployment for Edge Server with Microsoft Lync Server 2013 Bryan Nyce
    OUC-B205 - Security in Microsoft Office 365 Paul Andrew & Andy O'Donald
    OUC-B310 - Microsoft Exchange Archiving Policy: Move, Delete, or Hold Dheepak Ramaswamy
    OUC-B210 - Team Collaboration with Site Mailboxes Alfons Staerk
    1:30 PM-2:45 PM OUC-B214 - The Deep Dark Secrets of Unified Messaging J. Peter Bruzzese
    OUC-B331 - Voice Interoperability Fundamentals Francois Doremieux & Scott Johnson
    OUC-B218 - Understanding Immersive Productivity and Collaboration Experiences with Perceptive Pixel Devices Tim Bakke
    3:15 PM-4:30 PM OUC-B322 - Using Windows PowerShell Magic to Manage Microsoft Office 365 Danny Burlage
    OUC-B330 - Mobile Devices Deep Dive with Microsoft Lync Server 2013 Geoff Clark
    OUC-B318 - Microsoft Exchange Server 2013 Tips & Tricks Scott Schnoll
    5:00 PM-6:15 PM OUC-B212 - Help Small Businesses Seize the Day with Microsoft Office 365 Andy O'Donald
    OUC-B301 - Data Loss Prevention in Microsoft Exchange and Microsoft Outlook 2013 Jack Kabat
    OUC-B320 - Microsoft System Center Advisor and System Center 2012 - Operations Manager: Better Together Nick Rosenfeld
    Thursday June 6, 2013
    8:30 AM-9:45 AM OUC-B02 - Deploying and Updating Microsoft Office 365 ProPlus with Click-to-Run Daniel H. Brown & Jeremy Chapman
    OUC-B312 - Microsoft Exchange in the Cloud: Scared of Losing Your Job? Jaap Wesselius
    OUC-B401 - Microsoft Lync Server 2013 Dial Plan and Voice Routing Deep Dive Geoff Clark & Bryan Nyce
    10:15 AM-11:30 AM OUC-B335 - Scripting and Automation for Microsoft Lync Kevin Peters
    OUC-B308 - Internals of Deploying the In-Place Archive: Online, On-Premises, or Hybrid Dheepak Ramaswamy & Bharat Suneja
    OUC-B207 - The New Outlook Web App: Designed for Touch and Offline Too! Kip Fern & Paul Limont
    1:00 PM-2:15 PM OUC-B216 - Microsoft Office 365 Service Communications Katy Olmstead
    OUC-B204 - Network Design and Deployment Strategies to Ensure Success for Microsoft Lync Server 2013 Enterprise Voice Manfred Arndt
    OUC-B329 - Modern Public Folders Overview, Migration and Microsoft Office 365 Siegfried Jagott
    2:45 PM-4:00 PM OUC-B321 - All about Archiving with Microsoft Lync Server 2013 Jason Collier
    OUC-B222 - Introducing Lync Room System David Groom
    OUC-B306 - Exchange Online Protection Wendy Wilkes
    OUC-B341 - Microsoft Office 365 Directory and Access Management with Windows Azure Active Directory Ross Adams, Paul Andrew & Jono Luk

    You can use the Schedule Builder on the TechEd web site to select the sessions you want to attend and sync session info with your Outlook calendar (and have the info handy on your mobile device). For more info, head over to the TechEd North America 2013 web site.

    If you’re attending, swing by the Microsoft Office booths to meet Exchange, SharePoint & Office team folks. We’d love to hear from you and answer your Exchange-related questions.

    Microsoft TechEd 2013 TLC Floor map

    Also check out the following posts from our friends in the Office team:

    We look forward to seeing you in New Orleans next week!

    Exchange Team

  • Released: Update Rollup 1 for Exchange Server 2010 SP3

    Update 6/13/13: we added a known issue with transport rules to the blog post below.

    Today the Exchange CXP team released Update Rollup 1 for Exchange Server 2010 SP3 to the Download Center.

    Note: Some of the following KB articles may not be available at the time of publishing this post.

    This update contains fixes for a number of customer-reported and internally found issues. For more details, including a list of fixes included in this update, see KB 2803727. We would like to specifically call out the following fixes which are included in this release:

    • 2561346 Mailbox storage limit error when a delegate uses the manager's mailbox to send an email message in an Exchange Server 2010 environment
    • 2756460 You cannot open a mailbox that is located in a different site by using Outlook Anywhere in an Exchange Server 2010 environment
    • 2802569 Mailbox synchronization fails on an Exchange ActiveSync device in an Exchange Server 2010 environment
    • 2814847 Rapid growth in transaction logs, CPU use, and memory consumption in Exchange Server 2010 when a user syncs a mailbox by using an iOS 6.1 or 6.1.1-based device
    • 2822208 Unable to soft delete some messages after installing Exchange 2010 SP2 RU6 or SP3

    For DST changes, see Daylight Saving Time Help and Support Center (microsoft.com/time).

    A known issue with Exchange 2010 SP3 RU1 Setup

    You cannot install or uninstall Update Rollup 1 for Exchange Server 2010 SP3 on the double-byte character set (DBCS) version of Windows Server 2012 if the language preference for non-Unicode programs is set to the default language. To work around this issue, you must first change this setting. To do this, follow these steps:

    1. In Control Panel, open the Clock, Region and Language item, and then click Region.
    2. Click the Administrative tab.
    3. In the Language for non-Unicode programs area, click Change system locale.
    4. On the Current system locale list, click English (United States), and then click OK.

    After you successfully install or uninstall Update Rollup 1, revert this language setting, as appropriate.
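    On Windows Server 2012, the same locale change can also be scripted. A minimal sketch using the International module’s Get-WinSystemLocale and Set-WinSystemLocale cmdlets (note that a reboot is required before the change takes effect):

    ```powershell
    # Record the current system locale so it can be restored afterward
    $original = Get-WinSystemLocale

    # Switch the language for non-Unicode programs to English (United States)
    Set-WinSystemLocale -SystemLocale en-US
    Restart-Computer    # the new system locale takes effect after a reboot

    # ...install or uninstall Update Rollup 1, then revert and reboot again:
    Set-WinSystemLocale -SystemLocale $original.Name
    Restart-Computer
    ```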

    We have identified the cause of this problem and plan to resolve it in a future rollup, but did not want to further delay the release of RU1 for customers who are not impacted by it.

    A known issue with transport rules after E2010 SP3 RU1 is installed

    We have identified an issue where messages become stuck in the poison queue and the transport service continually crashes after this rollup is applied.

    We have gathered enough information and have determined the cause.  Specifically, the issue is triggered by a transport rule (disclaimer) attempting to append the disclaimer to the end of HTML-formatted messages.   When this occurs, messages will be placed in the poison queue and the transport service will crash with an exception.  We are investing resources to develop a code fix.  In the meantime, you can either disable or reconfigure the disclaimer transport rule.

    Exchange Team

  • Comparing public folder item counts

    A question that is often asked of Support in regard to legacy Public Folders is whether they're replicating and how much progress they're making.  The most common scenario arises when the administrator is adding a new Public Folder database to the organization and replicating a large amount of data to it.  What commonly happens is that the administrator calls Support and says:

    The database on the old server is 300GB, but the new database is only 150GB!  How can I tell what still needs to be replicated?  Is it still progressing??

    You can raise diagnostic logging for public folders, but reading the events to see which folders are replicating is tedious.  Most administrators want a more detailed way of estimating the progress of replication than comparing file sizes.  They also want to avoid checking all the individual replication events.

    There are a number of ways to monitor replication progress so that one can make an educated guess as to how long a particular environment will take to complete an operation.  In this post, I'm going to provide a detailed example of one approach to estimating the progress of replication by comparing item counts between different public folder stores.

    Getting Public Folder item counts

    To get the item counts in an Exchange 2003 Public folder database you can use PFDAVAdmin.  The process is outlined in this previous EHLO blog post.  For what we're doing below, you'll need the DisplayName, Folderpath and the total number of items in the folder. The rest of the fields aren't necessary.

    To get the item counts on an Exchange 2007 server, use the following (remember, there is only one public folder database per server):

    Get-PublicFolderStatistics -Server <servername> | Export-Csv c:\file1.txt

    To get the item counts on an Exchange 2010 server, you use:

    Get-PublicFolderStatistics -Server <servername> -ResultSize unlimited | Export-Csv c:\file1.txt
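    If you would rather stay in PowerShell than use Access as described later, the two exports can also be joined directly. This is a minimal sketch, assuming the statistics were exported to hypothetical files C:\file1.txt and C:\file2.txt as shown above; the caveats in the next section apply equally here:

    ```powershell
    # Index the first export by folder path for fast lookup
    $sourceCounts = @{}
    Import-Csv C:\file1.txt | ForEach-Object { $sourceCounts[$_.FolderPath] = [int]$_.ItemCount }

    # Walk the second export and report item-count differences for folders in both files
    Import-Csv C:\file2.txt |
        Where-Object { $sourceCounts.ContainsKey($_.FolderPath) } |
        ForEach-Object {
            [PSCustomObject]@{
                FolderPath  = $_.FolderPath
                SourceCount = $sourceCounts[$_.FolderPath]
                TargetCount = [int]$_.ItemCount
                Difference  = $sourceCounts[$_.FolderPath] - [int]$_.ItemCount
            }
        } | Sort-Object Difference -Descending
    ```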

    Comparing item counts

    There are some very important caveats to this whole procedure.  The things you need to watch out for are:

    • We're only checking item counts.  If you delete 10 items and add 10 items between executions of the statistics gathering, this type of query will not reveal whether they have replicated.  Therefore, having the same number on both sides is not necessarily an assurance that the folders are in sync.
    • If you're comparing folders that contain recurring meetings, the item counts can be different on Exchange 2007 and older because of the way WebDAV interacts with those items.
    • I've seen many administrators try to compare the size of one Public Folder database to the size of another.  Such an approach does not take into account space for deleted items, overhead and unused space.  Checking item counts is more reliable than simply comparing database sizes.
    • The two databases might be at very different stages of processing replication messages.  It is unlikely that both databases will present the same item counts if the folders are continuously active.  Even if the folders are seeing relatively low activity, it's not uncommon for the item count to be off by one or two items because the replication cycle (which defaults to every 15 minutes) simply hasn’t gotten to the latest post.
    • If you really want to know whether two replicas are in sync, try to remove one.  If Exchange lets you remove the instance, you know Exchange believes the folders are in sync.  If Exchange cannot confirm the folders are in sync, it will keep the instance until it can complete the backfill from it.  In most cases, the administrators I have spoken with are not in a position to use this approach.

    For the actual comparison you can use any number of products.  For this blog, I have chosen Microsoft Access to demonstrate the process of comparing the CSV files from the different servers.  There are some limitations to my approach:

    • Access databases have a maximum file size of 2GB. If your public folder infrastructure is particularly large (e.g., your CSV files are over 500MB), you may have to switch to using Microsoft SQL Server.
    • I am not going to compare public folders with a folder path greater than 254 characters, because the Jet database engine that ships with Access cannot join memo fields in a query.  Working around the join limitation by splitting the path across multiple text fields is beyond the scope of this blog.
    • I am only going to look at folders that exist in both CSV files.   If a folder's instance has not been created, and its data exported into the CSV file, the folder will not be listed.

    An outline of the process is:

    1. Export the item counts from the two servers you wish to compare
    2. Import the resulting text files
    3. Clean up the data for the final query
    4. Run a query that lists the item counts for all folders present in both files, along with the difference in item counts between the originally imported files

    Assumptions for the steps below:

    • You have exported the public folder statistics with the PowerShell commands presented above
    • You have fields named FolderPath, ItemCount and Name in the CSV file

    If your file is different than expected, you will have to modify the steps as you go along.

    Here are the steps for conducting the comparison:

    1. Create a new blank Microsoft Access database in a location that has more than double the size of your CSV files available as free space.

    2. By default, the Export-Csv cmdlet includes the .NET type information in the first line of the CSV output. Because this line will interfere with the import, we'll need to remove it.  Open each CSV file in notepad (this can take a while for larger files) and remove the line highlighted below.  In this example the line starting with “AdminDisplayName” would become the topmost line of the file.  Once the top line is deleted close and save the file.

    image
    Figure 1

    TIP You can avoid this step by including the -NoTypeInformation switch when using the Export-CSV cmdlet, which filters out the .NET object type information from the CSV output. For details, see Using the Export-Csv cmdlet on TechNet. (Thanks to #MSExchange MVP @SteveGoodman for the tip!)
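    For example, the Exchange 2010 export shown earlier could be rewritten to skip the cleanup step entirely, selecting only the fields the comparison needs (a sketch; adjust the path and server name for your environment):

    ```powershell
    Get-PublicFolderStatistics -Server <servername> -ResultSize Unlimited |
        Select-Object Name, FolderPath, ItemCount |
        Export-Csv C:\file1.txt -NoTypeInformation
    ```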

    3. Import the CSV file to a new table:

    • Click on the External Data tab as highlighted in Figure 2
    • Browse to the CSV file and select it (or type in its path and name directly)
    • Make sure the “Import the source data into a new table in the current database” option is selected
    • Click OK

    image
    Figure 2

    4. In the wizard that starts specify the file is delimited as shown and then click Next.

    image
    Figure 3

    5. Tell the wizard that the text qualifier is the double quote (character 34 in ASCII), the delimiter is the comma and that the “First Row Contains Field Names” as shown in Figure 4.

    Note:  It is possible that you will receive a warning when you click “First Row Contains Field Names”.  If any of the field names violate the rules for a field name Access will display a warning.  Don’t panic.  Access will replace the non-conforming names with ones it considers appropriate (typically Field1, Field2, etc.).  You can change the names if you wish on the Advanced screen.

    image
    Figure 4

    6. Switch to Advanced view (click the Advanced button highlighted in Figure 4) so that we can change the data type of the FolderPath field.  In Access 2010 and older, the data type needs to be changed from Text to Memo; in Access 2013, from Short Text to Long Text.  While in this window, you also have the option to exclude columns that are not needed by placing a checkmark in the Skip column.  In this blog we are only going to use the FolderPath, Name and ItemCount fields.  You can also exclude fields earlier in the process by specifying which fields to include when you run Export-Csv.  The following screenshots show the Advanced properties window.

    image
    Figure 5a: Access 2010 and older

    image
    Figure 5b: Access 2013

    Note:  If you think you will be doing this frequently you can use the Save As button to save your settings.  The settings will be saved inside the Access database and can then be selected during future imports by clicking on the Specs button.

    7. Click OK on the Advanced dialog and then click Finish in the wizard.

    8. When prompted to save the Import steps click Close.  If you think you will be repeating this process in the future feel free to explore saving the import steps.

    9. Access will import the data into a table.  By default, the table will have the same name as the source CSV file; the files used in creating this blog were called 2007PF_120301 and 2010PF_120301.  If there are any import errors, they will be saved in a separate table.  Take a moment to examine what they are.  The most common is a truncated field; if that field is the FolderPath, it will affect the comparisons later.  If there are other problems, you will have to troubleshoot what is wrong with the highlighted lines (typically there should be no import errors as long as FolderPath is set as a Memo field).

    10. Go back to Step 2 to import the second file that will be used in the comparison. 

    11. Now a query must be run to determine whether any FolderPath exceeds 255 characters.  Fields longer than 255 characters cannot be used for a join in an Access query, so if any values in this field exceed 255 characters we will need to exclude them from the comparison.  Additional work to split a long path across multiple fields can be done, but that is left as an exercise for any Access-savvy readers. 

    12. To get started, select the options highlighted in yellow in Figure 6:

    image
    Figure 6

    13. Highlight the table where we want to check the length of the FolderPath field as shown in Figure 7.  Once you have selected the table, click Add and then Close:

    image
    Figure 7

    14. Switch to SQL view as shown in Figure 8:

    image
    Figure 8

    15. Replace the default SELECT statement with one that looks like this (please make sure you substitute your own table name for the one that I have bolded in the example):

    SELECT Len([FolderPath]) AS Expr1, [2007PF_120301].FolderPath
    FROM 2007PF_120301
    WHERE (((Len([FolderPath]))>254));

    Note:  Be sure the semi-colon is the last character in the statement.

    16. Run the query using the red “!” as shown in Figure 9: 

    image
    Figure 9

    image
    Figure 10

    17. If the result is a single empty row (as shown in Figure 10) then skip down to step 19.  If the result is at least one row then go back to SQL view (as shown in Figure 8) and change the statement to look like this one (as before please make sure 2007PF_120301 is replaced with the table name actually being used in your database):

    SELECT [2007PF_120301].FolderPath, [2007PF_120301].ItemCount,
    [2007PF_120301].Name, [2007PF_120301].Identity INTO 2007PF_120301_trimmed
    FROM 2007PF_120301
    WHERE (((Len([FolderPath]))<255));

    18. You will get a prompt like the one in Figure 11 when you run the query.  Select Yes:

    image
    Figure 11

    19. When it is done, repeat steps 11-18 for the other CSV file that was imported for the comparison.  Once steps 11-18 have been completed for both files you will be comparing, advance to step 20.

    20. Originally the FolderPath was imported as a Memo field (Long Text if using Access 2013).  However, we cannot join Memo fields in a query, so we need to convert them to a Text field with a length of 255. 

    If you got a result greater than zero rows in step 16 this step and the subsequent steps will all be carried out on the table specified in the INTO clause of the SQL statement (in this blog that table is named 2007PF_120301_trimmed). 

    If you were able to skip steps 17 and 18 this step and the subsequent steps will be carried out on the table you imported (2007PF_120301 in this example).

    Open the table in Design view by right-clicking on it and selecting Design View as shown in Figure 12.  If you select the wrong tables for the subsequent steps you will get a lot of unwanted duplicates in your final comparison output.

    image
    Figure 12

    21. Change the FolderPath field from Memo to Text as shown in Figure 13.  If you are using Access 2013, change it from Long Text to Short Text.

    image
    Figure13

    22. With the FolderPath field highlighted, look to the lower part of the Design window where the properties of the currently selected field are displayed.  Change the Field Size of FolderPath to 255 characters as shown in Figure 14.

    image
    Figure 14

    23. Save the table and close its design view.  You will be prompted as shown in Figure 15.  Don’t panic.  All the folderpaths should be shorter than the 255 characters specified in the properties of the table.  The dialog is just a standard warning from Access.  No data should be truncated (the earlier queries should have seen to that).  Say Yes and repeat steps 20-23 for the other table being used in this comparison.  If you make a mistake here remember that you will still have your original CSV files and can always fix the mistake by removing the tables and redoing the import.

    image
    Figure 15
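    If you prefer SQL to the Design view, the data type change in steps 20-23 can also be made with an ALTER TABLE statement run from the query SQL view (a sketch using this blog's example table name; substitute your own):

    ```sql
    ALTER TABLE 2007PF_120301_trimmed
    ALTER COLUMN FolderPath TEXT(255);
    ```

    As with the Design view approach, make sure no FolderPath longer than 255 characters remains in the table, or it will be truncated.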

    24. We have been on a bit of a journey to make sure we prepared the tables.  Now for the comparison.  Create a new query (as shown in Figure 6) and highlight both tables that have had the FolderPath shortened to 255 characters as shown in Figure 16.  Once they are highlighted, click Add and then Close.

    image
    Figure 16

    25. Drag FolderPath from the table that is the source of your replication to FolderPath on the other table.  The result will look like Figure 17.

    image
    Figure 17

    26. In the top half of the Query Design window we have the tables with their fields listed.  In the bottom half we have the query grid.  You can make fields appear in the grid in several ways:

    • Switch to SQL view and add them to the SELECT statement
    • Double-click the field in the top half of the window
    • Drag the field from the top half of the window to the grid
    • Click in the Field line of the grid and a drop-down will appear that you can use to select the fields
    • Type the field name you want into the Field line of the grid

    For this step we need to add:

    • One copy of the folderpath field from one table (doesn’t matter which one)
    • The ItemCount field from each table

    27. Go to an empty column in the grid.  We need to enter the expression that will tell us the difference between the two item counts.  Type the following text into the column (be sure to use the table names from your own database and not my example): 

    Expr1:  Abs([2007PF_120301_trimmed].[itemcount]-[2010pf_120301_trimmed].[itemcount])

    Note:  After steps 25-27 the final result should look like  Figure 18.  The equivalent SQL looks like this:

    SELECT [2007PF_120301_trimmed].FolderPath, [2007PF_120301_trimmed].ItemCount, [2010PF_120301_trimmed].ItemCount, Abs([2007PF_120301_TRIMMED].[ItemCount]-[2010PF_120301_TRIMMED].[ItemCount]) AS Expr1
    FROM 2007PF_120301_trimmed INNER JOIN 2010PF_120301_trimmed ON [2007PF_120301_trimmed].FolderPath = [2010PF_120301_trimmed].FolderPath;

    image
    Figure 18

    28. Run the query using the red “!” shown in Figure 9.  The results will show you all the folders that exist in BOTH public folder databases, the ItemCount in each database, and the difference between them.  I like the difference reported as a positive number, but you might prefer to remove the absolute value function.

    There is more that can be done with this.  You can use Access to run a Find Unmatched query to find all items from one table that are not in the other table (thus locating folders that have an instance in one database, but not the other).  You can experiment with different join types in the query, and you can deal with FolderPaths longer than a single Text field can accommodate.  These, and any other additional functionality you desire, are left as an exercise for the reader.  I hope this provides you with a process that can be used to compare the item counts between two Public Folder stores (just remember the caveats at the top of the article).
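    For those who want to try the Find Unmatched idea without the wizard, the query it generates boils down to a LEFT JOIN with an Is Null test.  A sketch using this blog's example table names (substitute your own), which lists folders present in the 2007 database but missing from the 2010 database:

    ```sql
    SELECT [2007PF_120301_trimmed].FolderPath, [2007PF_120301_trimmed].ItemCount
    FROM 2007PF_120301_trimmed LEFT JOIN 2010PF_120301_trimmed
        ON [2007PF_120301_trimmed].FolderPath = [2010PF_120301_trimmed].FolderPath
    WHERE [2010PF_120301_trimmed].FolderPath Is Null;
    ```

    Swap the two table names to find folders that exist only in the other database.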

    Thanks To Bill Long for reviewing my caveats and Oscar Goco for reviewing my steps with Access.

    Chris Pollitt

  • Exchange 2013 Server Role Requirements Calculator Release Notes

    Change log for the Exchange Server 2013 Server Role Requirements calculator.

    Version 6.6

    • Fixed circular logic issue with initial mailbox size calculation

    Version 6.5

    • New Functionality – The calculator now includes mailbox space modeling graphs that extrapolate (for each mailbox tier) the projected amount of time it will take to consume the mailbox quota.
    • Fixed "Number of Exchange Data Volumes per Server" to support more than 50 volumes.
    • Optimized memory sizing for FAST which reduces memory requirements for small mailbox server designs.
    • Added the ability to specify multiple AutoReseed volumes per DAG server to calculator and scripts.
    • Fixed 3 database/volume layout scenario involving 100 copies/server.
    • Fixed rounding error in calculating number of databases/volume in "2 Volumes / Backup Set"
    • Log isolation is now a calculated property to align with best practices guidance.
    • Changed "Disk" to "Vol" in left column of Distribution tab to align with scenarios that do not involve JBOD configurations.
    • Added additional processor core options.
    • Fixed JBOD storage design results table to accurately account for Restore Disk capacity being set to "--" and for differences between PDC and SDC Restore Disk capacity settings.
    • Fixed Backup Requirements worksheet to expose Weekly Full backups correctly.
    • Various comment changes/corrections.

    Version 6.3

    • Fixed Backup Requirements calculations to include greater than 50 databases.
    • Added additional processor core support.
    • Fixed the number of database volumes calculation when disk count is specified.
    • Fixed the database size calculation for A/P scenarios to match A/A scenario calculations.
    • Fixed the calculator to take into account halving database number per volume in non-site resilient scenarios.
    • Fixed conditional formatting errors on transport configuration settings.
    • Fixed transport sizing to take into account mailbox growth.
    • Updated CAS megacycle calculations to align with SP1 guidance.
    • Revised Diskpart.ps1 script to create database mount points consistent with JetStress performance counters.
    • Added Calculator version number to record one, field three, of CSV export files.

    Version 6.1

    • Fixed operator mistake in calcNumActiveDBsSF formula
    • Fixed missed validation scenario where the calculator could recommend a copy count that could not be deployed on the custom disk architecture
    • Optimized remaining servers formula
    • Fixed an issue where single datacenter, multiple databases per volume architecture with lagged copies didn't calculate the correct number of
      copies/volume
    • Fixed VirtCPUFactor reference
    • Various comment changes
    • Improved cell highlights for insufficient disk capacity on storage design tab
    • Added additional storage capacities – 1.2TB, 6TB and 8TB
    • Fixed database count validation logic to take into consideration dedicated lagged copy servers

    Version 5.9

    • DAG script fix

    Version 5.8

    • Fixed VBA error "The object invoked is disconnected from its client" error when recalculating Distribution tab
    • Added validation check for per-server database limit
    • Improved conditional formatting for JBOD disk capacity/type alerts
    • Fixed conditional formatting bug on custom databases input
    • Fixed bugs and improved the CreateMBDatabases.ps1 and CreateDAG.ps1 scripts
    • Fixed disk function display name problem
    • Revised calculation of TotDBVolSpaceDAG on Volume Requirements tab to account for multiple databases per volume
    • Fixed bug when custom database size is set to zero
    • Fixed number of volumes for 48 copies/server in 2 volumes/backup set scenario
    • Removed 2nd site dependency for setting Alt FSW
    • Added support for 100 databases / server
    • Fixed bug with circular logging export
    • Fixed transcript bug in CreateMBDatabaseCopies script
    • Adjusted CI memory calculation for corner case scenarios
    • Fixed Shadow Effect calculation
    • Fixed mistakes in comments
    • Disabled AutoReseed when Log Isolation is enabled

    Version 5.6

    • Optimized Volume Design Architecture Formula.
    • Fixed "Recommended Min Number of GC Cores (Secondary Datacenter)" calculation to use SDC instead of PDC CAS count.
    • Fixed CPU comments and removed erroneous information.
    • Fixed multiple conditional formatting bugs.
    • Fixed problem where this workbook had to be the active workbook at all times.
    • Fixed problem with extra-wide Fail Server button on the distribution worksheet.
    • Enabled variable based tracing.
    • Resolved VBA Divide by Zero error caused by DiskGroup = 0.
    • Fixed problem with lagged copies in conjunction with multiple databases per volume.
    • Fixed missing "\" character in path names in MailboxDatabases.csv file.
    • Fixed problem with WAN failure simulation.
    • Fixed calcNumAMBXServersDC2 to ensure it cannot have more servers than the primary site.
    • Fixed calculated IO Multiplication factor formulas to take into consideration IOPS override.
    • Added condition to validate that there are enough copies for multiple databases/volume scenario.
    • Fixed conditional formatting and JBOD storage results when JBOD evaluation is disabled.
  • Released: Calculator Updates Galore!

    Today, we have released an updated version of the Exchange 2013 Server Role Requirements Calculator that addresses several issues found since its initial release.  You can view what changes have been made, or download the update directly.

    In addition, we are releasing an updated version of the Exchange 2010 Server Role Requirements Calculator as well. You can view what changes have been made, or download the update directly.

    Ross Smith IV
    Principal Program Manager
    Exchange Customer Experience

  • Ambiguous URLs and their effect on Exchange 2010 to Exchange 2013 Migrations

    With the recent releases of Exchange Server 2013 RTM CU1, Exchange 2013 sizing guidance, the Exchange 2013 Server Role Requirements Calculator, and the updated Exchange 2013 Deployment Assistant, on-premises customers now have the tools they need to begin designing and performing migrations to Exchange Server 2013. Many of you have introduced Exchange 2013 RTM CU1 into your test environments alongside Exchange 2010 SP3 and/or Exchange 2007 SP3 RU10, and are readying yourselves for production migrations.

    There's one particular Exchange 2010 design choice some customers made that could throw a monkey wrench into your upgrade plans to Exchange 2013, and we want to walk you through how to mitigate it so you can move forward. If you're still in the design or deployment phase of Exchange Server 2010, we recommend you continue reading this article so you can make some intelligent design choices which will benefit you when you migrate to Exchange 2013 or later.

    What is the situation we need to look for?

    In Exchange 2010, all Outlook clients in the most typical configurations will utilize MAPI/RPC or Outlook Anywhere (RPC over HTTPS) connections to a Client Access Server. The MAPI/RPC clients connect to the CAS Array Object FQDN (also known as the RPC endpoint) for Mailbox access and the HTTPS based clients connect to the Outlook Anywhere hostname (also known as the RPC proxy endpoint) for all Mailbox and Public Folder access. In addition to these primary connections, other HTTPS based workloads such as EAS, ECP, OAB, and EWS may be sharing the same FQDN as Outlook Anywhere. In some environments you may also be sharing the same FQDN with POP/IMAP based clients and using it as an SMTP endpoint for internal mail submissions.

    In Exchange 2010, the recommendation was to utilize split DNS and ensure that the CAS Array Object FQDN was only resolvable via DNS by internal clients. External clients should never be able to resolve the CAS Array Object FQDN. This was covered previously in item #4 of Demystifying the CAS Array Object - Part 2. If you put those two design rules together you come to the conclusion your ClientAccessArray FQDN used by the mailbox database RpcClientAccessServer property should have been an internal-only unique FQDN not utilized by any workload besides MAPI/RPC clients.

    Take the following chart as an example of what a suggested configuration in a split DNS configuration would have looked like.

    FQDN | Used By | Internal DNS resolves to | External DNS resolves to
    mail.contoso.com | All HTTPS Workloads | Internal Load Balancer IP | Perimeter Network Device
    outlook.contoso.com | MAPI/RPC Workloads | Internal Load Balancer IP | N/A

    If you do not utilize split DNS, then a suggested configuration may have looked like this.

    FQDN | Used By | DNS resolves to
    mail.contoso.com | External HTTPS Workloads | Perimeter Network Device
    mail-int.contoso.com | Internal HTTPS Workloads | Internal Load Balancer IP
    outlook.contoso.com | Internal MAPI/RPC Workloads | Internal Load Balancer IP

    In speaking with our Premier Field Engineers and MCS consultants, we learned that some of our customers did not choose to use a unique ClientAccessArray FQDN. This design choice may manifest itself in one of two ways. The MAPI/RPC and HTTPS workloads may both utilize the mail.contoso.com FQDN internally and externally, or a unique external FQDN of mail.contoso.com is used while internal MAPI/RPC and HTTPS workloads share mail-int.contoso.com. The shared FQDN in either situation is ambiguous because we can't look at it and immediately understand the workload type that's using it. Perhaps we were not clear enough in our original guidance, or customers felt fewer names would help reduce overall design complexity since everything appeared to work with this configuration.

    Take a look at the figure below and the FQDNs in use for some of the different workloads. Shown are EWS, ECP, OWA, CAS Array Object, and Outlook Anywhere External Hostname. The yellow arrow specifically points out the CAS Array Object, the value used as the RpcClientAccessServer for Exchange 2010 mailbox databases, and seen in the Server field of an Outlook profile for an Exchange 2010 mailbox.

    image
    An Exchange 2010 deployment with a single ambiguous URL for all workloads.

    Let us pause for a moment to visualize what we have talked about so far. If we were to compare an Exchange 2010 environment using ambiguous URLs to one not using ambiguous URLs, it would look like the following diagrams. Notice the first diagram below uses the same FQDN for Outlook MAPI/RPC based traffic and HTTPS based traffic.

    image

    If we were to then look at an environment not utilizing ambiguous URLs, we see the clients utilize unique FQDNs for MAPI/RPC based traffic and HTTPS based traffic. In addition, the FQDN utilized for MAPI/RPC based traffic is only resolvable via internal DNS.

    image

    If your environment does not look like the first example using ambiguous URLs, then you can go hit the coffee shop for a while or play some XBOX 360. Tell your boss we gave the okay. If your environment does look similar to the first example using ambiguous URLs, or you are in the planning stages for Exchange 2010, then please read on, as we need you to perform some extra steps when migrating to Exchange 2013.

    So what’s the big deal? It is functional this way isn’t it?

    While this may be working for you today, it certainly will not work tomorrow if you migrate to Exchange 2013. In this scenario, where both the MAPI/RPC and HTTPS workloads are using the same FQDN, you cannot move the FQDN to CAS 2013 without breaking your MAPI/RPC client connectivity entirely. I repeat, your MAPI/RPC clients will start failing to connect once their DNS cache expires after the shared FQDN is moved to CAS 2013. The MAPI/RPC clients will fail to connect because CAS 2013 does not handle direct MAPI/RPC connections; in Exchange 2013, all Windows-based Outlook clients connect using RPC over HTTPS. There is a chance your Outlook clients may successfully fall back to HTTPS, but only if Outlook Anywhere is already enabled for Exchange 2010 when the failure to connect via MAPI/RPC takes place. This article will help with the following:

    1. Ensure you are in full control of what will take place
    2. Ensure you are in full control of when #1 takes place
    3. Ensure you are in a supported server + client configuration
    4. Ensure environments with Outlook Anywhere disabled for Exchange 2010 know their path forward
    5. Help remove the possibility of any clients not automatically falling back to HTTPS
    6. Remove the potentially long delay when Outlook does fail to connect via MAPI/RPC even though it can resolve the MAPI/RPC URL and then falls back to HTTPS

    Shoot… this looks like us. What should we do immediately?

    First off, if you are still in the planning stages of Exchange 2010 you need to take our warning to heart and immediately change your design to use a specific internal-only FQDN for MAPI/RPC clients. If you are in the middle of a 2010 deployment using an Ambiguous URL I recommend you change your ClientAccessArray FQDN to a unique name and update the mailbox database RpcClientAccessServer values on all Exchange 2010 mailbox databases accordingly. Fixing this item mid-migration to Exchange 2010 or even in your fully migrated environment will ensure any newly created or manually repaired Outlook profiles are protected, but it will not automatically fix existing Outlook clients with the old value in the server field.
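    The renaming described above can be sketched in the Exchange 2010 Management Shell roughly as follows (the array identity and FQDN are examples; create the matching internal-only DNS record first):

    ```powershell
    # Point the existing array at a unique, internal-only FQDN (example values)
    Set-ClientAccessArray -Identity "CASArray01" -Fqdn "outlook.contoso.com"

    # Update every Exchange 2010 mailbox database to match the new FQDN
    Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer "outlook.contoso.com"
    ```

    If you have not created a CAS array object yet, New-ClientAccessArray is the cmdlet to start with instead of Set-ClientAccessArray.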

    While not necessary as long as you go through our mitigation steps below, any existing Outlook profiles could be manually repaired to reflect the new value. If you are curious why a manual repair is necessary you can refer to items #5 and #6 in Demystifying the CAS Array Object - Part 2. Again, forcing this update is not necessary if you follow our mitigation steps later in this article. However, if you were to choose to update some specific Outlook profiles we suggest you perform those steps in your test environment first to make sure you have the process down correctly.

    Additionally as we previously discussed in item #3 of Demystifying the CAS Array Object – Part 1, the ClientAccessArray FQDN is not needed in your SSL certificate as it is not being used for HTTPS based traffic. Because of this, the only thing you would need to do is create a new internal DNS record, update your ClientAccessArray FQDN, and finally update your Exchange 2010 Mailbox Database RpcClientAccessServer values. It bears repeating that you do not have to get a new SSL certificate only to fix an Ambiguous URL situation.

    Ok, fixed that… now what about the clients we don’t want to repair manually?

    Our suggestion is to implement Outlook Anywhere internally for all users prior to introducing Exchange Server 2013 to the environment.

    Many of our customers have already moved to Outlook Anywhere internally for all Windows Outlook clients. In fact, those of you reading this with OA in use internally are good to proceed to the coffee shop or go play XBOX 360 with the other folks if you’d like to.

    Now for the rest of you… sit a little closer. Go ahead and fill in, there are plenty of seats in the front row like usual.

    In Exchange Server 2013, all Windows Outlook clients operate in Outlook Anywhere mode internally. By following these mitigation steps you will be one step ahead of where you will end up after your migration to Exchange Server 2013 anyway.

    If you do not have Outlook Anywhere enabled at all in your environment, please see Enable Outlook Anywhere on TechNet for steps on how to enable it in Exchange 2010. If your company does not wish to provide external access for Outlook Anywhere that is ok. By simply enabling Outlook Anywhere you will not be providing remote access unless you also publish the /rpc virtual directory to the Internet.
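    For reference, enabling Outlook Anywhere on an Exchange 2010 CAS looks roughly like this (the server name and hostname are examples; see the TechNet article mentioned above for the full set of options):

    ```powershell
    # Enable Outlook Anywhere on a single CAS (sketch; repeat per CAS as needed)
    Enable-OutlookAnywhere -Server "CAS01" -ExternalHostname "mail.contoso.com" `
        -DefaultAuthenticationMethod NTLM -SSLOffloading $false
    ```

    Enabling it does not expose Outlook Anywhere externally unless you also publish the /rpc virtual directory to the Internet, as noted above.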

    It is suggested that customers, especially very large ones, consider enabling Kerberos authentication to avoid potential performance issues with the default NTLM authentication. Information on how to configure Kerberos authentication for Exchange Server 2010 can be found here on TechNet; the steps for Exchange Server 2013 are similar, and we will have documentation for them in the near future. However, please keep in mind that Kerberos authentication with Outlook Anywhere is only supported on Windows Vista or later.

    By default with Outlook Anywhere enabled in the environment your clients prefer RPC/TCP connections when on Fast Networks as seen below.

    image

    The trick we use to force Outlook Anywhere to also be used internally is via Autodiscover. Using Autodiscover we can make Windows Outlook clients prefer RPC/HTTPS on both Fast and Slow networks as seen here.

    image

    The method used to make clients always prefer HTTPS is configuring the OutlookProviderFlags option via the Set-OutlookProvider cmdlet. The following commands are executed from the Exchange 2010 Management Shell.

    Set-OutlookProvider EXPR -OutlookProviderFlags:ServerExclusiveConnect

    Set-OutlookProvider EXCH -OutlookProviderFlags:ServerExclusiveConnect

    If for any reason you need to put the configuration back to its default settings, issue the following commands and clients will no longer prefer HTTP on Fast Networks.

    Set-OutlookProvider EXPR -OutlookProviderFlags:None

    Set-OutlookProvider EXCH -OutlookProviderFlags:None

    You can prepare to introduce Exchange Server 2013 to your environment once all of your Windows Outlook clients are preferring HTTP on both fast and slow networks and are connecting through mail.contoso.com for RPC over HTTPS connections.
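    A quick way to confirm the provider flags are set as expected before bringing Exchange 2013 into the environment:

    ```powershell
    # Verify both the EXCH and EXPR providers now carry ServerExclusiveConnect
    Get-OutlookProvider | Format-Table Name, OutlookProviderFlags -AutoSize
    ```

    You can also re-run the Test E-mail AutoConfiguration tool in Outlook to confirm clients are receiving the updated settings.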

    There are a small number of things we would like to call out as you plan this migration to enable Outlook Anywhere for all internal clients.

    First, your front end infrastructure (CAS 2013, load balancer, etc.) must be ready to immediately handle the full production load of Windows Outlook clients when you re-point the mail.contoso.com FQDN in DNS.

    Second, if your Exchange 2010 Client Access Servers were not scaled for 100% Outlook Anywhere connections then performance should be monitored when OA is enabled and all clients are moved from MAPI/RPC based to HTTPS based workloads. You should be ready to scale out your CAS 2010 infrastructure if necessary to mitigate any possible performance issues.

    Lastly, Windows Outlook clients older than Outlook 2007 are not supported going through CAS 2013, even if their mailbox is on an older Exchange version. All Windows Outlook clients going through CAS 2013 have to be at least the minimum versions supported by Exchange 2013. Any unsupported clients, such as Outlook 2003, do not support Autodiscover and would have to be manually configured with a new MAPI/RPC-specific endpoint to ensure they continue communicating with Exchange 2010 until the client can be updated and the mailbox migrated to Exchange 2013.

    Note: The easiest way to confirm what major/minor version of Outlook you have is to look at the version of OUTLOOK.EXE and EMSMDB32.DLL via Windows Explorer or to run an inventory report through Microsoft System Center Configuration Manager or similar software. The minimum version numbers Exchange Server 2013 supports for on-premises deployments are provided below.

    • Outlook 2007: 12.0.6665.5000 (SP3 + the November 2012 Public Update or any later PU)
    • Outlook 2010: 14.0.6126.5000 (SP1 + the November 2012 Public Update or any later PU)
    • Outlook 2013: 15.0.4420.1017 (RTM or later)
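    One way to spot-check an individual client is to read the file version directly with PowerShell (the path below is an example for a 32-bit Outlook 2010 install on 64-bit Windows; adjust for your Office version and bitness):

    ```powershell
    # Read the Outlook executable's file version (example path; adjust as needed)
    (Get-Item "C:\Program Files (x86)\Microsoft Office\Office14\OUTLOOK.EXE").VersionInfo.FileVersion
    ```

    Compare the value returned against the minimum versions listed above.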

    If we were to visualize the mitigation steps from start to end we need to compare it between phases.

    First, the upper area of the below diagram depicts the start state of the environment with internal Windows Outlook clients utilizing MAPI/RPC and ambiguous URLs for their HTTPS based workloads. The lower area of the diagram depicts the same environment, but we have now forced Outlook Anywhere to be used by internal Windows Outlook clients. This change has forced all mailbox and public folder access traffic over HTTPS through the mail.contoso.com Outlook Anywhere FQDN.

    image

    We now have all Windows Outlook clients utilizing Outlook Anywhere internally by leveraging Autodiscover to force the preference of HTTPS. Now that all Windows Outlook traffic is routed through mail.contoso.com via HTTPS, the ambiguous URL problem has been mitigated. However, you may have other applications integrating with Exchange that are unable to utilize Outlook Anywhere and/or Autodiscover. These applications will also be affected if you update the mail.contoso.com DNS entry to point at Exchange 2013. Before moving on to the second step, it may be most efficient to add a HOSTS file entry on the servers hosting these external applications to force resolution of mail.contoso.com to the Layer-7 load balancer used by Exchange 2010. This should allow you to temporarily continue routing traffic from applications that need to talk to Exchange 2010 via MAPI/RPC while you work on updating the applications to be Outlook Anywhere compatible, which they will need to be before they can ever connect to Exchange 2013.
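    A HOSTS file entry of this shape, added to %SystemRoot%\System32\drivers\etc\hosts on the application server, accomplishes the pinning described above (the IP address is an example; use the VIP of your Exchange 2010 Layer-7 load balancer):

    ```
    10.0.1.50    mail.contoso.com
    ```

    Remember to remove the entry once the application has been updated to use Outlook Anywhere and Exchange 2013.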

    Having dealt with both the Windows Outlook clients and the third-party applications that cannot utilize Outlook Anywhere, we can now move on to the second step. The second step is executed when you are ready to introduce Exchange 2013 to the environment.

    The below diagram starts by showing where we finished after executing step one. The lower area of the below diagram shows that we have updated DNS to point the mail.contoso.com entry to the new IP of the new Exchange 2013 load balancer configuration. Because of the HOSTS entry we made our application server continues talking to the old Layer-7 load balancer for its MAPI over RPC/TCP connections. Exchange 2013 CAS will now receive all client traffic and then we proxy traffic for users still on Exchange 2010 back to the Exchange 2010 CAS infrastructure. The redundant CAS was removed from the diagram to simplify the view and simply show traffic flow.

    image

    In summary, we hope those of you in this unique configuration will be able to smoothly migrate from Exchange 2010 to Exchange 2013 now that you have these mitigation steps. Some of you may identify other potential methods and wonder why we are offering only a single mitigation approach. Many methods were investigated, but this approach came back every time as the most straightforward to implement, maintain, and support. Given the potential complexity of this change, we invite you to ask follow-up questions at the following Exchange Server Forum, where we can often interact with you better than the comments format allows.

    Exchange Server Forum: Exchange Server 2013 – Setup, Deployment, Updates, and Migration

    Brian Day
    Senior Program Manager
    Exchange Customer Experience

  • Using Exchange Web Services to Apply a Personal Tag to a Custom Folder

    In Exchange 2010, we introduced Retention Tags, a Messaging Records Management (MRM) feature that allows you to manage email lifecycle. You can use retention policies to retain mailbox data for as long as it’s required to meet business or regulatory requirements, and delete items older than the specified period.

    One of the design goals for MRM 2.0 was to simplify administration compared to Managed Folders, the MRM feature introduced in Exchange 2007, and allow users more flexibility. By applying a Personal Tag to a folder, users can have different retention settings apply to items in that folder than the default tag applied to the entire mailbox (known as a Default Policy Tag). Similarly, users can apply a different tag to a subfolder than the one applied to the parent folder. Users can also apply a Personal Tag to individual items, allowing them the freedom to organize messages based on their work habits and preferences, rather than forcing them to move messages, based on the retention requirement, to an admin-controlled Managed Folder.

    You can still use Managed Folders in Exchange 2010, but they’re not available in Exchange 2013.

    For a comparison of Retention Tags with Managed Folders and migration details, see Migrate Managed Folders.

    If you like the Managed Folders approach of being able to create a folder in the user’s mailbox and configure a retention setting for that folder, you can use Exchange Web Services (EWS) to accomplish something similar, with some caveats mentioned later in this post. You can write your own code or even a PowerShell script to create a folder in the user’s mailbox and apply a Personal Tag to it. There are scripts available on the interwebs, including some code samples on MSDN, to accomplish this.

    Note: The above scripts are examples for your reference. They’re not written or tested by the Exchange product group.
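    As a rough, hedged sketch of what such a script does (this assumes the EWS Managed API is installed; the DLL path, mailbox address, folder name, and tag GUID below are placeholders, with the real GUID coming from Get-RetentionPolicyTag):

```powershell
# Sketch only: create a folder and stamp it with a Personal Tag via EWS.
# The DLL path, mailbox address, folder name, and GUID are illustrative.
Add-Type -Path "C:\Program Files\Microsoft\Exchange\Web Services\2.0\Microsoft.Exchange.WebServices.dll"

$svc = New-Object Microsoft.Exchange.WebServices.Data.ExchangeService
$svc.UseDefaultCredentials = $true
$svc.AutodiscoverUrl("user@contoso.com")

# PR_POLICY_TAG (0x3019) stores the tag's GUID; PR_RETENTION_PERIOD (0x301A) its age limit in days
$PolicyTag = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x3019, [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Binary)
$RetentionPeriod = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x301A, [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Integer)

$folder = New-Object Microsoft.Exchange.WebServices.Data.Folder($svc)
$folder.DisplayName = "Project Archive"
$tagGuid = [Guid]"<personal-tag-guid from Get-RetentionPolicyTag>"   # placeholder
$folder.SetExtendedProperty($PolicyTag, $tagGuid.ToByteArray())
$folder.SetExtendedProperty($RetentionPeriod, 365)   # should match the tag's age limit
$folder.Save([Microsoft.Exchange.WebServices.Data.WellKnownFolderName]::MsgFolderRoot)
```

    A production script would add error handling and, for other users' mailboxes, impersonation; this is just the shape of the operation.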

    But is it supported?

    We frequently get questions about whether this is supported by Microsoft. Short answer: Yes. Exchange Web Services (EWS) is a supported and documented API, which allows ISVs and customers to create custom solutions for Exchange.

    When using EWS in your code or PowerShell script to apply a Personal Tag to a folder, it’s important to consider the following:

    For Developers

    • EWS is meant for developers who can write custom code or scripts to extend Exchange’s functionality. As a developer, you must have a good understanding of the functionality available via the API and what you can do with it using your code/script.
    • Support for EWS API is offered through our Exchange Developer Support channels.

    For IT Pros

    • If you’re an IT Pro writing your own code or scripts, you’re a developer too! The above applies to you.
    • If you’re an IT Pro using 3rd-party code or scripts, including the code samples & scripts available on MSDN, TechNet or elsewhere on the interwebs, we recommend that you follow the general best practices for using such code or scripts, including (but not limited to) the following:
      • Do not use code/scripts from untrusted sources in a production environment.
      • Understand what the script or code does. (This is easy for scripts – you can look at the source in a text editor.)
      • Test the script or code thoroughly in a non-production environment, including all command-line options/parameters available in it, before installing or executing it in your production environment.
      • Although it’s easy to change the PowerShell execution policy on your servers to allow unsigned scripts to execute, it’s recommended to allow only signed scripts in production environments. You can easily sign a script if it's unsigned, before running it in a production environment.
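    For example, one hedged way to sign a downloaded script before production use (this assumes a code-signing certificate is already in the current user's certificate store; the script name is illustrative):

```powershell
# Pick an available code-signing certificate from the current user's store
$cert = Get-ChildItem Cert:\CurrentUser\My -CodeSigningCert | Select-Object -First 1
# Sign the downloaded script
Set-AuthenticodeSignature -FilePath .\Apply-PersonalTag.ps1 -Certificate $cert
# Verify the signature took
Get-AuthenticodeSignature -FilePath .\Apply-PersonalTag.ps1 | Format-List Status, StatusMessage
```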

    So should I do it?

    If using EWS to apply a Personal Tag to custom folders helps you meet your business requirements, absolutely! However, do note and consider the following:

    • You’re replicating some of the functionality available via Managed Folders, but it doesn’t turn the folder into a Managed Folder.
    • Remember - it’s a Personal Tag! Users can remove the tag from the folder using Outlook or Outlook Web App.
    • If you have additional Personal Tags available in your environment, users can change the tag on the custom folder.
    • Users can tag individual items with a different Personal Tag. There is no way to enforce inheritance of a folder’s retention tag if Personal Tags have been provisioned and are available to the user.
    • Users can rename or delete custom folders. Unlike Managed Folders, which are protected from changes or deletion by users, custom folders created by users or by an admin are just like any other (non-default) folder in the mailbox.

    Provisioning custom folders with different retention settings (by applying Personal Tags) may help you meet your organization’s retention requirements. As an IT Pro, make sure you understand the above and follow the best practices.

    Bharat Suneja

  • Released: Exchange Server 2013 Management Pack

    The Microsoft Exchange Server 2013 Management Pack (SCOM MP) is now live!

    As I discussed in my Managed Availability article, the key difference between this management pack and previous releases is that our health logic is now built into Exchange, as opposed to the management pack. This means updates to Exchange 2013 (like our cumulative updates) will include changes to the probes, monitors, and responders. Any issues that Managed Availability cannot solve are bubbled up to SCOM via an event monitor.

    You can download the management pack via Microsoft Download Center at http://www.microsoft.com/en-us/download/details.aspx?id=39039.

    More information can be found at the SCOM team’s blog - http://blogs.technet.com/b/momteam/archive/2013/05/14/exchange-2013-management-pack-released.aspx.

    Ross Smith IV
    Principal Program Manager
    Exchange Customer Experience

  • Released: Exchange 2013 Server Role Requirements Calculator

    It’s been a long road, but the initial release of the Exchange 2013 Server Role Requirements Calculator is here. No, that isn’t a mistake: the calculator has been rebranded. Yes, this is no longer just a Mailbox server role calculator; it includes recommendations on sizing Client Access servers too! Originally, marketing wanted to brand it as the Microsoft Exchange Server 2013 Client Access and Mailbox Server Roles Theoretical Capacity Planning Calculator, On-Premises Edition.  Wow, that’s a mouthful and reminds me of this branding parody.  Thankfully, I vetoed that name (you’re welcome!).

    The calculator supports the architectural changes made possible with Exchange 2013:

    Client Access Servers

    Like Exchange 2010, the recommendation in Exchange 2013 is to deploy multi-role servers. There are very few reasons you would need to deploy dedicated Client Access servers (CAS); CPU constraints, use of Windows Network Load Balancing in small deployments (even with our architectural changes in client connectivity, we still do not recommend Windows NLB for any large deployments), and certificate management are a few examples that may justify dedicated CAS.

    When deploying multi-role servers, the calculator will take into account the impact that the CAS role has and make recommendations for sizing the entire server’s memory and CPU. So when you see the CPU utilization value, this will include the impact both roles have!

    When deploying dedicated server roles, the calculator will recommend the minimum number of Client Access processor cores and memory per server, as well as the minimum number of CAS you should deploy in each datacenter.

    Transport

    Now that the Mailbox server role includes additional components like transport, it only makes sense to include transport sizing in the calculator. This release does just that and will factor in message queue expiration and Safety Net hold time when calculating the database size. The calculator even makes a recommendation on where to deploy the mail.que database, either the system disk, or on a dedicated disk!

    Multiple Databases / JBOD Volume Support

    Exchange 2010 introduced the concept of 1 database per JBOD volume when deploying multiple database copies. However, this architecture did not ensure that the drive was utilized effectively across all three dimensions – throughput, IO, and capacity. Typically, the system was balanced from an IO and capacity perspective, but throughput was where we saw an imbalance, because during reseeds only a portion of the target disk’s total capable throughput was utilized. In addition, capacity on the 7.2K disks continues to increase, with 4TB disks now available, thus impacting our ability to remain balanced along that dimension. Exchange 2013 also includes a 33% reduction in IO when compared to Exchange 2010. Naturally, the concept of 1 database / JBOD volume needed to evolve. As a result, Exchange 2013 made several architectural changes in the store process, ESE, and the HA architecture to support multiple databases per JBOD volume. If you would like more information, please see Scott’s excellent TechEd session in a few weeks on Exchange 2013 High Availability and Site Resilience or the High Availability and Site Resilience topic on TechNet.

    By default, the calculator will recommend multiple databases per JBOD volume. This architecture is supported for single datacenter deployments and multi-datacenter deployments when there is copy and/or server symmetry. The calculator supports highly available database copies and lagged database copies with this volume architecture type. The distribution algorithm will lay out the copies appropriately, as well as generate the deployment scripts correctly to support AutoReseed.

    High Availability Architecture Improvements

    The calculator has been improved in several ways for high availability architectures:

    • You can now specify the Witness Server location, either primary, secondary, or tertiary datacenter.
    • The calculator allows you to simulate WAN failures, so that you can see how the databases are distributed during the worst failure mode.
    • The calculator allows you to name servers and define a database prefix which are then used in the deployment scripts.
    • The distribution algorithm supports single datacenter HA deployments, Active/Passive deployments, and Active/Active deployments.
    • The calculator includes a PowerShell script to automate DAG creation.
    • In the event you are deploying your high availability architecture with direct attached storage, you can now specify the maximum number of database volumes each server will support. For example, if you are deploying a server architecture that can support 24 disks, you can specify a maximum of 20 database volumes (leaving 2 disks for the system, 1 disk for a Restore Volume, and 1 disk as a spare for AutoReseed).

    Additional Mailbox Tiers (sort of!)

    Over the years, a few, but vocal, members of the community have requested that I add more mailbox tiers to the calculator. As many of you know, I rarely recommend sizing multiple mailbox tiers, as that simply adds operational complexity, and I am all about removing complexity in your messaging environments. While I haven’t specifically added additional mailbox tiers, I have added the ability for you to define a percentage of the mailbox tier population that should have the IO and Megacycle Multiplication Factors applied. In a way, this allows you to define up to eight different mailbox tiers.

    Processors

    I’ve received a number of questions regarding processor sizing in the calculator.  People are comparing the Exchange 2010 Mailbox Server Role Requirements Calculator output with the Exchange 2013 Server Role Requirements Calculator.  As mentioned in our Exchange 2013 Performance Sizing article, the megacycle guidance in Exchange 2013 leverages a new server baseline; therefore, you cannot directly compare the output from the Exchange 2010 calculator with the Exchange 2013 calculator.

    Conclusion

    There are many other minor improvements sprinkled throughout the calculator.  We hope you enjoy this initial release.  All of this work wouldn’t have occurred without the efforts of Jeff Mealiffe (for without our sizing guidance there would be no calculator!), David Mosier (VBA scripting guru and the master of crafting the distribution worksheet), and Jon Gollogy (deployment scripting master).

    As always, we welcome feedback. Please report any issues you may encounter while using the calculator by emailing strgcalc AT microsoft DOT com.

    Ross Smith IV
    Principal Program Manager
    Exchange Customer Experience

  • Use Exchange Web Services and PowerShell to Discover and Remove Direct Booking Settings

    Update 7/15/2013: We have made a few updates to the blog post below to adjust the instructions for the release of version 2 of the script.

    Prior to Exchange 2007, there were two primary methods of implementing automated resource scheduling – Direct Booking and the AutoAccept Agent (a store event sink released as a web download for Exchange 2003). In Exchange 2007, we changed how automated resource scheduling is implemented. The AutoAccept Agent is no longer supported, and the Direct Booking method, technically an Outlook function, has been replaced with a server-side calendar booking function called the Resource Booking Attendant.

    Note: There are various terms associated with this new Resource Booking function, such as: Calendar Processing, Automatic Resource Booking, Calendar Attendant Processing, Automated Processing and Resource Booking Assistant. We will be using the “Resource Booking Attendant” nomenclature for this article.

    While the Direct Booking method for resource scheduling can indeed work on Exchange Server 2007/2010/2013, we strongly recommend that you disable Direct Booking for resource mailboxes and use the Resource Booking Attendant instead. Specifically, we are referring to the “AutoAccept” Automated Processing feature of the Resource Booking Attendant, which can be enabled for a mailbox after it has been migrated to Exchange 2007 or later and upgraded to a Resource Mailbox.
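    On Exchange 2010/2013, for instance, enabling that AutoAccept behavior is a one-liner with the Set-CalendarProcessing cmdlet (on Exchange 2007 the equivalent cmdlet is Set-MailboxCalendarSettings; the mailbox name below is illustrative):

```powershell
# Enable the Resource Booking Attendant's AutoAccept processing on a resource mailbox
Set-CalendarProcessing -Identity "ConfRoom1" -AutomateProcessing AutoAccept
# Confirm the change
Get-CalendarProcessing -Identity "ConfRoom1" | Format-List AutomateProcessing
```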

    Note: The published resource mailbox upgrade guidance on TechNet specifies to disable Direct Booking in the resource mailbox while still on Exchange 2003, move the mailbox, and then enable the AutoAccept functionality via the Resource Booking Attendant. This order of steps can introduce an unnecessary amount of time where the resource mailbox may be without automated scheduling capabilities.

    We are currently working to update that guidance to reflect moving the mailbox first, and only then proceed with disabling the Direct Booking functionality, after which the AutoAccept functionality via the Resource Booking Attendant can be immediately enabled. This will shorten the duration where the mailbox is without automated resource scheduling capabilities.

    This conversion process to resource mailboxes utilizing the Resource Booking Attendant is sometimes an honest oversight, or even deliberately skipped, when migrating away from Exchange 2003 due to Direct Booking’s ability to continue working with newer versions of Exchange, even Exchange Online. This often results in resource mailboxes (or even user mailboxes!) with Direct Booking functionality remaining in place long after Exchange 2003 is ancient history in the environment.

    Why not just leave Direct Booking enabled?

    There are issues that can arise from leaving Direct Booking enabled, from simple administrative burden scenarios all the way to major calendaring issues. Additionally, Resource Booking Attendant offers advantages over Direct Booking functionality:

    1. Direct Booking capabilities, technically an Outlook function, have been deprecated from the product as of Outlook 2013. They were already on the deprecation list in Outlook 2010 and required a registry modification to reintroduce the functionality.
    2. Direct Booking and Resource Booking Attendant are conflicting technologies, and if simultaneously enabled, unexpected behavior in calendar processing and item consistency can occur.
    3. Outlook Web App (as well as any non-MAPI clients, like Exchange ActiveSync (EAS) devices) cannot use Direct Booking for automated resource scheduling. This is especially relevant for Outlook Web App-only environments where the users do not have Microsoft Outlook as a mail client.
    4. The Resource Booking Attendant AutoAccept functionality is a server-side solution, eliminating the need for client-side logic in order to automatically process meeting requests.

    How do I check which mailboxes have Direct Booking Enabled?

    How does one validate if Direct Booking settings are enabled on mailboxes in the organization, especially if mailboxes had previously been hosted on Exchange 2003?

    Screenshot: Resource Scheduling properties
    Figure 1: Checking Direct Booking settings in Microsoft Outlook 2010

    Unfortunately, the manual steps involve assigning permissions to all mailboxes, creating a MAPI profile for each mailbox, logging into each mailbox, checking Tools > Options > Calendar > Resource Scheduling, noting which of the three Direct Booking checkboxes are checked, clicking OK/Cancel a few times, and logging out of the mailbox. Whew! That can be a major undertaking even for a small to midsize company with more than a handful of mailboxes! Having staff perform this type of activity manually can be a costly and tedious endeavor. Once you have discovered which mailboxes have Direct Booking settings enabled, you would then have to repeat the entire process to disable those settings, unless you removed them at the time of discovery.

    Having an automated method to discover, track, and even disable Direct Booking settings would be nice, right?

    Look no further, we have the solution for you!

    Using Exchange Web Services (EWS) and PowerShell, we can automate the discovery of Direct Booking settings that are enabled, track the results, and even disable them! We wrote Remove-DirectBooking.ps1, a sample script, to do exactly that and even more to aid in automating this manual effort.

    After you've downloaded it, rename the file and remove the .txt extension.

    IMPORTANT:  The previously uploaded script had the last line truncated to Stop-Tran (instead of Stop-Transcript). We've uploaded an updated version to TechNet Gallery. If you downloaded the previous version of the script, please download the updated version. Alternatively, you can open the previously downloaded version in Notepad or other text editor and correct the last line to Stop-Transcript.

    Let’s break down the major tasks the PowerShell script does:

    1. Uses EWS Application Impersonation to tap into a mailbox (or set of mailboxes) and read the three MAPI properties where the Direct Booking settings are stored. It does this by accessing the localfreebusy item sitting in the NON_IPM_SUBTREE\FreeBusy Data folder, which resides in the root of the Information Store in the mailbox. The three MAPI properties and their equivalent Outlook settings the script looks at are:

      • 0x686d Automatically accept meeting requests and remove canceled meetings
      • 0x686f Automatically decline meeting requests that conflict with an existing appointment or meeting
      • 0x686e Automatically decline recurring meeting requests

      These three properties contain Boolean values mirroring the Resource Scheduling checkboxes found in Outlook (see Figure 1 above).

    2. For each mailbox processed, the script attempts to locate the corresponding free/busy message stored in the ‘Schedule+ Free/Busy’ system Public Folder representing the user.  This item must be updated just like the user’s local mailbox item – the two items must be consistent in their settings. We need to do this because Outlook only considers the settings in the Public Folder free/busy item when a user attempts to Direct Book a resource.  Therefore, it is critical that the script checks for the Public Folder item’s existence and its settings are in sync with the localfreebusy item stored in the mailbox itself.
    3. For mailboxes where Direct Booking settings were detected, it checks for conflicts by determining if the mailbox also has Resource Booking Attendant enabled with AutomateProcessing set to AutoAccept.
    4. Optionally, disables any enabled Direct Booking settings encountered.

      Note: It is important to understand that by default the script runs in a read-only mode. Additional command line switches are available to run the script to disable Direct Booking settings.

    5. Writes a detailed runtime processing log to console and log file.
    6. Creates a simple output text file containing a list of mailboxes that can be later leveraged as an input file to feed the script for disabling the Direct Booking functionality.
    7. Creates a CSV file containing statistics of the list of mailboxes processed with detailed information, such as what was discovered, any errors encountered, and optionally what was disabled. This is useful for performing analysis in the discovery phase and can also be used as another source to create an input file to feed into the script for disabling the Direct Booking functionality.
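    To illustrate step 1, here is a heavily simplified, hedged sketch of how those three properties can be read with the EWS Managed API. The DLL path and mailbox address are illustrative, and the actual script adds impersonation, error handling, and the Public Folder consistency checks described above:

```powershell
# Sketch only: read the three Direct Booking Booleans from the localfreebusy item
Add-Type -Path "C:\Program Files\Microsoft\Exchange\Web Services\2.0\Microsoft.Exchange.WebServices.dll"

$svc = New-Object Microsoft.Exchange.WebServices.Data.ExchangeService
$svc.UseDefaultCredentials = $true
$svc.AutodiscoverUrl("room1@contoso.com")

# The three Direct Booking Boolean properties (0x686d, 0x686f, 0x686e)
$props = foreach ($tag in 0x686d, 0x686f, 0x686e) {
    New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition($tag, [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Boolean)
}

# Locate the 'Freebusy Data' folder directly under the mailbox root (NON_IPM_SUBTREE)
$root = [Microsoft.Exchange.WebServices.Data.Folder]::Bind($svc, [Microsoft.Exchange.WebServices.Data.WellKnownFolderName]::Root)
$folderFilter = New-Object -TypeName "Microsoft.Exchange.WebServices.Data.SearchFilter+IsEqualTo" -ArgumentList @([Microsoft.Exchange.WebServices.Data.FolderSchema]::DisplayName, "Freebusy Data")
$fbFolder = $root.FindFolders($folderFilter, (New-Object Microsoft.Exchange.WebServices.Data.FolderView(1))).Folders[0]

# Fetch the localfreebusy item with the three extended properties loaded
$view = New-Object Microsoft.Exchange.WebServices.Data.ItemView(1)
$view.PropertySet = New-Object Microsoft.Exchange.WebServices.Data.PropertySet([Microsoft.Exchange.WebServices.Data.BasePropertySet]::IdOnly, $props)
$itemFilter = New-Object -TypeName "Microsoft.Exchange.WebServices.Data.SearchFilter+IsEqualTo" -ArgumentList @([Microsoft.Exchange.WebServices.Data.ItemSchema]::Subject, "LocalFreebusy")
$item = $fbFolder.FindItems($itemFilter, $view).Items[0]

# A value of True on any property means a Direct Booking setting is enabled
foreach ($ep in $item.ExtendedProperties) {
    "0x{0:x} = {1}" -f $ep.PropertyDefinition.Tag, $ep.Value
}
```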

    Example Scenarios

    Here are a couple of example scenarios that illustrate how to use the script to discover and remove enabled Direct Booking settings.

    Scenario 1

    You've recently migrated from Exchange 2003 to Exchange 2010 and would like to disable Direct Booking for your company’s conference room mailboxes as well as any user mailboxes that may have Direct Booking settings enabled. The administrator’s logged in account has Application Impersonation rights and the View-Only Recipients RBAC role assigned.

    1. On a machine that has the Exchange management tools & the Exchange Web Services API 1.2 or greater installed, open the Exchange Management Shell, navigate to the folder containing the script, and run the script using the following syntax:

      .\Remove-DirectBooking.ps1 –identity * -UseDefaultCredentials

    2. The script will process all mailboxes in the organization with detailed logging sent to the shell on the console. Note: depending on the number of mailboxes in the org, this may take some time to complete.
    3. When the script completes, open the Remove-DirectBooking_<timestamp>.txt file in Notepad, which will contain list of mailboxes that have Direct Booking enabled:

      Screenshot: The Remove-DirectBooking log generated by the script
      Figure 2: Output file containing list of mailboxes with Direct Booking enabled

    4. After reviewing the list, rerun the script with the InputFile parameter and the RemoveDirectBooking switch:

      .\Remove-DirectBooking.ps1 –InputFile ‘.\Remove-DirectBooking_<timestamp>.txt’ –UseDefaultCredentials -RemoveDirectBooking

    5. The script will process all the mailboxes listed in the input file with detailed logging sent to the shell on the console. Because you specified the RemoveDirectBooking switch, it does not run in read-only mode and disables all currently enabled Direct Booking settings encountered.
    6. When the script completes, you can check the status of the removal operation in the Remove-DirectBooking_<timestamp>.csv file. A column called Direct Booking Removed? will record whether the removal was successful. You can also check the runtime processing log file RemoveDirectBooking_<timestamp>.log as well.

      Figure 3: Reviewing runtime log file in Excel

    Note: The Direct Booking Removed? column now shows Yes where applicable, but the three Direct Booking settings columns still show their various values as “Yes”; this is because we record those three values pre-removal. If you were to run the script again in read-only mode against the same input file, those columns would reflect a value of N/A, since there would no longer be any Direct Booking settings enabled. The Resource Room?, AutoAccept Enabled?, and Conflict Detected? columns all have a value of N/A regardless, because they are not relevant when disabling the Direct Booking settings.

    Scenario 2

    You're an administrator who's new to an organization. You know that they migrated from Exchange 2003 to Exchange 2007 in the distant past and are currently in the process of implementing Exchange 2010, having already migrated some users to Exchange 2010. You have no idea which resource mailboxes or even user mailboxes may be using Direct Booking and would like to discover who has which Direct Booking settings enabled. You would then like to selectively choose which mailboxes to pilot for Direct Booking removal before taking action on the majority of the found mailboxes.

    Here's how you would accomplish this using the Remove-DirectBooking.ps1 script:

    1. Obtain a service account that has Application Impersonation rights for all mailboxes in the org.
    2. Ensure the service account has at least the Exchange View-Only Administrator role (2007) and at least an RBAC role assignment of View-Only Recipients (2010/2013).
    3. On a machine that has the Exchange management tools & the Exchange Web Services API 1.2 or greater installed, preferably an Exchange 2010 server, open the Exchange Management Shell, navigate to the folder containing the script, and run the script using the following syntax:

      .\Remove-DirectBooking.ps1 –Identity *

    4. The script will prompt you for the domain credentials of the account you wish to use because no credentials were specified. Enter the service account’s credentials.
    5. The script will process all mailboxes in the organization with detailed logging sent to the shell on the console. Note: depending on the number of mailboxes in the org, this may take some time to complete.
    6. When the script completes, open the Remove-DirectBooking_<timestamp>.csv in Excel, which will look something like:


      Figure 4: Reviewing the Remove-DirectBooking_<timestamp>.csv in Excel

    7. Filter or sort the table by the Direct Booking Enabled? column. This will provide a list that can be scrutinized to determine which mailboxes are to be piloted with Direct Booking removal, such as those that have conflicts with already having the Resource Booking Attendant’s Automated Processing set to AutoAccept (which you can also filter on using the AutoAccept Enabled? column).
    8. Once the list has been reviewed and the targeted mailboxes isolated, simply copy their email addresses into a text file (one address per line), save the text file, and use it as the input source for running the script to disable the Direct Booking settings:

      .\Remove-DirectBooking.ps1 –InputFile ‘.\’ -RemoveDirectBooking

    9. As before, the script will prompt you for the domain credentials of the account you wish to use. Enter the service account’s credentials.
    10. The script will process all the mailboxes listed in the input file with detailed logging sent to the shell on the console. It will disable all enabled Direct Booking settings encountered.
    11. Use the same validation steps at the end of the previous example to verify the removal was successful.
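    Building that pilot input file from the discovery CSV can itself be scripted. The following is a hedged sketch; the column names ('Direct Booking Enabled?', 'AutoAccept Enabled?', 'Email Address') are assumptions based on the figures in this post and may differ slightly in the script's actual output:

```powershell
# Filter the discovery CSV down to pilot candidates and write one address per line
function Select-PilotMailboxes {
    param(
        [string]$CsvPath,   # path to the Remove-DirectBooking_<timestamp>.csv file
        [string]$OutPath    # input file to feed back into Remove-DirectBooking.ps1
    )
    Import-Csv -Path $CsvPath |
        Where-Object {
            $_.'Direct Booking Enabled?' -eq 'Yes' -and
            $_.'AutoAccept Enabled?' -eq 'Yes'      # conflicting mailboxes first
        } |
        ForEach-Object { $_.'Email Address' } |
        Set-Content -Path $OutPath                  # one address per line
}
```

    You would then call, for example, Select-PilotMailboxes -CsvPath '.\Remove-DirectBooking_<timestamp>.csv' -OutPath '.\PilotMailboxes.txt' and pass the text file to the script's InputFile parameter.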

    Script Options and Caveats

    Please see the script’s help section (via “get-help .\remove-DirectBooking.ps1 -full”) for full information on all the available parameters. Here are some additional options that may be useful in certain scenarios:

    1. EWSURL switch parameter. By default, the script will attempt to retrieve the EWS URL for each mailbox via AutoDiscover. This is preferred, especially in complex multi-datacenter or hybrid Exchange Online/on-premises environments, where different EWS URLs may be in play for any given mailbox depending on where it resides in the org. However, there may be times when you want to supply an EWS URL manually, such as when AutoDiscover is having “issues”, or when the response time for AutoDiscover requests is introducing delays in overall script execution (think a very large number of mailbox identities to churn through) and the EWS URL is the same across the org. In these situations, you can use the EWSURL parameter to feed the script a static EWS URL.
    2. UseDefaultCredentials switch parameter. If the current user is the service account, or simply has both the Impersonation and the necessary Exchange admin rights per the script’s requirements, and you don’t wish to be prompted for credentials (another great example is scheduling the script to run as a job), you can use the UseDefaultCredentials switch to run the script under that security context.
    3. RemoveDirectBooking switch parameter. By default, the script runs in read-only mode. In order to make changes and disable Direct Booking settings on the mailbox, you must specify the RemoveDirectBooking switch.
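    As a hedged example of combining these options (the URL is illustrative for your environment):

```powershell
# Static EWS URL plus current credentials; still read-only because
# the RemoveDirectBooking switch is not specified
.\Remove-DirectBooking.ps1 -Identity * -EWSURL "https://mail.contoso.com/EWS/Exchange.asmx" -UseDefaultCredentials
```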

    The script does have several prerequisites and caveats to ensure proper operation and meaningful results:

    1. Application Impersonation rights and minimum Exchange Admin rights must be used
    2. Exchange Web Services Managed API 1.2 or later must be installed on the machine running the script
    3. Exchange management tools must be installed on the machine running the script
    4. Script must be executed from within the Exchange Management Shell
    5. The Shell session must have the appropriate execution policy to allow the script to be executed (by default, you can't execute unsigned scripts).
    6. AutoDiscover must be configured correctly (unless the EWS URL is entered manually)
    7. Exchange 2003-based mailboxes cannot be targeted due to lack of EWS capabilities
    8. In an Exchange 2010/2013 environment that also has Exchange 2007 mailboxes present, the script should be executed from a machine running Exchange 2010/2013 management tools due to changes in the cmdlets in those versions
    9. Due to limitations in the EWS architecture, the Schedule+ Free/Busy System Folder subfolders must contain a replica on each version of Exchange that users are being processed for; additionally, the user's Exchange mailbox database 'Default public folder database' property must be set to a Public Folder database that is on the same version of Exchange as the mailbox database.
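    For prerequisite 5, one hedged way to permit the script for the current session only, without changing the machine-wide policy:

```powershell
# Relax the policy for this process only; machine and user scopes are untouched
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process -Force
# Review the effective policy at each scope
Get-ExecutionPolicy -List
```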

    Summary

    The discovery and removal of Direct Booking settings can be a tedious and costly process to perform manually, but you can automate it using current functions and features via PowerShell and EWS in Microsoft Exchange Server 2007, 2010, & 2013. With careful use, the Remove-DirectBooking.ps1 script can be a valuable tool to aid Exchange administrators in maintaining automated resource scheduling capabilities in their Microsoft Exchange environments.

    Your feedback and comments are welcome.

    Thank you to Brian Day and Nino Bilic for their guidance in content review, and to our customers (you know who you are) for piloting the script.

    Seth Brandes & Dan Smith

  • Ask the Perf Guy: Sizing Exchange 2013 Deployments

    Since the release to manufacturing (RTM) of Exchange 2013, you have been waiting for our sizing and capacity planning guidance. This is the first official release of our guidance in this area, and updates to our TechNet content will follow in a future milestone.

    As we continue to learn more from our own internal deployments of Exchange 2013, as well as from customer feedback, you will see further updates to our sizing and capacity planning guidance in two forms: changes to the numbers mentioned in this document, as well as further guidance on specific areas not covered here. Let us know what you think we are missing and we will do our best to respond with better information over time.

    First, some context

    Historically, the Exchange Server product group has used various sources of data to produce sizing guidance. Typically, this data would come from scale tests run early in the product development cycle, and we would then fine-tune that guidance with observations from production deployments closer to final release. Production deployments have included Exchange Dogfood (our internal pre-release deployment that hosts the Exchange team and various other groups at Microsoft), Microsoft IT’s corporate Exchange deployment, and various early adopter programs.

    For Exchange 2013, our guidance is primarily based on observations from the Exchange Dogfood deployment. Dogfood hosts some of the most demanding Exchange users at Microsoft, with extreme messaging profiles and many client sessions per user across multiple client types. Many users in the Dogfood deployment send and receive more than 500 messages per day, and typically have multiple Outlook clients and multiple mobile devices simultaneously connected and active. This allows our guidance to be somewhat conservative, taking into account additional overhead from client types that we don’t regularly see in our internal deployments as well as client mixes that might be different from what's considered “normal” at Microsoft.

    Does this mean that you should take this conservative guidance and adjust the recommendations such that you deploy less hardware? Absolutely not. One of the many things we have learned from operating our own very high-scale service is that availability and reliability are very dependent on having capacity available to deal with those unexpected peaks.

    Sizing is both a science and an art form. Attempting to apply too much science to the process (trying to get too accurate) usually results in not having enough extra capacity available to deal with peaks, and in the end, results in a poor user experience and decreased system availability. On the other hand, there does need to be some science involved in the process, otherwise it’s very challenging to have a predictable and repeatable methodology for sizing deployments. We strive to achieve the right balance here.

    Impact of the new architecture

    From a sizing and performance perspective, there are a number of advantages with the new Exchange 2013 architecture. As many of you are aware, a couple of years ago we began recommending multi-role deployment for Exchange 2010 (combining the Mailbox, Hub Transport, and Client Access Server (CAS) roles on a single server) as a great way to take advantage of hardware resources on modern servers, as well as a way to simplify capacity planning and deployment. These same advantages apply to the Exchange 2013 Mailbox role as well. We like to think of the services running on the Mailbox role as providing a balanced utilization of resources rather than having a set of services on a role that are very disk intensive, and a set of services on another role that are very CPU intensive.

    Another example to consider for the Mailbox role is cache effectiveness. Software developers use in-memory caching to prevent having to use higher-latency methods to retrieve data (like LDAP queries, RPCs, or disk reads). In the Exchange 2007/2010 architecture, processing for operations related to a particular user could occur on many servers throughout the topology. One CAS might be handling Outlook Web App for that user, while another (or more than one) CAS might be handling Exchange ActiveSync connections, and still other CAS servers might be processing Outlook Anywhere RPC proxy load for that same user. It’s even possible that the set of servers handling that load could be changing on a regular basis. Any data associated with that user stored in a cache would become useless (effectively a waste of memory) as soon as those connections moved to other servers. In the Exchange 2013 architecture, all workload processing for a given user occurs on the Mailbox server hosting the active copy of that user’s mailbox. Therefore, cache utilization is much more effective.

    The new CAS role has some nice benefits as well. Given that the role is totally stateless from a user perspective, it becomes very easy to scale up and down as demands change by simply adding or removing servers from the topology. Compared to the CAS role in prior releases, hardware utilization is dramatically reduced, meaning that fewer CAS role machines will be required. Additionally, it may make sense for many customers to consider a multi-role deployment in which CAS and Mailbox are co-located – this allows further simplification of capacity planning and deployment, and also increases the number of available CAS servers, which has a positive effect on service availability. Look for a follow-up post on the benefits of a multi-role deployment soon.

    Start to finish, what’s the process?

    Sizing an Exchange deployment has six major phases, and I will go through each of them in this post in some detail.

    1. You begin the process by making sure you fully understand the available guidance on this topic. If you are reading this post, that’s a great start. There may have been updates posted either here on the Exchange team blog, or over on TechNet. Make sure you take a look before proceeding.
    2. The second step is to gather any available data on the existing messaging deployment (if there is one) or estimate user profile requirements if this is a totally new solution.
    3. The third step is perhaps the most difficult. At this point, you need to figure out all of the requirements for the Exchange solution that might impact the sizing process. This can include decisions like the desired mailbox size (mailbox quota), service level objectives, number of sites, number of mailbox database copies, storage architecture, growth plans, deployment of 3rd party products or line-of-business applications, etc. Essentially, you need to understand any aspect of the design that could impact the number of servers, user count, and utilization of servers.
    4. Once you have collected all of the requirements, constraints, and user profile data, it’s time to calculate Exchange requirements. The easiest way to do this is with the calculator tool, but it can also be done manually as I will describe in this post. Clearly the calculator makes the process much easier, so if the calculator is available, use it!
    5. Once the Exchange requirements have been calculated, it’s time to consider various options that are available. For example, there may be a choice between scaling up (deploying fewer larger servers) and scaling out (deploying a larger number of smaller servers), and the options could have various implications on high availability, as well as the total number of hardware or software failures that the solution can sustain while remaining available to users. Another typical decision is around storage architecture, and this often comes down to cost. There are a range of costs and benefits to different storage choices, and the Exchange requirements can often be met by more than one of these options.
    6. The last step is to finalize the design. At this point, it’s time to document all of the decisions that were made, order some hardware, use Jetstress to validate that the storage requirements can be met, and perform any other necessary pre-production lab testing to ensure that the production rollout and implementation will go smoothly.

    Gather requirements and user data

    The primary input to all of the calculations that you will perform later is the average user profile of the deployment, where the user profile is defined as the sum of total messages sent and total messages received per-user, per-workday (on average). Many organizations have quite a bit of variability in user profiles. For example, a segment of users might be considered “Information Workers” and spend a good part of their day in their mailbox sending and reading mail, while another segment of users might be more focused on other tasks and use email infrequently. Sizing for these segments of users can be accomplished by either looking at the entire system using weighted averages, or by breaking up the sizing process to align with the various segments of users. In general it’s certainly easier to size the whole system as a unit, but there may be specific requirements (like the use of certain 3rd party tools or devices) which will significantly impact the sizing calculation for one or more of the user segments, and it can be very difficult to apply sizing factors to a user segment while attempting to size the entire solution as a unit.

    The obvious question in your mind is how to go get this user profile information. If you are starting with an existing Exchange deployment, there are a number of options that can be used, assuming that you aren’t the elusive Exchange admin who actually tracks statistics like this on an ongoing basis. If you are using Exchange 2007 or earlier, you can utilize the Exchange Profile Analyzer (EPA) tool, which will provide overall user profile statistics for your Exchange organization as well as detailed per-user statistics if required. If you are on Exchange 2010, the EPA tool is not an option for you. One potential option is to evaluate message traffic using performance counters to come up with user profile averages on a per-server basis. This can be done by monitoring the MSExchangeIS\Messages Submitted/sec and MSExchangeIS\Messages Delivered/sec counters during peak average periods and extrapolating the recorded data to represent daily per-user averages. I will cover this methodology in a future blog post, as it will take a fair amount of explanation. Another option is to use message tracking logs to generate these statistics. This could be done via some crafty custom PowerShell scripting, or you could look for scripts that attempt to do this work for you already. One of our own consultants points to an example on his blog.

    Typical user profiles range from 50-500 messages per-user/per-day, and we provide guidance for those profiles. When in doubt, round up.

    [Table: message profile guidance tiers, from 50 through 500 messages sent/received per mailbox per day]

    The other important piece of profile information for sizing is the average message size seen in the deployment. This can be obtained from EPA, or from the other mentioned methods (via transport performance counters, or via message tracking logs). Within Microsoft, we typically see average message sizes of around 75KB, but we certainly have worked with customers that have much higher average message sizes. This can vary greatly by industry, and by region.

    Start with the Mailbox servers

    Just as we recommended for Exchange 2010, the right way to start with sizing calculations for Exchange 2013 is with the Mailbox role. In fact, those of you who have sized deployments for Exchange 2010 will find many similarities with the methodology discussed here.

    Example scenario

    Throughout this article, we will be referring to an example deployment. The deployment is for a relatively large organization with the following attributes:

    • 100,000 mailboxes
    • 200 message/day profile, with 75KB average message size
    • 10GB mailbox quota
    • Single site
    • 4 mailbox database copies, no lagged copies
    • 2U commodity server hardware platform with internal drive bays and an external storage chassis will be used (total of 24 available large form-factor drive bays)
    • 7200 RPM 4TB midline SAS disks are used
    • Mailbox databases are stored on JBOD direct attached storage, utilizing no RAID
    • Solution must survive double failure events

    High availability model

    The first thing you need to determine is your high availability model, e.g., how you will meet the availability requirements that you determined earlier. This likely includes multiple database copies in one or more Database Availability Groups, which will have an impact on storage capacity and IOPS requirements. The TechNet documentation on this topic provides some background on the capabilities of Exchange 2013 and should be reviewed as part of the sizing process.

    At a minimum, you need to be able to answer the following questions:

    • Will you deploy multiple database copies?
    • How many database copies will you deploy?
    • Will you have an architecture that provides site resilience?
    • What kind of resiliency model will you deploy?
    • How will you distribute database copies?
    • What storage architecture will you use?

    Capacity requirements

    Once you have an understanding of how you will meet your high availability requirements, you should know the number of database copies and sites that will be deployed. Given this, you can begin to evaluate capacity requirements. At a basic level, you can think of capacity requirements as consisting of storage for mailbox data (primarily based on mailbox storage quotas), storage for database log files, storage for content indexing files, and overhead for growth. Every copy of a mailbox database is a multiplier on top of these basic storage requirements. As a simplistic example, if I was planning for 500 mailboxes of 1GB each, the storage for mailbox data would be 500GB, and then I would need to apply various factors to that value to determine the per-copy storage requirement. From there, if I needed 3 copies of the data for high availability, I would then need to multiply by 3 to obtain the overall capacity requirement for the solution (all servers). In reality, the storage requirements for Exchange are far more complex, as you will see below.

    Mailbox size

    To determine the actual size of a mailbox on disk, we must consider 3 factors: the mailbox storage quota, database white space, and recoverable items.

    The mailbox storage quota is what most people think of as the “size of the mailbox” – it’s the user perceived size of their mailbox and represents the maximum amount of data that the user can store in their mailbox on the server. While this certainly represents the majority of space utilization for Exchange databases, it’s not the only element by which we have to size.

    Database whitespace is the amount of space in the mailbox database file that has been allocated on disk but doesn’t contain any in-use database pages. Think of it as available space to grow into. As content is deleted out of mailbox databases and eventually removed from the mailbox recoverable items, the database pages that contained that content become whitespace. We recommend planning for whitespace size equal to 1 day worth of messaging content.

    Estimated Database Whitespace per Mailbox = per-user daily message profile x average message size

    This means that a user with the 200 message/day profile and an average message size of 75KB would be expected to consume the following whitespace:

    200 messages/day x 75KB = 14.65MB

    When items are deleted from a mailbox, they are really “soft-deleted” and moved temporarily to the recoverable items folder for the duration of the deleted item retention period. Like Exchange 2010, Exchange 2013 has a feature known as single item recovery, which prevents data from being purged from the recoverable items folder before the deleted item retention window is reached. When this is enabled, we expect to see a 1.2 percent increase in mailbox size for a 14 day deleted item retention window. Additionally, we expect to see a 3 percent increase in the size of the mailbox for calendar item version logging, which is enabled by default. Given that a mailbox will eventually reach a steady state where the amount of new content will be approximately equal to the amount of deleted content in order to remain under quota, we would expect the size of the items in the recoverable items folder to eventually equal the size of new content sent & received during the retention window. This means that the overall size of the recoverable items folder can be calculated as follows:

    Recoverable Items Folder Size = (per-user daily message profile x average message size x deleted item retention window) + (mailbox quota size x 0.012) + (mailbox quota size x 0.03)

    If we carry our example forward with the 200 message/day profile, a 75KB average message size, a deleted item retention window of 14 days, and a mailbox quota of 10GB, the expected recoverable items folder size would be:

    (200 messages/day x 75KB x 14 days) + (10GB x 0.012) + (10GB x 0.03)
    = 210,000KB + 125,829.12KB + 314,572.8KB = 635.16MB

    Given the results from these calculations, we can sum up the mailbox capacity factors to get our estimated mailbox size on disk:

    Mailbox Size on disk = 10GB mailbox quota + 14.65MB database whitespace + 635.16MB Recoverable Items Folder = 10.63GB
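The capacity factors above can be rolled into a quick sketch. The function name and parameters below are illustrative, not part of any Exchange tool; the defaults come from the single item recovery and calendar version logging percentages discussed above.

```python
def mailbox_size_on_disk_gb(quota_gb, msgs_per_day, avg_msg_kb,
                            retention_days=14,
                            single_item_recovery=0.012,
                            calendar_versioning=0.03):
    """Estimate mailbox size on disk from quota, whitespace, and recoverable items."""
    quota_kb = quota_gb * 1024 * 1024
    # Database whitespace: one day's worth of message traffic.
    whitespace_kb = msgs_per_day * avg_msg_kb
    # Recoverable items: a retention window of traffic plus quota-based factors.
    recoverable_kb = (msgs_per_day * avg_msg_kb * retention_days
                      + quota_kb * single_item_recovery
                      + quota_kb * calendar_versioning)
    return (quota_kb + whitespace_kb + recoverable_kb) / (1024 * 1024)

# Example scenario: 10GB quota, 200 messages/day, 75KB average message size.
print(round(mailbox_size_on_disk_gb(10, 200, 75), 2))  # -> 10.63
```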

    Content indexing

    The space required for files related to the content indexing process can be estimated as 20% of the database size.

    Per-Database Content Indexing Space = database size x 0.20

    In addition, you must size for one additional content index (e.g. an additional 20% of one of the mailbox databases on the volume) in order to allow content indexing maintenance tasks (specifically the master merge process) to complete. The best way to express the need for the master merge space requirement would be to look at the average database file size across all databases on a volume and add 1 database worth of disk consumption to the calculation when determining the per-volume content indexing space requirement:

    Per-Volume Content Indexing Space = (average database size x (databases on the volume + 1) x 0.20)

    As a simple example, if we had 2 mailbox databases on a single volume and each database consumed 100GB of space, we would compute the per-volume content indexing space requirement like this:

    100GB database size x (2 databases + 1) x 0.20 = 60GB
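As a minimal sketch of that formula (the function name is illustrative):

```python
def content_index_space_gb(avg_db_size_gb, dbs_per_volume):
    # 20% of each database on the volume, plus one extra database's
    # worth of index space for master merge maintenance.
    return avg_db_size_gb * (dbs_per_volume + 1) * 0.20

# Two 100GB databases on one volume.
print(round(content_index_space_gb(100, 2)))  # -> 60
```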

    Log space

    The amount of space required for ESE transaction log files can be computed using the same method as Exchange 2010. You can find details on the process in the Exchange 2010 TechNet guidance. To summarize the process, you must first determine the base guideline for number of transaction logs generated per-user, per-day, using the following table. As in Exchange 2010, log files are 1MB in size, making the math for log capacity quite straightforward.

    Message profile (75 KB average message size) Number of transaction logs generated per day
    50 10
    100 20
    150 30
    200 40
    250 50
    300 60
    350 70
    400 80
    450 90
    500 100

    Once you have the appropriate value from the table which represents guidance for a 75KB average message size, you may need to adjust the value based on differences in the target average message size. Every time you double the average message size, you must increase the logs generated per day by an additional factor of 1.9. For example:

    Transaction logs at 200 messages/day with 150KB average message size = 40 logs/day (at 75KB average message size) x 1.9 = 76

    Transaction logs at 200 messages/day with 300KB average message size = 40 logs/day (at 75KB average message size) x (1.9 x 2) = 152
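The table lookup and message-size adjustment can be sketched as follows. Note the scaling factor here is written to match the worked examples above (1.9 for one doubling, 1.9 x 2 for two doublings); names are illustrative.

```python
import math

# Base logs generated per user per day at a 75KB average message size
# (from the table above).
BASE_LOGS = {50: 10, 100: 20, 150: 30, 200: 40, 250: 50,
             300: 60, 350: 70, 400: 80, 450: 90, 500: 100}

def logs_per_user_per_day(profile, avg_msg_kb):
    doublings = math.log2(avg_msg_kb / 75)
    # Each doubling of average message size beyond 75KB applies an
    # additional 1.9 multiplier, per the examples above.
    factor = 1 if doublings == 0 else 1.9 * doublings
    return BASE_LOGS[profile] * factor

print(round(logs_per_user_per_day(200, 150)))  # -> 76
print(round(logs_per_user_per_day(200, 300)))  # -> 152
```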

    While daily log volume is interesting, it doesn’t represent the entire requirement for log capacity. If traditional backups are being used, logs will remain on disk for the interval between full backups. When mailboxes are moved, that volume of change to the target database will result in a significant increase in the amount of logs generated during the day. In a solution where Exchange native data protection is in use (e.g., you aren’t using traditional backups), logs will not be truncated if a mailbox database copy is failed or if an entire server is unreachable unless an administrator intervenes. There are many factors to consider when sizing for required log capacity, and it is certainly worth spending some time in the Exchange 2010 TechNet guidance mentioned earlier to fully understand these factors before proceeding. Thinking about our example scenario, we could consider log space required per database if we estimate the number of users per database at 65. We will also assume that 1% of our users are moved per week in a single day, and that we will allocate enough space to support 3 days of logs in the case of failed copies or servers.

    Log Capacity to Support 3 Days of Truncation Failure = (65 mailboxes/database x 40 logs/day x 1MB log size) x 3 days = 7.62GB

    Log Capacity to Support 1% mailbox moves per week = 65 mailboxes/database x 0.01 x 10.63GB mailbox size = 6.91GB

    Total Log Capacity Required per Database = 7.62GB + 6.91GB = 14.53GB
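The two log capacity components can be combined in a short sketch (illustrative names; the 3-day truncation failure window and 1% weekly move rate are the assumptions from the example scenario):

```python
def log_capacity_per_db_gb(users_per_db, logs_per_day, mailbox_size_gb,
                           truncation_failure_days=3, weekly_move_pct=0.01):
    # Logs retained through a multi-day truncation failure (1MB per log file).
    failure_gb = users_per_db * logs_per_day * truncation_failure_days / 1024
    # Log churn from moving a slice of the mailboxes in a single day.
    moves_gb = users_per_db * weekly_move_pct * mailbox_size_gb
    return failure_gb + moves_gb

print(round(log_capacity_per_db_gb(65, 40, 10.63), 2))  # -> 14.53
```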

    Putting all of the capacity requirements together

    The easiest way to think about sizing for storage capacity without having a calculator tool available is to make some assumptions up front about the servers and storage that will be used. Within the product group, we are big fans of 2U commodity server platforms with ~12 large form-factor drive bays in the chassis. This allows for a 2 drive RAID array for the operating system, Exchange install path, transport queue database, and other ancillary files, and ~10 remaining drives to use as mailbox database storage in a JBOD direct attached storage configuration with no RAID. Fill this server up with 4TB SATA or midline SAS drives, and you have a fantastic Exchange 2013 server. If you need even more storage, it’s quite easy to add an additional shelf of drives to the solution.

    Using the large deployment example and thinking about how we might size this on the commodity server platform, we can consider a server scaling unit that has a total of 24 large form-factor drive bays containing 4TB midline SAS drives. We will use 2 of those drives for the OS & Exchange, and the remaining drive bays will be used for Exchange mailbox database capacity. Let’s use 12 of those drive bays for databases – that leaves 10 remaining drive bays that could contain spares or remain empty. For this sizing exercise, let’s also plan for 4 databases per drive. Each of those drives has a formatted capacity of ~3725GB. The first step in figuring out the number of mailboxes per database is to look at overall capacity requirements for the mailboxes, content indexes, and required free space (which we will set to 5%).

    To calculate the maximum amount of space available for mailboxes, let’s apply a formula (note that this doesn’t consider space for logs – we will make sure that the volume will have enough space for logs later in the process). First, we can remove our required free space from the available storage on the drive:

    Available Space (excluding required free space) = Formatted capacity of the drive x (1 – free space)

    Then we can remove the space required for content indexing. As discussed above, the space required for content indexing will be 20% of the database size, with an additional 20% of one database for content indexing maintenance tasks. Given the additional 20% requirement, we can’t model the overall space requirement as a simple 20% of the remaining space on the volume. Instead we need to compute a new percentage that takes the number of databases per-volume into consideration.

    Per-Volume Content Indexing Space Factor = 0.20 x ((databases per volume + 1) / databases per volume)

    Now we can remove the space for content indexing from our available space on the volume:

    Space Available for Database Files = Available Space / (1 + content indexing space factor)

    And we can then divide by the number of databases per-volume to get our maximum database size:

    Maximum Database Size = Space Available for Database Files / databases per volume

    In our example scenario, we would obtain the following result:

    Available Space = 3725GB x (1 – 0.05) = 3538.75GB
    Content Indexing Space Factor = 0.20 x ((4 + 1) / 4) = 0.25
    Space Available for Database Files = 3538.75GB / 1.25 = 2831GB
    Maximum Database Size = 2831GB / 4 = 707.75GB

    Given this value, we can then calculate our maximum users per database (from a capacity perspective, as this may change when we evaluate the IO requirements):

    Maximum Users per Database = 707.75GB / 10.63GB mailbox size on disk = 66 users (rounded down)
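Pulling the volume capacity math together, here is a minimal sketch (names are illustrative, and the inputs are the assumptions from the example scenario):

```python
import math

def max_users_per_db(drive_gb, free_pct, dbs_per_volume, mailbox_gb):
    # Remove required free space from the formatted drive capacity.
    available = drive_gb * (1 - free_pct)
    # Content indexing: 20% per database, plus one extra database's
    # worth of index for master merge, expressed as a per-volume factor.
    ci_factor = 0.20 * (dbs_per_volume + 1) / dbs_per_volume
    max_db_gb = available / (1 + ci_factor) / dbs_per_volume
    return math.floor(max_db_gb / mailbox_gb)

# 3725GB formatted drive, 5% free space, 4 DBs/volume, 10.63GB mailboxes.
print(max_users_per_db(3725, 0.05, 4, 10.63))  # -> 66
```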

    Let’s see if that number is actually reasonable given our 4 copy configuration. We are going to use 16-node DAGs for this deployment to take full advantage of the scalability and high-availability benefits of large DAGs. While we have many drives available on our selected hardware platform, we will be limited by the maximum of 50 database copies per-server in Exchange 2013. Considering this maximum and our desire to have 4 databases per volume, we can calculate the maximum number of drives for mailbox database usage as:

    Maximum Drives for Mailbox Databases = 50 database copies per server / 4 databases per volume = 12 volumes (rounded down)

    With 12 database volumes and 4 database copies per-volume, we will have 48 total database copies per server.

    Unique Databases per DAG = (48 copies per server x 16 servers) / 4 copies per database = 192 databases

    With 66 users per database and 100,000 total users, we end up with the following required DAG count for the user population:

    Required DAGs = 100,000 users / (192 databases per DAG x 66 users per database) = 7.89 DAGs

    In this very large deployment, we are using a DAG as a unit of scale or “building block” (e.g. we perform capacity planning based on the number of DAGs required to meet demand, and we deploy an entire DAG when we need additional capacity), so we don’t intend to deploy a partial DAG. If we round up to 8 DAGs we can compute our final users per database count:

    Users per Database = 100,000 users / (8 DAGs x 192 databases per DAG) = 65.1, rounded down to 65 users
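The DAG-count derivation can be sketched end to end; the values below are the example scenario's assumptions:

```python
import math

users_total = 100_000
users_per_db = 66            # capacity-derived maximum from above
copies_per_db = 4
dag_nodes = 16
dbs_per_volume = 4
max_copies_per_server = 50   # Exchange 2013 per-server database copy limit

volumes = max_copies_per_server // dbs_per_volume                     # 12
copies_per_server = volumes * dbs_per_volume                          # 48
unique_dbs_per_dag = copies_per_server * dag_nodes // copies_per_db   # 192

# Round up to whole DAGs (the DAG is the deployment building block),
# then recompute the final users-per-database count.
dags = math.ceil(users_total / (unique_dbs_per_dag * users_per_db))   # 8
final_users_per_db = users_total // (dags * unique_dbs_per_dag)       # 65
print(dags, final_users_per_db)
```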

    With 65 users per-database, that means we will expect to consume the following space for mailbox databases:

    Estimated Database Size = 65 users x 10.63GB = 690.95GB
    Database Consumption / Volume = 690.95GB x 4 databases = 2763.8GB

    Using the formula mentioned earlier, we can compute our estimated content index consumption as well:

    690.95GB database size x (4 databases + 1) x 0.20 = 690.95GB

    You’ll recall that we computed transaction log space requirements earlier, and it turns out that we magically computed those values with the assumption that we would have 65 users per-database. What a pleasant coincidence! So we will need 14.53GB of space for transaction logs per-database, or to get a more useful result:

    Log Space Required / Volume = 14.53GB x 4 databases = 58.12GB

    To sum it up, we can estimate our total per-volume space utilization and make sure that we have plenty of room on our target 4TB drives:

    Database Consumption / Volume 2763.80GB
    Content Index Consumption / Volume 690.95GB
    Log Consumption / Volume 58.12GB
    Total Consumption / Volume 3512.87GB (vs. 3538.75GB available after 5% free space)

    Looks like our database volumes are sized perfectly!

    IOPS requirements

    To determine the IOPS requirements for a database, we look at the number of users hosted on the database and consider the guidance provided in the following table to compute total required IOPS when the database is active or passive.

    Messages sent or received per mailbox per day Estimated IOPS per mailbox (Active or Passive)
    50 0.034
    100 0.067
    150 0.101
    200 0.134
    250 0.168
    300 0.201
    350 0.235
    400 0.268
    450 0.302
    500 0.335

    For example, with 50 users in a database, with an average message profile of 200, we would expect that database to require 50 x 0.134 = 6.7 transactional IOPS when the database is active, and 50 x 0.134 = 6.7 transactional IOPS when the database is passive. Don’t forget to consider database placement which will impact the number of databases with IOPS requirements on a given storage volume (which could be a single JBOD drive or might be a more complex storage configuration).

    Going back to our example scenario, we can evaluate the IOPS requirement of the solution, recalling that the average user profile in that deployment is the 200 message/day profile. We have 65 users per database and 4 databases per JBOD drive, so we can estimate our IOPS requirement in worst-case (all databases active) as:

    65 mailboxes x 4 databases per-drive x 0.134 IOPS/mailbox at 200 messages/day profile = ~34.84 IOPS per drive

    Midline SAS drives typically provide ~57.5 random IOPS (based on our own internal observations and benchmark tests), so we are well within design constraints when thinking about IOPS requirements.
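The worst-case per-drive IOPS check can be sketched directly from the table (illustrative names):

```python
# Estimated transactional IOPS per mailbox by message profile (table above).
IOPS_PER_MAILBOX = {50: 0.034, 100: 0.067, 150: 0.101, 200: 0.134,
                    250: 0.168, 300: 0.201, 350: 0.235, 400: 0.268,
                    450: 0.302, 500: 0.335}

def worst_case_iops_per_drive(users_per_db, dbs_per_drive, profile):
    # Worst case: all databases on the drive are simultaneously active.
    return users_per_db * dbs_per_drive * IOPS_PER_MAILBOX[profile]

print(round(worst_case_iops_per_drive(65, 4, 200), 2))  # -> 34.84
```

Comparing the result against the ~57.5 random IOPS a midline SAS drive typically provides confirms the design has headroom.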

    Storage bandwidth requirements

    While IOPS requirements are usually the primary storage throughput concern when designing an Exchange solution, it is possible to run up against bandwidth limitations with various types of storage subsystems. The IOPS sizing guidance above is looking specifically at transactional (somewhat random) IOPS and is ignoring the sequential IO portion of the workload. One place that sequential IO becomes a concern is with storage solutions that are running a large amount of sequential IO through a common channel. A common example of this type of load is the ongoing background database maintenance (BDM) which runs continuously on Exchange mailbox databases. While this BDM workload might not be significant for a few databases stored on a JBOD drive, it may become a concern if all of the mailbox database volumes are presented through a common iSCSI or Fibre Channel interface. In that case, the bandwidth of that common channel must be considered to ensure that the solution doesn’t bottleneck due to these IO patterns.

    In Exchange 2013, we expect to consume approximately 1MB/sec/database copy for BDM which is a significant reduction from Exchange 2010. This helps to enable the ability to store multiple mailbox databases on the same JBOD drive spindle, and will also help to avoid bottlenecks on networked storage deployments such as iSCSI. This bandwidth utilization is in addition to bandwidth consumed by the transactional IO activity associated with user and system workload processes, as well as storage bandwidth consumed by the log replication and replay process in a DAG.

    Transport storage requirements

    Since transport components (with the exception of the front-end transport component on the CAS role) are now part of the Mailbox role, we have included CPU and memory requirements for transport with the general Mailbox role requirements described later. Transport also has storage requirements associated with the queue database. These requirements, much like I described earlier for mailbox storage, consist of capacity factors and IO throughput factors.

    Transport storage capacity is driven by two needs: queuing (including shadow queuing) and Safety Net (which is the replacement for transport dumpster in this release). You can think of the transport storage capacity requirement as the sum of message content on disk in a worst-case scenario, consisting of three elements:

    • The current day’s message traffic, along with messages which exist on disk longer than normal expiration settings (like poison queue messages)
    • Queued messages waiting for delivery
    • Messages persisted in Safety Net in case they are required for redelivery

    Of course, all three of these factors are also impacted by shadow queuing in which a redundant copy of all messages is stored on another server. At this point, it would be a good idea to review the TechNet documentation on Transport High Availability if you aren’t familiar with the mechanics of shadow queuing and Safety Net.

    In order to figure out the messages per day that you expect to run through the system, you can look at the user count and messaging profile. Simply multiplying these together will give you a total daily mail volume, but it will be a bit higher than necessary since it is double counting messages that are sent within the organization (i.e. a message sent to a coworker will count towards the profile of the sending user as well as the profile of the receiving user, but it’s really just one message traversing the system). The simplest way to deal with that would be to ignore this fact and oversize transport, which will provide additional capacity for unexpected peaks in message traffic. An alternative way to determine daily message flow would be to evaluate performance counters within your existing messaging system.

    To determine the maximum size of the transport database, we can look at the entire system as a unit and then come up with a per-server value.

    Overall Daily Messages Traffic = number of users x message profile

    Overall Transport DB Size = average message size x overall daily message traffic x (1 + (percentage of messages queued x maximum queue days) + Safety Net hold days) x 2 copies for high availability

    Let’s use the 100,000 user sizing example again and size the transport database using the simple method.

    Overall Transport DB Size = 75KB x (100,000 users x 200 messages/day) x (1 + (50% x 2 maximum queue days) + 2 Safety Net hold days) x 2 copies = 11,444GB

    In our example scenario, we have 8 DAGs, each containing 16-nodes, and we are designing to handle double node failures in each DAG. This means that in a worst-case failure event we would have 112 servers online with 2 failed servers in each DAG. We can use this value to determine a per-server transport DB size:

    Transport DB Size per Server = 11,444GB / 112 servers = 102.2GB
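The transport database sizing formula can be sketched as follows; the queuing percentage, queue days, and Safety Net hold days are the assumptions from the worked example above:

```python
def transport_db_gb(users, profile, avg_msg_kb,
                    pct_queued=0.5, max_queue_days=2, safety_net_days=2):
    daily_msgs = users * profile
    # Current day's traffic, plus queued messages, plus Safety Net retention.
    factor = 1 + pct_queued * max_queue_days + safety_net_days
    # x2 for shadow queuing's redundant copy of every message.
    total_kb = avg_msg_kb * daily_msgs * factor * 2
    return total_kb / (1024 * 1024)

overall = transport_db_gb(100_000, 200, 75)
print(round(overall))            # -> 11444
# 112 servers online in the worst-case double-failure event.
print(round(overall / 112, 1))   # -> 102.2
```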

    Sizing for transport IO throughput requirements is actually quite simple. Transport has taken advantage of many of the IO reduction changes to the ESE database that have been made in recent Exchange releases. As a result, the number of IOPS required to support transport is significantly lower. In the internal deployment we used to produce this sizing guidance, we see approximately 1 DB write IO per message and virtually no DB read IO, with an average message size of ~75KB. We expect that as average message size increases, the amount of transport IO required to support delivery and queuing would increase. We do not currently have specific guidance on what that curve looks like, but it is an area of active investigation. In the meantime, our best practices guidance for the transport database is to leave it in the Exchange install path (likely on the OS drive) and ensure that the drive supporting that directory path is using a protected write cache disk controller, set to 100% write cache if the controller allows optimization of read/write cache settings. The write cache allows transport database log IO to become effectively “free” and allows transport to handle a much higher level of throughput.

    Processor requirements

    Once we have our storage requirements figured out, we can move on to thinking about CPU. CPU sizing for the Mailbox role is done in terms of megacycles. A megacycle is a unit of processing work equal to one million CPU cycles. In very simplistic terms, you could think of a 1 MHz CPU performing a megacycle of work every second. Given the guidance provided below for megacycles required for active and passive users at peak, you can estimate the required processor configuration to meet the demands of an Exchange workload. Following are our recommendations on the estimated required megacycles for the various user profiles.

    Messages sent or received per mailbox per day Mcycles per User, Active DB Copy or Standalone (MBX only) Mcycles per User, Active DB Copy or Standalone (Multi-Role) Mcycles per User, Passive DB Copy
    50 2.13 2.93 0.69
    100 4.25 5.84 1.37
    150 6.38 8.77 2.06
    200 8.50 11.69 2.74
    250 10.63 14.62 3.43
    300 12.75 17.53 4.11
    350 14.88 20.46 4.80
    400 17.00 23.38 5.48
    450 19.13 26.30 6.17
    500 21.25 29.22 6.85

    The second column represents the estimated megacycles required on the Mailbox role server hosting the active copy of a user’s mailbox database. In a DAG configuration, the required megacycles for the user on each server hosting passive copies of that database can be found in the fourth column. If the solution is going to include multi-role (Mailbox+CAS) servers, use the value in the third column rather than the second, as it includes the additional CPU requirements for the CAS role.

    It is important to note that while many years ago you could make an assumption that a 500 MHz processor could perform roughly double the work per unit of time as a 250 MHz processor, clock speeds are no longer a reliable indicator of performance. The internal architecture of modern processors is different enough between manufacturers as well as within product lines of a single manufacturer that it requires an additional normalization step to determine the available processing power for a particular CPU. We recommend using the SPECint_rate2006 benchmark from the Standard Performance Evaluation Corporation.

    The baseline system used to generate this guidance was a Hewlett-Packard DL380p Gen8 server containing Intel Xeon E5-2650 2 GHz processors. The baseline system SPECint_rate2006 score is 540, or 33.75 per-core, given that the benchmarked server was configured with a total of 16 physical processor cores. Please note that this is a different baseline system than what was used to generate our Exchange 2010 guidance, so any tools or calculators that make assumptions based on the 2010 baseline system would not provide accurate results for sizing an Exchange 2013 solution.

    Using the same general methodology we have recommended in prior releases, you can determine the estimated available Exchange workload megacycles available on a different processor through the following process:

    1. Find the SPECint_rate2006 score for the processor that you intend to use for your Exchange solution. You can do this the hard way (described below) or use Scott Alexander’s fantastic Processor Query Tool to get the per-server score and processor core count for your hardware platform.
      1. On the website of the Standard Performance Evaluation Corporation, select Results, highlight CPU2006, and select Search all SPECint_rate2006 results.
      2. Under Simple Request, enter the search criteria for your target processor, for example Processor Matches E5-2630.
      3. Find the server and processor configuration you are interested in using (or if the exact combination is not available, find something as close as possible) and note the value in the Result column and the value in the # Cores column.
    2. Obtain the per-core SPECint_rate2006 score by dividing the value in the Result column by the value in the # Cores column. For example, in the case of the Hewlett-Packard DL380p Gen8 server with Intel Xeon E5-2630 processors (2.30GHz), the Result is 430 and the # Cores is 12, so the per-core value would be 430 / 12 = 35.83.
    3. To determine the estimated available Exchange workload megacycles on the target platform, use the following formula:

      Available megacycles per core = (per-core SPECint_rate2006 score of target platform ÷ 33.75) × 2,000

      Using the example HP platform with E5-2630 processors mentioned previously, we would calculate the following result:

      (35.83 ÷ 33.75) × 2,000 = 2,123.26 megacycles per core
      2,123.26 megacycles per core × 12 cores ≈ 25,479 available megacycles per-server
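A minimal Python sketch of this normalization, using the baseline values above (the per-core score is rounded to two decimals, mirroring the worked example):

```python
# Convert a SPECint_rate2006 result into estimated available Exchange
# workload megacycles, per the article's baseline: HP DL380p Gen8 with
# E5-2650 2 GHz processors, score 540 over 16 cores = 33.75 per core.
BASELINE_PER_CORE = 33.75         # baseline SPECint_rate2006 per core
BASELINE_MCYCLES_PER_CORE = 2000  # 2 GHz baseline processor

def available_megacycles(spec_result, core_count):
    per_core_score = round(spec_result / core_count, 2)
    per_core_mcycles = (per_core_score / BASELINE_PER_CORE) * BASELINE_MCYCLES_PER_CORE
    return per_core_mcycles * core_count

# E5-2630 example from the text: Result 430, # Cores 12.
print(round(available_megacycles(430, 12)))  # 25479
```

Feeding the baseline system itself through the function (540 over 16 cores) returns 32,000 megacycles, i.e. exactly 2,000 per core, which is a useful sanity check.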

    Keep in mind that a good Exchange design should never plan to run servers at 100% of CPU capacity. In general, 80% CPU utilization in a failure scenario is a reasonable target for most customers. Given that this 80% target applies to a failure scenario, servers in a highly available Exchange solution will often run at relatively low CPU utilization during normal operation. Additionally, there may be very good reasons to target a lower maximum CPU utilization, particularly in cases where unanticipated spikes in load may result in acute capacity issues.

    Going back to the example I used previously of 100,000 users with the 200 message/day profile, we can estimate the total required megacycles for the deployment. We know that there will be 4 database copies in the deployment, and that will help to calculate the passive megacycles required. We also know that this deployment will be using multi-role (Mailbox+CAS) servers. Given this information, we can calculate megacycle requirements as follows:

    100,000 users x ((11.69 mcycles per active mailbox) + (3 passive copies x 2.74 mcycles per passive mailbox)) = 1,991,000 total mcycles required
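The same arithmetic in Python, using the 200 message/day multi-role values from the table above:

```python
# Total megacycle requirement for 100,000 users at the 200 message/day
# profile on multi-role (Mailbox+CAS) servers.
users = 100_000
active_mcycles = 11.69   # multi-role, active DB copy (table, 3rd column)
passive_mcycles = 2.74   # passive DB copy (table, 4th column)
passive_copies = 3       # 4 copies total: 1 active + 3 passive
total = round(users * (active_mcycles + passive_copies * passive_mcycles))
print(total)  # 1991000
```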

    You could then take that number and attempt to come up with a required server count. I would argue that it’s actually a much better practice to come up with a server count based on high availability requirements (taking into account how many component failures your design can handle in order to meet business requirements) and then ensure that those servers can meet CPU requirements in a worst-case failure scenario. You will either meet CPU requirements without any additional changes (if your server count is bound on another aspect of the sizing process), or you will adjust the server count (scale out), or you will adjust the server specification (scale up).

    Continuing with our hypothetical example, if we knew that the high availability requirements for the design of the 100,000 user example resulted in a maximum of 16 databases being active at any time out of 48 total database copies per server, and we know that there are 65 users per database, we can determine the per-server CPU requirements for the deployment.

    (16 databases x 65 mailboxes x 11.69 mcycles per active mailbox) + (32 databases x 65 mailboxes x 2.74 mcycles per passive mailbox) = 12157.6 + 5699.2 = 17,856.8 mcycles per server

    Using the processor configuration mentioned in the megacycle normalization section (E5-2630 2.3 GHz processors on an HP DL380p Gen8), we know that we have 25,479 available mcycles on the server, so we would estimate a peak average CPU in worst-case failure of:

    17,857 / 25,479 = 70.1%

    That is below our guidance of 80% maximum CPU utilization (in a worst-case failure scenario), so we would not consider the servers to be CPU bound in the design. In fact, we could consider adjusting the CPU selection to a cheaper option with reduced performance, getting us closer to a peak average CPU of 80% in a worst-case failure and reducing the cost of the overall solution.
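The per-server worst-case CPU check can be sketched as:

```python
# Worst-case per-server CPU estimate for the example deployment:
# 16 active + 32 passive database copies, 65 users per database,
# 200 message/day profile on multi-role servers.
active_dbs, passive_dbs, users_per_db = 16, 32, 65
required = (active_dbs * users_per_db * 11.69) + (passive_dbs * users_per_db * 2.74)
available = 25479  # E5-2630 server from the normalization example
utilization = required / available
print(round(required, 1), round(utilization, 3))  # 17856.8 0.701
```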

    Memory requirements

    To calculate memory per server, you will need to know the per-server user count (both active and passive users) as well as determine whether you will run the Mailbox role in isolation or deploy multi-role servers (Mailbox+CAS). Keep in mind that regardless of whether you deploy roles in isolation or deploy multi-role servers, the minimum amount of RAM on any Exchange 2013 server is 8GB.

    Memory on the Mailbox role is used for many purposes. As in prior releases, a significant amount of memory is used for ESE database cache and plays a large part in the reduction of disk IO in Exchange 2013. The new content indexing technology in Exchange 2013 also uses a large amount of memory. The remaining large consumers of memory are the various Exchange services that provide either transactional services to end-users or handle background processing of data. While each of these individual services may not use a significant amount of memory, the combined footprint of all Exchange services can be quite large.

    Following is our recommended amount of memory for the Mailbox role on a per mailbox basis that we expect to be used at peak.

    Messages sent or received per mailbox per day Mailbox role memory per active mailbox (MB)
    50 12
    100 24
    150 36
    200 48
    250 60
    300 72
    350 84
    400 96
    450 108
    500 120

    To determine the amount of memory that should be provisioned on a server, take the number of active mailboxes per-server in a worst-case failure and multiply by the value associated with the expected user profile. From there, round up to a value that makes sense from a purchasing perspective (i.e. it may be cheaper to configure 128GB of RAM compared to a smaller amount of RAM depending on slot options and memory module costs).

    Mailbox Memory per-server = (worst-case active database copies per-server x users per-database x memory per-active mailbox)

    For example, on a server with 48 database copies (16 active in worst-case failure), 65 users per-database, expecting the 200 profile, we would recommend:

    16 x 65 x 48MB = 48.75GB, round up to 64GB
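A sketch of that memory calculation; the round-up to the next power of two is just one illustrative purchasing heuristic, not a rule from the article:

```python
import math

# Mailbox role memory for the example: 16 active databases in a
# worst-case failure, 65 users per database, 200 message/day profile
# (48 MB per active mailbox from the table above).
active_dbs, users_per_db, mb_per_mailbox = 16, 65, 48
required_gb = active_dbs * users_per_db * mb_per_mailbox / 1024
print(required_gb)  # 48.75

# Round up to a practical DIMM configuration, e.g. the next power of two:
provisioned_gb = 2 ** math.ceil(math.log2(required_gb))
print(provisioned_gb)  # 64
```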

    It’s important to note that the content indexing technology included with Exchange 2013 uses a relatively large amount of memory to allow both indexing and query processing to occur very quickly. This memory usage scales with the number of items indexed, meaning that as the number of total items stored on a Mailbox role server increases (for both active and passive copies), memory requirements for the content indexing processes will increase as well. In general, the guidance on memory sizing presented here assumes approximately 15% of the memory on the system will be available for the content indexing processes which means that with a 75KB average message size, we can accommodate mailbox sizes of 3GB at 50 message profile up to 32GB at the 500 message profile without adjusting the memory sizing. If your deployment will have an extremely small average message size or an extremely large average mailbox size, you may need to add additional memory to accommodate the content indexing processes.

    Multi-role server deployments will have an additional memory requirement beyond the amounts specified above. CAS memory is computed as a base memory requirement for the CAS components (2GB) plus additional memory that scales based on the expected workload. This overall CAS memory requirement on a multi-role server can be computed using the following formula:

    Per-server multi-role CAS memory = 2GB + (2GB × number of processor cores serving active CAS load at peak in a worst-case failure)

    Essentially this is 2GB of memory for the base requirement, plus 2GB of memory for each processor core (or fractional processor core) serving active load at peak in a worst-case failure scenario. Reusing the example scenario, if I have 16 active databases per-server in a worst-case failure and my processor is providing 2123 mcycles per-core, I would need:

    2GB + 2GB × ((16 databases × 65 mailboxes × 8.50 mcycles × 0.375) ÷ 2,123.26 mcycles per core) = 2GB + 2GB × 1.56 cores ≈ 5.12GB

    If we add that to the memory requirement for the Mailbox role calculated above, our total memory requirement for the multi-role server would be:

    48.75GB for Mailbox + 5.12GB for CAS = 53.87GB, round up to 64GB

    Regardless of whether you are considering a multi-role or a split-role deployment, it is important to ensure that each server has a minimum amount of memory for efficient use of the database cache. There are some scenarios that will produce a relatively small memory requirement from the memory calculations described above. We recommend comparing the per-server memory requirement you have calculated with the following table to ensure you meet the minimum database cache requirements. The guidance is based on total database copies per-server (both active and passive). If the value shown in this table is higher than your calculated per-server memory requirement, adjust your per-server memory requirement to meet the minimum listed in the table.

    Per-Server DB Copies Minimum Physical Memory (GB)
    1-10 8
    11-20 10
    21-30 12
    31-40 14
    41-50 16

    In our example scenario, we are deploying 48 database copies per-server, so the minimum physical memory to provide necessary database cache would be 16GB. Since our computed memory requirement based on per-user guidance including memory for the CAS role (53.87GB) was higher than the minimum of 16GB, we don’t need to make any further adjustments to accommodate database cache needs.

    Unified messaging

    With the new architecture of Exchange, Unified Messaging is now installed and ready to be used on every Mailbox and CAS. The CPU and memory guidance provided here assumes some moderate UM utilization. In a deployment with significant UM utilization with very high call concurrency, additional sizing may need to be performed to provide the best possible user experience. As in Exchange 2010, we recommend using a 100 concurrent call per-server limit as the maximum possible UM concurrency, and scale out the deployment if the sizing of your deployment becomes bound on this limit. Additionally, voicemail transcription is a very CPU-intensive operation, and by design will only transcribe messages when there is enough available CPU on the machine. Each voicemail message requires 1 CPU core for the duration of the transcription operation, and if that amount of CPU cannot be obtained, transcription will be skipped. In deployments that anticipate a high amount of voicemail transcription concurrency, server configurations may need to be adjusted to increase CPU resources, or the number of users per server may need to be scaled back to allow for more available CPU for voicemail transcription operations.

    Sizing and scaling the Client Access Server role

    In the case where you are going to place the Mailbox and CAS roles on separate servers, the process of sizing CAS is relatively straightforward. CAS sizing is primarily focused on CPU and memory requirements. There is some disk IO for logging purposes, but it is not significant enough to warrant specific sizing guidance.

    CAS CPU is sized as a ratio from Mailbox role CPU. Specifically, we need 37.5% of the megacycles used to support active users on the Mailbox role. You could think of this as a 3:8 ratio (CAS CPU to active Mailbox CPU) compared to the 3:4 ratio we recommended in Exchange 2010. One way to compute this would be to look at the total active user megacycles required for the solution, take 37.5% of that, and then determine the required CAS server count based on high availability requirements and multi-site design constraints. For example, consider the 100,000 user example using the 200 message/day profile:

    Total CAS Required Mcycles = 100,000 users x 8.5 mcycles x 0.375 = 318,750 mcycles

    Assuming that we want to target a maximum CPU utilization of 80% and the servers we plan to deploy have 25,479 available megacycles, we can compute the required number of servers quite easily:

    318,750 mcycles ÷ (25,479 mcycles per server × 0.80) = 15.6, round up to 16 CAS servers

    Obviously we would need to then consider whether the 16 required servers meet our high availability requirements considering the maximum CAS server failures that we must design for given business requirements, as well as the site configuration where some of the CAS servers may be in different sites handling different portions of the workload. Since we specified in our example scenario that we want to survive a double failure in the single site, we would increase our 16 CAS to 18 such that we could sustain 2 CAS failures and still handle the workload.
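The CAS server count calculation, sketched in Python:

```python
import math

# CAS server count: total CAS megacycles divided by the per-server
# capacity at the 80% utilization target, then failure headroom added.
total_cas_mcycles = 318_750
per_server_mcycles = 25_479
target_utilization = 0.80
base_count = math.ceil(total_cas_mcycles / (per_server_mcycles * target_utilization))
print(base_count)      # 16
print(base_count + 2)  # 18, to survive a double failure in the site
```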

    To size memory, we will use the same formula that was used for Exchange 2010:

    Per-Server CAS Memory = 2GB + 2GB per physical processor core

    Per-Server CAS Memory = 2GB + (2GB × number of physical processor cores utilized at peak in a worst-case failure)

    Using the example scenario we have been using, we can calculate the per-server CAS memory requirement as:

    2GB + 2GB × ((318,750 mcycles ÷ 16 servers) ÷ 2,123.26 mcycles per core) = 2GB + 2GB × 9.38 cores ≈ 20.77GB

    In this example, 20.77GB would be the guidance for required CAS memory, but obviously you would need to round up to the next highest possible (or highest performing) memory configuration for the server platform: perhaps 24GB.
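A sketch of the CAS memory arithmetic behind that 20.77GB figure, with per-core megacycles taken from the normalization example:

```python
# Per-server CAS memory: 2GB base + 2GB per processor core serving
# load at peak. Cores at peak = per-server CAS megacycles divided by
# the available megacycles per core.
total_cas_mcycles = 318_750
servers = 16
mcycles_per_core = 2123.26  # E5-2630, from the normalization example
cores_used = total_cas_mcycles / servers / mcycles_per_core
memory_gb = 2 + 2 * cores_used
print(round(memory_gb, 2))  # 20.77
```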

    Active Directory capacity for Exchange 2013

    Active Directory sizing remains the same as it was for Exchange 2010. As we gain more experience with production deployments we may adjust this in the future. For Exchange 2013, we recommend deploying a ratio of 1 Active Directory global catalog processor core for every 8 Mailbox role processor cores handling active load, assuming 64-bit global catalog servers:

    Required GC cores = active Mailbox role cores ÷ 8

    If we revisit our example scenario, we can easily calculate the required number of GC cores required.

    (100,000 users × 8.50 mcycles ÷ 2,123.26 mcycles per core) ÷ 8 ≈ 50 GC cores

    Assuming that my Active Directory GCs are also deployed on the same server hardware configuration as my CAS & Mailbox role servers in the example scenario with 12 processor cores, then my GC server count would be:

    50 GC cores ÷ 12 cores per server = 4.2, round up to 5 GC servers

    In order to sustain double failures, we would need to add 2 more GCs to this calculation, which would take us to 7 GC servers for the deployment.
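The GC sizing arithmetic, sketched in Python. This assumes the Mailbox-only active value of 8.50 mcycles per user (since the ratio is defined against Mailbox role cores handling active load), which reproduces the server counts above:

```python
import math

# Global catalog sizing: 1 GC core per 8 active Mailbox role cores.
users, active_mcycles = 100_000, 8.50  # 200 msg/day, Mailbox-only value
mcycles_per_core = 2123.26             # from the normalization example
mailbox_cores = users * active_mcycles / mcycles_per_core
gc_cores = mailbox_cores / 8
gc_servers = math.ceil(gc_cores / 12)  # 12-core GC servers
print(round(gc_cores, 1), gc_servers, gc_servers + 2)  # 50.0 5 7
```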

    As a best practice, we recommend sizing memory on the global catalog servers such that the entire NTDS.DIT database file can be contained in RAM. This will provide optimal query performance and a much better end-user experience for Exchange workloads.

    Hyperthreading: Wow, free processors!

    Turn it off. While modern implementations of simultaneous multithreading (SMT), also known as hyperthreading, can absolutely improve CPU throughput for most applications, the benefits to Exchange 2013 do not outweigh the negative impacts. It turns out that there can be a significant impact to memory utilization on Exchange servers when hyperthreading is enabled due to the way the .NET server garbage collector allocates heaps. The server garbage collector looks at the total number of logical processors when an application starts up and allocates a heap per logical processor. This means that the memory usage at startup for one of our services using the server garbage collector will be close to double with hyperthreading turned on vs. when it is turned off. This significant increase in memory, along with an analysis of the actual CPU throughput increase for Exchange 2013 workloads in internal lab tests has led us to a best practice recommendation that hyperthreading should be disabled for all Exchange 2013 servers. The benefits don’t outweigh the negative impact.

    There’s an important caveat to this recommendation for customers who are virtualizing Exchange. Since the number of logical processors visible to a virtual machine is determined by the number of virtual CPUs allocated in the virtual machine configuration, hyperthreading will not have the same impact on memory utilization described above. It’s certainly acceptable to enable hyperthreading on physical hardware that is hosting Exchange virtual machines, but make sure that any capacity planning calculations for that hardware are based purely on physical CPUs. Follow the best practice recommendations of your hypervisor vendor on whether or not to enable hyperthreading. Note that the extra logical CPUs that are added when hyperthreading is enabled must not be considered when allocating virtual machine resources during the sizing and deployment process. For example, on a physical host running Hyper-V with 40 physical processor cores and hyperthreading enabled, 80 logical processor cores will be visible to the root operating system. If your Exchange design required 16-core servers, you could place 2 Exchange VMs on the physical host as those 2 VMs would consume 32 physical processor cores without enough physical processor cores to host another 16-core VM (32+16 = 48, which is greater than 40).
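The placement rule in that example reduces to counting physical cores only:

```python
# VM placement must be based on physical cores only, even when the
# host reports double the logical cores with hyperthreading enabled.
physical_cores = 40  # host shows 80 logical cores with SMT on
vm_cores = 16
vms_per_host = physical_cores // vm_cores
print(vms_per_host)  # 2 (a third 16-core VM would need 48 > 40 cores)
```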

    You are going to give me a calculator, right?

    Now that you have digested all of this guidance, you are probably thinking about how much more of a pain it will be to size a deployment compared to using the Mailbox Role Requirements Calculator for Exchange 2010. UPDATE: You can now read about and download the calculator from here.

    Hopefully that leaves you with enough information to begin to properly size your Exchange 2013 deployments. If you have further questions, you can obviously post comments here, but I’d also encourage you to consider attending one of the upcoming TechEd events. I’ll be at TechEd North America as well as TechEd Europe with a session specifically on this topic, and would be happy to answer your questions in person, either in the session or at the “Ask the Experts” event. Recordings of those sessions will also be posted to MSDN Channel9 after the events have concluded.

    Jeff Mealiffe
    Principal Program Manager Lead
    Exchange Customer Experience

    Updates

    • 4/3/2014: Updated CAS CPU and Memory sizing with SP1 guidance
    • 1/6/2015: Updated the hyperthreading section
  • Public Folders and Exchange Online

    Update 6/5/2013: We have updated the blog post to add the link to the first TechNet document on public folder Hybrid scenarios.

    “You mean… this is really happening?”

    Last November we gave you a teaser about public folders in the new Exchange. We explained how public folders were given a lot of attention to bring their architecture up-to-date, and as a result of this work they would take advantage of the other excellent engineering work put into Exchange mailbox databases over the years. Many of you have given the new public folders a try in Exchange Online and Exchange Server 2013 in your on-premises environments. At this time we would like to give you a bit more detail surrounding the Exchange Online public folder feature set so you can start planning what makes sense for your environment. So, yes, we really meant our beloved public folders were coming to Exchange Online!

    How do we move our public folders to Exchange Online?

    We are still putting the finishing touches on some of our documentation for migrating public folders from on-premises Exchange Server environments to Exchange Online. We know there is a lot of interest in this documentation and we are making sure it is as easy to follow as possible. We will update this article with links to the content when more documentation becomes available on TechNet. The following two articles are available now.

    Important

    Before we cover the migration process at a high level (and very deeply in those TechNet articles!), we want to be very clear that everyone understands the following important points.

    • Public Folder migrations to Exchange Online should not be performed unless all of your users are located in Exchange Online, and/or all of your on-premises users are on Exchange Server 2013.

    • Public folder migration is a cutover migration. You cannot have some public folders on-premises and some public folders in Exchange Online. There will be a small window of public folder access downtime required when the migration is completed and all public folder connections are moved from on-premises to Exchange Online.

    • Public folder migrations are entirely PowerShell based at this time. Once the migration has completed you can then perform your public folder management in the tool of your choice, EAC or PowerShell.

    So what are the steps I can expect to go through?

    In the TechNet content we walk you through exactly how to use PowerShell and some scripts provided by the product group to help automate the analysis and content location mapping in Exchange 2013 or Exchange Online. The migration process is similar whether you are doing an on-premises to on-premises migration, or an on-premises to Exchange Online migration with the latter having a couple more twists. Both scenarios will include a few major steps you will go through to migrate your legacy public folder infrastructure. Again, the following section is meant to be an overview and not a complete rendering of what the more detailed step-by-step TechNet documentation contains. Consider this section an appetizer to get you thinking about your migration and what potential caveats may or may not affect you. The information below is tailored more to an Exchange Online migration, but our on-premises customers will also be facing many of the same steps and considerations.

    Prepare Your Environment

    • Are my on-premises servers at the necessary patch levels?
      • Exchange 2007 SP3 RU10 or later
      • Exchange 2010 SP3 or later
      • Exchange 2013 RTM CU1 or later
        • The CU1 released on April 2, 2013 is required. Because no Service Pack has been released for Exchange 2013 at this time, it is referred to as RTM CU1.
    • Are my Windows Outlook users using client versions at the necessary patch levels?
      • Outlook 2007, 12.0.6665.5000 or later
      • Outlook 2010, 14.0.6126.5000 or later
      • Outlook 2013, 15.0.4420.1017 or later
    • Are all on-premises users on Exchange Server 2013 or have been moved to Exchange Online?

    Analyze Your Current Public Folders and Content

    (Size limits pertain to Exchange Online)

    • What does my current public folder infrastructure look like?
      • Who has access to what?
      • What is my total content size?
        • Is the total public folder content on Exchange 2007/2010 over 950 GB when Get-PublicFolderStatistics is run? (“Why” is discussed later)
        • Is the total public folder content on Exchange 2013 over 1.25 TB when Get-PublicFolderStatistics is run?
      • Is any single public folder over 15GB that we should trim down first? (“Why” is discussed later)
    • What will my public folder mailbox layout be?
      • Can my content fit within the allowed public folder mailboxes and their quotas?
      • What public folders will go into what public folder mailboxes?
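If you script this analysis yourself, the size checks above might be sketched as follows. This is illustrative only, not an official migration script; the limits come from the list above:

```python
# Quick pre-migration size checks against the Exchange Online limits
# listed above (all sizes in GB).
def check_limits(total_gb, largest_folder_gb, source="2007/2010"):
    issues = []
    cap = 950 if source == "2007/2010" else 1280  # ~1.25 TB for 2013
    if total_gb > cap:
        issues.append(f"total content {total_gb}GB exceeds {cap}GB limit")
    if largest_folder_gb > 15:
        issues.append("trim folders over 15GB before migrating")
    return issues

print(check_limits(400, 9))  # [] -- within limits, safe to proceed
```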

    Create the Initial Public Folder Mailboxes

    • Public folder mailboxes are created by the admin so your content has a place to live in Exchange Online. Customers with less than 25GB of content may only need a single public folder mailbox to start, but our scripts will help you determine your starting layout while backend automation will determine if you need more public folder mailboxes down the road. On-premises customers will utilize quota values that make sense for their own deployments.

    Begin the Migration Request and Initial Data Sync

    • The initial copy of public folder content from on-premises to Exchange Online is performed. This may take a long time depending on how much content you have. There is no easy way to predict the length of time it will take as there are many variables to consider, but you can monitor the progress via PowerShell. Users will continue using the on-premises public folder infrastructure during this time, so there is no impact to the on-premises environment.

    Perform Delta Syncs of Changed Content

    • These content delta syncs run by the admin help shorten the window of downtime for the finalization process by copying only data changed after the initial migration request copy was performed. Numerous delta syncs may be required in large environments with many public folder servers.

    Lock On-premises Public Folders and Finalize the Migration Request

    • Access to the on-premises public folder environment is blocked and a final delta sync of changed data is performed. When this stage is completed your Exchange Online public folders will be ready for user access. The access block is required to prevent any content changes taking place on-premises just before your users’ connections are transitioned to the Exchange Online public folder environment.

    Validate the Exchange Online Public Folder Environment

    • Create new content and permission reports, and compare them to the reports created prior to the migration.
      • If the administrator is happy, the new Exchange Online public folders will then be unlocked for user access.
      • If the administrator feels the migration was not successful, a roll back to the on-premises public folder infrastructure is initiated. However, if any changes were made to Exchange Online public folders such as content, permissions, or folders created/deleted before the rollback is initiated, then those changes will not be replicated to the on-premises infrastructure.

    Removal of legacy public folder content

    • The administrator will remove the public folder databases from the on-premises infrastructure.

    Microsoft, what can I do/not do with these things in Exchange Online?

    Now that we have given you an idea of what the migration process will be, let us talk about the feature itself. Starting with the new Office 365, customers of Exchange Online will be able to store, free of charge, approximately 1.25 terabytes of public folder data in the cloud. Yes, you read that right… over a terabyte. The way this works is your tenant will be allowed to create up to fifty (50) public folder mailboxes, each yielding a 25 GB quota. However, when operating in a hybrid environment, public folders can exist only on-premises or in Exchange Online.

    Once you complete the migration process of public folders to Exchange Online, the on-premises public folder infrastructure will have its hierarchy locked to prevent user connections and its content frozen at that point in time. By locking the on-premises content we provide you with a way to rollback a migration from Exchange Online, if you deem it necessary. However, as mentioned before, a rollback can result in data loss as no changes made while using the Exchange Online public folder infrastructure are copied back on-premises.

    We will support on-premises Exchange Server 2013 users accessing Exchange Online public folders. We will also support Exchange Online users accessing on-premises public folders if you choose to keep your public folder infrastructure local. The table below depicts which mailbox users can access which public folder infrastructures. Please note that for a hybrid deployment, on-premises users must be on Exchange 2013 if you wish for them to access Exchange Online public folders. Also, it bears repeating that public folders can only exist in one location, on-premises or in Exchange Online. You cannot have two different public folder infrastructures being utilized at once.

    Mailbox version 2007 On-Premises PFs 2010 On-Premises PFs 2013 On-Premises PFs Exchange Online PFs
    Exchange 2007 Yes Yes No No
    Exchange 2010 Yes Yes No No
    Exchange 2013 Yes Yes Yes Yes
    New Exchange Online Yes Yes Yes Yes

    How is public folder management in Exchange Online performed?

    When your public folder content migration is complete or you create public folders for the very first time, you will not have to worry about managing many aspects of public folders in Exchange Online. As you previously read, public folders in Exchange Server 2013 and Exchange Online are now stored within a new mailbox type in the mailbox database. Our on-premises customers will have to create public folder mailboxes, monitor their usage, create new public folder mailboxes when necessary, and split content to different public folder mailboxes as their content grows over time. In Exchange Online we will automatically perform the public folder mailbox management so you may focus your time managing the actual public folders and their content. If we were to peek behind the Exchange Online curtain, we would see two automated processes running at all times to make everything happen:

    1. Automatic public folder moves based on public folder mailbox quota usage
    2. Automatic public folder mailbox creation based on active hierarchy connection count

    Let’s go through each one of them, shall we?

    1. Automatic public folder moves based on public folder mailbox quota usage

    This process actively monitors your public folder mailbox quota usage. Its goal is to ensure you do not inadvertently fill a public folder mailbox, which would stop it from being able to accept new content for any public folder within it.

    When a public folder mailbox reaches the Issue Warning Quota value of 24.5 GB, this process is automatically triggered to redistribute where your public folders currently reside. This may result in Exchange Online simply moving some public folders from the nearly-filled public folder mailbox to another pre-existing public folder mailbox holding less content. However, if there are no public folder mailboxes with enough free space to move public folders into, Exchange Online will automatically create a new public folder mailbox and move some of your public folders into the newly created public folder mailbox. The end result will be all public folder mailboxes being below the Issue Warning Quota.

    Public folder moves from one public folder mailbox to another are an online move process similar to normal mailbox moves. Because the move is performed online, your users may experience only a slight disruption in accessing one or more public folders during the completion phase of the move. Any mail destined for mail-enabled public folders being moved is temporarily queued and then delivered once the move request completes.

    In case the curious amongst you are wondering, we do not currently prevent customers from lowering the public folder mailbox quota values, even though there is no reason you should do that. However, you are prevented from configuring quota values larger than 25 GB.

    Let us take a moment to visualize this process, as a picture is worth a thousand words. In the first scenario below a customer currently has two public folder mailboxes, PFMBX-001 and PFMBX-002. PFMBX-001 contains three public folders while PFMBX-002 contains only one public folder. PFMBX-001 has gone over the IssueWarningQuota value of 24.5 GB and currently contains 24.6 GB of content. When the automatic split process runs in this environment it sees there is plenty of space available in PFMBX-002, and moves a public folder from PFMBX-001 into PFMBX-002. In this example, the final result is two public folder mailboxes with a similar amount of data in each of them. Depending on the size of your folders this process may move a single large public folder, or numerous small public folders. The example shows a single folder being moved.

    image
    Scenario 1: Auto split process shuffles public folders from one public folder mailbox to another one.

    In a second scenario below, a customer has a single public folder mailbox, PFMBX-001 containing three public folders. PFMBX-001 has gone over the IssueWarningQuota value of 24.5 GB and contains 24.6 GB of content. When the split process runs in this environment it sees there are no other public folder mailboxes available to move public folders into. As a result, the process creates a new empty public folder mailbox, PFMBX-002, and moves some public folders into the new public folder mailbox; the final result is two public folder mailboxes with a similar amount of data in each of them. Again in this example we are showing a single public folder being moved, but the process may determine it has to move many smaller public folders.

    image
    Scenario 2: Auto split process must create a new empty public folder mailbox before moving a public folder.

    One noteworthy limit in Exchange Online is that no single public folder can be over 25 GB in size, because the underlying public folder mailbox has a 25 GB quota. To give you an idea how much data that is: 25 GB is roughly 350,000 items of 75 KB each, or 525,000 items of 50 KB each. In most cases this volume of data can easily be split amongst multiple public folders to avoid a single folder coming anywhere near the 25 GB limit.
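    The item-count arithmetic above is easy to verify (using 1 GB = 1,048,576 KB; the figures quoted in the text are rounded):

    ```python
    KB_PER_GB = 1024 * 1024
    quota_kb = 25 * KB_PER_GB  # the 25 GB public folder mailbox quota, in KB

    print(quota_kb // 75)  # items of 75 KB each: 349525, i.e. roughly 350,000
    print(quota_kb // 50)  # items of 50 KB each: 524288, i.e. roughly 525,000
    ```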

    Our migration documentation will also suggest that if you currently have a single public folder over 15 GB, you try to reduce that public folder’s size to under 15 GB prior to the migration by deleting old content or splitting it into multiple smaller public folders. When we say a single public folder over 15 GB we mean exactly that; child folders are excluded. Any child folder of a parent folder is not considered part of the 15 GB content limit suggestion for these purposes, because the child public folder may reside in a different public folder mailbox if necessary. The reason for this suggestion is two-fold. First, it helps prevent you from triggering the automated split process as soon as your migration takes place if you were to migrate very large public folders from on-premises. Second, content moved from Exchange 2007/2010 to Exchange Online may result in the reported space utilized by a single public folder increasing by 30%. The increase is due to a more accurate method used by Exchange Server 2013 to calculate space used within a mailbox database compared to earlier versions of Exchange Server. If you were to migrate a single massive public folder residing in on-premises Exchange Server 2007/2010 to Exchange Online, this space recalculation could push the single public folder over the 25 GB quota. We want to help you avoid this situation, as it would only be noticed once you were well into the data copy portion of the migration, and would cost you the time of redoing the process all over again.

    If you have a particular business requirement which does not allow you to reduce the size of this single massive public folder in one of the ways previously suggested, then we will recommend you retain your entire public folder infrastructure on-premises instead of moving it to Exchange Online as we cannot increase the public folder mailbox quota beyond 25 GB.

    2. Automatic public folder mailbox creation based on active hierarchy connection count

    The second automated process helps maintain an optimal experience for users accessing public folders in Exchange Online. Exchange Online actively monitors how many hierarchy connections are spread across all of your public folder mailboxes. If this value goes over a pre-determined number, we will automatically create a new public folder mailbox. Creating the additional public folder mailbox reduces the number of hierarchy connections hitting each public folder mailbox by scaling the user connections out across a larger number of public folder mailboxes. If you are a customer who has a small amount of public folder content in Exchange Online yet an extremely large number of active users, you may see the system create additional public folder mailboxes regardless of your content size.

    Ready for another example? In this example we will use low values for explanatory purposes. Let us pretend in Exchange Online we did not want more than two hundred active hierarchy connections per public folder mailbox. The diagram below shows nine hundred users making nine hundred active hierarchy connections across four public folder mailboxes. This scenario will work out to approximately 225 active hierarchy connections per public folder mailbox as the Client Access Servers spread the hierarchy connections across all available public folder mailboxes in the customer’s environment. When Exchange Online monitoring determines the desired number of two hundred active hierarchy connections per public folder mailbox has been exceeded, PFMBX-005 is automatically created. Immediately after creating PFMBX-005, Exchange Online will force a hierarchy sync to PFMBX-005 ensuring it has the most up to date information available regarding public folder structure and permissions before allowing it to accept client hierarchy connections. The end result in this example is we now have five public folder mailboxes accepting nine hundred active hierarchy connections for an average of 180 connections per public folder mailbox, thus assuring all active users have the best interactive experience possible.
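    The arithmetic in this example can be sketched as follows. The 200-connection threshold is the illustrative value used above, not a documented service limit, and the helper function is hypothetical:

    ```python
    import math

    PER_MAILBOX_LIMIT = 200  # illustrative threshold from the example above

    def mailboxes_needed(active_connections):
        """Smallest mailbox count keeping the average hierarchy connection
        load per public folder mailbox at or below the threshold."""
        return math.ceil(active_connections / PER_MAILBOX_LIMIT)

    print(900 / 4)                # 225.0 connections per mailbox across four mailboxes
    print(mailboxes_needed(900))  # 5, so a fifth mailbox (PFMBX-005) is created
    print(900 / 5)                # 180.0 connections per mailbox afterwards
    ```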

    image
    Scenario 3: Auto split process creates a new public folder mailbox to scale out active hierarchy connections.

    Once you begin utilizing the Exchange Online public folder infrastructure we are confident this built-in automation will help our customers focus on doing what they do best, which is running their business. Let us take care of the infrastructure for you so you have more time to spend on your other projects.

    Summary

    In summary we are extremely excited to deliver public folders in the new Exchange Online to you, our customers. We believe you will find the migration process from on-premises to Exchange Online fairly straightforward and our backend automation will alleviate you from having to manage many aspects of the feature. We really hope you enjoy using the public folders with Exchange Online as much as we enjoyed creating them for you.

    Special thanks to the entire Public Folder Feature Crew, Nino Bilic, Tim Heeney, Ross Smith IV and Andrea Fowler for contributing to and validating this data.

    Brian Day
    Senior Program Manager
    Exchange Customer Experience

  • Introducing Message Analyzer, an SMTP header analysis tool in Microsoft Remote Connectivity Analyzer

    Microsoft Remote Connectivity Analyzer is a web-based tool that provides administrators and end users with the ability to run connectivity diagnostics against our servers to test common issues with Microsoft Exchange, Lync and Office 365. The tool started as the Microsoft Exchange Server Remote Connectivity Analyzer. Based on your feedback, we've continued to add functionality to test connectivity with Lync and Office 365, and made other enhancements such as tests for Outlook Anywhere, Exchange Web Services, outbound SMTP and Office 365 Single Sign-On, support for 10 additional languages, and an improved captcha experience.

    We're excited to announce Message Analyzer, a brand new addition to the Remote Connectivity Analyzer. Message Analyzer makes reading email headers less painful.

    Screenshot: Message Analyzer tab
    Figure 1: The new Message Analyzer tab in RCA

    SMTP message headers contain a wealth of information which allows you to determine the origins of a message and how it made its way through one or more SMTP servers to its destination. To use Message Analyzer, all you need to do is copy message headers from a message and paste them in the Message Analyzer tab on the RCA web site.

    Screenshot 2: Paste message headers in Message Analyzer
    Figure 2: Paste message headers in the Message Analyzer

    Trying to locate message headers in Outlook 2010 and later? See Hey Outlook 2010, where are my message headers?

    Features of the Message Analyzer

    Here's a quick look at what you can do with Message Analyzer.

    • View the most important properties and total delivery time at a quick glance.

      Screenshot 3: Message Analyzer tab
      Figure 3: View the most important header properties and delivery time

    • Analyze the Received headers and quickly display the longest delays, making it easy to discover the sources of message transfer delays.

      Screenshot 4: View where longest message transfer delays occurred
      Figure 4: Quickly detect where the longest message transfer delays occurred

    • Sort all headers by header name or value.

      Screenshot 5: Sort message headers
      Figure 5: Sort message headers

    • Quickly collapse the sections that you don’t need.

    • All processing is done in your browser, and no private information is shared with Microsoft.

    • Useful for any header, whether generated by Exchange, Office 365, or any other RFC standard SMTP server or agent.
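    Delay detection of this kind boils down to parsing the timestamp after the final semicolon of each Received header and differencing consecutive hops. Here is a minimal sketch of that idea in Python; this is not how Message Analyzer itself is implemented (it runs in your browser), and the function name is illustrative:

    ```python
    from email import message_from_string
    from email.utils import parsedate_to_datetime

    def hop_delays(raw_headers):
        """Per-hop transfer delays in seconds, oldest hop first.

        Each Received header ends with '; <date>'. Servers prepend their
        Received header, so the stored order is newest-first and must be
        reversed before differencing."""
        msg = message_from_string(raw_headers)
        times = [parsedate_to_datetime(h.rsplit(";", 1)[1])
                 for h in msg.get_all("Received", [])]
        times.reverse()
        return [(later - earlier).total_seconds()
                for earlier, later in zip(times, times[1:])]
    ```

    The slowest hop is then simply the maximum of the returned list.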

    Note, we consider this feature to be in beta for the moment. Please send us feedback and we’ll continue to make improvements.

    Check out this update to the RCA at testconnectivity.microsoft.com (short URL: aka.ms/rca).

    Stephen Griffin & Scott Landry
    On behalf of the entire MCA/RCA team
    Follow the team on Twitter - @ExRCA

  • Troubleshoot your Exchange 2010 database backup functionality with VSSTester script

    In support, we frequently encounter backup-related calls for Exchange 2010 databases. A sample of the common issues we hear from our customers:

    • “My backup software is not able to take a successful snapshot of the databases”
    • “My backups have been failing for quite a while. I have several thousand log files consuming disk space and I will eventually run out of disk space”
    • “My backup software indicates that the backup is successful but at the end of my backup, logs do not truncate”
    • “The Exchange Writer /VSS writer is not in a stable state (state is listed as ‘Retryable‘, ’Waiting for completion‘ or ’Failed’)”
    • “We suspect that the Volume Shadow Copy Service (VSS) is failing on the server and hence there are no successful backups”

    It is critical to understand how backups and log truncation work in Exchange 2010. If you haven't already done so, check out our three-part blog series by Jesse Tedoff on backups and log truncation in Exchange 2010, Everything You Need to Know About Exchange Backups*.

    When troubleshooting backups in Exchange 2010 we are interested in two writers – the Exchange Information Store Writer (utilized for active copy backups) and the Exchange Replica Writer (utilized for passive copy backups). The writers are responsible for providing the metadata information for databases to the VSS Requestor (aka the backup software). The VSS Provider is the component that creates and maintains shadow copies. At the end of successful backups, when the Volume Shadow Copy Service signals backup is complete, the writers initiate post-backup steps which include updating the database header and performing log truncation. (For more details, see Exchange VSS Writers on MSDN.)

    As explained above, it is the responsibility of the VSS Requestor to get metadata information from the Exchange writers, and at the end of a successful backup the VSS service signals backup complete to the Exchange writers so the writers can perform post-backup operations.

    The purpose of this blog is to discuss the VSSTester script, its functionality and how it can help diagnose backup problems.

    What does the script do?

    The script has two major functions:

    1. Perform a Diskshadow backup of a selected Exchange database to exercise the VSS framework on the system; at the end of a successful snapshot, the database header is updated and log files are truncated. We will discuss in detail what Diskshadow is and what it does.
    2. Collect diagnostic data. For backup cases, there is a lot of data that needs to be collected, and getting it normally means manually enabling logging in several different places on the Exchange server. If that is not done correctly, we will miss crucial logs from the time of the issue. The script makes the data collection process much easier.


    Script requirements

    1. The current version of the script works only on Exchange 2010 servers.
    2. The script needs to be run on the Exchange server that is experiencing backup issues. If you are having issues with passive copy backups, please go to the appropriate node in the DAG and run the script. For example: You may have Database A having copies on Server1, Server2 and Server3. Server1 hosts the active copy of the database. If backups of the active copy have previously failed run the script on Server1. Otherwise run script on whichever of the remaining servers has failed previously when backing up the passive copy.
    3. Please ensure that you have enough space on the drive where you save the configuration and output files. Exchange traces, VSS traces and diagnostic logs can occupy several GB of drive space, depending on how long the backup takes. For example, running the script in a lab environment consumed close to 25 MB of drive space per minute.
    4. The script is unsigned. On the server where you run the script you will have to set the PowerShell execution policy to allow unsigned scripts (see Set-ExecutionPolicy for how to do this).

    The script can be run on any DAG configuration. You can use this to troubleshoot Mailbox and Public folder database backup issues. Databases and log files can be on regular drives or mount points. Mix and match of the two will also work!

    Let us discuss in detail the two main functionalities of the script.

    Diskshadow functionality and how the script uses it

    What is Diskshadow and why do we utilize it in VSSTester script?

    Diskshadow.exe is a command line tool built into the Windows Server 2008 operating system family as well as Windows Server 2012. Diskshadow is an in-box VSS requestor, and it is utilized to test the functionality provided by the Volume Shadow Copy Service (VSS). For more details on Diskshadow please visit:

    http://technet.microsoft.com/en-us/library/ee221016(v=ws.10).aspx

    http://blogs.technet.com/b/josebda/archive/2007/11/30/diskshadow-the-new-in-box-vss-requester-in-windows-server-2008.aspx

    The best part about Diskshadow is that it includes a script mode for automating tasks. This feature of Diskshadow is utilized in the VSSTester. The shadow copy done by Diskshadow is a snapshot of the entire volume at a given point in time. This copy is read-only.
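    To make the script mode concrete, a minimal Diskshadow script of the kind you could run with diskshadow /s backup.dsh might look like the following. The Z: volume and the Y: exposure drive letter are hypothetical, and the configuration file VSSTester generates at runtime is more involved than this sketch:

    ```
    SET VERBOSE ON
    SET CONTEXT PERSISTENT
    BEGIN BACKUP
    ADD VOLUME Z: ALIAS dbVolume
    CREATE
    EXPOSE %dbVolume% Y:
    END BACKUP
    ```

    BEGIN BACKUP/END BACKUP wraps the snapshot in a full-backup context, CREATE takes the shadow copy of the added volume, and EXPOSE mounts the read-only snapshot under a free drive letter.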

    For more details on how a shadow copy is created, please visit the following link: http://technet.microsoft.com/en-us/library/ee923636(v=ws.10).aspx

    During the course of this blog post, I will be mentioning the term “Diskshadow backup”. It is very important to understand that the term “backup” is relative here. Diskshadow uses the VSS service and gets the appropriate writer to be utilized for the snapshot. The writer provides the metadata information for the database and log files to Diskshadow, after which Diskshadow utilizes the VSS Provider to create a shadow copy.

    After a successful shadow copy (snapshot) of the databases and log files, the VSS Provider signals an end-backup to the Exchange writers. To Exchange this looks like a full backup has been performed on the database. The key thing to understand here is that NO data is actually transferred to a device, tape, etc. This is only a test! You will see the events in the application log that usually show up when you take a regular backup, but NO data is actually backed up. Diskshadow has simply run all the backup APIs through the backup process without transferring any data.

    The VSS Provider will take a snapshot of all the databases and logs (if present) on the volume. We will be doing a mirrored snapshot of the entire volume at the point in time when Diskshadow was run. Anything that is on the volume will be part of the snapshot. During the Diskshadow backup, we will be utilizing either the Information store writer (for active copy backup) or the Replica Writer (Passive copy backup) to provide the metadata information for the database.

    When you use the VSSTester script, it prompts you to select a database to perform the Diskshadow backup against. When we take a snapshot of the volume, all other databases (if present on the same drive) will be part of the snapshot, but post-backup operations will happen only on the selected database. This is because we will be utilizing only the Information Store Writer (active copy backup) or the Replica Writer (passive copy backup) that is associated with the selected database. Database headers get updated based on the VSS Requestor's interaction with the Exchange writer that was utilized, which in turn leads to log truncation. Hence, only the header of the selected database will be updated, and logs will be purged only for the selected database, without being backed up.

    When would you be interested in utilizing this Diskshadow functionality of the script?

    You would want to utilize this functionality in almost all of the scenarios discussed at the start of this blog post. In addition to those scenarios, another one that is not related to backups sometimes arises:

    • “I had an unexpected high transactional log growth issue in my Exchange 2010 environment and now I am on the verge of losing all disk space in the logs directory. I do not have the time to perform a backup to truncate logs and my goal is to safely remove all the log files”

    In the scenario mentioned above (and, by the way, if you have that problem, please go here), Exchange administrators would like to avoid causing a service outage by dismounting the database, removing log files and remounting the database. Another downside to manually removing the log files is breaking replication if the database has replicas across Database Availability Group members.

    If you are willing to forgo a backup of the log files you can use the Diskshadow functionality of the script to trigger the backup APIs and tell Exchange to truncate the log files. The truncation commands will replicate to the other database copies and purge log files there as well. If successful, the net result is that the database will not go offline for lack of disk space on the log drive, but you will not have the security of retaining those log files for a future restore.

    A sample run of the VSSTester script (with Diskshadow functionality)

    Let me demonstrate the Diskshadow functionality of the script.

    The Script can be downloaded from TechNet gallery here.

    The script initializes and gives us the following options.

    image

    We select option 1 to test backup using the built-in Diskshadow function.

    image

    If the path does not exist, the script will create the folder for you.

    We gather the server name and verify it is an Exchange 2010 server. The script then checks the VSS writer status on the local machine. If we detect that any of the writers are not in a “Stable” state, the script will exit. You will need to restart the service associated with the writer to get the writers back to a stable state (the Replication service for the Replica Writer or the Information Store service for the Exchange Writer).

    The script then gets a list of databases present on the local server and displays each database's name, whether it is mounted, and which server holds the active copy. You then select a database by its number.

    Note: If the user does not provide an input, the script will automatically select the last database in the list.

    In my case, I selected database mdb5. The number to enter would be 8.

    image

    The next important check is ensuring that the database’s replicas (if present) are healthy. If we detect that one of the copies is not healthy, the script will exit, noting that the database copies need to be healthy before running the script.

    image

    The script next detects the location of the database file and log files. We create the Diskshadow configuration file on the fly every time a database is selected, and it is saved to the location you specified earlier for configuration and output files (c:\vsstesterlogs in this blog's example screenshots). In this case the log files are in a mount point and the database file is on a regular volume. The script will add the appropriate volumes to the Diskshadow file.

    image

    The script will then prompt you to provide the drive letters to expose the snapshots. A common question that arises is, do I need to initialize the drive before I specify a drive letter? The answer is no!

    You will be specifying a drive letter that is currently not in use; Diskshadow will create a virtual drive there and expose the snapshot. Remember, the shadow copy is read-only, so the virtual drive that exposes it is a read-only volume. If the database and logs are on the same mount point or drive, only one drive letter is required to expose the snapshot; otherwise you will need to provide two different drive letters, one for exposing the database snapshot and another for the log files.

    image

    When you select the option to perform the Diskshadow backup, the script will automatically collect diagnostic logs, ExTRA traces and VSS traces. Verbose logging is also turned on for Diskshadow. Everything the script does is also recorded in a transcript log and saved in the output files directory (c:\vsstesterlogs in this example).

    image

    Note: If you are performing a passive copy backup, ExTRA tracing will also be turned on on the active node. At the end of the script, we turn off ExTRA tracing on the active node and the resulting ETL file is automatically moved to the passive node, where it is placed in the logs folder you specified at the start of the script.

    Now, the main Diskshadow function will execute.

    In the screenshots below we have excluded all other writers on the system that are associated with the other databases on the node (whether mounted or replicas), and we are ONLY utilizing the writer associated with the selected database. This node hosts the passive copy of the database MDB5; hence, the writer utilized will be the one associated with the Replication service, aka the Microsoft Exchange Replica Writer.

    image

    image

    From the screenshot below, you can see that the VSS Provider has taken a successful snapshot of the database and signaled end-backup to the Replica Writer.

    image

    Now that we performed a successful snapshot of the database and log files, all the logging that was turned on will be turned off. The log files will be consolidated in the logs folder that you specified earlier at the start of the script. The script checks the VSS writer status after the backup is complete.

    image

    When the snapshot operation is complete, you will be prompted for an option to either remove the snapshot or leave the snapshots exposed in Windows Explorer.

    image

    I selected the option to remove the snapshot; hence we will be invoking Diskshadow again to delete the snapshot created earlier.

    Let us discuss the expose and remove snapshot options in detail:

    1. Remove snapshots – If the snapshot operation was successful, the snapshots taken earlier (database and log files) are exposed in Windows Explorer under the drive letter(s) you specified. If you do not want to keep a copy of the log files, you may choose this option and the snapshot will be deleted. All the logs that were purged during post-backup are present in this read-only volume, and when the volume is removed they are deleted forever.
    2. Expose snapshots – You may choose to leave the snapshots exposed. Later, if you want to delete the snapshot, do the following:
      • Open Command prompt
      • Type in Diskshadow
      • Delete shadows exposed <volume>

    Note: It is highly recommended to take a full backup of the database using your regular backup software after utilizing Diskshadow.

    After this, the script collects the application and system logs, filtering them to cover only the period from when you started the script to the present. The transcript log is also stopped. The logs are saved as text files in the output folder you specified earlier (c:\vsstesterlogs in this example).

    image

    The most reliable method to verify that log truncation takes place is to get the log sequence before and after the backup. Hence, before running the script I ran eseutil /ml Enn (where Enn is the log generation prefix associated with the database).

    image

    Post-backup, I ran the same command and can see:

    image

    We can clearly see a difference in the start of the sequence, meaning log truncation has occurred for the database. One more verification that can be done is to check the database header, and we can see that the database header got updated to the time Diskshadow was run.

    image

    I ran the script; what have I accomplished?

    If the script finished successfully:

    • We were able to successfully test and exercise the underlying VSS framework on the server; the Volume Shadow Copy Service was able to successfully identify and utilize the Exchange writers
    • The Exchange writers were able to provide the metadata information to the VSS Requestor (Diskshadow)
    • The VSS Provider was able to successfully create a snapshot/shadow copy
    • VSS successfully signaled backup complete to the Exchange writers
    • The Exchange writers were able to perform the post-snapshot operations, which included log truncation

    Let us now look in to the other major functionality of the script.

    Enable logging to troubleshoot backup issues

    Use this if you do not want to test backup using Diskshadow and you just want to collect diagnostic logs for troubleshooting backup issues.

    You can collect the diagnostic logs and have them handy before calling Microsoft Support, saving a lot of time in the support incident because you can provide the files at the beginning of the case.

    This time we will be selecting option 2 to enable logging.

    image

    Selecting this option does the majority of the things that the script did earlier, EXCEPT Diskshadow of course!

    After checking the writer status, you can select the database to back up. We will be enabling all the logging like before (diagnostic logging, ExTRA, VSS tracing). Remember that even though you select a single database, diagnostic logging, ExTRA tracing and VSS tracing are not database-specific and are turned on at the server level. When you are utilizing the script to troubleshoot backup issues you can select any one database on the server and it will turn on the appropriate logging for the server.

    After the logging is turned on and traces enabled, you will see:

    image
    (click to view)

    Now you will need to start your regular backup. After the backup completes or fails, return to the PowerShell window where you are running the script and press ENTER to terminate the data collection. The script then disables the diagnostic logging and tracing that was turned on earlier. If needed, it will copy diagnostic logs from the active node for that database copy as well.

    The script will again check the writer status after the backup, then collect the application and system logs. It will stop the transcript log as well.

    At this point, in order to troubleshoot the issue, you can open a case with Microsoft Support and upload the logs.

    I hope this script helps you in better understanding the core concepts in Exchange 2010 backups, thus helping you troubleshoot backup issues! You can utilize Diskshadow to test Volume Shadow Copy Service and also check if the Exchange writers are performing as intended. If Diskshadow completes successfully without any error and you are still experiencing issues with backup software, you may need to contact the backup vendor to further troubleshoot the issue.

    Your feedback and comments are most welcome.

    Special thanks to Michael Barta for his contribution to the script, Theo Browning and Jesse Tedoff for reviewing the content.

    Muralidharan Natarajan

  • Exchange 2010 Database Availability Groups and Disk Sector Sizes

    These days, some customers are deploying Exchange databases and log files on advanced format (4K) drives.  Although these drives support a physical sector size of 4096, many vendors are emulating 512 byte sectors in order to maintain backwards compatibility with application and operating systems.  This is known as 512 byte emulation (512e).  Windows 2008 and Windows 2008 R2 support native 512 byte and 512 byte emulated advanced format drives.  Windows 2012 supports drives of all sector sizes.  The sector size presented to applications and the operating system, and how applications respond, directly affects data integrity and performance.
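    As a quick mental model for the terminology above, a drive's format follows directly from the two sector sizes it reports, i.e. the values FSUTIL shows as Bytes Per Sector and Bytes Per Physical Sector. The helper below is an illustrative sketch, not a Windows API:

    ```python
    def classify_sector_format(logical, physical):
        """Classify a drive from its logical/physical sector sizes in bytes."""
        if (logical, physical) == (512, 512):
            return "512-byte native"
        if (logical, physical) == (512, 4096):
            return "512-byte emulation (512e)"
        if (logical, physical) == (4096, 4096):
            return "4K native"
        return "unrecognized"

    print(classify_sector_format(512, 512))   # a native 512-byte volume
    print(classify_sector_format(512, 4096))  # an advanced format drive presenting 512e
    ```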

    For more information on sector sizes see the following links:

    When deploying an Exchange 2010 Database Availability Group (DAG), the sector sizes of the volumes hosting the databases and log files must be the same across all nodes within the DAG.  This requirement is outlined in Understanding Storage Configuration.

    Support requires that all copies of a database reside on the same physical disk type. For example, it is not a supported configuration to host one copy of a given database on a 512-byte sector disk and another copy of that same database on a 512e disk. Also be aware that 4-kilobyte (KB) sector disks are not supported for any version of Microsoft Exchange and 512e disks are not supported for any version of Exchange prior to Exchange Server 2010 SP1.

    Recently, we have noted that some customers have experienced issues with log file replication and replay as the result of sector size mismatch.  These issues occur when:

    • Storage drivers are upgraded, resulting in a change to the recognized sector size.
    • Storage firmware is upgraded, resulting in a change to the recognized sector size.
    • New storage is presented, or existing storage is replaced, with drives of a different sector size.

    This mismatch can cause one or more database copies in a DAG to fail, as illustrated below. In my example environment, I have a three-member DAG with a single database that resides on a volume labeled Z that is replicated between each member.

    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
    ---- ------ --------------- ----------------- -------------------- -----------------
    SectorTest\MBX-1 Mounted 0 0 Healthy
    SectorTest\MBX-2 Healthy 0 1 3/19/2013 10:27:50 AM Healthy
    SectorTest\MBX-3 Healthy 0 1 3/19/2013 10:27:50 AM Healthy

    If I use FSUTIL to query the Z volume on each DAG member, we can see that the volume currently has 512 logical bytes per sector and 512 physical bytes per sector. Thus, the volume is currently seen by the operating system as having a native 512 byte sector size.

    On MBX-1:

    C:\>fsutil fsinfo ntfsinfo z:

    NTFS Volume Serial Number :       0x18d0bc1dd0bbfed6
    Version :                         3.1
    Number Sectors :                  0x000000000fdfe7ff
    Total Clusters :                  0x0000000001fbfcff
    Free Clusters  :                  0x0000000001fb842c
    Total Reserved :                  0x0000000000000000
    Bytes Per Sector  :               512
    Bytes Per Physical Sector :       512

    Bytes Per Cluster :               4096
    Bytes Per FileRecord Segment    : 1024
    Clusters Per FileRecord Segment : 0
    Mft Valid Data Length :           0x0000000000040000
    Mft Start Lcn  :                  0x00000000000c0000
    Mft2 Start Lcn :                  0x0000000000000002
    Mft Zone Start :                  0x00000000000c0040
    Mft Zone End   :                  0x00000000000cc840
    RM Identifier:        EF486117-9094-11E2-BF55-00155D006BA1

    On MBX-3:

    C:\>fsutil fsinfo ntfsinfo z:

    NTFS Volume Serial Number :       0x0ad44aafd44a9d37
    Version :                         3.1
    Number Sectors :                  0x000000000fdfe7ff
    Total Clusters :                  0x0000000001fbfcff
    Free Clusters  :                  0x0000000001fad281
    Total Reserved :                  0x0000000000000000
    Bytes Per Sector  :               512
    Bytes Per Physical Sector :       512

    Bytes Per Cluster :               4096
    Bytes Per FileRecord Segment    : 1024
    Clusters Per FileRecord Segment : 0
    Mft Valid Data Length :           0x0000000000040000
    Mft Start Lcn  :                  0x00000000000c0000
    Mft2 Start Lcn :                  0x0000000000000002
    Mft Zone Start :                  0x00000000000c0000
    Mft Zone End   :                  0x00000000000cc820
    RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3

    Effects of storage changes

    But what happens if there is a change in the way storage is presented to MBX-3, so that the volume now reflects a 512e sector size?  This can happen when upgrading storage drivers, upgrading firmware, or presenting new storage that implements the advanced format.

    C:\>fsutil fsinfo ntfsinfo z:

    NTFS Volume Serial Number :       0x0ad44aafd44a9d37
    Version :                         3.1
    Number Sectors :                  0x000000000fdfe7ff
    Total Clusters :                  0x0000000001fbfcff
    Free Clusters  :                  0x0000000001fad2e7
    Total Reserved :                  0x0000000000000000
    Bytes Per Sector  :               512
    Bytes Per Physical Sector :       4096

    Bytes Per Cluster :               4096
    Bytes Per FileRecord Segment    : 1024
    Clusters Per FileRecord Segment : 0
    Mft Valid Data Length :           0x0000000000040000
    Mft Start Lcn  :                  0x00000000000c0000
    Mft2 Start Lcn :                  0x0000000000000002
    Mft Zone Start :                  0x00000000000c0040
    Mft Zone End   :                  0x00000000000cc840
    RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3

    When reviewing the database copy status, notice that the copy assigned to MBX-3 has failed.

    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
    ---- ------ --------------- ----------------- -------------------- -----------------
    SectorTest\MBX-1 Mounted 0 0 Healthy
    SectorTest\MBX-2 Healthy 0 0 3/19/2013 11:13:05 AM Healthy
    SectorTest\MBX-3 Failed 0 8 3/19/2013 11:13:05 AM Healthy

    The full copy status of MBX-3 can be reviewed to display the detailed error:

    [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\MBX-3 | fl

    RunspaceId                       : 5f4bb58b-39fb-4e3e-b001-f8445890f80a
    Identity                         : SectorTest\MBX-3
    Name                             : SectorTest\MBX-3
    DatabaseName                     : SectorTest
    Status                           : Failed
    MailboxServer                    : MBX-3
    ActiveDatabaseCopy               : mbx-1
    ActivationSuspended              : False
    ActionInitiator                  : Service
    ErrorMessage                     : The log copier was unable to continue processing for database 'SectorTest\MBX-3' because an error occurred on the target server: Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after a short delay.
    ErrorEventId                     : 2152
    ExtendedErrorInfo                :
    SuspendComment                   :
    SinglePageRestore                : 0
    ContentIndexState                : Healthy
    ContentIndexErrorMessage         :
    CopyQueueLength                  : 0
    ReplayQueueLength                : 7
    LatestAvailableLogTime           : 3/19/2013 11:13:05 AM
    LastCopyNotificationedLogTime    : 3/19/2013 11:13:05 AM
    LastCopiedLogTime                : 3/19/2013 11:13:05 AM
    LastInspectedLogTime             : 3/19/2013 11:13:05 AM
    LastReplayedLogTime              : 3/19/2013 10:24:24 AM
    LastLogGenerated                 : 53
    LastLogCopyNotified              : 53
    LastLogCopied                    : 53
    LastLogInspected                 : 53
    LastLogReplayed                  : 46
    LogsReplayedSinceInstanceStart   : 0
    LogsCopiedSinceInstanceStart     : 0
    LatestFullBackupTime             :
    LatestIncrementalBackupTime      :
    LatestDifferentialBackupTime     :
    LatestCopyBackupTime             :
    SnapshotBackup                   :
    SnapshotLatestFullBackup         :
    SnapshotLatestIncrementalBackup  :
    SnapshotLatestDifferentialBackup :
    SnapshotLatestCopyBackup         :
    LogReplayQueueIncreasing         : False
    LogCopyQueueIncreasing           : False
    OutstandingDumpsterRequests      : {}
    OutgoingConnections              :
    IncomingLogCopyingNetwork        :
    SeedingNetwork                   :
    ActiveCopy                       : False

    Using the Exchange Server Error Code Look-up tool (ERR.EXE), we can verify the definition of the error code –546.

    D:\Utilities\ERR>err -546

    # for decimal -546 / hex 0xfffffdde
      JET_errLogSectorSizeMismatch                                   esent98.h
    # /* the log file sector size does not match the current
    # volume's sector size */
    # 1 matches found for "-546"
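
    The decimal-to-hex mapping that ERR.EXE prints is simply the 32-bit two's-complement representation of the JET error code. As an illustration only (this is not part of the ERR tool), a couple of lines of Python reproduce it:

```python
# JET_errLogSectorSizeMismatch is reported by Exchange as decimal -546.
# Its hex form, as shown by ERR.EXE, is the 32-bit two's-complement
# representation of that negative number.
JET_errLogSectorSizeMismatch = -546

hex_form = format(JET_errLogSectorSizeMismatch & 0xFFFFFFFF, '08x')
print(f"0x{hex_form}")  # 0xfffffdde
```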

    In addition, the Application event log may contain the following entries:

    Log Name:      Application
    Source:        MSExchangeRepl
    Date:          3/19/2013 11:14:58 AM
    Event ID:      2152
    Task Category: Service
    Level:         Error
    User:          N/A
    Computer:      MBX-3.exchange.msft
    Description:
    The log copier was unable to continue processing for database 'SectorTest\MBX-3' because an error occurred on the target server: Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after a short delay.

    The cause

    Why does this issue occur?
    Each log file records in its header the sector size of the disk on which the log file was created.  For example, this is the header of a log file on MBX-1 with a native 512 byte sector size:

    Z:\SectorTest>eseutil /ml E0100000001.log

    Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
    Version 14.02
    Copyright (C) Microsoft Corporation. All Rights Reserved.
    Initiating FILE DUMP mode... 
          Base name: E01
          Log file: E0100000001.log
          lGeneration: 1 (0x1)
          Checkpoint: (0x38,FFFF,FFFF)
          creation time: 03/19/2013 09:40:14
          prev gen time: 00/00/1900 00:00:00
          Format LGVersion: (7.3704.16.2)
          Engine LGVersion: (7.3704.16.2)
          Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
          Env SystemPath: z:\SectorTest\
          Env LogFilePath: z:\SectorTest\
         Env Log Sec size: 512 (matches)
          Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
              (    off,   1227,  61350,  16384,  61350,   2048,   2048,  44204)
          Using Reserved Log File: false
          Circular Logging Flag (current file): off
          Circular Logging Flag (past files): off
          Checkpoint at log creation time: (0x1,8,0) 
          Last Lgpos: (0x1,A,0)
    Number of database page references:  0
    Integrity check passed for log file: E0100000001.log
    Operation completed successfully in 0.62 seconds.

    The sector size that is chosen is determined through one of two methods:

    • If the log stream is brand new, read the sector size from disk and utilize this sector size.
    • If the log stream already exists, use the sector size of the given log stream.

    In theory, this should never cause a problem: the sector size of a disk should not change, and the sector sizes of all disks across the DAG must match.  In practice, as in our example and in some customer environments, sector sizes do change.  Because most of these databases already exist, the existing sector size of the log stream is used, which in turn causes a mismatch between DAG members.
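
    The two rules above can be sketched as a small decision function. This is an illustrative Python sketch of the behavior as described, not actual ESE source code:

```python
def log_stream_sector_size(existing_stream_sector_size, disk_sector_size):
    """Sketch of the two rules above: a brand-new log stream adopts the
    disk's current sector size, while an existing stream keeps the size
    recorded in its log headers, even if the underlying disk has changed."""
    if existing_stream_sector_size is None:    # brand-new log stream
        return disk_sector_size
    return existing_stream_sector_size         # existing stream: keep recorded size

# A pre-existing 512 byte log stream stays at 512 even after the volume
# begins reporting 4096 byte physical sectors -- hence the mismatch.
print(log_stream_sector_size(512, 4096))   # 512
print(log_stream_sector_size(None, 4096))  # 4096
```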

    When a mismatch occurs, the issue only prevents the successful use of block mode replication.  It does not affect file mode replication.  Block mode replication was introduced in Exchange 2010 Service Pack 1.  For more information on block mode replication, see New High Availability and Site Resilience Functionality in Exchange 2010 SP1.

    Why does this only affect block mode replication?
    When a log file is addressed, locations within it are referenced by log file position (LGPOS), which is a combination of the log generation, the sector, and the offset within that sector.  For example, in the previous header dump you can see the “Last Lgpos” is (0x1,A,0) – this happens to be the last log file position within the log.  Let us say we were creating a block for block mode replication at log generation 0x1A, sector 8, offset 1 – this would be reflected as an LGPOS of (0x1A,8,1).  When this block is transmitted to a host with an advanced sector size disk, the log position has to be translated: on an advanced format disk this same log position would be (0x1A,1,1).  As you can see, writing to or reading from incorrect positions within a log file could create significant problems.
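
    To make the translation concrete, here is a small Python sketch that reproduces the example above. It assumes (for illustration only) that the sector and offset components of an LGPOS simply encode a byte position within the log file:

```python
def translate_lgpos(generation, sector, offset, old_sector_size, new_sector_size):
    """Re-express an LGPOS (generation, sector, offset) recorded against one
    sector size in terms of another, treating sector/offset as encoding a
    byte position within the log file (illustrative assumption)."""
    byte_position = sector * old_sector_size + offset
    return (generation,
            byte_position // new_sector_size,   # sector on the new disk
            byte_position % new_sector_size)    # offset within that sector

# LGPOS (0x1A, 8, 1) on a 512 byte sector disk maps to the same byte
# on a 4096 byte sector disk as LGPOS (0x1A, 1, 1).
gen, sec, off = translate_lgpos(0x1A, 8, 1, 512, 4096)
print(hex(gen), sec, off)
```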

    The resolution

    How do I go about correcting this condition?
    To fix this condition, first ensure that the same sector sizes exist on all disks across all nodes that host Exchange data, and then reset the log stream.

    The following steps can show you how to do this with minimal downtime.

    1. Ensure that Exchange 2010 Service Pack 2 or later is installed on all DAG members.

      Note: Versions of Exchange earlier than Exchange 2010 Service Pack 1 do not support 512e volumes.

    2. Disable block mode replication on all hosts.  This step requires restarting the Microsoft Exchange Replication service on each node.  Restarting the service on the active node will temporarily cause all copies on the passive nodes to fail; restarting it on a passive node will fail only the passive copies on that node.  Mounted databases and client connections are not impacted by this activity.  Block mode replication should remain disabled until all steps have been completed on all DAG members.
      1. Launch Registry Editor.
      2. Navigate to HKLM\Software\Microsoft\ExchangeServer\V14\Replay\Parameters.
      3. Right-click the Parameters key and select New –> DWORD.
      4. Name the DWORD DisableGranularReplication.
      5. Set the DWORD value to 1.
    3. Restart the Microsoft Exchange Replication service on each member using the Shell: Restart-Service MSExchangeRepl

    4. Validate that all copies of databases across DAG members are healthy at this time:

      [PS] C:\>Get-MailboxDatabaseCopyStatus *

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Mounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 0 3/19/2013 12:28:34 PM Healthy
      SectorTest\MBX-3 Healthy 0 0 3/19/2013 12:28:34 PM Healthy

    5. Apply the appropriate Advanced Format disk hotfix for Windows Server 2008 or Windows Server 2008 R2.  Windows Server 2012 does not require a hotfix.

      • Windows 2008 R2: KB 982018 An update that improves the compatibility of Windows 7 and Windows Server 2008 R2 with Advanced Format Disks is available
      • Windows 2008: KB 2553708 A hotfix rollup that improves Windows Vista and Windows Server 2008 compatibility with Advanced Format disks
    6. Repeat the procedure that caused the disk sector size to change.  For example, if the issue arose as a result of upgrading drivers and firmware on a host, utilize your maintenance mode procedures to complete the driver and firmware upgrade on all hosts.

      Note: If your installation does not allow you to use the same sector sizes across all DAG members, then the implementation is not supported.

    7. Utilize FSUTIL to ensure that the sector sizes match across all hosts for the log and database volumes. 

      On MBX-1:

      C:\>fsutil fsinfo ntfsinfo z:

      NTFS Volume Serial Number :       0x18d0bc1dd0bbfed6
      Version :                         3.1
      Number Sectors :                  0x000000000fdfe7ff
      Total Clusters :                  0x0000000001fbfcff
      Free Clusters  :                  0x0000000001fac6e6
      Total Reserved :                  0x0000000000000000
      Bytes Per Sector  :               512
      Bytes Per Physical Sector :       4096

      Bytes Per Cluster :               4096
      Bytes Per FileRecord Segment    : 1024
      Clusters Per FileRecord Segment : 0
      Mft Valid Data Length :           0x0000000000040000
      Mft Start Lcn  :                  0x00000000000c0000
      Mft2 Start Lcn :                  0x0000000000000002
      Mft Zone Start :                  0x00000000000c0040
      Mft Zone End   :                  0x00000000000cc840
      RM Identifier:        EF486117-9094-11E2-BF55-00155D006BA1

      On MBX-2

      C:\>fsutil fsinfo ntfsinfo z:

      NTFS Volume Serial Number :       0xfa6a794c6a790723
      Version :                         3.1
      Number Sectors :                  0x000000000fdfe7ff
      Total Clusters :                  0x0000000001fbfcff
      Free Clusters  :                  0x0000000001fac86f
      Total Reserved :                  0x0000000000000000
      Bytes Per Sector  :               512
      Bytes Per Physical Sector :       4096

      Bytes Per Cluster :               4096
      Bytes Per FileRecord Segment    : 1024
      Clusters Per FileRecord Segment : 0
      Mft Valid Data Length :           0x0000000000040000
      Mft Start Lcn  :                  0x00000000000c0000
      Mft2 Start Lcn :                  0x0000000000000002
      Mft Zone Start :                  0x00000000000c0040
      Mft Zone End   :                  0x00000000000cc840
      RM Identifier:        5F18A2FC-909E-11E2-8599-00155D006BA2

      On MBX-3

      C:\>fsutil fsinfo ntfsinfo z:

      NTFS Volume Serial Number :       0x0ad44aafd44a9d37
      Version :                         3.1
      Number Sectors :                  0x000000000fdfe7ff
      Total Clusters :                  0x0000000001fbfcff
      Free Clusters  :                  0x0000000001fabfd6
      Total Reserved :                  0x0000000000000000
      Bytes Per Sector  :               512
      Bytes Per Physical Sector :       4096

      Bytes Per Cluster :               4096
      Bytes Per FileRecord Segment    : 1024
      Clusters Per FileRecord Segment : 0
      Mft Valid Data Length :           0x0000000000040000
      Mft Start Lcn  :                  0x00000000000c0000
      Mft2 Start Lcn :                  0x0000000000000002
      Mft Zone Start :                  0x00000000000c0040
      Mft Zone End   :                  0x00000000000cc840
      RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3

    At this point, the DAG should be stable, and replication should be occurring as expected between databases using file mode. In order to restore block mode replication and fully recognize the new disk sector sizes, the log stream must be reset.

    IMPORTANT: Please note the following about resetting the log stream:

    • The log stream must be fully reset on all database copies.
    • All lagged database copies must be replayed to current log.
    • If backups are utilized as a recovery method, resetting the log stream introduces a gap in the log file sequence, preventing a full roll-forward recovery from the last backup point.

    You can use the following steps to reset the log stream:

    1. Validate the existence of a replay queue:

      [PS] C:\>Get-MailboxDatabaseCopyStatus *

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Mounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 0 3/19/2013 1:34:37 PM Healthy
      SectorTest\MBX-3 Healthy 0 138 3/19/2013 1:34:37 PM Healthy

    2. Set the replay and truncation lag time values to 0 on all database copies. This ensures that logs replay to current while allowing the databases to remain online. In this example, MBX-3 is a lagged database copy. When the configuration change is detected, log replay will occur, allowing the lagged copy to eventually catch up. Note that depending on the replay lag time, this could take several hours; do not proceed to the next step until the lagged copy has caught up.

      [PS] C:\>Set-MailboxDatabaseCopy SectorTest\MBX-3 -ReplayLagTime 0.0:0:0 -TruncationLagTime 0.0:0:0

      Validate that the replay queue has caught up and is near zero.

      [PS] C:\>Get-MailboxDatabaseCopyStatus *

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Mounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 0 3/19/2013 1:34:37 PM Healthy
      SectorTest\MBX-3 Healthy 0 0 3/19/2013 1:34:37 PM Healthy

    3. Dismount the database.

      CAUTION: Dismounting the database will cause a client interruption, which will continue until the database is mounted.

      [PS] C:\>Dismount-Database SectorTest

      Confirm
      Are you sure you want to perform this action?
      Dismounting database "SectorTest". This may result in reduced availability for mailboxes in the database.
      [Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y
      [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*
      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Dismounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 0 3/25/2013 5:41:54 AM Healthy
      SectorTest\MBX-3 Healthy 0 0 3/25/2013 5:41:54 AM Healthy

    4. On each DAG member hosting a database copy, open a command prompt and navigate to the log file directory. Execute eseutil /r ENN to perform a soft recovery. This step is necessary to ensure that all log files are played into all copies.

      Z:\SectorTest>eseutil /r e01

      Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
      Version 14.02
      Copyright (C) Microsoft Corporation. All Rights Reserved.
      Initiating RECOVERY mode...
          Logfile base name: e01
                  Log files: <current directory>
               System files: <current directory>
      Performing soft recovery...
                            Restore Status (% complete) 
                0    10   20   30   40   50   60   70   80   90  100
                |----|----|----|----|----|----|----|----|----|----|
                ...................................................
      Operation completed successfully in 0.203 seconds.

    5. On each DAG member hosting a database copy, open a command prompt and navigate to the database directory. Execute eseutil /mh <EDB> against the database to dump the header. You must validate that the following information is correct on all database copies:

      • All copies of the database show in clean shutdown.
      • All copies of the database show the same last detach information.
      • All copies of the database show the same last consistent information.

      Here is example output of a full /mh dump followed by a comparison of the data across our three sample copies.

      Z:\SectorTest>eseutil /mh SectorTest.edb

      Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
      Version 14.02
      Copyright (C) Microsoft Corporation. All Rights Reserved.
      Initiating FILE DUMP mode...
               Database: SectorTest.edb
      DATABASE HEADER:
      Checksum Information:
      Expected Checksum: 0x010f4400
        Actual Checksum: 0x010f4400
      Fields:
              File Type: Database
               Checksum: 0x10f4400
         Format ulMagic: 0x89abcdef
         Engine ulMagic: 0x89abcdef
      Format ulVersion: 0x620,17
      Engine ulVersion: 0x620,17
      Created ulVersion: 0x620,17
           DB Signature: Create time:03/19/2013 09:40:15 Rand:11009066 Computer:
               cbDbPage: 32768
                 dbtime: 601018 (0x92bba)
      State: Clean Shutdown
           Log Required: 0-0 (0x0-0x0)
          Log Committed: 0-0 (0x0-0x0)
         Log Recovering: 0 (0x0)
        GenMax Creation: 00/00/1900 00:00:00
               Shadowed: Yes
             Last Objid: 3350
           Scrub Dbtime: 0 (0x0)
             Scrub Date: 00/00/1900 00:00:00
           Repair Count: 0
            Repair Date: 00/00/1900 00:00:00
      Old Repair Count: 0
      Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:11
            Last Attach: (0x111,9,86)  03/19/2013 13:42:29
      Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:11
                   Dbid: 1
          Log Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
             OS Version: (6.1.7601 SP 1 NLS ffffffff.ffffffff)

      Previous Full Backup:
              Log Gen: 0-0 (0x0-0x0)
                 Mark: (0x0,0,0)
                 Mark: 00/00/1900 00:00:00

      Previous Incremental Backup:
              Log Gen: 0-0 (0x0-0x0)
                 Mark: (0x0,0,0)
                 Mark: 00/00/1900 00:00:00

      Previous Copy Backup:
              Log Gen: 0-0 (0x0-0x0)
                 Mark: (0x0,0,0)
                 Mark: 00/00/1900 00:00:00

      Previous Differential Backup:
              Log Gen: 0-0 (0x0-0x0)
                 Mark: (0x0,0,0)
                 Mark: 00/00/1900 00:00:00

      Current Full Backup:
              Log Gen: 0-0 (0x0-0x0)
                 Mark: (0x0,0,0)
                 Mark: 00/00/1900 00:00:00

      Current Shadow copy backup:
              Log Gen: 0-0 (0x0-0x0)
                 Mark: (0x0,0,0)
                 Mark: 00/00/1900 00:00:00 

           cpgUpgrade55Format: 0
          cpgUpgradeFreePages: 0
      cpgUpgradeSpaceMapPages: 0 

             ECC Fix Success Count: none
         Old ECC Fix Success Count: none
               ECC Fix Error Count: none
           Old ECC Fix Error Count: none
          Bad Checksum Error Count: none
      Old bad Checksum Error Count: none 

        Last checksum finish Date: 03/19/2013 13:11:36
      Current checksum start Date: 00/00/1900 00:00:00
            Current checksum page: 0

      Operation completed successfully in 0.47 seconds.

      MBX-1:

      State: Clean Shutdown
      Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:11
      Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:11

      MBX-2:

      State: Clean Shutdown
      Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:12
      Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:12

      MBX-3:

      State: Clean Shutdown
      Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:13
      Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:13

      In this case, the values match across all copies so further steps can be performed.

      If the values do not match across copies for any reason, do not continue and please contact Microsoft support.

    6. Reset the log file generation for the database.

      Note: Use Get-MailboxDatabaseCopyStatus to record database locations and status prior to performing this activity.

      Locate the log file directory for each ACTIVE (DISMOUNTED) database. Remove all log files from this directory first. Failure to remove log files from the ACTIVE (DISMOUNTED) database may result in the Replication service recopying log files, a failure of this procedure, and a subsequent need to reseed all database copies.

      IMPORTANT: If log files are located in the same location as the database and catalog data folder, take precautions not to remove the database or the catalog data folder.

      In our example MBX-1 hosts the ACTIVE (DISMOUNTED) copy.

      [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Dismounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 0 3/25/2013 5:41:54 AM Healthy
      SectorTest\MBX-3 Healthy 0 0 3/25/2013 5:41:54 AM Healthy

      Locate the log file directory for each PASSIVE database. Remove all log files from this directory. Failure to remove all log files could result in this procedure failing and the need to reseed this or all database copies. If log files are located in the same location as the database and catalog data folder, take precautions not to remove the database or the catalog data folder.

      In our example MBX-2 and MBX-3 host the passive database copies.

      [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Dismounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 0 3/25/2013 5:41:54 AM Healthy
      SectorTest\MBX-3 Healthy 0 0 3/25/2013 5:41:54 AM Healthy

    7. Mount the database using Mount-Database <DBNAME>, and verify it has mounted.

      [PS] C:\>Mount-Database SectorTest
      [PS] C:\>Get-MailboxDatabaseCopyStatus *

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Mounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 1 3/25/2013 5:57:28 AM Healthy
      SectorTest\MBX-3 Healthy 0 1 3/25/2013 5:57:28 AM Healthy

    8. Suspend and resume all passive database copies.

      Note: The error on suspending the active database copy is expected.

      [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Suspend-MailboxDatabaseCopy

      The suspend operation can't proceed because database 'SectorTest' on Exchange Mailbox server 'MBX-1' is the active mailbox database copy.
          + CategoryInfo          : InvalidOperation: (SectorTest\MBX-1:DatabaseCopyIdParameter) [Suspend-MailboxDatabaseCopy], InvalidOperationException
          + FullyQualifiedErrorId : 5083D28B,Microsoft.Exchange.Management.SystemConfigurationTasks.SuspendDatabaseCopy
          + PSComputerName        : mbx-1.exchange.msft

      Note: The error on resuming the active database copy is expected.

      [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Resume-MailboxDatabaseCopy

      WARNING: The Resume operation won't have an effect on database replication because database 'SectorTest' hosted on server 'MBX-1' is the active mailbox database.

    9. Validate replication health.

      [PS] C:\>Get-MailboxDatabaseCopyStatus *

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Mounted 0 0 Healthy
      SectorTest\MBX-2 Healthy 0 0 3/19/2013 1:56:12 PM Healthy
      SectorTest\MBX-3 Healthy 0 0 3/19/2013 1:56:12 PM Healthy

    10. Using Set-MailboxDatabaseCopy, reconfigure any replay lag or truncation lag time on the database copy. This example implements a 7 day replay lag time.

      Set-MailboxDatabaseCopy -Identity SectorTest\MBX-3 -ReplayLagTime 7.0:0:0

    11. Repeat the previous steps for all databases in the DAG including those databases that have a single copy.

      IMPORTANT: DO NOT proceed to the next step until all databases have been reset.

    12. Enable block mode replication. Using Registry Editor, navigate to HKLM\Software\Microsoft\ExchangeServer\V14\Replay\Parameters, and then remove the DisableGranularReplication DWORD value.

    13. Restart the replication service on each DAG member.

      Restart-Service MSExchangeRepl

    14. Validate database health using Get-MailboxDatabaseCopyStatus.

      [PS] C:\>Get-MailboxDatabaseCopyStatus *

      Name Status CopyQueueLength ReplayQueueLength LastInspectedLogTime ContentIndexState
      ---- ------ --------------- ----------------- -------------------- -----------------
      SectorTest\MBX-1 Healthy 0 0 3/19/2013 2:25:56 PM Healthy
      SectorTest\MBX-2 Mounted 0 0 Healthy
      SectorTest\MBX-3 Healthy 0 230 3/19/2013 2:25:56 PM Healthy

    15. Dump the header of a log file and verify that the new sector size is reflected in the log file stream. To do this, open a command prompt and navigate to the log file directory for the database on the active node. Run eseutil /ml against any log within the directory, and verify that the log sector size shows 4096 and reports (matches).

      Z:\SectorTest>eseutil /ml E0100000001.log

      Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
      Version 14.02
      Copyright (C) Microsoft Corporation. All Rights Reserved.

      Initiating FILE DUMP mode... 
            Base name: E01
            Log file: E0100000001.log
            lGeneration: 1 (0x1)
            Checkpoint: (0x17B,FFFF,FFFF)
            creation time: 03/19/2013 13:56:11
            prev gen time: 00/00/1900 00:00:00
            Format LGVersion: (7.3704.16.2)
            Engine LGVersion: (7.3704.16.2)
            Signature: Create time:03/19/2013 13:56:11 Rand:2996669 Computer:
            Env SystemPath: z:\SectorTest\
            Env LogFilePath: z:\SectorTest\
           Env Log Sec size: 4096 (matches)
            Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
                (    off,   1227,  61350,  16384,  61350,   2048,    256,  44204)
            Using Reserved Log File: false
            Circular Logging Flag (current file): off
            Circular Logging Flag (past files): off
            Checkpoint at log creation time: (0x1,1,0) 
            Last Lgpos: (0x1,2,0)
      Number of database page references:  0
      Integrity check passed for log file: E0100000001.log
      Operation completed successfully in 0.250 seconds.

    If the above steps have been completed successfully, and the log file sequence reflects a 4096-byte sector size, then this issue has been resolved.

    This guidance was validated in the following configurations:

    • Windows 2008 R2 Enterprise with Exchange 2010 Service Pack 2
    • Windows 2008 R2 Enterprise with Exchange 2010 Service Pack 3
    • Windows 2008 SP2 Enterprise with Exchange 2010 Service Pack 3
    • Windows 2012 Datacenter with Exchange 2010 Service Pack 3

    Tim McMichael

  • Updated: Exchange Server 2013 Deployment Assistant

    We’re happy to announce updates to the Exchange Server 2013 Deployment Assistant!

    We’ve updated the Deployment Assistant to include the following new scenarios:

    • Upgrading from Exchange 2007 to Exchange 2013
    • Upgrading from Exchange 2010 to Exchange 2013
    • Configuring an Exchange 2013-based hybrid deployment for Exchange 2007 organizations

    These new scenarios provide step-by-step guidance about how to upgrade your existing Exchange 2007 or Exchange 2010 organizations to benefit from the improvements and new features of Exchange 2013. Plus, Exchange 2007 organizations can now configure a hybrid deployment with Office 365 using Exchange 2013 instead of Exchange 2010 SP3 in their on-premises organization.

    And, there’s more on the way! We’re also working hard on additional scenarios, such as upgrading from a mixed Exchange Server 2007/2010 organization to Exchange 2013 and configuring Exchange 2013-based hybrid for Exchange 2010 organizations. Keep checking back here for release announcements.

    In case you're not familiar with it, the Exchange Server 2013 Deployment Assistant is a web-based tool that helps you deploy Exchange 2013 in your on-premises organization, configure a hybrid deployment between your on-premises organization and Office 365, or migrate to Office 365. The tool asks you a small set of simple questions and then, based on your answers, creates a customized checklist with instructions to deploy or configure Exchange 2013. Instead of trying to find what you need in the Exchange library, the Deployment Assistant gives you exactly the right information you need to complete your task. Supported on most major browsers, the Deployment Assistant is your one-stop shop for deploying Exchange 2013.

    Figure 1: The updated Exchange 2013 Deployment Assistant (large screenshot)

    And for those organizations that still need to deploy Exchange 2010 or are interested in configuring an Exchange 2010-based hybrid deployment with Office 365, you can continue to access the Exchange Server 2010 Deployment Assistant at http://technet.microsoft.com/exdeploy2010 (short URL: aka.ms/eda2010).

    Do you have a deployment success story about the Deployment Assistant? Do you have suggestions on how to improve the tool? We would love your feedback and comments! Feel free to leave a comment here, or send an email to edafdbk@microsoft.com directly or via the 'Feedback' link located in the header of every page of the Deployment Assistant.

    Happy deploying!

    The Deployment Assistant Team

  • Troubleshooting Rapid Growth in Databases and Transaction Log Files in Exchange Server 2007 and 2010

    A few years back, a very detailed blog post was released on Troubleshooting Exchange 2007 Store Log/Database growth issues.

    We wanted to revisit this topic with Exchange 2010 in mind. While the troubleshooting steps needed are virtually the same, we thought it would be useful to condense the steps a bit, make a few updates and provide links to a few newer KB articles.

    The below list of steps is a walkthrough of an approach that would likely be used when calling Microsoft Support for assistance with this issue. It also provides some insight as to what we are looking for and why. It is not a complete list of every possible troubleshooting step, as some causes are simply not seen quite as much as others.

    Another thing to note is that these steps are commonly used when we are seeing “rapid” or unexpected growth in the database file on disk or in the amount of transaction logs being generated. An example of this is when an administrator notices that a transaction log drive is close to running out of space but had several GB free the day before. When looking through historical records, the administrator notes that approximately 2 to 3 GB of logs have been backed up daily for several months, but the server is currently generating 2 to 3 GB of logs per hour. This is obviously a red flag for the log creation rate. The same principle applies to the database in scenarios where the rapid log growth is associated with new content creation.

    In other cases, the database size or transaction log file quantity may increase, but as a symptom of something else going on with the server. For example, if backups have been failing for a few days and the log files are not getting purged, the log file disk will start to fill up and appear to have more logs than usual. In this example, the cause wouldn’t be rapid log growth, but rather failing backups, which are responsible for purging the logs and must be fixed. Another example is with the database: if retention settings have been modified or online maintenance has not been completing, the database will begin to grow on disk and eat up free space. These scenarios and a few others are also discussed in the “Proactive monitoring and mitigation efforts” section of the previously published blog.

    It should be noted that in some cases, you may run into a scenario where the database size is expanding rapidly, but you do not experience log growth at a rapid rate. (As with new content creation in rapid log growth, we would expect the database to grow at a rapid rate with the transaction logs.) This is often referred to as database “bloat” or database “space leak”. The steps to troubleshoot this specific issue can be a little more invasive as you can see in some analysis steps listed here (taking databases offline, various kinds of dumps, etc.), and it may be better to utilize support for assistance if a reason for the growth cannot be found.

    Once you have established that the rate of growth for the database and transaction log files is abnormal, begin troubleshooting with the following steps. Note that in some cases the steps can be done out of order, but the list below provides general suggested guidance based on our experiences in support.

    Step 1

    Use Exchange User Monitor (Exmon) server side to determine if a specific user is causing the log growth problems.

    1. Sort on CPU (%) and look at the top 5 users that are consuming the most CPU inside the Store process. Check the Log Bytes column to see whether one of these users is also generating a high volume of log bytes.
    2. If that does not reveal a likely user, sort on the Log Bytes column to look for any users that could be contributing to the log growth.

    If the user shown in Exmon is a ?, this is representative of a HUB/Transport-related problem generating the logs. Query the message tracking logs using the Message Tracking Log tool in the Exchange Management Console's Toolbox to check for any large messages that might be running through the system. See Step 14 for a PowerShell script to accomplish the same task.

    Step 2

    With Exchange 2007 Service Pack 2 Rollup Update 2 and higher, you can use KB972705 to troubleshoot abnormal database or log growth by adding the described registry values. The registry values will monitor RPC activity and log an event if the thresholds are exceeded, with details about the event and the user that caused it. (These registry values are not currently available in Exchange Server 2010)

    Check for any excessive ExCDO warning events related to appointments in the application log on the server (for example, 8230 or 8264 events). If recurring meeting events are found, try to regenerate the calendar data server side via a process called POOF. See http://blogs.msdn.com/stephen_griffin/archive/2007/02/21/poof-your-calender-really.aspx for more information on what this is.

    Event Type: Warning
    Event Source: EXCDO
    Event Category: General
    Event ID: 8230
    Description: An inconsistency was detected in username@domain.com: /Calendar/<calendar item> .EML. The calendar is being repaired. If other errors occur with this calendar, please view the calendar using Microsoft Outlook Web Access. If a problem persists, please recreate the calendar or the containing mailbox.

    Event Type: Warning
    Event ID : 8264
    Category : General
    Source : EXCDO
    Type : Warning
    Message : The recurring appointment expansion in mailbox <someone's address> has taken too long. The free/busy information for this calendar may be inaccurate. This may be the result of many very old recurring appointments. To correct this, please remove them or change their start date to a more recent date.

    Important: If 8230 events are consistently seen on an Exchange server, have the user delete and recreate that appointment to remove any corruption.

    Step 3

    Collect and parse the IIS log files from the CAS servers used by the affected Mailbox server. You can use Log Parser Studio to easily parse IIS log files. Here, you can look for repeated user account sync attempts and suspicious activity. For example, a user with an abnormally high number of sync attempts and errors would be a red flag. If a user is found and suspected to be a cause of the growth, you can follow the suggestions given in steps 5 and 6.

    Once Log Parser Studio is launched, you will see convenient tabs to search per protocol:

    [Screenshot: Log Parser Studio per-protocol search tabs]

    Some example queries for this issue would be:

    [Screenshot: example Log Parser Studio queries]
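    If Log Parser Studio is not available, a rough equivalent can be sketched in PowerShell. The log path and the cs-username field position below are assumptions; adjust them to match the #Fields header of your IIS logs:

    ```powershell
    # Count requests per user in a W3C-format IIS log to spot abnormally
    # chatty clients. Field positions vary with the configured log format.
    $log = 'C:\inetpub\logs\LogFiles\W3SVC1\u_ex130319.log'   # hypothetical path
    Get-Content $log |
        Where-Object { $_ -notmatch '^#' } |                  # skip header lines
        ForEach-Object { ($_ -split ' ')[7] } |               # assumed cs-username column
        Group-Object |
        Sort-Object Count -Descending |
        Select-Object -First 10 Name, Count
    ```

    A user appearing at the top with an order of magnitude more requests than everyone else warrants a closer look at their individual entries.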

    Step 4

    If a suspected user is found via Exmon, the event logs, KB972705, or parsing the IIS log files, then do one of the following:

    • Disable MAPI access to the user's mailbox using the following steps (recommended):
      • Run

        Set-CASMailbox -Identity <Username> -MAPIEnabled $False

      • Move the mailbox to another mailbox store. Note: This is necessary to disconnect the user from the store due to the Store mailbox and DSAccess caches. Otherwise, you could potentially be waiting over 2 hours and 15 minutes for this setting to take effect. Moving the mailbox effectively kills the user's MAPI session to the server, and after the move, the user's access to the store via a MAPI-enabled client will be disabled.
    • Disable the user's AD account temporarily
    • Kill their TCP connection with TCPView
    • Call the user to have them close Outlook or turn off their mobile device for immediate relief.
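    If MAPI access was disabled as a mitigation, remember to undo it once the root cause has been addressed; a sketch:

    ```powershell
    # Re-enable MAPI access for the mailbox after the issue is resolved.
    Set-CASMailbox -Identity <Username> -MAPIEnabled $True
    ```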

    Step 5

    If closing the client or device, or killing their sessions, seems to stop the log growth issue, then we need to do the following to see if this is OST or Outlook profile related:

    Have the user launch Outlook while holding down the Ctrl key, which will prompt whether you would like to run Outlook in safe mode. If launching Outlook in safe mode resolves the log growth issue, then concentrate on which add-ins could be contributing to this problem.

    For a mobile device, consider a full resync or a new sync profile. Also check for any messages in the drafts folder or outbox on the device. A corrupted meeting or calendar entry is commonly found to be causing the issue with the device as well.

    If you can gain access to the user's machine, then do one of the following:

    1. Launch Outlook to confirm the log file growth issue on the server.

    2. If log growth is confirmed, do one of the following:

    • Check the user's Outbox for any messages.
      • If the user is running in cached mode, set the Outlook client to Work Offline. Doing this will help stop the message in the Outbox from being sent, and sometimes causes the message to NDR.
      • If the user is running in online mode, then try moving the message to another folder to prevent Outlook or the HUB server from processing the message.
      • After each of the steps above, check the Exchange server to see if log growth has ceased.
    • Call Microsoft Product Support to enable debug logging of the Outlook client to determine possible root cause.

    3. Follow the Running Process Explorer instructions in the article below to dump out the DLLs that are running within the Outlook process. Name the file username.txt. This helps check for any third-party Outlook add-ins that may be causing the excessive log growth.
    970920  Using Process Explorer to List dlls Running Under the Outlook.exe Process
    http://support.microsoft.com/kb/970920

    4. Check the Sync Issues folder for any errors that might be occurring.

    Let’s attempt to narrow this down further to see if the problem is truly in the OST or something possibly Outlook Profile related:

    • Run ScanPST against the user's OST file to check for possible corruption.
    • With the Outlook client shut down, rename the user's OST file to something else and then launch Outlook to recreate a new OST file. If the problem does not occur, we know the problem is within the OST itself.

    If the problem still occurs with a freshly created OST, then recreate the user's profile to see if this might be profile related.
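    The OST rename can be sketched from PowerShell, assuming the default data file location; the path and file name are examples, and Outlook must be closed first:

    ```powershell
    # Rename the OST so Outlook rebuilds it on the next launch.
    # Hypothetical path - locate the actual file via the profile's
    # data file settings in Outlook.
    $ost = "$env:LOCALAPPDATA\Microsoft\Outlook\user@domain.com.ost"
    Rename-Item -Path $ost -NewName "$(Split-Path $ost -Leaf).old"
    ```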

    Step 6

    Ask Questions:

    • Is the user using any other type of device besides a mobile device?
    • Question the end user, if at all possible, to understand what they might have been doing at the time the problem started occurring. It’s possible that the user imported a lot of data from a PST file, which could cause log growth server side, or that some other erratic behavior they were seeing was based on a user action.

    Step 7

    Check to ensure File Level Antivirus exclusions are set correctly for both files and processes per http://technet.microsoft.com/en-us/library/bb332342(v=exchg.141).aspx

    Step 8

    If Exmon and the above methods do not provide the data necessary to get root cause, then collect a portion of the Store transaction log files (100 would be a good start) during the problem period and parse them following the directions in http://blogs.msdn.com/scottos/archive/2007/11/07/remix-using-powershell-to-parse-ese-transaction-logs.aspx to look for possible patterns, such as high pattern counts for IPM.Appointment. This will give you a high-level overview of whether something is looping or messages are being sent at a high rate. Note: This tool may or may not provide any benefit depending on the data that is stored in the log files, but it will sometimes show MIME-encoded data that helps with the investigation.

    Step 9

    If nothing is found by parsing the transaction log files, we can check for a rogue, corrupted, or large message in transit:

    1. Check current queues against all HUB Transport Servers for stuck or queued messages:

    get-exchangeserver | where {$_.IsHubTransportServer -eq "true"} | Get-Queue | where {$_.DeliveryType -eq "MapiDelivery"} | Select-Object Identity, NextHopDomain, Status, MessageCount | export-csv HubQueues.csv

    Review the queues for any that are in retry or have a lot of messages queued.

    Export out message sizes in MB in all Hub Transport queues to see if any large messages are being sent through the queues:

    get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -resultsize unlimited | sort-object -property size -descending | Select-Object Identity,Subject,status,LastError,RetryCount,queue,@{Name="Message Size MB";expression={$_.size.toMB()}} | export-csv HubMessages.csv

    Export out message sizes in Bytes in all Hub Transport queues:

    get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -resultsize unlimited | Select-Object Identity,Subject,status,LastError,RetryCount,queue,size | sort-object -property size -descending | export-csv HubMessages.csv

    2. Check users' Outboxes for any large, looping, or stranded messages that might be affecting overall log growth.

    get-mailbox -ResultSize Unlimited| Get-MailboxFolderStatistics -folderscope Outbox | Sort-Object Foldersize -Descending | select-object identity,name,foldertype,itemsinfolder,@{Name="FolderSize MB";expression={$_.folderSize.toMB()}} | export-csv OutboxItems.csv

    Note: This does not get information for users that are running in cached mode.

    Step 10

    Utilize the MSExchangeIS Client\Jet Log Record Bytes/sec and MSExchangeIS Client\RPC Operations/sec Perfmon counters to see if there is a particular client protocol that may be generating excessive logs. If a particular protocol is found to be higher than the others for a sustained period of time, consider shutting down the service hosting that protocol. For example, if Outlook Web Access is the protocol generating the potential log growth, stop the World Wide Web Service (W3SVC) to confirm that log growth stops. If it does, collecting IIS logs from the CAS/MBX Exchange servers involved will help provide insight into what action the user was performing that was causing this to occur.
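    These counters can also be sampled directly with the Get-Counter cmdlet instead of the Perfmon UI; a sketch (instance names vary by server, and the counter set only exists on Mailbox servers):

    ```powershell
    # Sample the per-client Store counters every 5 seconds until stopped.
    # The (*) wildcard returns one instance per client type, so the totals
    # are broken down by protocol.
    Get-Counter -Counter '\MSExchangeIS Client(*)\Jet Log Record Bytes/sec',
                '\MSExchangeIS Client(*)\RPC Operations/sec' `
                -SampleInterval 5 -Continuous
    ```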

    Step 11

    Run the following command from the Management shell to export out current user operation rates:

    To export to CSV File:

    get-logonstatistics |select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount,
    progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending| export-csv LogonStats.csv

    To view realtime data:

    get-logonstatistics |select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount,
    progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending| ft

    Key things to look for:

    In the below example, the Administrator account was storming the testuser account with email. You will notice that there are two active users here: one is the Administrator submitting all of the messages, and for the other you will notice that the Windows2000Account references a HUB server while the Identity references testuser. The HUB server entry also has *no* UserName, so that is a giveaway right there. This can give you a better understanding of which parties are involved in these high rates of operations.

    UserName : Administrator
    Windows2000Account : DOMAIN\Administrator
    Identity : /o=First Organization/ou=First Administrative Group/cn=Recipients/cn=Administrator
    MessagingOperationCount : 1724
    OtherOperationCount : 384
    ProgressOperationCount : 0
    StreamOperationCount : 0
    TableOperationCount : 576
    TotalOperationCount : 2684

    UserName :
    Windows2000Account : DOMAIN\E12-HUB$
    Identity : /o= First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=testuser
    MessagingOperationCount : 630
    OtherOperationCount : 361
    ProgressOperationCount : 0
    StreamOperationCount : 0
    TableOperationCount : 0
    TotalOperationCount : 1091

    Step 12

    Enable Perfmon/Perfwiz logging on the server. Collect data through the problem times and then review for any irregular activities. You can reference Perfwiz for Exchange 2007/2010 data collection here http://blogs.technet.com/b/mikelag/archive/2010/07/09/exchange-2007-2010-performance-data-collection-script.aspx

    Step 13

    Run ExTRA (Exchange Troubleshooting Assistant) via the Toolbox in the Exchange Management Console to look for any functions (via FCL logging) that may be consuming excessive time within the Store process. This needs to be launched during the problem period. http://blogs.technet.com/mikelag/archive/2008/08/21/using-extra-to-find-long-running-transactions-inside-store.aspx shows how to use FCL logging only, but it would be best to include Perfmon, Exmon, and FCL logging via this tool to capture the most data. The steps shown are valid for Exchange 2007 and Exchange 2010.

    Step 14

    Export message tracking log data from the affected MBX server.

    Method 1

    Download the ExLogGrowthCollector script and place it on the MBX server that experienced the issue. Run ExLogGrowthCollector.ps1 from the Exchange Management Shell. Enter the MBX server name that you would like to trace and the start and end times, and then click the Collect Logs button.

    [Screenshot: ExLogGrowthCollector input form]

    Note: This script exports all mail traffic to/from the specified mailbox server across all HUB servers between the times specified. This helps provide insight into any large or looping messages that might have been sent and caused the log growth issue.

    Method 2

    Copy and paste the following data into Notepad, save it as msgtrackexport.ps1, and then run it on the affected Mailbox server. Open the output in Excel for review. This is similar to the GUI version, but requires manual editing to get it to work.

    #Export Tracking Log data from affected server specifying Start/End Times
    Write-host "Script to export out Mailbox Tracking Log Information"
    Write-Host "#####################################################"
    Write-Host
    $server = Read-Host "Enter Mailbox server Name"
    $start = Read-host "Enter start date and time in the format of MM/DD/YYYY hh:mmAM"
    $end = Read-host "Enter end date and time in the format of MM/DD/YYYY hh:mmPM"
    $fqdn = $(get-exchangeserver $server).fqdn
    Write-Host "Writing data out to csv file..... "
    Get-ExchangeServer | where {$_.IsHubTransportServer -eq "True" -or $_.name -eq "$server"} | Get-MessageTrackingLog -ResultSize Unlimited -Start $start -End $end  | where {$_.ServerHostname -eq $server -or $_.clienthostname -eq $server -or $_.clienthostname -eq $fqdn} | sort-object totalbytes -Descending | export-csv MsgTrack.csv -NoType
    Write-Host "Completed!! You can now open the MsgTrack.csv file in Excel for review"

    Method 3

    You can also use the Process Tracking Log Tool at http://blogs.technet.com/b/exchange/archive/2011/10/21/updated-process-tracking-log-ptl-tool-for-use-with-exchange-2007-and-exchange-2010.aspx to provide some very useful reports.

    Step 15

    Save off a copy of the application and system logs from the affected server and review them for any events that could contribute to this problem.
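    The logs can be exported from the command line; a sketch (C:\Logs is a hypothetical destination directory that must already exist):

    ```powershell
    # Export the Application and System event logs for offline review.
    wevtutil epl Application C:\Logs\Application.evtx
    wevtutil epl System C:\Logs\System.evtx
    ```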

    Step 16

    Enable IIS extended logging for the CAS and MBX server roles to add the sc-bytes and cs-bytes fields, to track large messages being sent via IIS protocols and also to track usage patterns (Additional Details).

    Step 17

    Get a process dump of the Store process during the time of the log growth. (Use this as a last measure once all prior activities have been exhausted, and prior to calling Microsoft for assistance. These issues are sometimes intermittent, and the quicker you can obtain any data from the server, the better, as this will help provide Microsoft with information on what the underlying cause might be.)

    • Download the latest version of Procdump from http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx and extract it to a directory on the Exchange server
    • Open the command prompt and change into the directory to which Procdump was extracted in the previous step.
    • Type

      procdump -mp -s 120 -n 2 store.exe d:\DebugData

      This will dump the data to D:\DebugData. Change this to whatever directory has enough space to dump the entire store.exe process twice. Check Task Manager for the store.exe process and how much memory it is currently consuming for a rough estimate of the amount of space needed.
      Important: If Procdump is being run against a store that is on a clustered server, then you need to make sure that you set the Exchange Information Store resource to not affect the group. If the entire store dump cannot be written out in 300 seconds, the cluster service will kill the store service, ruining any chance of collecting the appropriate data on the server.

    Open a case with Microsoft Product Support Services to get this data looked at.

    Most current related KB articles

    2814847 - Rapid growth in transaction logs, CPU use, and memory consumption in Exchange Server 2010 when a user syncs a mailbox by using an iOS 6.1 or 6.1.1-based device

    2621266 - An Exchange Server 2010 database store grows unexpectedly large

    996191 - Troubleshooting Fast Growing Transaction Logs on Microsoft Exchange 2000 Server and Exchange Server 2003

    Kevin Carker
    (based on a blog post written by Mike Lagase)