This is part 1 of a 3-part blog post series. Please go here for Part 2 and here for Part 3.
One of the first actions most Exchange administrators take when troubleshooting suspected problems with Exchange Content Indexing is to rebuild the affected mailbox database's content index files (either manually or by using the ResetSearchIndex.ps1 script found in the \Exchange Server\Scripts directory). I've also worked with many Exchange administrators over the years who proactively rebuild search indexes at various points during the calendar year or as milestones within a project are met (during a migration project, for example).
Irrespective of the rationale for resetting the indexes, most administrators, when asked, are unable to provide realistic estimates for how long this process will take from start to finish. The consequences of not estimating these times accurately differ from organization to organization. Some IT departments discourage leaving users without server-side indexes during the business day, citing losses in end-user productivity and upticks in escalated issues hitting the Help Desk. From an operational perspective, not knowing the anticipated rebuild times might also prevent Exchange administrators from being alerted to potential problems within the rebuild process itself. Whatever your rationale, having a sound understanding of how long the process might take is valuable.
Admittedly, there's very little information available today about the amount of time it takes (or better yet, the time it should take) to rebuild an Exchange content index. Ostensibly, this is because actual rebuild times always vary. Many factors influence the rebuild rate and the time to complete. Most notably:
At Microsoft (both in our corporate implementation and in various Office 365 offerings), we utilize a Search Rebuild Framework that was developed by my colleague Anatoly Girko and me. This framework was originally designed to provide our internal operations staff with a set of comprehensive validation steps and progress indicators they could leverage when performing rebuilds of content indexes. These techniques are utilized at key milestones within the overall rebuild process to ensure successful completion.
As the framework evolved, we decided to add functionality that would allow us to track and store historical throughput metrics for all rebuild operations. As these data collections grew, and as the subsequent trending data came to light, we found we were able to make significantly more informed and more accurate estimates for how long a given rebuild operation might take. This, in turn, allowed us as an operational team to make better decisions on when to schedule rebuilds so that we could minimize disruption to our end users. Since inception, this framework has been utilized to oversee rebuild operations for several thousand content indexes within the various environments we support.
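To make the idea of estimating from historical throughput concrete, here is a minimal Python sketch. It is not part of the framework or its scripts; the function name, the use of the median, and every number in it are assumptions chosen purely for illustration.

```python
from datetime import timedelta
from statistics import median


def estimate_rebuild_time(item_count, historic_items_per_sec):
    """Estimate a rebuild duration from past throughput samples (items/sec).

    Hypothetical helper: the median is used so that a single unusually
    fast or slow rebuild does not dominate the estimate.
    """
    if not historic_items_per_sec:
        raise ValueError("need at least one historic throughput sample")
    rate = median(historic_items_per_sec)
    if rate <= 0:
        raise ValueError("throughput must be positive")
    return timedelta(seconds=item_count / rate)


# Hypothetical numbers, for illustration only (not measured data).
samples = [40.0, 50.0, 60.0]        # items/sec from three earlier rebuilds
estimate = estimate_rebuild_time(1_800_000, samples)
print(estimate)                      # 10:00:00
```

With an estimate like this in hand, a rebuild that runs far beyond the expected window becomes a signal worth investigating instead of an unknown.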
In a series of articles, we will discuss our “Rebuild Framework” so that interested parties might apply a similar methodology to their own environments should the need arise. Each stage of the framework will be detailed, including discussions of the various toolsets we have authored to assist with this process. The series will conclude with a set of graphs and tables that detail our observed content indexing rebuild statistics and conclusions to date. For customers who have not tracked statistics in this area previously, we hope this will serve as a valuable reference point. It should help you make more informed estimates of Exchange content index rebuild times in your own environments. That said, it's worth noting that because all Exchange environments are unique, your rebuild metrics may differ dramatically from the rates we have observed and are presenting here.
Before diving in “head first” it’s also worth mentioning that this series is not intended as a troubleshooting guide. Our expectation is that your own troubleshooting has led you to a decision to perform rebuild(s), either in response to a problem or as a proactive measure. All examples presented within this series will focus on Exchange 2007. I made the decision to concentrate on 2007 for this post because the likelihood of rebuilding 2007 indexes is significantly higher than for 2010 (unlike 2007, 2010 can reseed content indexes from healthy redundant sources, which makes complete index rebuilds much rarer in multi-copy architectures). In the coming weeks Anatoly and I will release a supplemental post that provides the script reference for the Exchange 2010 version of the Content Index Rebuild Analyzer script as we accumulate corresponding examples for its usage.
Internally at Microsoft, the primary toolset we leverage when rebuilding content indexes is the IndexRebuildAnalyzer script. This script was authored by Anatoly and me specifically for establishing content index rebuild baselines. As previously noted, there are two versions of this script: an Exchange 2007 version and an Exchange 2010 version. To calculate your statistics properly, always use the script that corresponds to the version of the Exchange mailbox database whose index will be rebuilt. The IndexRebuildAnalyzer script generates two types of metrics depending on the mode of operation chosen by the operator. Internally we refer to these two modes as “pre-rebuild metrics” and “post-rebuild metrics” (all properties are documented within the Script Reference section below).
Although this script is primarily leveraged to track content index rebuild operations, Exchange administrators could certainly utilize the script in “pre-mode” to obtain Point-In-Time (PIT) statistics for various mailbox-centric purposes (e.g. “Number of Mailboxes”, “Number of Items in a Database”, “Average Message Size” for your entire organization, etc.). Run regularly, the tool might, for example, give you additional insight into user trends, depending on your own business requirements.
The E2K7_IndexRebuildAnalyzer.ps1 script parameters, as well as usage examples, can be obtained by passing the -Help parameter to the script in a PowerShell session before running it for real.
The following table outlines each parameter:
Statistics for multiple databases can be calculated by comma separating the database names.
As noted above, the “mode of operation” of the script is determined by the presence or absence of the -PostRebuild switch. To obtain pre-rebuild metrics, omit the -PostRebuild switch. When the script is run in pre-mode, the following headers are presented with their corresponding metrics:
When the -PostRebuild switch is utilized, IndexRebuildAnalyzer will attempt to calculate throughput metrics for content index rebuild operation(s). It does this by parsing the Application Event Log to obtain both the start time (denoted by Event ID 109) and the completion time (denoted by Event ID 110) of the rebuild for each mailbox database on the Mailbox server. To calculate post-rebuild metrics successfully, the complete Event ID pair must be present in Event Viewer for each mailbox database whose content index was reset. If the Event ID pair is not available, the script cannot calculate the statistics and returns the string “NoEventsFound” instead. The most common reasons why this string would be returned are:
All “pre-rebuild” headers and metrics are also calculated whenever the -PostRebuild switch is passed to the script. In addition, the -PostRebuild switch adds the following headers and metrics:
Content Index Rebuild Start Time
The time at which the Search Indexer service began the Full Crawl of the mailbox database.
Content Index Rebuild End Time
The time at which the Search Indexer service completed the Full Crawl of the mailbox database.
Total Rebuild Time: H:Min:Sec
Total time, in Hours:Minutes:Seconds, that the Search Indexer service required to complete the Full Crawl of the mailbox database.
Total Rebuild Time: Min Total
Total time, in minutes, that the Search Indexer service required to complete the Full Crawl.
Total Rebuild Time: Sec Total
Total time, in seconds, that the Search Indexer service required to complete the Full Crawl.
Rebuild: Per Mailbox Average: Sec
Average time, in seconds, to complete the Full Crawl per mailbox.
Rebuild: MB per/sec
Average Search Indexer Full Crawl throughput in MB per second.
Rebuild: Items per/sec
Average Search Indexer Full Crawl throughput in mail items per second.
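For readers who want to see how metrics of this kind fall out of the start and completion events, here is a minimal Python sketch. It is not the IndexRebuildAnalyzer script, which is written in PowerShell and reads the Application Event Log directly. The function names, the tuple layout used to stand in for event records, the way multiple events are matched, and all sample numbers are assumptions made for illustration; only the Event IDs (109 and 110), the “NoEventsFound” string, and the header names come from the description above.

```python
from datetime import datetime

START_EVENT_ID = 109   # Full Crawl started
END_EVENT_ID = 110     # Full Crawl completed


def rebuild_window(events, database):
    """Return (start, end) for a database, or "NoEventsFound".

    `events` holds (event_id, database_name, timestamp) tuples, a
    simplified, hypothetical stand-in for Application Event Log records.
    """
    starts = [t for eid, db, t in events
              if db == database and eid == START_EVENT_ID]
    ends = [t for eid, db, t in events
            if db == database and eid == END_EVENT_ID]
    if not starts:
        return "NoEventsFound"
    start = max(starts)                        # most recent rebuild start
    later_ends = [t for t in ends if t >= start]
    if not later_ends:                         # pair is incomplete
        return "NoEventsFound"
    return start, min(later_ends)


def rebuild_metrics(start, end, mailbox_count, item_count, size_mb):
    """Derive throughput figures from the rebuild window and database totals."""
    total_sec = (end - start).total_seconds()
    if total_sec <= 0 or mailbox_count <= 0:
        raise ValueError("need a positive duration and mailbox count")
    hours, rem = divmod(int(total_sec), 3600)
    minutes, seconds = divmod(rem, 60)
    return {
        "Total Rebuild Time: H:Min:Sec": f"{hours}:{minutes:02d}:{seconds:02d}",
        "Total Rebuild Time: Min Total": total_sec / 60,
        "Total Rebuild Time: Sec Total": total_sec,
        "Rebuild: Per Mailbox Average: Sec": total_sec / mailbox_count,
        "Rebuild: MB per/sec": size_mb / total_sec,
        "Rebuild: Items per/sec": item_count / total_sec,
    }


# Hypothetical events and totals, for illustration only.
events = [
    (109, "DB01", datetime(2011, 1, 1, 20, 0, 0)),
    (110, "DB01", datetime(2011, 1, 1, 22, 30, 0)),
    (109, "DB02", datetime(2011, 1, 1, 21, 0, 0)),   # no matching 110 yet
]
window = rebuild_window(events, "DB01")
metrics = rebuild_metrics(*window, mailbox_count=100,
                          item_count=450_000, size_mb=18_000)
print(metrics["Total Rebuild Time: H:Min:Sec"])   # 2:30:00
print(rebuild_window(events, "DB02"))             # NoEventsFound
```

The sketch assumes the mailbox count, item count, and database size are already known, for example from a pre-mode run; whether the real script combines the figures in exactly this way is not documented here.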
You can download the Exchange 2007 Index Rebuild Analyzer Script here.
In part 2 of this series, I will discuss the search rebuild framework, and in part 3, I will discuss what we have seen to date within Microsoft.
Eric Norberg, Service Engineer, Office 365-Dedicated