This is part 1 of a 3-part blog post series. Please go here for Part 2 and here for Part 3.

One of the first actions most Exchange administrators take when troubleshooting suspected problems with Exchange Content Indexing is to rebuild the impacted Mailbox Database's content index files (either manually or by using the ResetSearchIndex.ps1 script found in the \Exchange Server\Scripts directory). I've also worked with many Exchange administrators over the years who choose to proactively rebuild search indexes at various points during the calendar year or as milestones within a project are met (during a migration project, for example).

Irrespective of the rationale for resetting the indexes, most administrators, when asked, are unable to provide realistic estimates for how long this process will take from start to finish. The consequences of not estimating these times accurately differ from organization to organization. Some IT departments discourage leaving users without server-side indexes during the business day, citing losses in end-user productivity and upticks in issues escalated to the Help Desk. From an operational perspective, not knowing the anticipated rebuild times might also prevent Exchange administrators from being alerted to potential problems within the rebuild process itself. Whatever your rationale, having a sound understanding of how long the process might take is valuable.

Admittedly there's very little information about the amount of time it takes (or better yet, the time it should take) to rebuild an Exchange content index available today. Ostensibly, this is because the actual rebuild times are always variable. There are many factors that influence the rebuild rates and time to complete. Most notably:

  • Variability in the total number of end-user mailboxes “homed” on an Exchange Mailbox Database
  • Variability in the size of mailboxes contained on an Exchange Mailbox Database
  • Variability in item counts between user mailboxes homed on an Exchange Mailbox Database
  • Variability in item counts between Exchange Mailbox Databases (when performing concurrent rebuilds)
  • Variability in the size of items residing within an Exchange Mailbox Database
  • Variability in number and size of mail attachments residing within an Exchange Mailbox Database
  • The types and number of enabled IFilters on an Exchange Mailbox Server (allows for the indexing of various file formats)
  • The overall system resource utilization of a Mailbox Server performing crawl (think throttling)
  • Many more…

At Microsoft (both in our corporate implementation and in various Office 365 offerings), we utilize a Search Rebuild Framework that was developed by my colleague Anatoly Girko and me. This framework was originally designed to provide our internal operations staff with a set of comprehensive validation steps and progress indicators they could leverage when performing rebuilds of content indexes. These techniques are utilized at key milestones within the overall rebuild process to ensure successful completion.

As the framework evolved, we decided to add functionality that would allow us to track and store historic throughput metrics for all rebuild operations. As these data collections grew, and as the subsequent trending data came to light, we found we were able to make significantly more informed and more accurate estimates for how long a given rebuild operation might take. This, in turn, allowed us as an operational team to make better decisions on when to schedule rebuilds so that we could minimize disruption to our end-user customer base. Since inception, this framework has been utilized to oversee rebuild operations for several thousand content indexes within the various environments we support.

In a series of articles, we will discuss our “Rebuild Framework” so that interested parties might apply a similar methodology to their own environments should the need arise. Each stage of the framework will be detailed, including discussions of the various toolsets we have authored to assist with this process. The series will conclude with a set of graphs and tables that detail our observed content index rebuild statistics and conclusions to date. For customers who have not tracked statistics in this area previously, we hope this will serve as a valuable reference point. It should allow you to make more informed estimates of Exchange content index rebuild times in your own environments. That said, because all Exchange environments are unique, your rebuild metrics may differ dramatically from the rates we have observed and are presenting here.

Before diving in “head first” it’s also worth mentioning that this series is not intended as a troubleshooting guide. Our expectation is that your own troubleshooting has led you to a decision to perform rebuild(s), either in response to a problem or as a proactive measure. All examples presented within this series focus on Exchange 2007. I made the decision to concentrate on 2007 for this post because the likelihood of rebuilding 2007 indexes is significantly higher than for 2010 (unlike 2007, 2010 can reseed content indexes from healthy redundant sources, making complete index rebuilds much rarer in multi-copy architectures). In the coming weeks Anatoly and I will release a supplemental post that provides the script reference for the Exchange 2010 version of the Content Index Rebuild Analyzer script as we accumulate corresponding examples for its usage.

Index Rebuild Analyzer Script

Internally at Microsoft the primary toolset we leverage when rebuilding Content Indexes is the IndexRebuildAnalyzer script. This script was authored by Anatoly and me specifically for establishing Content Index rebuild baselines. As previously noted, there are two versions of this script: an Exchange 2007 version and an Exchange 2010 version. To calculate your statistics properly, always use the script that corresponds to the Exchange version of the Mailbox Database whose index will be rebuilt. The IndexRebuildAnalyzer script generates two types of metrics depending on the mode of operation passed by the operator. Internally we refer to these two modes as “pre-rebuild metrics” and “post-rebuild metrics” (all properties are documented within the Script Reference section below).

Although this script is primarily leveraged to track Content Index rebuild operations, Exchange administrators could certainly utilize the script in “pre-mode” to obtain Point-In-Time (PIT) statistics for various mailbox-centric purposes (e.g. “Number of Mailboxes”, “Number of Items in a Database”, “Average Message Size” for your entire organization, etc.). Run regularly, the tool might, for example, give you additional insight into user trends, depending on your own business requirements.

The E2K7_IndexRebuildAnalyzer.ps1 script parameters as well as examples for usage can be obtained by passing the -Help parameter in the PowerShell session prior to script execution.

The following table outlines each parameter:

 

| Parameter | Required | Description |
| --- | --- | --- |
| -CMS <cluster1,cluster2> | Required | When the -CMS parameter is utilized, statistics for all databases on an Exchange Mailbox Server or Exchange Clustered Mailbox Server will be calculated. Statistics for databases across multiple standalone Mailbox Servers or Clustered Mailbox Servers can be calculated by comma separating the server names. |
| -Database <DatabaseName,DatabaseName> | Required | When utilizing the -Database parameter, statistics for a specific Mailbox Database can be calculated. When using this parameter it is expected that the database name will be passed in the following format: “MailboxServerName\StorageGroupName\DatabaseName”. Statistics for multiple databases can be calculated by comma separating the database names. Databases that do not contain any active user mailboxes will not be processed. |
| -All | Optional | The use of the -All switch will calculate statistics for all Exchange Mailbox Databases in the Exchange Organization. |
| -CSVFile | Optional | The use of the -CSVFile parameter will output all metrics to a CSV file. |
| -PostRebuild | Optional | The -PostRebuild switch is used to distinguish between modes of script execution. Specifically, when -PostRebuild is called the script will parse Application Event logs and attempt to calculate performance metrics for Index Rebuild operations. |
| -Help | Optional | Displays script Help. |

Database Metrics Headers

Pre-Rebuild

As defined above, the “mode of operation” of the script is determined by the presence or absence of the -PostRebuild switch. To obtain pre-rebuild metrics, omit the -PostRebuild switch. When the script is run in pre-mode, the following headers will be presented with corresponding metrics:

| Header | Description |
| --- | --- |
| Server | Mailbox Server identity affiliated with processed database. |
| Database | Display Name of Exchange Mailbox Database processed. |
| EDB Size (GB) | Size of corresponding database file on disk in gigabytes. |
| EDB Size (MB) | Size of corresponding database file on disk in megabytes. |
| Mailbox Count | Active Exchange Mailbox count for processed database. Disconnected Mailboxes are not processed. |
| Database: Total Items | Total number of mail items present in an Exchange Mailbox Database. |
| Database: Total Item Size (MB) | Total size of all mail items in megabytes present within the processed mailbox database. |
| Database: Average Message Size (MB) | Average Message Size for all mail items present in processed database. |
| Per User: Average Mailbox Size (MB) | Average Mailbox Size for Active mailboxes present on processed database. |
| Per User: Average Item Count | Average mail item counts for Active mailboxes present on processed database. |
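
To make the relationship between these headers concrete, here is a minimal Python sketch of how the aggregate and average values could be derived from per-mailbox item counts and sizes. It is an illustration only, not the script's actual implementation (which is PowerShell): the `Mailbox` type and the `pre_rebuild_metrics` function are hypothetical names, and returning zeros for an empty input is an assumption made here (the real script skips databases without active mailboxes).

```python
from typing import NamedTuple


class Mailbox(NamedTuple):
    """Hypothetical per-mailbox input: item count and total size in MB."""
    item_count: int
    size_mb: float


def pre_rebuild_metrics(mailboxes):
    """Aggregate per-mailbox statistics into database-level totals and averages."""
    mailbox_count = len(mailboxes)
    total_items = sum(m.item_count for m in mailboxes)
    total_size_mb = sum(m.size_mb for m in mailboxes)
    return {
        "Mailbox Count": mailbox_count,
        "Database: Total Items": total_items,
        "Database: Total Item Size (MB)": total_size_mb,
        # Guard against division by zero for empty databases or mailboxes.
        "Database: Average Message Size (MB)":
            total_size_mb / total_items if total_items else 0.0,
        "Per User: Average Mailbox Size (MB)":
            total_size_mb / mailbox_count if mailbox_count else 0.0,
        "Per User: Average Item Count":
            total_items / mailbox_count if mailbox_count else 0.0,
    }
```

For example, two mailboxes holding 1,000 items (200 MB) and 3,000 items (600 MB) give a total of 4,000 items and 800 MB, an average message size of 0.2 MB, and per-user averages of 400 MB and 2,000 items.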

Post-Rebuild (Utilizing -PostRebuild Parameter)

When the -PostRebuild switch is utilized IndexRebuildAnalyzer will attempt to calculate throughput metrics for Content Index rebuild operation(s). It does this by parsing the Application Event Log to obtain both the start (denoted by Event ID 109) and completion time of the rebuild (denoted by Event ID 110) for each Mailbox Database on the Mailbox Server. To calculate post-rebuild metrics successfully the complete Event ID pair must be present in Event Viewer for each Mailbox Database whose corresponding Content Index was reset. In situations where the Event ID pair is not available the script will be unable to calculate the statistics. In these situations the string “NoEventsFound” will be returned. The most common reasons why this string would be returned are:

  • The Content Index rebuild operation for a given database or set of databases has not completed.
  • The Application Event Log wrapped or was cleared (best practice is to set the Maximum Log Size value to the highest sustainable value).
  • The Mailbox Databases reporting “NoEventsFound” did not recently have their Content Indexes reset (hence the absence of the Event ID pair from the Event Log). By leveraging the -CSVFile option and Excel these strings can be easily filtered out of the result set. I will address and provide examples for filtering in Step-5 of the framework.
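
The start/completion pairing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's actual logic: the `events` input (a list of `(timestamp, event_id, database)` tuples) and the `rebuild_window` function are hypothetical, and the choice to pair the most recent start event with the first completion event that follows it is an assumption made for this sketch. Only the Event IDs (109 and 110) and the “NoEventsFound” string come from the description above.

```python
from datetime import datetime

START_EVENT_ID = 109  # full crawl started (as described in this post)
END_EVENT_ID = 110    # full crawl completed (as described in this post)


def rebuild_window(events, database):
    """Return (start, end) for the latest complete 109/110 pair, else "NoEventsFound".

    `events` is a hypothetical list of (timestamp, event_id, database) tuples,
    standing in for entries parsed from the Application Event Log.
    """
    starts = [ts for ts, event_id, db in events
              if db == database and event_id == START_EVENT_ID]
    if not starts:
        # Index was not reset recently, or the log wrapped or was cleared.
        return "NoEventsFound"
    start = max(starts)
    ends = [ts for ts, event_id, db in events
            if db == database and event_id == END_EVENT_ID and ts >= start]
    if not ends:
        # A start without a later completion: the rebuild has not finished.
        return "NoEventsFound"
    return start, min(ends)
```

A database with a start event but no later completion event falls into the first bullet above (rebuild still running), while a database with no start event at all falls into the second or third.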

All “pre-rebuild” headers and metrics are also calculated whenever the -PostRebuild switch is passed for script execution. Use of the -PostRebuild switch will include the addition of the following headers and metrics:

| Header | Description |
| --- | --- |
| Content Index Rebuild Start Time | The Start Time of when the Search Indexer service began Full Crawl of the mailbox database. |
| Content Index Rebuild End Time | The Completion Time of when the Search Indexer service completed Full Crawl of the mailbox database. |
| Total Rebuild Time: H:Min:Sec | Total time in Hours:Minutes:Seconds that was required for the Search Indexer service to complete Full Crawl of the mailbox database. |
| Total Rebuild Time: Min Total | Total time in Minutes that was required for the Search Indexer service to complete Full Crawl. |
| Total Rebuild Time: Sec Total | Total time in Seconds that was required for the Search Indexer service to complete Full Crawl. |
| Rebuild: Per Mailbox Average: Sec | Average Time in Seconds to complete Full Crawl per mailbox. |
| Rebuild: MB per/sec | Search Indexer Full Crawl throughput average in MB per second. |
| Rebuild: Items per/sec | Search Indexer Full Crawl throughput in Mail Items per second. |
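
As a rough illustration of how these throughput values relate to one another, the following Python sketch derives them from a rebuild start time, an end time, and the pre-rebuild totals. It is not the script's implementation: the `post_rebuild_metrics` function and its parameter names are hypothetical, and the exact formatting of the H:Min:Sec string is an assumption.

```python
from datetime import datetime


def post_rebuild_metrics(start, end, mailbox_count, total_items, total_size_mb):
    """Derive rebuild duration and throughput from start/end times and totals."""
    total_seconds = int((end - start).total_seconds())
    if total_seconds <= 0 or mailbox_count <= 0:
        raise ValueError("need a positive rebuild duration and mailbox count")
    # Split the duration into hours, minutes, and seconds for display.
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return {
        "Total Rebuild Time: H:Min:Sec": f"{hours}:{minutes:02d}:{seconds:02d}",
        "Total Rebuild Time: Min Total": total_seconds / 60,
        "Total Rebuild Time: Sec Total": total_seconds,
        "Rebuild: Per Mailbox Average: Sec": total_seconds / mailbox_count,
        "Rebuild: MB per/sec": total_size_mb / total_seconds,
        "Rebuild: Items per/sec": total_items / total_seconds,
    }
```

For instance, a crawl that runs from 08:00 to 10:30 (9,000 seconds) over 100 mailboxes, 450,000 items, and 18,000 MB works out to 90 seconds per mailbox, 2 MB per second, and 50 items per second.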

Conclusion

You can download the Exchange 2007 Index Rebuild Analyzer Script here.

In part 2 of this series, I will discuss the search rebuild framework and in part 3 of this series, I will discuss what we have seen to date within Microsoft.

Eric Norberg
Service Engineer
Office 365-Dedicated