Version Store issues revisited (again) - updates on data gathering techniques

If you have ever faced a Version Store issue (JET_errVersionStoreOutOfMemory) on Exchange Server, you might (should) have read Nagesh’s excellent post “Troubleshooting Version Store issues - JET_errVersionStoreOutOfMemory” here: https://blogs.technet.com/b/exchange/archive/2006/04/19/425722.aspx

In this blog I will walk you through the newer tools and techniques we in CSS use to gather data when we get a call for this issue.

“What is the Version Store?” and “What are the basic troubleshooting steps?” The answers remain the same as in Nagesh’s blog. Here I will cover the differences between troubleshooting Exchange 2003 and Exchange 2007, and list in detail the steps for gathering the relevant data. Due to key architectural changes between Exchange Server 2003 and Exchange Server 2007, the steps need to be explained separately.

For the next steps we will use the Sysinternals utility “Procdump”, which can be downloaded from https://technet.microsoft.com/en-us/sysinternals/dd996900. Procdump can create a dump when a Performance counter crosses a threshold, which is the feature we rely on here.
We will also use the Perfwiz tool (for Exchange 2003) or the Experfwiz.ps1 script (for Exchange 2007) to capture Performance data.

Exchange Server 2003:
===================
1. Calculate the threshold value based on event 623. For example, suppose we get event 623 as below:

Event Type: Error
Event Source: ESE
Event Category: Transaction Manager
Event ID: 623
Description: The version store for this instance (0) has reached its maximum size of y MB (for the purposes of this blog I will use 155 MB). It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back.

In Exchange 2003, each version bucket reported by the Perfmon counter represents 64 KB.

The calculation is as follows: (x * 64) / 1024 = y, where x is the number of version buckets allocated and y is the total Version Store memory in MB. We know that the maximum Version Store memory (y) is 155 MB, so we can work out the maximum number of version buckets: x = (155 * 1024) / 64 = 2480.

When the version buckets allocated reach 70% of this maximum, we are in all probability experiencing a long-running transaction, and this is the point at which to start taking dumps. 70% * 2480 = 1736 buckets.
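The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `dump_threshold_buckets` and its parameters are made up for this post, and the 155 MB limit is the example value used here, not a value read from your server.

```python
def dump_threshold_buckets(max_version_store_mb, bucket_kb, percent=70):
    """Return (max_buckets, threshold_buckets) for a version store limit.

    Hypothetical helper: max_version_store_mb comes from event 623,
    bucket_kb is the bucket size for the Exchange version (64 for 2003).
    """
    # Convert the limit from MB to KB, then divide by the bucket size in KB.
    max_buckets = (max_version_store_mb * 1024) // bucket_kb
    # Integer arithmetic keeps the 70% figure free of rounding surprises.
    threshold = max_buckets * percent // 100
    return max_buckets, threshold


# Exchange 2003 example from this post: 155 MB limit, 64 KB buckets.
max_buckets, threshold = dump_threshold_buckets(155, 64)
print(max_buckets, threshold)  # 2480 1736
```

Substitute the maximum size reported in your own event 623 for the 155 MB used here.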

2. Download the Perfwiz tool from https://www.microsoft.com/downloads/en/details.aspx?FamilyID=31fccd98-c3a1-4644-9622-faa046d69214 and run it as per its instructions to start capturing Perf data.

3. Download Procdump from https://technet.microsoft.com/en-us/sysinternals/dd996900 and extract it to any folder, for example c:\procdump.
Attach Procdump to the Store.exe process with the threshold calculated above.
The Procdump syntax in this case is:
c:\Procdump> procdump -ma store.exe -p "\Database(Information store)\Version buckets allocated" 1736 -s 20 -n 3 -accepteula c:\Procdump\store_623.dmp

Exchange Server 2007:
===================
There are three major changes in Exchange Server 2007: the Performance counter name has changed, there is a new tool to capture Perf data, and the version bucket size unit has changed.

1. On 64-bit Exchange Server 2007, each version bucket reported by the Perfmon counter represents 32 KB.
So, using the same example of event ID 623 as above:

Event Type: Error
Event Source: ESE
Event Category: Transaction Manager
Event ID: 623
Description: The version store for this instance (0) has reached its maximum size of y MB (for the purposes of this blog I will use 155 MB). It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back.

The calculation is as follows: (x * 32) / 1024 = y, where x is the number of version buckets allocated and y is the total Version Store memory in MB. We know that the maximum Version Store memory (y) is 155 MB, so we can work out the maximum number of version buckets: x = (155 * 1024) / 32 = 4960.

When the version buckets allocated reach 70% of this maximum, we are in all probability experiencing a long-running transaction, and this is the point at which to start taking dumps. 70% * 4960 = 3472 buckets.
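When reading the Perfmon log afterwards, it can help to go the other way and convert a counter reading back into MB and a percentage of the limit. The following Python sketch does that for the Exchange 2007 numbers in this post; the function name `version_store_usage` is hypothetical and the 155 MB limit is again just the example value.

```python
def version_store_usage(buckets_allocated, bucket_kb, max_version_store_mb):
    """Convert a 'Version buckets allocated' reading to (MB used, % of limit).

    Hypothetical helper: bucket_kb is 32 for 64-bit Exchange 2007
    (64 for Exchange 2003), as described in this post.
    """
    # Each bucket is bucket_kb kilobytes; divide by 1024 to get MB.
    used_mb = buckets_allocated * bucket_kb / 1024
    percent_of_max = 100 * used_mb / max_version_store_mb
    return used_mb, percent_of_max


# Exchange 2007 example from this post: 3472 buckets of 32 KB, 155 MB limit.
used_mb, pct = version_store_usage(3472, 32, 155)
print(f"{used_mb:.1f} MB, {pct:.0f}% of the limit")  # 108.5 MB, 70% of the limit
```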

2. Download the Experfwiz.ps1 script from https://archive.msdn.microsoft.com/ExPerfwiz and run it as per its instructions to start capturing Perf data.

3. The Performance counter for Version Buckets usage has been renamed in Exchange Server 2007:
In Exchange Server 2003: "\Database(Information store)\Version buckets allocated"
In Exchange Server 2007: "\MSExchange Database(Information store)\Version buckets allocated"

So our Procdump syntax will be as follows (assuming you have downloaded and extracted Procdump to the folder c:\procdump):
c:\procdump>procdump -mp store.exe -p "\MSExchange Database(Information store)\Version buckets allocated" 3472 -s 20 -n 3 -accepteula c:\procdump\store_623.dmp

Procdump switches:
-p: Performance counter to monitor, followed by the threshold value
-s: Number of consecutive seconds the threshold must be exceeded before a dump is written
-n: Number of dumps to write before exiting
-ma: Write a dump file with all process memory (used in the Exchange 2003 command above).
-mp: Write a dump file with thread and handle information, and all read/write process memory. To minimize dump size, memory areas larger than 512MB are searched for, and if found, the largest area is excluded. A memory area is the collection of same sized memory allocation areas. The removal of this (cache) memory reduces Exchange and SQL Server dumps by over 90%.

Explanation: In the Exchange 2007 command, the arguments configure Procdump to generate Miniplus (-mp) dumps of the Store.exe process when the “Version buckets allocated” counter stays above the calculated threshold for 20 consecutive seconds (-s 20), to generate up to 3 dumps (-n 3), to save the dumps in c:\procdump with names that begin with “store_623”, and then to exit.
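If you calculate the threshold in a script, you can also assemble the command line from it rather than typing it by hand. The Python sketch below only builds and prints the string; it does not run Procdump. The helper name `procdump_command` is made up for illustration, and the switches, counter name and paths are the ones used in this post.

```python
def procdump_command(counter, threshold, seconds=20, dumps=3,
                     dump_path=r"c:\procdump\store_623.dmp"):
    """Assemble the Procdump command line shown in this post (hypothetical helper)."""
    # -mp: Miniplus dump, -p: counter and threshold,
    # -s: consecutive seconds over threshold, -n: number of dumps.
    return (f'procdump -mp store.exe -p "{counter}" {threshold} '
            f'-s {seconds} -n {dumps} -accepteula {dump_path}')


# Exchange 2007 counter name and the 3472-bucket threshold from this post.
counter = r"\MSExchange Database(Information store)\Version buckets allocated"
print(procdump_command(counter, 3472))
```

Run the printed command from the folder where Procdump was extracted.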

Send the dump files, the application log, and the performance monitor log that was being captured when the dumps were collected to CSS for further analysis.

Hope this helps..

-Sushil Sharma