I wanted to write a post about the work that has gone into updating the Exchange Server 2007 PAL XML threshold files to make them more relevant and to report more accurately on Exchange performance problems. This update couldn’t have been done without the help of Chris Antonakis, who was one of the major contributors to all of these updates. Kudos to Chris on this one.
There are some major updates that you need to be aware of when running the PAL tool to analyze Exchange performance problems. The biggest change is in how the Mailbox Role analysis looks at disks.
Shown below is the selection for the Mailbox Role threshold file, which includes a few new questions. These questions help break down performance problems specific to database drives, log file drives and page file drives in the resulting report. Previously, this was an all-encompassing generic analysis that didn’t give you the full picture of actual bottlenecks, since the latency expectations for database drives and log file drives are different.
Adding Database Drive letters is quite easy, and the data for this input can be gathered from various places such as ExBPA and the BLG file itself. These drive letters can also include Volume Mount Points.
If you know the drive letters already, then that is great. Let’s say your database drives are on Drive E:, Drive F:, and Drive G:. You would enter them separated by semicolons, such as E:;F:;G:, as shown in the screenshot above. You would also need to do this for the Log File Drives and the Page File Drives for a more accurate analysis.
Using an ExBPA report of the server and the Tree Report view is the best way to get the drive letter and volume mount point information, but sometimes a BLG file provides enough information about volume mount points based on the naming convention that was used. Keep in mind, though, that even if a volume mount point is named “<Drive Letter:>\Logs”, it may actually contain database files or no files at all. The screenshot below shows the Logical Disk counter with the volume mount point names. Unfortunately we don’t have a scripted way to pull the data out of the BLG file at this time, so this is a manual process.
For the above information, assuming all the _DATA volume mount points contained Exchange databases, you would start entering data in the question as the following:
S:\SG01_SG04_DATA;S:\SG05_SG08_DATA;S:\SG09_SG12_DATA
You get the idea… Just remember that all drives and mount points need to be separated by a semicolon and you should be good.
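As a rough illustration of the manual step above, here is a minimal Python sketch that pulls LogicalDisk instance names out of a list of counter paths and joins them with semicolons. It is not part of PAL. It assumes you already have the counter paths as text (for example, from the header row of a CSV export of the BLG), and the server name `MBX01`, the `_LOGS` mount point and the function names are made up for the example. A name ending in _DATA is only a hint, so still confirm what each mount point actually holds.

```python
import re

# Matches the instance name inside a LogicalDisk counter path, e.g.
# \\SERVER\LogicalDisk(S:\SG01_SG04_DATA)\Avg. Disk sec/Read
_LOGICAL_DISK = re.compile(r"\\LogicalDisk\((?P<instance>[^)]*)\)\\", re.IGNORECASE)


def logical_disk_instances(counter_paths):
    """Return unique LogicalDisk instance names in first-seen order."""
    seen = []
    for path in counter_paths:
        match = _LOGICAL_DISK.search(path)
        if match:
            name = match.group("instance")
            # _Total is an aggregate instance, not a real drive.
            if name != "_Total" and name not in seen:
                seen.append(name)
    return seen


def to_pal_answer(instances):
    """Join drive letters or mount points the way the PAL question expects."""
    return ";".join(instances)


# Hypothetical counter paths; the names are illustrative only.
sample_paths = [
    r"\\MBX01\LogicalDisk(S:\SG01_SG04_DATA)\Avg. Disk sec/Read",
    r"\\MBX01\LogicalDisk(S:\SG01_SG04_DATA)\Avg. Disk sec/Write",
    r"\\MBX01\LogicalDisk(S:\SG05_SG08_DATA)\Avg. Disk sec/Read",
    r"\\MBX01\LogicalDisk(S:\SG01_SG04_LOGS)\Avg. Disk sec/Read",
    r"\\MBX01\LogicalDisk(_Total)\Avg. Disk sec/Read",
    r"\\MBX01\Memory\Available MBytes",
]

# Keep only mount points that follow the _DATA naming convention.
data_mounts = [
    name for name in logical_disk_instances(sample_paths)
    if name.upper().endswith("_DATA")
]
print(to_pal_answer(data_mounts))
```

The printed string can be pasted straight into the Database Drives question. The same filter with a different suffix would give a starting point for the Log File Drives question.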
It’s important to note that we have included a catch-all Generic Disk analysis in case any of the drive questions were not answered. If you run a report and forget to enter any drive information, you will get output similar to the following in the Table of Contents. The number of times an analysis crossed a threshold may lead you to suspect an actual disk-related problem. You will see that there were 527 disk samples taken in this perfmon log and that the Database, Log and Page File drive analyses all have the same alert count. This is expected, because each drive-type-specific analysis now logs a tripped threshold and falls through to the Generic Disk analysis. If you see this, go directly to the Generic Disk analysis to review your disks.
For each threshold that tripped because drive letters were not entered, you will see an entry in the table similar to the following, stating that no data was entered in the questions. You can either ignore this and view the Generic Disk analysis, or re-run the analysis with the questions answered correctly to get a more accurate result.
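The fall-through behavior described above can be pictured with a small Python sketch. This is only an illustration of the idea, not how the PAL threshold files are implemented; the question names (`DatabaseDrives`, `LogDrives`, `PageFileDrives`) and both functions are assumptions made up for the example.

```python
def parse_drive_answer(answer):
    """Split a semicolon-separated answer into a list, dropping blanks."""
    return [part.strip() for part in answer.split(";") if part.strip()]


def plan_disk_analysis(answers):
    """Sort drive questions into answered and unanswered.

    Returns (specific, unanswered): `specific` maps a question name to its
    drives, and `unanswered` lists the questions that would fall through to
    the Generic Disk analysis.
    """
    specific, unanswered = {}, []
    for question, answer in answers.items():
        drives = parse_drive_answer(answer)
        if drives:
            specific[question] = drives
        else:
            unanswered.append(question)
    return specific, unanswered


# Hypothetical answers: only the database drives were filled in.
answers = {
    "DatabaseDrives": "E:;F:;G:",
    "LogDrives": "",
    "PageFileDrives": " ",
}
specific, unanswered = plan_disk_analysis(answers)
print(specific)    # drive-specific analyses that have real input
print(unanswered)  # questions to read via the Generic Disk analysis instead
```

Any question that ends up in the unanswered list corresponds to a "no data was entered" entry in the report, which is the cue to either read the Generic Disk analysis or re-run with that question filled in.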
The same holds true for the Hub Transport and Client Access server disk analysis.
Another question that was added to the Mailbox server role analysis is ClientMajority, which specifies whether the majority of the clients are running in cached mode. This setting directly affects the analysis of the MSExchange Database(Information Store)\Database Cache % Hit counter.
Database Cache % Hit is the percentage of database file page requests that were fulfilled by the database cache without causing a file operation, i.e. not having to read the Exchange database to retrieve the page. If this percentage is too low, the database cache size may be too small.
Here are the thresholds that were added for this particular analysis.
The last question that was added is CCRInUse. This question helps differentiate the analysis of CopyQueueLength and ReplayQueueLength between CCR and LCR replication, since we have different recommended values for each configuration.
There was also an update for the HUB and HUB/CAS role threshold files where you can now specify drive information for both the Exchange Transport queue file drives and the Page File Drives.
Additionally, the 64bit question was removed from all the Exchange Server 2007 PAL threshold files, since Exchange 2007 is only supported in production on a 64-bit Windows operating system.
It’s also important to point out that we’ve managed to get all of the thresholds corrected and updated and a number of new analysis rules added. However, we haven’t necessarily managed to update or include all of the associated rule and troubleshooting text that goes with each analysis rule. These will be updated as we get more time. For now, the priority is migrating all the PAL 1.0 Exchange content to the new PAL 2.0, which will be available sometime in the near future.
To download the latest XML files, go to the XML update page here or direct download here.
If you are interested in the other changes that were made to the 3 threshold files, they are listed below:
MBX:
HUB:
CAS:
HUB/CAS:
Hi Mike...rich content...congratulations.
Regards,
Dirk.