• Processor 0 increased CPU utilization

    While reviewing CPU utilization in Task Manager on an Exchange 2010 server recently, I noticed that Processor 0 was at 100% CPU while all of the other CPUs were considerably lower. This type of behavior is caused by the Receive Side Scaling (RSS) feature not being enabled on the server. RSS is a feature that was first implemented back in Windows 2003 with the Scalable Networking Pack, and it allows you to spread the processing of incoming network traffic across multiple CPU cores. If RSS is not enabled, only *one* CPU will be used to process incoming network traffic, which could cause a networking bottleneck on the server. Additional information on RSS can be found here.

    Here is what it looks like in Task Manager on the Performance tab.

    clip_image002

    As you can see, the first processor is pegged at 100% CPU which is indicative of RSS not being enabled. Generally on new installations of Windows 2008 or greater, this feature is enabled by default, but in this case, it was disabled.
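    As a rough illustration of the pattern described above, the following sketch flags a set of per-processor readings where Processor 0 is busy and far above the rest. The function name, the thresholds (90% busy, 40-point gap) and the sample readings are assumptions made for this example, not values taken from the server in question.

```python
from statistics import mean


def cpu0_looks_skewed(per_cpu_percent, min_busy=90.0, min_gap=40.0):
    """Return True when CPU 0 is busy and far above the average of the other CPUs.

    min_busy and min_gap are arbitrary illustrative thresholds.
    """
    if len(per_cpu_percent) < 2:
        return False  # nothing to compare against on a single-CPU machine
    cpu0, others = per_cpu_percent[0], per_cpu_percent[1:]
    return cpu0 >= min_busy and (cpu0 - mean(others)) >= min_gap


# Made-up readings: CPU 0 pegged, the remaining cores mostly idle.
sample = [100.0, 12.0, 9.0, 15.0]
print(cpu0_looks_skewed(sample))
```

    A skewed result is only a hint to go and check the RSS settings; other workloads can pin a single processor as well.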

    Before enabling RSS on any given machine, a few dependencies must be in place for RSS to work properly. They are listed below.

    • Install the latest network card driver and associated Network Configuration Utility. The network card driver update is very important as older versions had known bugs that would cause RSS to fail.
    • Offloading features of the network card must be enabled (e.g. IPv4 Checksum Offload, TCP/UDP Checksum Offload for IPv4/IPv6).
    • Receive Side Scaling must be enabled in the network card properties.
    • Receive Side Scaling Queues and Max number of RSS Processors must be set to the maximum value listed in the network card properties. This is typically the number of CPU cores that are installed on the server. Hyperthreading does not count towards the maximum number of CPU cores that can be leveraged here. The use of hyperthreading is generally not recommended on Exchange servers anyway, as referenced here.

      Note: If Receive Side Scaling Queues and Max number of RSS Processors are not changed to a value above 1, then enabling RSS does not provide any benefits since you will only be using a single core to process incoming network traffic.
    • RSS must be enabled at the OS layer by running netsh int tcp set global rss=enabled. Use netsh int tcp show global to confirm that the setting was enabled properly.
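    The confirmation step in the last item can be scripted. The sketch below is one possible way to do it in Python: it looks for the "Receive-Side Scaling State" line in the text printed by netsh int tcp show global. The helper names are made up for this example, and the sample text is an abbreviated illustration rather than output captured from a real server.

```python
import re
import subprocess


def read_netsh_output():
    """Run the command from the text on the server itself (Windows only)."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def rss_state(netsh_output):
    """Return the RSS state ('enabled' or 'disabled'), or None if the line is absent."""
    match = re.search(
        r"^\s*Receive-Side Scaling State\s*:\s*(\S+)",
        netsh_output,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1).lower() if match else None


# Abbreviated, illustrative sample; real output lists more parameters.
SAMPLE = """TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : disabled
Chimney Offload State               : automatic
"""

print(rss_state(SAMPLE))  # prints: disabled
```

    On a live server you would pass read_netsh_output() instead of SAMPLE. Keep in mind that this only checks the OS-level setting; the network card properties listed above still need to be verified separately.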

    After enabling RSS, you can clearly see below the difference in processor utilization on the server, as the CPU utilization for Processor 0 is now fairly close to the other processors from right around 3:00AM.


    image

    Many people have disabled the Scalable Networking Pack features across the board due to the various issues that were caused by the TCP Chimney feature back in Windows 2003. Those problems have since been addressed in the latest patches and latest network card drivers, so enabling this feature can help increase networking throughput, in some cases almost twofold. The more features that you offload to the network card, the less CPU you will use overall. This allows for greater scalability of your servers.

    You will also want to monitor the number of deferred procedure calls (DPCs) that are created, since there is additional overhead in distributing this load among multiple processors. With the latest hardware and drivers available, this overhead should be negligible.

    In Windows 2008 R2 versions of the operating system, there are new performance counters to help track RSS/Offloading/DPC/NDIS traffic to different processors as shown below.

    Object: Per Processor Network Activity Cycles(*)
    Performance counters:

    Stack Send Complete Cycles/sec
    Miniport RSS Indirection Table Change Cycles
    Build Scatter Gather Cycles/sec
    NDIS Send Complete Cycles/sec
    Miniport Send Cycles/sec
    NDIS Send Cycles/sec
    Miniport Return Packet Cycles/sec
    NDIS Return Packet Cycles/sec
    Stack Receive Indication Cycles/sec
    NDIS Receive Indication Cycles/sec
    Interrupt Cycles/sec
    Interrupt DPC Cycles/sec

    Object: Per Processor Network Interface Card Activity(*)
    Performance counters:

    Tcp Offload Send bytes/sec
    Tcp Offload Receive bytes/sec
    Tcp Offload Send Request Calls/sec
    Tcp Offload Receive Indications/sec
    Low Resource Received Packets/sec
    Low Resource Receive Indications/sec
    RSS Indirection Table Change Calls/sec
    Build Scatter Gather List Calls/sec
    Sent Complete Packets/sec
    Sent Packets/sec
    Send Complete Calls/sec
    Send Request Calls/sec
    Returned Packets/sec
    Received Packets/sec
    Return Packet Calls/sec
    Receive Indications/sec
    Interrupts/sec
    DPCs Queued/sec

    I hope this helps you understand why you might be seeing this type of CPU usage behavior.

    Until next time!!

    Mike

  • The case of the slow Exchange 2003 Server – Lessons learned

    Recently we received a case in support with an Exchange 2003 server where message delivery was slow and the Local Delivery queue was getting backed up. The Local Delivery queue was actually reaching into the two thousand range and would fluctuate around that number for extended periods of time.

    So we collected some performance data and all RPC latencies, disk latencies, CPU utilization and many of the other counters that we looked at did not show any signs of any problems. <Scratching Head>

    This is actually a common problem that I have seen where the server is responding OK to clients and everything else appears to be operating normally except for the local delivery queue that continually rises. Even disabling any Anti-virus software on the server including any VSAPI versions does not resolve the problem. So we essentially have a case of a slow Exchange server with no signs of performance degradation using any normal troubleshooting methods.

    The reason may not be immediately obvious, so let me show you the common problem that I have seen in these situations. This not only applies to Exchange 2003, but also to later versions of Exchange.

    Some companies need to journal messages to holding mailboxes, either on the same server or on a different server, to maintain a copy of all messages that are sent in the organization for compliance purposes. These journaling mailboxes can get quite large and require a special level of attention to ensure that their mailbox sizes and item counts are kept within reasonable levels. They tend to defy our normal recommendations/guidance because item counts in these folders can reach tens of thousands of items rather quickly, depending on the amount of mail that is sent within your organization.

    The special level of attention that I mentioned earlier for journaling mailboxes is often overlooked. For each journaling mailbox, you need a process that not only backs up the items in these folders, but also purges the data out of the mailbox once the backup has been taken. This purging process is necessary to maintain acceptable performance levels on an Exchange server. If these mailboxes are on their own server, user mailboxes are not normally affected. If these mailboxes are on the same server as user mailboxes, then this is where you might run into some problems.

    In this case, we found a journaling mailbox that held almost 1.5 million items and was 109GB in size, as shown in the screenshot below. Wow!! That is a lot of items in this one mailbox.

    huge journal mailbox-fixed

    If you tried to log on to this mailbox using Outlook, the client would most likely hang for 5-10 minutes trying to query the number of rows in the message table to generate the view that Outlook is trying to open. Once this view is created, you should be able to view the items and get back control of the Outlook client. You may think that you could simply go in and start deleting items from this mailbox to lower its overall size. Try as you might, you will most likely end up doing this for days, since the performance impact of this many items in the mailbox makes this a very painful process. Any modification to the messages in these folders causes the message tables to be updated, which for this number of items is simply going to take an exorbitant amount of time.

    Our standard recommendation for Exchange mailboxes on Exchange 2003 servers is to have item counts under 5,000 items per folder. This guidance can be found in the Understanding the Performance Impact of High Item Counts and Restricted Views whitepaper here.
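    To make the guidance concrete, here is a minimal sketch that checks a set of per-folder item counts against the 5,000-item figure quoted above. The function name and the sample counts are invented for illustration; in practice the counts would come from whatever export you use to inventory mailboxes.

```python
ITEM_LIMIT = 5000  # per-folder guidance for Exchange 2003 quoted in the text


def folders_over_limit(folder_counts, limit=ITEM_LIMIT):
    """Return (folder, count) pairs that are not under the limit, largest first."""
    over = [(name, count) for name, count in folder_counts.items() if count >= limit]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Made-up counts for a single mailbox.
counts = {"Inbox": 1480000, "Sent Items": 1200, "Calendar": 5000}
for folder, count in folders_over_limit(counts):
    print(f"{folder}: {count} items")
```

    Sorting the offenders by size puts the folder most likely to hurt delivery performance at the top of the list.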

    A simple troubleshooting step would be to dismount the mailbox store that this mailbox resides in to see if the message delivery queues go down. If all of the queues flush for all other mailbox stores, you have now found your problem.

    If you absolutely need to get into the mailbox to view some of the data, an Outlook client may not be the best way to do some housecleaning. An alternative would be to use the MFCMAPI tool to view the contents of the mailbox. MFCMAPI can be configured to return only a certain number of items at any given time. If you pull up MFCMAPI’s options screen, you can change the throttling section to limit the number of rows that are displayed. If you were to enter 4800 in the highlighted section below, you would limit the number of rows or messages that are queried when the folder is opened to that value. This will make viewing some of the information a little bit easier, but it would still be very cumbersome.

    clip_image002

    There are a couple of workarounds that you can do to clean this mailbox out.

    • If the data in the mailbox is already backed up, you could disable mail for that mailbox, run the cleanup agent and then create a new mailbox for the user. Note: the size of the database will still be huge and will increase backup and restore times even if you should recreate the mailbox. If you are finding that the backup times are taking a long time, you may want to think about using the dial tone database in the next suggestion or possibly moving the mailboxes on this store to a new database AFTER you have cleaned out the problem mailbox and then retiring the old database.
    • If the Mailbox Database houses only this one mailbox, you could simply dial tone that database, starting with a fresh database. Instructions on how to do this can be found here.
    • Purging the data out of the mailbox using Mailbox Manager or some 3rd party tool may work, but keep in mind that you will most likely experience a performance problem on the server while the information is cleaned out of the mailbox, and the purge could possibly take hours to run.

    Long live that 109GB/1.5million item mailbox!!! :)

    Another way to possibly find the high item count user is to use the PFDavAdmin tool to export item counts in users' mailboxes. Steps on how to do this can be found here.

    These cases are sometimes very tough to troubleshoot, as any performance tool that you might use to determine where the problem lies would not show anything at the surface. Even though the Exchange server is still responding to RPC calls in a timely fashion, any expensive call that is running, such as a query rows operation, will surely slow things down. If you see that things are slow on your Exchange 2003 server and perfmon does not show anything glaring, one of the first things that I check is item counts in users' mailboxes, looking for the top high item count offenders. Exchange 2007 can have other reasons for this slowness, but that would be another blog post in and of itself.

    So the moral of the story is this: if you have large mailboxes in your organization that are used as a journaling mailbox, a resource mailbox, or by some type of automatic email processing application that might use Inbox rules to manipulate data in the mailbox, then whether or not the mailboxes are backed up, you need to be absolutely sure that the item counts in the folders of these mailboxes are kept to a reasonable size. Otherwise, they will slow an Exchange server to a crawl as it tries to process email.

    Just trying to pass on some of this not so obvious information…….

  • Audit Exchange 2007 SP2 Auditing

    A few cases have been coming through the support channels stating that auditing is not working for whatever reason. After reviewing the cases, we have found that this is due to users or groups in the Configuration Partition of Active Directory that have been granted Full Control on the containers in the tree or that have the All Extended Rights permission. Having these permissions essentially prevents any auditing events from being logged for those users or groups.

    Let’s take a step back for a moment. When applying Exchange 2007 SP2 in an organization, the schema is extended with the right ms-Exch-Store-Bypass-Access-Auditing. If a user or group has previously been granted Full Control within the Configuration tree, that user or group will then take on an allow right for this Bypass auditing right, thus being exempt from being audited. This is not good for compliance reasons, because the end results will not contain audit entries for all users that can/will be accessing mailboxes.

    The other problem is that there is currently no way to lock down the ability for any administrator to add a specific Allow on an object in Active Directory for this bypass right, thus excluding them from being audited.

    Listed below are the *default* groups that have the Bypass Auditing right due to various permission settings:

    • Domain Admins
    • Enterprise Admins
    • Exchange Organization Administrators

    Once the Schema has been extended, there are 5 places to add auditing bypass entries in the configuration container in Active Directory as shown below.

    • Exchange Organization Container
    • Exchange Administrative Group Container
    • Exchange Servers Container
    • Exchange Server object
    • Exchange Database object

    When auditing is not working as expected, it would be a rather tedious process to check permissions throughout the configuration tree for each of these objects where the bypass extended right may have been set. I have created a PowerShell script (AuditBypassPerm.ps1) that should help export permissions for each of these objects to make your job of finding permission problems that much easier.

    Before I go over the script, I want to describe some of the terms that you will need to know when looking through the output of this script. Permissions on objects in Active Directory are assigned as access rights, represented by System.DirectoryServices.ActiveDirectoryRights. These access rights control what type of permission a user or group has on a particular object. A listing of all the available access rights can be found at http://msdn.microsoft.com/en-us/library/system.directoryservices.activedirectoryrights.aspx.

    The three main rights that we are concerned with in relation to auditing are the following:

    • ms-Exch-Store-Bypass-Access-Auditing = Bypass Exchange Access Auditing in the Information Store
    • GenericAll = The right to create or delete children, delete a subtree, read and write properties, examine children and the object itself, add and remove the object from the directory, and read or write with an extended right.
    • ExtendedRight = A customized control access right. This can be used to apply one particular right such as ms-Exch-Store-Bypass-Access-Auditing, or it could mean that you have an allow right for All Extended Rights as shown below. All Extended Rights means just that: all extended rights, including the “Bypass Exchange Access Auditing in the Information Store” right.

       image

    So with that said, these are the three main rights that we need to concentrate on when we are trying to find a needle in the haystack. The next piece that we need to be cognizant of is whether there is a specific deny, or whether one of the 3 rights is being inherited from some other object in the Configuration tree.

    In my example, I used an account called AuditTest1 to show how one would troubleshoot something like this. I granted a deny to the bypass right at the organization level so that this user's account would be audited, but then at the Database object level, I granted the All Extended Rights right for this account. What this essentially did was bypass auditing for this user at the database level, so no events were logged for any access to mailboxes on that database.
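    The check that you perform by eye on the exported permissions can be sketched in a few lines of Python. This is not the logic of AuditBypassPerm.ps1 and it does not model how Active Directory evaluates an ACL; it only lists the allow entries that could grant the bypass through one of the three rights above. The Ace record, its field names and the two sample entries (modelled on the AuditTest1 example) are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

BYPASS = "ms-Exch-Store-Bypass-Access-Auditing"


@dataclass(frozen=True)
class Ace:
    scope: str                     # object the entry was read from
    identity: str                  # user or group the entry applies to
    access: str                    # "Allow" or "Deny"
    rights: frozenset              # ActiveDirectoryRights names
    extended_right: Optional[str]  # None stands for All Extended Rights
    inherited: bool


def grants_bypass(ace):
    """True for an allow entry carrying GenericAll or a bypass-capable ExtendedRight."""
    if ace.access != "Allow":
        return False
    if "GenericAll" in ace.rights:
        return True
    return "ExtendedRight" in ace.rights and ace.extended_right in (None, BYPASS)


def bypass_candidates(aces, identity):
    """Entries for one identity that are worth a closer look."""
    return [ace for ace in aces if ace.identity == identity and grants_bypass(ace)]


# Hypothetical entries mirroring the AuditTest1 example in the text.
aces = [
    Ace("Organization", "AuditTest1", "Deny", frozenset({"ExtendedRight"}), BYPASS, False),
    Ace("Database", "AuditTest1", "Allow", frozenset({"ExtendedRight"}), None, False),
]
for ace in bypass_candidates(aces, "AuditTest1"):
    print(ace.scope, ace.access, "inherited" if ace.inherited else "explicit")
```

    For the sample data the only entry reported is the explicit allow on the Database object, which is the one that defeated the deny set at the organization level.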

    The syntax for the script is as follows:

    .\AuditBypassPerm.ps1 <MBXServername>

    After running the script, you will get output similar to the following

    image

    Once the script has completed, Notepad will open a text file containing information similar to the picture below. This lets you see, for each of the 5 objects where the bypass permissions can be set, what rights were assigned to this user.

    image

    The key takeaway here is to see which groups/accounts have the ms-Exch-Store-Bypass-Access-Auditing, GenericAll, or ExtendedRight right set on them and, if one is set, to determine at what level in the Configuration tree a potential override has been set that would prevent specific accounts from being audited.

    If a permission is inherited and you cannot tell from these 5 object levels where it was set, you will need to open ADSIEdit.msc and walk up the tree from that object until you find the object on which the permissions were changed.

    So that sounds great, but what happens when you have a user that is not listed in the tree, but is still not being audited? The main reason for this is that the user is a member of a group that has been granted one of these 3 rights somewhere in the tree.

    Since PowerShell V1 does not have a direct way to view a user's group membership, I created another really small script to list the groups that a user is a member of. The output is not in the same format as the one listed above, but it provides a general understanding of which groups the user belongs to, which you can then compare to the output listed above for your troubleshooting efforts.

    The syntax for the group membership script is as follows:

    .\GetUserGroups.ps1 <Username>

    Note: This can be in simple format or domain\username format as shown below.

    image
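    Comparing the two outputs comes down to a simple intersection, sketched below. This is not part of either script; the helper names and the CONTOSO sample groups are hypothetical, while the flagged names are the default groups listed earlier. Dropping the domain prefix is a simplification that would confuse same-named groups from different domains.

```python
def short_name(identity):
    """Drop an optional DOMAIN\\ prefix and fold case for comparison."""
    return identity.rsplit("\\", 1)[-1].strip().lower()


def groups_with_bypass(user_groups, flagged_identities):
    """Return the user's groups that also appear among the flagged identities."""
    flagged = {short_name(name) for name in flagged_identities}
    return [group for group in user_groups if short_name(group) in flagged]


# Hypothetical membership list for one user.
user_groups = ["CONTOSO\\Domain Admins", "CONTOSO\\Helpdesk"]
# Default groups that hold the bypass right, as listed in the text.
flagged = ["Domain Admins", "Enterprise Admins", "Exchange Organization Administrators"]

for group in groups_with_bypass(user_groups, flagged):
    print(group)
```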

    This set of scripts can be downloaded here.

    I hope this helps untangle the inability to get auditing working for some users/groups as reviewing permissions is sometimes a very tedious task.

    Another question that comes up is how to map the Message ID that is listed in the event logged when the message was accessed to an actual message in a user's mailbox. A sample event is listed below with the relevant parts highlighted.

    image

    So from that, we can see that we have a message ID of <6B83547937704D4EB0EFA4327EF0DEC82D8F92EC36@MIKELAGDC.mikelaglab.com> and this message was opened in the folder /Calendar.

    For every message on an Exchange 2007 server, we generate a unique Message ID that is stamped on the PR_INTERNET_MESSAGE_ID MAPI property of each message. The property tag for this MAPI property is 0x1035001E.

    With MFCMAPI, you can find this message rather easily by creating a content restriction in the mailbox. To do this, you would need to create a MAPI profile for that user specifically on an administrative workstation or use a MAPI profile that has full access to all mailboxes.

    IMPORTANT: If you perform these operations with an administrative account and the message is touched in the mailbox while using MFCMAPI, an auditing event will be logged to the Exchange Auditing log. If you don’t want to log any events during your investigation, it may be best to log on with an approved account that has the Bypass Auditing right so that whatever actions you take inside a user's mailbox are not audited, or to use an account that is dedicated to finding what object was audited.

    Once you open the mailbox in MFCMAPI, navigate to and open the folder that was listed in the auditing event as shown above. In this case, it was the Calendar folder. After the folder is opened, click Table on the menu and then select Create and Apply Content Restriction.

    image

    Next, we need to specify the Property Tag we are looking for, which in this example is 0x1035001E. Once this property tag number is entered, the Property Name field will show what we want to filter on. Click OK when you are finished.

    image

    In the Property Value field, enter <6B83547937704D4EB0EFA4327EF0DEC82D8F92EC36@MIKELAGDC.mikelaglab.com> including the < > characters as shown below. It is very important that the < > characters are entered; otherwise, the restriction will not return the message. Click OK when you are done.
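    Because the brackets are easy to lose when copying the ID out of an event, a small helper can put the value into the required form before you paste it. This is only a convenience sketch; the function name is made up, and the ID used is the one from the sample event above.

```python
def as_message_id(value):
    """Return the Message ID wrapped in exactly one pair of angle brackets."""
    core = value.strip().strip("<>").strip()
    if not core:
        raise ValueError("empty Message ID")
    return f"<{core}>"


raw = "6B83547937704D4EB0EFA4327EF0DEC82D8F92EC36@MIKELAGDC.mikelaglab.com"
print(as_message_id(raw))
```

    The helper leaves an already bracketed value unchanged, so it is safe to run on IDs copied either way.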

    image

    The result should be the message that you are looking for.

    image

    That is all for now….

    Happy Auditing!!

  • Perfwiz replacement for Exchange 2007

    NOTE: This version of Perfwiz has been replaced by a newly written script that is described at http://blogs.technet.com/b/mikelag/archive/2010/07/09/exchange-2007-2010-performance-data-collection-script.aspx

    Here is a replacement for general Exchange 2007 performance data gathering. Use this in place of Perfwiz.exe, because Perfwiz.exe does not capture all of the counters needed to properly troubleshoot Exchange 2007 performance issues.

    Follow the steps below to enable these performance counters on any given Exchange 2007 server

    Windows 2003 version of Perfwiz

    1. Extract the contents of Exchange_2007_Perfwiz.zip to your Exchange server.
    2. Open Performance Monitor
    3. Expand Performance Logs and Alerts and select Counter Logs.
    4. Right click Counter logs and select "New Log Settings From". Select the htm file that was extracted in Step 1. Click OK
    5. Select the Log Files tab and click the Configure button
    6. For the Log location, change this to a location of your choice. Click Ok
    7. Click OK to save the Performance Counter log.
    8. To start the Perfmon log, right click "Exchange 2007 Perfwiz" and then select Start.
    9. Let this run during the problem period where performance is affected
    10. Stop the perfmon log by right-clicking "Exchange 2007 Perfwiz" and selecting Stop
    11. Make arrangements with a CSS representative to get the files analyzed.

    Windows 2008 version of Perfwiz

    1. Download the appropriate version of Perfwiz for your server. Currently I have 2 versions.

      • Exchange 2007 Perfwiz (MBX-CAS-HUB specific) - This file has targeted counters instead of including all (*) counters for obvious performance reasons and easier parsing.
      • Exchange 2007 Perfwiz (All Counters). Use this at your own discretion as this will collect a *lot* of data which may decrease performance on an Exchange Server.
      • Other Role based XMLs are self-explanatory

        How to download 
        To download these XML files to your computer, right click the file of your choice, select Save Target As... , and then save it to a directory location of your choice on your Exchange Server


        Role Based
        Use these for a high-level look into how the server is performing. If you need to branch out with more counters, use the Full Counter/Instance set below.

        Exchange_2007_Perfwiz-AIO.xml (HUB/CAS/MBX All-In-One)
        Exchange_2007_Perfwiz-CAS.xml
        Exchange_2007_Perfwiz-MBX.xml
        Exchange_2007_Perfwiz-HUB.xml
        Exchange_2007_Perfwiz-HUB-CAS.xml
        Exchange_2007_Perfwiz-UM.xml

        All Counters/All Instances
        Use this counter set at your own discretion, as logging this number of counters could potentially cause performance degradation on your server.

        Exchange_2007_Perfwiz-Full.xml


    2. Open Performance Monitor
    3. Expand Reliability and Performance and then expand Data Collector Sets
    4. Right click User Defined, Select New, and then Data Collector Set
    5. Enter a unique name for this Data Collector set (e.g. ExPerfwiz), select Create from a template (Recommended) and then click Next
    6. Select the Browse button, navigate to the XML file that was saved in Step 1, select Open
    7. Select Next on the next screen
    8. Enter the root directory where you would like to store the performance log files. Click Next
    9. If you need to run this performance log under different credentials, enter it on this page. Click Finish

    Recognition

    I'd like to thank Ben Winzenz for creating the Windows 2008 *full* version of Perfwiz and John Rodriguez for getting me the counter sets for the All-In-One XML which I doctored up in a custom XML file. 


    Updates
    2/13/2009 - Updated all XML files to include counters from Monitoring Without System Center Operations Manager

    9/28/2009 - Added TCPv4 and TCPv6 counters to all Perfwiz counter sets

    10/15/2009 – Large update to the XML files. See below what has been added

    Role: HUB, CAS, HUB-CAS, UM, MBX, AIO

    \Memory\*
    \Netlogon(*)\*
    \Process(*)\*
    \Processor(*)\*
    \Redirector\*
    \Server\*
    \System\*
    \MSExchange ADAccess Domain Controllers(*)\LDAP Searches timed out per minute
    \MSExchange ADAccess Domain Controllers(*)\Long running LDAP operations/Min
    \MSExchange ADAccess Domain Controllers(*)\Number of outstanding requests
    \MSExchange ADAccess Local Site Domain Controllers(*)\LDAP Read calls/Sec
    \MSExchange ADAccess Local Site Domain Controllers(*)\LDAP Read Time
    \MSExchange ADAccess Local Site Domain Controllers(*)\LDAP Search calls/Sec
    \MSExchange ADAccess Local Site Domain Controllers(*)\LDAP Search Time
    \MSExchange ADAccess Local Site Domain Controllers(*)\LDAP Searches timed out per minute
    \MSExchange ADAccess Local Site Domain Controllers(*)\Long running LDAP operations/Min
    \MSExchange ADAccess Local Site Domain Controllers(*)\Number of outstanding requests

    Role: HUB, CAS, HUB-CAS, UM, AIO
    \MSExchange Store Interface(*)\ConnectionCache num caches
    \MSExchange Store Interface(*)\ConnectionCache out of limit creations
    \MSExchange Store Interface(*)\ConnectionCache total capacity
    \MSExchange Store Interface(*)\ExRPCConnection creation events
    \MSExchange Store Interface(*)\ExRPCConnection disposal events
    \MSExchange Store Interface(*)\ExRPCConnection outstanding

    Role: CAS, HUB-CAS, AIO
    \MSExchange OWA\AS Queries Failure %
    \MSExchange OWA\Average Search Time
    \MSExchange OWA\Failed Requests/sec
    \MSExchange OWA\Logons/sec
    \MSExchange OWA\Proxy Response Time Average
    \MSExchange OWA\Proxy User Requests/sec
    \MSExchange OWA\Store Logon Failure %

    Full
    \Netlogon(*)\*
    \MSExchange ADAccess Local Site Domain Controllers(*)\*

    Windows 2003 version of Perfwiz for Exchange 2007
    \Netlogon(*)\*

    10/21/2009 – Updated Database Instance Counters to include all instances. Affects AIO and MBX Role XML
    11/4/2009 – Added the following Physical Disk Counters to all XML files.

    \PhysicalDisk(*)\Avg. Disk Queue Length
    \PhysicalDisk(*)\Avg. Disk sec/Read
    \PhysicalDisk(*)\Avg. Disk sec/Write
    \PhysicalDisk(*)\Disk Reads/sec
    \PhysicalDisk(*)\Disk Transfers/sec
    \PhysicalDisk(*)\Disk Writes/sec
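    The counter entries in these lists all follow the \Object(Instance)\Counter pattern, with the instance part being optional. If you want to sort or compare the lists programmatically, a small parser along the following lines may help. It is a sketch written for this post's lists only: the function name is made up, and it deliberately ignores the machine-name prefix that full perfmon paths can carry.

```python
import re

# \Object(Instance)\Counter, where "(Instance)" may be missing.
COUNTER_PATH = re.compile(
    r"^\\(?P<object>[^\\(]+)(?:\((?P<instance>[^)]*)\))?\\(?P<counter>.+)$"
)


def parse_counter_path(path):
    """Split a counter path into (object, instance, counter); None if it does not fit."""
    match = COUNTER_PATH.match(path.strip())
    if match is None:
        return None
    return match.group("object"), match.group("instance"), match.group("counter")


print(parse_counter_path(r"\PhysicalDisk(*)\Avg. Disk sec/Read"))
print(parse_counter_path(r"\Memory\*"))
```

    Entries written without the leading backslash do not match the pattern and come back as None, which makes them easy to spot.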

    01/11/2010 - Added ASP.NET\Requests queued to the CAS, HUB-CAS, and AIO XMLs. Added MSExchange Database(Information Store)\Database Cache % Hit to the AIO and the MBX XMLs.

    01/14/2010 - Added all MSExchangeIS Client counters to MBX and AIO XMLs

     

  • Scalable Networking Pack Rollup Released

    I am sure you are all intimately familiar with the problems with the Scalable Networking Pack (SNP), including its use of the TCP Chimney feature, which I blogged about at http://msexchangeteam.com/archive/2007/07/18/446400.aspx.

    The problems that surfaced due to these features being enabled by default in the Service Pack 2 release of Windows 2003 brought out the worst in some network card drivers causing all types of connectivity related problems and crippled some organizations.

    After some time, a hotfix (http://support.microsoft.com/kb/948496) was created to disable these features by adding the appropriate registry keys to a server which then required a reboot to get the SNP features disabled.

    As of 8/27/2008, a new hotfix (http://support.microsoft.com/kb/950224) has been released to help address some of the commonly reported problems related to Chimney and RSS on Windows 2003 servers. As more companies deploy Windows 2008 and Vista, it is critical, in my opinion, that this hotfix be applied to all Windows 2003 servers that may communicate with these operating systems. One of the main reasons is a new feature called TCP auto-tuning, which uses TCP window scaling to expand and shrink the size of your TCP receive window to increase/decrease throughput based on current network conditions. This feature greatly increases throughput on your network, but if there is an underlying problem with the network card driver or any of these features between disparate systems, you may experience slower than normal network performance. The good news is that the Chimney feature is disabled by default in Vista/Windows 2008.