Thoughts from the EPS Windows Server Performance Team
At least once a week, someone on the Performance team will get a customer call concerning hangs or resource depletion on their file server. The file server in question is used for user home folder storage and users are accessing Outlook Personal Storage (.pst) files stored on the server from their client. The issue will manifest as either a server hang, or PagedPool depletion (Event ID 2020). Oftentimes the issue will occur first thing in the morning - when users are logging on and launching Outlook. In especially severe cases, the issue occurs several times daily. Sometimes the server will hang for a few minutes and then continue operating for a few minutes - and then hang again. Rinse & repeat. The users are frustrated because of slow access to their data, the server administrators are frustrated because they are tasked with fixing the problem, and upper management is frustrated because everyone else is frustrated.
If you're in this situation - there's good news ... and very bad news. The good news is that this problem is very common and is a known issue. The very bad news (from the customer's standpoint) is that accessing PST files over a LAN/WAN is an unsupported configuration. Some customers are very surprised to hear this, but network-stored PST files have been unsupported since the days of Exchange 4.0. Microsoft KB Article 297019 goes into some detail about the effects of network PST files:
"A .pst file is a file-access-driven method of message storage. File-access-driven means that the computer uses special file access commands that the operating system provides to read and write data to the file.
This is not efficient on WAN or LAN links because WAN/LAN links use network-access-driven methods, commands the operating system provides to send data to or receive from another networked computer. If there is a remote .pst (over a network link), Microsoft Outlook tries to use the file commands to read from the file or write to the file, but the operating system then has to send those commands over the network because the file is not on the local computer. This creates a great deal of overhead and increases the time it takes to read and write to the file. Additionally, the use of a .pst file over a network connection may result in a corrupted .pst file if the connection degrades or fails."
Let's use an example to illustrate the problem and also follow the problem through to its end result.
Let's say that a user sends an e-mail message to 500 users within the company. All of these users have their e-mail delivered directly to their PST file which is stored on the File Server. Some of these 500 users may need to extend their PST files to receive it. To extend a PST, an extra allocation on disk has to be made via NTFS. This locks out the whole volume while free space is allocated and the Master File Table (MFT) is updated. While this is happening for each user, all I/O for the other 499 users is on hold.
Allocating free space can take an extended time, especially if the disk is fragmented. Now factor in multiple users extending their PSTs in the same timeframe, and significant periods of MFT lockout might be observed. To clients, this looks like an inability to access any other file on the volume. The result is queuing in the server service work queues, and sometimes SRV 2019, 2020, 2021, or 2022 events being logged. This scenario might also overload the disk(s).
Setting aside the example of one email being sent to a group of users, imagine if you had a couple of hundred users who each have two or three PST files. These users have been with the company for a while, and they rarely (if ever!) delete their email from their PST files. The files continue to grow in size - let's use an average of 1 GB as the size of the PST file. Now consider that when each user launches Outlook, they make a request for two (or three) files, each of them being about 1 GB in size. Then consider what happens when 200 users all launch Outlook around the same time when they get to work. 200 x 3 x 1 = 600 GB of data being requested at the same time. That's an awful lot of Disk & Network I/O to process simultaneously. This is a very common scenario - the file server "freezing" for a few minutes at a time while it tries to service these requests.
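The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it treats the 600 GB figure as an upper bound on the data in play, not as a measurement of bytes actually read, and the throughput value in the usage lines is a made-up assumption rather than a number from any real server.

```python
def worst_case_demand_gb(users: int, psts_per_user: int, avg_pst_gb: float) -> float:
    """Upper bound on PST data in play if every user opens every PST at once."""
    return users * psts_per_user * avg_pst_gb


def transfer_time_minutes(total_gb: float, throughput_mb_per_s: float) -> float:
    """Minutes needed to move total_gb at a sustained throughput (1 GB = 1024 MB)."""
    return total_gb * 1024 / throughput_mb_per_s / 60


# Figures from the example above: 200 users, 3 PSTs each, 1 GB average size.
demand = worst_case_demand_gb(users=200, psts_per_user=3, avg_pst_gb=1.0)
print(f"Worst-case demand: {demand:.0f} GB")

# Hypothetical sustained throughput of 100 MB/s (an assumption, not a measurement).
minutes = transfer_time_minutes(demand, throughput_mb_per_s=100.0)
print(f"Time to move it all at 100 MB/s: {minutes:.1f} minutes")
```

Even with a generous throughput assumption, the estimate makes it plain why the morning logon rush is when the server is most likely to stall.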
The queuing in the server service work queues is what causes this temporary hang. The server service uses work items to handle I/O requests that come in over the network - for example: a request to extend a PST file. These work items are queued in the server service work queues, and from there they are handled by the server service worker threads. The work items are allocated from a kernel resource called Non-Paged Pool (NPP).
The server service sends these I/O requests down to the disk subsystem. If, for reasons mentioned above, the disk subsystem does not respond in time, the incoming I/O requests are queued via work items in the server work queues. Since these work items are allocated from NPP, eventually this resource runs empty. Running out of NPP causes systems to hang eventually (logging an Event ID 2019 in the process).
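To make the queuing behaviour concrete, here is a toy simulation in Python. It is not how SRV.SYS is implemented; it is a minimal sketch that assumes a fixed pool of work items, a fixed number of incoming requests per tick, and a disk subsystem that completes a fixed number of requests per tick. All names and numbers are hypothetical.

```python
def simulate_work_items(total_items, arrivals_per_tick, completions_per_tick, ticks):
    """Return the low-water mark of free work items for each tick.

    Toy model: each accepted request takes one work item from the free pool.
    The disk subsystem then completes a fixed number of in-flight requests,
    returning their work items to the pool.
    """
    free = total_items
    in_flight = 0
    low_water = []
    for _ in range(ticks):
        # Requests beyond the number of free work items have to wait.
        accepted = min(arrivals_per_tick, free)
        free -= accepted
        in_flight += accepted
        low_water.append(free)  # lowest point reached during this tick

        # The disk completes what it can and hands the work items back.
        done = min(completions_per_tick, in_flight)
        in_flight -= done
        free += done
    return low_water


# Disk keeps up with the load: the free pool stays steady.
healthy = simulate_work_items(total_items=100, arrivals_per_tick=10,
                              completions_per_tick=10, ticks=20)

# Disk falls behind (for example, during MFT lockouts): the pool drains to zero.
slow_disk = simulate_work_items(total_items=100, arrivals_per_tick=10,
                                completions_per_tick=4, ticks=20)

print("healthy  :", healthy[0], "->", healthy[-1])
print("slow disk:", slow_disk[0], "->", slow_disk[-1])
```

In the slow-disk run the free pool shrinks by the difference between arrivals and completions on every tick until nothing is left, which is the same shape as the falling "Available Work Items" counter seen in real captures.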
Digging down into this from more of a troubleshooting perspective, we can usually see issues caused by the PST files manifested in Poolmon and Perfmon captures. For example, we may see the LSwn pool tag allocation climbing in a Poolmon trace. These allocations are made by SRV.SYS. The size of the allocation is configurable via the SizReqBuf registry value. One allocation is made for each work item used by the server service. When looking at this through Perfmon, you will notice a steady decrease in the "Available Work Items" counter. If Available Work Items reaches zero, then clients may experience difficulties accessing files (any files, not just the PST files!). You may also see Event ID 2019 errors if the problem lies with LSwn allocations (Non-Paged Pool depletion).
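When reviewing a capture, it can help to turn a handful of counter readings into a rate of change. The Python sketch below assumes you have already exported readings as (elapsed seconds, value) pairs by whatever means you prefer; the helper names and the sample values are made up for illustration, and the straight-line projection is a rough guide rather than a prediction.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (elapsed_seconds, counter_value)


def rate_per_hour(samples: Sequence[Sample]) -> float:
    """Average change per hour between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (v1 - v0) * 3600 / (t1 - t0)


def hours_until_zero(samples: Sequence[Sample]) -> Optional[float]:
    """Linear projection of when a falling counter hits zero (None if not falling)."""
    rate = rate_per_hour(samples)
    if rate >= 0:
        return None
    return samples[-1][1] / -rate


# Made-up "Available Work Items" readings taken every 15 minutes.
available_work_items = [(0, 120), (900, 105), (1800, 90), (2700, 75), (3600, 60)]

print("Change per hour:", rate_per_hour(available_work_items))
print("Hours until zero at this rate:", hours_until_zero(available_work_items))
```

The same rate calculation can be applied to a rising value such as a pool tag's allocation count, in which case the projection to zero does not apply and the function returns None.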
Another tag that highlights the issues with the PST files is the MmSt tag. This tag represents Mm section object prototype PTEs - a memory management-related structure used for mapped files. Put a different way, this is the pool tag that is used to map the OS memory used to track shared files. MmSt issues often manifest as Paged Pool depletion (Event ID 2020).
Is there any server-side tweaking that can be done to mitigate some of these effects? Yes. Is there any guarantee that this will resolve the problem completely and indefinitely? No. As an environment continues to scale up, the problem will continue to manifest itself despite all the tweaking that we can do. At some point, the tweaking itself may contribute to the problem because we've reached a point where the server simply cannot handle the workload.
(Many thanks to Rob and Kevin from our CPR team for their technical input!)
- CC Hameed
Not sure how I found this blog but it's damn good. Check out these that I have been reading today! IE7
From a performance point of view, yes, the article is very good for network administrators who want to look into their network.
But from a PST backup point of view, the lack of a centralised database makes this a critical issue.
Looking forward to hearing some expert comments or an article on this.
I know that the article mentions PerfMon and Poolmon counters that can help identify this problem. I possibly have this problem and am trying to rule it in or out.
We have about 1000 PSTs on a Win 2003 SP1 file server using external SCSI-attached storage for about 1.6TB of user storage areas.
Could someone be more specific in describing what to look for? The Available Work Items (AWI) hitting zero is easy to watch for, but Poolmon is a little less friendly. I am watching LSwn Allocs grow from 8261 to 8409 over an hour period. So, yes, it is growing, but is that a bad number or rate? MmSt is also growing, 13,305,470 to 13,360,177 over the same time period. AWI has been pretty steady at around 20-30, but that seems to only be affecting processor zero. The other three processors are basically flat at 25. We see occasional dips in AWI to zero. Also, after one particular spike to zero, AWI seems to have upped the number of Available Items and is now hovering around 65 instead of 30.
Any input or links to documents that will help more thoroughly identify this problem will be much appreciated.
To answer some of the comments about people needing to archive their mail to keep the ex store down in size, I would ask why?
What's the point in limiting people's mailbox size and then having them archive off to a network store somewhere? You still have to back up that network store and those disks are still costing you money. If backing up the store takes too long then you probably need to consider breaking your server up into multiple storage groups and backing up one group in the evening at 8:00pm, and the other in the early morning at 2:00am or something like that. If you still have problems then you should be looking at getting a faster tape drive, or doing disk-based backups and then backing up to tape from the disk backup so that your tape backups can have all day to run if they need to.
I agree with Joe. Keep it in the Exchange IS database. You get the single instance storage benefit, and you still have to back it up anyway. With Exchange 2000 Standard, you had the 16GB limit, but since Exchange 2003 SP2 and Exchange 2007, the size restrictions are much higher. And the performance storing in the IS is much better than PSTs too.
Give me the reasons that this is bad logic?
This is all well and good when you don't have an IT policy sending you in the opposite direction.
Leaving things on the server is the way to go for a shared inbox. But if you have unreasonable storage limitations on the server, you've got to put the mail somewhere. If you can't put it in a PST file on a file server, what can you do? Distribute a weekly DVD backup of the PST file for everyone to copy locally?
Lots of good comments. I would be interested to see what some of the answers are. Our organization is having this same discussion right now and we would like to know how others are handling it.
We either have to come up with an expensive archival solution or raise the mailbox limits. It just seems that every time we set a standard we are just moving the problem somewhere else.
A new KB is out to fix the event id 2020
But as said, avoid PST Files over Network
We see issues on file clusters if terminal server user profiles contain PST files - even Outlook (on TS) struggles
I've seen lots of comments above talking about how in the real world it's not practical to remove PST files from servers.
I strongly disagree. There are many products out there designed to archive exchange mail. Enterprise Vault and EAS to name two. Both these products will store archive mail in a central, compressed location where it can be easily backed up thus removing the load from your file servers.
If you don't have the budget for these tools then, as someone else said, leave it all in Exchange. The disk space you save on your file servers can be added to Exchange, and you'll need less disk overall this way.
So leave the mail on the server: Go IMAP, use a web-based client (e.g., Google), cut down on the distro lists, increase mailbox size... Take all the incentives out of wanting to keep mail anyplace but the server.
Excellent article. One of the best technical explanations I have read. It is worth noting that the Lotus Notes local mail store has the same characteristics.
KB Article 297019 provides recommendations for alternatives to .pst files. In the end, the recommendation is to stop using .pst files altogether.
If you are thinking of saving the .pst file in a shared folder to optimize time and resources... don't do it!
If you have 50000 users and a federal requirement of keeping emails for no less than 3 years, you can't store them on the Exchange server. I have 3 PSTs (2 archives and one working) that I have to store on a network server, as backup software on a client is prohibitively expensive and the workstation is nothing more than a beefed-up terminal. Storing PSTs in the user's home directory is the only way to guarantee backups. PFBackup from Microsoft is good, but will only run once a day. If it could be configured to run every time you close Outlook, we would do it.
Matt - you are forgetting the other option - implementing a vaulting solution. Many organizations absolutely block PST files from being created on their servers and use a vaulting solution instead. This gives them the flexibility to maintain their archive, enforce document retention policy, keep the live mailboxes small(er) AND stay in a supported configuration.