Thoughts from the EPS Windows Server Performance Team
In our last post, we talked about Pages and Page Tables. Today, we’re going to take a look at one of the most common problems when dealing with virtual memory – the Page Fault. A page fault occurs when a program requests an address on a page that is not in the current set of memory-resident pages. When a page fault occurs, the thread that experienced it is put into a wait state while the operating system finds the specific page on disk and restores it to physical memory.
When a thread attempts to reference a nonresident memory page, a hardware interrupt occurs that halts the executing program. The instruction that referenced the page fails, generating an addressing exception, which in turn raises the interrupt. An Interrupt Service Routine gains control at this point and determines that the address is valid, but that the page is not resident. The OS then locates a copy of the desired page in the paging file and copies it from disk into a free page frame in RAM. Once the copy has completed successfully, the OS allows the program thread to continue. One quick note here – if the program accesses an invalid memory location due to a logic error, an addressing exception similar to a page fault occurs, and the same hardware interrupt is raised. It is up to the Memory Manager’s Interrupt Service Routine that gets control to distinguish between the two situations.
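The flow above can be sketched in a few lines of Python. This is a toy model only – the page table, paging file, and RAM below are illustrative stand-ins, not the Memory Manager's actual data structures.

```python
# Toy model of the fault-handling flow described above.
# RAM, PAGE_FILE, and VALID_PAGES are simplified stand-ins for
# the real (far more involved) Memory Manager structures.

RAM = {}                      # pages currently resident in physical memory
PAGE_FILE = {0x1000: "data"}  # pages backed by the paging file on disk
VALID_PAGES = {0x1000}        # addresses the process may legally reference

def handle_fault(page):
    """Interrupt service routine sketch: distinguish a resolvable
    page fault from an access violation, then resolve the fault."""
    if page not in VALID_PAGES:
        # Same hardware interrupt, but the address is invalid:
        # a program logic error, not a recoverable page fault.
        raise MemoryError("access violation")
    if page in RAM:
        return RAM[page]      # already resident - nothing to do
    # Valid but nonresident: copy the page from disk into a free
    # frame, after which the faulting thread can continue.
    RAM[page] = PAGE_FILE[page]
    return RAM[page]
```

Calling `handle_fault(0x1000)` resolves the fault by "reading" the page in; an address outside `VALID_PAGES` raises the access-violation path instead.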
It is also important to distinguish between hard page faults and soft page faults. A hard page fault occurs when the page is located neither in physical memory nor in a memory-mapped file created by the process (the situation we discussed above). Application performance suffers when there is insufficient RAM and excessive hard page faults occur, so it is important that hard faults are resolved quickly and do not unnecessarily delay the program’s execution. A soft page fault, on the other hand, occurs when the page is resident elsewhere in memory – for example, in the working set of another process. Soft page faults may also occur when the page is in a transitional state because it has been removed from the working sets of the processes that were using it, or when it is resident as the result of a prefetch operation.
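The hard/soft distinction boils down to where the page currently lives, which can be captured in a small sketch. The state names here are illustrative assumptions, not the Memory Manager's real terminology.

```python
# Hedged sketch: classify a fault as hard or soft from the page's
# current location. State names are illustrative only.

SOFT_LOCATIONS = {
    "standby_list",        # transitional: removed from a working set
    "other_working_set",   # resident in another process's working set
    "prefetched",          # resident due to a prefetch operation
}

def classify_fault(page_location):
    if page_location == "resident":
        return "no fault"          # page is in the working set already
    if page_location in SOFT_LOCATIONS:
        return "soft"              # resolved from RAM, no disk I/O needed
    return "hard"                  # must be read in from disk
```

The key consequence: a soft fault costs a few memory-management operations, while a hard fault costs a disk read.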
We also need to quickly discuss the role of the system file cache and cache faults. The system file cache uses Virtual Memory Manager functions to manage application file data. The system file cache maps open files into a portion of the system virtual address range and uses the process working set memory management mechanisms to keep the most active portions of current files resident in physical memory. Cache faults are a type of page fault that occur when a program references a section of an open file that is not currently resident in physical memory. Cache faults are resolved by reading the appropriate file data from disk, or in the case of a remotely stored file – accessing it across the network. On many file servers, the system file cache is one of the leading consumers of virtual and physical memory.
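File mapping – the mechanism the system file cache is built on – is easy to demonstrate: once a file is mapped, its contents become part of the virtual address space, and the first touch of each page faults it in from disk. A minimal sketch using Python's `mmap` module (a generic OS facility here, not the Windows cache manager itself):

```python
# Map a file into memory and read it through the mapping; the first
# access to each mapped page is resolved by a fault that reads the
# file data from disk, analogous to a cache fault.
import mmap
import os
import tempfile

# Create a small file to map (a stand-in for an open file on a server).
fd, path = tempfile.mkstemp()
os.write(fd, b"hello cache" + b"\x00" * 4096)
os.close(fd)

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # This read touches the mapped page; if it is not resident,
        # the OS resolves the fault by reading from the file.
        first = m[:11]

os.remove(path)
print(first)  # b'hello cache'
```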
Finally, when investigating page fault issues, it is important to understand whether the page faults are hard faults or soft faults. The page fault counters in Performance Monitor do not distinguish between the two, so you have to do a little bit of work to determine the number of hard faults. To track paging, you should use the following counters: Memory\Page Faults/sec, Memory\Cache Faults/sec and Memory\Page Reads/sec. The first two counters track faults against the working sets and the file system cache; Page Reads/sec allows you to track hard page faults. If you have a high rate of page faults combined with a high rate of page reads (which also show up in the Disk counters) then you may have an issue where you have insufficient RAM given the high rate of hard faults.
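A rough back-of-envelope split of the fault rate follows from those counters. Note the caveats: the sample values below are made up, and Page Reads/sec counts read operations, each of which can bring in more than one page, so this is an approximation rather than an exact hard-fault count.

```python
# Approximate split of page faults into hard and soft using the
# counters named above. Sample values are illustrative, not real data.

page_faults_per_sec = 850.0   # Memory\Page Faults/sec (hard + soft)
page_reads_per_sec = 120.0    # Memory\Page Reads/sec (disk reads => hard faults)

# Each page read implies at least one hard fault; the remainder of
# the fault rate was satisfied from memory (soft faults).
hard_faults = min(page_reads_per_sec, page_faults_per_sec)
soft_faults = page_faults_per_sec - hard_faults

print(f"hard ~= {hard_faults}/sec, soft ~= {soft_faults}/sec")
```

With these sample numbers, roughly 120 of the 850 faults per second are hitting the disk – a sustained rate like that, visible in the Disk counters as well, is the pattern that suggests insufficient RAM.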
OK, that will do it for this post. Until next time …
- CC Hameed
What about the "memory\pages/sec" counter? According to the Perfmon explanation this counter records the hard page fault rate for the monitored system.
What would you consider a "high rate of page faults and page reads" for a (Vista) workstation? Are there any published baselines for workstations vs Exchange servers vs SQL Servers?
What would be considered a high rate of Page Faults/Sec?
Thanks a lot – very well explained, in simple and straightforward language.
What does it mean by "high rate of page faults combined with a high rate of page reads"? What figures are considered a high rate, and in which columns of the counter information in Perfmon? I have 8 GB of RAM on an HP ProLiant Gen 5 server with 4 quad-core physical processors, running Windows 2003 EE 64-bit and IIS 6.0. A maximum of only 1 GB of RAM is used, yet the web application server is dying trying to serve user requests once "Web Services" --> "Current Connections" reaches 50 in the Maximum column. Where should I look to find out what actions to take to get rid of my performance issue? I am completely new to this kind of analysis.
Any help is greatly appreciated – many thanks.
So how do Vista, W2K8 and W2K8 R2 come up with the Hard Faults/sec figure in Reliability and Performance Monitor? I'd have thought, at least for W2K8 R2, that this counter would be exposed directly in Perfmon.
You talk about Page Reads/sec allowing you to track hard faults, but does that mean it's the Hard Faults/sec counter in disguise? In my testing it didn't show that exactly, but it was close (it was always slightly higher).
Sorry for being so ignorant, but I was trying to find the previous posts and just couldn't. Can anyone describe an easy way to find the previous posts?
To find the previous post, go to the archive for June 2008 (/b/askperf/archive/2008/06.aspx). It will be the article listed immediately prior to this one.
Finally, an article that gives a pretty good idea about Page Faults/sec.
I had the same question as Craig and Jeroen...
"If you have a high rate of page faults combined with a high rate of page reads (which also show up in the Disk counters) then you may have an issue where you have insufficient RAM given the high rate of hard faults."
What would be considered a high rate of page faults, and of page reads?
Very interesting, but the key thing is HOW you reduce page faults.