An Overview of Troubleshooting Memory Issues - Part Two

In our last post, we looked at some common memory issues and how to troubleshoot them.  Today we're going to go over excessive paging and memory bottlenecks.

We've talked about issues with the page file in several posts - something to bear in mind is that although you want enough RAM to prevent excessive paging, the aim should not be to eliminate paging activity completely.  Some page fault behavior is inevitable - for example, when a process is first initialized.  Modified virtual pages in memory have to be written back to disk eventually, so there will always be some amount of Page Writes/sec.  However, when there is not enough RAM installed, there are two issues in particular that you may see - too many page faults, and disk contention.

Let's start with page faults.  Page faults are divided into two types: soft and hard.  A page fault occurs when a process requests a page in memory and the system cannot find the page at the requested location.  If the requested page is actually elsewhere in memory, then the fault is a soft page fault.  However, if the page has to be retrieved from disk, then a hard fault occurs.  Most systems can handle soft page faults without issue.  However, if there are lots of hard page faults you may experience delays.  The additional disk I/O resulting from constantly paging to disk can interfere with applications that are trying to access data stored on the same disk as the page file.  Although a high page fault rate is a fairly straightforward issue, troubleshooting it requires some extensive data gathering and analysis in Performance Monitor.  The counters below are the important ones when troubleshooting a suspected page fault issue (a short sketch applying their thresholds follows the table):

Counter: Memory\Pages/sec
Description: Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults.  This counter is a primary indicator of the kinds of faults that cause system-wide delays.  It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec.  It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion.  It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.
Values to Consider: If Pages/sec multiplied by 4,096 (the 4 KB page size) is consistently greater than 70% of the total Logical Disk Bytes/sec to the disk(s) where the page file is located, then you should investigate.  Translation: if paging to disk consistently accounts for more than 70% of your total disk activity, there may be an issue.

Counter: Memory\Page Reads/sec
Description: Page Reads/sec is the rate at which the disk was read to resolve hard page faults.  It shows the number of read operations, without regard to the number of pages retrieved in each operation.  Hard page faults occur when a process references a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk.  This counter is a primary indicator of the kinds of faults that cause system-wide delays.  It includes read operations to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.  Compare the value of Memory\Page Reads/sec to the value of Memory\Pages Input/sec to determine the average number of pages read during each operation.
Values to Consider: Look for sustained values.  If the value is consistently greater than 50% of the total number of Logical Disk operations to the disk where the page file resides, then an inordinate amount of paging is taking place to resolve hard faults.

Counter: Memory\Available Bytes
Description: Available Bytes is the amount of physical memory, in bytes, immediately available for allocation to a process or for system use.  It is equal to the sum of memory assigned to the standby (cached), free and zero page lists.
Values to Consider: If this value falls below 5% of installed RAM on a consistent basis, then you should investigate.  If the value drops below 1% of installed RAM on a consistent basis, there is a definite problem!
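The thresholds above are just arithmetic on counter values, so they are easy to script against data you have already captured.  Here is a minimal C sketch applying them; the helper names and the example numbers in main() are mine, and the values would normally come from averages out of a Performance Monitor log rather than being hard-coded.  The 70%, 50% and 5%/1% figures are the rules of thumb from the table, not hard limits.

#include <stdio.h>

/* Rules of thumb from the table above, applied to sampled counter values.
 * The caller supplies the values (e.g. averaged from a Performance Monitor
 * log); the thresholds are guidelines, not hard limits. */

#define PAGE_SIZE_BYTES 4096.0

/* Pages/sec * 4096 compared against Disk Bytes/sec on the page file disk(s). */
static void check_paging_share(double pagesPerSec, double diskBytesPerSec)
{
    double pagingBytesPerSec = pagesPerSec * PAGE_SIZE_BYTES;
    double share = (diskBytesPerSec > 0.0) ? pagingBytesPerSec / diskBytesPerSec : 0.0;
    printf("Paging is %.1f%% of disk throughput%s\n", share * 100.0,
           share > 0.70 ? " - investigate" : "");
}

/* Page Reads/sec compared against total disk transfers on the page file disk,
 * plus the average number of pages retrieved per read operation. */
static void check_hard_fault_reads(double pageReadsPerSec, double pagesInputPerSec,
                                   double diskTransfersPerSec)
{
    double share = (diskTransfersPerSec > 0.0) ? pageReadsPerSec / diskTransfersPerSec : 0.0;
    double pagesPerRead = (pageReadsPerSec > 0.0) ? pagesInputPerSec / pageReadsPerSec : 0.0;
    printf("Hard-fault reads are %.1f%% of disk operations (%.1f pages/read)%s\n",
           share * 100.0, pagesPerRead, share > 0.50 ? " - investigate" : "");
}

/* Available Bytes compared against installed RAM. */
static void check_available_bytes(double availableBytes, double installedRamBytes)
{
    double pct = 100.0 * availableBytes / installedRamBytes;
    if (pct < 1.0)      printf("Available Bytes at %.1f%% of RAM - definite problem\n", pct);
    else if (pct < 5.0) printf("Available Bytes at %.1f%% of RAM - investigate\n", pct);
    else                printf("Available Bytes at %.1f%% of RAM - OK\n", pct);
}

int main(void)
{
    /* Example values only - replace with averages from your own capture. */
    check_paging_share(2500.0, 12.0 * 1024 * 1024);
    check_hard_fault_reads(400.0, 2000.0, 900.0);
    check_available_bytes(256.0 * 1024 * 1024, 8.0 * 1024 * 1024 * 1024);
    return 0;
}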

Remember that since the operating system has to write changed pages to disk, there will always be some page write operations occurring.  However, Page Reads/sec, which indicates the number of hard faults, is extremely sensitive to situations where there is insufficient RAM.  As the value of Available Bytes decreases, the number of hard page faults will normally increase.  The total number of Pages/sec that can be sustained by the system is a function of the disk bandwidth.  This does mean, however, that there is no simple number that tells you whether or not the disks are saturated.  Instead you have to identify how much of the overall disk traffic is being caused by paging activity.
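If you want to watch that paging share directly, the same counters can also be sampled programmatically.  The following is a minimal C sketch using the PDH API; it assumes the page file lives on the _Total logical disk instance (adjust the counter path for your layout), uses English counter paths (which are locale-specific on non-English systems), and takes a single one-second sample, whereas for real troubleshooting you would average over a much longer interval.

#include <windows.h>
#include <pdh.h>
#include <stdio.h>

#pragma comment(lib, "pdh.lib")

/* Samples Memory\Pages/sec and LogicalDisk\Disk Bytes/sec once and prints
 * how much of the disk throughput is attributable to paging.  Assumes the
 * page file is on the _Total instance; adjust the counter path as needed. */
int main(void)
{
    PDH_HQUERY query;
    PDH_HCOUNTER pagesPerSec, diskBytesPerSec;
    PDH_FMT_COUNTERVALUE pages, diskBytes;

    if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS) return 1;

    /* Counter paths are localized; these are the English names. */
    PdhAddCounterW(query, L"\\Memory\\Pages/sec", 0, &pagesPerSec);
    PdhAddCounterW(query, L"\\LogicalDisk(_Total)\\Disk Bytes/sec", 0, &diskBytesPerSec);

    /* Rate counters need two collections with a delay in between. */
    PdhCollectQueryData(query);
    Sleep(1000);
    PdhCollectQueryData(query);

    PdhGetFormattedCounterValue(pagesPerSec, PDH_FMT_DOUBLE, NULL, &pages);
    PdhGetFormattedCounterValue(diskBytesPerSec, PDH_FMT_DOUBLE, NULL, &diskBytes);

    double pagingBytes = pages.doubleValue * 4096.0;   /* 4 KB pages */
    double share = diskBytes.doubleValue > 0.0
                 ? 100.0 * pagingBytes / diskBytes.doubleValue : 0.0;

    printf("Pages/sec: %.1f  Disk Bytes/sec: %.0f  Paging share: %.1f%%\n",
           pages.doubleValue, diskBytes.doubleValue, share);

    PdhCloseQuery(query);
    return 0;
}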

Another indicator of a memory bottleneck is that the pool of Available Bytes is depleted.  Page trimming by the Virtual Memory Manager is triggered when there is a shortage of available bytes.  Page trimming attempts to replenish the pool of available bytes by identifying virtual memory pages that have not been referenced recently.  When page trimming is effective, older pages trimmed from process working sets are not needed again soon.  Trimmed pages are marked in transition and remain in RAM for a period of time to reduce the amount of paging to disk that occurs.  However, if there is a chronic shortage of available bytes, page trimming is less effective and the result is more paging to disk.  Since there is little room in RAM for the pages that are marked in transition, a recently trimmed page that is referenced again has to be retrieved from disk rather than from RAM.  The more severe the bottleneck, the more often the page file is updated - which interferes with application-directed I/O operations on the same disk.

Before we wrap up, let's quickly discuss the guideline listed above for the Memory\Available Bytes counter.  Normally, if Available Bytes is consistently greater than 5% of installed RAM, then you should be in decent shape.  However, there are some applications that manage their own working sets - IIS 6.0, Exchange Server and SQL Server, for example.  These applications interact with the virtual memory manager to increase their working sets when memory is available, and should trim their working sets when signaled by the operating system.  They rely on RAM-resident cache buffers to reduce I/O to disk, so on these systems RAM will always look full as a result.
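As a quick sanity check outside of Performance Monitor, you can read the same "how full does RAM look" picture straight from the API.  Here is a minimal C sketch using GlobalMemoryStatusEx; the 5% / 1% checks are the guidelines from the table above, and as just discussed, on a box running one of these self-tuning applications a low available figure is expected rather than alarming.

#include <windows.h>
#include <stdio.h>

/* Prints installed vs. immediately available physical memory and flags the
 * 5% / 1% guidelines discussed above.  On servers running applications that
 * manage their own working sets (SQL Server, Exchange, etc.) a low figure
 * here is normal, not necessarily a problem. */
int main(void)
{
    MEMORYSTATUSEX ms = { sizeof(ms) };

    if (!GlobalMemoryStatusEx(&ms)) return 1;

    double pctAvail = 100.0 * (double)ms.ullAvailPhys / (double)ms.ullTotalPhys;

    printf("Installed RAM : %llu MB\n", ms.ullTotalPhys / (1024 * 1024));
    printf("Available     : %llu MB (%.1f%%)\n", ms.ullAvailPhys / (1024 * 1024), pctAvail);

    if (pctAvail < 1.0)      printf("Below 1%% of installed RAM - definite problem\n");
    else if (pctAvail < 5.0) printf("Below 5%% of installed RAM - worth investigating\n");

    return 0;
}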

And on that note, we will wrap up our two-part Overview of Troubleshooting Memory Issues.  Until next time ...

- CC Hameed

Comments
  • The Windows Server Performance Team has published part 2 of troubleshooting memory issues. This one ...

  • Here follows the continuation of the overview of memory troubleshooting presented by CC Hameed

  • Picking up on your comments about soft page faults, I wonder if you've seen that memory allocation can cause >100,000 soft faults per second, which can really slow a system down.

    The issue occurs when a program allocates and deallocates blocks of more than 1MB: the heap manager returns that memory to the free list, and Windows then zeroes it out in the kernel the next time it is used, on a page-by-page basis.

    This could be considered an application problem rather than a Windows problem, though personally I feel Windows should allow the limit to be adjusted.

  • Bryan,

    I am seeing the same thing with soft page faults.  I am glad to see someone else has noticed this problem.  It will consume about 25-50% of the CPU time (as seen by the kernel time plotted in Task Manager).  This is definitely a problem with the OS as far as I am concerned.

    I will get up to 200,000 soft page faults per second in a test case I put together today, just because I am malloc/free'ing memory for intermediate results.  The other way one can see this occurring is by noticing the constantly adjusting working set size displayed in Task Manager while the Maximum Memory sizes never change.  The OS is doing a lot of work and consuming a lot of resources in order to get in my way.

    We are trying to use a Windows system for some performance computations.  This is preventing us from reaching the system's full potential.

    Do you know if there are any settings to make the OS a little lazier on my behalf?
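For what it's worth, the allocation pattern described in the two comments above is easy to reproduce.  Below is a small C sketch (the 2MB block size and iteration count are arbitrary, and exactly when the heap returns a block to the OS is an implementation detail): each large allocation tends to come back as freshly committed memory, so touching its pages generates demand-zero soft faults every time through the loop, which you can watch in Task Manager or via the process's page fault count.

#include <windows.h>
#include <psapi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#pragma comment(lib, "psapi.lib")

/* Repeatedly allocates and frees a >1MB buffer and touches every page.
 * Large blocks tend to be returned to the OS on free, so each iteration's
 * pages must be demand-zeroed again - producing a steady stream of soft
 * page faults without any disk I/O.  Sizes and counts are arbitrary. */
int main(void)
{
    const size_t blockSize = 2 * 1024 * 1024;   /* > 1MB, per the comments above */
    const int iterations = 10000;
    PROCESS_MEMORY_COUNTERS before, after;

    GetProcessMemoryInfo(GetCurrentProcess(), &before, sizeof(before));

    for (int i = 0; i < iterations; i++) {
        char *p = (char *)malloc(blockSize);
        if (!p) return 1;
        memset(p, 0xAB, blockSize);             /* touch every page */
        free(p);                                /* large block goes back to the OS */
    }

    GetProcessMemoryInfo(GetCurrentProcess(), &after, sizeof(after));
    printf("Page faults incurred: %lu\n", after.PageFaultCount - before.PageFaultCount);
    return 0;
}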

  • I am confused about how RAM, cache and physical memory are related to each other and how they work - please help me!