Thoughts from the EPS Windows Server Performance Team
Hello, and welcome to our second post in the Windows 7 launch series. This post is going to be a long one, so buckle in. We’re going to start with an overview of Fault Tolerant Heap, a new feature in Windows 7 and Windows Server 2008 R2, and then go over some Memory Management pieces. If you’re not familiar with the general concepts of heap, you may want to start by reading our previous posts, Heap, Part One and Heap, Part Two.

Heap is a term used to describe several types of memory structures that are used to store information. For instance, every process has what is called a 'process heap', which exists for as long as the process lives. A process can also have what is called a 'private heap', which is for use only by the process that creates it. A DLL can also create a heap, and does so within the memory space of the process that loads the DLL.
None of this probably seems interesting to you unless you are an application developer, but what is important is what happens if a heap becomes damaged. We on the Performance Team lovingly call this 'Heap Corruption', and yes, the word 'lovingly' is used sarcastically in this case. Heap corruption occurs when a region of heap is overwritten by bad data. Heap, like all memory structures, is broken down into what are called pages. If more data is written into a page than will fit, it ends up spilling over into the next page. The problem is that the act of writing into the next page is not fatal; when it happens, no one is the wiser. However, when that next page is later accessed, the code reading it will encounter bad data and most likely crash.

If you follow the logic of this, you will realize that the actual culprit that wrote the bad data is probably long gone by the time the crash occurs, and all we see if we debug it is the victim. Basically, if you are debugging an application crash and you see RtlAllocateHeap or RtlFreeHeap at the top of the faulting stack, you are probably a victim of heap corruption. Here is what a heap corrupted stack may look like in the debugger:
STACK_TEXT:
00000000`01abdfd8 00000000`77ef1ce6 ntdll!ExpInterlockedPopEntrySListFault+0x0
00000000`01abdfe0 00000000`77ef3cc7 ntdll!RtlAllocateHeap+0x278
00000000`01abe230 000007ff`7fd5fea0 rpcrt4!DCE_BINDING::DCE_BINDING+0x14f
00000000`01abe290 000007ff`7fd61b82 rpcrt4!RpcStringBindingComposeW+0xb0
00000000`01abe310 000007ff`7d4d81bf winsta!RpcWinStationBindSecure+0x4f
00000000`01abe3a0 000007ff`7d4d671a winsta!WinStationOpenServerW+0x75
00000000`01abe420 000007ff`5ee3bfef tscfgwmi!CWin32_TerminalService::ExecQuery+0xdf
00000000`01abe790 000007ff`6de41687 framedyn!Provider::ExecuteQuery+0x77
00000000`01abe7c0 000007ff`6de47813 framedyn!CWbemProviderGlue::ExecQueryAsync+0x2c3
00000000`01abedb0 000007ff`7fd69a75 rpcrt4!Invoke+0x65
00000000`01abee20 000007ff`7fe96cc9 rpcrt4!NdrStubCall2+0x54d
00000000`01abf3e0 000007ff`7fe961b6 rpcrt4!CStdStubBuffer_Invoke+0xb1
00000000`01abf420 000007ff`57369edb ole32!SyncStubInvoke+0x62
00000000`01abf4c0 000007ff`57369e27 ole32!StubInvoke+0x142
00000000`01abf580 000007ff`5718e41b ole32!CCtxComChnl::ContextInvoke+0x21e
00000000`01abf770 000007ff`5718cdcb ole32!STAInvoke+0x97
00000000`01abf7e0 000007ff`573692da ole32!AppInvoke+0x144
00000000`01abf870 000007ff`57369a55 ole32!ComInvokeWithLockAndIPID+0x5a9
00000000`01abf9d0 000007ff`57369373 ole32!ComInvoke+0x127
00000000`01abfa40 000007ff`5718d1f3 ole32!ThreadDispatch+0x2b
00000000`01abfa70 000007ff`5718d19a ole32!ThreadWndProc+0x13a
00000000`01abfb20 00000000`77c43abc user32!UserCallWinProcCheckWow+0x1f9
00000000`01abfbf0 00000000`77c43f5c user32!DispatchMessageWorker+0x3af
00000000`01abfc60 00000001`00011f31 wmiprvse!WmiThread<unsigned long>::ThreadWait+0x141
00000000`01abfef0 00000001`00013539 wmiprvse!WmiThread<unsigned long>::ThreadDispatch+0x519
00000000`01abff50 00000001`0001379d wmiprvse!WmiThread<unsigned long>::ThreadProc+0x2d
00000000`01abff80 00000000`77d6b71a kernel32!BaseThreadStart+0x3a
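The rule of thumb above (a heap routine near the top of the faulting stack suggests heap corruption) is easy to automate when you are triaging many dumps. The following is only a rough sketch in Python: the function names and the `depth` parameter are made up for illustration, and it assumes each frame has been saved as one line of text in the form child-SP, return address, call site.

```python
# Heuristic sketch only: the routine names come from the post, the rest is assumed.
HEAP_ROUTINES = ("RtlAllocateHeap", "RtlFreeHeap")


def top_frames(stack_text, depth=3):
    """Return the call sites of the first `depth` frames in a stack listing."""
    sites = []
    for line in stack_text.splitlines():
        # Each frame is: child-SP, return address, module!function+offset.
        # maxsplit=2 keeps symbols containing spaces (C++ templates) intact.
        parts = line.split(None, 2)
        if len(parts) == 3 and "!" in parts[2]:
            sites.append(parts[2].strip())
    return sites[:depth]


def looks_like_heap_corruption(stack_text, depth=3):
    """True if a heap routine shows up near the top of the faulting stack."""
    return any(name in site
               for site in top_frames(stack_text, depth)
               for name in HEAP_ROUTINES)


# The first three frames of the trace shown above.
sample = """\
00000000`01abdfd8 00000000`77ef1ce6 ntdll!ExpInterlockedPopEntrySListFault+0x0
00000000`01abdfe0 00000000`77ef3cc7 ntdll!RtlAllocateHeap+0x278
00000000`01abe230 000007ff`7fd5fea0 rpcrt4!DCE_BINDING::DCE_BINDING+0x14f
"""
print(looks_like_heap_corruption(sample))
```

A match is a hint, not a diagnosis: it tells you the crashing thread is likely the victim, and the culprit still has to be found with the debugging techniques from the earlier heap posts.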
Unfortunately, this is a fairly common scenario that we see. Our previous posts on heap issues contain information on how we debug these, and more information is available in Microsoft KB Article 286470.
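To make the culprit-versus-victim point concrete, here is a toy model in Python. It is not how the Windows heap manager is implemented: the `ToyHeap` class, the 4-byte `HEAD` marker and the 8-byte block size are all invented for the illustration. The only thing it borrows from the description above is the behavior: the oversized write succeeds silently, and the failure surfaces later in unrelated code.

```python
# Toy model of a heap overrun; every name and size here is hypothetical.
HEADER = b"HEAD"   # marker stored in front of every block
BLOCK_SIZE = 8     # fixed payload size per block


class ToyHeap:
    def __init__(self, blocks):
        # Blocks sit back to back: [HEADER][payload][HEADER][payload]...
        self.stride = len(HEADER) + BLOCK_SIZE
        self.arena = bytearray((HEADER + bytes(BLOCK_SIZE)) * blocks)

    def write(self, block, data):
        # Deliberately unchecked, like a raw memory copy: a long write
        # simply runs on into whatever follows the payload.
        start = block * self.stride + len(HEADER)
        self.arena[start:start + len(data)] = data

    def free(self, block):
        # The damage is only noticed when the heap next reads the header.
        start = block * self.stride
        if bytes(self.arena[start:start + len(HEADER)]) != HEADER:
            raise RuntimeError(f"heap corruption detected in block {block}")


heap = ToyHeap(blocks=2)
heap.write(0, b"A" * 12)      # culprit: 12 bytes into an 8-byte block, no error
try:
    heap.free(1)              # victim: unrelated code touching block 1 fails
    victim_crashed = False
except RuntimeError as err:
    victim_crashed = True
    print(err)
```

Note that freeing block 0, the block the culprit actually owns, still succeeds, which is exactly why the faulting stack rarely points at the code that caused the problem.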
OK, so let’s look at how Windows 7 and Windows Server 2008 R2 mitigate heap corruption issues – Fault Tolerant Heap (FTH). The main goals of FTH are:
So basically, Fault Tolerant Heap (FTH) watches for applications that crash, and then tries to determine if the crash is due to heap corruption. If the conclusion is that it is, then FTH tracks the application to see if the frequency of the crash warrants a shim, or applies a shim on the next run, depending on its configuration and whether the internet is accessible. An administrator can also apply a shim manually using the Application Compatibility Toolkit. The FTH shim is designed to mitigate the most common causes of heap corruption, such as small buffer overruns and double frees. It also tracks subsequent behavior of the shimmed application to determine the degree to which the shim was successful. If it is deemed not successful, the shim is removed to minimize interference with normal application functionality.
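The "does the crash frequency warrant a shim" decision can be pictured as a sliding-window counter. The sketch below is only an illustration in Python, not FTH's actual logic: the `CrashTracker` class is hypothetical, keying by application path is an assumption, and the default thresholds simply reuse the "more than 3 times within 60 minutes" figure mentioned later in this post.

```python
from collections import deque


class CrashTracker:
    """Toy sliding-window crash counter; thresholds are illustrative."""

    def __init__(self, max_crashes=3, window_minutes=60):
        self.max_crashes = max_crashes
        self.window = window_minutes * 60      # window length in seconds
        self.crashes = {}                      # app path -> crash timestamps

    def record_crash(self, app_path, timestamp):
        """Record a heap-corruption crash; return True if a shim is warranted."""
        times = self.crashes.setdefault(app_path, deque())
        times.append(timestamp)
        # Forget crashes that have aged out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) > self.max_crashes


tracker = CrashTracker()
# Four crashes ten minutes apart (timestamps in seconds).
decisions = [tracker.record_crash(r"C:\Apps\demo.exe", t)
             for t in (0, 600, 1200, 1800)]
print(decisions)
```

With these defaults the first three crashes are only tracked, and the fourth crash inside the hour is the one that tips the application into being shimmed.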
Full FTH functionality is only supported on client SKUs. This means that it does not monitor and shim applications running on server SKUs. However, you can manually apply the shim to an application on a server using the Application Compatibility Toolkit. FTH also only applies to interactive programs. Since services are no longer allowed to interact with the desktop starting with Windows Vista and Windows Server 2008, they will typically not be eligible for automatic FTH monitoring. Again, however, you can manually shim a service using the Application Compatibility Toolkit.
FTH runs as part of what is called the Diagnostic Policy Service, which runs within a SVCHOST.EXE process under the Local Service account. Because of this, the Local Service account requires full Read access to the path of the application in question, or else it may track the application but never be able to apply the shim. The user's desktop, for instance, is not fully readable by the Local Service account, so an application being run directly from the desktop will not be shimmed.
FTH registry values are stored in the following key: HKEY_LOCAL_MACHINE\Software\Microsoft\FTH. There are a number of values under this key, but the main ones to watch are:
There is also a State key under the FTH key. This key stores information on applications that have been shimmed. So for instance, if you open this key on a fresh machine, it should have nothing under it other than the typical Default - Value Not Set. Once an application crashes due to heap corruption more than 3 times within 60 minutes, it will be added to this key in the format of <Appname> = <binary blob>. You can't read what is actually in the binary value, but it includes various information such as the process-specific versions of the values listed above. From a user standpoint, the key is mainly useful for seeing which processes, if any, have been caught crashing in what appears to be heap-corrupting behavior.

Overall, FTH should assist in automatically addressing many common application crashes without any sort of intervention by the user. Now, let’s turn our attention to some new memory management pieces within Windows 7 and Windows Server 2008 R2, beginning with Working Set Trimming …
In previous versions of Windows, especially 64-bit versions of Windows Server 2003, the size of the working set of system cache could potentially grow to consume all, or nearly all, of RAM. In Windows Server 2008 R2 and Windows 7, significant changes were introduced to the management of working sets to address that situation. The nature of the changes is as follows:
With respect to Contiguous Memory Allocations, new multi-megabyte tracking structures allow the memory manager to skip already-allocated ranges in large page chunks, yielding up to a 512x performance increase on some workloads. For example, Hyper-V allocations by VID.sys are now more than 30 times faster. In addition, pervasive top-down optimal scanning with a sliding window has contributed greatly to increased performance. Specifically, it dramatically improves the performance of Hyper-V creation of guest VMs and enterprise applications like SQL that allocate large amounts of memory.
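The idea of skipping already-allocated ranges in large chunks can be shown with a toy scanner. This Python sketch is not the memory manager's real data structure: the per-chunk summary flags, the function names and the 512-page chunk size are assumptions (512 is simply the number of 4 KB pages in a 2 MB large page, which would be consistent with the "up to 512x" figure). It also scans bottom-up for simplicity, whereas the text above describes top-down scanning.

```python
# Toy model only; not the Windows memory manager's actual data structures.
CHUNK = 512  # pages per tracked chunk (assumed: 2 MB large page / 4 KB page)


def build_summary(allocated):
    """One flag per chunk: True when every page in the chunk is allocated.

    A real memory manager would keep this up to date as pages are
    allocated and freed rather than recomputing it.
    """
    return [all(allocated[i:i + CHUNK]) for i in range(0, len(allocated), CHUNK)]


def find_free_run(allocated, need, summary=None):
    """Find `need` contiguous free pages; return (start, pages_examined)."""
    examined = 0
    run_start, run_len = None, 0
    i = 0
    while i < len(allocated):
        if summary is not None and i % CHUNK == 0 and summary[i // CHUNK]:
            i += CHUNK                      # whole chunk is in use: skip it
            run_start, run_len = None, 0
            continue
        examined += 1
        if allocated[i]:
            run_start, run_len = None, 0
        else:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == need:
                return run_start, examined
        i += 1
    return None, examined


# Four fully allocated chunks followed by 16 free pages.
pages = [True] * (4 * CHUNK) + [False] * 16
slow = find_free_run(pages, need=8)
fast = find_free_run(pages, need=8, summary=build_summary(pages))
print("page by page:", slow, "with chunk summary:", fast)
```

Both scans find the same run, but the summarized scan looks at only the 8 free pages it needs instead of walking all 2,048 allocated pages first. The more of memory that is already in use, the bigger the saving.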
Address Space Layout Randomization (ASLR) has also been made more effective: there are now 64 possible load addresses for 32-bit drivers and 256 for 64-bit drivers, up from 16 for each. In addition, large session drivers such as Win32k.sys are also now relocated. Extra effort is also made to relocate user space images even when system virtual address space is tight by using the user address space of the system process.
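To put the load-address numbers in perspective, they can be converted into bits of randomness. The short Python calculation below assumes each load address is equally likely to be chosen, which is a simplification for illustration; the `entropy_bits` helper is not part of any Windows tooling.

```python
import math


def entropy_bits(load_addresses):
    """Bits an attacker must guess if every load address is equally likely."""
    return math.log2(load_addresses)


for label, slots in [("previous releases", 16),
                     ("32-bit drivers", 64),
                     ("64-bit drivers", 256)]:
    print(f"{label}: {slots} load addresses = {entropy_bits(slots):.0f} bits, "
          f"a 1-in-{slots} chance per blind guess")
```

In other words, going from 16 to 64 load addresses adds two bits of randomness, and going to 256 adds four, so a blind guess at a driver's location is 4 or 16 times less likely to succeed than before.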
Finally, let’s look at some Translation Look-Aside Buffer (TLB) and Cache Flush improvements. The TLB is how a processor caches virtual-to-physical translations to provide performance gains. The operating system is required to flush the corresponding entries whenever it changes a virtual-to-physical mapping. Windows 7 and Windows Server 2008 R2 take advantage of newer CPU designs that do not require TLB invalidation for permission promotion, eliminating the need for TLB flushing for many common operations such as dirty bit faults. Also added is automatic tracking of I/O space mappings, which makes the system robust against conflicting attribute specifications by automatically guaranteeing that incorrect mapping requests are transparently converted to correct ones. This improves performance by eliminating unneeded and costly flushing of the entire cache.
And with that – we’ve reached the end of our second day! Tomorrow, Jim Martin will be back with a look at Core Parking / Intelligent Timer Tick and Timer Coalescing. Enjoy the rest of your Friday!
- Tim Newton, with special contributions by Jim Martin
<p>Excellent post! A few questions...</p>
<p>1) "MaximumTrackedProcesses - The maximum number of instances of a tracked process that FTH will monitor concurrently. The default is 4."</p>
<p>How is an "instance" determined - by image name only? Path? Other characteristics?</p>
<p>2) Extra effort is also made to relocate user space images even when system virtual address space is tight by using the user address space of the system process.</p>
<p>Can you elaborate on this? What is meant by "user address space of the system process"?</p>
<p>MaximumTrackedProcesses uses the path to determine uniqueness, so if you have the same app installed in two different places, it will track them both even though the EXE name is the same.</p>
<p>The second question is a bit more complex. What this means is that in a situation where the machine is very loaded (32-bit under heavy load), the system address space can become fragmented. If this occurs, there may not be enough contiguous system cache virtual address space for a process. We can now get more memory by attaching to the System process and temporarily using its user mode address space range, which is almost always completely empty because no user mode app executes in the context of the System process.</p>
<p>Thanks for the response, Tim!</p>
<p>The user-mode address space range of the System process is not something that I've encountered previously in discussions. Any idea where one might find more information about it? (I don't have my WI 5th edition handy; will have to check there later...)</p>
<p>Presumably, KeStackAttachProcess is used to accomplish this? Was there specific code added to Windows 7 to support the notion of using the range 0x0-0x7fffffff for some purpose? Or could the same technique be used in a pre-Win7 OS, were one to wish to attempt it?</p>
<p>(I guess, I'm also curious about the possibility that something could be "hidden" [malware?] in the user-mode address space range of the System process. I don't imagine that many anti-malware apps check that range, but I could certainly be imagining incorrectly!... Any thoughts are appreciated!)</p>
<p>The user mode address range of the system process is treated just like that of any other process. Typically, however, no user mode code is executing there and the space only contains data allocations created by drivers. For 32-bit systems (with limited VA space) this can give you some more space to work with.</p>
<p>None of this is new, and the ability to use it has existed in every release of NT. KeStackAttachProcess is the API you would use to access this. I don't think this would have any effect on potential malware: the ability to access it is not new, and there is just no real reason to exploit it over any other address space.</p>
<p>I am not nearly as knowledgeable about this level of memory management as I sound; I got all this info from the product group ;)</p>
<p>Thanks very much, Tim, for the responses (and for being the 'liaison', as well as caring enough to get the extra info in the first place!). :-)</p>
<p>Thank you for the great post!</p>
<p>I have a question regarding shims and compatibility.</p>
<p>Will FTH apply shims to an application marked as compatible with Windows 7 (e.g. which has an appropriate supportedOS ID in Compatibility section of its manifest)?</p>
<p>Hello Tim, I am not able to see the FTH folder in the path you mentioned in your post. Is there another procedure I have to follow to install or configure FTH on my Windows Server 2008 R2 servers?</p>