Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
As you’ve probably surmised by my blog posts and other writings, I like knowing exactly what my systems are doing. I want to know if a process is running away with the CPU, causing memory pressure, or hitting the disk. Besides keeping my computers running smoothly, my vigilance sometimes helps me spot performance and reliability problems in Windows and third-party code.
The main way I keep tabs on things is to configure Process Explorer to run automatically when I log in. Whenever I configure a new computer, I add a shortcut to Process Explorer to my profile’s Startup directory that includes the /t (minimize) switch. Process Explorer then runs otherwise hidden, with a tray icon that shows a small historical view of CPU activity level. Because I want access to detailed information about system processes, as well as my own, I also specify the /e option on Vista, which causes Windows to present a UAC prompt on logon that allows me to grant Process Explorer administrative rights.
Because I keep an eye out for CPU spikes in Process Explorer’s tray icon, which show up as green or red for user-mode (application) and kernel-mode (operating system and drivers) CPU usage, respectively, I’ve identified several application bugs over the last few months. In this post, I’ll share how I used both Process Explorer and another tool, Kernrate, to identify a problem with a third-party driver and followed the problem through to a fix by the vendor.
Not long after I got a new laptop several months ago, I noticed that the system sometimes felt sluggish. Process Explorer’s tray icon corroborated my perception by displaying a mini-graph of red CPU activity. The icon opens a tooltip that reports the name of the process consuming the most CPU when you move the mouse over it, and in this case the tooltip showed the System process as being responsible:
The first few times I noticed the problem, it resolved itself shortly after and I didn’t have a chance to troubleshoot. However, I could see by opening Process Explorer’s System Information dialog that the CPU spikes were significant:
The System process is special because it doesn’t host an executable image like other processes. It exists solely to host operating system threads for the memory manager, cache manager, and other subsystems, as well as device driver threads. These threads execute entirely in kernel mode, which is why System process CPU usage shows up as red in Process Explorer’s graphs.
I suspected that a third-party device driver was the cause of the problem, so the first step in my investigation was to figure out which thread was using CPU, which would hopefully point me at the guilty party. I watched vigilantly for signs of trouble every time I switched networks and jumped the first time I saw one. Process Explorer shows the threads running in a process on the Threads page of the Process Properties dialog, so I double-clicked on the System process and switched to the Threads page the next time I noticed the CPU spike:
The “ntkrnlpa.exe” prefix on each thread’s start address identified the ones I saw at the top of the CPU usage sort order as operating system threads (Ntkrnlpa.exe is the version of the kernel loaded on 32-bit client systems that have no-execute memory protection, or on server systems that need to address more than 4GB of memory). Because I had previously configured Process Explorer to retrieve symbols for operating system images from the Microsoft public symbol server, the thread list also showed the names of the thread start functions. The most active threads began in the ExpWorkerThread function, which means that they were worker threads that perform work on behalf of the system and device drivers. Instead of creating dedicated threads that consume memory resources, the system and drivers can throw work at the shared pool of operating system worker threads.
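To make the shared-pool idea concrete, here is a minimal user-mode sketch in Python. It is only an analogy, not the kernel's implementation: the WorkerPool class and its queue_work_item and drain methods are hypothetical names invented for this illustration. The point it shows is that callers hand a function to the pool and a pool thread runs it, which is why CPU time ends up charged to anonymous worker threads rather than to the component that requested the work.

```python
import queue
import threading


class WorkerPool:
    """Hypothetical shared pool: callers queue work instead of creating threads."""

    def __init__(self, num_workers=4):
        self._items = queue.Queue()
        self._threads = [
            threading.Thread(target=self._worker_loop, daemon=True)
            for _ in range(num_workers)
        ]
        for thread in self._threads:
            thread.start()

    def _worker_loop(self):
        # Each worker blocks until an item arrives, then runs it on the
        # caller's behalf. The time spent is attributed to this thread.
        while True:
            func, arg = self._items.get()
            try:
                func(arg)
            finally:
                self._items.task_done()

    def queue_work_item(self, func, arg):
        """Hand a function and its argument to whichever worker is free."""
        self._items.put((func, arg))

    def drain(self):
        """Block until every queued item has finished running."""
        self._items.join()


if __name__ == "__main__":
    results = []
    pool = WorkerPool(num_workers=2)
    for n in range(5):
        pool.queue_work_item(results.append, n)
    pool.drain()
    print(sorted(results))
```

Because any component can queue work this way, seeing a busy worker thread says nothing about who asked for the work, which is the dead end described next.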
Unfortunately, knowing that worker threads were causing the CPU usage didn’t get me any closer to identifying a root cause. I really needed to know what functions the worker threads were calling, because the functions would be inside the device driver or operating system component on whose behalf the threads were running. One way to look inside a thread’s execution is to look at the thread’s stack with Process Explorer. The stack is a memory region that stores function invocations, and Process Explorer will show you a thread’s stack when you select the thread and press the Stack button, or double-click on the thread. On Vista, however, you get this error when you try to look at the stack for threads in the System process:
The System process is a special type of process on Vista called a “protected process” that doesn’t allow any access to its threads or memory. Protected processes were introduced to support Digital Rights Management (DRM) so that high-definition content providers can store content encryption keys with a reduced risk of an administrative user using DRM-stripping tools to reach into the process and read the keys.
That approach foiled, I had to find another way to see what the worker threads were doing. For that, I turned to Kernrate, a command-line profiling tool that’s a free download from Microsoft. Kernrate can profile user-mode processes and kernel-mode threads. It uses the sample-based profiling facility that was introduced in the first release of Windows NT, which records the unique addresses at which the CPU is executing when the profiling interval timer fires. When you stop a profile capture, Kernrate retrieves the information from the kernel, maps the addresses to the loaded device drivers into which they fall, and can even use the symbol engine to report the names of functions.
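The address-to-module step is easy to picture with a small sketch. The Python below is an illustration of the general technique only, not Kernrate's code: the module table, its base addresses and sizes, and the function names (build_index, module_for, summarize) are all made up for the example. It takes a list of sampled instruction addresses, finds the loaded module whose address range contains each one, and counts hits per module.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical module table: (base address, size in bytes, name).
# These values are invented for illustration, not taken from a real system.
MODULES = [
    (0x80000000, 0x00400000, "ntkrnlpa"),
    (0x80400000, 0x00030000, "hal"),
    (0x8A000000, 0x00040000, "b57nd60x"),
]


def build_index(modules):
    """Sort modules by base address so lookups can use binary search."""
    ordered = sorted(modules)
    bases = [base for base, _size, _name in ordered]
    return bases, ordered


def module_for(address, bases, ordered):
    """Return the name of the module containing address, or '<unknown>'."""
    i = bisect_right(bases, address) - 1
    if i < 0:
        return "<unknown>"
    base, size, name = ordered[i]
    return name if address < base + size else "<unknown>"


def summarize(samples, modules):
    """Count sampled addresses per module, most hits first."""
    bases, ordered = build_index(modules)
    hits = Counter(module_for(addr, bases, ordered) for addr in samples)
    return hits.most_common()


if __name__ == "__main__":
    samples = [0x80000010, 0x80000020, 0x8A000100,
               0x80000030, 0x8A000200, 0x90000000]
    for name, count in summarize(samples, MODULES):
        print(f"{name:10} {count}")
```

A module that collects a large share of the samples during a spike is where the CPU was spending its time, regardless of which thread happened to be running the code.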
I wouldn’t need symbols if the trace identified a device driver, so I ran Kernrate without passing it any arguments. Despite the fact that there’s no officially supported version of Kernrate for Vista, the version for Windows XP, Kernrate_i386_XP.exe, works on Vista 32-bit (you can also use the recently-released xperf tool to perform similar profiling - xperf requires Vista or Server 2008, but works on 64-bit versions). I let the profile run through heavy bursts of CPU and then hit Ctrl+C to print the results to the console window:
In first place were hits in the kernel, but in second was a driver that I didn’t recognize, b57nd60x. Most driver files are located in the %systemroot%\system32\drivers directory, so I could have opened that folder and viewed the file’s properties in Explorer, but I had Process Explorer open so a quicker way to check the driver’s vendor and version was to open the DLL view for the System process. The DLL view shows the DLLs and files mapped into the address space of user-mode processes, but for the System process it shows the kernel modules, including drivers, loaded on the system. The DLL view revealed that the driver was for my laptop’s NIC, was from Broadcom, and was version 10.10:
Now that I knew that the Broadcom driver was causing the CPU usage, the next step was to see if there was a newer version available. I went to Dell’s download page for my system, but didn’t find anything. Suspecting that what I noticed might not be a known issue, I decided to notify Broadcom. I used contacts on the hardware ecosystem team here at Microsoft to find the Broadcom driver representative and email him a detailed description of the symptoms and my investigation. He forwarded my email to the driver developer, who acknowledged that they didn’t know the cause and within a few days sent me a debug version of the driver with symbols so that I could capture a Kernrate profile that would tell them what functions in the driver were active during the spikes. The problem reoccurred a few days later and I sent back the Kernrate output with function information.
The developer explained that my trace revealed that the driver didn’t efficiently interact with the PCIe bus when processing specific queries and the problem seemed to be exacerbated by my particular hardware configuration. He gave me a new driver to try and after a few weeks of monitoring my laptop closely for issues, I confirmed that the problem appeared to be resolved. The updated driver has not yet been posted to Dell’s support site, but I expect it to show up there in the near future. Another case closed, this time with Process Explorer, Kernrate, and a helpful Broadcom driver developer.
If you like these troubleshooting blog posts, you’ll enjoy the webcast of my “Case of the Unexplained…” session from TechEd/ITforum. Its 75 minutes are packed with real-world troubleshooting examples, including the one written up in this post and others, as well as some that I haven’t documented. At the end of the session I ask the audience to send me screenshots, log files and descriptions of their own troubleshooting success stories, in return for which I’ll send back a signed copy of Windows Internals. The offer stands, so remember to document your investigation and you can get a free book. I’ve gotten a number of great examples and my next blog post will be a guest post by someone that watched the webcast and used Process Monitor to solve a problem with their web server.
Finally, if you want to see me speak live, come to TechEd US/IT Pro in June in Orlando where I’ll be delivering “The Case of the Unexplained…”, “Windows Server 2008 Kernel Advances”, and “Windows Security Boundaries”. Hope to see you there!
I couldn't figure out what was causing my father's computer to run so slowly. When I used Process Explorer it showed that the Interrupts were running at >75%. When I looked in detail using Kernrate I saw that hal was using almost twice as much time as the kernel itself. When I looked at hal in the detailed view from Kernrate it was mostly hard drive related commands.
Further investigation showed that somehow the primary IDE controller was in PIO instead of DMA. Once I uninstalled the device and bounced the box everything worked great. The hard drive was running in DMA5 once more, and idle system processes were back below 4%. I would never have figured out this error without Sysinternals and watching your presentation of how to use the tools.
hello. great post! in fact it describes same exact trouble that I have. could you please send me that renewed driver that you got from dell. because they still don't have it on website.
I remember seeing a video demonstration on the Microsoft site for using Procexp. Do you happen to know the URL or how I can access the demo again?
Great tool, I am trying to sort out an issue with Windows media player (The player is halted by high CPU usage from some where for about 0.5 - 1 sec always around 3/4 of the way through the track?)
I have used Procexp to identify the thread, but cannot get much further.
Any help welcomed
I have had the same issue. So far I have been hesitant to install the driver update from Broadcom. But after calling Dell and being told they will not support Vista (as my D820 came with XP originally) I decided to bite it and installed the updated driver from the Broadcom site. The spikes have disappeared.
This happens because the hardware vendor will configure every possible driver for the entire machine to be loaded and enabled even for devices you do not use.
As a plea for help, can we get a Windows version with a user friendly way to disable drivers and services for parts that are not needed?
Zero configuration wireless networking service
Windows image acquisition (WIA) service
SSDP discovery service
Our standard desktop is severely slowed down by the extra RAM usage used by the unneeded processes and drivers.
Lastly, can some of the always loaded processes (e.g., print spooler) be put into a hibernate mode where they use minimal memory until a print request comes in from the user? Spooler is using 9mb of RAM on a machine booted 1 hour ago where there has been no print jobs started.
Yep, I had issues with Broadcom and the Chimney stack as well. Once I turned off the Chimney stack all was well, but didn't get a new driver to fix it. However, since then MS has posted a hotfix to turn the TCP Chimney stack off by default instead of on by default, which is much better. So if you were able to get a debug version of the driver and Kernrate to get the functions that a protected process used, then couldn't the same method be used to be able to snoop and get past the DRM and get the keys anyway? DRM doesn't work, there is always a way around it so just get rid of it.
I have the same exact issue described in this article with my Inspiron 1525. Occasionally the Broadcom driver takes up 50% of both cores of my Core2Duo 2.0GHz and will not fix itself until the system is restarted. Sending/receiving in Microsoft Outlook 2007 reproduces this error every time for me.
Is there a way I can fix this? I'm at my wits' end. Mark makes reference to a 'fixed driver' he received but I cannot find it anywhere.
MUCH THANKS FOR ANY HELP!
Updated drivers are available on Broadcom site at http://www.broadcom.com/support/ethernet_nic/netxtreme_desktop.php
Hope this helps :-)
I installed the newest Vista driver (10.82.0.0a) on my Dell Precision M70 from Broadcom's site mentioned above. Since then all peaks on the system process disappeared. Thanks!!!
Just wanted to chime in and offer my thanks as well. After seeing a constantly high CPU after I installed a system monitor, I recalled this blog and took a closer look. Sure enough it was the same driver issue. I downloaded the tools and went through the steps, finding the same issue. I'll be sure to keep these on hand whenever I find something similar. Thanks again!
I've noticed that the system process has been using 17% of my CPU for some time now. I finally got around to kernrating it, and lo and behold it was the same Broadcom driver. Installing the new version has made the system process nice and quiet again. Thanks Mark!
P.S. I couldn't get xperf to admit that the Broadcom driver was at fault, perhaps I didn't specify the correct logging option.
Had the same problem you described (100% CPU spikes) with my Vista installation on HP Compaq nx8220. Tracked it down as directed and replaced the driver with the one from the Broadcom-Site.
Everything's back to normal now - thank you so much.
Wow, amazing how quickly problems get solved when people share each other's secrets a la open source. Microsoft acts internally more like open source, and expects the public not to. Or perhaps everybody gets PDBs just for the asking.
This is a very good presentation : Well Explained, i wanted to know if we would be able to find out drivers / Process under the System Process, On our terminal Servers we see System Process having an Handle on NTUSER.dat file under the profile directory.
How to detect which is the culprit
How about adding something like a "Monitor drivers" feature to Process explorer where it uses the same sequence of debugging steps to maintain a list of non-microsoft device drivers that frequently use a lot of cpu? I know some would always be expected to show up but perhaps these can be marked as known and then unknown ones highlighted? This idea needs refining but might be worth adding,