Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
As you’ve probably surmised by my blog posts and other writings, I like knowing exactly what my systems are doing. I want to know if a process is running away with the CPU, causing memory pressure, or hitting the disk. Besides keeping my computers running smoothly, my vigilance sometimes helps me spot performance and reliability problems in Windows and third-party code.
The main way I keep tabs on things is to configure Process Explorer to run automatically when I log in. Whenever I configure a new computer, I add a shortcut to Process Explorer to my profile’s Startup directory that includes the /t (minimize) switch. Process Explorer then runs hidden except for a tray icon that shows a small historical view of CPU activity. Because I want access to detailed information about system processes, as well as my own, I also specify the /e option on Vista, which causes Windows to present a UAC prompt on logon that allows me to grant Process Explorer administrative rights.
Because I keep an eye out for CPU spikes in Process Explorer’s tray icon, which show up as green or red for user-mode (application) and kernel-mode (operating system and drivers) CPU usage, respectively, I’ve identified several application bugs over the last few months. In this post, I’ll share how I used both Process Explorer and another tool, Kernrate, to identify a problem with a third-party driver and followed the problem through to a fix by the vendor.
Not long after I got a new laptop several months ago, I noticed that the system sometimes felt sluggish. Process Explorer’s tray icon corroborated my perception by displaying a mini-graph of red CPU activity. The icon opens a tooltip that reports the name of the process consuming the most CPU when you move the mouse over it, and in this case the tooltip showed the System process as being responsible:
The first few times I noticed the problem, it resolved itself shortly after and I didn’t have a chance to troubleshoot. However, I could see by opening Process Explorer’s System Information dialog that the CPU spikes were significant:
The System process is special because it doesn’t host an executable image like other processes. It exists solely to host operating system threads for the memory manager, cache manager, and other subsystems, as well as device driver threads. These threads execute entirely in kernel mode, which is why System process CPU usage shows up as red in Process Explorer’s graphs.
I suspected that a third-party device driver was the cause of the problem, so the first step in my investigation was to figure out which thread was using CPU, which would hopefully point me at the guilty party. I watched vigilantly for signs of trouble every time I switched networks and jumped the first time I saw one. Process Explorer shows the threads running in a process on the Threads page of the Process Properties dialog, so I double-clicked on the System process and switched to the Threads page the next time I noticed the CPU spike:
The “ntkrnlpa.exe” prefix on each thread’s start address identified the ones I saw at the top of the CPU usage sort order as operating system threads (Ntkrnlpa.exe is the version of the kernel loaded on 32-bit client systems that have no-execute memory protection, or on server systems that need to address more than 4GB of memory). Because I had previously configured Process Explorer to retrieve symbols for operating system images from the Microsoft public symbol server, the thread list also showed the names of the thread start functions. The most active threads began in the ExpWorkerThread function, which means that they were worker threads that perform work on behalf of the system and device drivers. Instead of creating dedicated threads that consume memory resources, the system and drivers can throw work at the shared pool of operating system worker threads.
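As a loose user-mode analogy for the shared worker pool (this is not the kernel's work-item API, and the names work_item, driver_a, and driver_b are hypothetical), the following Python sketch has two pretend "drivers" queue work to one small pool. It shows why looking at the thread alone does not reveal who requested the work: the same few threads run items for both owners.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def work_item(owner, n):
    # The owner is only a label we pass along; the pool thread itself
    # carries no information about who queued the item.
    return owner, n * n, threading.current_thread().name

# One shared pool of two threads serves both hypothetical owners.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as pool:
    futures = [pool.submit(work_item, owner, n)
               for owner in ("driver_a", "driver_b")
               for n in range(3)]
    results = [f.result() for f in futures]

# Six work items ran, but on at most two distinct threads.
threads_used = {name for _, _, name in results}
for owner, value, thread_name in results:
    print(owner, value, thread_name)
```

Every item in the output is attributed to a thread named with the "worker" prefix, regardless of which owner submitted it.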
Unfortunately, knowing that worker threads were causing the CPU usage didn’t get me any closer to identifying a root cause. I really needed to know what functions the worker threads were calling, because the functions would be inside the device driver or operating system component on whose behalf the threads were running. One way to look inside a thread’s execution is to look at the thread’s stack with Process Explorer. The stack is a memory region that stores function invocations, and Process Explorer will show you a thread’s stack when you select the thread and press the Stack button, or double-click on the thread. On Vista, however, you get this error when you try to look at the stack for threads in the System process:
The System process is a special type of process on Vista called a “protected process” that doesn’t allow any access to its threads or memory. Protected processes were introduced to support Digital Rights Management (DRM) so that high-definition content providers can store content encryption keys with a reduced risk of an administrative user using DRM-stripping tools to reach into the process and read the keys.
That approach foiled, I had to find another way to see what the worker threads were doing. For that, I turned to Kernrate, a command-line profiling tool that’s a free download from Microsoft. Kernrate can profile user-mode processes and kernel-mode threads. It uses the sample-based profiling facility that was introduced in the first release of Windows NT, which records the unique addresses at which the CPU is executing when the profiling interval timer fires. When you stop a profile capture, Kernrate retrieves the information from the kernel, maps the addresses to the loaded device drivers into which they fall, and can even use the symbol engine to report the names of functions.
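The address-to-module step can be sketched in a few lines of Python. This is only an illustration of the idea, not Kernrate's implementation: the base addresses, sizes, and sample addresses below are made up, and the helper names (build_index, module_for, attribute_hits) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical module map: (base address, size in bytes, name).
# These addresses are invented for the example.
MODULES = [
    (0x81800000, 0x003BA000, "ntkrnlpa"),
    (0x8A000000, 0x00040000, "b57nd60x"),
    (0x8F000000, 0x00200000, "win32k"),
]

def build_index(modules):
    # Sort by base address so a binary search can find the candidate module.
    ordered = sorted(modules)
    bases = [base for base, _, _ in ordered]
    return bases, ordered

def module_for(address, bases, ordered):
    # Find the last module whose base is <= address, then check the range.
    i = bisect_right(bases, address) - 1
    if i < 0:
        return None
    base, size, name = ordered[i]
    return name if address < base + size else None

def attribute_hits(samples, modules):
    # Count one hit per sampled instruction address, keyed by module name.
    bases, ordered = build_index(modules)
    hits = Counter()
    for address in samples:
        hits[module_for(address, bases, ordered) or "<unknown>"] += 1
    return hits

# Pretend these addresses were recorded each time the profile timer fired.
samples = [0x81801234, 0x8A000010, 0x81900000, 0x8A03FFFF, 0x90000000]
hits = attribute_hits(samples, MODULES)
for name, count in hits.most_common():
    print(name, count)
```

A module that accumulates a large share of the hits is where the CPU was spending its time during the capture, which is all a module-level profile needs to show.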
I wouldn’t need symbols if the trace identified a device driver, so I ran Kernrate without passing it any arguments. Despite the fact that there’s no officially supported version of Kernrate for Vista, the version for Windows XP, Kernrate_i386_XP.exe, works on Vista 32-bit (you can also use the recently-released xperf tool to perform similar profiling - xperf requires Vista or Server 2008, but works on 64-bit versions). I let the profile run through heavy bursts of CPU and then hit Ctrl+C to print the results to the console window:
In first place were hits in the kernel, but in second was a driver that I didn’t recognize, b57nd60x. Most driver files are located in the %systemroot%\system32\drivers directory, so I could have opened that folder and viewed the file’s properties in Explorer, but I had Process Explorer open so a quicker way to check the driver’s vendor and version was to open the DLL view for the System process. The DLL view shows the DLLs and files mapped into the address space of user-mode processes, but for the System process it shows the kernel modules, including drivers, loaded on the system. The DLL view revealed that the driver was for my laptop’s NIC, was from Broadcom, and was version 10.10:
Now that I knew that the Broadcom driver was causing the CPU usage, the next step was to see if there was a newer version available. I went to Dell’s download page for my system, but didn’t find anything. Suspecting that what I noticed might not be a known issue, I decided to notify Broadcom. I used contacts on the hardware ecosystem team here at Microsoft to find the Broadcom driver representative and email him a detailed description of the symptoms and my investigation. He forwarded my email to the driver developer, who acknowledged that they didn’t know the cause and within a few days sent me a debug version of the driver with symbols so that I could capture a Kernrate profile that would tell them what functions in the driver were active during the spikes. The problem recurred a few days later and I sent back the Kernrate output with function information.
The developer explained that my trace revealed that the driver didn’t efficiently interact with the PCIe bus when processing specific queries and the problem seemed to be exacerbated by my particular hardware configuration. He gave me a new driver to try and after a few weeks of monitoring my laptop closely for issues, I confirmed that the problem appeared to be resolved. The updated driver has not yet been posted to Dell’s support site, but I expect it to show up there in the near future. Another case closed, this time with Process Explorer, Kernrate, and a helpful Broadcom driver developer.
If you like these troubleshooting blog posts, you’ll enjoy the webcast of my “Case of the Unexplained…” session from TechEd/ITforum. Its 75 minutes are packed with real-world troubleshooting examples, including the one written up in this post and others, as well as some that I haven’t documented. At the end of the session I ask the audience to send me screenshots, log files and descriptions of their own troubleshooting success stories, in return for which I’ll send back a signed copy of Windows Internals. The offer stands, so remember to document your investigation and you can get a free book. I’ve gotten a number of great examples and my next blog post will be a guest post by someone that watched the webcast and used Process Monitor to solve a problem with their web server.
Finally, if you want to see me speak live, come to TechEd US/IT Pro in June in Orlando where I’ll be delivering “The Case of the Unexplained…”, “Windows Server 2008 Kernel Advances”, and “Windows Security Boundaries”. Hope to see you there!
I have Process Explorer instead of Taskmanager.
One problem I used to face is that ProcExp opens with standard user privileges, so in case I need to see inside system processes it will fail. Today I noticed that I can add the /e switch in the registry itself, where ProcExp places a shortcut as the debugger. So each time I press Ctrl+Shift+Esc to open ProcExp I get a UAC prompt and I can open it in Admin mode. Good learning, thanks to Autoruns, which gave me the path in the registry.
Thanks for troubleshooting and in the replies the link to the Broadcom vanilla driver.
From Wireshark, I noticed this seemed to be happening when Outlook 2007 was trying to talk RPC/DCE to the Exchange server. There was a repetitive 4-packet transaction; I'm not sure what either Outlook or the server was trying to do or why this affected the Broadcom card.
FYI, the latest version of Kernrate that runs on Vista and Windows 2008 is available in the Windows 2008 DDK (it's in \tools\other). There are both 32-bit and 64-bit versions. The DDK itself is a free download from Microsoft.
Between the two of you I didn't even need to track the bastard intel 2100 crappy driver down. I first attempted the broadcom 57x driver to no avail. Saw Intuits post and tried that. Bingo!
Thanks to this article I was able to trace the constant CPU usage of about 40% on my system to ACPI.SYS. It appears to be looping after coming out of sleep mode and the only way to stop it is through a Restart. Unfortunately I am stuck now. Vista says the driver is up to date and it won't allow it to be removed or disabled. I've seen a few other posts elsewhere with the same problem but none with a fix.
Nice article. Having used Kernrate, a mention of krview would have been nice. I have seen that the reports generated by krview are very informative and make a nice document to send across for further analysis. I wonder why Microsoft is not coming up with an x64 version of Kernrate. Though this is OT, I am using this thread to bring it to your attention and hope you can use your contacts inside to do something about this. I have not tried xperf yet as I use XP...
I've found a problematic thread using Process Explorer, but can't suspend it... I get the same error message, "unable to access thread". I tried using Kernrate but after I press Ctrl+C the window just disappears. Is this because it only shows the results by printing? Or is it because I'm using Vista-64 and need to try xperf? Also, isn't there any way to disable the security protection of the file I want to access? Via DOS prompt or what? Any help you guys can give would be helpful, but anyhow, kudos on an excellent post!
Great article. I had CPU spiking up on the system process and went through your diagnostic process. It turned out to be the same Broadcom driver that was causing the issue. I downloaded the latest driver and the issue is resolved now.
Again a great article.
I would really like to watch your webcasts but unfortunately the bandwidth in Thailand is too poor. Is there any way to download the video files?
That's a great help for me, thank you!
I've followed your article step by step to identify the problem but got results I cannot explain. I have two Ethernet NICs in my machine, and one of them is always off. Recently I needed to use it too and noticed it slows down the system (around 5% of CPU time spent in the System process). While Process Explorer shows CPU usage in the main window, it doesn't report any CPU activity when the Threads tab is opened for the System process. On the other hand, Kernrate reported that intelppm is where the vast majority of hits occur (76%). intelppm is a processor driver, and I see no reason for it to be affected by the NIC driver.
I tried latest available drivers for my NIC but it didn't solve the problem.
Is there any other way to track this down to the NIC driver, just to make sure it is the exact place where the problem occurs?
Something that has always bugged me in both Process Explorer and in Windows Task Manager: When you first start the programs, the graphs build from zero up to the first data point, even showing a flat-line prior to the first data point.
This is WRONG - if you have no data there should be no line on the graph! Interestingly, the memory graph in Task Manager does it right, only the CPU Usage graph is wrong (at least in Windows 7), but all the graphs are wrong in Process Explorer.
Ideally, the graph background should clearly show when data gathering started, perhaps by not drawing the rulings prior to the first data point.
Sorry to nit-pick though - Process Explorer is a great utility!
My PC had a problem similar to your laptop's from time to time. The process “System” was using most of the CPU and I did not know why. That was annoying. I had searched the Internet for possible causes in the past but found no solution. Today the problem occurred again on my PC and I searched the Internet again. Unlike before, today I found your article in my search results and immediately thought it could be helpful, because I had already read articles of yours in the past that impressed me. Then I read it and to my happiness it really helped me to find the cause of the problem on my PC. I also used Process Explorer for the search, based on your article. In my case the Threads page of the properties for the System process showed that ssrtln.sys was using all of the CPU. Based on this information I then searched the Internet for ssrtln.sys and found further information on the following site:
According to the content of this site, ssrtln.sys is an essential part of the DLA boot process created by Sonic. Because I do not use DLA I did not need it. Therefore I followed the steps to deactivate it, and after the reboot the problem has not returned. I hope it will stay like this. Otherwise, thanks to you and your great tool Process Explorer, I now know how to find out which individual elements are running inside the System process so that I can check again which are using all of the CPU. That is really great. Thank you very much for this article, for your tools and your help in general.
John D. Blue
You are the man!
My Dell laptop has this exact same problem! Thank you for taking the time to report it. I'm going to check the Dell support site to see if there's an updated Broadcom driver.
I ran Kernrate and I got similar results. Using Process Explorer I can see that I have the same 10.10.0.0 b57nd60x driver as you had in 2008.
I wish more people were as picky about performance as you are!
Results for Kernel Mode:
OutputResults: KernelModuleCount = 175
Percentage in the following table is based on the Total Hits for the Kernel
Time   6762 hits, 25000 events per hit --------
Module      Hits    msec   %Total   Events/Sec
ntkrnlpa    4968   11159     73 %     11130029
b57nd60x     976   11160     14 %      2186379
win32k       284   11160      4 %       636200
hal          199   11159      2 %       445828
nvlddmkm     178   11160      2 %       398745
intelppm      68   11160      1 %       152329
dxgkrnl       20   11160      0 %        44802
Ntfs          15   11160      0 %        33602
fanio         11   11160      0 %        24641
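For anyone who wants to sanity-check the columns in the Kernrate output above, here is a small Python sketch that recomputes %Total and Events/Sec from the Hits and msec columns. The formulas are inferred from these numbers rather than taken from Kernrate documentation, and the truncating integer division is an assumption that happens to match every row shown (for example, hal is 2.9% of the hits but is printed as 2 %). The function names are hypothetical.

```python
# Figures copied from the Kernrate output above.
TOTAL_HITS = 6762
EVENTS_PER_HIT = 25000

ROWS = [
    # (module, hits, elapsed msec, printed %Total, printed Events/Sec)
    ("ntkrnlpa", 4968, 11159, 73, 11130029),
    ("b57nd60x", 976, 11160, 14, 2186379),
    ("win32k", 284, 11160, 4, 636200),
    ("hal", 199, 11159, 2, 445828),
]

def percent_of_total(hits, total_hits=TOTAL_HITS):
    # Share of all kernel-mode hits, truncated to a whole percent (assumed).
    return hits * 100 // total_hits

def events_per_second(hits, msec, events_per_hit=EVENTS_PER_HIT):
    # Each hit stands for events_per_hit timer events over msec milliseconds.
    return hits * events_per_hit * 1000 // msec

for module, hits, msec, printed_pct, printed_rate in ROWS:
    pct = percent_of_total(hits)
    rate = events_per_second(hits, msec)
    status = "matches" if (pct, rate) == (printed_pct, printed_rate) else "differs"
    print(f"{module:10} {pct:3} % {rate:10} {status}")
```

Read this way, the b57nd60x row says that roughly one in seven kernel-mode samples landed inside the Broadcom driver during the capture.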