Thoughts from the EPS Windows Server Performance Team
Good day AskPerf! Vinod Rawat here to discuss/demonstrate the benefits of using the Tracelog.exe utility to troubleshoot High CPU issues. Tracelog.exe is a command line utility that allows us to start/stop or enable trace logging, and offers the following features:
NOTE Tracelog.exe is included in the tools\tracing\<Platform> subdirectory of the Windows Driver Kit (WDK), where <Platform> is either i386, amd64, or ia64.
In this post, I am going to talk about troubleshooting a High CPU issue, where no user-mode processes are visibly spiking the CPU. With that, let’s get started!
The first thing you will want to do is download Process Explorer and run it on the affected workstation/server. In this example, we see that Interrupts and its corresponding CPU usage shown below:
You can also add the following Performance Monitor Counters and see the same behavior:
Processor: % Interrupt Time
Processor: %DPC Time
Processor: %Privilege Time
This is where tracelog.exe comes in handy. Open an Elevated Command Prompt, and browse to the folder that includes the tracelog.exe utility (see Note above). During the high interrupt time we can capture DPC/ISR ETL tracing with the following syntax:
tracelog -start -f test.etl -dpcisr -UsePerfCounter -b 64
-start = Starts up the test.etl trace session
-f = Log to file <name>
-dpcisr = Enable kernel events for DPC/ISR analysis
-UsePerfCounter = Use Perf Counter clock
-b 64 = Sets buffer size to <n> Kbytes
NOTE When tracing DPCs and ISRs, always add the -UsePerfCounter parameter to the command. The system timer resolution is too low to measure the time spent in these activities. Tracerpt.exe, the tool that formats DPC/ISR events, requires the performance counter clock values for its reports. (Tracerpt is included in Windows XP and later versions of Windows.)
Let the tracelog.exe run for roughly 30seconds to 1 minute, then stop the tracing with following syntax:
Now we can create a report using the Tracerpt.exe tool with the following command:
tracerpt test.etl -report dpcisr.html -f html
These switches create the report file dpcisr.html in HTML format.
Now comes the fun part (The following is just a descriptive example and what I experienced on my machine).
1. Open the html file you created above
2. Locate the DPC/ISR section in the report:
3. Depending on your scenario, focus on either “Distribution of DPC execution times” or “ISR processor utilization” columns:
4. Sort by the Percent column and you will see the highest consumer (dxgkrnl.sys) at top:
Dxgkrnl.sys is at top taking highest-but-nominal ISR processor time on my machine above (which is obviously normal).
For those who want to go deeper than this: When a culprit driver’s ISR routines are keeping the CPU busy, you can further zoom in and find out the exact range of time these routines are spending in:
Focus on Lower Bound and Upper Bound range and the corresponding Percent. More than 88% (61.66 + 26.59 = 88.25%) count is taking more than 25 microseconds. Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds. ISR routines belonging to the driver above (obscured) have exceeded this value, thus you know how to pin-point the culprit and then check the concerned ISR routines’ time taken.
Any reason to prefer tracelog/tracerpt over xperf?
Just a question, Is there an error in the last paragraph. First you write that more than 88.25% is taking 25 seconds which is really long time and then later microseconds.
Oscar, this was a mistake which has been corrected. Thanks!
If the NT Kernel Logger session is already in use, the StartTrace func returns ERROR_ALREADY_EXISTS