This is another 100% CPU issue. This time I’m not working on a customer case, as the issue is happening on my own Windows XP SP2 machine. So I decided to troubleshoot it just for fun :)
First, let me explain the scenario:
Random interactive applications, like Outlook, Word, Excel, Explorer, Internet Explorer, etc., were unexpectedly consuming 100% CPU. Once they started eating up all the CPU they got completely hung and did not stop using all the CPU until I actually killed their processes. Sometimes, after killing the process and restarting the application, the issue was gone; other times the issue came back right after the application restarted and I needed to kill it again…
OK, I did my part playing the end user by just killing the processes for some time, but it started bugging me as it was becoming more frequent. So now it’s time to have some fun and go to the core of the issue.
All I need to do is follow the steps in the post Lab: Win32 Application Causing 100% Condition. So let’s get started:
The next time the problem happened, it was in the infopath.exe application. I attached WinDbg (from the Debugging Tools for Windows) and executed the command below in order to find out which thread I should look at:
OK, so now I know thread 0 is the one I’m looking for, since it’s the one that has been consuming the CPU for the last 40 seconds. You might be asking, “What if there were other threads also consuming the CPU?” Well, for this specific issue I knew we would find only a single thread, since my machine has two CPUs and whenever the problem happens the app takes up only one of them.
With that said, let’s take a look at the call stack of the thread 0:
OK, so we don’t have that much… The return address is 0x0 and the EBP register is 0x0, and since we don’t have the proper symbols to work with, I guess we won’t be able to rebuild the stack. So we will have even more fun diving into the ASM from now on!!!
We already know nview.dll is the module that shows up in the call stack. Let’s see what it is doing or trying to do. Our current instruction pointer tells us this:
So all we need to do to follow the assembler execution path is to get the current CPU register values and start reproducing the effects the real instructions would have. The current values of the CPU registers are below:
Notice that both EAX and EDX (the two registers used by the first operation) are 0x0. So the first instruction, mov eax, edx, won’t change anything. It will copy the EDX register to EAX, but since they are both 0x0 this will result in no changes to our registers. Let’s take a look at the subsequent instructions and their effects:
and eax, 01000000h
It will perform a logical AND between EAX and the value 0x1000000 and store the result in the EAX register. Again, this won’t change the EAX value, since 0x0 AND 0x1000000 results in 0x0, which is already the current EAX value.
It will perform an Exclusive-OR operation, or XOR, between the stack base pointer EBP and itself and store the result in EBP (which is the destination operand of this operation). Needless to say, this would load EBP with 0x0; however, as we can notice below from the current register values, EBP is already 0x0:
It will perform a logical OR between EAX and EBP and store the result in EAX. We already know this will be the equivalent of 0x0 OR 0x0, which will result in, guess what… Exactly! It will result in 0x0, which doesn’t change the original value of EAX.
je nview!NVLoadDatabase+0xc8a (100020ea)
This will cause the instruction pointer EIP to “jump” to 100020ea only if the zero flag is set, which for the previous OR operation means its result was zero. Well, the result of that OR was indeed 0x0, since EAX and EBP were both 0x0. So the EIP is loaded with the value 100020ea and our assembly execution path changes as below:
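To make the flag logic concrete, here is a minimal Python sketch of these first steps. It is not the debugger output: the function name `run_prologue` is made up for illustration, and the `xor ebp, ebp` and `or eax, ebp` mnemonics are inferred from the descriptions above rather than copied from a listing.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def run_prologue(eax, edx, ebp):
    """Apply the first four instructions and report whether je would jump."""
    eax = edx                    # mov eax, edx
    eax &= 0x01000000            # and eax, 01000000h
    ebp ^= ebp                   # xor ebp, ebp (mnemonic inferred from the prose)
    eax |= ebp                   # or eax, ebp (mnemonic inferred from the prose)
    zero_flag = (eax & MASK32) == 0  # OR sets ZF when its result is zero
    return eax, ebp, zero_flag   # je jumps exactly when zero_flag is True


# Register values reported in the article: EAX = EDX = EBP = 0x0
eax, ebp, jump_taken = run_prologue(eax=0x0, edx=0x0, ebp=0x0)
```

With the values from the article, `jump_taken` comes out `True`. If EDX had bit 24 set (for example 0x01000000), the AND would leave a non-zero EAX and the jump would not be taken.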
So, let’s continue from the instruction located at 100020ea.
It will make a shift left on the EDX value of 8 positions. EDX is a 32-bit representation of the value 0x0, so shifting it 8 positions wouldn’t change anything. EDX remains 0x0 after the operation as we can see below:
This one we already know. It won’t change anything as EAX = EDX = 0x0.
It will perform a shift right on EAX by 8 positions. As we’ve seen before with EDX and the shift left operation, this won’t change anything as EAX = 0x0.
Finally EAX changes from its original value of 0x0 to the sum of itself and the ESI register. ESI = 0003fff8, and so from now on EAX = 0003fff8, as we can see below:
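The last four steps can be replayed in a few lines of Python. This is only a sketch: the helper names are invented for illustration, the mnemonics in the comments are inferred from the descriptions above, and the register values are the ones quoted in the text.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits, like a real register


def shl32(value, count):
    return (value << count) & MASK32


def shr32(value, count):
    return (value & MASK32) >> count


def add32(a, b):
    return (a + b) & MASK32


edx = 0x0
esi = 0x0003FFF8

edx = shl32(edx, 8)    # shift left by 8: zero stays zero
eax = edx              # copy EDX to EAX: still zero
eax = shr32(eax, 8)    # shift right by 8: zero stays zero
eax = add32(eax, esi)  # EAX = 0x0 + ESI

print(f"eax={eax:08x} edx={edx:08x}")  # prints "eax=0003fff8 edx=00000000"
```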
Again, a shift right operation, this time on EDX by 31 positions. EDX will remain 0x0 after that, as we’ve seen before.
This will extend the value of EAX to a 64-bit value, but it will keep its original value of 0003fff8.
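As a rough illustration of why the value survives the widening, here is a small Python sketch. It assumes the step is a sign extension (the kind of thing `cdq` does on x86); the instruction itself is not shown here, so treat that as an assumption. The function name is hypothetical.

```python
def sign_extend_32_to_64(value):
    """Widen a 32-bit value to 64 bits by copying its sign bit upward."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:                  # sign bit set: fill the upper half with 1s
        return value | 0xFFFFFFFF00000000
    return value                            # sign bit clear: upper half stays 0


widened = sign_extend_32_to_64(0x0003FFF8)  # the EAX value from the article
```

Since 0003fff8 has its top bit clear, the upper 32 bits are all zero and the low 32 bits are untouched. Only a value with the top bit set, such as 0x80000000, would pick up a non-zero upper half.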
mov eax, esi
It will copy the value of ESI to EAX, however as we noticed before EAX was made equal to ESI by the operation add eax, esi previously executed (since EAX was 0x0 by that time). So this operation doesn’t change anything either.
This will load EAX with the value resulting from the operation ECX + ESI. So let’s do a little math here… ECX + ESI = 00c90000 + 0003fff8 = 00ccfff8. So from now on EAX = 00ccfff8, as we see below:
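If you don’t feel like adding hex numbers by hand, the same sum can be checked in Python. This is just a quick check of the arithmetic above, using the register values quoted in the text:

```python
ecx = 0x00C90000
esi = 0x0003FFF8

# 32-bit addition, as the CPU would compute the address ECX + ESI
eax = (ecx + esi) & 0xFFFFFFFF

print(f"{eax:08x}")  # prints "00ccfff8"
```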
cmp eax, edi
This will compare the values of EAX and EDI and change the proper flags accordingly.
mov dword ptr [esp+14h],edx
This will copy the EDX value to the memory location referred to by the stack pointer plus 20 bytes, or in other words, this will copy the EDX value to the slot 20 bytes (0x14) above the address held in ESP. So, doing a little math again, that position is ESP = 0012c6c0 plus 20 bytes, so 0012c6c0 + 0x14 = 0012c6d4. Let’s see what the current content of 0012c6d4 is:
All right, so it’s currently 0x0 and we are copying the value of EDX, which is also 0x0, to that location in memory. So we won’t actually change anything here. Let’s move on.
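Here is a tiny Python sketch of that store, with the stack modelled as a dictionary from address to value. The dictionary is purely an illustration device; the addresses and values are the ones quoted above.

```python
esp = 0x0012C6C0
slot = esp + 0x14      # 20 bytes above ESP

stack = {slot: 0x0}    # content the article reports at that address
before = dict(stack)

edx = 0x0
stack[slot] = edx      # mov dword ptr [esp+14h], edx

unchanged = stack == before
print(f"{slot:08x} unchanged={unchanged}")  # prints "0012c6d4 unchanged=True"
```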
jb nview!NVLoadDatabase+0xc60 (100020c0)
This will make our execution path jump to 100020c0 only if the result of the cmp operation previously executed tells us the value of EAX is lower than (or below) EDI, compared as unsigned values. By the time the cmp operation was executed, EAX was 00ccfff8 and EDI was 00cd0000, so EAX is indeed below EDI and we will therefore jump in the execution path as below:
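The cmp/jb pair boils down to an unsigned comparison, which this small Python sketch mimics. The function name `jb_taken` is made up for illustration; the register values are the ones from the text.

```python
def jb_taken(left, right):
    """cmp left, right followed by jb: jump when left is below right (unsigned)."""
    return (left & 0xFFFFFFFF) < (right & 0xFFFFFFFF)


eax, edi = 0x00CCFFF8, 0x00CD0000

taken = jb_taken(eax, edi)  # True: 00ccfff8 is below 00cd0000
gap = edi - eax             # EAX sits 8 bytes short of EDI
```

The "unsigned" part matters in general: 0xFFFFFFFF is a huge unsigned number, so `jb_taken(0xFFFFFFFF, 1)` is `False` even though the same bits read as -1 in signed arithmetic.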
So we moved back some bytes on our execution path. Let’s keep moving…
mov edx, dword ptr [eax]
It will copy the value referred by EAX to the register EDX. The value currently referred by EAX is the following:
So we’re just copying 0x0 to EDX which is already 0x0, so it doesn’t change our register’s value.
mov eax, dword ptr [eax+4]
This will copy the value referred to by EAX plus 4 bytes into EAX. The value pointed to by EAX + 4 is the value at 00ccfffc, which is 0x0. So now we’re basically making EAX = 0x0 again, just as we had at the very beginning of our execution path.
mov dword ptr [esp+14h],eax
This will just copy the EAX value to a specific position on the stack (20 bytes above the top pointed to by ESP). So it doesn’t change any register, and it actually doesn’t change the stack value either, since EAX is currently 0x0 once again and we had already saved the value 0x0 at that same position on the stack when we executed the instruction located at 100020ff.
Now hold on: The next instruction is exactly the one we executed in the very first step of all this investigation. And guess what? The registers and the stack have exactly the same values as we had in the very beginning. So we don’t need to move forward on this to conclude that we will end up again in the same place, and again, and again, and again… indefinitely! This turns out to be an infinite loop!!
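The whole argument can be replayed as a small Python simulation: run one pass through the instructions, and if the registers and memory end up exactly where they started, a deterministic CPU can only repeat the same pass forever. Treat this as a sketch under stated assumptions: the function names are invented, the mnemonics marked "inferred" are reconstructed from the descriptions above rather than copied from a listing, and the 64-bit extension step is left out because, per the text, it leaves EAX at 0003fff8.

```python
MASK = 0xFFFFFFFF


def one_pass(r, mem):
    """Run one trip through the traced path; False means we left that path."""
    r["eax"] = r["edx"]                      # mov eax, edx
    r["eax"] &= 0x01000000                   # and eax, 01000000h
    r["ebp"] ^= r["ebp"]                     # xor ebp, ebp (inferred)
    r["eax"] |= r["ebp"]                     # or eax, ebp (inferred)
    if r["eax"] != 0:                        # je 100020ea needs ZF set
        return False
    r["edx"] = (r["edx"] << 8) & MASK        # shift EDX left by 8 (inferred)
    r["eax"] = r["edx"]                      # copy EDX to EAX (inferred)
    r["eax"] >>= 8                           # shift EAX right by 8 (inferred)
    r["eax"] = (r["eax"] + r["esi"]) & MASK  # add eax, esi
    r["edx"] >>= 31                          # shift EDX right by 31 (logical shift assumed)
    r["eax"] = r["esi"]                      # mov eax, esi
    r["eax"] = (r["ecx"] + r["esi"]) & MASK  # EAX = ECX + ESI
    below = r["eax"] < r["edi"]              # cmp eax, edi (unsigned)
    mem[r["esp"] + 0x14] = r["edx"]          # mov dword ptr [esp+14h], edx
    if not below:                            # jb 100020c0 needs EAX below EDI
        return False
    addr = r["eax"]
    r["edx"] = mem.get(addr, 0)              # mov edx, dword ptr [eax]
    r["eax"] = mem.get(addr + 4, 0)          # mov eax, dword ptr [eax+4]
    mem[r["esp"] + 0x14] = r["eax"]          # mov dword ptr [esp+14h], eax
    return True


def snapshot(r, mem):
    return (tuple(sorted(r.items())), tuple(sorted(mem.items())))


def loops_forever(r, mem, max_passes=1000):
    """True if some state repeats, which means the loop can never exit."""
    seen = set()
    for _ in range(max_passes):
        state = snapshot(r, mem)
        if state in seen:
            return True
        seen.add(state)
        if not one_pass(r, mem):
            return False
    return False


def article_state():
    """Register and memory values quoted in the article."""
    regs = {"eax": 0, "edx": 0, "ebp": 0, "ecx": 0x00C90000,
            "esi": 0x0003FFF8, "edi": 0x00CD0000, "esp": 0x0012C6C0}
    mem = {0x00CCFFF8: 0, 0x00CCFFFC: 0, 0x0012C6D4: 0}
    return regs, mem


regs, mem = article_state()
stuck = loops_forever(regs, mem)
```

With the values from the article, `stuck` is `True` after a single pass, because the state at the end of the pass is identical to the state at the start. Change EDI so that EAX is no longer below it (for example 0x00CCFFF8) and the same function returns `False`, since the jb at the bottom falls through.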
This explains why this thread is using 100% of our CPU and concludes our investigation. Unfortunately this is something we can’t fix since it’s happening within the nview.dll’s code and this is not a MS product.
NVIEW.DLL is an NVidia module apparently used by the NVidia Desktop Manager software. I’ve done some research on the NVidia web site and I haven’t found any reference to this, so I believe there is no definitive solution for it yet. As a workaround I simply disabled the Desktop Manager feature and everything seems to be fine now.
Great post, like usual you provide all the details to make it easy to follow.
Even 4 years later... can you guess? That bug still exists.
landed here after all .. Thx a lot Marcelo Fartura..
You saved me an long eve with Maths :)
Yes - same problem - happening on a Win2003 machine. Putting up with this for the past 3 or so years. NVidia says I have the latest drivers. :(
THANK YOU. I have been having this issue for YEARS now and thought I had a trojan or virus or something. And even though the last bit of programming I studied was turbo pascal back around 1986, I actually followed that explanation well enough to understand what was probably going on by the time I read:
"It will make a shift left on the EDX value of 8 positions. EDX is a 32-bit representation of the value 0x0, so shifting it 8 positions wouldn’t change anything. EDX remains 0x0 after the operation..."
Actually started to LOL when I got to "Now hold on..." Well done.
Can't believe this hasn't been remedied by Nvidia... Looks like I'll go with ATI instead next time...