The Case of the Process Startup Delays

The Case of the Process Startup Delays

  • Comments 76

I’ve been extremely busy here at Microsoft and so haven’t had time to blog until now, but plan on getting back to posting regularly. Before I start with a look at a technical problem I ran into recently, I’m pleased to report that the Sysinternals integration is proceeding smoothly and that Bryce and I will unveil an exciting new tool when the site moves to its new home under Microsoft TechNet in late October.

 

I don’t use my laptop much when I’m not traveling, but I occasionally read email on it in the living room. Like most Windows users, I’m frustrated by occasional unexplained delays when I perform routine tasks like start a program or open a web page. Since joining my laptop to an internal Microsoft domain, I began experiencing regular delays when starting processes. With my Sysinternals tools arsenal in hand, I set out to investigate the root cause, suspecting that recently joining the laptop to the Microsoft domain played a role.

 

I began my research by first noticing that, after a delay of a few seconds starting a new process, processes I started within the following 30 seconds launched instantly. I therefore started Process Explorer, waited for 30 seconds, and then executed Notepad from Explorer’s Run dialog. Notepad didn’t appear in Process Explorer’s process tree during the expected delay, which implied the Explorer thread starting Notepad was experiencing the pause, not Notepad’s startup.

 

A look at the stack of the launching Explorer thread might give me a hint about the cause, but I was too impatient to look at each of Explorer’s over a dozen threads and so attached the Windbg debugger from the Microsoft Debugging Tools for Windows to Process Explorer, launched Notepad with Process Explorer’s Run dialog, and broke into the debugger. I opened the thread list in Windbg by selecting Processes and Threads from the View menu, selected the first one displayed, and then revisited the View menu to open the Call Stack dialog:

 

 

 

Stacks display with the most recently called function at the top, so the ZwWaitForSingleObject frame at the top meant that the thread was waiting on some object to become signaled. The stack frames further up the stack are in the RPCRT4 (Remote Procedure Call Runtime Version 4) DLL and the reference to the OpenLpcPort function told me that the thread was trying to initiate a RPC with another process on the same system.  It looked like the wait might be due to the GetMachineAccountSid call highlighted in the screenshot. Just as for domain user accounts, computers belonging to a domain have accounts and GetMachineAccountSid’s name implies that the function returns the Security Identifier (SID) of the computer’s domain account.

 

I set a breakpoint on the return from the call to GetMachineAccountSid in the OpenLpcPort function and after a short pause consistent with the startup delays, the debugger’s command prompt activated. The x86 calling convention is for function return values to be passed in the EAX register so I examined its value:

 

 

After translating the value to decimal with the “?” command I searched for 1789 in the global error definitions file, WinError.h, of the Platform SDK:

 

 

I scoured MSDN documentation and the Web and found essentially no information on the underlying cause for that error code. However, the term “trusted relationship failure” implies that the domain the computer is connecting to doesn’t trust the domain of the computer. But under the circumstances the error didn’t make sense, because I was disconnected from the network and even if the computer was trying to connect a domain, the only one it would connect is the one it belonged to.

 

On a hunch, though, I opened a command prompt and ran PsGetSid to see what error it would get when trying to look up the computer’s domain SID (a computer’s domain account name is the computer name with a “$” appended to it):

 

 

Sure enough, it experienced the same delay, which must be a network timeout, and got the same error. Then I used remote access to connect to the domain and ran the command again:

 

 

 

Further, after connecting to the domain I no longer experienced the startup delays. I disconnected, but continued to have delay-free process launches, even after 30 seconds. After rebooting and not connecting, though, the delays reoccurred.

 

At that point I decided to investigate the internals of GetMachineAccountSid. The stack trace showed that it calls into the Netlogon DLL, which performs its own RPC to a function called NetrLogonGetTrustRid. I knew that the Netlogon service runs inside of the Local Security Authority Subsystem (LSASS):

 

 

I attached Windbg to LSASS and set a breakpoint on NetrLogonGetTrustRid. After launching a new process I hit the breakpoint and saw that if a particular field in a data structure is NULL, Netlogon tries to connect to a domain controller, but if the connection fails for any reason, it blindly returns error 1789. However, when I connected to the domain the call succeeded and the value in the data structure filled in with the SID of the computer account, which persists even after a disconnect. That explained the change in behavior after I connected to the domain.

 

Turning my attention back to GetMachineAccountSid, I found that it caches the results of an error for 30 seconds before asking NetLogon to attempt to connect to a domain controller again. That explained the 30 second quick-start periods. A look through the code flow in the debugger also revealed that OpenLpcPort queries the computer SID as part of a check to see if it matches the SID passed as one of its parameters. If so, OpenLpcPort changes the SID to the SID of the Local System account before calling NtSecureConnectPort, mapping the domain SID to a local one. NtSecureConnectPort takes a SID as a parameter and will only connect to the specified Local Procedure Call (LPC) port if the port was created by the account that matches the SID.

 

I’d answered a number of questions, but the big one remained: why was an RPC happening during a process launch at all? The initial stack trace only went up as far as the NegotiateTransferSyntax frame, but there were obviously other frames that the symbol engine couldn’t determine. The stack display went further when I had hit the breakpoint I set in OpenLpcPort, though:

 

 

Near the bottom you can see the call to ShellExecCmdLine that the CRunDlg class, which is responsible for the Run dialog implementation, calls. That eventually results in what looks like the execution of shell execute hook extensions, and the one that makes the RPC call is implemented by the MpShHook DLL. I didn’t know off hand what that DLL was, but Process Explorer’s DLL view showed that it’s part of Windows Defender:

 

 

I suspected that the hook is part of Windows Defender’s real-time protection, which the Windows Defender team confirmed. Autoruns reports that Windows Defender registers the shell execute hook:

 

 

The mystery was solved!  Putting it all together:

  1. Explorer’s Run dialog calls ShellExecuteCmdLine
  2. ShellExecuteCmdLine calls out to shell execute hooks
  3. Windows Defender’s hook for real-time protection, MpShHook.Dll, calls RPC to communicate with the Windows Defender service, passing the SID of the service as an argument
  4. The RPC library calls GetMachineAccountSid to see if the SID matches the computer’s domain SID, in which case it would map the SID to the local system account SID
  5. GetMachineAccountSid performs an RPC to the Netlogon service to get the computer account’s SID
  6. If the computer account’s SID hasn’t been obtained already, Netlogon tries to connect to a domain controller
  7. If the domain controller connection fails after a timeout (the delay), Netlogon returns a trust-relationship failure error
  8. The Windows Defender RPC proceeds using the unmapped SID
  9. Windows Defender’s service performs real-time checks and then process launches

 

A little more research led me to conclude that the delay only happens under very specific circumstances where:

 

  • The system is running Windows XP 64-bit for x64 or Windows Server 2003 SP1
  • Windows Defender Beta 2 is active
  • The system is domain joined, but has not connected to the domain in the current boot session

32-bit Windows XP doesn’t perform the SID mapping in OpenLpcPort and Windows Defender doesn’t use a shell execute hook on Windows Vista. The Windows Defender team is looking at workarounds for the next release, but now that I understand the delay I can work around it. 

Leave a Comment
  • Please add 3 and 8 and type the answer here:
  • Post
  • Just wanted add -

    The only reason you didn't see MpShHook.dll on the first callstack (before the breakpoint is set) is because WinDBG only shows the first 20 "frames" by default. If you click the "more" button shown in the upper right of your screen shot you'll see more frames and would see the MpShHook. Breakpoints and registers sounds like more fun though! :-P

    Power to the debugger!!
  • Ah, thanks. I never realized that was what the "more" button was for. I haven't done much debugging where the stack was more than 20 frames deep or I needed to look up a stack that far.
  • I had the same delay problems, but i don't have the windows defender installed. I removed the domain integration and everything runs fine now.
  • Good article, but I'm afraid that domain issues when the laptop is not connected are far too common with Windows.  We have a few sales guys that come in the office complaining about slow login times.  And indeed, login times can take 2 to 3 minutes when not on the domain.  Tried many suggestions on the Internet, but I just ended up having them hibernate the laptop to get around the slow login time.

    I can even get it to the point where all the event logs are clean and pristine and the login time is still really long.  Mind you this only happens on a handful of laptops, not all of them.
  • Mark,

    I love your articles! I'm learning so much! It's nice to see you posting again.

    I ran into a similar issue with XP Home locking up on my Dad's pc as soon as he hit his desktop.

    Using lessons (and your tools) I tracked the issue down to the wuauclt.exe process overloading his CPU.

    I've got the troubleshooting experience blogged over at <a href="http://grandstreamdreams.blogspot.com/2006/09/thawing-xp-system.html">Thawing an XP System</a>

    Any chance you could take a look into this issue down the road? Or maybe you've run into it already.  I'm not on the elite level of process thread breakdown like you!

    Cheers!
    --Claus
  • I have learned from your web site zalot. A process context switch involves extracting that address requested from the Kernel Environment protection block and dumping it into the cr3 register of the cpu. The 1024 system file directories that form the mazin system file director has to page from the physcial and the linear. Address space laid out and a thread created. Hoevever, with explorer, the extra attention from the cpu, and subsytem environment only slow that loader that has read the Ds:DX header file so the app, or whater the process ( and relational dll) must have the DLL export the functions that the executable imported. Because of OLE 2, macro and stdin recordings, further user input may divert kernel code. Perhaps the symbolic files that represent the process for pdb info could load into the WinDbg. Changes in functionlity may also involve those cpu instensive sniffers, newtork traversal tools. Or it had an updated hotfix and the function signature differs and therefore does not respond to the API calling it. The, perhaps USER API calls the User32.dll, but is rerouted and packaged as an LPC and routed to the csrss.exe for processing ( before the overhead issue).
  • I too have noticed this issue when not connected to the domain.  I am not using Defender here either and this is on XP Pro 32-bit.

    I have never actually tried to pinpoint the problem because there are just too many things running related to A/V, VPN, etc, but as a general rule, at least in my situation, a reboot and login with cached credentials typically fixes it up for me.

    This seems to happen more often when I'm logged into the domain at work, then hibernate, then power on and connect to my home network (from hibernate state).

    Nice article though..easy to follow and understand.  I firmly believe Microsoft will only benefit by having your skills supporting them behind the scenes.
  • Nice article... but the problem is far to common. I don't think there is currently a reliable way to determine DC connectivity for netlogon (or any of the components involved) to behave any differently.

    I hope there is a non-intrusive fix to this. thoughts?
  • Thank you Mark for this article, and any other you share with us too. Regards from Poland.

    g-n-d.net
  • PingBack from http://stuart.amanzi.co.nz/2006/09/04/some-new-feeds-for-feeddemon/
  • Why does Windows Defender hook into ShellExecute to monitor starting processes, rather than CreateProcess?
  • Intersting, your RSS link worked fine in Outlook 2007, that is until you moved your blog to Microsft servers.  When I try to add your new RSS feed outlook complains that it is invalid.

    Otherwise, as always you insights are useful and instructive.
  • @ nobody

    "Why does Windows Defender hook into ShellExecute to monitor starting processes, rather than CreateProcess?"

    My guess is because ShellExecute opens files whether they're executables, documents, or folders.  Defender sees more by hooking ShellExecute than CreateProcess.
  • I have seen this same process start delay on the Tablet PC edition of windows.  killing windows defender resolves the performance problem for me.  
  • i see the same thing on my laptop too
    XP PRo 32, NOD32 and SpySweeper loaded.

    when not on the DOMAIN, i need to delete any unconnected resources too.
    I use batch files for that
Page 3 of 6 (76 items) 12345»