Blogs

The Case of the Veeerrry Slow Logons

  • Comments 20
  • Likes

This case is my favorite kind of case, one where I use my own tools to solve a problem affecting me personally.  The problem at the root of it is also one you might run into, especially if you travel, and demonstrates the use of some Process Monitor features that many people aren’t aware of, making it an ideal troubleshooting example to document and share.

The story unfolds the week before last when I made a trip to Orlando to speak at Microsoft’s TechEd North America conference. While I was there I began to experience five minute black-screen delays when I logged on to my laptop’s Windows 7 installation:

image

I’d typically chalk up an isolated delay like this to networking issues, common at conferences and with hotel WiFi, but I hit the issue consistently switching between the laptop’s Windows 8 installation, where I was doing testing and presentations, and the Windows 7 installation, where I have my development tools. Being locked out of your computer for that long is annoying to say the least.

The first time I ran into the black screen I forcibly rebooted the system after a couple of minutes because I thought it had hung, but when the delay happened a second time I was forced to wait it out and face the disappointing reality that my system was sick. When I logged off and back on again without a reboot in between, though, I didn’t hit the delay. It only occurred when logging on after a reboot, which I was doing as I switched between Windows 7 and Windows 8. What made the situation especially frustrating was that whenever I rebooted I was always in a hurry to get ready for my next presentation, so had to suffer with the inconvenience for several days before I finally had the opportunity to investigate.

Once I had a few spare moments, I launched Sysinternals Autoruns, an advanced auto-start management utility, to disable any auto-starting images that were located on network shares. I knew from previous executions of Autoruns on the laptop that Microsoft IT configures several scheduled tasks to execute batch files that reside on corporate network shares, so suspected that timeouts trying to launch them were to blame:

image

I logged off and logged back on with fingers crossed, but the delay was still there. Next, I tried logging into a local account to see if this was a machine-wide problem or one affecting just my profile. No delay. That was a positive sign since it meant that whatever the issue was, it would probably be relatively easy to fix once identified.

My goal now was to determine what was holding up the switch to the desktop. I had to somehow get visibility into what was going on during a logon immediately following a boot. The way that immediately jumped to mind as the easiest was to use Sysinternals Process Monitor to capture a trace of the boot process. Process Monitor, a tool that monitors system-wide file system, registry, process, DLL and network operations, has the ability to capture activity from very early in the boot, stopping its capture only when the system shuts down or you run the Process Monitor user interface. I selected the boot logging entry from the Options and opened the boot logging dialog:

SNAGHTML274998d

The dialog lets you direct Process Monitor to collect profiling events while it’s monitoring the boot, which are periodic samples of thread stacks. I enabled one-second profiling, hoping that even if I didn’t spot operations that explained the delay, that I could get a clue from the stacks of the threads that were active just before or during the delay.

After I rebooted, I logged on, waited for five minutes looking at a black screen, then finally got to my desktop, where I ran Process Monitor again and saved the boot log. Instead of scanning the several million events that had been captured, which would have been like looking for a needle in a haystack, I used this Process Monitor filter to look for operations that took more than one second, and hence might have caused the slow down:

SNAGHTML27bf247

Unfortunately, the filter cleared the display, dashing my hopes for quickly finding a clue.

Wondering if perhaps the sequence of processes starting during the logon might reveal something, I opened the Process Tree dialog from the Tools menu. The dialog shows the parent-child relationships of all the processes active during a capture, which in the case of a boot trace means all the processes that executed during the boot and logon process. Focusing my attention on Winlogon.exe, the interactive logon manager, I noticed that a process named Atbroker.exe launched around the time I entered my credentials, and then Userinit.exe executed at the time my desktop finally appeared:

image

The key to the solving the mystery lay in the long pause in between. I knew that Logonui.exe simply displays the logon user interface and that Atbroker.exe is just a helper for transitioning from the logon user interface to a user session, which ruled them out, at least initially. The black screen disappeared when Userinit.exe had started, so Userinit’s parent process, Winlogon.exe, was the remaining suspect. I set a filter to include just events from Winlogon.exe and added the Relative Time column to easily see when events occurred relative to the start of the boot. When I looked at the resulting items I could easily see the delay was actually about six minutes, but there was no activity in that time period to point me at a cause:

image 

Profiling events are excluded by default, so I clicked on the profile event filter button in the toolbar to include them, hoping that they might offer some insight:

image

In order to minimize log file sizes, Process Monitor’s profiling only captures a thread’s stack if the thread has executed since the last time it was sampled. I therefore was expecting to have to look at the thread profile events at the start of the event, but my eye was drawn to a pattern of the same four threads sampled every second throughout the entire black-screen period:

image

I was fairly certain that whatever thread was holding things up had executed some function at the beginning of the interval and was dormant throughout, so was skeptical that any of these active threads were related to the issue, but it was worth spending a few seconds to look at them. I opened the event properties dialog for one of the samples by double-clicking on it and switched to its Stack page, on the off chance that the names of the functions on the stack had an answer.

When I first run Process Monitor on a system I configure it to pull symbols for Windows images from the Microsoft public symbol server using the Debugging Tools for Windows debug engine DLL, so I can see descriptive function names in the stack frames of Windows executables, rather than just file offsets:

SNAGHTML3eeaff2

The first thread’s stack identified the thread as a core Winlogon “state machine” thread waiting for some unknown notification, yielding no clues:

image

The next thread’s stack was just as unenlightening, showing the thread to be a generic worker thread:

image

The stack of the third thread was much more interesting. It was many frames deep, including calls into functions of the Multiple UNC Provider (MUP) and Distributed File System Client (DFSC) drivers, both related to accessing file servers:

image

I scrolled down to see the frames higher on the stack and the name of one of the functions, WLGeneric_ActivationAndNotifyStartShell_Execute, pretty much confirmed the thread to be the one responsible for the problem, since it implied that it was supposed to start the desktop shell:

image

The next frame’s function, WNetRestoreAllConnectionsW, combined with the deeper calls into file server functions, led me to conclude that Winlogon was trying to restore file server drive letter mappings before proceeding to launch my shell and give me access to the desktop. I quickly opened Explorer, recalling that I had two drives mapped to network shares hosted on computers inside the Microsoft network, one to my development system and another to the internal Sysinternals share where I publish pre-release versions of the tools. While at the conference I was not on the intranet, so Winlogon was unable to reconnect them during the logon and was eventually – after many minutes – giving up:

image

Confident I’d solved the mystery, I right-clicked on each share and disconnected it. I rebooted the laptop to verify my fix (workaround to be precise), and to my immense satisfaction, the logon proceeded to the desktop within a few seconds. The case was closed! As for why the delays were unusually long, I haven’t had the time – or  need - to investigate further. The point of this story isn’t to highlight this particular issue, but illustrate the use of the Sysinternals tools and troubleshooting techniques to solve problems.

TechEd Europe, which took place in Amsterdam last week, gave me another chance to reprise the talks I’d given at TechEd US. I delivered the same Case of the Unexplained troubleshooting session I had at TechEd US, but this time I had the pleasure of sharing this very fresh and personal case. You can watch it and my other TechEd sessions either by going to my webcasts page, which lists all of my major sessions posted online, or follow these links directly:

Windows Azure Virtual Machines and Virtual Networks
Windows Azure Internals
Malware Hunting with the Sysinternals Tools
Case of the Unexplained 2012

And you can see all of both event’s sessions online at their webcast sites:

TechEd North America 2012 On-Demand Recordings
TechEd Europe 2012 On-Demand Recordings

I hope you enjoyed this case!

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • It's all very good and the case is interesting. However, aren't there many millions of people with disconnected network drives struggling to logon? Is it that difficult to say "if id doesn't connect in 2 secs, forget it".

  • Does the verbosestatus policy (KB325376) from the 2000 era still work?

  • @WndSks, yes in fact Windows 2008R2 has verbose status on.

    I thought the solution would be to disable active setup and shorten the $mft.

  • "Robert Silver" has a point. I assume a large number of users everyday log into laptops with disconnected network shares. So, I assume by this time, this problem should have been very well known. Are we missing something here?

  • Nice post - it's always nice reading along someone unraveling a mystery.  In your screenshot with the big red "SIX MINUTES" in it, as soon as I saw the "\\redmond\ipc$" activity, I was smiling.  I wouldn't have gotten to the point of looking at thread stacks, but I understand that it's a great opportunity to showcase more of the tool.

    But I get the feeling there's more to this mystery. Isn't Windows capable of loading your shell and attempting to reconnect your network drives at the same time?  When I log in to a PC with unavailable network drives, the GUI loads, and then a few seconds later I get a little tooltip balloon in my tray saying something like "could not reconnect all network drives."

    Is your \\redmond\files\SYSINTERNALS\ share using WebDAV instead of the usual SMB?  I wonder if that might have something to do with the massive timeout...

  • Isn't there a role for Network Location Awareness (NLA) here?

    I could see Winlogon: 1) identifying the network the system is on, 2) comparing the current network to the network that the drives were mapped from, 3) make the connection if equal, 4) dynamically bring up a tunnel to the shares' home network, or leave in a disconnected state until accessed then dynamically bring up a tunnel.

    Suffering a six minute delay when instrumentation is available to avoid the problem seems awkward at best.

    I was wondering about restoring the drive maps asynchronously and letting the shell come up.  There might be profile, folder redirection, and startup program issues with that.   Still those errors could happen for a variety of reasons including being simply disconnected from a net and the shell has to handle that gracefully.

  • I'm not sure why the delays were so abnormally long and haven't had the time - or need - to investigate further. The point is that I solved my immediate problem using Sysinternals tools and troubleshooting techniques.

  • I'm using xbootmgr for this. The generic graph shows en entry "RestoringNetworkconnections" with a start and stop time:

    public.sn2.livefilestore.com/.../Networkconnections.PNG

    Here you can see that it takes 23s.

  • Hi Mark,

    For my users with laptops, I have the following GPO option explicitly set to Disabled:

    Computer/Administrative Templates/System/Logon/Always wait for the network at computer startup and logon

    That setting has kept my users from having similar issues.

  • This is the exact same issue I see when logging on to my work account on my work laptop at the kitchen table while disconnected from work.  It hangs for about five minutes and then finally lets me in.  I have taken note of the suggestion by "eclayp" but this will probably not be implemented because there are probably very few users in my company who use the laptop in that fashion.

  • This is the opposite problem that I have.  My system takes a long time to *shutdown*.  I've tried using Process Monitor to discover the problem...but no luck.  Do you want to make an example of my system Mark?

    Peace,

    Randy

  • @Randy: If you aren't getting anywhere with ProcMon, get a Windows Performance Toolkit (WPT) xbootmgr trace. WPT is in the Win7+.NET 4.0 SDK.

    xbootmgr -shutdown

    Look at the phase graphs.

  • Thanks for posting this!  I've barely just begun to use SysInternals and assumed there was some boot logging function available, so this is exactly what I needed!

  • Thanks Mark,

    slow logins seem to be a "never-ending-story" So this articel is very helpful to get under the hood...

  • hmm... this is known problems with the network shares ... But in my case it was resolved just practically :). And this is the old problem ...