Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
Update: The Active Directory team has released useful guides for troubleshooting slow logon issues:
Emails containing troubleshooting cases keep arriving in my inbox. I’ve received many cases that start with a seemingly unsolvable problem and end a few steps later with a solution or - often just as useful - a workaround. I’ve amassed several hundred such cases that I’ve captured in over 400 PowerPoint slides, giving me great material from which to draw for my blog and the Case of the Unexplained talk series I’ve delivered at a number of major industry conferences
I’m always looking for fresh cases, use of obscure tool features, and unique troubleshooting techniques, so please keep them coming. This time, I’m sharing a fascinating case that highlights two useful techniques: comparing Sysinternals Process Monitor logs from working and problematic systems, and using Sysinternals PsExec to capture activity during a logon.
The case begins when a systems administrator at a large company got multiple end-user complaints that logon was taking over three minutes. The users didn’t encounter any problems once logged on, but the delays were understandably frustrating. Many other users running with the same software configuration weren’t experiencing issues, however. Looking for commonalities, the administrator queried the network configuration database and, sure enough, saw that all the systems with complaints were Dell Precision 670 workstations. He thought he had a major clue until he looked he saw that the systems running without issue included seemingly identical 670 workstations.
Looking for clues more directly, his next step was to try to analyze the logon process of the delayed systems. He used PsExec to run Process Explorer in the Local System account so that it would survive a logoff and be active at the next logon. Because the systems were running Windows XP, the command-line he used was the following (see the end of the post for how to do this on Windows Vista and higher):
psexec –sid c:\sysint\procexp.exe
psexec –sid c:\sysint\procexp.exe
The “-s” directs PsExec to launch the process in the Local System account, “–i” to connect the process with the interactive desktop so that its windows are visible, and “-d” to return immediately instead of waiting for the process to terminate. Note that if you have Fast User Switching enabled and you are not logged into session 0, do not log out, but instead switch users, login to the problematic account, and then switch back to the session from which you started PsExec.
At the subsequent logon, he noticed that Lisa_client_18.104.22.168.exe, the company’s own system inventory line-of-business (LoB) application, consumed CPU for a short time, went idle for three minutes, then exited, after which the logon process would continue as normal:
David Solomon coined a phrase back before Process Monitor replaced Filemon and Regmon that still applies when updated: “When in doubt, run Process Monitor!” (I follow this advice religiously, even having my daughter run Process Monitor when she comes to me with a homework question). This case is a great example of that philosophy put into practice because it seems unlikely on the surface that Process Monitor would reveal the cause for a process hang, but the administrator turned to the tool nonetheless.
After launching Process Monitor with PsExec and capturing a logon trace, he scrolled to the beginning of the captured data and started his analysis. Because of what he saw in Process Explorer, the Lisa_client process was the obvious suspect, so he right-clicked on its process name in one of the trace lines and selected the Include quick-filter menu item to remove from the display entries related to activity from other processes:
When troubleshooting a hang with Process Monitor, you should first see if there are any gaps in operation time stamps that match the hang duration. You can look for lengthy operations by adding the Duration column to the display and then making sure to filter out operations that commonly don’t immediately complete, like directory change notifications. That can be useful when you don’t see a significant time gap between operations because the process has multiple threads, some of which continue to operate while the one causing the hang is dormant.
To his pleasant surprise, he soon found an event that not only preceded a gap of exactly three minutes, but that had an unusual result code, IO DEVICE ERROR:
It appeared that the Lisa_client process performed a SCSI pass-through command to the disk hosting the C: volume that timed-out after three minutes with a hardware error. Wondering what the result of the command was on one of the 670’s that logged on promptly, he captured a trace from one and saw that the corresponding operation took less than a millisecond and was successful:
The evidence clearly pointed at a hardware issue with the disks installed on a subset of the 670’s, so he gathered disk type data from all the 670’s, correlated them with the reports of slow logons, and found that all of the slow systems had Seagate disks and the others had Fujitsu disks.
His company was obviously not going to replace disks just to avoid an issue being caused by its own LoB application, so he had to figure out a workaround. He notified the Lisa_client development team of the issue, who reported that they could remove the command without loss of functionality, but that it would take at least several days for the update to go through their internal release process. Having a few days where system information wouldn’t be collected for a subset of systems was less important than end-user productivity, so in the meantime he wrote a WMI logon script to query the system disk and launch Lisa_client only if it wasn’t a Seagate model.
Without Process Monitor’s help he would have probably determined that the disks were the key hardware difference, but it’s not clear he would have discovered the root cause and been able to work around it rather than resort to replacing disks. This is yet another case solved with the help of Process Monitor and insightful detective work.
In closing, I mentioned that I would provide steps for configuring an application to survive logoff and logon on Windows Vista, Windows Server 2008 and higher. The PsExec command I supplied for Windows XP won’t work on newer operating systems because Windows Vista introduced Session 0 Isolation, requiring a couple of extra commands to make the launched application accessible after a logon. First, start the utility in session 0 with this PsExec command in an elevated command prompt:
psexec –sd –i 0 c:\sysint\procmon.exe
psexec –sd –i 0 c:\sysint\procmon.exe
You’ll see a window titled “Interactive services dialog detection” flash in the taskbar, indicating that a process is running with a window on the hidden session 0 desktop. Click on the taskbar window to restore the notification dialog and then on the Show Me the Message button to switch to that desktop:
The utility you launched will be visible there and you can configure it with desired settings (it’s running in the Local System account so won’t have your own account’s defaults). When done, click on the Return button to get back to the main desktop. You can now logoff and log back on to reproduce the problem you’re investigating. After logging on again, execute the following commands in an elevated command prompt to cause the doorway to the session 0 desktop to reappear:
net stop ui0detect net start ui0detect
net stop ui0detect net start ui0detect
Go back to the session 0 desktop to look at the captured information and close the tool.
One last thing I want to leave you with is a reminder that I’ve documented many other troubleshooting cases in this blog and you can find them in the blog index here. You can also watch recordings of my Case of the Unexplained sessions from TechEd here and be sure to come to TechEd US this June in New Orleans, where I’ll be delivering it again with all new cases.
Cool post. Your blog is great! Thanks alot for sharing :D
First, I've had slow logon problems on multiple computers, and this is the first time I've seen a truly helpful debugging process. The mere command line "psexec -sid" puts me in Nerdvana. Knowing that trick, I might have figured out the subsequent procexp and procmon steps, but it's great to have it all laid out. Now I've got to go fix two friends' laptops that take forever to log in!
BTW, every time your blog appears with a new post in my RSS reader, I eagerly give it my full attention. Every post is pure gold. Keep up the great work, Mark.
i'll let you know about another cause of slow logons in windows 7.
all you have to do is select a solid color for the desktop, in personalize/desktop backgound/solid colors.
logon time will go from 2 or 3 seconds to at least 30 seconds. choosing a picture will restore the 2 second logon time.
i took one of the pictures and edited it, making it a solid color, that's how i got around the issue.
seems strange, but it's completely reproducible.
"psexec –sid c:\sysint\procexp.exe" does not survive a logoff-logon on xp. What am I doing wrong?
Sad fact is Mark, was that I was a Vista TAP tester for that company, and I called out that the Lisa client was not Vista compatible.
(I was also the regional support guy for that tool as well).
"Vista compatibility is on the Lisa roadmap!" the dev team told me.
Looks like it never happened.
Yu're the boss Mark. All your post are really great, I'm learning (and sure I will learn) a lot with this.
Thanks for it!
not 100 % sure but i think this case was already included in one of the "case of unexplained" videos located on http://technet.microsoft.com/en-us/sysinternals/bb963887.aspx
Excellent article!!!! Will use this to debug my logon delays at my PC.
I'm using Windows XP and I'd like to view analyze a login, but I'm a bit lost. Here is what I understand to do...
1) From the command line, go to the folder where you have sysinternals suite extracted to.
2) psexec –sid C:\sysinternals\procexp.exe
3) accept the EULA
4) log off
5) what do I do next?? how do you view the data after logging back in.
Having the same problem as Pheel & Chris. doesn't seem to work on Windows XP
@Dale: the problem was a bad interaction between their internal software and a particular brand of hard drives, on XP. Vista compatibility is utterly irrelevant.
I agree with ethanT above. Keep up the great work! Your stories and insight help hundreds, maybe thousands of IT folks around the world.
Thanks for the feedback! I'm glad to hear the you find the blog interesting and useful.
Mark, could these tools be used to troubleshoot another startup issue. I intermittently get a situation where after logging into the Win2k3 domain, the XP client will hang at "Applying personal settings." Can ping the PC from another PC and even check eventvwr remotely but never get the desktop proper. Only solution is the cold-boot and log in again which always works. Seems to occur on notebooks that connect to different networks.
There is a hotfix available for the Windows 7 solid color wallpaper problem. KB977346