Microsoft's official enterprise support blog for AD DS and more
“Ned” the Gnome
Mike here again and in the spirit of Halloween I want to discuss mythical creatures. What would the world be without J.R.R. Tolkien’s idea of smelly, leather-skinned Orcs or Greek Mythology’s gift of Pegasus, the winged stallion? Unfortunately, for each great mythical creature, like giant walking trees (that walk for hours—thank you Kevin Smith), there is a horrendous one. The dreadful creature I want to discuss today is the corrupt user profile.
I absolutely shudder when I hear the words “corrupt profile.” Like Superman, who is defeated by a glowing green rock—the corrupt profile is my kryptonite (Ned’s is the phrase Tips and Tricks). So, the purpose of this blog is to dispel the myth of the corrupt user profile.
Let me start by contradicting myself—there is actually such a thing as a corrupt user profile; however, it is extremely rare. I’ve spent over ten years at Microsoft and I’ve seen two—count them—two actually corrupt user profiles. I’ll identify the “real” corrupt profile later. First, let’s identify what is NOT a corrupt user profile because it’s more prevalent.
Occasionally, users report their profiles not loading, or Windows informs users that it logged the user on with a temporary user profile. It’s rare for Windows to not load a user profile because it is a “corrupt” user profile. Typically, a user profile does not load because:
The most common scenario classified as the mythical corrupt profile is the first, and rightly so because it is painfully difficult to diagnose. Configuration is the second most likely scenario that contributes to the mythical corrupt profile. It’s rare for unavailable user profiles, or scenarios involving the awesome access is denied error message, to be labeled as corrupt.
Another scenario that perpetuates the corrupted profile myth is one that involves user settings disappearing. It’s unlikely that user settings disappear; it’s more likely the user settings were not saved. A number of scenarios can lead to this possibility.
Most recently, I’ve seen a number of scenarios, mostly with Terminal Servers, where settings do not persist. Our case data show a trend of these scenarios using non-Microsoft profile management software. This software changes how Windows handles the user profile. Typically, these implementations treat the user profile as a local profile and then implement “magical magic” to roam user data back to a central location. This introduces a number of moving parts that must work correctly to ensure user settings are saved. Also, some of these non-Microsoft solutions allow you to partition the user settings into those that persist and those that do not. This allows control over which user settings roam through their solution and which settings do not. In these cases, verify that the solution, third-party or otherwise, propagated the saved settings. However, this is not a corrupt user profile.
Remember that Windows stores user settings in a registry file. The registry file is the smallest unit of roaming data. That means that Windows roams the entire user hive when the user logs off (or in the background with Windows 7). However, when a user logs on to multiple computers or has multiple sessions, then that user’s settings are only as good as the last session that writes to the central location.
Consider the following scenario. A user has a laptop and frequently uses Terminal Services. The user shares the same profile between these computers. On Friday, the user logs on to their laptop and the profile is loaded. After some time, the user makes a Terminal Services connection and begins to work in that session. The user then disconnects the Terminal Services session and goes to lunch. When they return, they change their desktop background on their laptop. The user logs off at the end of the day and their saved user settings roam to the central location. On Monday, the user logs on expecting their new desktop background; however, they receive their old desktop background. You discover that idle Terminal Services sessions are configured to log off after a preconfigured idle time. The Terminal Services session logged off after the laptop did, so its user settings carry a later time stamp and are written last, which makes it appear as if the user’s setting did not save. This is another reason why we encourage separate user profiles for Terminal Services. So, add this experience to the list of mythical corrupt profiles.
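To make the "last session that writes wins" behavior concrete, here is a minimal Python sketch of the scenario. It is only a toy model, not how Windows implements roaming: the Session class, the central_store dictionary, and the "Wallpaper" setting name are all hypothetical. The one thing it borrows from the description above is that each session works on its own copy of the settings and writes the whole copy back at logoff.

```python
# Toy model of roaming: the whole hive is the unit of roaming,
# so whichever session logs off last decides what is stored centrally.
# Session, central_store, and "Wallpaper" are hypothetical names.

class Session:
    def __init__(self, name, central_store):
        self.name = name
        # At logon, the session loads its own private copy of the roamed settings.
        self.settings = dict(central_store)

    def logoff(self, central_store):
        # At logoff, the entire copy replaces whatever is stored centrally.
        central_store.clear()
        central_store.update(self.settings)


central_store = {"Wallpaper": "old.bmp"}

laptop = Session("laptop", central_store)
terminal = Session("terminal-services", central_store)

# The user changes the desktop background on the laptop only.
laptop.settings["Wallpaper"] = "new.bmp"

laptop.logoff(central_store)    # Friday evening: new.bmp roams to the server
terminal.logoff(central_store)  # idle timeout fires later and writes its stale copy

print(central_store["Wallpaper"])  # old.bmp
```

Nothing in this model is corrupt. The stale copy simply arrived last.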
Another scenario that perpetuates the corrupt profile myth is with misbehaving applications that “magically” work when you delete the user profile. This is not a corrupt user profile. There is a big difference between corrupt data and unexpected data. It’s difficult to determine what is wrong in these scenarios.
Clearly it is related to user data because resetting the user data to blank or nothing restores the application’s performance to the expected behavior. These scenarios require a thorough understanding of the application, how it consumes user data, and the upper and lower limits of each setting. Deleting the entire user profile to accommodate a misbehaving application is a quick fix with huge ramifications. The “fix” for one application effectively breaks other applications. Also, deleting the user profile removes stored credentials, keys, and certificates that may be critical to the user.
A better approach is to create a new user and test the application with a new user profile. But deleting a user profile because an application or a feature of an application does not work is overlooking the larger issue. Resist the urge and instead break out Process Monitor, capture registry activity, and reproduce the issue. Inventory the registry keys the application uses in the user’s hive. Review the values of each of the keys in a working and failing scenario and compare the two. Use the process of elimination to determine the setting and value that is causing the failure.
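The compare step lends itself to a small script. The Python sketch below assumes you have already collected the application's values from a working profile and a failing profile into two dictionaries (for example, by hand from a registry export). The value names and data are made up for illustration, and diff_settings is a hypothetical helper, not part of any Microsoft tool.

```python
# Compare the values an application uses in a working and a failing profile.
# The two dictionaries below are invented sample data.

def diff_settings(working, failing):
    """Return {name: (working_value, failing_value)} for every value that differs.

    A value that is absent on one side is reported as None on that side.
    """
    differences = {}
    for name in sorted(set(working) | set(failing)):
        w = working.get(name)
        f = failing.get(name)
        if w != f:
            differences[name] = (w, f)
    return differences


working = {"WindowLeft": 0, "WindowTop": 0, "Theme": "light"}
failing = {"WindowLeft": -12, "WindowTop": 0, "Theme": "light",
           "RecentFile": "report.doc"}

candidates = diff_settings(working, failing)
for name, (w, f) in candidates.items():
    print(f"{name}: working={w!r} failing={f!r}")
```

Each entry in the result is a candidate for the process of elimination: change one value at a time in the failing profile and retest.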
If time is not on your side and you know deleting the user profile resolves the problem, then create a virtual machine of the problematic computer so you can continue your investigation at a later time. Incorrect data stored in user settings does not make the profile corrupt.
I’ve identified some of the common misconceptions that are associated with the corrupt profile mythology, and there are others. However, these scenarios consistently rise to the top. So, what is a real corrupt profile? I’m glad you asked.
A user profile is a predetermined folder structure and accompanying registry data. Microsoft Windows uses the registry data to describe and preserve the user environment. The folder structure is storage for user and application data, specifically for an individual user. Windows stores the profile on the local hard drive, loads the profile when the user logs on, and unloads the profile when the user logs off.
The preserved data that describes the user’s environment is nothing more than a registry hive. More specifically, the user’s registry portion of the profile is loaded into HKEY_CURRENT_USER. Registry hives, keys, value names, and values are stored in a specific structure that Windows recognizes as the registry. Each element within the structure has its own metadata, such as last write time and security descriptor. All of this information must adhere to the scope and limits of the structure. Consider the following example:
An application saves the position of its window in the user’s settings. Window locations are represented as coordinates on the screen. These coordinates are integer values. Integers are positive or negative values. However, the upper left corner of the screen is typically represented by the coordinate 0, 0. What if another application saved -12 and 0 as this data? Both numbers are valid integers. Each fits the structure of a REG_DWORD, which is a 32-bit integer data type for the registry. Yet, the application does not work correctly when this value is present in the registry. This is not a corrupt profile. It’s bad data, but only from the application’s point of view, not in the context of the registry or the profile. The registry only cares that the value is within the scope of that data type.
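A few lines of Python illustrate the difference between "valid for the data type" and "valid for the application". This is only a sketch: app_accepts is a hypothetical rule standing in for whatever the real application expects, and the struct module is used merely to show the four bytes that a 32-bit value occupies.

```python
import struct

# A REG_DWORD holds 32 bits. The registry does not care what those bits mean.
stored = struct.pack("<i", -12)               # -12 as four little-endian bytes
as_unsigned = struct.unpack("<I", stored)[0]  # the same bits read as unsigned

def app_accepts(left, top):
    # Hypothetical application rule: window coordinates start at 0, 0.
    return left >= 0 and top >= 0

print(stored.hex())         # f4ffffff
print(as_unsigned)          # 4294967284
print(app_accepts(-12, 0))  # False
print(app_accepts(0, 0))    # True
```

The value packs into four bytes without complaint, so the structure is intact. Only the application's own rule rejects it.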
So, an actual corrupt profile is when the structure of the registry hive no longer conforms to the expected structure. I’ve seen this two times in 13 years and in both cases it was not exclusive to the user’s registry. The corruption persisted throughout registry hives and multiple aspects of the computer did not function correctly. In both these cases, new users with new profiles as well as existing users with existing profiles experienced the problem. However, it was noticeable that multiple aspects of the computer were behaving poorly. Ultimately, the problem was traced to a non-Windows binary. The binary overwrote heap memory that the registry used, modifying that data before it was committed to disk. Then, Windows committed the modified memory to disk, thereby misaligning the registry structure, which is a real corrupt user profile.
Be wary when you hear a co-worker reporting a corrupt user profile. Ask them if they saw it during their most recent snark hunting trip or during their last encounter with a ravenous Bugblatter Beast. More likely, they’ve seen one of the manifestations we’ve described in this post. It’s a difficult and time consuming problem to troubleshoot and resolve. But some additional diligence will surface the real problem.
Mike “The Corrupt Profile Gladiator” Stephens
Very nice post. Although would it be possible for you to further elaborate on the few cases you have run across? Mostly from a curiosity standpoint, but I would be interested to see more details on what the real "corruption" looked like, how it manifested, and how it was ultimately found.
I'd like to get your opinion on an issue which I don't think you have quite put your finger on with this post. I suspect it may be on the borderline of being a real corruption (albeit, caused by something else?).
As a desktop tech a year or two ago (*shudder*, glad to be back in the systems engineering arena), I had a number of clients who would regularly have the exact same issue as below, primarily on Windows Vista, but occasionally on XP and 7.
The issue is twofold (forgive me for not having any event logs etc for further perusal, it was a while ago). Firstly, it is reported in the event log that the user's "CLASSES" registry hive could not be loaded. Secondly, the registry key "HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders" was actually "corrupted" - which I suspect has some relation to the first part of the issue. You could not even open this key and see the actual registry entries (it wasn't permissions related, either).
Fixing the issue was simple: delete the "Shell Folders" key, and recreate all of its registry entries. Everything would then come back fine...
Corruption? Or bad data? That's the question.
As the architect for what is now Citrix's Profile Management product I have been working with and writing about user profiles a lot. Regarding the common misconception of corrupt profiles I came to roughly the same conclusion as you, albeit 2.5 years ago. Here is my take on the subject:
I’ve received several requests to elaborate on the scenarios in the article. The plan is to create additional blog posts with each post providing more depth about the scenarios, and hopefully a way to allow our readers to walk through the scenarios themselves. My hope is to outline the scenario; provide some background; break it; and then show how we here in support identify it. That’s the plan at least. I have everything done … in my mind. Now I just need to write it.
I’ll try to dedicate a blog that highlights actual registry corruption (the structure itself); however, that’s actually more difficult to do while making it look like an accident. But, it would be a good post if I can swing the details and implementation.
Sounds good, and thanks for the follow-up! Looking forward to the future posts...
Outstanding post Mike! (and nice Eminem reference) I think as long as there are users and help desk techs this myth will continue.
I have to admit years ago when I did frontline helpdesk support it was an easy thing to say to a VIP user. "Your profile is corrupt". You recreate it and have a script to copy from the old to the new and they are back up and running and you get out of their office as fast as possible.
I like the virtual machine idea; that was not around when I was help desk...would have been nice.
Thanks for the comments. The first issue you mentioned is Windows’ inability to load the user’s CLASSES registry hive. This registry key is backed by the file usrclass.dat. It’s just registry data. The key is HKEY_CURRENT_USER\Software\Classes. The idea behind this is to allow per-user COM registration. For XP and 2003, you can enable USERENV logging to track down why Winlogon is having a problem loading the file; there should be an error message and usually a result code. Another way to track this down is to use Process Monitor and log the profile load. Filter file and registry events that involve the filename or registry location, respectively. Most of the time, Windows fails to load something because something else has opened an exclusive handle to the file or registry key (95 percent of the time it is a file handle). I typically use Process Explorer and search for the process that has a handle to the file (sounds like another blog post). If you’re lucky, the process will be intuitively named so that you can identify the “dood” that isn’t playing nice in the sandbox. Sometimes, the process will be SVCHOST, which means you need to further investigate all the services living in that SVCHOST process. The worst scenario is when the process comes under SYSTEM. The likely culprit in this scenario is a kernel mode driver that has a handle to the file. Unfortunately, this requires a debugger and copious amounts of free time. However, you could ask the question “What uses a filter driver and constantly looks at files on the operating system?” Antivirus and intrusion protection software are two big ones that come to mind. Uninstall these (disabling does not remove the kernel driver, which is why we uninstall) and reproduce the problem. The bad thing with that is the reboot kills your repro.
Second issue: you claim the shell folders keys were corrupt. Define corrupt? Shell folder keys are stored in ntuser.dat, not usrclass.dat, so I’m disinclined to believe the two events are related, especially when I’d guess the first problem is a handle problem and not an actual problem with the registry key. Also, I’m a bit confused that the key was impossible to read, yet somehow it could be deleted. Typically, you need to be able to read something to delete it. Without data, I’d lean toward something having written “bad” data to these keys, or the keys being empty. I’ve seen this before. Again, I turn to trusty Process Monitor to identify the process “that massages” those keys into porridge, and go from there.