There are a few places (including Windows Internals) which loosely describe the “Last Known Good” feature of Windows, but as I wasn’t familiar with the specifics I decided to dig into it a bit with a Windows Server 2008R2 SP1 virtual machine in my lab environment.
Here is what I found, not relying on source code but using tools available to anyone – Registry Editor & Process Monitor…
When Windows boots normally, it looks at the following registry key: HKEY_LOCAL_MACHINE\SYSTEM\Select
Under this key are 4 REG_DWORD values: Current, Default, Failed and LastKnownGood.
Each of these numbers references a ControlSet00x key under HKEY_LOCAL_MACHINE\SYSTEM:
Current is the one used to boot the system this time round (unless it differs from Default, indicating a system change was made before the last restart).
Default is the one to use for the next boot (i.e. when LastKnownGood needs updating).
Failed is the last one which was Current when "Last Known Good" was selected at boot.
LastKnownGood indicates the one which was known to let the system boot correctly.
To see how these work, here are the values in different scenarios...
Starting with a typical system in the running state, we have:
Registry keys under SYSTEM: ControlSet001 ControlSet002
Registry values under Select: Current = 1 Default = 1 Failed = 0 LastKnownGood = 2
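To make the mapping concrete, here is a minimal Python sketch that resolves the Select values above to the keys they point at. The function name control_set_path is my own invention for illustration, and the values are hard-coded from the listing above rather than read from a live registry.

```python
def control_set_path(number: int) -> str:
    """Build the key path for a control set number taken from Select."""
    if number <= 0:
        # 0 is what Failed holds when no control set has failed yet.
        raise ValueError("0 does not refer to any control set")
    return rf"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet{number:03d}"


# Values copied from the typical running system shown above.
select = {"Current": 1, "Default": 1, "Failed": 0, "LastKnownGood": 2}

for name, number in select.items():
    target = control_set_path(number) if number else "(none)"
    print(f"{name:<14} -> {target}")
```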
If the system unexpectedly reboots (power outage, bugcheck, hardware issue, etc.) then on booting the boot manager will wait for 30 seconds before booting normally. This is to give the user a chance to select Safe Mode or one of its variants.
The registry values under Select are unchanged in this case. This is because the problem is unlikely to be due to a system change (driver installation) made by the user.
If a driver update is what rendered the system unstable, that is what Safe Mode is for – Last Known Good does not help in this scenario.
If we shut down the system and restart it, then hit F8 after the POST to get the advanced boot menu up, and select "Last Known Good", here is what happens…
A new ControlSet00x key is cloned from the LastKnownGood key, where x is the first unused number (3, copied from 2). If Failed is non-zero, the ControlSet00x key indicated is deleted (nothing to do on the first failure). Failed is set to the value of Current (1). Current and Default are set to the value of LastKnownGood (2). LastKnownGood is set to the value of the newly-created key (3).
The registry now looks like this:
Registry keys under SYSTEM: ControlSet001 ControlSet002 ControlSet003
Registry values under Select: Current = 2 Default = 2 Failed = 1 LastKnownGood = 3
Repeating the previous scenario, we can see how the results differ…
A new ControlSet00x key is cloned from the LastKnownGood key, where x is the first unused number (4, copied from 3). If Failed is non-zero, the ControlSet00x key indicated is deleted (1). Failed is set to the value of Current (2). Current and Default are set to the value of LastKnownGood (3). LastKnownGood is set to the value of the newly-created key (4).
Now let’s see what the registry contains:
Registry keys under SYSTEM: ControlSet002 ControlSet003 ControlSet004
Registry values under Select: Current = 3 Default = 3 Failed = 2 LastKnownGood = 4
For good measure, one more repeat to show how the numbers are re-used…
A new ControlSet00x key is cloned from the LastKnownGood key, where x is the first unused number (1, copied from 4). If Failed is non-zero, the ControlSet00x key indicated is deleted (2). Failed is set to the value of Current (3). Current and Default are set to the value of LastKnownGood (4). LastKnownGood is set to the value of the newly-created key (1).
A summary of the registry keys & values again:
Registry keys under SYSTEM: ControlSet001 ControlSet003 ControlSet004
Registry values under Select: Current = 4 Default = 4 Failed = 3 LastKnownGood = 1
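The three rounds above can be replayed with a small simulation. This is only a model of the behaviour I observed, not how Windows implements it: control sets are represented by their numbers in a Python set, and the function name invoke_last_known_good is hypothetical.

```python
def invoke_last_known_good(keys, select):
    """Model one 'Last Known Good' boot; returns the new (keys, select)."""
    keys = set(keys)
    select = dict(select)

    # Clone LastKnownGood into the first unused control set number.
    new = 1
    while new in keys:
        new += 1
    keys.add(new)

    # Delete the control set that failed last time, if there was one.
    if select["Failed"]:
        keys.discard(select["Failed"])

    # Rotate the Select values.
    last_known_good = select["LastKnownGood"]
    select["Failed"] = select["Current"]
    select["Current"] = select["Default"] = last_known_good
    select["LastKnownGood"] = new
    return keys, select


# Starting point: the typical running system from earlier.
keys = {1, 2}
select = {"Current": 1, "Default": 1, "Failed": 0, "LastKnownGood": 2}

for round_number in range(1, 4):
    keys, select = invoke_last_known_good(keys, select)
    print(round_number, sorted(keys), select)
```

Note that the clone happens before the delete, which is why the second round creates ControlSet004 rather than re-using number 1.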
The previous scenarios have shown when LastKnownGood is used for recovery, but when is it updated – i.e. when do we consider the currently-used control set to be “good” so it can be used to restore stability?
When an administrator installs a device driver and a reboot is triggered, Default is set to the value of LastKnownGood. On rebooting, Default is different from Current and so this ControlSet00x key is cloned to the first unused number. Then, the original ControlSet00x key indicated by Current is deleted.
LastKnownGood is not updated until a successful user logon has taken place – at which time it is updated to point to the newly-created key. This is to ensure that the system really is bootable: that it has not hung during startup, and that no key service required for a user to log on is failing to start.
Windows will automatically boot into the Windows Recovery Environment (WinRE) to allow the user to fix the issue.
Yes, I tested this by manually hacking Current to see what happened on a reboot :)
The Services key resides under HKLM\SYSTEM\ControlSet00x, and the configuration data for each service is under its own sub-key – so if you make a change to a service startup, for example, this won’t be present in the “other” control set and will get reverted if you invoke “Last Known Good”.
The same is true for any service you install or remove – the key will be missing or present accordingly in the other control set.
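As a rough illustration of what gets reverted, the sketch below compares the services in two control sets. The data is made up for the example: each dictionary maps a service name to its Start value (2 = automatic, 3 = manual, 4 = disabled), "ExampleSvc" is a hypothetical service, and diff_services is my own helper name. On a real system you would fill these dictionaries from the two Services keys.

```python
def diff_services(active, other):
    """Compare {service name: Start value} maps from two control sets."""
    added = sorted(active.keys() - other.keys())
    removed = sorted(other.keys() - active.keys())
    changed = sorted(
        name for name in active.keys() & other.keys()
        if active[name] != other[name]
    )
    return added, removed, changed


# Hypothetical contents of Services under the current control set...
current_set = {"Spooler": 4, "ExampleSvc": 2, "W32Time": 3}
# ...and under the one LastKnownGood points at.
last_known_good_set = {"Spooler": 2, "W32Time": 3}

added, removed, changed = diff_services(current_set, last_known_good_set)
print("Would disappear after Last Known Good:", added)
print("Would reappear after Last Known Good:", removed)
print("Startup type would be reverted:", changed)
```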
So if you install some software or make a system configuration change which is successful, then accidentally invoke Last Known Good somewhere down the line before any other change is made, be aware that a quick recovery may mean editing the Select values by hand: set Current & Default to the value of Failed, set Failed to 0, then set LastKnownGood to the previous value of Current (or do a system state restore instead).
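The manual fix is easy to get wrong because LastKnownGood needs the old value of Current, which has been overwritten by the time you get to it. Here is a minimal sketch of the arithmetic only; it works on a plain dictionary and does not touch the registry, and undo_last_known_good is a name I made up for the example.

```python
def undo_last_known_good(select):
    """Return the Select values that reverse an accidental Last Known Good."""
    if not select["Failed"]:
        raise ValueError("Failed is 0, so there is nothing to switch back to")
    # Remember Current before it is overwritten.
    previous_current = select["Current"]
    return {
        "Current": select["Failed"],
        "Default": select["Failed"],
        "Failed": 0,
        "LastKnownGood": previous_current,
    }


# Values as they stood after the first Last Known Good boot above.
after_lkg = {"Current": 2, "Default": 2, "Failed": 1, "LastKnownGood": 3}
print(undo_last_known_good(after_lkg))
```

With those inputs the result matches the Select values the system started with, although the control set cloned during the Last Known Good boot is left behind unreferenced.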