One of the other cool new logs, as it relates to servicing, is the new sessions.xml log. Located in the \Windows\servicing\sessions directory, it's a log of all of the different transactions that happen on the machine from a servicing perspective. A small snippet of a log is seen below:
<Session version="1.0" id="30018265_436341867" client="WindowsUpdateAgent" options="816" currentPhase="1" lastSuccessfulState="Complete" pendingFollower="false" retry="true" Queued="2009/07/22/10:31:30" Started="2009/07/22/10:31:32" Complete="2009/07/22/10:31:45" status="0x0">
  <Phase seq="1">
    <package id="Package_for_KB972636~31bf3856ad364e35~amd64~~220.127.116.11" name="KB972636" targetState="Installed" options="4" />
  <Phase seq="1" rebootRequired="false" Resolved="2009/07/22/10:31:38" Staged="2009/07/22/10:31:39" Installed="2009/07/22/10:31:44">
    <Resolve package="Package_1_for_KB972636~31bf3856ad364e35~amd64~~18.104.22.168" update="972636-1_neutral_LDR" />
    <Stage package="Package_1_for_KB972636~31bf3856ad364e35~amd64~~22.214.171.124" update="972636-1_neutral_LDR" />
    <Install package="Package_1_for_KB972636~31bf3856ad364e35~amd64~~126.96.36.199" update="972636-1_neutral_LDR" />
Because this log is in XML, you can collapse all but the transactions that you are interested in seeing. The one thing I like about this log is that it tells you exactly what is happening with each package in a particular fix or update.
For example, in the sample above you can see that KB972636 was installed on this machine recently. It was installed by the Windows Update Agent and it did not require a reboot. This is all really good information to know when trying to troubleshoot an issue with servicing in Win7. Using this, you might be able to tell if a particular package didn’t get staged properly. Or, if a reboot was required and you’re stuck in a reboot loop, then you know that you can use the new DISM /revertpendingactions flag and roll yourself back.
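If you'd rather not read the XML by eye, the same facts can be pulled out with a few lines of script. The following is only a minimal Python sketch: the attribute and element names come from the snippet above, but the `<Sessions>` root, the closing tags, and the shortened package ids in `SAMPLE` are assumptions made so the example is well-formed, and `summarize_sessions` is a hypothetical helper, not part of any Windows tooling.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample modeled on the snippet above. The <Sessions> root, the
# closing tags, and the shortened package ids are assumptions for illustration.
SAMPLE = """\
<Sessions>
  <Session version="1.0" id="30018265_436341867" client="WindowsUpdateAgent"
           lastSuccessfulState="Complete" Queued="2009/07/22/10:31:30"
           Started="2009/07/22/10:31:32" Complete="2009/07/22/10:31:45" status="0x0">
    <Phase seq="1">
      <package id="Package_for_KB972636" name="KB972636" targetState="Installed" options="4" />
    </Phase>
    <Phase seq="1" rebootRequired="false" Resolved="2009/07/22/10:31:38"
           Staged="2009/07/22/10:31:39" Installed="2009/07/22/10:31:44">
      <Resolve package="Package_1_for_KB972636" update="972636-1_neutral_LDR" />
      <Stage package="Package_1_for_KB972636" update="972636-1_neutral_LDR" />
      <Install package="Package_1_for_KB972636" update="972636-1_neutral_LDR" />
    </Phase>
  </Session>
</Sessions>
"""


def summarize_sessions(xml_text):
    """Return one summary dict per <Session> element in the given XML text."""
    root = ET.fromstring(xml_text)
    summaries = []
    for session in root.iter("Session"):
        # Any phase flagged rebootRequired="true" marks the whole session.
        reboot = any(
            phase.get("rebootRequired", "false").lower() == "true"
            for phase in session.iter("Phase")
        )
        summaries.append({
            "id": session.get("id"),
            "client": session.get("client"),
            "status": session.get("status"),
            "state": session.get("lastSuccessfulState"),
            "packages": [p.get("name") for p in session.iter("package")],
            "reboot_required": reboot,
            # Actions recorded for the session, in document order.
            "actions": [e.tag for e in session.iter()
                        if e.tag in ("Resolve", "Stage", "Install")],
        })
    return summaries


if __name__ == "__main__":
    for s in summarize_sessions(SAMPLE):
        print(s["client"], s["packages"], "reboot:", s["reboot_required"], s["status"])
```

A session whose `actions` list has a Resolve but no Stage or Install entry would be a reasonable first place to look when a package didn't get staged properly.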
During startup or shutdown, when a servicing operation is in progress, the system displays "Installing update X of Y". This always leaves me wondering - "what are these updates?". It would be nice if this sessions.xml data could be piped to the screen in some appropriate format, so the user might have some understanding of what updates were being installed, return codes and reboot required status, etc.
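To make the idea concrete, here is a rough Python sketch of what such a message could look like. Everything in it is hypothetical: `render_progress`, the record layout, and the second KB number are made up for illustration, and nothing here reflects how Windows actually builds its status text.

```python
def render_progress(updates):
    """Build 'Installing update X of Y' lines that also name each update."""
    total = len(updates)
    lines = []
    for index, update in enumerate(updates, start=1):
        reboot = "reboot required" if update["reboot_required"] else "no reboot"
        lines.append(
            f"Installing update {index} of {total}: {update['name']} "
            f"(status {update['status']}, {reboot})"
        )
    return lines


# Hypothetical records, shaped like data one might extract from sessions.xml.
# KB000000 is a placeholder, not a real update.
UPDATES = [
    {"name": "KB972636", "status": "0x0", "reboot_required": False},
    {"name": "KB000000", "status": "0x0", "reboot_required": True},
]

if __name__ == "__main__":
    for line in render_progress(UPDATES):
        print(line)
```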
There is a Group Policy to enable verbose startup, shutdown, logon and logoff messages:
My request is there be an equivalent policy for servicing messages. Example name:
Having this extra log file is great, if you know it exists, but the servicing summaries in this xml file could be at least as useful if they were displayed during or at end of a servicing operation. At the very least, having something to read on the screen might decrease the subjective time taken to install.
Thanks for the idea, Drew. I like that. I'll see what I can do about getting something like that on the radar.
Thanks Joseph. IMO, verbose messaging (of any type), should be enabled by default during any Setup pass, and on any boot cycle that an Administrator is auto-logged-on. Except for the Home SKUs perhaps.
I agree COMPLETELY with the above pipe-to-screen suggestion! The trend of ever-increasing difficulty in diagnosing Windows is creating a bit of a disturbing "gap" of people that would otherwise be interested in understanding Windows internals, but can't find the right place to look (and the event viewer has gotten to be a kludgey, slow-loading nightmare of a last-resort tool)! With the log files and status information tucked away neatly into a distant corner of the file system, it's a bit of a steep learning curve for newcomers to really understand how to fix some pretty basic Windows problems.
I've tried to help wherever I can, by typing out full error numbers and unique problem keywords in my online posts to help search engines out, but a good 95% of the things I search for regarding low-level Windows diagnostics come up totally blank, or with just a raw listing of all possible error codes (and no extra info other than what I already had). It's gotten to be quite frustrating. For example, at the moment I'm dealing with a computer where I was diagnosing a hung startup, found an HP disk LowerFilter service causing the issue (didn't realize it was a LowerFilter until it was too late, though), and disabled the service. The boot hang turned into a 0x7B bootloop, and I spent the next 6 hours diagnosing and trying to fix it. Unfortunately, the diagnostic involved running a DaRT 6.5 SFC scan, which complained that there were pending operations requiring a reboot (which, obviously, couldn't happen).
Here's where servicing issues come into play. I didn't know how to revert pending actions (with that literal switch on DISM), so I did what I usually do to resolve a "pending actions" problem: I deleted "pending.xml". I knew it would cause issues, but I also knew I could rely on CheckSUR (update readiness tool) to repair the component store issues caused by the mixed-state. Deleting pending.xml made SFC happy, it ran the scan, and proposed a short list of "corrupted" files to be repaired. OK, I let it "do its thing". It didn't fix the issue (obviously).
Fast forward a couple hours, I get the issue resolved by finally understanding the boot-time relationship between ntdetect, Services, CriticalDeviceDatabase, and Control\Class, and found that with the HP disk filter service disabled but not removed as a LowerFilter, the "disk" class can't load, and the computer doesn't boot. OK, Windows is running happy again. I run CheckSUR and it, well... it does its long-running silent operation (another case of the "PLEASE! WHY aren't you telling me what you're doing?!" epidemic of the Vista/7 era). It leaves me with "installation complete" and I push onward. I have a few updates so I try to install them. Failed, immediately. Huh? Well... dig a little deeper, and I get this:
(on something like "dism /online /remove-feature:TabletPCOC"...)
The specified image is no longer serviceable.
Unmount the image and discard your changes. Mount the original image to try the operation again.
Um... dewd, what? This isn't a mounted WIM image, this is the, ah... this is the Windows image you're actually running from, like, with that "/online" switch. It's you, dude. You're saying, you're not going to like, ever let me "service" you again? That's it, game over, the end? That's, I dunno, a little too digital for my comfort level... but that's all it's been, no matter what I do. The COMPONENTS registry hive is far too complex for me to understand (finally having grasped CLSID and file type interactions, among others), so that's kinda off-limits. An earlier log entry in CBS.LOG indicated that the Windows 7 Service Pack 1 (referenced by its KB number) couldn't be verified as fully intact, since it couldn't find a MUM file somewhere (I absentmindedly deleted the CBS log to isolate new entries, so those are gone), but without being able to apply or remove ANY updates, I'm stuck in a completely locked state.
This is where a little more verbosity and a little less "noise" in the log files (like all the successful "provider" events, and OH GOD PLEASE allow us to disable the excess logging of Service Control Manager start/stop events in the system Event Log!) would come in very handy! I'm left with only one option on this PC: in-place upgrade, or full reinstall! :(
Thought I'd share this little story as a little real-world "low level diagnostic" insight... only hope is that it'll help improve the MS products we all work with every day :)
(retrying this post; post didn't appear the first time and I wasn't signed-in, so hopefully this doesn't post twice!)
@Falcon; Thanks for the comment. I understand your point and it's a delicate balance between giving users such as yourself the information you want to know regarding things going on 'under the covers' and the masses who really just want their stuff to work without too much of a hassle. We do what we can.
I'm curious what knowing exactly what each of these components does would really do for you, though. For example, 0x7B errors have been well documented and have what I feel is very good verbiage attached to their output, i.e. INACCESSIBLE_BOOT_DEVICE. That might not tell you exactly what's happening on the machine, but it does give you an indication of where the problem is: it's something with the bootable portion of the OS. A quick Bing or Google search would yield several articles and resolutions for those types of issues.
So, are you looking for better error handling for the actual return codes, better verbosity in the return codes themselves or something else?
Joseph: Hey, thanks for the reply! One of these rare, golden occasions when I can actually share real experience with someone that understands what I'm talking about :)
Well, it's not exactly about wanting to know what each of the components *does* (although that may be useful in some unforeseen circumstances, like getting an "unserviceable image" error on a live Windows install!)... but rather, I'd just like to see the "user-facing" services do a better job of communicating status information with the "system-facing" utilities they control - it seems like most Windows utilities have a sort of "multiple layer" obscurity, where the utility functions are being overly boiled-down into function calls that can't easily report their status through the "blocking barrier" - the utility is relying on the function call to block until it completes, so the only recourse the utility has is to sit there and play "hold music" (like "Please wait" or "Installing update 1 of 59") and wait for that function call to return. Meanwhile, the utility library that was called has its own independent log file located somewhere on the disk. Problem is, there's little to no communication between the utility program and the function call performing the actual "duty", other than just a simple return value indicating success (0x0) or failure (!= 0x0). End result? Users and techs left out in the cold when something goes wrong. :/
In the case of the 0x7B, you're right, it's definitely one of the most well-documented errors in Windows history. But that's what just kept driving me on towards resolution instead of just "nuking it": I knew how it should work, what happens in that phase of the boot process, and it was just hurting my head that everything was in place and it was still BSOD'ing! Really, I was just using that as a "this is why I did what I did" example, not an issue with the 0x7B error itself - nothing MS could have done, other than handing-off the "Oh, this is the driver I couldn't actually find, by the way" error condition to the BSOD routine, could have made that "fix" go any quicker. I don't blame Windows for that - I blame HP for that dang filter driver conflict. ;)
What I meant there was that, in trying to do a system file check scan (sfc /scannow /offbootdir=c:\ /offwindir=c:\), I got an error saying there were "pending operations". OK, that's specific enough, but it offers no resolution and no alternatives - no way to tell SFC that "um, but I can't reboot, that's why I'm running SFC offline!". It's a case of the "function call" disability again: the only thing SFC knows is that "I couldn't initialize the scanner, and it gave me this error code, which means this". It couldn't tell me anything specific, because the return value of the function call is just a simple 32-bit number - no room for a file name or for the function call to return the error it actually logged (which is usually much more useful). As a result, I ended up making the horribly poor decision to out-and-out delete WinSxS\pending.xml (or was it reboot.xml? can't remember which), just to get SFC to work. If I had known about DISM's "/RevertPendingActions" switch, I would certainly have used it instead, and I would (still) have a fully functional Windows install... but because it didn't provide me with context-appropriate ("I'm using this tool because the computer won't boot, why are you telling me that you won't run until I reboot?") error information, I ended up totally botching the thing.
It's a harmless learning experience for me, sure. The PC was just being polished up after a (freshly restored and updated) Vista 64 to Windows 7 64 upgrade, and I wanted to update the drivers and any useful OEM software before sending it "back into the wild". No loss, especially since the PC still boots Windows and runs great - it'll just never be able to update itself again (and that's unacceptable to me), hence now needing a reinstall. However, I'm concerned about the thousands of PCs a day that are "nuked" unnecessarily for problems just like this that could have been solved if Windows would have provided more useful failure information. The KSOD (black screen of death) is a prime example: it manifests itself as an indefinite "hang" just before the Welcome screen fades in - during that time where boot-time "pending" actions are performed with *absolutely no* status information whatsoever displayed on the screen! Most of these end up getting nuked because nobody knows how to really fix it (the "rpcss" fix frequently touted as "the fix!" has never once been applicable in my experience). The whole component-based servicing model of Windows Vista/7 is just so opaque to most techs I know of (all, actually - I'm the only person I know of that's ever dug this far into it), that when something goes wrong with it, "just nuke it".
That's what I would like to see improved on, really... just make CBS a little more... Window-ey - transparent, that is - and less "frosted-glass fruit logo" - or, opaque :) It'd make repair work a lot less destructive for most people, and I'd really feel a lot better about being in control of my PC again!
Oh, and sorry for the wall of text replies! Guess I just have a lot to explain, and I just like technical writing ;)
Cool, thanks for the response.
I see what you're getting at now, and I actually agree with a lot of your points. Agreeing with them and getting them implemented in product is a little bit of a different matter though <G>. Suffice it to say I will make sure the product teams see your feedback and that your voice is heard.
THANKS!! Means more to me than I can really even express... more often than I care to think about, detailed code-level product feedback (from myself and others) to software companies (of any kind) is idly dismissed as a "support inquiry" and thrown away without ever being seen by a developer - it's really a high point in the day to just read the words "I will make sure the product teams see your feedback and that your voice is heard"! Certainly the best response I've ever received from someone at Microsoft. Thanks again!
Have a good week/weekend.
@Falcon4: Interesting comments.
This is my 64 bits worth regarding how to implement piping-system-logs-to-displays.
There is ideally a way to display system update type data to those users/admins who would or could benefit from seeing it, without annoying or distracting or concerning 'ordinary' users. Maybe this is it: Display the data using an On-Screen-Data look. Like the OSD display you see when you press your monitor's menu button. This would imply to the user@machine that something 'techy' was happening - especially because, other than the on-screen text, everything else is dulled (as in UAC prompt reduced brightness snapshot). The OSD text might appear as such...
Windows Updating in progress... (big bold text across top of screen)
User input is disabled (smaller)
Session log details:
Ordinary user will look away or go for coffee break. IT pro will sit there transfixed <G>. It would be fairly obvious to anyone that now would not be a good time to restart or shutdown the machine, or try to use it. The OSD text, used sparingly, is unusual enough that user does not ignore it like they ignore most popups - so potentially it has more 'authority' and the user just accepts that something important has to happen. That's the theory anyway.
There is a command line tool called Comandiux that can do what I'm talking about, using RTF formatted text as a parameter: comandiux.scot.sk/news.php (scroll to 'On Screen display OSD:' for examples).