Hi folks. Lakshman Hariharan here again with another real-world example. Feedback on my previous post indicated that readers found value in the methodology and techniques we use to isolate and troubleshoot issues that appear complex and hard to pin down at first glance. I am hoping to do a series of such real-world examples using tools available to the general public for download, some well known and some not so much. The larger goal of this series is to show that no special (internal Microsoft) tools or other special skills (such as debugging) are required to track down many of these issues.

For the purposes of this post we will use three specific utilities: PsGetSid.exe, klist.exe and nltest.exe, all of which have been available for a very long time.

So let us get down to brass tacks. What issue am I going to discuss in this post? As often happens at the start of a Monday, I received an email saying that one of my customers' homegrown .NET applications had started throwing an error about being unable to resolve Security Identifiers (SIDs for short) to names. The customer performed initial troubleshooting on their own and concluded that the issue they were observing was the exact same one described in the following knowledge base article.

SID S-1-18-1 and SID S-1-18-2 can't be mapped on Windows 7 or Windows Server 2008 R2-based computers in a domain environment
http://support.microsoft.com/kb/2830145

The crux of the issue is that Windows Server 2012 (and above) introduces two new SIDs. Windows 7 and Windows Server 2008 R2 clients do not know about these SIDs because the SIDs did not exist when those operating systems were written.

So far so good. Except, there was a catch in this situation. To the customer's great credit, in anticipation of this issue the hotfix in the article above was proactively deployed on all Windows 7 clients a few months ago. And therein lies the catch and conundrum. So what went wrong? A few questions arose out of this predicament we found ourselves in.

1. How do we confirm that the clients indeed have the hotfix?

2. If the clients already had the hotfix deployed then why were they seeing the error?

3. Why does the issue only happen sporadically on the same clients?

4. Windows Server 2012 R2 domain controllers were deployed months ago, so why are these problems being reported only now?

Let’s answer these questions one at a time.

1. How do we confirm the clients indeed have the hotfix installed?

The answer to this is quite simple, as laid out in the article. If the clients have the hotfix installed then the output of the PsGetSid.exe tool should show the newly introduced SIDs resolving to names. So, if the output looks like the screenshot below, the clients have the hotfix and are likely not experiencing the specific issue outlined in the article above.

[Screenshot: PsGetSid.exe output showing the new SIDs resolving to names]

If you receive the output in the following screenshot instead, the clients do not have the hotfix and you need to install it on them.

[Screenshot: PsGetSid.exe output showing the new SIDs failing to resolve]

We executed the command from a few machines and received the output in the first screenshot.
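
If you have more than a handful of machines to check, the same test can be scripted. The following is only a minimal Python sketch of that idea, not what we ran at the customer. It assumes PsGetSid.exe is on the PATH, that the two SIDs resolve to the English display names listed in the table (localized systems may differ), and that the tool prints the resolved name somewhere in its output. The function names are my own, for illustration.

```python
import subprocess

# Display names the two new SIDs are expected to resolve to on an
# English-language system (assumption: localized builds may differ).
NEW_SIDS = {
    "S-1-18-1": "Authentication authority asserted identity",
    "S-1-18-2": "Service asserted identity",
}


def sid_resolved(sid, tool_output):
    """Return True if the tool output contains the expected name for sid."""
    return NEW_SIDS[sid].lower() in tool_output.lower()


def check_sid(sid, exe="psgetsid.exe"):
    """Run PsGetSid for one SID and report whether it resolved.

    -accepteula suppresses the first-run license dialog so the call
    does not block when run unattended.
    """
    result = subprocess.run(
        [exe, "-accepteula", sid], capture_output=True, text=True
    )
    # Check both streams, since the banner and the result may be split.
    return sid_resolved(sid, result.stdout + result.stderr)


if __name__ == "__main__":
    for sid in NEW_SIDS:
        state = "resolves" if check_sid(sid) else "does NOT resolve"
        print(sid, state)
```

A machine that reports "does NOT resolve" for either SID is a candidate for the hotfix.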

2. The clients already had the hotfix so why do they receive the error when launching the application?

So we located a user currently reporting the issue and started a remote troubleshooting session via Lync screen sharing. Once the session started it became apparent that the users reporting the issue had left out one critical piece of information: they weren’t launching the application from their Windows 7 clients as a “fat client” application.

This application was published on a farm of Citrix servers. So in essence the “client” in this case was the Citrix server and not the Windows 7 clients. This information brought an additional challenge or two. The first was determining which Citrix server the client was using when the issue was reported because, as you recall, the application is published on a farm of Citrix servers. This question is easy to answer because the good folks at Citrix have included a Connection Center in the ICA client that displays which Citrix server the client is using at that point, as shown in the screenshot below with the actual server name rendered illegible, for obvious reasons.

As a note, Citrix is just being used here in keeping with the spirit of the example being real world because that is what was being used at this customer. The issue could just as easily have manifested itself with Remote Desktop Services (RDS) if that is how the application were published.

[Screenshot: Citrix Connection Center showing the connected server, with the server name obscured]

3. Why is the issue only being reported sporadically?

Once we had the name of the Citrix server the client was using at that point, we set about figuring out which domain controller that Citrix server, which for the purposes of this post we will call CONTOSOCITRIX1, was using.

For that we will use another common tool, nltest.exe. This tool, as the name suggests, is used to query and manipulate the Netlogon service.

The switch we are using in the example below is /dsgetdc, which instructs the Netlogon service to call the exact API (DsGetDcName) that clients use to locate domain controllers.

As an aside, I commonly see users run the set l command and use the LOGONSERVER variable it returns to find which domain controller authenticated them. My personal opinion, based on empirical evidence, is that the nltest.exe output provides more reliable information than the set command for this purpose.

On one of the clients we ran the nltest.exe command with the /force switch repeatedly and launched the application published on CONTOSOCITRIX1 after every execution of the command. What we observed was that whenever CONTOSODC2 was our authenticating domain controller, the application would throw the error. CONTOSODC2 also happened to be one of the newly introduced Windows Server 2012 R2 domain controllers. If the domain controller returned was CONTOSODC1 or any other Windows Server 2008 R2 domain controller, the application worked as expected. The output of the nltest.exe command when executed from a Windows 7 machine against another server is shown below. The /server switch instructs nltest to query the Netlogon service on the target server instead of the local host. The /force switch forces the Netlogon service on that server to locate a domain controller afresh rather than use cached information from the last time it found one successfully.

C:\>nltest/dsgetdc:CONTOSO /server:CONTOSOCITRIX1 /force
DC: \\CONTOSODC2
Address: \\192.168.123.3
Dom Guid: ddeadc4f-dddc-00c9-b2ab-11652d7c10c3
Dom Name: CONTOSO
Forest Name: contoso.com
Dc Site Name: Contoso-Headquarters
Our Site Name: Contoso-Headquarters
Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

The output from the nltest.exe command when executed from a Windows 8 client will look like the following. Note the DS_8 and DS_9 flags at the end of the Flags line, which indicate that the domain controller being returned is running Windows Server 2012 R2 or above. The nltest.exe tool in Windows 8 has been updated to include this information.

C:\>nltest/dsgetdc:CONTOSO /server:CONTOSOCITRIX1 /force
DC: \\CONTOSODC1
Address: \\192.168.123.2
Dom Guid: ddeadc4f-dddc-00c9-b2ab-11652d7c10c3
Dom Name: CONTOSO
Forest Name: contoso.com
Dc Site Name: Contoso-Headquarters
Our Site Name: Contoso-Headquarters
Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS DS_8 DS_9
The command completed successfully
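
When you are running the command over and over, it helps to pull out just the domain controller name and the version hint from each run. The following is a minimal Python sketch of that, assuming the output layout shown above (one "Name: value" pair per line). The function names are hypothetical, and the mapping of DS_8 to Windows Server 2012 and DS_9 to Windows Server 2012 R2 follows the flags discussed in this post. An nltest.exe that predates these flags, such as the one in Windows 7, will not print them even for a newer domain controller, so a missing flag proves nothing on its own.

```python
# Abbreviated sample copied from the Windows 8 output above.
SAMPLE = r"""DC: \\CONTOSODC1
Address: \\192.168.123.2
Dom Name: CONTOSO
Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS DS_8 DS_9
The command completed successfully"""


def parse_dsgetdc(output):
    """Turn 'Name: value' lines of nltest /dsgetdc output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the status line
            fields[key.strip()] = value.strip()
    return fields


def dc_generation(flags):
    """Infer a minimum DC version from the space-separated flag list."""
    names = set(flags.split())
    if "DS_9" in names:
        return "Windows Server 2012 R2 or later"
    if "DS_8" in names:
        return "Windows Server 2012 or later"
    return "unknown (older DC, or nltest does not report DS_8/DS_9)"


def summarize(output):
    """Return (dc_name, generation) for one nltest /dsgetdc run."""
    fields = parse_dsgetdc(output)
    dc_name = fields.get("DC", "").lstrip("\\")
    return dc_name, dc_generation(fields.get("Flags", ""))


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Logging this pair after each /force run, together with whether the application failed, makes the correlation with a particular domain controller easy to spot.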

 

At this point we have answered three of the four questions we set out to answer at the beginning of the post. First, we discussed how to find out whether the clients have the hotfix installed. Second, we saw that even though the clients had the hotfix installed they were reporting the error because the application was being launched from Citrix. Third, the issue was only reported sporadically because the only time the users saw it was when the Citrix server authenticated against a Windows Server 2012 R2 domain controller.

4. Why is the issue being reported only now?

The last question to be answered is why only now? If you recall from the beginning of the post, Windows Server 2012 R2 domain controllers had been deployed in the environment for some time. The answer is quite simple when one pieces the sequence of events together. Even though there were other Windows Server 2012 R2 domain controllers in the environment before this issue was reported, their numbers had increased greatly over the past week. More specifically for Citrix, the site that hosts all the Citrix servers saw its first Windows Server 2012 R2 domain controller deployed in the past week.

That, in essence, is how we found the root cause using common tools such as nltest.exe and PsGetSid.exe.

I mentioned using klist at the beginning of the post, so even though this post is getting a tad lengthy, here is the output I want to include from klist.exe, specifically from the new query_bind switch. For those unaware, the klist.exe tool can be used to list and, if required, purge all Kerberos tickets when troubleshooting a Kerberos problem, perceived or real.

Note the DS_8 flag returned in the output below. So if all you had was access to the client side (as in many environments that separate server administrators from client administrators), you could execute the klist.exe or nltest.exe command from the client, and if you see DS_8 returned in the output you know that the domain controller issuing your Kerberos tickets for that realm is Windows Server 2012 or higher. The only catch is that in my limited testing I found that the query_bind switch is available only in the klist.exe tool included with Windows 8 or Windows Server 2012 and above.

C:\>klist.exe query_bind
Current LogonId is 0:0x42bd3
The kerberos KDC binding cache has been queried successfully.
KDC binding cache entries: (1)

#0> RealmName: CONTOSO.COM
KDC Address: 192.168.123.3
KDC Name: CONTOSODC2.contoso.com
Flags: 0
DC Flags: 0xe000f1fc -> GC LDAP DS KDC TIMESERV CLOSEST_SITE WRITABLE FULL_SECRET WS DS_8 PING DNS_DC DNS_DOMAIN DNS_FOREST
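
The hexadecimal value on the DC Flags line can also be decoded by hand, which is useful when the tool's own list of names lags behind the operating system of the domain controller. The following is a minimal Python sketch. The bit values are my reading of the DsGetDcName flag constants in dsgetdc.h, so please verify them against the official documentation before relying on them, and the short names are abbreviations I chose to match the klist output above.

```python
# Bit values assumed from the DsGetDcName flag constants (dsgetdc.h).
DC_FLAGS = {
    0x00000001: "PDC",
    0x00000004: "GC",
    0x00000008: "LDAP",
    0x00000010: "DS",
    0x00000020: "KDC",
    0x00000040: "TIMESERV",
    0x00000080: "CLOSEST_SITE",
    0x00000100: "WRITABLE",
    0x00000200: "GOOD_TIMESERV",
    0x00000400: "NDNC",
    0x00000800: "SELECT_SECRET",
    0x00001000: "FULL_SECRET",
    0x00002000: "WS",
    0x00004000: "DS_8",  # Windows Server 2012 or later
    0x00008000: "DS_9",  # Windows Server 2012 R2 or later
    0x20000000: "DNS_DC",
    0x40000000: "DNS_DOMAIN",
    0x80000000: "DNS_FOREST",
}


def decode_dc_flags(value):
    """Return the names of all known flag bits set in value."""
    return [name for bit, name in DC_FLAGS.items() if value & bit]


if __name__ == "__main__":
    # Value taken from the klist query_bind output above.
    print(" ".join(decode_dc_flags(0xE000F1FC)))
```

If the table above is right, the sample value has bit 0x8000 set as well as 0x4000, so the decoded list contains DS_9 in addition to DS_8. That would be consistent with CONTOSODC2 being a Windows Server 2012 R2 domain controller, and with a Windows 8 era klist.exe simply not knowing the name of the newer flag.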

 

So, let’s wrap up this post. In addition to showing how we found the root cause of an initially puzzling problem using some common tools, there is one more point to note: at a minimum, the hotfix should be deployed on all Windows 7 and Windows Server 2008 R2 machines, including servers such as Citrix or RDS hosts that act as the client, before deploying Windows Server 2012 or Windows Server 2012 R2 domain controllers. Thank you.

Lakshman “keeping it real” Hariharan