Kerberos Authentication problems – Service Principal Name (SPN) issues - Part 1

Hi, Rob here again. I hope that you found the first blog on troubleshooting Kerberos authentication problems caused by name resolution informative, and that you learned something about how to review network captures and how the SMB protocol works at a high level. This time we are going to focus on problems that arise when Service Principal Names are not configured properly to support Kerberos authentication. We will use a web site that is having authentication problems to show you network captures and explain at a high level how the HTTP protocol works. There are other ways to troubleshoot Kerberos; one could just read the Kerberos event logging outlined in KB 262177. Although you could rely on this method, it will take longer to resolve the issue, and without a network trace you will be making an educated guess.

I am going to lay out my lab configuration in case you want to reproduce the problem and look at the network traces on your own.

Forest layout

The root domain fabrikam.com has one domain controller and one member server.

DC network configuration:
Host Name:  FAB-RT-DC1
IP Address: 10.10.101.20
DNS:  10.10.101.20
WINS: 10.10.100.60

Member Server network configuration:
Host Name:  FAB-RT-MEM1
IP Address: 10.10.200.105
DNS:  10.10.100.20
WINS: 10.10.100.60


Windows XP client network  configuration:
Host Name: XPPRO02
IP Address:  10.10.200.110
DNS:  10.10.101.20
WINS: 10.10.100.60

 


The web application’s website address is: http://webapp.fabrikam.com/webapp

The website is being hosted on FAB-RT-MEM1.

 

NOTE: I’m stating the obvious here, I know, but this configuration is for testing only. Having only one DC per domain is a single point of failure and should be avoided.

Problem scenario:

We want to use Kerberos authentication with a web application. The web application is running on IIS 6.0 and uses a web application pool. The application pool’s identity is a domain user account (FABRIKAM\KerbSvc) because at a future time the web servers will be front-ended with a network load balancer.

When the users visit the website with an Internet Explorer client they are using NTLM and not Kerberos.

Eventually the user name and password dialog stops popping up and they get a message within IE stating “You are not authorized to view this page”.

We have a website that makes determining web site authentication easier to troubleshoot. As you can see we authenticated using NTLM.

image

If “Audit Logon Events” auditing was enabled for “Success” on the IIS server, you would see the following event, which also proves that we are authenticating using NTLM.

image

You can see that the Logon Type is “3” (Which means “network logon”) and the Authentication Package was “NTLM”. This proves that we authenticated using NTLM and not Kerberos.

When you troubleshoot using network captures, you want to install the network capture utility on both ends of the communications to make sure that there are no network devices (routers, switches, VPN appliances, etc) that are manipulating the packet in between the two systems. We call this taking a double-sided trace. In support we will typically request a double-sided network capture be taken.

Since my lab does not have any routers in the mix (both systems are on the same subnet) I am only going to trace on the source Windows XP client machine.

So what is the best way to get the network capture?

1.     Make sure that there are no Internet Explorer windows open, and in general close down as many applications as possible so that your network traces are as clean as possible.
2.    Start the network capture utility.
3.    Clear all name resolution cache as well as all cached Kerberos tickets.

  • To clear the DNS name cache, type:  IPConfig /FlushDNS
  • To clear the NetBIOS name cache, type:  NBTStat -R
  • To clear Kerberos tickets, you will need KList.exe:  KList purge

4.    Launch Internet Explorer and go to the web site.
5.    Once the website comes up or error messages are being displayed, go ahead and stop the network capture.
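If you collect traces often, step 3 is easy to script. Here is a minimal Python sketch that runs the three cache-clearing commands listed above in order. The function name clear_caches and the injectable run parameter are my own, for illustration only; the sketch assumes it runs on the Windows client with KList.exe on the path and enough rights to purge the caches.

```python
import subprocess

# The three commands from step 3, in the order listed in the post.
CACHE_CLEAR_COMMANDS = [
    ["ipconfig", "/flushdns"],  # clear the DNS resolver cache
    ["nbtstat", "-R"],          # purge and reload the NetBIOS name cache
    ["klist", "purge"],         # delete cached Kerberos tickets
]


def clear_caches(run=subprocess.run):
    """Run each cache-clearing command and return (command, exit code) pairs.

    `run` is injectable (hypothetical convenience) so the function can be
    exercised on a machine that does not have these Windows tools.
    """
    results = []
    for command in CACHE_CLEAR_COMMANDS:
        completed = run(command, check=False)
        results.append((" ".join(command), completed.returncode))
    return results
```

A non-zero exit code for any command is a hint that the cache was not cleared, in which case the trace may not show name resolution or ticket requests on the wire.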

Reviewing the network capture:

If you are using Wireshark to view the trace, the filter is simple: “dns || kerberos || ip.addr==<IP Address of Target machine>”. Note that Wireshark expects protocol names in lowercase. Basically, this filter means “Show me all packets sent to or from the target machine, all DNS name queries and responses, and all Kerberos authentication.”
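The filter above can be assembled for any target machine with a few lines of Python. This is only a sketch: the function name display_filter is hypothetical, and the IPv6 branch (using Wireshark's ipv6.addr field) is an addition of mine that the lab in this post does not exercise.

```python
import ipaddress


def display_filter(target_ip: str) -> str:
    """Build the Wireshark display filter described above for one target host."""
    # ip_address() validates the input and raises ValueError if it is malformed.
    address = ipaddress.ip_address(target_ip)
    # Wireshark uses different field names for IPv4 and IPv6 addresses.
    field = "ip.addr" if address.version == 4 else "ipv6.addr"
    # Protocol names in display filters are written in lowercase.
    return f"dns || kerberos || {field}=={address}"
```

For the lab web server, display_filter("10.10.200.105") produces the filter string to paste into the Wireshark filter bar.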

It should look similar to this:

image

Once you have the network capture, you should see all DNS, Kerberos Authentication (as well as packets that have Kerberos tickets in them), and anything destined for the remote system from the Windows XP client.

Before we go over the capture too much, we should probably cover at a high level the steps taken to connect to a website.

1.    Resolve the host name for the target system to an IP Address.

   a.    Look in HOSTS file.
   b.    Query DNS.
   c.    Look in LMHOSTS file.
   d.    Query WINS / NBNS.

2.    Client connects to the website anonymously using “HTTP GET”

3.    If the website allows anonymous access then it is done.  However if it does not, it responds back to the client with a list of authentication protocols it supports in the HTTP header.

4.    Client attempts to get a Kerberos ticket for the website (from a domain controller) if the website supports Negotiate authentication.

5.    Client then connects to the website and passes its credentials in the HTTP header.

Step 1 - resolve the name:

image

Remember, we did “IPConfig /FlushDNS” so that we can see name resolution on the wire. Frames 10 and 11 are the query and response for FAB-RT-DC1 (the DC). Frames 46 and 47 are the query for the website name webapp.fabrikam.com and the response, an “A” (host) record with the IP address 10.10.200.105.

Step 2 – Client connects to the website anonymously:

image

As we can see from the trace, the client sends an HTTP GET request to the website and does not pass any user credentials. You can also see that we are using HTTP 1.1 when we connect to the site.

Step 3 – Web Server responds with supported authentication protocols:

image

Here we see that the website must require authentication to access the site because the web server responded back with a “401 Unauthorized”. We can also see that the web server supports the authentication types of: “WWW-Authenticate: Negotiate”, and “WWW-Authenticate: NTLM”.

In order for Kerberos authentication to work with IIS we must see Negotiate as an authentication method. If you do not see this you will need to enable this on the IIS web server or web site.
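If you have the raw response headers from the trace saved as text, the check for Negotiate can be sketched in Python as follows. The function names are hypothetical, and the parser only looks at the first word of each WWW-Authenticate header, which is enough for the two headers shown in this trace.

```python
from typing import List


def offered_auth_schemes(raw_headers: str) -> List[str]:
    """Return the scheme named in each WWW-Authenticate header, in order."""
    schemes = []
    for line in raw_headers.splitlines():
        name, separator, value = line.partition(":")
        # Header names are case-insensitive; lines without ':' are skipped.
        if separator and name.strip().lower() == "www-authenticate":
            parts = value.split()
            if parts:
                schemes.append(parts[0])
    return schemes


def kerberos_possible(raw_headers: str) -> bool:
    """Kerberos over HTTP needs the server to offer the Negotiate scheme."""
    return any(s.lower() == "negotiate" for s in offered_auth_schemes(raw_headers))
```

Seeing Negotiate only tells you that Kerberos is possible; as the rest of this post shows, the client can still fall back to NTLM if it cannot get a service ticket.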

215383 How to configure IIS to support both the Kerberos protocol and the NTLM protocol for network authentication (http://support.microsoft.com/default.aspx?scid=kb;EN-US;215383)

Step 4 - Request a Kerberos ticket for the website:

image

Alright, now to the meat of Kerberos authentication and viewing it in a network trace. If you remember, we used the KList purge command to clear out all tickets on the system. That means that the client has to get a TGT first, and this is why you are seeing the AS-REQ and AS-REP frames (frames 58 and 59). If Kerberos ticketing is new to you, I would suggest reviewing the blog on how Kerberos works.

Next, we see the TGS-REQ in Frame 60; let’s take a closer look at this packet in the details pane.

image

You can see that the user’s TGT is handed to the KDC in the “padata: PA-TGS-REQ” section, and that the client is requesting a ticket for the server “http/webapp.fabrikam.com” in the FABRIKAM realm (Windows domain) in the “KDC_REQ_BODY” section.

OK, so we now know that we are requesting a Kerberos ticket for “http/webapp.fabrikam.com” in the fabrikam.com domain, and that the KDC (domain controller) responds to the ticket request with KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN. This tells us that the SPN “http/webapp.fabrikam.com” is missing, or possibly that multiple accounts within the Active Directory forest have the same Service Principal Name defined on them.
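To make the relationship between the URL and the requested SPN concrete, here is a small Python sketch that derives the name seen in the TGS-REQ from the website address. The function name spn_for_url is hypothetical, and the sketch deliberately ignores ports and any DNS canonicalization that a real client may apply, so always confirm the actual SPN in the trace.

```python
from urllib.parse import urlsplit


def spn_for_url(url: str, service_class: str = "http") -> str:
    """Derive the '<service class>/<host>' SPN a browser would likely request.

    Assumption: the SPN is built from the host name exactly as typed in the
    URL, which matches what the trace in this post shows.
    """
    host = urlsplit(url).hostname  # lowercased host, without port or path
    if not host:
        raise ValueError(f"no host name found in URL: {url!r}")
    return f"{service_class}/{host}"
```

For the lab site, spn_for_url("http://webapp.fabrikam.com/webapp") yields the same name that appears in the KDC_REQ_BODY of frame 60.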

Step 5 – Client connects and passes Credentials:

image

So we see in the following Frames:

  • Frame 75 there is another HTTP GET command and it wants to connect using NTLMSSP_NEGOTIATE.
  • Frame 77 since credentials were not passed in frame 62 the web server responds again with a 401 Unauthorized, but this time it also sends a NTLMSSP_CHALLENGE response.
  • Frame 79 there is HTTP GET with NTLM_AUTH and we see that the account attempting to authenticate is FABRIKAM\Administrator.
  • Frame 80 the web server responds with a 200 OK.

In the detail pane I have highlighted packet 79, where the Authorization data is provided. The NTLM credentials being passed are the domain FABRIKAM and the user account Administrator, from host XPPRO02.

In frame 80 the website responded back with an HTTP 200 OK message which basically means that it accepted the authentication.
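If you only have the header text and not Wireshark's decode, you can tell the NTLM messages apart yourself. NTLM tokens are base64 encoded and start with the signature "NTLMSSP" followed by a null byte and a little-endian message type (1, 2, or 3). The Python sketch below classifies a header value on that basis; the function name is hypothetical, and anything that is not a raw NTLM token is only reported as "possibly" Kerberos because the sketch does not parse SPNEGO.

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"
# Message type names as Wireshark displays them in the frames above.
NTLM_MESSAGE_TYPES = {
    1: "NTLMSSP_NEGOTIATE",
    2: "NTLMSSP_CHALLENGE",
    3: "NTLMSSP_AUTH",
}


def classify_auth_token(header_value: str) -> str:
    """Classify the value of an Authorization or WWW-Authenticate header."""
    scheme, _, b64_token = header_value.strip().partition(" ")
    if not b64_token.strip():
        return f"{scheme} (no token)"
    token = base64.b64decode(b64_token)
    if token.startswith(NTLM_SIGNATURE) and len(token) >= 12:
        # The 4-byte message type follows the 8-byte signature.
        (message_type,) = struct.unpack_from("<I", token, 8)
        return NTLM_MESSAGE_TYPES.get(message_type, "NTLM (unknown message type)")
    return "not raw NTLM (possibly a SPNEGO/Kerberos token)"
```

Applied to the Authorization header in frame 79, this kind of check would report NTLMSSP_AUTH, which is the same conclusion we reached from the decode.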

So, now the question becomes: how do we fix the problem?

Well, we now know what the Service Principal Name is that we are requesting (review Step 4, frame 60). The next step is within IIS 6, we need to know what account is running the Application Pool for the website in question. Once we know this information, we can validate / add the proper Service Principal Name to that account. How do you get it?

1. Open up IIS Manager.

2. Find the web site that has the application pool defined, right-click on it and select properties.

3. On the “Home Directory” / “Virtual Directory” tab find the Application Pool field. This is the application pool that we need to find out what identity is being used on.

image

4. Now we can expand the Application Pools folder in IIS Manager and find the Application Pool.

5. Right click and select properties on the Application Pool name, and select the “Identity” tab.

image

6. As you can see we are using a domain account called “FABRIKAM\KerbSvc” to run the web application pool.

7. The next step is to make sure that the SPN “http/webapp.fabrikam.com” is not currently being used by any other account within the Active Directory forest. This can be accomplished in several different ways.

  • You could use LDIFDE to search for the SPN within all domains of the forest.
  • You could use LDP to search for the SPN within all domains of the forest.
  • You could use querySPN.vbs.

KB321044 has more detailed information on how to use each of these tools. The best method is to use querySPN.vbs; if you target a Global Catalog with this tool, it can search through the entire domain tree. If you have multiple domain trees in the forest or multiple forest trusts, you will need to specify each domain tree root individually and search each one.

So here is what we find when I use querySPN.vbs searching for http/webapp*

image

This is good; this tells us that there are no accounts that have that Service Principal Name in the forest. So if we add the SPN to the “FABRIKAM\KerbSvc” account we will not create a duplicate entry.
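Whichever tool you use, the search boils down to an LDAP filter on the servicePrincipalName attribute, followed by a check for any SPN that is held by more than one account. The Python sketch below shows both ideas. It does not talk to Active Directory; it assumes you have already exported account names and their SPNs (for example with LDIFDE) into a dictionary. The function names and the account FABRIKAM\Other used in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def spn_search_filter(pattern: str) -> str:
    """Build an LDAP filter for servicePrincipalName.

    '*' is kept as a wildcard; other special characters are escaped as a
    backslash plus two hex digits, as LDAP filter syntax requires.
    """
    escaped = []
    for ch in pattern:
        if ch in "()\\\x00":
            escaped.append("\\%02x" % ord(ch))
        else:
            escaped.append(ch)
    return "(servicePrincipalName=" + "".join(escaped) + ")"


def find_duplicate_spns(accounts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each SPN held by more than one account to the accounts holding it."""
    owners = defaultdict(list)
    for account, spns in accounts.items():
        # SPN comparison is case-insensitive, so normalize before grouping.
        for spn in sorted({s.lower() for s in spns}):
            owners[spn].append(account)
    return {spn: sorted(names) for spn, names in owners.items() if len(names) > 1}
```

For example, if both FABRIKAM\KerbSvc and a second account FABRIKAM\Other carried “http/webapp.fabrikam.com”, find_duplicate_spns would return that SPN along with both account names. An empty result corresponds to the situation in the screen shot above.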

8. Once you have validated that you are not going to create a duplicate SPN, you can use SetSPN.exe to set the Service Principal Names “http/webapp.fabrikam.com” and “http/webapp” on the “FABRIKAM\KerbSvc” user account. Next, you should always verify that the SPNs have been added by using SetSPN to list the Service Principal Names on the account. The screen shot below shows how the commands were run:

image

Notice that I have added the NetBIOS name for the site as well as the FQDN of the website. It is good practice to add both to the account. Applications may ask for SPNs in either name format, and it may not be clear which will be requested in every case.
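Since the screen shot is not reproduced in text, here is a Python sketch that prints the kind of SetSPN command lines described in step 8. It is an assumption on my part that the commands used the -A (add) and -L (list) switches; the function name setspn_commands is hypothetical, and the short name is simply taken as the first label of the FQDN.

```python
from typing import List


def setspn_commands(service_class: str, fqdn: str, account: str) -> List[str]:
    """Return SetSPN command lines to add FQDN and short-name SPNs, then list them."""
    short_name = fqdn.split(".", 1)[0]  # first DNS label, e.g. "webapp"
    spns = [f"{service_class}/{fqdn}"]
    if short_name != fqdn:
        spns.append(f"{service_class}/{short_name}")
    # -A adds an SPN to the account; -L lists the SPNs registered on it.
    commands = [f"setspn -A {spn} {account}" for spn in spns]
    commands.append(f"setspn -L {account}")
    return commands


if __name__ == "__main__":
    for line in setspn_commands("http", "webapp.fabrikam.com", "FABRIKAM\\KerbSvc"):
        print(line)
```

Generating the commands rather than typing them makes it harder to register the FQDN form and forget the short name, or the other way around.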

Now we can test to verify it is working by attempting to access the website again.

image

That worked. Now I know what you guys are starting to ask: how does this look in a network trace? Let’s find out.

  • In the Security Event log on the IIS server we find that “FABRIKAM\Administrator” did authenticate using Kerberos:

image

  • Looking at the network capture we see the entire ticketing process with the domain controller and it responds with a TGS-REP as shown in the below capture:

image

This validates what the application is telling us: we authenticated to the web application using Kerberos Authentication.

For those of you who would like to see the Kerberos Service Ticket being passed to the Web Server here is a screen shot of that functionality.

image

We can see that the web server accepted the authentication. We can also see that a Kerberos ticket was sent in the HTTP header by looking at the KRB5_Blob tag, and that Internet Explorer sent a Kerberos ticket for “http/webapp.fabrikam.com”

Service Principal Name problems usually show up when you are first setting up an application to support Kerberos. Once the application has been up and running for a while, SPN problems are rare unless the Service Principal Names change.

Summary

I hope that you have learned a few new things:

  • How to search for duplicate Service Principal Names as well as how to add Service Principal Names.
  • How to easily filter network traces to confidently determine where Kerberos authentication is failing.
  • How the HTTP protocol works with authentication so that you can determine how you authenticated to the web site.

This blog post is the first in a three-part series that will cover the most common misconfigurations as they relate to Service Principal Names. So if I have not yet covered one of your current issues, please check back soon. We will be covering issues like duplicate SPNs or the Service Principal Name being configured on the wrong account.

- Robert Greene

  • Hi,

    again an excellent post that helps me to better understand Kerberos authentication and related problems.

    In particular, I like most that you include the network traces so that I can see what really happens on the wire.

    Regards,

    Dominik

  • Thanks for making a difficult subject understandable.  

    How do you set a spn on a domain controller?  No matter what I do, ADSIedit or Setspn, in about 15 minutes or a reboot, the DC removes the entries.

    We have a CNAME record "ldap" pointing to a domain controller DNS A record.  This is because we don't want our many developers explicitly using a domain controller hostname in their code.  When I create "ldap/ldap" and "ldap/ldap.mydomain.com", the ldap queries successfully use Kerberos.  However, after a few minutes, the DC removes the SPN's and then the queries fall back to NTLM.  I thought maybe the "ldap/ldap" or "ldap/ldap.mydomain.com" was special so I simply created a "host/tobeornottobe", rebooted, and the dc removed it.  Any suggestions?

  • Hey JSmith,

    First let me say thank you for the kind remarks from all of us here that write content for AskDS.

    So I think I know at some level what might be happening to you.  

    I think that Netlogon on the domain controller is writing or doing something with the DNSHostName attribute on the domain controller computer account in Active Directory.

    So I worked a case one time where the customer thought it would be cool to change the computer's DNS suffix based on the AD site that it belonged to, which caused what we call a disjointed namespace for the client machines.

    When a computer boots up it checks the DNSHostName attribute on the account.  If the name does not reflect the current DNS suffix on the machine it changes it.  

    Then later in LSASS code on the domain controller it will go through and change the Service Principal Name attributes to match the DNSHostName attribute.  Since this is a domain controller it is going to re-write all service names and delete ones that should not be there.

    I am currently not sure why we would be writing these values back if the name is not changing.  However there is some very specific code in LSASS in regards to domain controllers that you might be seeing this behavior.  Another thought that a coworker had is that the KCC might be causing this since it does run every 15 minutes.

    Thinking about your problem, you might try to use the optionalNames registry key on the domain controller.  I am not positive if this will resolve the issue or not but you might want to give it a try.

    891607 The supported method of using the OptionalNames registry entry on a computer that is running Windows 2000 or Windows Server 2003

    http://vkbexternal/VKBWebService/ViewContent.aspx?scid=KB;EN-US;891607

    Rob Greene

  • Hey JSmith,

    So I did some testing today.  Here is how you should be able to make this work on your domain controller.

    Launch ADSIEdit.msc and select the properties of the domain controller.  You will want to add the other DNS name to the following attribute on the DC Computer object.

    msds-additionalDnsHostName

    Hope this helps.

  • Rob,

    Your suggestion about using msds-additionalDnsHostName solved the problem.  Thanks for your expertise and help!

  • Hi :

    Great Post. How do we resolve a missing SPN issue ( We create the SPN but they get deleted randomly)

    Here is the issue raised by our SQL folks:

    =============================================================================================

    SPN entries getting repeatedly deleted from AD . When I was in Seattle at the SQL Server user group conference, I brought this issue up with several engineers from Microsoft, who said it was most likely an AD container sync problem.

    ==============================================================================================

    Our AD infrastructure is pretty healthy and i am unable to see how this is a sync issue.

    Thanks for the Help

    Regards

    Lakshmi

  • Hi,

    Start by running REPADMIN.EXE /SHOWMETA <DN of that object getting its SPN's deleted> after the SPN is deleted *but before you fix the SPN*. This will tell you when the deletion happened and from which DC it originated. If you do this a couple of times and find that it's always the same DC, you'd want to investigate for a script or other process running on that server, as well as the computers running in that DC's site. You should also enable DS Access Auditing and set auditing security entries on that object to see if there's an associated account doing it every time; that might give more clues.

    I agree that the advice you got from those SQL conference people is likely mistaken.

  • Please do the action plan that Ned is listing out.  

    But also, think about what happened just prior to that time of deletion.  Also I have a question for you.  This SQL Instance where the SPN is getting deleted.  Is this possibly a SQL Cluster?

    There is a known issue with SQL Cluster when it fails over, it will delete the SPN on the node that is stopping.  When this is done, the domain controller removes the SPN.  Then the starting node will register the SPN.  So as you can see there is a possibility that the nodes are communicating with two different domain controllers in the AD site.  Thus last writer wins when AD replication happens.

    If the last writer just happens to be the node that is stopping for some reason, you get a servicePrincipalName attribute that is missing your required MSSQLSvc SPN.

    Also, starting in Windows Vista/2008 and higher, the Terminal Service will also register its SPN at start.  If this only happens when the computer restarts, then it could be a similar situation as above.  SQL registers the SPN against DC1, then TermServ is registered against DC2.  Last writer wins, and the ServicePrincipalName attribute on DC2 did not include the MSSQLSvc SPN and thus it is not there when AD replication happens.

    Rob Greene

  • Thank You!