Microsoft's official enterprise support blog for AD DS and more
Hi all. This is Sean again and it’s ADFS blog time! Today I’m going to touch on Security Assertion Markup Language (SAML) tokens, and an issue we’ve run into when federating with Tivoli Federated Identity Manager (TFIM). I’ll discuss what a SAML token is, why it’s important, and what happens when TFIM tries to validate one from ADFS.
As you may know, Active Directory Federation Services (ADFS) uses SAML tokens to represent claims. These claims about a user are made by the Federation Service Account (FS-A) server. The claims in the SAML token are what allow the Federation Service Resource (FS-R) server to determine what claims to grant the user in the resource’s domain. Generally, the transaction goes something like this:
After this happens, the SAML token is verified, the claims are extracted, and the rest of the ADFS process continues, right?
Well, almost. In addition to validating the SAML token using the public key of the certificate that the FS-A used to sign it, the FS-R also looks at time conditions specified in the token. To view the SAML token, you will need to enable the verbose debug level on the Federation Service Properties page. This can be done on either the FS-A or the FS-R. The log file will be located in the log files directory that you specify.
In the SAML token you will see a condition block close to the top that looks like this:
<saml:Conditions NotBefore="2008-09-11T19:47:41Z" NotOnOrAfter="2008-09-11T20:47:41Z">
  <saml:AudienceRestrictionCondition>
    <saml:Audience>urn:federation:treyresearch</saml:Audience>
  </saml:AudienceRestrictionCondition>
</saml:Conditions>
Note the “NotBefore” and “NotOnOrAfter” conditions. These mean exactly what they say. If the SAML token is presented to the FS-R BEFORE the NotBefore time, or ON or AFTER the NotOnOrAfter time, then the SAML token will fail validation. Generally you’ll get an error message on the FS-R to the effect of “Unable to Validate Signature on SAML Token.”
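To make the time window concrete, here is a minimal Python sketch of the check described above. It is only an illustration, not how ADFS or TFIM implement validation. The function names are made up for this example, and the xmlns declaration is an assumption added so the fragment parses on its own (the logged snippet above doesn't show it).

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Conditions element from the post, trimmed to the two time attributes.
# The xmlns declaration is an assumption added so the fragment is well-formed.
CONDITIONS_XML = (
    '<saml:Conditions xmlns:saml="urn:oasis:names:tc:SAML:1.0:assertion" '
    'NotBefore="2008-09-11T19:47:41Z" NotOnOrAfter="2008-09-11T20:47:41Z"/>'
)


def parse_utc(value):
    """Parse a timestamp such as 2008-09-11T19:47:41Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def read_window(xml_text):
    """Return (NotBefore, NotOnOrAfter) from a Conditions element."""
    root = ET.fromstring(xml_text)
    return parse_utc(root.attrib["NotBefore"]), parse_utc(root.attrib["NotOnOrAfter"])


def is_within_window(now, not_before, not_on_or_after):
    """Valid at NotBefore itself, invalid on or after NotOnOrAfter."""
    return not_before <= now < not_on_or_after


not_before, not_on_or_after = read_window(CONDITIONS_XML)
print(is_within_window(parse_utc("2008-09-11T19:47:40Z"), not_before, not_on_or_after))  # False
print(is_within_window(parse_utc("2008-09-11T20:00:00Z"), not_before, not_on_or_after))  # True
print(is_within_window(parse_utc("2008-09-11T20:47:41Z"), not_before, not_on_or_after))  # False
```

Notice that the token in the log is valid for exactly one hour, and that a validator whose clock is even one second behind the issuer's rejects it.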
It’s important to note that the time used to check against these values is the local time on the FS-R (whether it’s an ADFS FS-R or another solution like TFIM). The issue comes when the FS-A and the FS-R clocks are not in sync. Let’s explore this with an example.
Let’s say we have a client from Adatum.com trying to access a resource in Treyresearch.net. The client will hit the website, get redirected to the Treyresearch.net federation server, perform client realm discovery, and get redirected to the Adatum.com federation server. At this point we’ll go through the process outlined above for obtaining a SAML token. Now, the Adatum federation server (the FS-A) is going to set the NotBefore time to the time the SAML token was issued based on its local time. So, if it has a time of 11:31AM when the SAML token is issued, that’s what the NotBefore time will be set to. When the client gets redirected back to the Treyresearch federation server, the NotBefore time will be compared to that server’s local time. If the time on the Treyresearch federation server is set to something earlier than 11:31AM then the token validation will fail.
Apparently someone in the ADFS development group understood that this could be a common issue. To help mitigate the issue, in a federated environment with ADFS running on the resource side the FS-R will allow for a token that is sent five minutes “in the future.” This eliminates the need for absolutely strict time consistency between two completely separate organizations.
On the other hand, TFIM strictly enforces the NotBefore setting in the token. If the local time is before the NotBefore setting then the SAML token will fail validation. So, if ADFS is set up as the account partner, and TFIM is set up as the resource partner, the ADFS federation server’s time cannot be ahead of the TFIM federation server’s time. Let’s consider this with another example.
Suppose an ADFS FS-A issued a SAML token with a NotBefore time of 11:31. The client then gets redirected to a TFIM FS-R whose local time is 11:29. When the TFIM server goes to validate the SAML token, it will fail because the NotBefore time hasn’t been reached.
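The difference between the two behaviors can be sketched in a few lines of Python. This is a rough model only: the function name is hypothetical, the date is arbitrary, and whether a real ADFS FS-R treats a token exactly five minutes in the future as valid is an assumption here.

```python
from datetime import datetime, timedelta


def not_before_ok(local_now, not_before, allowed_skew=timedelta(0)):
    """True when the validator's clock, plus any tolerated skew, has reached NotBefore."""
    return local_now + allowed_skew >= not_before


# Times from the example above; the date itself is arbitrary.
not_before = datetime(2008, 9, 11, 11, 31)
resource_clock = datetime(2008, 9, 11, 11, 29)

strict = not_before_ok(resource_clock, not_before)                          # strict, TFIM-style check
tolerant = not_before_ok(resource_clock, not_before, timedelta(minutes=5))  # five-minute allowance
print(strict, tolerant)  # False True
```

The same token, presented at the same moment, passes on a validator that tolerates five minutes and fails on one that tolerates none.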
To help alleviate this issue Microsoft released hotfix 956279 that makes the allowed time difference between the ADFS server and the TFIM server configurable. Once the hotfix is installed on the FS-A, the web.config file on the FS-A can include the <TokenIssuanceNotBeforeSkewInMinutes> tag inside of the <FederationServerConfiguration> tag. If you wanted to set a 5 minute skew, the web.config file would contain this:
Right after this:
So, what’s the moral of this story? If you’re running ADFS on the account side of a federation and TFIM is hosting the resource side, make sure you’ve got time synced on both servers or download the hotfix to configure a little wiggle room. Otherwise you will be “Unable to Validate Signature on SAML Token” as well.
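Here is a small Python sketch of why a skew setting on the issuing side helps. It assumes the setting works by backdating the NotBefore value the FS-A writes into the token, which is an inference from the hotfix being installed on the FS-A and not something spelled out here. The function name and the date are made up for the illustration.

```python
from datetime import datetime, timedelta


def issue_not_before(issuer_now, skew_minutes=0):
    """Assumed behavior: the issuer backdates NotBefore by the configured skew."""
    return issuer_now - timedelta(minutes=skew_minutes)


issuer_clock = datetime(2008, 9, 11, 11, 31)    # FS-A clock, two minutes ahead
resource_clock = datetime(2008, 9, 11, 11, 29)  # strict validator's clock

for skew in (0, 5):
    not_before = issue_not_before(issuer_clock, skew)
    print(skew, not_before.time(), resource_clock >= not_before)
# 0 11:31:00 False
# 5 11:26:00 True
```

With a five-minute skew the strict validator sees a NotBefore of 11:26, so a clock that is two minutes behind no longer causes a failure.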
For more information about setting up ADFS and TFIM check out the ADFS Step-by-step Guide: Federation with IBM Tivoli Federated Identity Manager.
- Sean “Lurch” Ivey
This is Randy again with an interesting case that I had recently. We were having problems trying to join certain workstations to the domain. We would see that every workstation in one site would join successfully and all the workstations in another site would fail with an error indicating that we could not locate a domain controller for that domain. My first hunch was either the domain controllers in the one site were broken, or there were networking issues in that problem site. The first step in troubleshooting is “Check the Event Logs!” We did not see any alarming events on any of the domain controllers in the problem site. So my next step was to take a network trace of the issue. With the help of a networking engineer here at Microsoft, Tim Quinn, we analyzed the traffic of a successful domain join and a failure. We took a simultaneous trace from the workstation and authenticating domain controller to ensure that we could see both sides of the conversation and uncover any failures across the wire.
We used Network Monitor 3.2 to take the traces. You can find some very helpful webcasts on working with Network Monitor on the Netmon blog. Here is a snapshot of some traffic between the domain controller and the workstation that is attempting to join. This is from the Frame Summary pane and is a general overview of each frame sent on the wire.
You can see the protocol and description of each frame. We have a lot of traffic on RPC and SMB and from the frame description we see that this is communication on named pipes: Netlogon, Samr, and LSARPC. These are the connection points involved in a domain join between a workstation and a domain controller. By highlighting one of these frames in the Frame Summary pane, we can see each network layer of the frame in the Frame Details pane.
It is important to understand that data transfer uses encapsulation to transfer information from a process on one computer to a process on another computer. In the above example we see the LDAP client on a workstation talking to the LDAP server on a domain controller using the defined specifications of the LDAP protocol. This data is packaged in a TCP packet which is built using the specifications of the TCP protocol, which is packaged in an IP datagram, which is packaged on an Ethernet frame. Netmon separates each of these packages in order for you to analyze the behavior of each protocol individually.
So now we have two traces that show every frame on the wire: one from the perspective of the workstation and one from the perspective of the domain controller. So how do we match a frame in one trace to the corresponding frame in the other trace? We have a couple of frame and packet attributes that can help; the first one we will discuss is the identification number…
The identification number is an attribute of the IP datagram that is sent across the wire. This attribute is as simple as it sounds; it is just a randomly generated number to identify a specific IP datagram. If we expand the IPv4 header information from the above example, we see an attribute named Identification with a value of 3201.
If we look at either the network trace taken on the domain controller or the trace taken on the workstation, there will be a frame with an Identification number of 3201. You can filter both traces for this frame by using the filter IPv4.Identification == 3201.
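If you ever need to do the same matching outside of Netmon, the idea can be sketched in Python. This is an illustration only: the headers are built by hand rather than read from a capture file, the addresses are invented, and the helper names are hypothetical. The Identification field is the 16-bit value at byte offset 4 of the IPv4 header.

```python
import struct


def build_ipv4_header(identification, src, dst, payload_len=0):
    """Pack a minimal 20-byte IPv4 header (checksum left at 0 for illustration)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len,  # version/IHL, TOS, total length
        identification, 0,                 # Identification, flags/fragment offset
        128, 6, 0,                         # TTL, protocol (6 = TCP), checksum
        src, dst,
    )


def ipv4_identification(header):
    """Read the 16-bit Identification field at byte offset 4."""
    (ident,) = struct.unpack_from("!H", header, 4)
    return ident


def match_frames(trace_a, trace_b):
    """Pair up frame indexes whose Identification values are equal."""
    by_id = {}
    for j, header in enumerate(trace_b):
        by_id.setdefault(ipv4_identification(header), []).append(j)
    return [
        (i, j)
        for i, header in enumerate(trace_a)
        for j in by_id.get(ipv4_identification(header), [])
    ]


ws = bytes([10, 0, 1, 20])  # made-up workstation address
dc = bytes([10, 0, 0, 5])   # made-up domain controller address
workstation_trace = [build_ipv4_header(3200, ws, dc), build_ipv4_header(3201, ws, dc)]
dc_trace = [build_ipv4_header(3201, ws, dc), build_ipv4_header(3202, ws, dc)]
print(match_frames(workstation_trace, dc_trace))  # [(1, 0)]
```

Because the field is only 16 bits wide, values repeat in a long capture, so in practice it is worth comparing source and destination addresses as well.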
Another way to line up the conversation of two traces taken simultaneously is to compare the Sequence and Acknowledgement numbers. These attributes are at the TCP layer instead of IPv4. To view these attributes, expand the TCP header. This is from the same frame as above:
We see that the sequence number sent in this frame is 4167329214, and the acknowledgement number, which reflects the last data we received from our partner in this communication, is 1946363494. These numbers can be misleading on their own, because a router can strip and resend at the network layer (IP layer), and the numbering from the IP layer up (in this case TCP) can then be misleading. To align two simultaneous traces, I use the Identification attribute from above, and I use the sequence and acknowledgement numbers to verify dropped and received packets. To learn more about Sequence and Acknowledgement numbers and how TCP works, check out the following KB article:
Explanation of the Three-Way Handshake via TCP/IP http://support.microsoft.com/kb/172983
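As a rough illustration of using sequence numbers to spot a dropped packet, the Python sketch below walks the segments a receiver actually saw and reports any byte ranges that never arrived. The sequence numbers and lengths are invented, the function name is hypothetical, and 32-bit sequence wraparound is ignored to keep it short.

```python
def find_gaps(segments, initial_seq):
    """Return (start, end) sequence ranges that are missing.

    segments is a list of (sequence_number, payload_length) tuples as seen
    by the receiver. Wraparound of the 32-bit sequence space is ignored.
    """
    expected = initial_seq
    gaps = []
    for seq, length in sorted(segments):
        if seq > expected:
            gaps.append((expected, seq))  # bytes expected but never seen
        expected = max(expected, seq + length)
    return gaps


# The sender transmits two segments; only the second one reaches the receiver.
sent = [(1000, 1460), (2460, 300)]
received = [(2460, 300)]

print(find_gaps(sent, 1000))      # []
print(find_gaps(received, 1000))  # [(1000, 2460)]
```

Running the same check against both traces shows the data leaving one side and never arriving at the other, which points at something in between.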
In comparing these traces, we see a breakdown in the communication. From the frame summary, it would appear to be an LDAP problem. After further analysis, we see that the issue is at the TCP layer. The next two snapshots are from the simultaneous traces and an explanation under each.
This is from the trace taken on the workstation and we see at the top, frame identification number 3201 which is our LDAP request. After this we get a strange Kerberos packet. This is actually an out of sequence packet that was the last part of the LDAP response from the domain controller. Because it is an incomplete portion of the response, Netmon did not parse the frame correctly and it shows as a Kerberos packet. Beyond that, we see the workstation eventually abandon the LDAP request (frame 133).
This is from the trace taken on the domain controller and we see at the top, frame identification number 3201 which is our LDAP request. We see that the DC does respond, but we send two frames of data, the first of which never makes it to the workstation and the second (frame 125) that does successfully make it to the workstation as an out of sequence packet. After this, we never receive an acknowledgement from the workstation and we see the domain controller resend the missing packet (frames 127 and 128).
So what ever happened to the mysterious disappearing packet? It was dropped by a router that silently discards any packet larger than the Maximum Transmission Unit (MTU) it can forward. This issue is known as a black hole router. We were able to change the MTU size sent and this resolved our issue.
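To find the largest packet that survives a path like this, one common approach is to probe with different sizes and the Don't Fragment flag set, for example with ping -f -l <size> on Windows. The Python sketch below shows the search logic only. The probe is simulated by a lambda, the 1400-byte router limit is invented, and the function names are hypothetical.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
ICMP_HEADER = 8  # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in a single packet of size mtu."""
    return mtu - IP_HEADER - ICMP_HEADER


def find_path_mtu(probe, low=576, high=1500):
    """Binary-search the largest size for which probe(size) succeeds.

    probe(size) should return True when a packet of that many bytes, sent
    with Don't Fragment set, gets a reply. Returns None if even the
    smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


router_limit = 1400  # hypothetical limit of the black hole router
mtu = find_path_mtu(lambda size: size <= router_limit)
print(mtu, max_ping_payload(mtu))  # 1400 1372
```

For a standard 1500-byte Ethernet MTU the same arithmetic gives a 1472-byte ping payload, which is why that number shows up so often in MTU troubleshooting.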
Even though this blog is AskDS, it is important to understand the networking components used by Directory Services. By using Network Monitor, you can avoid time spent troubleshooting the wrong component.
- Randy Turner
We’ve been at this for over a year (since August 2007), with more than 100 posts (127 to be exact), so maybe we can indulge in a little metablogging to look back on what we’ve done.
First let’s look at the posts that sparked the most conversation – because that is what blogging is all about, right? If we wanted to simply publish information, we could just as easily create KB or TechNet articles. Well, ok, there are decidedly fewer hoops to jump through to post a blog, but by blogging we also get to hear from you. Being in tech support, we hear from you quite a bit already. But I’m guessing many of you take pride in how infrequently you call us, so the blog opens up conversations with people we may never hear from otherwise.
The big winner here was Ned’s Top 10 Common Causes of Slow Replication with DFSR. It has five times the number of comments as the next highest post. It basically became a support forum for DFSR issues, and Ned was nice enough to oblige. To a lesser degree that is what happened with the other most-commented posts.
Top 10 AskDS Posts by Comments (Aug. 07 – Nov. 08)
Not surprisingly, the same post topped the list for page views. But there are several that show up here with high page views that didn’t generate much conversation at all.
Top 10 AskDS Posts by Page Views (Aug. 07 – Nov. 08)
Our goal with this blog is to get information in the hands of customers so they can more effectively use our products, and hopefully save some of you from having to call us for support. If you have any thoughts about topics you would like to see more (or less) about, please leave us a comment.
Be sure to check out the Troubleshooting: Quick Fixes post over at the Group Policy blog for some tips on solving common Group Policy issues.
- Craig Landis
Ned here again. We recently had a very lively discussion about 'Lag Sites' as a disaster recovery option. If you've been digging around the MS Download Center, you may have already come across Introduction to Windows Server 2008 R2. After some digging, you'll come across:
Improvements in Active Directory Domain Services
The Active Directory Domain Service server role in Windows Server 2008 R2 includes the following improvements:
• Recovery of deleted objects. Domains in Active Directory now have a Recycle Bin feature that allows you to recover deleted objects. If an Active Directory object is inadvertently deleted, you can restore the object from the Recycle Bin. This feature requires the updated R2 forest functional level.
So while this won't be a replacement for solid backups, it certainly should augment them well and allow admins to get data back quickly without the need for complex lag site arrangements, or worries that the deletion has occurred before the backups have had a chance to capture it. As always, this is pre-release documentation and there are no guarantees made about the component's availability or even whether it will be included. Definitely keep your eyes open for it though. :-)
Definitely skim that document; there are all sorts of interesting tidbits in there for the sharp-eyed. More news to come...
- Ned Pyle
Ned here again. I recently spent a week with Microsoft Support Engineers from all over the world, and bumped into a colleague that works in MS Spain, out of Madrid. She mentioned that they had a Spanish-language blog focused on Directory Services, networking, and other Windows Platform topics. For all of our Spanish-speaking readers, I highly recommend you visit them; they have some very interesting articles and techniques they offer.
Platformas (Consulta con el equipo de Windows)
¡Hola Paula! :-)
New KB articles related to Directory Services for the week of 11/9-11/16.
List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2003 R2
EFS may not be enabled expectedly after you disable a policy and this policy turn off the EFS feature
A Windows Server 2003-based terminal server that is added to the Terminal Services Computers group cannot obtain a license from a terminal licensing server that has the "License server security group" setting enabled
Error message when you try to perform a metadata cleanup of an ADAM instance on a Windows Server 2003 R2-based computer: “DsRemoveDsServerW error 0x57(The parameter is incorrect.)”
You cannot log on to the domain, join a computer to the domain, or run the Active Directory Installation Wizard (Dcpromo.exe) in Windows Server 2003
Microsoft Windows Vista Service Pack 1 (SP1) Frequently Asked Questions
Event 4106 generated by a Windows Server 2008-based Terminal licensing server may report incorrect number of issued "Per User" licenses
The text filter function may not return any results in the Group Policy Management Editor window on a computer that is running Windows Server 2008 or Windows Vista SP1
The LocalLow folder may not be created on a Windows Vista SP1-based computer or on a Windows Server 2008-based computer when roaming profiles are used in a domain environment
New KB articles related to Directory Services for the week of 11/16-11/23.
Repair options that you can use to recover if you accidentally make an incorrect Distributed File System Replication (DFSR) member authoritative in a Windows Server 2003 R2 environment
Client connections return a "STATUS_INVALID_PARAM" error code when you use a "Send NTLMv2 response only" authentication level in Windows Server 2008 or in Windows Vista
Mandatory user profiles do not work as expected on Windows Vista-based and Windows Server 2008-based client computers when the %LogonServer% environment variable is set in the profile path
User data from the USMT may be deleted unexpectedly by the task sequence engine during the operating system deployment process in System Center Configuration Manager 2007 SP1
The SPN registration of a cluster fails, and Error event IDs 1119 and 1034 are logged in an Exchange Server 2007 Service Pack 1 environment
Hang When Reading StdErr/StdOut Properties of WshScriptExec Object
A name resolution query fails when Windows Server 2003-based DNS servers set the AA bit for the DNS query and forward the query to conditional forwarders
Windows SteadyState may not automatically restart a Windows XP-based computer in certain circumstances