Hi folks. Lakshman Hariharan here again with another real world example, this time with a peer of mine, Victor Zapata, who previously authored this post about how to stop a network trace programmatically using Network Monitor. Victor and I worked on an interesting issue not too long ago that we would like to discuss in this post.
As I mentioned in my previous post, in this “Real World Example” series we want to share the methodology and tools we use to troubleshoot and diagnose issues that, at first glance, appear somewhat complex. The tools we will be discussing in this post are the Windows Event Viewer, the Microsoft Exchange Server Error Code Lookup Tool (Err.exe) and Microsoft Message Analyzer. In addition to outlining how we troubleshot the issue, this post gives us the opportunity to showcase Microsoft Message Analyzer (the successor to Network Monitor 3.4). So let us get started.
In this case, the customer informed us that pages failed to display when users visited certain SSL sites using Internet Explorer 10 (IE 10) on their corporate Windows 7 workstations. These websites were mostly outside the customer’s organization and thus outside their purview from the server side of things. The customer’s IT folks had performed initial troubleshooting and determined that the issue only happened when the browser used was IE 10, and specifically when TLS 1.2 was enabled under Internet Options, as shown in the following screenshot.
The error itself displayed by IE is pictured in the screenshot below.
They also reported that when the page fails to load, the client records the Schannel event pictured in the screenshot below in its System log.
At this point we had a few pieces of information to begin our troubleshooting journey. The first thing we did was to look up the meaning of Error 43. For that we used Err.exe.
Since we knew that the error in the System log was Schannel related, we executed the Err.exe command and looked for error messages related to Schannel.
Note the message that says “TLS1_ALERT_UNSUPPORTED_CERT” as highlighted in the screenshot below.
Once we had this information we had an inkling that the error was related to a certificate mismatch or some other certificate related issue.
We also found out that the internal code of 252 maps to “RemoteCert_SigAlgorithm”. This indicated that the error possibly had to do with the signature algorithm.
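For readers without Err.exe at hand, the lookup of the alert code can be approximated with a small table. The sketch below lists a subset of the alert descriptions defined in RFC 5246 (TLS 1.2); note that the RFC spells code 43 as unsupported_certificate, while Err.exe reports the Schannel constant name TLS1_ALERT_UNSUPPORTED_CERT. The function name describe_alert is ours, and the internal code 252 is Schannel specific, so it is deliberately left out.

```python
# Subset of TLS alert descriptions from RFC 5246, section 7.2.
# This is not the Err.exe database, only the protocol-level names.
TLS_ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    43: "unsupported_certificate",
    44: "certificate_revoked",
    45: "certificate_expired",
    46: "certificate_unknown",
    47: "illegal_parameter",
    48: "unknown_ca",
    70: "protocol_version",
    80: "internal_error",
}


def describe_alert(code):
    """Return the RFC 5246 name for an alert code (hypothetical helper)."""
    return TLS_ALERT_DESCRIPTIONS.get(code, "unknown alert ({})".format(code))


if __name__ == "__main__":
    # 43 is the fatal error code reported in the Schannel event.
    print(43, "->", describe_alert(43))
```

Codes that are not in the table come back as "unknown alert", which is a reminder that this is only a partial list.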
At this point we had good reason to suspect a certificate issue but we still couldn’t pinpoint what it was. Which brings us to the next tool in our tool chest.
Since the issue could be readily reproduced in the environment, we captured a network trace while attempting to open one of the websites that failed to load using IE 10 with TLS 1.2 enabled, and then analyzed the trace using Message Analyzer. As is recommended (where possible) when troubleshooting issues using network traces, we also obtained a trace of “good”, expected behavior for comparison purposes.
Before we get to the details of analyzing the network trace in Message Analyzer, here is another cool feature we would like to show: Message Analyzer can open the network trace from the client and the corresponding System event log file in the same session, allowing us to see the network traffic interleaved with the event data.
In the following steps we will illustrate* how to do that. After opening Message Analyzer, go to File --> New Session and, optionally, give the session a name. In this example we named it “TLS Fail Session”.
Then click “Files” --> “Add Files” to choose the files to add to the session.
Once the “Configure Files” dialog box opens, click the “Add Files” button, select the files you would like to add to the session, and click the “Open” button.
Then click “OK” to close the “Configure Files” dialog.
Then click “Start” to start the session.
The following three screenshots illustrate this sequence.
In the next steps we will discuss, at a high level, the general sequence of events and establish a pattern. For that we will pay attention to frames 13 through 33, which include the entire conversation we are interested in. This sequence and pattern are pictured in the next screenshot.
Frames 13 through 17 establish the TCP 3 way handshake between the client (22.214.171.124) and the server (126.96.36.199), as indicated by the Syn (S), Syn/Ack (A..S) and Ack (A) flags in the Summary column of those frames. 188.8.131.52 is actually a Forefront Threat Management Gateway (TMG) server that is acting as a proxy but that’s not relevant to the discussion.
Frame 18 shows where we are connected to the site with HTTP status code 200, as displayed in the Summary column of that frame highlighted in blue.
Frames 22 and 23 show the establishment (or rather, the attempt to establish) of the TLS session with the Client Hello (frame 22) and Server Hello (frame 23).
Frames 30 through 33 are where the client proceeds to tear down the TCP session**, as indicated by the Fin/Ack (A…F), Ack (A), Fin/Ack (A…F), Ack (A) flags in the Summary column of those frames. This indicates to us that there was something in the preceding frame (23) that the client did not like, for lack of a better description.
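The flag abbreviations in the Summary column correspond to bits in the TCP header flags byte. As a minimal sketch (the function names are ours, not anything from Message Analyzer), the following decodes that byte and classifies a segment as part of the handshake, a graceful close, or a reset:

```python
# TCP header flag bits, lowest bit first.
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]


def decode_tcp_flags(flags):
    """Return the names of the flags set in a TCP flags byte."""
    return [name for bit, name in TCP_FLAGS if flags & bit]


def classify_segment(flags):
    """Rough classification of a segment by its flags (illustrative only)."""
    names = set(decode_tcp_flags(flags))
    if "RST" in names:
        return "abrupt teardown (reset)"
    if "FIN" in names:
        return "graceful close"
    if "SYN" in names:
        return "handshake"
    return "data or plain acknowledgement"


if __name__ == "__main__":
    # Three-way handshake: SYN, SYN/ACK, ACK
    for flags in (0x02, 0x12, 0x10):
        print(hex(flags), decode_tcp_flags(flags), classify_segment(flags))
    # Graceful close as seen in this trace: FIN/ACK, ACK, FIN/ACK, ACK
    for flags in (0x11, 0x10, 0x11, 0x10):
        print(hex(flags), decode_tcp_flags(flags), classify_segment(flags))
```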
This prompts us to pay closer attention to frame 23, which is nothing but a set of frames re-assembled automatically by Message Analyzer. This is a cool feature that was key to diagnosing this issue. Using Message Analyzer’s predecessor (Network Monitor 3.4, aka Netmon), one would have to re-assemble the frames separately.
Before we proceed to analyze frame 23 in detail, let us pay attention to message number 2855 in the screenshot above, sandwiched between frames 23 and 30. Message 2855 is actually from the System log that we opened along with the network trace in the same session. The Server Hello in frame 23 is followed by the fatal alert in the event log, which in turn is followed by the session teardown. This further validates our theory that there was content in frame 23 that caused our connection to be terminated.
Message Analyzer has pieced together for us the sequence of events from two separate logs. Pretty cool, huh? We thought so too.
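Conceptually, this kind of correlation is a merge of two time-ordered logs by timestamp. The sketch below is not how Message Analyzer is implemented; it only illustrates the idea with Python's heapq.merge. The timestamps and entries are made up for illustration and are not taken from the customer's trace.

```python
import heapq
from datetime import datetime


def merge_by_time(*sources):
    """Merge iterables of (timestamp, origin, summary) tuples.

    Each source must already be sorted by timestamp.
    """
    return list(heapq.merge(*sources, key=lambda entry: entry[0]))


# Hypothetical entries; the times are invented for this example.
trace = [
    (datetime(2000, 1, 1, 12, 0, 0, 100000), "trace", "Client Hello"),
    (datetime(2000, 1, 1, 12, 0, 0, 200000), "trace", "Server Hello"),
    (datetime(2000, 1, 1, 12, 0, 0, 400000), "trace", "Fin/Ack"),
]
events = [
    (datetime(2000, 1, 1, 12, 0, 0, 300000), "System log", "Schannel fatal alert"),
]

if __name__ == "__main__":
    for timestamp, origin, summary in merge_by_time(trace, events):
        print(timestamp.time(), origin, summary)
```

With these sample entries the event log record lands between the Server Hello and the first Fin/Ack, which is the same pattern we saw in the session.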
The screenshot below shows message 2855 expanded and shown in detail. Note the highlighted parts showing the message that a fatal alert was sent along with the alert description and error state.
Now let us return to the detailed analysis of frame 23. As we select that frame in the Analysis Pane and focus on the details of that frame in the Details pane we see there are three records, one of which includes the certificate list (highlighted in red) as shown in the screenshot below.
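To make the idea of several records inside one reassembled message more concrete, here is a minimal sketch that walks a byte buffer and splits it into TLS records using the five-byte record header from RFC 5246 (content type, two version bytes, two-byte length). The sample bytes are synthetic and are not taken from frame 23; the function name iter_tls_records is ours.

```python
import struct

# Record content types and protocol versions from RFC 5246.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
VERSIONS = {(3, 1): "TLS 1.0", (3, 2): "TLS 1.1", (3, 3): "TLS 1.2"}


def iter_tls_records(data):
    """Yield (content_type, version, payload) for each complete record."""
    offset = 0
    while offset + 5 <= len(data):
        content_type, major, minor, length = struct.unpack_from("!BBBH", data, offset)
        end = offset + 5 + length
        if end > len(data):
            break  # truncated record, stop rather than guess
        yield (
            CONTENT_TYPES.get(content_type, "unknown"),
            VERSIONS.get((major, minor), "unknown"),
            data[offset + 5:end],
        )
        offset = end


# Synthetic buffer: an empty Server Hello Done handshake record followed by
# a plaintext alert record carrying level 2 (fatal) and description 43.
sample = bytes([22, 3, 3, 0, 4, 14, 0, 0, 0]) + bytes([21, 3, 3, 0, 2, 2, 43])

if __name__ == "__main__":
    for content_type, version, payload in iter_tls_records(sample):
        print(content_type, version, payload.hex())
```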
As you may recall, based on the “TLS1_ALERT_UNSUPPORTED_CERT” code in our Err.exe output, we suspected some sort of certificate mismatch or other error pertaining to certificates.
In the Details pane of frame 23, expanding record 1 --> fragment -->  --> body --> certificate list we see that there are four certificates in the chain as shown in the screenshot below.
At this point we started looking at each of the certificates in the certificate chain to check for potential issues. We expanded the details of every certificate in the chain, paying special attention to the SignatureAlgorithm field (again because of the unsupported cert code from Err.exe) and comparing against the “good” trace. At first we found nothing out of the ordinary. Below is the screenshot of the second certificate (normal) in the certificate chain.
All the certificates appeared normal. All but one in the chain. When we expanded the last certificate in the chain (our problem child), we found our culprit. The screenshot below shows it with the signature algorithm highlighted.
As it turned out, one of the certificates in the certificate chain was signed using the MD2 algorithm, as can be seen from the Algorithm*** field. MD2 is a legacy algorithm that is not secure enough. When the client saw a certificate using the legacy algorithm, it aborted the connection.
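The manual inspection we did can be sketched as a simple check over the signature algorithm OIDs of each certificate in the chain. The OID values below are the standard PKCS #1 identifiers. Treating only MD2 as legacy mirrors what we saw in this case and is an assumption of the sketch, not a statement of the actual Schannel policy. The example chain is hypothetical: only the position of the MD2 certificate (last of four) comes from the trace.

```python
# Standard PKCS #1 signature algorithm OIDs and their friendly names.
SIGNATURE_ALGORITHM_OIDS = {
    "1.2.840.113549.1.1.2": "md2WithRSAEncryption",
    "1.2.840.113549.1.1.4": "md5WithRSAEncryption",
    "1.2.840.113549.1.1.5": "sha1WithRSAEncryption",
    "1.2.840.113549.1.1.11": "sha256WithRSAEncryption",
}

# Assumption for this sketch: flag only the algorithm seen in this case.
LEGACY_ALGORITHMS = {"md2WithRSAEncryption"}


def find_legacy_signatures(chain_oids):
    """Return (position, name) for each certificate with a legacy signature.

    Positions are 1-based and follow the order of the certificate list.
    """
    findings = []
    for position, oid in enumerate(chain_oids, start=1):
        name = SIGNATURE_ALGORITHM_OIDS.get(oid, "unknown")
        if name in LEGACY_ALGORITHMS:
            findings.append((position, name))
    return findings


# Hypothetical four-certificate chain; the first three OIDs are invented.
example_chain = [
    "1.2.840.113549.1.1.5",
    "1.2.840.113549.1.1.5",
    "1.2.840.113549.1.1.5",
    "1.2.840.113549.1.1.2",
]

if __name__ == "__main__":
    print(find_legacy_signatures(example_chain))
```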
The real, long-term solution in this scenario is for the server to stop using certificates signed with the legacy signature algorithm, for security reasons if nothing else. However, from the client side we rarely (if ever) have control over which server-side certificates are used, so our solution or workaround had to be client side, especially since most of these sites were outside the purview of the customer's IT department. The issue manifests itself as an IE issue, but what really transpired is that the client detected that one of the certificates in the trust chain was not secure and aborted the connection. One workaround, though not optimal and not recommended, is to uncheck TLS 1.2 under Internet Options. The other option is to upgrade to IE 11: after we followed up with folks on the IE team, it was clarified that this issue doesn’t happen with IE 11. With IE 11, if the TLS 1.2 handshake fails there is a graceful fallback to TLS 1.1, so the page is still displayed.
We would be remiss not to reiterate that the real solution should be server side: do not use certificates signed with legacy signature algorithms. Failing that, the aforementioned client side workaround or solution can be implemented.
This is how, using the Windows Event Viewer, Err.exe, Message Analyzer and some Bing searches one can go from a problem description of “certain SSL sites fail to load” to what actually caused it.
Lakshman Hariharan and Victor Zapata
*The steps for adding files and opening them within the same session are different in the latest public version but the functionality is the same.
**Note that there are two ways a TCP session can be torn down: abruptly, via a TCP Reset, or more gracefully, via the four-way session close of Fin/Ack, Ack, Fin/Ack, Ack. The latter is what happened in this scenario. This is not to say that one way of tearing down a session is better than the other, just that one is used over the other depending on the situation.
***The latest public version of Message Analyzer does not have the name of the algorithm parsed out. It has the Object Identifier (OID) as shown in the first screenshot below. This is documented in this MSDN reference. The second screenshot shows the OID mapping to friendly name as it appears in the MSDN reference. The parsing of the OID to name is slated to be included in the next release of Message Analyzer.
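If you want to verify an OID by hand from the raw certificate bytes, the encoding is straightforward. The sketch below decodes the content bytes of a DER OBJECT IDENTIFIER (without the tag and length bytes) into dotted notation; the function name decode_der_oid is ours. The sample bytes are the standard encoding of the md2WithRSAEncryption OID, not bytes copied from the trace.

```python
def decode_der_oid(body):
    """Decode DER OBJECT IDENTIFIER content bytes into dotted notation."""
    if not body:
        raise ValueError("empty OID")
    if body[-1] & 0x80:
        raise ValueError("truncated OID: last byte has the continuation bit set")
    arcs = []
    value = 0
    for byte in body:
        # Each byte contributes 7 bits; the high bit means "more bytes follow".
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:
            if not arcs:
                # The first subidentifier packs the first two arcs as 40*X + Y.
                first = min(value // 40, 2)
                arcs.extend([first, value - 40 * first])
            else:
                arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


if __name__ == "__main__":
    md2_with_rsa = bytes.fromhex("2a864886f70d010102")
    print(decode_der_oid(md2_with_rsa))
```

The resulting dotted string can then be looked up in the OID to friendly name mapping.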