I decided that the downgrade attack events I mentioned in a blog post a while back deserved more detail and a walk-through scenario.
As a recap, a customer called in after noticing the events below appearing intermittently but repeatedly, and always one right after the other, in the System event log:
Event Type: Warning
Event Source: LSASRV
Event Category: SPNEGO (Negotiator)
Event ID: 40960
Time: 8:07:01 PM
Description: The Security System detected an attempted downgrade attack for server cifs/dc5.sales.adatum.com. The failure code from authentication protocol Kerberos was "There are currently no logon servers available to service the logon request."
Event ID: 40961
Description: The Security System could not establish a secured connection with the server cifs/dc5.sales.adatum.com. No authentication protocol was available.
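If you are collecting many of these events, it can help to pull the target SPN and the failure text out of the 40960 description. The sketch below is only an illustration: the pattern is built from the exact wording quoted above, which may differ on other Windows versions or languages, and the function name parse_downgrade_event is made up for this example.

```python
import re

# Pattern based on the 40960 description text quoted in this post.
# Assumption: the wording is the same on the system being examined.
DOWNGRADE_RE = re.compile(
    r"attempted downgrade attack for server (?P<spn>\S+)\.\s+"
    r"The failure code from authentication protocol (?P<protocol>\w+) was "
    r"[\"“](?P<reason>.+?)[\"”]\s*$"
)


def parse_downgrade_event(description):
    """Return the SPN, protocol and failure reason, or None if no match."""
    match = DOWNGRADE_RE.search(description)
    if match is None:
        return None
    return match.groupdict()
```

Grouping the results by reason is a quick way to see whether "no logon servers" really is the dominant failure code in your environment.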
Of course the part that was most alarming was the attempted downgrade attack text. Attack is not a very friendly sounding word and usually implies that there is a person or identity behind the attack, instigating it. Naturally this is something an administrator would want to follow up on!
Let’s start by defining what a downgrade attack can be. A downgrade attack is where a connection to a resource starts with a more secure method of authentication but, for some reason, must settle for a less secure method in order to authenticate and gain access. Kerberos, for example, is a more secure authentication method than NTLM, so it is preferred and is in fact selected in security negotiation in every situation where it can be.
The word “attack,” though, suggests that every time we attempt Kerberos and end up using NTLM there was a malicious entity behind it, when generally there is not. There are situations where the 40960 and 40961 event sequence will be useful in identifying actual malicious behavior, but for the most part the cause will be something far less dramatic or evil sounding.
A quick search on the interwebs finds several references to these events. The most informative is here. This TechNet event description does a good job of telling us that there can be multiple causes of this event and that the specific cause should appear in the failure code text of the event. The example given is STATUS_NO_LOGON_SERVERS, which is an excellent example since it is probably the most common instigator of this series of events.
So let’s go over a scenario where the 40960 and 40961 events can occur because of STATUS_NO_LOGON_SERVERS. Picture our file server FS123 going about its normal business as a domain-joined member server when a user or a service on it suddenly needs to access a file on DC5. FS123 keeps track of where the domain controllers for its domain (Sales) are located in a cache maintained by the Netlogon service, and this cache records where a responsive KDC for Sales is on the network.
So naturally when FS123 attempts to access a file on DC5 it negotiates Kerberos as the selected authentication method. That negotiation is what you will see on the network as SPNEGO information embedded in the SMB traffic from FS123 to DC5 and back. When DC5 indicates in its SPNEGO response that it supports Kerberos, FS123 knows that it needs to get a ticket for the file service on DC5. In other words, it needs a ticket for the service principal name cifs/dc5.sales.adatum.com, and so FS123 sends that request to the KDC it has in its cache.
But here’s where a problem comes in: the KDC it knows of is suddenly not responsive. As a result the Netlogon service returns a status saying so to the file request: STATUS_NO_LOGON_SERVERS. The file request must then be completed using another authentication method, like NTLM. Events 40960 and 40961 are logged in this case to show that we attempted the more secure authentication method but were not successful.
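To make the sequence concrete, here is a toy model of the scenario in Python. It is a sketch of the behavior described above, not the actual LSA or Netlogon implementation: every function name, the kdc_responsive flag, and the tuple layout of the logged events are invented for illustration.

```python
STATUS_NO_LOGON_SERVERS = "STATUS_NO_LOGON_SERVERS"


class NoLogonServers(Exception):
    """Raised by the toy ticket request when the cached KDC does not answer."""


def request_service_ticket(spn, kdc_responsive):
    # Stand-in for FS123 asking its cached KDC for a ticket for the SPN.
    if not kdc_responsive:
        raise NoLogonServers(STATUS_NO_LOGON_SERVERS)
    return f"ticket-for-{spn}"


def authenticate(spn, kdc_responsive, event_log):
    """Try Kerberos first; on failure log 40960 then 40961 and use NTLM."""
    try:
        request_service_ticket(spn, kdc_responsive)
        return "Kerberos"
    except NoLogonServers as exc:
        # Hypothetical event records: (event ID, target SPN, detail).
        event_log.append((40960, spn, str(exc)))
        event_log.append((40961, spn, "no secured connection established"))
        return "NTLM"
```

Calling authenticate("cifs/dc5.sales.adatum.com", False, log) returns "NTLM" and leaves the two events in log, in the same order the customer saw them.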
In our scenario above, the application or user that initiated the file access probably got to the file or files without ever noticing this transaction or a delay. But that leaves us with some questions. Why did this occur in the first place? Why were we not able to use Kerberos?
The most common cause for this if the events are seen intermittently is that there is a transient network problem between the client (in our scenario FS123) and the DC it is looking to at that time for authentication. There could be many other causes making that DC less responsive, up to and including the domain controller seeing a performance “spike” and becoming too busy to respond quickly to the Kerberos ticket request from FS123.
From the FS123 side of things the Netlogon service will actually locate a new, more responsive DC when these things occur but there will be a short interval where things like this may happen. That’s the window where occasional events from our topic occur.
So how can you use this information? Use it as a guideline to understand whether there is a transient issue going on or perhaps an actual intrusion, where someone is intentionally making the authentication method used for connections less secure in order to break it more easily. The former (transient issues resulting in our 40960 and 40961 event sequence) is not a surprising thing to see occasionally in an enterprise environment. The latter (a malicious cause) is rare to say the least, but a good administrator or security person will explore each and every one of these events. To do that, simply enable Netlogon debug logging on the servers or workstations that see the events and look for corresponding errors occurring at the same time as the events, or look through the event logs for other corresponding events at or around that time.
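Netlogon debug logging is typically turned on with nltest /dbflag:0x2080ffff, and the output goes to %windir%\debug\netlogon.log. Once you have a log, matching it up against the event times can be scripted. The sketch below assumes each log line starts with an MM/DD HH:MM:SS timestamp (the log carries no year, so you supply one); check that against your own log before relying on it. The function names and the 60 second window are arbitrary choices for this example.

```python
from datetime import datetime, timedelta


def parse_netlogon_line(line, year):
    """Parse 'MM/DD HH:MM:SS rest'; return (timestamp, text) or None."""
    parts = line.split(" ", 2)
    if len(parts) < 3:
        return None
    try:
        stamp = datetime.strptime(
            f"{year}/{parts[0]} {parts[1]}", "%Y/%m/%d %H:%M:%S"
        )
    except ValueError:
        return None  # Not a timestamped line (assumed format did not match).
    return stamp, parts[2].strip()


def lines_near_event(event_time, log_lines, year, window_seconds=60):
    """Return parsed log lines within window_seconds of the event time."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in log_lines:
        parsed = parse_netlogon_line(line, year)
        if parsed and abs(parsed[0] - event_time) <= window:
            hits.append(parsed)
    return hits
```

Feed it the time of each 40960 event and the lines of netlogon.log, then read whatever comes back for errors reaching the DC around that moment.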
As a postscript, I’ve gotten several great questions from folks via the blog over the past few weeks. I intend to respond to them, but I have to confess that my replies may be delayed. My apologies for that, folks.