Microsoft's official enterprise support blog for AD DS and more
UPDATE: The hotfix is now available for this issue! Get it at http://support.microsoft.com/kb/2989971
This hotfix applies to Windows Server 2012 R2 domain controllers and should prevent the specific problem discussed below from occurring.
It’s important to note that the symptoms of users and computers not being able to log on can happen for a number of different reasons. Many of the folks in the comments have posted that they have these sorts of issues but don’t have Windows Server 2003 domain controllers, for example. If you’re still having problems after you have applied the hotfix, please call in a support case so that we can help you get those fixed!
We have been getting quite a few calls lately where Kerberos authentication fails intermittently and users are unable to log on. By itself, that’s a type of call that we’re used to and we help our customers with all the time. Most experienced AD admins know that this can happen because of broken AD replication, unreachable DCs on your network, or a variety of other environmental issues that all of you likely work hard to avoid as much as possible - because let’s face it, the last thing any admin wants is to have users unable to log in – especially intermittently.
Anyway, we’ve been getting more calls than normal about this lately, and that led us to take a closer look at what was going on. What we found is that there’s a problem that can manifest when you have Windows Server 2003 and Windows Server 2012 R2 domain controllers serving the same domain. Since many of you are trying very hard to get rid of your last Windows Server 2003 domain controllers, you might be running into this. In the case of the customers that called us, the login issues were actually preventing them from being able to complete their migration to Windows Server 2012 R2.
We want all of our customers to be running their Active Directory on the latest supported OS version, which is frankly a lot more scalable, robust, and powerful than Windows Server 2003. We realize that upgrading an enterprise environment is not easy, and much less so when your users start to have problems during your upgrade. So we’re just going to come out and say it right up front:
We are working on a hotfix for this issue, but it’s going to take us some time to get it out to you. In the meantime, here are some details about the problem and what you can do right now.
1. When any domain user tries to log on to their computer, the logon may fail with “unknown username or bad password”. Only local logons are successful.
If you look in the system event log, you may notice Kerberos event IDs 4 that look like this:
Event ID: 4
Source: Kerberos
Type: Error
"The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server host/myserver.domain.com. This indicates that the password used to encrypt the Kerberos service ticket is different than that on the target server. Commonly, this is due to identically named machine accounts in the target realm (domain.com), and the client realm. Please contact your system administrator."
2. Operating Systems on which the issue has been seen: Windows 7, WS2008 R2, WS2012 R2
3. This can affect clients and servers (including domain controllers).
4. This problem specifically occurs after the affected machine has changed its password. It can take anywhere from a few minutes to a few hours after the change before the symptoms manifest.
So, if you suspect you have a machine with this issue, check when it last changed its password and whether this was around the time when the issue started.
This can be done using the repadmin /showobjmeta command.
repadmin /showobjmeta * "CN=mem01,OU=Workstations,DC=contoso,DC=com"
This command will get the object metadata for the mem01 server from all DCs.
In the output, check the pwdLastSet attribute and see if the timestamp is around the time you started to see the problem on this machine.
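If you prefer PowerShell, roughly the same check can be sketched with the ActiveDirectory module. This is only a sketch under a few assumptions: the DN is the mem01 example from above, and every DC queried must be reachable through Active Directory Web Services, which a Windows Server 2003 DC does not offer out of the box. For those DCs, repadmin remains the more general tool.

```powershell
Import-Module ActiveDirectory

# Assumption: the DN is the example object used above; substitute your own.
$dn = 'CN=mem01,OU=Workstations,DC=contoso,DC=com'

# Ask each DC in the current domain for its replication metadata on pwdLastSet.
foreach ($dc in Get-ADDomainController -Filter *) {
    Get-ADReplicationAttributeMetadata -Object $dn -Server $dc.HostName -Properties pwdLastSet |
        Select-Object Server, AttributeName, Version, LastOriginatingChangeTime
}
```

Compare LastOriginatingChangeTime across DCs with the time the symptoms started on the machine.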
Why this happens:
The Kerberos client depends on a “salt” from the KDC in order to create the AES keys on the client side. These AES keys are used to hash the password that the user enters on the client, and protect it in transit over the wire so that it can’t be intercepted and decrypted. The “salt” refers to information that is fed into the algorithm used to generate the keys, so that the KDC is able to verify the password hash and issue tickets to the user.
When a Windows Server 2012 R2 DC is promoted in an environment where Windows Server 2003 DCs are present, there is a mismatch in the encryption types that are supported on the KDCs and used for salting. Windows Server 2003 DCs do not support AES and Windows Server 2012 R2 DCs don’t support DES for salting.
You might be wondering why these encryption types matter. As computer hardware gets more powerful, older encryption methods become easier and easier to break. Thus, we are constantly incorporating newer, more powerful encryption into Windows and Kerberos in order to help protect your user passwords (and your data and your network).
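To see why the salt matters so much, here is a small illustration. It is not the full Kerberos key derivation: it only runs the PBKDF2-HMAC-SHA1 step that RFC 3962 uses inside the AES string-to-key function, and the function name, password, and salt strings are made up for the example. The point is simply that the same password with a different salt produces a completely different key.

```powershell
# Illustration only: the PBKDF2-HMAC-SHA1 step from RFC 3962,
# not the complete Kerberos AES string-to-key function.
function Get-DerivedKeyHex {
    param(
        [string] $Password,
        [string] $Salt,
        [int]    $Iterations = 4096   # default iteration count in RFC 3962
    )
    $utf8      = [System.Text.Encoding]::UTF8
    $pwBytes   = $utf8.GetBytes($Password)
    $saltBytes = $utf8.GetBytes($Salt)   # must be at least 8 bytes
    $kdf = New-Object System.Security.Cryptography.Rfc2898DeriveBytes -ArgumentList $pwBytes, $saltBytes, $Iterations
    # 32 bytes = 256 bits, the AES256 key size
    return [System.BitConverter]::ToString($kdf.GetBytes(32))
}

# Hypothetical password and salts, for illustration only.
$password = 'ExampleMachinePassword'
$key1 = Get-DerivedKeyHex -Password $password -Salt 'CONTOSO.COMhostmem01.contoso.com'
$key2 = Get-DerivedKeyHex -Password $password -Salt 'a-different-salt-value'
$key1 -eq $key2   # False: same password, different salt, different key
```

If the client and the KDC end up deriving keys from different salts, the keys no longer match, and the ticket cannot be decrypted.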
If users are having the problem:
Restart the computer that is experiencing the issue. This recreates the AES key as the client machine or member server reaches out to the KDC for the salt. Usually, this will fix the issue temporarily (at least until the next password change).
To prevent this from happening, please apply the hotfix to all Windows Server 2012 R2 domain controllers in the environment.
How to prevent this from happening:
Option 1: Query Active Directory for the list of computers that are about to change their machine account password, proactively reset their passwords against a Windows Server 2012 R2 DC, and then reboot them.
There’s an advantage to doing it this way: since you are not disabling any encryption type and keeping things set at the default, you shouldn’t run into any other authentication related issue as long as the machine account password is reset successfully.
Unfortunately, doing this will mean a reboot of machines that are about to change their passwords, so plan on doing this during non-business hours when you can safely reboot workstations.
We’ve created a quick PowerShell script that you can run to do this.
Sample PS script:
> Import-Module ActiveDirectory
> Get-ADComputer -Filter * -Properties PasswordLastSet | Export-Csv machines.csv -NoTypeInformation
This will get you the list of machines and the dates they last set their password. By default, machines reset their password every 30 days. Open the created CSV file in Excel and identify the machines that last set their password 28 or 29 days prior. (If you see a lot of machines with dates well beyond 30 days, it is likely that these machines are no longer active.)
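If you would rather not filter by hand in a spreadsheet, the same selection can be sketched in PowerShell. The helper name Select-ExpiringMachine and the 28 to 30 day window are our assumptions for this example; adjust the window if your machine account password age is not the 30-day default.

```powershell
# Hypothetical helper: passes through only the computers whose password age
# falls inside the given window (in days).
function Select-ExpiringMachine {
    param(
        [Parameter(ValueFromPipeline = $true)] $Computer,
        [int]      $MinAgeDays = 28,
        [int]      $MaxAgeDays = 30,
        [datetime] $Now = (Get-Date)
    )
    process {
        # Skip accounts that have never set a password.
        if ($null -eq $Computer.PasswordLastSet) { return }
        $ageDays = ($Now - $Computer.PasswordLastSet).TotalDays
        if ($ageDays -ge $MinAgeDays -and $ageDays -le $MaxAgeDays) { $Computer }
    }
}

# Usage against Active Directory (requires the ActiveDirectory module):
# Get-ADComputer -Filter * -Properties PasswordLastSet |
#     Select-ExpiringMachine |
#     Select-Object Name, PasswordLastSet
```

Machines older than the window are deliberately left out, since they are likely inactive rather than about to change their password.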
Once you have identified the machines that are most likely to hit the issue in the next couple of days, proactively reset their password by running the command below on those machines. You can use tools such as PsExec, System Center, or other utilities that allow you to remotely execute the command instead of logging in interactively to each machine.
nltest /SC_CHANGE_PWD:<DomainName> /SERVER:<Target Machine>
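If you have more than a handful of machines, a loop along these lines may help. It is a sketch only: targets.txt is a hypothetical file with one computer name per line, contoso.com stands in for your domain name, and we are assuming that nltest returns a nonzero exit code when the change fails, so review the captured output rather than trusting the exit code alone.

```powershell
# Assumptions: targets.txt (hypothetical) holds one computer name per line,
# and contoso.com stands in for your domain name.
$domain  = 'contoso.com'
$targets = Get-Content -Path .\targets.txt

foreach ($computer in $targets) {
    # Same nltest command as above, run once per target machine.
    $output = & nltest "/SC_CHANGE_PWD:$domain" "/SERVER:$computer" 2>&1
    [pscustomobject]@{
        Computer = $computer
        ExitCode = $LASTEXITCODE
        Output   = ($output -join ' ')
    }
}
```

Remember that each machine still needs a reboot after its password has been reset successfully.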
Option 2: Disable machine password change or increase duration to 120 days.
You should not run into this issue at all if password change is disabled. Normally we don’t recommend doing this since machine account passwords are a core part of your network security and should be changed regularly. However because it’s an easy workaround, the best mitigation right now is to set it to 120 days. That way you buy time while you wait for the hotfix.
If you go with this approach, make sure you set your machine account password duration back to normal after you’ve applied the hotfix that we’re working on.
Here are the relevant Group Policy settings to use for this option:
Computer Configuration\Windows Settings\Security Settings\Local Policies\Security Options
Domain Member: Maximum machine account password age:
Domain Member: Disable machine account password changes:
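To confirm that a machine has actually picked up these settings, you can read the Netlogon parameters that these two policies write to the registry. A minimal sketch, run locally on the machine you want to check:

```powershell
# Netlogon parameters backing the two "Domain member" policies above.
$path     = 'HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters'
$netlogon = Get-ItemProperty -Path $path

[pscustomobject]@{
    Computer              = $env:COMPUTERNAME
    MaximumPasswordAge    = $netlogon.MaximumPasswordAge      # in days
    DisablePasswordChange = $netlogon.DisablePasswordChange   # 1 = changes disabled
}
```

After the 120-day workaround is applied, MaximumPasswordAge should read 120. Once the hotfix is in place and the policy is reverted, it should return to your normal value.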
Option 3: Disable AES in the environment by modifying Supported Encryption Types for Kerberos using Group Policy. This tells your domain controllers to use RC4-HMAC as the encryption algorithm, which is supported in Windows Server 2003, Windows Server 2012, and Windows Server 2012 R2.
You may have heard that we had a security advisory recently to disable RC4 in TLS. Such attacks don’t apply to Kerberos authentication, but there is ongoing research in RC4 which is why new features such as Protected Users do not support RC4. Deploying this option on a domain computer will make it impossible for Protected Users to sign on, so be sure to remove the Group Policy once the Windows Server 2003 DCs are retired.
The advantage to doing this is that once the policy is applied consistently, you don’t need to chase individual workstations. However, you’ll still have to reset machine account passwords and reboot computers to make sure they have new RC4-HMAC keys stored in Active Directory.
You should also make sure that the hotfix https://support.microsoft.com/kb/2768494 is in place on all of your Windows 7 clients and Windows Server 2008 R2 member servers, otherwise they may have other issues.
Remember if you take this option, then after the hotfix for this particular issue is released and applied on Windows Server 2012 R2 KDCs, you will need to modify it again in order to re-enable AES in the domain. The policy needs to be changed again and all the machines will require reboot.
Here are the relevant group policy settings for this option:
Network Security: Configure encryption types allowed for Kerberos:
Be sure to check: RC4_HMAC_MD5
If you have Unix/Linux clients that use keytab files that were configured with DES, also enable: DES_CBC_CRC, DES_CBC_MD5
Make sure that AES128_HMAC_SHA1 and AES256_HMAC_SHA1 are NOT checked
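The checkboxes above end up as a single bitmask value. The sketch below decodes such a value back into the names used in the policy dialog, which can be handy when verifying what a machine actually received. The function name is ours, the bit values are the ones documented for msDS-SupportedEncryptionTypes, and the registry location in the last lines is where we understand this policy to be stored, so treat that path as an assumption to verify in your environment.

```powershell
# Hypothetical helper: decodes a supported-encryption-types bitmask into names.
function ConvertFrom-EncTypeMask {
    param([int] $Mask)
    $bits = [ordered]@{
        'DES_CBC_CRC'      = 0x1
        'DES_CBC_MD5'      = 0x2
        'RC4_HMAC_MD5'     = 0x4
        'AES128_HMAC_SHA1' = 0x8
        'AES256_HMAC_SHA1' = 0x10
    }
    foreach ($name in $bits.Keys) {
        if ($Mask -band $bits[$name]) { $name }
    }
}

ConvertFrom-EncTypeMask -Mask 0x4   # RC4_HMAC_MD5
ConvertFrom-EncTypeMask -Mask 0x7   # DES_CBC_CRC, DES_CBC_MD5, RC4_HMAC_MD5

# Assumed policy location; prints nothing if the policy is not configured.
$key   = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Kerberos\Parameters'
$value = (Get-ItemProperty -Path $key -ErrorAction SilentlyContinue).SupportedEncryptionTypes
if ($null -ne $value) { ConvertFrom-EncTypeMask -Mask $value }
```

For this workaround, the decoded list should contain RC4_HMAC_MD5 (plus the DES types if you enabled them) and neither of the AES entries.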
Finally, if you are experiencing this issue please revisit this blog regularly for updates on the fix.
- The Directory Services Team
Thanks for posting this. We are currently affected by this after a 2003 to 2012R2 AD upgrade. Note the article title suggests this happens only when 2003/2012R2 DCs are mixed, however the problem continues even AFTER the 2003 DCs have been removed from the domain. We are now only running 2012R2 DCs on 2012R2 FL, and the problem remains.
It's also worth highlighting that when a DC itself is affected by the issue, it stops sharing SYSVOL and clients can no longer talk to it, nor can you log onto the DC with any account. If you have 2 DCs and they reset at the same time, say over a weekend, no one in the domain can log in on Monday morning. We are also seeing machines happening to reset their password whilst the user has left it on a locked session. They are then unable to log back in when they return, and are unfortunately losing work. I'm sure you are aware of these additional observations, but just thought it worth mentioning in case not.
Hoping for a fix very soon!
David - appreciate the post. Had this exact issue about a month ago and MSFT had me increase secure channel expiration as you said. Glad to see it wasn't just me, and that Microsoft has recognized the issue and is working on a fix.
What about 2012 R1 (one) DCs? Is it just R2 that does not use DES for the salt?
What if this issue has affected the CNO object in your cluster? I have tried simulating a failure and repairing, and resetting the password. I also tried taking the whole cluster down, with no difference.
Right now the properties of the CNO states the following:
DNS Status: The handle is invalid
Kerberos Status: When trying to update a password, this return status indicates that the value provided for the new password contains values that are not allowed in passwords.
Also, when trying to run nltest against the CNO I just get access denied.
I believe that Premier Field Engineering has a Best Practices Analyzer that they've developed for AD. It seems like it would be a really good idea to have DcPromo run this as its first step. And as my colleague Brian points out, why doesn't AD negotiate a mutual cipher suite between client and DC in this case? Either don't allow incompatibilities to happen, or test and warn before creating such a situation.
There was a similar issue in mixed environments of 2003 and 2008 R2 DCs when a 2003 server holds the FSMO roles (KDC Event ID 27). Both issues look similar to me. If they are different, please explain how.
I have a current ticket open with Microsoft right now. Opened it on 7/19. My TAM sent me this post this morning. We decommissioned our last 2003 R2 domain controller in June 2014. We introduced 2012 R2 domain controllers in April 2014. We've had 2008 R2 domain controllers for years. We have 2008 R2 and 2012 R2 member servers that we cannot log in to after the password reset. We have to log on locally and reboot the server. We have about 5 servers per day in an 1800 server environment. It's getting pretty painful. We were wondering if we raised our Forest Functional Level to 2008 R2 that maybe the issue would go away. Hasn't happened with 2003/R2 member servers or our Win7 workstations. But our standard troubleshooting at the service desk could be masking workstation issues. We have reproduced the issue in one of our lab environments. The issue is getting painful, and the problem is starting to show up on our 2012 R2 Hyper-V hosts.
Any chance this could cause similar issues on a 2008 RTM / 2012 R2 combo if the Domain Functional Level is 2003? We elevated to 2008 after reading this post because we were having some bizarre, uncharacteristically long logon times from our servers after bringing a 2012 R2 DC into the mix.
Would this affect a trust between a 2012 R2 domain and 2003, or 2008 R2 for that matter?
Rob Greene from Directory Services. Let me see if I can answer a few of these questions for the group:
This problem is very specific - You have to have Windows Server 2003 domain controllers in the domain, and then start to move to Windows Server 2012 R2 domain controllers to replace the 2003 domain controllers.
Windows Server 2012 domain controllers are not affected by this problem, in which DES is not supported and therefore no salt is created.
We have not heard of any issues currently with Cluster CNO's. I know that Windows Server 2008 - 2008 R2 clusters CNO's actually do not support AES encryption and only support RC4_HMAC_MD5. Not sure actually for Windows Server 2012 Clusters if that is what you
As far as your comment about running NLTest against the CNO what exactly are you trying to do with the CNO object using NLTest?
As far as needing to do something in DCPromo: this is not a DCPromo problem, this is a Kerberos authentication problem. It is being worked on by the development team as a bug and is going to be fixed. Please bear with us while this process happens; they need time to thoughtfully plan the fix, and then test to verify that it is not going to cause more issues.
The Kerberos client and the KDC are actually negotiating the authentication. The KDC searches for and uses the strongest encryption type supported by both itself and the end service, and AES is the strongest encryption type on 2008 and higher computers.
For those of you who have decommissioned your last Windows Server 2003 domain controller after adding Windows Server 2012 R2 DCs, all that you should need to do is force the password change one time with the 2012 R2 domain controllers. According to our Escalation Engineers, the issue can actually take up to two password changes to appear and then clear up.
This problem is not going to cause a slow logon problem. This issue causes a NO LOGON issue. If you are experiencing a slow logon problem, you might want to determine whether it is an issue with connectivity to the domain, group policy issues, or logon script issues.
In theory this should not affect trusts as long as the trusts are not configured to use AES encryption. By default they should be using RC4_HMAC_MD5. In Windows Server 2008 and higher you can modify the trust to support AES encryption; however, I do not believe that Windows Server 2003 will know how to handle that trust password (inter-realm ticket).
Thanks for this information!
Hi Rob, that's very useful information, thank you for posting. Unfortunately, Support are giving much conflicting information over this. We are running 2012R2 only, and only this week I have been advised we will continue seeing the issue every 30 days, and that we still need to implement the workarounds. Typically, I implemented option 2 only last night, before reading your post this morning!
We are facing similar issues, coming from a 2003 domain, adding 2012R2 DC's.
I saw Kerberos authentication failing because the newer clients use the AES encryption type for Kerberos and the domain was still running 2003 DFL (supporting DES and RC4), so "encryption type not supported" was logged and RC4 was used. My assumption was that raising the DFL to 2008 would fix this issue because the AES enctype is supported.
We have failover clusters running 2012, RDS servers running 2008R2 and DCs running 2012R2.
If the file share resource runs on one node we can successfully browse and access the file share; when we fail over the resource we get access denied. What is strange is the Security-Kerberos ID 4 error KRB_AP_ERR_MODIFIED from the server "nodename". The target name used was cifs/filesharename.domain.local.
Yes, we have the correct SPNs. We even tried rejoining the domain and repairing the cluster/file server resource, and the same SPNs get registered. The same goes when compared to other similarly configured environments, where we haven't seen these errors so far.
Should we open a support case to receive this hotfix, or will it be publicly available?