Microsoft's official enterprise support blog for AD DS and more
http://www.economist.com/node/18529895
And before you disagree, examine all paper mail coming from your company to see if it includes the same disclaimer…
- Ned “hostile witness” Pyle
Hi folks, Ned here again. Customers periodically ask us for a rumored replacement for the Windows 2000 acctinfo.dll that works on 64-bit Windows 7 and Windows Server 2008 R2. That old DLL added an extra tab to the Active Directory Users and Computers snap-in to centralize some user account info:
Pretty cool. You can see account lockout status, last logon time, and other goo.
Even though that newer unsupported acctinfo2.dll file exists - yes, even in x64 format - there is a supported way to see this info as long as your admins use Win7 + RSAT or Win2008 R2.
Windows Server 2008 R2 introduced a new service called the Active Directory Management Gateway (also called the AD Web Service). AD PowerShell uses this component as a sort of proxy into Active Directory. This is fine if you are native Windows Server 2008 R2 and Windows 7, but for most companies, that's going to take a while. While this includes quasi-web traffic - the .Net Negotiated Stream protocol - we sign and seal all LDAP traffic. You can further secure the traffic by deploying Domain Controller or Domain Controller Auth certificates - something that happens automatically with the deployment of an Enterprise PKI in your domain.
Knowing this, we released an out of band version of this service - you can grab it here. Install it and its support files on Windows Server 2003 or Windows Server 2008 and your shiny new Win7 clients can run AD PowerShell even if you don't have a single Win2008 R2 server in the forest. It is fine to install the service on only a few DCs as well - as long as those are up and routable to Win7 clients, you’re good to go; not all DCs need to run the service. Your Windows 7 clients will find the updated servers automatically, or you can point to them specifically.
The Active Directory Administrative Center is another new component introduced by Windows Server 2008 R2. Many admins gave it a glance, thought to themselves "another ADUC, why bother?", and went back to their familiar old tool. If you like acctinfo.dll though, you should like ADAC.
With Win7 RSAT installed and the AD tools enabled (or RDP'ed into your Win2008 R2 servers for AD administration), run DSAC.EXE. You'll see this:
You can browse to users or search for them. I'll search for Sarah Davis:
I open the user and so far, it looks pretty much like ADUC.
It saved me a click of the "Account" tab to see things like logon name and password options. However, I want all that account status goo!
Say, what does that little nubbin arrow icon down in the lower left do?
<Clickity click>
Blammo!
Compare that to the old acctinfo.dll. :) Password info, lockout info, and logon info. Even the SID and GUID. There are a few things missing like "account unlocks at such and such date" but their usefulness is questionable - if you care that someone is locked out, you mostly care about seeing when they were locked out and unlocking them now. Acctinfo2.dll doesn't show that either, btw. Moreover, if you want to change someone's password on a certain DC, just retarget to that DC. Better yet, always target the PDCE; DCs contact the PDCE after a failed logon for another try, as it is likely to have the latest password from so-called urgent replication. In addition, account lockouts should be the most up to date on a PDCE.
Oh, and stop using account lockout policy anyway, it's vile. Use complex passwords and get ACS to track for brute force bad password attempts. Turning on account lockouts is a way to guarantee someone with no credentials can deny service to your entire domain. Locking out your boss' account will be a very convincing demonstration… mmm, maybe just a test admin account. :)
Need to see the “Attribute Editor” extension on RSAT client versions of ADAC to get more "ADUC parity" at examining users? You can enable that tab by adding the {c7436f12-a27f-4cab-aaca-2bd27ed1b773} displayspecifier with these steps (again, using my sample domain cohowinery.com as the example):
1. Logon as an enterprise admin.
2. Run ADSIEDIT.MSC and connect to the Configuration Partition.
3. Navigate to:
CN=user-Display,CN=409,CN=DisplaySpecifiers,CN=Configuration,DC=cohowinery,DC=com
Note: 409 is "US English". You will need to repeat all these steps for any other languages used in your environment.
4. Edit the user-Display container.
5. Edit the adminPropertyPages attribute.
6. Determine the highest index value listed. For example in mine, it was 10.
7. Add the GUID {c7436f12-a27f-4cab-aaca-2bd27ed1b773} set with the next highest index number. For example, here I added:
11,{c7436f12-a27f-4cab-aaca-2bd27ed1b773}
8. Close adsiedit. Replicate this value to all DCs in the forest through natural convergence or forcing with repadmin.exe.
9. Once done replicating to all DCs, anytime you start DSAC.EXE all users will show the Attribute Editor.
Note: before you get all froggy about these steps - if you end up using acctinfo2.dll you will have to modify its displaySpecifier data as well. :)
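If you would rather not eyeball the index in steps 6 and 7, here is a rough Python sketch of that bit of arithmetic. It assumes you have already copied the current adminPropertyPages values out of ADSIEDIT as plain "index,{GUID}" strings; the function name and the sample GUIDs are made up for illustration, and it does not touch the directory at all.

```python
# GUID of the Attribute Editor property page, taken from the steps above.
ATTRIBUTE_EDITOR_GUID = "{c7436f12-a27f-4cab-aaca-2bd27ed1b773}"


def next_property_page_entry(existing_values, guid=ATTRIBUTE_EDITOR_GUID):
    """Return the 'index,{GUID}' string to add, or None if the GUID is already listed.

    existing_values is a list of strings copied from adminPropertyPages,
    each expected to look like '10,{some-guid}'.
    """
    highest = 0
    for value in existing_values:
        index_text, _, page = value.partition(",")
        if guid.lower() in page.lower():
            # The page is already registered; nothing to add.
            return None
        if index_text.strip().isdigit():
            highest = max(highest, int(index_text.strip()))
    return f"{highest + 1},{guid}"


if __name__ == "__main__":
    # Hypothetical existing values; the real GUIDs in your forest will differ.
    sample = [
        "1,{00000000-0000-0000-0000-000000000001}",
        "10,{00000000-0000-0000-0000-00000000000a}",
    ]
    print(next_property_page_entry(sample))
```

Paste the returned string into the attribute editor as in step 7. If the function returns None, the extension is already there and you can skip to replication.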
Update 12/10/2012
Your feedback in the Comments was heard in Windows Server 2012. ADAC now also shows Password Last Set and Password Expiration info:
You will find a great many copies of acctinfo2.dll floating around, but none hosted on Microsoft websites (we never released it publicly, it was just a side-project for a Support engineer here in Charlotte). Before you install those, consider this: you plan to load a DLL from some random place on the Internet into one of your most powerful AD admin tools, and then run that tool as a Domain Admin. And you have no way to know if that's some leaked MS version of the file or one adulterated by hackers.
Still sound like a good plan?
If you absolutely must use this DLL, you should only get it by contacting MS through a support case. If I wrote malware, an admin-only file injected into AD management tools would be my first choice for pwnage.
Until next time.
Ned "the guy that wrote acctinfo2.dll has ostrich legs" Pyle
Hi all, Ned here again. We usually get asked for a more portable version of our multi-part blog posts so - for once - I am creating it before the yelling starts. Chris’ “Designing and Implementing a PKI” series is included below in a few common file formats:
Along with the entire web series here:
Thanks for all the pleasant comments, Chris appreciated them.
- Ned “not Ned, Chris Delay” Pyle
Sean again, here for Part 3 of the Advanced Group Policy Management (AGPM) blog series, following the lifecycle of a Group Policy Object (GPO) as it transitions through various AGPM-related events. In this installment, we investigate what takes place when you check-in a controlled GPO.
Before you edit an AGPM-controlled GPO, you check it out. Similarly, after editing the GPO, you check it in before the changes are deployed to production. Many of the same failure points exist for both the check-out and check-in processes. Network communications during the restore can drop, leaving the production GPO only partially updated. Disk corruption can cause the Archive copy of the GPO to fail to restore correctly. The AGPM service account could fail to authenticate when attempting to perform the requested operation. We use the same tools to collect data for these blog posts and to troubleshoot most issues affecting AGPM operations.
In Part 1 of this series (Link), we introduced AGPM and followed an uncontrolled “Production” GPO through the process of taking control of it with the AGPM component of the Group Policy Management Console (GPMC). If unfamiliar with AGPM, I would recommend you refer to the first installment of this series before continuing.
Part 2 of the series (Link) continued the analysis of this GPO as it was Checked-Out using AGPM. We revealed the link between AGPM controlled GPOs and the AGPM Archive as well as how AGPM provides for offline editing of GPOs. If you haven’t read Part 2, I recommend doing that now.
Environment Overview:
The environment has three computers: a domain controller, a member server, and a client.
For additional information regarding the environment and tools mentioned below, please refer to Part 1 of this series (Link).
Getting Started:
We start out on our Windows 7 computer, logged in as our AGPM Administrator account (AGPMAdmin). We need GPMC open, and viewing the Change Control section, which is the AGPM console. We are using the “Dev Client Settings” GPO from the previous blog post so let’s review the GPO details:
We also want to log into the AGPM Server and the Domain Controller and start the data capture from each of the tools mentioned in the previous section.
Picking up where we left off from the previous blog post, we now have our GPO checked out and modified with some new settings. When we’ve made the desired changes to the Group Policy Object, we close the Editor and return to the AGPM Console. In order to check it back in, we right-click the GPO in the AGPM console and select the “Check In…” option. We have the option to enter a comment for the check-in operation. The red-outlined GPO icon returns to normal once checked back in.
The AGPM Client
As we might expect, Network Monitor shows traffic is mainly between the AGPM Client and AGPM Server. It is TCP traffic between the client and port 4600 on the AGPM Server.
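Since everything the client does rides on that one TCP connection, a quick reachability check is a handy first step when a Check-In hangs. The following is a minimal Python sketch, not part of AGPM; port 4600 comes from the trace above, while the function name and the server name in the example are placeholders you would swap for your own.

```python
import socket


def can_reach(host, port=4600, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "agpmserver.example.com" is a hypothetical name for the AGPM Server.
    print(can_reach("agpmserver.example.com"))
```

A False result only tells you the port is not reachable from where you ran the check; it does not distinguish a stopped AGPM service from a firewall in the way.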
Process Monitor shows MMC writing to the AGPM.log file, but otherwise has few entries that relate to the Check-In process. As before, this shows the AGPM client does not perform any of the operations on the GPO itself. It simply relays the instructions to the AGPM Server.
There were no entries generated in the GPMC log during the Check-In operation. Considering the only entries in the log pertained to the startup of GPMC, these actions within the AGPM console obviously do not flag any GPMC logging events.
The AGPM.log shows nearly identical information in the Check-In operation as it did in the Check-Out. The AGPM Client contacts the AGPM Server and notifies it of incoming instructions. When the AGPM Server is ready, the AGPM Client sends the instructions and awaits return information. Once the AGPM Server returns the resulting data the function exits successfully.
AGPM Server
We covered the AGPM client network traffic in the previous section. Once the AGPM client gives instructions to the AGPM server, that server opens an LDAP connection to the Domain Controller. The AGPM server accesses the checked out GPO information within Active Directory and SYSVOL. While we can’t see exactly what’s being read from the directory, we do see the SMB traffic as the AGPM server reads the information from SYSVOL.
Process Monitor shows quite a lot of activity from the Agpm.exe process. It starts out by looking up the AGPM Archive path from the registry, and accessing gpostate.xml to determine the status of the GPO.
Within the gpostate.xml, each GPO has its status and check-in history listed.
The "agpm:type" entry indicates the “CHECKED_OUT” status, the time of the operation, the comment entered during the check-out operation and the SID of the user performing the operation. This is also where the reference to the "agpm:offlineId" is found, which is the Offline GPO's GUID created during the Check-Out process.
The AGPM process then looks to the manifest.xml file, which contains entries for every time a GPO was backed up to the AGPM Archive. From Part 1 of this blog series, we learned taking control of a production GPO initiated a backup of that production GPO into the AGPM Archive. At this point, AGPM.exe uses the manifest.xml to check the current backup status.
Next, we see the AGPM server read the SYSVOL folder for the Offline GPO, and start verifying the folder structure within the AGPM archive matches.
AGPM then copies files from the GPO’s SYSVOL folders to their corresponding location in the AGPM Archive path. Here we see the copy of the Computer Configuration registry settings file.
Once copied, AGPM updates the manifest.xml and bkupInfo.xml files within the GPO's Archive folder.
Where the bkupInfo.xml file contains the information of the GPO it has created, manifest.xml has a copy of that same information for every GPO in the Archive. The following is the bkupInfo.xml for the GPO check-in.
AGPM updates Backup.xml with the modified GPO’s security settings, as well as any new GP Extensions required. GPreport.xml contains all of the settings within the checked out GPO.
Now that the checked out and modified GPO is backed up to the Archive, the gpostate.xml file is updated to reflect the new “CHECKED_IN” status of the GPO. Notice the AGPM Archive path has changed from {85B77C99-1C4B-473C-A4E5-0AF10DD552F9} to {CD595C25-5EC6-4653-8E24-0E640588C654}.
It’s important to note what we do not see here: AGPM does not write the modified GPO to SYSVOL under the production GPO's GUID {01D5025A-5867-4A52-8694-71EC3AC8A8D9}. This is evidence that checking in a GPO we modified in AGPM does not commit the changes to production. In order to do that, we must ‘Deploy’ the GPO within AGPM.
The gpmgmt.log entries from the Check-In operation mirror much of what we saw in Process Monitor. AGPM backs up the Offline GPO to a newly created Archive path, and then updates gpostate.xml, bkupInfo.xml and manifest.xml to associate the production GPO with the new path.
The AGPMserv.log has a very limited view of the process, simply recording that the GPO Check-In function “CheckInGPO()” was called.
The Domain Controller
We’ve already covered the network traffic between the AGPM Client and Server and the Domain controller, so let’s move on to the Process Monitor output. Similar to the activity during the Check-Out operation, lsass.exe is accessing the Active Directory database, pulling the GPO information from the corresponding GP Container.
The security event log should have events correlating to the removal of the Offline GPO. Look for Event ID: 5136.
In Closing
In this third installment, I covered part of a procedure repeated every time there’s need to modify a GPO within AGPM. To rehash from Part 2 of this blog series, during the Check-Out of a GPO, the following steps are performed:
The Archive copy of the GPO is copied to a temp folder.
During the Check-In process, we have observed the following:
A new Archive path is created with a new GUID
From this information, we can make a few important connections: any changes made to an AGPM-controlled GPO outside of the AGPM console (i.e. the rogue Domain Admin that doesn’t bother with the AGPM console, and edits the GPO directly through GPMC.msc) are overwritten the next time the GPO is deployed from the AGPM console. Since the Check-Out procedure builds the editable “Offline” GPO from the AGPM Archive data, the Admin’s changes are not included automatically. We do have the option of using the “Import from…” feature to pull the settings from the production GPO again prior to the Check-Out, which updates the Archive data with any changes made outside of AGPM. As mentioned earlier, the Check-In operation does NOT commit the changes to the production GPO. We must follow the Check-In operation with a “Deploy” in order to have our changes released to production.
Complete series
http://blogs.technet.com/b/askds/archive/2011/01/31/agpm-production-gpos-under-the-hood.aspx
http://blogs.technet.com/b/askds/archive/2011/04/04/agpm-operations-under-the-hood-part-2-check-out.aspx
http://blogs.technet.com/b/askds/archive/2011/04/11/agpm-operations-under-the-hood-part-3-check-in.aspx
http://blogs.technet.com/b/askds/archive/2011/04/26/agpm-operations-under-the-hood-part-4-import-and-export.aspx
Sean "my head will not shift when stored in the overhead compartment" Wright
Hey all, Ned here again. Jeff Sigman let me know that the new pre-beta version of Security Compliance Manager became available last month. It adds the number one feature request you’ve all been demanding: GPO Import.
Remember, this is a CTP release so keep it in test and out of production for now. If you are scratching your head at what SCM does and why you should be using it, check out this and this. Really!
- Ned “SCMbag” Pyle
Hi folks, Ned here again. Around six years ago we released Service Pack 1 for Windows Server 2003. Like Windows XP SP2, it was a security-focused update. It was the first major server update since the Trustworthy Computing initiative began so there were things like a bootstrapping firewall, Data Execution Prevention, and the Security Configuration Wizard.
Amongst all this, the RPC developers added these new configurable group policy settings:
Computer Configuration \ <policies> \ Administrative Templates \ System \ Remote Procedure Call
Restrictions for unauthenticated RPC clients
RPC endpoint mapper client authentication
Which map to the DWORD registry settings:
HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\Rpc
EnableAuthEpResolution
RestrictRemoteClients
These two settings add an additional authentication "callback capability" to RPC connections. Ordinarily, no authentication is required to make the initial connection to the endpoint mapper (EPM). The EPM is the network service that tells a client what TCP/UDP ports to use in further communications. In Windows, those further communications to the actual application are what typically get authenticated and encrypted. For example, DFSR is an RPC application that uses RPC_C_AUTHN_LEVEL_PKT_PRIVACY with Kerberos required, with Mutual Auth required, and with Impersonation blocked. The EPM connection not requiring authentication is not critical, as there is no application data transmitted: EPM is like a phone book or perhaps more appropriately, a switchboard with an operator.
That quest for Trustworthy Computing added these extra security policies. In doing so, it introduced a very dangerous scenario for domain-based computing: one of the possible policy settings requires all applications that initiate the RPC conversation send along this authentication data or be able to understand a callback request to authenticate.
The problem is most applications have no idea how to satisfy the setting's requirements.
One of the options for Restrictions for unauthenticated RPC clients is "Authenticated without Exceptions".
When enabled, RPC applications are required to authenticate to the RPC service on the destination computer. If your application doesn't know how to do this, it is no longer allowed to connect at all.
Which brings us to…
Having configured this policy in your domain on your DCs, members, and clients, you will now see the following issues no matter your credentials or admin rights:
GPUPDATE /FORCE returns:
The processing of Group Policy failed. Windows could not resolve the computer name. This could be caused by one of more of the following: a) Name Resolution failure on the current domain controller. b) Active Directory Replication Latency (an account created on another domain controller has not replicated to the current domain controller). Computer Policy update has completed successfully. To diagnose the failure, review the event log or invoke gpmc.msc to access information about Group Policy results.
The System Event log returns errors 1053 and 1055 for group policy:
The processing of Group Policy failed. Windows could not resolve the user name. This could be caused by one of more of the following: a) Name Resolution failure on the current domain controller. b) Active Directory Replication Latency (an account created on another domain controller has not replicated to the current domain controller).
The Group Policy Operational event log will show error 7320:
Error: retrieved account information. Error code 0x5. Error: Failed to register for connectivity notification. Error code 0x32.
Repadmin.exe returns:
DsBindWithCred to RPC <servername> failed with status 5 (0x5)
DSSites.msc returns:
Directory Service event log returns:
Warning 1655: Active Directory Domain Services attempted to communicate with the following global catalog and the attempts were unsuccessful. Global catalog: \\somedc.cohowineyard.com The operation in progress might be unable to continue. Active Directory Domain Services will use the domain controller locator to try to find an available global catalog server. Additional Data Error value: 5 Access is denied.
Error 1126:
Active Directory Domain Services was unable to establish a connection with the global catalog. Additional Data Error value: 1355 The specified domain either does not exist or could not be contacted. Internal ID: 3200e7b
Warning 2092:
This server is the owner of the following FSMO role, but does not consider it valid. For the partition which contains the FSMO, this server has not replicated successfully with any of its partners since this server has been restarted. Replication errors are preventing validation of this role. Operations which require contacting a FSMO operation master will fail until this condition is corrected.
Changing the primary domain DNS name of this computer to "" failed. The name will remain "<something>". The error was: Access is denied
After failed join above, rebooting computer and attempting a domain logon fails with error:
The security database on the server does not have a computer account for this workstation trust relationship.
Win32: Access is denied.
You do not have sufficient permissions to complete the operation
You do not have access rights to logical disk manager
Either the machine does not exist or you don't have permission to access this machine
Domain Controller is unreachable
Cannot access the local WMI repository
Cannot connect to reporting DCOM server
DFSR Event log error 1202:
The DFS Replication service failed to contact domain controller to access configuration information. Replication is stopped. The service will try again during the next configuration polling cycle, which will occur in 60 minutes. This event can be caused by TCP/IP connectivity, firewall, Active Directory Domain Services, or DNS issues.
error: 160 (one or more arguments are not correct)
"Unable to connect to the Primary DC's AD. Please make sure that the PDC is reachable and retry the command later"
Could not bind to a Domain Controller. Will try again at next polling cycle.
You do not have the correct permissions to open the Windows Firewall with Advanced Security Console. Error code: 0x5
Connection to the Virtual Disk Service failed. A VDS (Virtual Disk Service) error occurred while performing the requested operation.
Access is denied.
The Windows Server Backup engine is not accessible on the computer that you want to manage backups on. Make sure you are a member of the Administrators or Backup Operators group on that computer.
Access is Denied
Note how the client (10.90.0.94) attempts to bind to the EPM on a DC (10.90.0.101) and gets rejected with status 0x5 (Access is Denied).
Depending on the calling application - in this case, the Group Policy service running on a Win7 client that is trying to refresh policy - it may continue to try binding many times before giving up. Again, the DC responds with the unhelpful error "REASON_NOT_SPECIFIED" and keeps rejecting the GP service.
For comparison, a normal working EPM bind of the GP service looks like this:
Anyone notice the Catch-22 above? If you deployed this setting using domain-based group policy to your DCs, you have no way to undo it! This is another example of “always test security changes before deploying to production”. Many virtualization products are free, like Hyper-V and Virtual PC – even a single virtualized DC environment would have shown gross problems after you tried to use this policy.
To fix your environment:
1. You must delete or unlink the whole policy that includes this RPC setting:
2. Delete or rename this specific policy's GUID folder from each DC's SYSVOL folder (remember, file replication is not working so it must be done on all individual servers).
3. Manually visit all DCs and delete the RestrictRemoteClients registry setting.
4. Reboot all DCs to get your domain back in operation. Not all at once, of course!
These are only the affected Windows in-box applications and components that I have identified. The full list probably includes 99% of all third party RPC applications ever written.
Some security audit consulting company may ask you to turn this policy on to be compliant with their standards. Make sure you show them this article and make them explain why. You can also point out that our Security Compliance Manager tool does not recommend enabling "Authenticated without Exceptions" even in Specialized Security Limited Functionality networks (and SSLF is far too restrictive for most businesses). This setting is really only useful in an unmanaged, standalone, non-domain joined member computer environment such as a DMZ network where you want to close an RPC connection vector. Probably just web servers with local policy.
You should always get an in-depth explanation of any third-party security audit's findings and recommendations; many a CritSit case here started with a customer implicitly trusting an auditor's recommendations. That auditor is not going to be there to troubleshoot for you when everything goes to crap. Disconnecting all your DCs from the network makes them more secure. So does disabling all your user accounts. Neither is practical.
If you absolutely must turn on Restrictions for unauthenticated RPC clients, make sure it is set only to "Authenticated", and guarantee RPC endpoint mapper client authentication is also enabled. Then test like your job depends on it - because it does. Your applications may still fail with this setting in its less restrictive mode. Not all group policies are intended for domains.
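As a sanity check on that advice, here is a small Python sketch that flags the risky combinations. It works on a plain dictionary of the two registry values rather than reading the registry itself. The numeric mapping (0 = None, 1 = Authenticated, 2 = Authenticated without exceptions) is my understanding of the RestrictRemoteClients values and is not spelled out in this post, so verify it against the policy documentation; the function name is made up.

```python
# Assumed mapping of RestrictRemoteClients DWORD values to policy options.
RESTRICT_REMOTE_CLIENTS = {
    0: "None",
    1: "Authenticated",
    2: "Authenticated without exceptions",
}


def assess_rpc_policy(values):
    """Return a list of warnings for the given RPC policy registry values.

    values is a dict such as {"RestrictRemoteClients": 1, "EnableAuthEpResolution": 1}.
    Missing keys are treated as "not configured".
    """
    findings = []
    restrict = values.get("RestrictRemoteClients")
    ep_auth = values.get("EnableAuthEpResolution")

    if restrict == 2:
        findings.append(
            "RestrictRemoteClients is 'Authenticated without exceptions': "
            "expect domain operations to fail."
        )
    elif restrict == 1 and ep_auth != 1:
        findings.append(
            "RestrictRemoteClients is 'Authenticated' but "
            "EnableAuthEpResolution is not enabled."
        )
    return findings


if __name__ == "__main__":
    for warning in assess_rpc_policy({"RestrictRemoteClients": 2}):
        print(warning)
```

An empty list means the sketch found nothing to complain about; it does not mean your applications will work, so the testing advice above still applies.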
By the way, if you are a software development company you should be giving the Security Development Lifecycle a frank appraisal. It is a completely free force for good.
Ned "2005? I am feeling old" Pyle
Ned here. The Remote Server Administration Tools update to support Windows 7 Service Pack 1 has been released. Come and get it:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=7d2f6ad7-656b-4313-a005-4e344e43997d
- Ned “all complaints go here” Pyle
The series:
Designing and Implementing a PKI: Part I Design and Planning
Designing and Implementing a PKI: Part II Implementation Phases and Certificate Authority Installation
Designing and Implementing a PKI: Part III Certificate Templates
Designing and Implementing a PKI: Part IV Configuring SSL for Web Enrollment and Enabling Key Archival
Designing and Implementing a PKI: Part V Disaster Recovery
Chris here again. We are now going to move onto Disaster Recovery. One of the many tasks you want to complete during the planning phase is to plan for disaster recovery. When planning for disaster recovery not only is the backup/restore process important, but the actual design of the PKI can affect how resilient your PKI infrastructure is. Additionally, proper planning can alleviate the impact of a system failure.
When the system hosting Certificate Services becomes unusable due to a failure, there are a few consequences of that failure.
1. The CA can no longer sign its Certificate Revocation List (CRL) or delta CRL (dCRL).
2. The CA can no longer issue certificates.
3. The CA database includes a record of certificates that have been issued or revoked, and is unavailable until the CA is recovered.
CRLs and delta CRLs are used by clients to determine if a certificate has been revoked. In general, applications will fail when they cannot determine the revocation status for a certificate, though some applications have the ability to disable revocation checking while others do not.
Like certificates, CRLs and delta CRLs have a period during which they are valid. Once the CRL and/or delta CRL expires, an application checking the revocation status of a certificate against the expired CRL will fail. The point of this discussion is that typically the first impact you will see when a Certification Authority fails is the inability of applications to check the revocation status of any certificates.
When you design and implement a PKI you configure the validity period of the CA’s CRL and delta CRL. This design consideration has an impact in terms of disaster recovery. The maximum time you have after a CA failure to institute your recovery process without impacting certificate validation is determined by these settings.
Example 1. You have an issuing Certification Authority and it is publishing a base CRL once every 7 days and delta CRL once every day. You have approximately 24 hours since the last delta CRL was published to either restore the CA or re-sign the delta CRL before certificate validation starts failing.
Example 2. You have an issuing Certification Authority and it signs a CRL once every 7 days, but is not configured to publish a delta CRL. In this scenario you have 7 days – (the number of days since the base CRL was signed) before validation will begin to fail due to the inability to check revocation status against a valid CRL.
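The arithmetic in the two examples can be sketched in a few lines of Python. This is a simplified model: it assumes each CRL is valid for exactly its publication interval and ignores any overlap period or clock skew allowance the CA may add, and the function name and dates are made up for illustration.

```python
from datetime import datetime, timedelta


def time_until_validation_fails(now, published_crls):
    """Return how long remains before the first CRL in the list expires.

    published_crls is a list of (published_at, validity) tuples, one for
    each CRL (base and, if used, delta) that clients must be able to check.
    """
    expirations = [published_at + validity for published_at, validity in published_crls]
    remaining = min(expirations) - now
    # Never report a negative window; zero means validation is already failing.
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    now = datetime(2011, 4, 4, 12, 0)

    # Example 1: base CRL every 7 days, delta CRL every day.
    base = (datetime(2011, 4, 1, 12, 0), timedelta(days=7))
    delta = (datetime(2011, 4, 4, 6, 0), timedelta(days=1))
    print(time_until_validation_fails(now, [base, delta]))

    # Example 2: base CRL every 7 days, no delta CRL.
    print(time_until_validation_fails(now, [base]))
```

With a delta CRL in play, the shorter-lived delta sets the deadline, which is why Example 1 leaves you roughly a day at most while Example 2 leaves whatever remains of the 7 days.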
Mitigation
There are several ways that you can minimize the impact that a CA failure will have on certificate validation.
One way is to install a clustered issuing certification authority. If the active node of the cluster fails the CA can be failed over to the second node. Clustering, however, will not protect against the failure of a shared component such as storage or a Hardware Security Module (HSM). So these devices should have methods to provide failover as well, if possible.
Another option is to increase the base and delta CRL publication intervals (and hence, their validity periods). This can potentially give you more time to kick off your recovery process, but if the CA fails shortly before the new base or delta CRL is about to be published, increasing the publication interval has done little good. One must also realize there is a trade-off involved here. Increasing the publication interval means that it will take longer for certificate consumers to become aware that a certificate has been revoked and added to the CRL.
A more complicated strategy is to set the automatic publishing interval to a longer period, and then manually publish the CRL more often. In other words you set the CRL publication interval to 7 days, and then publish a new CRL every day. This way, if the CA fails you have 6 or 7 days recognize the problem and start your recovery process. The Windows CA does not automatically publish CRLs in this fashion, but you can set up a scheduled task on the CA server to publish the CRL every 24 hours using the command line utility, certutil.exe. The command certutil -crl will will instruct the CA to publish a new base CRL with the validity period defined in the CA configuration.
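As a rough sketch of the scheduled task idea, the Python below only assembles a schtasks.exe command line that would run certutil -crl once a day. The task name and start time are hypothetical placeholders, and nothing is executed here.

```python
def build_crl_publish_task(task_name="Publish CA CRL", start_time="02:00"):
    """Build a schtasks.exe argument list that runs `certutil -crl` daily."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,        # task name (hypothetical)
        "/TR", "certutil -crl",  # command from the text: publish a new base CRL
        "/SC", "DAILY",          # run once every 24 hours
        "/ST", start_time,       # start time in HH:MM (hypothetical)
    ]

command = build_crl_publish_task()
print(command)
```

On the CA itself the list could be handed to subprocess.run(command, check=True) from an elevated session. The account the task runs under will need sufficient rights on the CA to publish a CRL, so test it in a lab before relying on it.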
There are also some group policies that you can consider as part of your overall disaster recovery planning. If you have workstations and servers running Windows Vista, 7, Server 2008, or Server 2008 R2 there is a group policy setting that extends the period of time for which the OS will consider a given CRL valid, independent of the actual validity period of the CRL. The group policy setting is located in the following location:
Computer Configuration\Windows Settings\Security Settings\Public Key Policies\Certificate Path Validation Settings.
This setting forces the client to consider the CRL or OCSP response to be valid for longer than it actually is. Below is a screenshot of the specific settings:
Recovery
In terms of recovery there is a short term workaround and a long term resolution. The short term workaround is to use a process called CRL re-signing to manually re-sign an existing CRL and extend its validity period. By doing this, you can give yourself additional time to recover the CA. CRL re-signing requires that you have a backup of the CA’s public/private key pair. I will be covering this process later in this blog posting.
The longer term fix is to restore the certification authority. This of course is not possible unless you have previously backed up the certification authority. I will also cover this later in the blog post.
Another issue that occurs when you have a CA failure is that it can no longer issue certificates. In some scenarios where certificates are issued less frequently, the inability to issue certificates may not have a business impact. In other cases, however, the impact could be considerable. For example, if a CA dedicated to issuing certificates for Network Access Protection (NAP) fails the problem would be almost immediately noticeable. NAP certificates have a lifetime of only 24 hours, so a failed CA can be a considerable problem.
One way to eliminate this issue completely is to have multiple CAs that are issuing certificates based on the same certificate templates. In this way, if one CA fails clients can still enroll for certificates on one of the other certificate authorities.
A clustered issuing certification authority is another way to mitigate against a failed CA. If one of the CAs in the cluster fails the cluster will fail over to the second node. Clustering, as mentioned earlier, will not protect against the failure of a shared component such as storage or an HSM. I’ll re-iterate the need for these devices to have methods for failover as well.
Ultimately, the inability to issue certificates is resolved by recovering the failed certification authority or by installing a new issuing certification authority. The preferred method is to restore the failed certification authority, since it already has information about issued certificates in its CA database.
By default, the CA database contains a copy of every certificate issued, every certificate that has been revoked, and a copy of failed and pending requests. The CA Manager may decide, however, to clear out any expired certificates from the CA database in order to recover free space in the database.
Note: In Windows Server 2008 R2 you can configure a template so that certificates issued from it are not stored in the CA database. These so-called “ephemeral certificates” generally have validity periods shorter than the CRL publication interval of the issuing CA, so recording them so that they can later be revoked makes little sense. Further, these short-lived certificates may be issued in great numbers and with great frequency. Storing them in the database can dramatically increase the database’s rate of growth. Certificates issued for NAP are examples of these ephemeral certificates.
If a CA is configured for key Archival and Recovery, the CA database will also contain the private keys for any certificates whose templates are configured for archival. Failure to recover the CA database in this case would result in losing all of these archived keys.
When a certification authority fails, its database is unavailable, which makes it difficult to revoke certificates that were previously issued by the CA. It also makes it impossible to recover any private keys that have been archived in the database. Separately, in rare circumstances the CA database itself can become corrupted. Like all ESE databases, the CA database can be affected by hardware or disk issues that impact the database or log files.
One option to mitigate the database becoming unavailable due to a CA failure is to set up a clustered certification authority. Another option is to take regular backups of the CA. If the CA fails, you can then restore the CA from the backup. Below I discuss options for backing up the CA as well as for restoring the CA.
For corrupt databases, repairs can be made with esentutl.exe. However, in most cases it is preferable to restore from a backup, to avoid the data loss that some of the functions in esentutl.exe can cause. Esentutl.exe can repair the structure of the database, but usually at the expense of the data stored within that structure.
There are two different ways to back up the Certification Authority. The first is through a System State backup. A System State backup will back up the entire CA as well as its configuration. If the private key is stored on the CA and not on an HSM, the private key will be backed up as well. Here is additional information on System State. A System State backup should be used when you will need to restore to the same hardware.
1. To start NT Backup, click Start then Run, type ntbackup.exe and press Enter.
2. If this is the first time you’ve run this tool, it will start the Welcome to the Backup or Restore Wizard.
3. Uncheck the Always start in wizard mode, and then click Cancel.
4. Launch NT Backup again.
5. Once NT Backup launches, select the Backup Tab, and check just System State as the item to backup.
6. Under the Backup media or file name section, select your backup media or file location where you wish to save the backup.
7. Click the Start Backup button. This will bring up the Backup Job Information dialogue box.
8. If you wish to start the backup immediately, click Start Backup.
9. If you wish to schedule the backup, click the Schedule button.
10. When prompted You must save the backup selections before you can schedule a backup. Do you want to save your current selections now?, click Yes.
11. Save the selection script.
12. After you save the selection script, the Scheduled Job Options dialogue box will open. Give the Job a name. Then click the Properties button.
13. Configure the desired schedule, and click OK. Then enter the credentials for the user that you wish the backup to run under. This account will need to either have the Back up files and directories right or be a member of the Backup Operators group on the CA. Click OK, and then click OK again; you will be prompted for the credentials a second time.
14. You can then click on the Schedule Jobs tab in NT Backup to check the schedule.
1. On the Windows Server 2003 system on which you plan on restoring system state, open the NT Backup utility.
2. Click on the Restore and Manage Media tab.
3. Navigate to the backup of the system state, make sure that System State is checked. Under Restore files to, make sure Original location is selected, and click Start Restore.
4. You will then be prompted that Restoring System State will always overwrite current System State unless restore to an alternate location. Click OK. Then click OK, to Confirm Restore.
5. When the Restore completes, click Close.
6. You will then be prompted to restart your computer, click Yes.
1. If you have not installed Windows Backup, you will first have to install this feature. Open Server Manager, select the Features node, then click Add Features.
2. In the Add Features Wizard, select Windows Server Backup Features, then click Next, and then Install. When the installation completes, click Close.
3. You can then launch the Windows Server Backup tool, by clicking Start, then Administrative Tools, then Windows Server Backup.
4. Also, to use Windows Server Backup, you have to have an additional drive or a network location to backup to. In other words you cannot save the backup on the system drive.
5. The wizard allows you to configure a one-time backup, or schedule a backup.
6. To schedule a backup, click Backup Schedule…, under the Actions sections of the Windows Server Backup tool.
7. This will start the Backup Schedule Wizard, click Next.
8. On the Select Backup Configuration page, select Custom, and then click Next.
9. On the Select Items for Backup page of the wizard, click the Add Items button.
10. Select System State, and click OK, then click Next.
11. On the Specify Backup Time page of the wizard, select the time that you would like the backup to be scheduled for, and click Next.
12. On the Specify Destination Type page of the wizard, select either Hard Disk, Volume, or Shared Network Folder, and click Next. In this example, I am selecting Hard Disk.
13. Select the Hard Disk you would like to use for backup, if it is not listed, click Show All Available Disks…, and select the appropriate disk, and click OK. Click Next.
14. You will be prompted that the disk will be reformatted and existing volumes will be deleted, click Yes if you are using this disk solely for backups, if not choose another backup destination.
15. On the Confirmation page, click Finish.
16. On the Summary page, click Close.
1. In the Actions page of the Windows Server Backup tool, click Recover…
2. This will start the Recovery Wizard, select the location of the backup, and click Next.
3. On the Select Backup Date of the wizard, select the date and time of the backup and click Next.
4. On the Select Recovery Type, select System state, and click Next.
5. On the Select Location for System State Recovery page, select Original location, and click Next.
6. On the Confirmation page of the wizard, click the Recover button.
7. You will be prompted that the recovery cannot be paused or cancelled once started, click Yes.
A good guide to use for backing up and restoring a certification authority is:
298138 How to move a certification authority to another server http://support.microsoft.com/default.aspx?scid=kb;EN-US;298138
Steps 1 through 3 of this document cover manually backing up the CA.
Essentially, you want to do a manual backup of the private key, CA certificate, and CA database. If you are using an HSM to protect the private key pair, you will either need to back up the private key through a method provided by the HSM vendor or have a highly available configuration for the HSMs. In general, if the private key is stored on an HSM, you do not want to back up the private key to any type of media, as this will degrade the overall security and protection of the private key. The configuration for the Certification Authority is stored in the registry, so you will want to back up that registry location as well. The registry location is HKLM\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>.
Generally the private key, CA certificate and CA configuration are going to remain relatively static. You will, however, need to perform a fresh backup should you ever renew the CA certificate or update the configuration. The CA database, on the other hand, is going to grow over time as certificates are issued, requests are denied, and certificates are revoked, so you are going to want to back up the database periodically. How often you perform this backup will depend on how rapidly changes to the database are made and how tolerant you are of discrepancies between the backup and the live data.
The first time you run the backup you will want to back up the CA’s certificate and private key, the CA database, and the certificate database log. To perform this task through the GUI, open up the Certification Authority MMC snap-in (certsrv.msc).
1. Right click on the certification authority name and select All Tasks from the context menu, and then select Back up CA…
2. This will launch the Certification Authority Backup Wizard, click Next.
3. Select Private key and CA certificate and Certificate database and certificate database log. Browse to a local or network location to save the backup. The backup location must be an empty folder. Click Next.
4. Enter a password to protect the private key, and click Next, then Finish.
To back up the CA via the command line, open an elevated command prompt and type certutil -backup Path. Path is the empty directory where the backed up information will be stored. You will then be prompted for a password to protect the private key. Enter the password and then press the Enter key. You will then be prompted to confirm the password. Confirm the password and press the Enter key. A message will be sent to the console indicating what has been backed up and that the certutil -backup command completed successfully.
To backup the registry run the following command: REG EXPORT "HKLM\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>" caconfig.reg
Copy caconfig.reg to your backup directory so that all the necessary data is in the same place.
Once you have completed a full back up of the Certification Authority, you can perform incremental backups of the CA database. Alternatively, you could choose to periodically backup the entire CA database.
Although you can back up the database through the Certification Authority console, you will most likely want to use some sort of script or scheduled task to perform the backup periodically.
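One possible shape for such a script is sketched below in Python. It only builds the commands for a dated backup folder and does not run them. The CA name and backup root are hypothetical, and it uses certutil -backupDB (database and logs only) rather than the certutil -backup command shown earlier, on the assumption that a scheduled job should not stop to prompt for a private key password. Check the exact syntax with certutil -backupDB -? on your own CA.

```python
from datetime import date

# Registry location of the CA configuration, as given in the text.
CA_CONFIG_KEY = r"HKLM\System\CurrentControlSet\Services\CertSvc\Configuration"

def win_join(*parts):
    """Join path or registry key components with backslashes."""
    return "\\".join(parts)

def build_backup_commands(ca_name, backup_root, today):
    """Return the dated backup folder and the commands to run against it."""
    folder = win_join(backup_root, today.isoformat())
    commands = [
        # Database and log files only (no private key, so no password prompt).
        ["certutil", "-backupDB", folder],
        # CA configuration from the registry, saved next to the database backup.
        # The calling script is assumed to make sure the folder exists first.
        ["reg", "export", win_join(CA_CONFIG_KEY, ca_name),
         win_join(folder, "caconfig.reg")],
    ]
    return folder, commands

# Hypothetical CA name and backup location.
folder, commands = build_backup_commands(
    "Contoso Issuing CA", r"D:\CABackup", date(2011, 4, 15))
for command in commands:
    print(command)
```

Using a new folder per day keeps each backup separate, which matters because the backup location is expected to be empty.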
Once you have a server that will serve as the replacement for the failed CA, you must do some initial configuration of that server. Give the server the same name as the failed CA and join it to the same domain.
Since you have brought a new machine online to be the CA, we need to modify the security of Active Directory to allow the new machine to update PKI configuration information in AD. This is because the new machine account will have a new SID, even though it has the same name as the old one.
Open ADSIEDIT.MSC. Open the Configuration container of the Active Directory database. Browse to CN=Public Key Services, CN=Services, CN=Configuration. Next open the AIA container. Locate the object that is associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control. This will associate the permissions with the new account.
Next open the CDP container. Locate the container associated with the failed CA. Open that container and then select the CRL object contained within that container. Right click on the CRL object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control.
Next open the Enrollment Services container. Locate the object associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then click Advanced. In the Permissions tab of the Advanced Security Settings dialog box, click Add… Add the computer object for the CA. On the Permission Entry screen, select Allow for all Permissions except Full Control. Click OK 3 times.
Next open the KRA container. Locate the object that is associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control. This will associate the permissions with the new account.
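The four places visited above can be kept as a small checklist. The Python sketch below builds the distinguished names to look for and the permission the text assigns at each one. The names used are hypothetical, and the layout is an assumption: that objects are named after the CA, that the CDP container is named after the CA server, and that the CA name needs no escaping or truncation. Confirm the actual object names in ADSIEDIT.MSC.

```python
PUBLIC_KEY_SERVICES = "CN=Public Key Services,CN=Services,CN=Configuration"

def ca_objects_to_repermission(ca_name, server_name, forest_dn):
    """List (distinguished name, permission) pairs for the failed CA's objects."""
    base = f"{PUBLIC_KEY_SERVICES},{forest_dn}"
    return [
        (f"CN={ca_name},CN=AIA,{base}", "Full Control"),
        # The CRL object sits inside a container under CDP.
        (f"CN={ca_name},CN={server_name},CN=CDP,{base}", "Full Control"),
        (f"CN={ca_name},CN=Enrollment Services,{base}",
         "All permissions except Full Control"),
        (f"CN={ca_name},CN=KRA,{base}", "Full Control"),
    ]

# Hypothetical CA name, server name, and forest root.
checklist = ca_objects_to_repermission(
    "Contoso Issuing CA", "CA01", "DC=contoso,DC=com")
for dn, permission in checklist:
    print(f"{permission}: {dn}")
```

Ticking each entry off as the computer account is removed and re-added helps make sure none of the four containers is missed.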
Next we need to restore the Certification Authority. Log on with an account that has Enterprise Admin credentials. The first thing we need to do is install the Certification Authority role. The instructions below are for a Windows Server 2008 or Windows Server 2008 R2 based CA. For the exact procedure on Windows Server 2003, please see the following article:
1. Open Server Manager.
2. Click on the Roles Node, then click Add Roles.
3. When the Add Roles Wizard opens, click Next.
4. Select Active Directory Certificate Services and click Next.
5. Then Click Next Again.
6. On the Select Role Services page of the wizard, select Certification Authority, and then click Next.
7. On the Specify Setup Type page of the wizard, Select Enterprise or Standalone depending on the configuration of the failed CA, and then click Next.
8. On the Specify CA Type page of the wizard, select either Root CA or Subordinate CA, depending on the configuration of the failed CA, and then click Next.
9. On the Set Up private key page of the wizard, select Use existing private key, and the sub-option of Select a certificate and use its associated private key, then click Next.
10. On the Select Existing Certificate page of the wizard, click Import.
11. Browse to the backup of the failed CA and select the P12 file from the backup, click Open. Then enter the password for the P12 file, and click OK.
12. Then click Next.
13. On the Configure Certificate Database page of the wizard, select the same database and log file locations as were specified on the failed CA, then click Next, then Install.
14. When the installation completes, click Close.
Open an elevated command prompt and use the following command to import the previously backed up CA configuration: REG IMPORT <Previously backed up registry file>.
At this point, you can restore the CA database from your backup.
1. Right click on the certification authority name and select All Tasks from the context menu, and then select Restore CA…
2. You will be prompted to stop Certificate Services. Click OK.
3. When the Certification Authority Restore Wizard starts, click Next.
4. Select Certificate database and certificate database log. Browse to a local or network location of your previously saved backup.
5. Click Next.
6. Click Finish.
7. You will be prompted to restart the CA. Unless you have further incremental backups to restore, click Yes. If you have incremental backups then click No, and walk through the steps above to restore your incremental backups.
Now if there were any additional Certificate Services roles such as Online Responder (OCSP) or Web Enrollment, you can go ahead and install those at this point.
CRL re-signing is a manual process whereby the Administrator can use the CA's backed up certificate and private keys to re-sign an existing CRL file. This process allows you to extend the lifetime of the existing CRL, and even add certificates to the CRL, effectively revoking them.
Importing the CA certificate and private key
To begin, you will need to have a backup of the private key of the CA. If you have the private key stored on an HSM, you will have to follow the HSM vendor’s instructions for making the private key available to another machine. If you are not using an HSM, perform the following to import the CA public and private key pair to the machine where you will be re-signing the CRLs.
1. Click Start, then Run, type MMC, and then press Enter.
2. Select the File Menu, and then select Add/Remove Snap-in…
3. Select Certificates, and then click Add >.
4. Then select Computer account, and click Next.
5. Then select Local computer, and then click Finish.
6. Then click OK.
7. Expand the Certificates (Local Computer) node.
8. Right click on the Personal node, then select All Tasks from the context menu, and then select Import…
9. This will open the Certificate Import Wizard, click Next.
10. Click the Browse button, to browse to the P12 file located in the CA's backup location.
11. In the drop down for the extension type, select Personal Information Exchange (*.pfx;*.p12).
12. Locate the P12 file that was previously backed up, and click Open.
13. Click Next.
14. Type the Password for the P12 file and click Next, click Next again, and click Finish.
15. Click OK to acknowledge that the import was successful.
To re-sign the CRL and delta CRL with the same validity period they were previously published with, use the following command:
certutil -sign <existing CRL file name> <re-signed CRL file name>
http://technet.microsoft.com/en-us/library/cc782041(WS.10).aspx
You will then have to manually publish the CRL to all CDP locations.
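For CDP locations that are plain folders (a web server directory or a file share), the manual publishing step amounts to copying the re-signed file into place under the file name the CDP URLs reference. The Python below is a minimal sketch of that copy. The file and folder names are made up, temporary folders stand in for real CDP locations, and LDAP-based CDP locations are not covered.

```python
import shutil
import tempfile
from pathlib import Path

def publish_crl(crl_file, cdp_directories, published_name=None):
    """Copy a re-signed CRL into each folder-based CDP location.

    published_name is the file name clients expect; it defaults to the
    name of crl_file. Returns the paths of the copies.
    """
    name = published_name or Path(crl_file).name
    copies = []
    for directory in cdp_directories:
        target = Path(directory) / name
        shutil.copy2(crl_file, target)  # replaces any older CRL of that name
        copies.append(target)
    return copies

# Demonstration with temporary folders standing in for real CDP locations.
with tempfile.TemporaryDirectory() as work:
    work = Path(work)
    resigned = work / "resigned.crl"
    resigned.write_bytes(b"placeholder bytes, not a real CRL")
    cdp_dirs = [work / "web", work / "share"]
    for d in cdp_dirs:
        d.mkdir()
    copies = publish_crl(resigned, cdp_dirs, published_name="IssuingCA.crl")
    copied_names = [c.name for c in copies]
    all_exist = all(c.exists() for c in copies)

print(copied_names, all_exist)
```

The published_name parameter is there because the output of the re-signing step will usually have a different file name from the CRL that clients are looking for.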
If you wish to adjust the validity period, specify it at the end of the command in the format DD:HH, where DD is days and HH is hours. For example, the following command would re-sign a CRL so that it is valid for 14 days:
certutil -sign <existing CRL file name> <re-signed CRL file name> 14:00
If you wish to add one or more issued certificates to the CRL, specify their serial numbers in a comma-separated list, prefixed with a plus sign, on the command line. For example, the following command would add three serial numbers to the CRL:
certutil -sign <existing CRL file name> <re-signed CRL file name> +SerialNumber1,SerialNumber2,SerialNumber3
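The variations of the re-signing command can be assembled by a small helper, sketched below in Python. The function and file names are hypothetical, and it only builds the argument list. Passing a validity period and a serial number list in the same command, in that order, is an assumption, since the text shows them only separately.

```python
def build_resign_command(existing_crl, resigned_crl,
                         days=None, hours=0, add_serials=()):
    """Build a `certutil -sign` argument list for re-signing a CRL."""
    command = ["certutil", "-sign", existing_crl, resigned_crl]
    if days is not None:
        # Validity period in DD:HH form, for example 14:00 for 14 days.
        command.append(f"{days}:{hours:02d}")
    if add_serials:
        # A leading plus sign adds the listed serial numbers to the CRL.
        command.append("+" + ",".join(add_serials))
    return command

# Same validity as before, a 14-day validity, and adding two serial numbers.
print(build_resign_command("old.crl", "new.crl"))
print(build_resign_command("old.crl", "new.crl", days=14))
print(build_resign_command("old.crl", "new.crl",
                           add_serials=["SerialNumber1", "SerialNumber2"]))
```

Keeping the command as a list of arguments avoids quoting problems when CRL file names contain spaces.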
When building a PKI, it is critical to consider how your design will affect the availability of the PKI. The design also affects the way in which you may have to recover the CAs in the PKI.
You should definitely consider the criticality of PKI to your environment, and how much downtime is acceptable. This will help drive your decisions when designing the PKI and implementing the Certification Authorities.
Also, many customers make the mistake of either not knowing how to recover a Certification Authority or not having a documented process for doing so. When designing and implementing your PKI, I recommend that you test recovery and document the recovery steps for the CAs in your PKI.
Chris "CLEAR!" Delay