Blog - Title

April, 2011

  • You probably don't need ACCTINFO2.DLL

    Hi folks, Ned here again. Customers periodically ask us for a rumored replacement for the Windows 2000 acctinfo.dll that works on 64-bit Windows 7 and Windows Server 2008 R2. That old DLL added an extra tab to the Active Directory Users and Computers snap-in to centralize some user account info:

    image

    Pretty cool. You can see whether the account is locked out, the last logon time, and other goo.

    Even though that newer unsupported acctinfo2.dll file exists - yes, even in x64 format - there is a supported way to see this info as long as your admins use Win7 + RSAT or Win2008 R2.

    Some Background

    Windows Server 2008 R2 introduced a new service called Active Directory Web Services (the down-level, out-of-band version for older servers is called the Active Directory Management Gateway). AD PowerShell uses this component as a sort of proxy into Active Directory. This is fine if you are running only Windows Server 2008 R2 and Windows 7, but for most companies, that's going to take a while. While this includes quasi-web traffic - the .NET Negotiated Stream protocol - we sign and seal all LDAP traffic. You can further secure the traffic by deploying Domain Controller or Domain Controller Authentication certificates - something that happens automatically with the deployment of an Enterprise PKI in your domain.

    Knowing this, we released an out of band version of this service - you can grab it here. Install it and its support files on Windows Server 2003 or Windows Server 2008 and your shiny new Win7 clients can run AD PowerShell even if you don't have a single Win2008 R2 server in the forest. The service is fine to install on only a few DCs as well - as long as those are up and reachable from Win7 clients, you're good to go; not all DCs need to run the service. Your Windows 7 clients will find the updated servers automatically, or you can point to them specifically.
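
    As a quick illustration (a hedged sketch, not from the original post - the user and DC names below are made up), once a Win7 client can reach a DC running the service, the AD module will discover it automatically, or you can target it explicitly with -Server:

    Import-Module ActiveDirectory

    # Find a DC advertising the AD web service
    Get-ADDomainController -Discover -Service ADWS

    # Query a user through a specific service-enabled DC
    Get-ADUser sdavis -Server dc01.cohowinery.com -Properties LockedOut, LastLogonDate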

    The New Way to Look at Users

    The Active Directory Administrative Center is another new component introduced by Windows Server 2008 R2. Many admins gave it a glance, thought to themselves "another ADUC, why bother?", and went back to their familiar old tool. If you like acctinfo.dll though, you should like ADAC.

    With Win7 RSAT installed and the AD tools enabled (or RDP'ed into your Win2008 R2 servers for AD administration), run DSAC.EXE. You'll see this:

    image

    You can browse to users or search for them. I'll search for Sarah Davis:

    image

    image

    I open the user and so far, it looks pretty much like ADUC.

    image

    It saved me a click of the "Account" tab to see things like logon name and password options. However, I want all that account status goo!

    Say, what does that little nubbin arrow icon down in the lower left do?

    image

    <Clickity click>

    image

    Blammo!

    Compare that to the old acctinfo.dll. :) Password info, lockout info, and logon info. Even the SID and GUID. There are a few things missing like "account unlocks at such and such date" but their usefulness is questionable - if you care that someone is locked out, you mostly care about seeing when they locked out and unlocking them now. Acctinfo2.dll doesn't show that either, btw. Moreover, if you want to change someone's password on a certain DC, just retarget to that DC. Better yet, always target the PDCE; DCs contact the PDCE after a failed logon for another try, as it is likely to have the latest password thanks to so-called urgent replication. In addition, account lockouts should be the most up to date on the PDCE.

    Oh, and stop using account lockout policy anyway, it's vile. Use complex passwords and get ACS to track brute force bad password attempts. Turning on account lockouts is a way to guarantee someone with no credentials can deny service to your entire domain. Locking out your boss' account will be a very convincing demonstration… mmm, maybe just a test admin account. :)

    While we're in here

    Want to see the "Attribute Editor" extension in the RSAT client version of ADAC for more "ADUC parity" when examining users? You can enable that tab by adding the {c7436f12-a27f-4cab-aaca-2bd27ed1b773} display specifier with these steps (again, using my sample domain cohowinery.com as the example):

    1. Logon as an enterprise admin.

    2. Run ADSIEDIT.MSC and connect to the Configuration Partition.

    3. Navigate to:

    CN=user-Display,CN=409,CN=DisplaySpecifiers,CN=Configuration,DC=cohowinery,DC=com

    Note: 409 is "US English". You will need to repeat all these steps for any other languages used in your environment.

    4. Edit user-Display container.

    5. Edit the adminPropertyPages attribute.

    6. Determine the highest index value listed. For example in mine, it was 10.

    7. Add the GUID {c7436f12-a27f-4cab-aaca-2bd27ed1b773} set with the next highest index number. For example, here I added:

    11,{c7436f12-a27f-4cab-aaca-2bd27ed1b773}

    image

    8. Close adsiedit. Replicate this value to all DCs in the forest through natural convergence or forcing with repadmin.exe.

    9. Once done replicating to all DCs, anytime you start DSAC.EXE all users will show the Attribute Editor.

    image
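
    If you would rather script the change than click through ADSIEDIT, here is a minimal PowerShell sketch of the same edit (my assumptions: you have the AD module loaded, Enterprise Admin rights, and are targeting the 409 US English container; repeat for other languages as needed):

    Import-Module ActiveDirectory

    # Build the DN of the US English user-Display specifier
    $configNC = (Get-ADRootDSE).configurationNamingContext
    $dn = "CN=user-Display,CN=409,CN=DisplaySpecifiers,$configNC"

    # Find the highest index already used in adminPropertyPages
    $pages = (Get-ADObject $dn -Properties adminPropertyPages).adminPropertyPages
    $next = ($pages | ForEach-Object { [int]($_ -split ',')[0] } | Measure-Object -Maximum).Maximum + 1

    # Append the Attribute Editor page GUID at the next index
    Set-ADObject $dn -Add @{ adminPropertyPages = "$next,{c7436f12-a27f-4cab-aaca-2bd27ed1b773}" }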

    Note: before you get all froggy about these steps - if you end up using acctinfo2.dll you will have to modify its displaySpecifier data as well. :)

    Update 12/10/2012

    Your feedback in the Comments was heard in Windows Server 2012. ADAC now also shows Password Last Set and Password Expiration info:

    A final note

    You will find a great many copies of acctinfo2.dll floating around, but none hosted on Microsoft websites (we never released it publicly, it was just a side project for a Support engineer here in Charlotte). Before you install those, consider this: you plan to load a DLL from some random place on the Internet into one of your most powerful AD admin tools, and then run that tool as a Domain Admin. And you have no way to know if that's some leaked MS version of the file or one adulterated by hackers.

    Still sound like a good plan?

    If you absolutely must use this DLL, you should only get it by contacting MS through a support case. If I wrote malware, an admin-only file injected into AD management tools would be my first choice for pwnage.

    Until next time.

    Ned "the guy that wrote acctinfo2.dll has ostrich legs" Pyle

  • Restrictions for Unauthenticated RPC Clients: The group policy that punches your domain in the face

    Hi folks, Ned here again. Around six years ago we released Service Pack 1 for Windows Server 2003. Like Windows XP SP2, it was a security-focused update. It was the first major server update since the Trustworthy Computing initiative began, so there were things like a bootstrapping firewall, Data Execution Prevention, and the Security Configuration Wizard.

    Amongst all this, the RPC developers added these new configurable group policy settings:

    Computer Configuration \ <policies> \ Administrative Templates \ System \ Remote Procedure Call

    Restrictions for unauthenticated RPC clients
    RPC endpoint mapper client authentication

    Which map to the DWORD registry settings:

    HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\Rpc
    EnableAuthEpResolution
    RestrictRemoteClients

    These two settings add an additional authentication "callback capability" to RPC connections. Ordinarily, no authentication is required to make the initial connection to the endpoint mapper (EPM). The EPM is the network service that tells a client what TCP/UDP ports to use in further communications. In Windows, those further communications to the actual application are what typically get authenticated and encrypted. For example, DFSR is an RPC application that uses RPC_C_AUTHN_LEVEL_PKT_PRIVACY with Kerberos required, with Mutual Auth required, and with Impersonation blocked. The EPM connection not requiring authentication is not critical, as there is no application data transmitted: EPM is like a phone book or perhaps more appropriately, a switchboard with an operator.
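
    If you just want to check whether a machine has picked up either value, here is a quick sketch (a reg.exe query works equally well):

    # No output means the policy values are not set on this machine
    Get-ItemProperty 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Rpc' -ErrorAction SilentlyContinue |
        Select-Object EnableAuthEpResolution, RestrictRemoteClients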

    That quest for Trustworthy Computing added these extra security policies. In doing so, it introduced a very dangerous scenario for domain-based computing: one of the possible policy settings requires that all applications initiating the RPC conversation send along this authentication data or be able to understand a callback request to authenticate.

    The problem is most applications have no idea how to satisfy the setting's requirements.

    The Argument

    One of the options for Restrictions for unauthenticated RPC clients is "Authenticated without Exceptions".

    image

    When enabled, RPC applications are required to authenticate to the RPC service on the destination computer. If your application doesn't know how to do this, it is no longer allowed to connect at all.

    Which brings us to…

    The Brawl

    Having configured this policy in your domain on your DCs, members, and clients, you will now see the following issues no matter your credentials or admin rights:

    Group policy fails to apply with errors:

    GPUPDATE /FORCE returns:

    The processing of Group Policy failed. Windows could not resolve the computer name. This could be caused by one or more of the following:
    a) Name Resolution failure on the current domain controller.
    b) Active Directory Replication Latency (an account created on another domain controller has not replicated to the current domain controller).
    Computer Policy update has completed successfully.
    To diagnose the failure, review the event log or invoke gpmc.msc to access information about Group Policy results.

    The System Event log returns errors 1053 and 1055 for group policy:

    The processing of Group Policy failed. Windows could not resolve the user name. This could be caused by one or more of the following:
    a) Name Resolution failure on the current domain controller.
    b) Active Directory Replication Latency (an account created on another domain controller has not replicated to the current domain controller).

    The Group Policy Operational event log will show error 7320:

    Error: retrieved account information. Error code 0x5.
    Error: Failed to register for connectivity notification. Error code 0x32.

    Active Directory Replication fails with errors:

    Repadmin.exe returns:

    DsBindWithCred to RPC <servername> failed with status 5 (0x5)

    DSSites.msc returns:

    image

    Directory Service event log returns:

    Warning 1655:
       
    Active Directory Domain Services attempted to communicate with the following global catalog and the attempts were unsuccessful.
    Global catalog:
    \\somedc.cohowineyard.com
    The operation in progress might be unable to continue. Active Directory Domain Services will use the domain controller locator to try to find an available global catalog server.
    Additional Data
    Error value:
    5 Access is denied.

    Error 1126:

    Active Directory Domain Services was unable to establish a connection with the global catalog.
     
    Additional Data
    Error value:
    1355 The specified domain either does not exist or could not be contacted.
    Internal ID:
    3200e7b

    Warning 2092:

    This server is the owner of the following FSMO role, but does not consider it valid. For the partition which contains the FSMO, this server has not replicated successfully with any of its partners since this server has been restarted. Replication errors are preventing validation of this role. Operations which require contacting a FSMO operation master will fail until this condition is corrected.

    Domain join fails with error:

    Changing the primary domain DNS name of this computer to "" failed.
    The name will remain "<something>".
    The error was:
    Access is denied

    image

    After failed join above, rebooting computer and attempting a domain logon fails with error:

    The security database on the server does not have a computer account for this workstation trust relationship.

    image

    Remotely connecting to WMI returns error:

    Win32: Access is denied.

    image

    Remotely connecting to Routing and Remote Access returns error:

    You do not have sufficient permissions to complete the operation

    image

    Remotely connecting to Disk Management returns error:

    You do not have access rights to logical disk manager

    image

    Remotely connecting to Component Services (DCOM) returns error:

    Either the machine does not exist or you don't have permission to access this machine

    image

    Running DFSR Health Reports returns errors:

    Domain Controller is unreachable
    Cannot access the local WMI repository
    Cannot connect to reporting DCOM server

    image

    DFSR does not replicate nor start initial sync, with errors:

    DFSR Event log error 1202:

    The DFS Replication service failed to contact domain controller to access configuration information. Replication is stopped. The service will try again during the next configuration polling cycle, which will occur in 60 minutes. This event can be caused by TCP/IP connectivity, firewall, Active Directory Domain Services, or DNS issues.

    error: 160 (one or more arguments are not correct)

    DFSRMIG does not allow configuration of SYSVOL migration and returns error:

    "Unable to connect to the Primary DC's AD. Please make sure that the PDC is reachable and retry the command later"

    FRS does not replicate and returns event log warning 13562:

    Could not bind to a Domain Controller. Will try again at next polling cycle.

    Remotely connecting to Windows Firewall with Advanced Security returns error:

    You do not have the correct permissions to open the Windows Firewall with Advanced Security Console.
    Error code: 0x5

    image

    Remotely connecting to Share and Storage Management returns error:

    Connection to the Virtual Disk Service failed. A VDS (Virtual Disk Service) error occurred while performing the requested operation.

    image

    Remotely connecting to Storage Explorer returns error:

    Access is denied.

    image

    Remotely connecting to Windows Server Backup returns error:

    The Windows Server Backup engine is not accessible on the computer that you want to manage backups on. Make sure you are a member of the Administrators or Backup Operators group on that computer.

    image
    Remotely connecting to DHCP Management returns error:

    Access is Denied

    RPC Endpoint connections seen through network capture shows errors:

    Note how the client (10.90.0.94) attempts to bind to the EPM on a DC (10.90.0.101) and gets rejected with status 0x5 (Access is Denied).

    image

    Depending on the calling application - in this case, the Group Policy service running on a Win7 client that is trying to refresh policy - it may continue to try binding many times before giving up. Again, the DC responds with the unhelpful error "REASON_NOT_SPECIFIED" and keeps rejecting the GP service.

    image

    For comparison, a normal working EPM bind of the GP service looks like this:

    image

    Restitution

    Anyone notice the Catch-22 above? If you deployed this setting using domain-based group policy to your DCs, you have no way to undo it!  This is another example of “always test security changes before deploying to production”. Many virtualization products are free, like Hyper-V and Virtual PC – even a single virtualized DC environment would have shown gross problems after you tried to use this policy.

    To fix your environment:

    1. You must delete or unlink the whole policy that includes this RPC setting:

    image

    2. Delete or rename this specific policy's GUID folder from each DC's SYSVOL folder (remember, file replication is not working, so it must be done on each individual server).

    image

    image

    3. Manually visit all DCs and delete the RestrictRemoteClients registry setting (a one-liner example follows these steps).

    image

    4. Reboot all DCs to get your domain back in operation. Not all at once, of course!
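
    For step 3, here is a minimal sketch of what to run in an elevated prompt on each DC you visit (a reg.exe delete is the equivalent on DCs without PowerShell):

    # Remove only the value this policy left behind
    Remove-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Rpc' -Name RestrictRemoteClients -ErrorAction SilentlyContinue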

    These are only the affected Windows in-box applications and components that I have identified. The full list probably includes 99% of all third party RPC applications ever written.

    Parole

    Some security audit consulting company may ask you to turn this policy on to be compliant with their standards. Make sure you show them this article and make them explain why. You can also point out that our Security Compliance Manager tool does not recommend enabling "Authenticated without Exceptions" even in Specialized Security Limited Functionality networks (and SSLF is far too restrictive for most businesses). This setting is really only useful in an unmanaged, standalone, non-domain joined member computer environment such as a DMZ network where you want to close an RPC connection vector. Probably just web servers with local policy.

    You should always get an in-depth explanation of any third party security audit's findings and recommendations; many a CritSit case here started with a customer implicitly trusting an auditor's recommendations. That auditor is not going to be there to troubleshoot for you when everything goes to crap. Disconnecting all your DCs from the network makes them more secure. So does disabling all your user accounts. Neither is practical.

    If you absolutely must turn on Restrictions for unauthenticated RPC clients, make sure it is set only to "Authenticated", and guarantee RPC endpoint mapper client authentication is also enabled. Then test like your job depends on it - because it does. Your applications may still fail with this setting in its less restrictive mode. Not all group policies are intended for domains.
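
    For reference, here is a sketch of the registry values behind that less restrictive combination - you would normally set these through the two policies above rather than directly, and you should verify the numbers against the policy explain text before trusting my mapping:

    $key = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Rpc'
    New-Item $key -Force | Out-Null
    Set-ItemProperty $key -Name RestrictRemoteClients -Value 1 -Type DWord   # 1 = Authenticated, 2 = Authenticated without Exceptions
    Set-ItemProperty $key -Name EnableAuthEpResolution -Value 1 -Type DWord  # RPC endpoint mapper client authentication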

    By the way, if you are a software development company you should be giving the Security Development Lifecycle a frank appraisal. It is a completely free force for good.

    Until next time.

    Ned "2005? I am feeling old" Pyle

  • Sites Sites Everywhere…

    …Without a DC to spare! Hey all, this is Sean. You may remember me from a few old ADFS posts. I’m no longer on the Directory Services team but I still do a lot of DS stuff in Premier Field Engineering (PFE). Anyway, I recently ran into a few “interesting” site topologies while in the field. I want to discuss them with you and explain their impact on automatic site coverage.

    When was the last time you created a site? Have you noticed that you can’t click OK until you select a site link to associate with the site?

    image
    Figure 1

    image
    Figure 2

    There's a reason for this: even if a site has no domain controller in it, a site link is necessary between it and other sites. Domain controllers cannot calculate site coverage unless all of the sites are "linked" with site links. Unfortunately, once the site is created and added to a site link, you can later remove it from that link. Nothing will check to see if it's contained in another site link first; that's up to you.

    To understand this and its impact, let’s talk about a few other concepts.

    DNS

    Ah DNS. As DNS goes, so goes AD. There’s no way I can explain everything you need to know about DNS here, but I urge you to eat your TechNet Wheaties and read this document from top to bottom:

    How DNS Support for Active Directory Works
    http://technet.microsoft.com/en-us/library/cc759550(WS.10).aspx

    For our purposes, there are a few things I want to make sure you're aware of. First, the Netlogon service on a domain controller is responsible for registering SRV records. These SRV records are what clients use to find services such as LDAP or Kerberos. DCs register generic and site-specific SRV records. A simple DNS structure would look something like this:

    image
    Figure 3

    These are generic records. They are not associated with any site in Active Directory.

    image
    Figure 4

    These are site-specific records. It is very inefficient for a client computer to have to talk to a DC outside of its own site for client logon and Active Directory searches. For more on optimal Site Topology design, please click here.

    The site specific records help clients find DCs offering the services they’re looking for that are closest to them.

    So how does that work? Enter the DCLocator…

    DCLocator

    I’m a client and I need to authenticate to a domain controller. How do I find a domain controller to authenticate me? To get the full details check out this link. I’ll give you a quick summary:

    Basically, it goes like this:

    1. Client does a DNS search for DCs in _LDAP._TCP.dc._msdcs.domainname
    2. DNS server returns a list of DCs.
    3. Client sends an LDAP ping to a DC asking for the site it is in based on the client's IP address (IP address ONLY! The client's subnet is NOT known to the DC).
    4. DC returns…
      1. The client’s site or the site that’s associated with the subnet that most matches the client’s IP (determined by comparing just the client’s IP to the subnet-to-site table Netlogon builds at startup).
      2. The site that the current domain controller is in.
      3. A flag (DSClosestFlag=0 or 1) that indicates if the current DC is in the site closest to the client.
    5. The client decides whether to use the current DC or to look for a closer option.
      1. Client uses the current DC if it’s in the client’s site or in the site closest to the client as indicated by DSClosestFlag reported by the DC.
      2. If DSClosestFlag indicates the current DC is not the closest, the client does a site specific DNS query to: _LDAP._TCP.sitename._sites.domainname (_LDAP or whatever service you happen to be looking for) and uses a returned domain controller.
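
    You can watch the locator do its thing from any client with nltest (in-box on Vista and later, in the Support Tools on older versions); a quick sketch using the contoso.com domain from my lab:

    nltest /dsgetdc:contoso.com          # shows the DC returned, its site, the client's site, and the "closest" flag
    nltest /dsgetdc:contoso.com /force   # bypass the cached entry and rediscover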

    Automatic Site Coverage

    Great, so now we know a little about DNS and DCLocator. Without the site specific query, clients would access domain controllers randomly instead of the ones closest to them. Now let’s talk about the sites you have setup. Are they exclusively for AD? Maybe, but I’m guessing you have some sites without DCs for Configuration Manager or other Active Directory aware applications. Take the following site configuration for example:

    image
    Figure 5

    image
    Figure 6

    I've got 5 sites configured, HQ and Branch1 through 4. On the right you can see the site links I have created. HQ is my hub and each branch is set up with HQ in a site link…except Branch4. Also notice there are no domain controllers in Branch3 or Branch4.

    In this case, there are no DCs in Branch3. If I'm a client in Branch3, how do I know which domain controller to authenticate against? It's no different: I still query DNS to see which DCs are registered for that site. To keep me from picking a random DC somewhere in the forest, the DCs perform automatic site coverage. Every domain controller in the forest follows this procedure:

    1. Build a list of target sites — sites that have no domain controllers for this domain (the domain of the current domain controller).
    2. Build a list of candidate sites — sites that have domain controllers for this domain.
    3. For every target site, follow these steps:
      1. Build a list of candidate sites of which this domain is a member. (If none, do nothing.)
      2. Of these, build a list of sites that have the lowest site link cost to the target site. (If none, do nothing.)
      3. If more than one, break ties (reduce this list to one candidate site) by choosing the site with the largest number of domain controllers.
      4. If more than one, break ties by choosing the site that is first alphabetically.
      5. Register target-site-specific SRV records for the domain controllers for this domain in the selected site.

    That’s straight from the “How DNS Support for Active Directory Works” article listed in the DNS section. Notice I highlighted point “B”. Here it is, the reason I wanted to blog about this “…build a list of sites that have the lowest site link cost to the target site. (If none, do nothing.)”. Basically we’re saying that if the site that has no domain controller has no “lowest site link cost” then do NOTHING! Whaddya know, that’s exactly what happens:

    image
    Figure 7

    Do you see site "Branch4" listed in DNS anywhere? Nope, because it's not part of any site link! What does this mean? Any client trying to do a site-specific operation from Branch4 will ultimately end up using any domain controller instead of a domain controller closest to it based on site topology. This isn't good! It's as if a site and subnet were not defined for the client. In some regards it's worse, because we don't log this information like we do for clients authenticating from undefined subnets. This would also cause problems with DFS site costed referrals (which all of your DCs should have enabled - starting in Windows Server 2008 it is always enabled by default if the registry value is not set).
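
    A quick way to spot this from any machine is to query the site-specific SRV records yourself; if a site has no coverage, the query comes back empty (a sketch - substitute your own site and domain names):

    nslookup -type=SRV _ldap._tcp.Branch4._sites.dc._msdcs.contoso.com
    nslookup -type=SRV _kerberos._tcp.Branch4._sites.dc._msdcs.contoso.com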

    Need some help with this? Ok, but only because PowerShell is so awesome!

    First things first:

    Launch PowerShell

    You’ll need the “activedirectory” module.

    import-module activedirectory

    Now let’s get to the meat of this thing and get all of the sites listed in site links:

    $site_links = get-adobject -filter 'objectclass -eq "sitelink"' -searchbase 'cn=configuration,dc=contoso,dc=com' -properties sitelist | foreach {$_.sitelist} | sort | get-unique

    The first part of this command uses “get-adobject” to return the sitelist property of each site link. We then pipe it to “foreach” to get a list of the sites. Next we pipe that list to “sort”. Now we take that sorted list and pipe it to “get-unique”. This is necessary because different site links could contain the same sites (for example, site HQ is contained in each of my site links because it’s my hub). Finally we take this sorted, unique list of sites found in all of the site links and stuff them in a variable called $site_links:

    image
    Figure 8

    Next we need to get a list of sites:

    $sites = get-adobject -filter 'objectclass -eq "site"' -searchbase 'cn=configuration,dc=contoso,dc=com' -properties distinguishedname | foreach {$_.distinguishedname} | sort

    Here we use the same “get-adobject” but return the DN of each site. Next we pipe it to “foreach” to get our list, and then “sort” to get a sorted list. We don’t have to worry about duplicates here since you can’t create two sites with the same site name. In the end, $sites will contain this information from my lab:

    image
    Figure 9

    Finally, we use “compare-object” to compare $site_links to $sites.

    Compare-object -referenceobject $site_links -differenceobject $sites

    This will return the differences between the two objects. As we can see, the only difference is site Branch4:

    image
    Figure 10

    This means Branch4 is a site not contained in any site link. As I indicated earlier in the post, this could effectively cause a client computer to make inefficient queries to domain controllers that are not even close to its own site. If you have a computer that is having latency issues and a SET L reveals that the domain controller that authenticated the user is on the other side of the planet, you may need to get a report on your AD Sites and configure them optimally.

    Ok, now seriously, go check ’em …

    Sean “my shoe size would be 60 in Europe” Ivey

  • Designing and Implementing a PKI: Part IV Configuring SSL for Web Enrollment and Enabling Key Archival

    The series:

    Designing and Implementing a PKI: Part I Design and Planning

    Designing and Implementing a PKI: Part II Implementation Phases and Certificate Authority Installation

    Designing and Implementing a PKI: Part III Certificate Templates

    Designing and Implementing a PKI: Part IV Configuring SSL for Web Enrollment and Enabling Key Archival

    Designing and Implementing a PKI: Part V Disaster Recovery

    Chris here again. Today we are going to cover configuring SSL for the Web Enrollment website which will allow Windows Server 2008 and Windows clients to use the Web Enrollment website. We are also going to cover enabling Key Archival.

    Configuring SSL for Web Enrollment

    Windows Server 2008 (R2) requires SSL in order to connect to the Web Enrollment pages. The first thing that must be done after installing the Web Enrollment role is to enable SSL on the web site within IIS. To begin, I am going to go through the process to request an SSL certificate for my web server.

    Requesting an SSL Certificate

    In Part III I covered implementing certificate templates. The fictional company, Fabrikam, created a customized template for web servers called Fabrikam WebServer. This template was configured to construct the certificate subject information from Active Directory. When a server requests a Fabrikam Webserver certificate from the CA, the CA will place the DNS name of the server in the Subject of the issued certificate. Below are the steps to follow in order to request a certificate based on the Fabrikam WebServer template.

    1. Click the Start button, then Run, enter MMC in the Run box, and click OK.

    2. Click on File from the menu bar and select Add/Remove Snap-in…

    3. Select Certificates and click the Add button

    4. When prompted for the context that the Certificates MMC should run in select Computer Account, and then click Next, then Finish.

    5. Click OK, to close the Add or Remove Snap-ins page.

    6. Expand Certificates (Local Computer), right-click on Personal, and select Request New Certificate… from the context menu.

    7. This starts the Certificate Enrollment wizard. Click Next to continue.

    8. Select the Fabrikam WebServer certificate template, and then click Next to request the certificate.

    9. As seen below, the certificate has been successfully requested. Click Finish to close the wizard.

    image

    Enabling SSL

    Now that the certificate has been requested, the next step is to bind the certificate to the default web site in IIS.

    To enable SSL for the Web Enrollment site on the CA server:

    1. Launch the IIS Manager MMC located in Administrative Tools.

    2. Expand the server name, then Sites, and then select Default Web Site.

    image

    3. In the Actions menu, select Bindings…

    4. The Site Bindings settings will open. Click Add…

    image

    5. Select https for Type, and select the appropriate certificate from the SSL certificate drop down. Review the settings, and click OK.

    image

    6. Click Close to commit the changes to IIS. The selected server authentication certificate is now bound to port 443 on the IIS server.

    image

    The Web Enrollment website is now configured to support HTTP over SSL connections via the fully qualified domain name. Since the site is accessed via FQDN, the server, in this example https://fabca01.fabrikam.com, must be added to the list of trusted sites in Internet Explorer on clients that will attempt to access this page. This is so that user credentials are automatically passed to the Web Enrollment site. For domain clients, this can be done via Group Policy (see the Site to Zone Assignment List policy).

    Key Archival

    Next, we’ll look at setting up Key Archival. There are two parts for setting up Key Archival. The first is designating a Key Recovery Agent for the CA. The second is configuring the Certificate Template for archival which we touched on in the previous part, Configuring Certificate Templates.

    Key Archival is important for certificates that are used for encryption. If a user's encryption private key is lost for some reason, any encrypted data can be recovered by extracting the archived private key from the CA database and returning it to the user.

    Designating a Key Recovery Agent (KRA)

    In the previous part, we configured the CA to issue KRA certificates. Specifically, we added the default KRA template to the list of certificate templates available on the CA, and set the permissions on the template to allow members of the Fabrikam KRA security group to enroll. The next step is to have at least one member of that group request a KRA certificate.

    The user, Magnus Hedlund, is a member of the Fabrikam KRA group. Here’s how he’d request a KRA certificate.

    1. Connect to the Web Enrollment site and select Request a certificate from the Web Enrollment webpage.

    image

    2. Select Create and submit a request to this CA.

    3. From the Certificate Template drop-down, select Key Recovery Agent. Magnus should also make sure the option Mark keys as exportable is selected, but the rest of the default settings can be accepted.

    4. Set the friendly name to KRA Cert, and click Submit.

    5. The default Key Recovery Agent template is configured to require Certificate Manager approval (on the Request Handling tab), meaning that a Certificate Manager (a local Administrator on the server, by default) must manually issue the certificate. As such, Magnus will see a message stating that his request is in a pending state. Magnus then emails the Certificate Manager to get the request approved.

    6. The Certificate Manager launches the Certificate Services MMC, selects Pending Requests, and locates Magnus' request. She then right-clicks on the request, selects All Tasks from the context menu, and then selects Issue.

    7. In order to retrieve his issued certificate, Magnus must return to the Web Enrollment site. This time he must select View the status of a pending certificate request. REQUIRED: Magnus must reconnect to the site using the same client he used to submit the request. The Web Enrollment pages use a cookie to record pending requests.

    8. Magnus then clicks on his request that is identified by the date and time that it was submitted.

    9. Magnus then has a link to Install this certificate.

    10. Magnus is then presented with a Potential Scripting Violation error asking if the certificate should be added to the certificate store. He clicks Yes to acknowledge the warning.

    image

    The Certificate is then successfully installed.

    Users with Key Recovery Agent certificates should take care to protect their certificates and keys. One way of doing that is exporting the KRA certificate and private key to a PFX file - deleting the private key stored on the client - and keeping that password-protected file in a safe location. The KRA certificate and private key can then be imported as needed.

    To do this, Magnus follows these steps:

    1. Open the Certificates MMC targeted to his user account (Certmgr.msc).

    2. Expand Personal, then Certificates. Locate the KRA Cert, and right-click on it. Select Export from the context menu.

    3. This launches the Certificate Export Wizard. Click Next to continue.

    4. On the Export Private Key page of the wizard, select Yes, export the private key, and click Next.

    5. On the Export File Format page, select Delete the private key if the export is successful and make sure all the other options are deselected. Click Next.

    6. Enter a password to secure the Private Key, and click Next.

    7. On the File to Export page, click Browse….

    8. Browse to a secure location in the file system, give the PFX file a name, and click Save.

    9. Click Next on the File to Export page.

    10. Click Finish to complete the export.

    11. He is then prompted that the export was successful, and clicks OK.

    In a high security setting, one option may be to save the PFX file to removable media, and then secure that media in a locked safe until it is needed. Why are such measures necessary? Well, they may not be; it totally depends on your environment. What is important to realize is that Key Recovery Agents can decrypt the encrypted key blob for any user with an archived key. The CA's design mitigates this risk somewhat by requiring a Certificate Manager to actually export the encrypted key blob from the CA database. A Key Recovery Agent can only decrypt the exported key blob; he can't actually export it.
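
    For what it's worth, that split is visible in the tooling too. A minimal sketch of the two-step recovery flow (the serial number and file names are placeholders):

    # A Certificate Manager extracts the encrypted key blob from the CA database
    certutil -getkey SerialNumberOrRequesterName keyblob.bin

    # A Key Recovery Agent, holding the KRA private key, turns the blob into a password-protected PFX
    certutil -recoverkey keyblob.bin recovered.pfx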

    Enabling Key Archival

    Now that we actually have a Key Recovery Agent published in Active Directory (any KRA certificate issued by the CA is published to the CN=KRA container in Active Directory), we can proceed to enable Key Archival on the CA.

    To enable Key Archival:

    1. Open the Certificate Services MMC. Right-click on the name of the CA in the tree-view pane and then select Properties from the context menu.

    image

    2. Select the Recovery Agents tab, and select Archive the key. In this case, we only have one Key Recovery Agent, so we’ll leave Number of recovery key agents to use at the default of 1. Click Add… to add the KRA certificate to the CA.

    image

    3. Select the appropriate KRA Certificate, and click OK.

    image

    4. Click OK to close the properties and commit the changes we’ve made to the CA.

    image

    5. When prompted to restart Certificate Services, click Yes.

    image

    Key archival is now enabled on the CA.

    Additional information on the mechanics of Key Archival is available here: http://blogs.technet.com/pki/archive/2009/08/07/understanding-key-archival.aspx

    Conclusion

    And that's it for this part of the series. Today, we configured the IIS server hosting our Web Enrollment pages to use SSL when serving the site. We also configured our CA with a Key Recovery Agent to enable Key Archival. Neither of these steps is required in order for the CA to issue certificates, but setting up these features properly will increase the usefulness of your PKI.

    In the final segment in this series, I will cover Disaster Recovery Scenarios.

    Chris "Ol' Dusty" Delay

  • USMT pauses at "starting the migration process" for many minutes then works

    Hi folks, Ned here again. Occasionally, someone pings me to explain why USMT 4.0 scanstate is running slowly. My first demand is to see the scanstate.log file, generated with the /v:5 argument. That log shows the command-line, the XML files selected, and what "slow" really means.

    In this particular example the cause is like Soylent Green: it's peeeeoplllllle!!!

    The Scenario and Symptoms

    • Scanstate is stuck for a long time at "starting the migration process" on a production computer. Often for many minutes, even though a test computer goes through that same phase in a few seconds.

    image

    • The Examination and Gathering phases are as quick as on a test computer.
    • There are no errors in the console and the migration eventually completes.

    image

    • The scanstate.log shows many entries like:

    Waiting 6000 msec to retry hive load (tries remaining: 17)...

    IndirectKeyMapper: RegLoadKey(HKEY_USERS,S-1-5-21-2007108519-768573118-610138143-1118,C:\Documents and Settings\bshirley\NTUSER.DAT) failed (3)

    Dumping hive list at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\hivelist...

    [\REGISTRY\MACHINE\HARDWARE] =>

    [\REGISTRY\MACHINE\SECURITY] => \Device\HarddiskVolume1\WINDOWS\system32\config\SECURITY

    [\REGISTRY\MACHINE\SOFTWARE] => \Device\HarddiskVolume1\WINDOWS\system32\config\software

    [\REGISTRY\MACHINE\SYSTEM] => \Device\HarddiskVolume1\WINDOWS\system32\config\system

    [\REGISTRY\USER\.DEFAULT] => \Device\HarddiskVolume1\WINDOWS\system32\config\default

    [\REGISTRY\MACHINE\SAM] => \Device\HarddiskVolume1\WINDOWS\system32\config\SAM

    [\REGISTRY\USER\S-1-5-20] => \Device\HarddiskVolume1\Documents and Settings\NetworkService\NTUSER.DAT

    [\REGISTRY\USER\S-1-5-20_Classes] => \Device\HarddiskVolume1\Documents and Settings\NetworkService\Local Settings\Application Data\Microsoft\Windo

    [\REGISTRY\USER\S-1-5-19] => \Device\HarddiskVolume1\Documents and Settings\LocalService\NTUSER.DAT

    [\REGISTRY\USER\S-1-5-19_Classes] => \Device\HarddiskVolume1\Documents and Settings\LocalService\Local Settings\Application Data\Microsoft\Windows

    [\REGISTRY\USER\S-1-5-21-2007108519-768573118-610138143-500] => \Device\HarddiskVolume1\Documents and Settings\Administrator.CONTOSO\NTUSER.DAT

    [\REGISTRY\USER\S-1-5-21-2007108519-768573118-610138143-500_Classes] => \Device\HarddiskVolume1\Documents and Settings\Administrator.CONTOSO\Local

    End of hive list

    Waiting 6000 msec to retry hive load (tries remaining: 16)...

    IndirectKeyMapper: RegLoadKey(HKEY_USERS,S-1-5-21-2007108519-768573118-610138143-1118,C:\Documents and Settings\bshirley\NTUSER.DAT) failed (3)

    • Looking closer, you see 20 of these entries repeated for one or more user profiles.
    • After each of those 20 tries, USMT gives up with a message like  "Incomplete user profile detected and ignored:C:\Users\ned. Error: 3"
    • Cancelling processing with CTRL+C takes just as long as waiting for the pause to end naturally. You have to kill the scanstate.exe process manually if you want it to stop sooner.

    What is the data telling us?

    I’ll start Windows Explorer and regedit.exe, and then look at the user profile folder and the registry ProfileList key:

    If XP --> C:\Documents and Settings

    If Vista/Win7 --> C:\Users

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList

    See the issue? My file system looks like this:

    image

    But my registry looks like this:

    image

    image

    Notice how there are six user SIDs in the ProfileList key, but only three user profiles on the file system. It is supposed to look like this:

    image

    Where are Brett Shirley and the others? I'll tell you where: in the bit bucket, because somebody thinks that deleting a folder on the file system is the same as deleting profiles. It's not. That error we kept seeing in the log is:

    C:\>net helpmsg 3

    The system cannot find the path specified.

    Why does USMT keep looking?

    USMT 4.0 tries to enumerate each profile 20 times, waiting 6 seconds a try, before giving up and moving to the next profile. This design handled a particular third party product that temporarily locked and renamed ntuser.dat files - a user's "HKEY_CURRENT_USER" registry - then changed it back after it was done doing… whatever. You can probably guess what kind of software would do such a thing. You can also do the math now on how long only a few orphaned profiles will delay your migration.

    USMT hoped to "wait out" the software and migrate once the profile reappeared. The bad side effect of this error handling is that profiles with no ntuser.dat also cause a wait and retry. The only way to have orphaned profiles is by a human deleting the user profile folders manually instead of using Windows profile tools. There's no bug here with profiles or USMT, it’s all expected behavior.

    Don’t you love the law of unintended consequences?

    Let's fix it

    Luckily, there are a number of easy ways to get your migration back in order. Pick one or more.

    • Delete any orphaned profiles using supported tools before running USMT. After all, they point to non-existent data. This can be done a number of ways:

    o User Profiles applet within sysdm.cpl (works on all OS; for XP it's the only in-box option)

    image

    o Call WMI class Win32_UserProfile to delete profiles on Windows Vista or Windows 7. There are plenty of VB and batch examples on the internet. Here's mine with PowerShell on Win7, where I am zapping all profiles called bshirley:

    Get-WmiObject Win32_UserProfile | Where-Object { $_.LocalPath -like 'c:\users\bshirley' } | ForEach-Object { $_.Delete() }

    image

    o Script REG.EXE and RD (this will only work on Vista or later; the reg.exe included with XP lacks sophisticated search options). For example:

    @echo off
    REM Commandline needs a deleted user folder to reconcile.
    REM Example: RMProf.CMD "c:\users\bshirley"

    REG QUERY "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList" /s /d /f %1 | findstr /i "S-1-5-21" > delme.txt
    FOR /F "tokens=1* delims=\" %%i in (delme.txt) do REG DELETE "HKLM\%%j" /f
    RD %1 /q /s
    Del delme.txt /q

    image

    o Various third party tools
    o Techniques people discuss in the Comments section below. Before you say Delprof.exe, give it a try with a deleted user folder. :)

    Then you can run scanstate.exe normally.

    • Set the following environment variable in the CMD prompt running scanstate.exe:

    SET MIG_IGNORE_PROFILE_MISSING=1

    image
    Hardlink migrations haul #%&

    This environment variable forces USMT to skip profiles that have no NTUSER.DAT file. This is generally safe to use: Windows Easy Transfer (the consumer-oriented graphical version of USMT) always has this set on internally, so it is well tested and proven. The only downside is that if you ran into that third party software you'd see issues. You would also be skipping profiles that don't contain an ntuser.dat (but do contain a real folder); that's low risk as they are not truly useable profiles anymore either - no one could have logged in with them.
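
    Put together, a hardlink migration session with the variable set might look like this (a sketch - the store path and XML files are just the common defaults, not requirements):

    SET MIG_IGNORE_PROFILE_MISSING=1
    scanstate.exe c:\store /i:migdocs.xml /i:migapp.xml /v:5 /o /c /hardlink /nocompress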

    • If using SCCM/MDT to call USMT, the awesome Deployment Guys wrote a script that can detect these orphaned profiles and let you act on them in some fashion. It won't delete these profiles, but you now have scriptable options. Setting MIG_IGNORE_PROFILE_MISSING=1 works for MDT/SCCM as well, but automating lite touch/zero touch profile migration to automatically ignore damaged profiles is probably not a great idea; better to detect issues and let a human act upon them.

    Update 8-3-2011: Michael Murgolo from the Deployment Guys wrote an awesome sample script that allows you to set environment variables through SCCM/MDT - and the example is for MIG_IGNORE_PROFILE_MISSING! http://blogs.technet.com/b/deploymentguys/archive/2011/08/03/setting-environment-variables-in-a-task-sequence.aspx

    Rumor has it that Mike Stephens is working on a series of blog posts about user profile architecture. You can nag him about it here.

    Until next time.

    Ned "Make Room! Make Room!" Pyle

  • Designing and Implementing a PKI: Part V Disaster Recovery

    The series:

    Designing and Implementing a PKI: Part I Design and Planning

    Designing and Implementing a PKI: Part II Implementation Phases and Certificate Authority Installation

    Designing and Implementing a PKI: Part III Certificate Templates

    Designing and Implementing a PKI: Part IV Configuring SSL for Web Enrollment and Enabling Key Archival

    Designing and Implementing a PKI: Part V Disaster Recovery

    Chris here again. We are now going to move on to Disaster Recovery. One of the many tasks you want to complete during the planning phase is to plan for disaster recovery. When planning for disaster recovery, not only is the backup/restore process important, but the actual design of the PKI can affect how resilient your PKI infrastructure is. Additionally, proper planning can alleviate the impact of a system failure.

    When the system hosting Certificate Services becomes unusable due to a failure, there are a few consequences of that failure.

    1. The CA can no longer sign its Certificate Revocation List (CRL) or delta CRL (dCRL).

    2. The CA can no longer issue certificates.

    3. The CA database includes a record of certificates that have been issued or revoked, and is unavailable until the CA is recovered.

    Signing CRLs and Delta CRLs

    CRLs and delta CRLs are used by clients to determine if a certificate has been revoked. In general, applications will fail when they cannot determine the revocation status for a certificate, though some applications have the ability to disable revocation checking while others do not.

    Like certificates, CRLs and delta CRLs have a period during which they are valid. Once the CRL and/or delta CRL expires, an application checking the revocation status of a certificate against the expired CRL will fail. The point of this discussion is that typically the first impact you will see when a Certification Authority fails is the inability of applications to check the revocation status of any certificates.

    When you design and implement a PKI you configure the validity period of the CA’s CRL and delta CRL. This design consideration has an impact in terms of disaster recovery. The maximum time you have after a CA failure to institute your recovery process without impacting certificate validation is determined by these settings.

    Example 1. You have an issuing Certification Authority and it is publishing a base CRL once every 7 days and delta CRL once every day. You have approximately 24 hours since the last delta CRL was published to either restore the CA or re-sign the delta CRL before certificate validation starts failing.

    Example 2. You have an issuing Certification Authority and it signs a CRL once every 7 days, but is not configured to publish a delta CRL. In this scenario you have 7 days minus the number of days since the base CRL was signed before validation will begin to fail due to the inability to check revocation status against a valid CRL.
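
    To see where your own CA stands, you can read its current publication intervals right off the CA (a sketch; run on the CA server):

    certutil -getreg CA\CRLPeriodUnits
    certutil -getreg CA\CRLPeriod
    certutil -getreg CA\CRLDeltaPeriodUnits
    certutil -getreg CA\CRLDeltaPeriod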

    Mitigation

    There are several ways that you can minimize the impact that a CA failure will have on certificate validation.

    One way is to install a clustered issuing certification authority. If the active node of the cluster fails the CA can be failed over to the second node. Clustering, however, will not protect against the failure of a shared component such as storage or a Hardware Security Module (HSM). So these devices should have methods to provide failover as well, if possible.

    Another option is to increase the base and delta CRL publication intervals (and hence, their validity periods). This can potentially give you more time to kick off your recovery process, but if the CA fails shortly before the new base or delta CRL is about to be published, increasing the publication interval has done little good. One must also realize there is a trade-off involved here. Increasing the publication interval means that it will take longer for certificate consumers to become aware that a certificate has been revoked and added to the CRL.

    A more complicated strategy is to set the automatic publishing interval to a longer period, and then manually publish the CRL more often. In other words, you set the CRL publication interval to 7 days, and then publish a new CRL every day. This way, if the CA fails you have 6 or 7 days to recognize the problem and start your recovery process. The Windows CA does not automatically publish CRLs in this fashion, but you can set up a scheduled task on the CA server to publish the CRL every 24 hours using the command line utility certutil.exe. The command certutil -crl will instruct the CA to publish a new base CRL with the validity period defined in the CA configuration.
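
    For example, a scheduled task along these lines would republish the base CRL daily (a sketch; adjust the time and the account it runs under to suit your environment):

    schtasks /Create /TN "Publish CRL" /TR "certutil.exe -crl" /SC DAILY /ST 02:00 /RU SYSTEM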

    There are also some group policies that you can consider as part of your overall disaster recovery planning. If you have workstations and servers running Windows Vista, 7, Server 2008, or Server 2008 R2 there is a group policy setting that extends the period of time for which the OS will consider a given CRL valid, independent of the actual validity period of the CRL. The group policy setting is located in the following location:

    Computer Configuration\Windows Settings\Security Settings\Public Key Policies\Certificate Path Validation Settings.

    This setting forces the client to consider the CRL or OCSP response to be valid for longer than it actually is. Below is a screenshot of the specific settings:

    image

    Recovery

    In terms of recovery there is a short term workaround and a long term resolution. The short term workaround is to use a process called CRL re-signing to manually re-sign an existing CRL and extend its validity period. By doing this, you can give yourself additional time to recover the CA. CRL re-signing requires that you have a backup of the CA’s public/private key pair. I will be covering this process later in this blog posting.

    The longer term fix is to restore the certification authority. This of course is not possible unless you have previously backed up the certification authority. I will also cover this later in the blog post.

    CA can no longer issue certificates

    Another issue that occurs when you have a CA failure is that it can no longer issue certificates. In some scenarios where certificates are issued less frequently, the inability to issue certificates may not have a business impact. In other cases, however, the impact could be considerable. For example, if a CA dedicated to issuing certificates for Network Access Protection (NAP) fails the problem would be almost immediately noticeable. NAP certificates have a lifetime of only 24 hours, so a failed CA can be a considerable problem.

    Mitigation

    One way to eliminate this issue completely is to have multiple CAs that are issuing certificates based on the same certificate templates. In this way, if one CA fails clients can still enroll for certificates on one of the other certificate authorities.

    A clustered issuing certification authority is another way to mitigate against a failed CA. If one of the CAs in the cluster fails the cluster will fail over to the second node. Clustering, as mentioned earlier, will not protect against the failure of a shared component such as storage or an HSM. I’ll re-iterate the need for these devices to have methods for failover as well.

    Recovery

    Ultimately, recovering from the inability to issue certificates can be resolved by recovering the failed certification authority or installing a new issuing certification authority to issue certificates. The preferred method would be to restore the failed certification authority since it already has information about issued certificates in its CA Database.

    CA Database

    By default, the CA database contains a copy of every certificate issued, every certificate that has been revoked, and a copy of failed and pending requests. The CA Manager may decide, however, to clear out any expired certificates from the CA database in order to recover free space in the database.

    Note: In Windows Server 2008 R2 you can configure a template such that issued certificates based on that template are not stored in the CA database. These so-called "ephemeral certificates" generally have validity periods shorter than the publication interval of the issuing CA, so recording them so they can later be revoked makes little sense. Further, these short-lived certificates may be issued in great numbers and with great frequency. Storing them in the database can dramatically increase the database's rate of growth. Certificates issued for NAP are examples of these ephemeral certificates.

    If a CA is configured for key Archival and Recovery, the CA database will also contain the private keys for any certificates whose templates are configured for archival. Failure to recover the CA database in this case would result in losing all of these archived keys.

    When a certificate authority fails the database is unavailable which makes it difficult to revoke certificates that were previously issued by the CA. It also makes it impossible to recover any certificates that have been archived in the database. Again, the database will be unavailable when the CA is unavailable. However, in rare circumstances it is possible that the CA database can become corrupted. Like all ESE databases, the CA database can be affected by hardware or disk issues that impact the database or log files.

    Mitigation

    One option to mitigate the database becoming unavailable due to a CA failure is to set up a clustered certification authority. Another option is to take regular backups of the CA. If the CA fails, you can then restore the CA from the backup. Below I discuss options for backing up the CA as well as for restoring the CA.

    Recovery

    For corrupt databases, repairs can be made with esentutl.exe. However, in most cases it would be preferable to restore from a backup to avoid the data loss that can be incurred when using some of the functions in esentutl.exe. Esentutl.exe can repair the structure of the database, but usually at the expense of the data stored within that structure.

    CA Backup

    System State

    There are two different ways to back up the Certification Authority. The first is through a System State backup. A system state backup will back up the entire CA as well as its configuration. If the private key is stored on the CA and not on an HSM, the private key will be backed up as well. Here is additional information on System State. A system state backup should be used when you will need to restore to the same hardware.

    Backing up system state in Windows Server 2003.

    1. To start NT Backup, click Start then Run, type ntbackup.exe and press Enter.

    2. If this is the first time you’ve run this tool, it will start the Welcome to the Backup or Restore Wizard.

    3. Uncheck the Always start in wizard mode, and then click Cancel.

    4. Launch NT Backup again.

    5. Once NT Backup launches, select the Backup tab, and check just System State as the item to back up.

    6. Under the Backup media or file name section, select your backup media or file location where you wish to save the backup.

    7. Click the Start Backup button. This will bring up the Backup Job Information dialog box.

    8. If you wish to start the backup immediately, click Start Backup.

    9. If you wish to schedule the backup, click the Schedule button.

    10. When prompted You must save the backup selections before you can schedule a backup. Do you want to save your current selections now?, click Yes.

    11. Save the selection script.

    12. After you save the selection script, the Scheduled Job Options dialog box will open. Give the job a name, and then click the Properties button.

    13. Configure the desired schedule and click OK. Then enter the credentials for the user account that you want the backup to run under. This account needs either the Back up files and directories user right or membership in the Backup Operators group on the CA. Click OK, then click OK again; you will be prompted for the credentials one more time.

    14. You can then click on the Schedule Jobs tab in NT Backup to check the schedule.
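
    If you prefer to script the Windows Server 2003 backup rather than click through the wizard, ntbackup also runs from the command line. A minimal sketch - the job name and backup file path here are just placeholders:

    ntbackup backup systemstate /j "CA System State" /f "E:\Backups\systemstate.bkf"

    You can drop that line into a batch file and schedule it with Scheduled Tasks, which accomplishes the same thing as steps 9 through 14 above.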

    Restore System State in Windows Server 2003

    1. On the Windows Server 2003 system on which you plan on restoring system state, open the NT Backup utility.

    2. Click on the Restore and Manage Media tab.

    3. Navigate to the backup of the system state, make sure that System State is checked. Under Restore files to, make sure Original location is selected, and click Start Restore.

    4. You will then be prompted that Restoring System State will always overwrite current System State unless you restore to an alternate location. Click OK, and then click OK again to confirm the restore.

    5. When the Restore completes, click Close.

    6. You will then be prompted to restart your computer, click Yes.

    Performing a System State Backup in Windows Server 2008 R2

    1. If you have not installed Windows Backup, you will first have to install this feature. Open Server Manager, select the Features node, then click Add Features.

    2. In the Add Features Wizard, select Windows Server Backup Features, then click Next, and then Install. When the installation completes, click Close.

    3. You can then launch the Windows Server Backup tool, by clicking Start, then Administrative Tools, then Windows Server Backup.

    4. Also, to use Windows Server Backup, you must have an additional drive or a network location to back up to. In other words, you cannot save the backup on the system drive.

    5. The wizard allows you to configure a one-time backup, or schedule a backup.

    6. To schedule a backup, click Backup Schedule… under the Actions section of the Windows Server Backup tool.

    7. This will start the Backup Schedule Wizard, click Next.

    8. On the Select Backup Configuration page, select Custom, and then click Next.

    9. On the Select Items for Backup page of the wizard, click the Add Items button.

    10. Select System State, and click OK, then click Next.

    11. On the Specify Backup Time page of the wizard, select the time that you would like the backup to be scheduled for, and click Next.

    12. On the Specify Destination Type page of the wizard, select either Hard Disk, Volume, or Shared Network Folder, and click Next. In this example, I am selecting Hard Disk.

    13. Select the Hard Disk you would like to use for backup, if it is not listed, click Show All Available Disks…, and select the appropriate disk, and click OK. Click Next.

    14. You will be prompted that the disk will be reformatted and existing volumes will be deleted. Click Yes if you are using this disk solely for backups; if not, choose another backup destination.

    15. On the Confirmation page, click Finish.

    16. On the Summary page, click Close.
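
    The command-line equivalent in Windows Server 2008 R2 is wbadmin. A minimal sketch, assuming E: is your dedicated backup volume (adjust for your environment), run from an elevated command prompt:

    wbadmin start systemstatebackup -backupTarget:E: -quiet

    This performs a one-time system state backup; you could also wrap it in a scheduled task if you prefer that over the Backup Schedule Wizard.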

    Restoring System State in Windows Server 2008 R2

    1. In the Actions pane of the Windows Server Backup tool, click Recover…

    2. This will start the Recovery Wizard, select the location of the backup, and click Next.

    3. On the Select Backup Date page of the wizard, select the date and time of the backup and click Next.

    4. On the Select Recovery Type page, select System state, and click Next.

    5. On the Select Location for System State Recovery page, select Original location, and click Next.

    6. On the Confirmation page of the wizard, click the Recover button.

    7. You will be prompted that the recovery cannot be paused or canceled once it has started. Click Yes.
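
    If you need to do this from the command line instead, wbadmin can perform the same system state recovery. A hedged sketch - the version identifier below is a placeholder that you get from the output of the first command, and E: is again just an example backup target:

    wbadmin get versions -backupTarget:E:
    wbadmin start systemstaterecovery -version:04/15/2011-02:00 -quiet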

    Manual Backup of the Certification Authority

    A good guide to use for backing up and restoring a certification authority is:

    298138 How to move a certification authority to another server
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;298138

    Steps 1 through 3 of this document cover manually backing up the CA.

    Essentially, you want to manually back up the private key, CA certificate, and CA database. If you are using an HSM to protect the private key, you will either need to back up the private key through a method provided by the HSM vendor or have a highly available configuration for the HSMs. In general, if the private key is stored on an HSM, you do not want to back up the private key to any type of media, as this will degrade the overall security and protection of the private key. The configuration for the Certification Authority is stored in the registry, so you will want to back up that registry location as well. The registry location is HKLM\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>.

    Generally, the private key, CA certificate, and CA configuration are going to remain relatively static. You will, however, need to perform a fresh backup should you ever renew the CA certificate or update the configuration. The CA database, on the other hand, is going to grow over time as certificates are issued, requests are denied, and certificates are revoked, so you are going to want to back up the database periodically. How often you perform this backup will depend on how rapidly changes are made to the database and how tolerant you are of discrepancies between the backup and the live data.

    The first time you run the backup you will want to back up the CA’s certificate and private key, the CA database, and the certificate database log. To perform this task through the GUI, open up the Certification Authority MMC snap-in (certsrv.msc).

    1. Right click on the certification authority name and select All Tasks from the context menu, and then select Back up CA…

    2. This will launch the Certification Authority Backup Wizard, click Next.

    3. Select Private key and CA certificate and Certificate database and certificate database log. Browse to a local or network location to save the backup; the backup location must be an empty folder. Click Next.

    4. Enter a password to protect the private key, and click Next, then Finish.

    To back up the CA via the command line, open an elevated command prompt and type certutil -backup Path, where Path is an empty directory in which the backed up information will be stored. You will then be prompted for a password to protect the private key. Enter the password and press Enter, and then confirm the password when prompted. A message is written to the console indicating what has been backed up and that the certutil -backup command completed successfully.

    To back up the registry, run the following command: REG EXPORT "HKLM\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>" caconfig.reg

    Copy caconfig.reg to your backup directory so that all the necessary data is in the same place.

    Once you have completed a full backup of the Certification Authority, you can perform incremental backups of the CA database. Alternatively, you could choose to periodically back up the entire CA database.

    Although you can back up the database through the Certification Authority console, you will most likely want to use some sort of script or scheduled task to perform the backup periodically, as sketched below.
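
    Here is a rough sketch of what such a script might look like. The folder names, script path, and schedule are purely examples, <CA Name> must be replaced with your CA's actual sanitized name from the registry, and you should test this in a lab before trusting it:

    @echo off
    rem Hypothetical nightly CA database backup script - adjust paths for your environment.
    set TARGET=D:\CABackups\Latest
    rem certutil needs an empty target folder, so clear out the previous run first.
    if exist "%TARGET%" rd /s /q "%TARGET%"
    md "%TARGET%"
    rem Back up only the CA database and logs; the key and certificate were backed up earlier.
    certutil -backupdb "%TARGET%"
    rem Keep a copy of the CA configuration alongside the database backup.
    reg export "HKLM\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>" "%TARGET%\caconfig.reg" /y

    You could then register it with something like schtasks /create /tn "CA Database Backup" /tr "C:\Scripts\cabackup.cmd" /sc daily /st 02:00 /ru SYSTEM - again, the task name, path, and time are just examples. A production script would also keep more than one dated copy rather than overwriting a single folder.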

    Manual Restore of the Certification Authority

    Once you have the server that will serve as the replacement for the failed CA, you must do some initial configuration of that server. Give the server the same name as the failed CA and join it to the same domain.

    Configure AD permissions

    Since you have brought a new machine online to be the CA, you need to modify the security on several Active Directory objects so that the new machine can update PKI configuration information in AD. This is necessary because the new machine has a new SID associated with its computer account, even though the computer account has the same name.

    Open ADSIEDIT.MSC. Open the Configuration container of the Active Directory database. Browse to CN=Public Key Services, CN=Services, CN=Configuration. Next open the AIA container. Locate the object that is associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control. This will associate the permissions with the new account.

    Next open the CDP container. Locate the container associated with the failed CA. Open that container and then select the CRL object contained within that container. Right click on the CRL object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control.

    Next open the Enrollment Services container. Locate the object associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then click Advanced. In the Permissions tab of the Advanced Security Settings dialog box, click Add… Add the computer object for the CA. On the Permission Entry screen, select Allow for all Permissions except Full Control. Click OK 3 times.

    Next open the KRA container. Locate the object that is associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control. This will associate the permissions with the new account.

    Installing the Certification Authority Role

    Next we need to restore the Certification Authority. Log on with an account that has Enterprise Admin credentials. The first thing we need to do is install the Certification Authority role. The instructions below are for a Windows Server 2008 or Windows Server 2008 R2-based CA. For the exact procedures on Windows Server 2003, please see the following article:

    298138 How to move a certification authority to another server
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;298138

    1. Open Server Manager.

    2. Click on the Roles Node, then click Add Roles.

    3. When the Add Roles Wizard opens, click Next.

    4. Select Active Directory Certificate Services and click Next.

    5. Then click Next again.

    6. On the Select Role Services page of the wizard, select Certification Authority, and then click Next.

    7. On the Specify Setup Type page of the wizard, select Enterprise or Standalone, depending on the configuration of the failed CA, and then click Next.

    8. On the Specify CA Type page of the wizard, select either Root CA or Subordinate CA, depending on the configuration of the failed CA, and then click Next.

    9. On the Set Up private key page of the wizard, select Use existing private key, and the sub-option of Select a certificate and use its associated private key, then click Next.

    10. On the Select Existing Certificate page of the wizard, click Import.

    11. Browse to the backup of the failed CA and select the P12 file from the backup, click Open. Then enter the password for the P12 file, and click OK.

    12. Then click Next.

    13. On the Configure Certificate Database page of the wizard, select the same database and log file locations as were specified on the failed CA, then click Next, then Install.

    14. When the installation completes, click Close.

    Open an elevated command prompt and use the following command to import the previously backed up CA configuration: REG IMPORT <Previously backed up registry file>.
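
    For example, if you exported the configuration to caconfig.reg as described in the backup section, the command is simply:

    REG IMPORT caconfig.reg

    The CA service reads this configuration when it starts, so the change takes effect the next time the service is restarted - for example, as part of the database restore in the next section.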

    Restore the CA Database

    At this point, you can restore the CA database from your backup.

    1. Right click on the certification authority name and select All Tasks from the context menu, and then select Restore CA…

    2. You will be prompted to stop Certificate Services. Click OK.

    3. When the Certification Authority Backup Wizard starts, click Next.

    4. Select Certificate database and certificate database log. Browse to a local or network location of your previously saved backup.

    5. Click Next.

    6. Click Finish.

    7. You will be prompted to restart the CA. If you have no further incremental backups to restore, click Yes. If you do, click No, and walk through the steps above to restore each incremental backup.
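
    If you prefer the command line, certutil can restore the database from the same backup folder. A minimal sketch - assuming the backup lives in D:\CABackups\Latest as in the earlier example, and noting that -f forces an overwrite of the empty database that the role installation created:

    net stop certsvc
    certutil -f -restoredb "D:\CABackups\Latest"
    net start certsvc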

    Now, if any additional Certificate Services role services such as Online Responder (OCSP) or Web Enrollment were installed on the failed CA, you can go ahead and install those at this point.

    CRL Re-signing

    CRL re-signing is a manual process whereby the Administrator can use the CA's backed up certificate and private keys to re-sign an existing CRL file. This process allows you to extend the lifetime of the existing CRL, and even add certificates to the CRL, effectively revoking them.

    Importing the CA certificate and private key

    To begin, you will need to have a backup of the private key of the CA. If you have the private key stored on an HSM, you will have to follow the HSM vendor’s instructions for making the private key available to another machine. If you are not using an HSM, perform the following to import the CA public and private key pair to the machine where you will be re-signing the CRLs.

    1. Click Start, then Run, type MMC, and then press Enter.

    2. Select the File Menu, and then select Add/Remove Snap-in…

    3. Select Certificates, and then click Add >.

    4. Then select Computer account, and click Next.

    5. Then select Local computer, and then click Finish.

    6. Then click OK.

    7. Expand the Certificates (Local Computer) node.

    8. Right click on the Personal node, then select All Tasks from the context menu, and then select Import…

    9. This will open the Certificate Import Wizard, click Next.

    10. Click the Browse button to browse to the P12 file located in the CA's backup location.

    11. In the file type drop-down list, select Personal Information Exchange (*.pfx;*.p12).

    12. Locate the P12 file that was previously backed up, and click Open.

    13. Click Next.

    14. Type the Password for the P12 file and click Next, click Next again, and click Finish.

    15. Click OK to acknowledge that the import was successful.

    To re-sign the CRL and Delta CRL with the same validity period as they have been previously published, use the following command:

    certutil -sign <existing CRL file name> <re-signed CRL file name>

    http://technet.microsoft.com/en-us/library/cc782041(WS.10).aspx

    You will then have to manually publish the CRL to all CDP locations.
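
    For example - the file names, web server, and share below are placeholders for your own CDP locations - you might copy the re-signed CRL over the file that your HTTP CDP serves and publish it to the LDAP CDP in Active Directory:

    copy "C:\Temp\MyCA-resigned.crl" "\\WEB1\CertEnroll\MyCA.crl"
    certutil -dspublish -f "C:\Temp\MyCA-resigned.crl"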

    If you wish to adjust the validity period, you can specify it at the end of the command in the format DD:HH, where DD is days and HH is hours. For example, the following command would re-sign a CRL so that it is valid for 14 days:

    certutil -sign <existing CRL file name> <resigned CRL file name> 14:00

    If you wish to add one or more issued certificates to the CRL, specify the serial numbers in a comma-separated list on the command line. For example, the following command would add serial numbers to the CRL:

    certutil -sign <existing CRL file name> <resigned CRL file name> +SerialNumber1,SerialNumber2,SerialNumber3

    Summary

    When building a PKI, it is critical to consider how your design will affect the availability of your PKI. The design also affects the way in which you may have to recover the CAs in the PKI.

    You should definitely consider the criticality of PKI to your environment, and how much downtime is acceptable. This will help drive your decisions when designing the PKI and implementing the Certification Authorities.

    Also, many customers make the mistake of either not knowing how to recover a Certification Authority or not having a documented process for doing so. When designing and implementing your PKI, I recommend that you test recovery and document the recovery steps for the CAs in your PKI.

    Chris "CLEAR!" Delay

  • DFSN and DFSR-RO interoperability: When Good DFS Roots Go Bad…

    Hello, Ken here. Today I want to talk about some behaviors that can occur when using Distributed File System Namespaces (DFSN) along with Distributed File System Replication (DFSR) Read-Only members.

    The DFSN and DFSR services are independent of each other, and as such have no idea what the other is doing at any given time. Aside from having a name that is confusingly similar, they do completely different jobs. DFSN is for creating transparent connectivity between shared folders located on different servers and DFSR is for replicating files.

    With the advent of Windows 2008 R2, we gave you the ability in DFSR to create “Read-Only” members. This was a frequent request in previous versions, and as such, it works well.

    Historically, a common configuration we have seen with DFSN is replication of the actual namespace root, and replicating a namespace root folder can cause weirdness. Every time the DFSN service starts, it creates reparse points that link to the various shared folders on the network. If a replication service is monitoring the file system for changes, we can run into timing issues with the creation and deletion of those reparse points, leading to the possible inability of clients to connect to a namespace on a given server. If you have ever talked to a Microsoft Directory Services engineer, they probably discouraged your plans to replicate roots.

    Today I am going to show you what will happen if you have a DFS Namespace and you are using DFSR to replicate it, and then you decide to use the handy-dandy read-only feature on one or more of the members.

    The setup and the issue

    First we start with a word from our sponsor about DFSR: if you would like to know more specifically how read-only members work, check out Ned’s Bee-Log, Read-Only Replication in R2.

    Here we have a basic namespace:

    image

    Here I am accessing it from a Windows 7 machine, so far so good:

    image

    Now I have realized that the scheduled robocopy job that “the guy” who used to work here before me set up is not the greatest solution in the world, so I’m going to implement DFSR and keep all my files in sync all the time, 'cause I can and it’s awesome. So I create a DFSR replication group.

    Now because I’m working at the DFS root, I don’t have the “Replication” tab option in the namespace properties…hmmm. That’s ok, I can just go down to the Replication section of the DFS management console and create one, like this:

    image

    Now that’s all replicating fine and I can create and modify new data.

    image

    image

    Quick check of my connection, and I see that I am now connected to the read-only server:

    image

    I attempt to make changes to the data in the share, and get the expected error:

    image

    This is the DFSRRO.sys filter driver blocking us from making the changes. When a member is marked as read-only, and the DFSRRO.sys driver is loaded, only DFSR itself can change the data in the replicated folder. You cannot give yourself or anybody enough permission to modify the data.

    So far everything is great and working according to plan.

    Fast-forward a few weeks, and now it’s time to reboot the server for <insert reason here>. No big deal, we have to do this from time to time, so I put it in the change schedule for Friday night and reboot that bad boy like a champ. The server comes up, looks good, and I go home to enjoy the weekend.

    Come Monday I get to work and start hearing reports that users in the remote site cannot access the DFS data and they are getting a network error 0x80070035:

    image

    This doesn’t add up because all the servers are online and I can RDP to them, but I am seeing this same error on the clients and the server itself. In addition, if I try to access the file share on the server outside of the namespace I get this error "A device attached to the system is not functioning":

    image

    What is happening is normal and expected, given my configuration. All the servers are online and on the network, but what I have done is lock myself out of my own share using DFSR. However, the issue here is not really with DFSR, but rather with DFSN.

    Remember, earlier I said that when a member is marked as read-only and the DFSRRO.sys driver is loaded, only the DFSR service can make changes to the data in the folder - and this includes the “folder” itself. This is where we run into the issue. When the DFS Namespace server starts, it attempts to create the reparse points for the namespace and all the shares below it. The reparse points are stored on the root target server; in my case, it’s the default “DFSRoots” folder. What I have done here has made the root share inaccessible to the DFSN service. Using ETW tracing for the DFSN service, we see the following errors happening under the covers:

    ·         [serverservice]Root 00000000000DF110, name Stuffs

    ·         [serverservice]Opened DFSRoots: Status 0

    ·         [serverservice]Opened Stuffs: Status c0000022

    ·         [serverservice]DFSRoots\Stuffs: 0xc0000022(STATUS_ACCESS_DENIED)

    ·         [serverservice]IsRootShareMountPoint failed share for root 00000000000DF110, (Stuffs) (\??\C:\DFSRoots\Stuffs) 0xc0000022(STATUS_ACCESS_DENIED)

    ·         [serverservice]Root 00000000000DF110, Share check status 5(ERROR_ACCESS_DENIED)

    ·         [serverservice]AcquireRoot share for root 00000000000DF110, (Stuffs) 5(ERROR_ACCESS_DENIED)

    ·         [serverservice]Root folder for Stuffs, status: 5(ERROR_ACCESS_DENIED)

    ·         [serverservice]Done with recognize new dfs, status 0(ERROR_SUCCESS), rootStatus 5(ERROR_ACCESS_DENIED)

    ·         [serverservice]5(ERROR_ACCESS_DENIED)

    This DFSRRO filter driver is doing its job of not letting anyone change the data.

    Note: You may be wondering how I gathered this awesome logging for DFS, I used a utility called tracelog.exe. Tracelog is a part of the Windows Driver Kit, and is an event tracing controller that runs from the command line, and can be used for all kinds of other ETW tracing also. Since the tracelog output requires translation, you will need to open a support case with Microsoft in order to read it. Your friendly neighborhood Microsoft Directory Services engineer will be able to help you get it translated.

    The resolution

    So what do we do to fix this? Well, the quickest way to resolve the issue is to remove the read-only configuration for the replicated folder. When you add or remove the read-only configuration for a replicated folder, DFSR runs an initial sync process. This should not take too long, as the data should already be mostly in sync; if the data is not fully in sync, it could take longer, and the existing data will be handled as pre-seeded. Once the initial sync is complete, you will need to restart the DFS Namespace service on the member to allow DFSN to recreate the reparse points. After doing this you will be back in business:

    image
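
    If you want to hurry things along - a rough sketch, where FILESERVER2 is a placeholder for the formerly read-only member - you can force the member to pick up the configuration change from AD and then restart the DFS Namespace service so it recreates the reparse points:

    dfsrdiag pollad /member:FILESERVER2
    net stop dfs
    net start dfs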

    Moving forward

    Moving forward, we need to make some decisions about how to avoid outages.

    The best recommendation would be to stop storing data in the root folder of the DFS Namespace. Instead, create a new folder - either on the same server in a different location or on another server - and create a new file share for it; I called mine “Replicated-Data”. Then create a new DFS share under the namespace; I named this “Data”. Configure the folder target to point to the new file share containing the recently moved data. Once you have your new DFS share, DFSN will even give you the option of configuring replication once you add a second folder target:

    image

    image

    When you select the “Replicate Folder Wizard”, it launches the DFSR creation wizard, and we can configure the replication group. Once we run through that, it will look something like this:

    image

    Now we have our namespace root, and our replicated folder separated. The namespace root is located at “C:\DFSroots\Stuffs” and the replicated folder is located at “C:\DFSR_Replicated_Folders\Data” so when we configure the replicated folder as read-only, it will not affect our reparse points in the DFS root folder.

    image

    Now if we reboot the server, DFSN and DFSR are able to do their respective jobs without any conflict or client disruptions.

    image

    image

    And all was right with the DFSR world once again. Thanks for sticking it out this long.

    Ken "I don’t like the name Ned gave me" McMahan

  • Disk Image Backups and Multi-Master Databases (or: how to avoid early retirement)

    Hi folks, Ned here again. We published a KB a while back around the dangers of using virtualized snapshots with DFSR:

    Distributed File System Replication (DFSR) no longer replicates files after restoring a virtualized server's snapshot

    Customers have asked me some follow-up questions that I address today. Not because the KB is missing info (it's flawless, I wrote it ;-P) but because they were now nervous about their DCs and backups. With good reason, it turns out.

    Today I discuss the risks of restoring an entire disk image of a multi-master server. In practical Windows OS terms, this refers to Domain Controllers, servers running DFSR, or servers running FRS; the latter two servers might be member servers or also DCs. All of them use databases to interchange files or objects with no single server being the only originator of data.

    The Dangerous Way to Backup Multi-Master Servers

    • Backing up only a virtualized multi-master server's VHD file from outside the running OS. For example, running Windows Server Backup or DPM on a Hyper-V host machine and backing up all the guest VHD files. This includes full volume backups of the Hyper-V host.
    • Backing up only a multi-master server's disk image from outside the running OS. For example, running a SAN disk block-based backup that captures the server's disk partitions as raw data blocks and does not run a VSS-based backup within the running server OS.

    Note: It is OK to take these kinds of outside backups as long as you are also getting a backup that runs within the running multi-master guest computers. Naturally, this internal backup requirement makes the outside backup redundant, though.

    What happens

    What's the big deal? Haven't you read somewhere that we recommend VSS full disk backups?

    Yes and no. And no. And furthermore, no.

    Starting in Windows Server 2008, we incorporated special VSS writer and Hyper-V integration components to prevent the insidiously difficult-to-fix USN issues that came from restoring domain controllers as "files". Rather than simply chop a DC off at the knees with USN rollback protection, the AD developers had a clever idea: the integration components tell the guest OS that the server is a restored backup and reset its invocation ID.

    After restore, you'll see this Directory Services 1109 event when the DC boots up:

    image

    This only prevents a problem; it's not the actual solution. It means that this DC immediately replicates inbound from a partner and discards all of its local differences that came from the restored "backup". Anything created on that DC before it last replicated outbound is lost forever. Quite like these "oh crap" steps we have here for the truly desperate who are fighting snapshot USN rollbacks; much better than nothing.

    Now things get crummy:

    • This VSS+Hyper-V behavior only works if you back up the running Windows Server 2008 and 2008 R2 DC guests. If backed up while turned off, the restore will activate USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103) and trash AD on that DC.
    • Windows Server 2008 and 2008 R2 only implement this protection as part of the Hyper-V integration components, so third-party full disk image restores or other virtualization products have to implement it themselves. They may not, leading to USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103) and trashed AD on that DC.
    • Windows Server 2003 DCs do not have this restore capability even as part of Hyper-V. Restoring their VHD as a file immediately invokes USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103), again leading to trashed AD on that DC.
    • DFSR (for SYSVOL or otherwise) does not have this restore capability in any OS version. Restoring a DFSR server's VHD file or disk image leads to the same database destruction as noted in KB2517913 (events 2212, 2104, 2004, 2106).
    • FRS (for SYSVOL or otherwise) does not have this restore capability in any OS version. Restoring an FRS server's VHD file or disk image does not stop FRS replication for new files. However, all subfolders under the FRS-replicated folder (such as SYSVOL) - along with their file and folder contents - disappear from the server. This deletion will not replicate outbound, but if you add a new DC and use this restored server as a source DC, the new DC will have inconsistent data. There is no indication of the issue in the event logs. Files created in those subfolders on working servers will not replicate to this server, nor will their parent folders. To repair the issue, perform a "D2 burflag" operation on the restored server for all FRS replicas, as described in KB290762 and sketched just after this list.
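
    For reference, the non-authoritative "D2" restore from KB290762 boils down to a registry value and an FRS service restart on the affected server. This is only a sketch - read the KB for the full caveats before running it:

    reg add "HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup" /v BurFlags /t REG_DWORD /d 0xD2 /f
    net stop ntfrs
    net start ntfrs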

    Multi-master databases are some of the most complex software in the world and one-size-fits all backup and restore solutions are not appropriate for them.

    The Safe Way to Backup Multi-Master Servers

    When dealing with any Windows server that hosts a multi-master database, the safest method is a full/incremental backup (specifically including System State) taken with VSS within the running operating system itself. System state backs up all aspects of a DC (including SYSVOL, whether it is replicated by DFSR or FRS), but does not include custom DFSR or FRS replicated folders, which is why we recommend full/incremental backups of all the volumes. This goes for virtualized guests as well as physical servers. Avoid relying solely on techniques that back up the entire server as a single virtualized guest VHD file or back up the raw disk image of that server. As I've shown above, that makes the backups easier, but it makes the restore much harder.

    And when it gets to game time, the restore is what keeps you employed: your boss doesn't care how easy you made your life with backups that don’t work.

    Final thoughts

    Beware of any vendor that claims it can do zero-impact server restores like those I mentioned in the "Dangerous" section, and make them prove that they can restore a single domain controller in a two-DC domain without any issues, in a scenario where you created new users and group policies after the backup was taken. Don't take the word of some salesman: make them demonstrate my scenario above. You don’t want to build your backup plans around something that doesn’t work as advertised.

    Our fearless writers are banging away on TechNet as I write this to ensure we're not giving out any misleading info around virtualized server backups and restores. If you find any articles that look scary, please feel free to send us an email and I'll see to the edits.

    Until next time.

    - Ned "one of these servers is not like the other" Pyle