What does DCDIAG actually… do?

Hi folks, Ned here again. I recently wrote a KB article about some expected DCDIAG.EXE behaviors. This required reviewing DCDIAG.EXE as I wasn’t finding anything deep in TechNet about the “Services” test that had my interest. By the time I was done, I had found a dozen other test behaviors I had never known existed. While we have documented the version of DCDIAG that shipped with Windows Server 2008 – sometimes with excellent specificity, like Justin Hall’s article about the DNS tests – mostly it’s a black box and you only find out what it tests when the test fails. Oh, we have help of course: just run DCDIAG /? to see it. But it’s help written by developers. Meaning you get wording like this:

Advertising
Checks whether each DSA is advertising itself, and whether it is advertising itself as having the capabilities of a DSA.

So, it checks each DSA (whatever that is) to see if it’s advertising (whatever that means). The use of an undefined acronym is an especially nice touch, as even within Microsoft, DSA could mean:

Naturally, this brings out my particular brand of OCD. What follows is the result of my compulsion to understand. I’m not documenting every last switch in DCDIAG, just the tests. I am only documenting Windows Server 2008 R2 SP1 behavior – I have no idea where the source code is for the ancient Support Tools version of DCDIAG and you aren’t paying me enough here to find it :-).  The Windows Server 2008 RTM through Windows Server 2008 R2 SP1 versions are nearly identical except for bug fixes:

KB2401600 The Dcdiag.exe VerifyReferences test fails on an RODC that is running Windows Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;en-US;2401600

KB979294 The Dcdiag.exe tool takes a long time to run in Windows Server 2008 R2 and in Windows 7
http://support.microsoft.com/default.aspx?scid=kb;EN-US;979294

KB978387 FIX: The connectivity test that is run by the Dcdiag.exe tool fails together with error code 0x621
http://support.microsoft.com/default.aspx?scid=kb;EN-US;978387

Everything I describe below you can discover and confirm yourself with careful examination of network captures and logging, to include the public functions being used – but why walk when you can ride? Using /v can also provide considerable details on some tests. No internal source code is described nor do I show any special hidden functionality.

For info on all the network protocols I list out – or if you run into network errors when using DCDIAG – see Service overview and network port requirements for the Windows Server system. I went pretty link-happy in general in this post to help people using it as a reference; that way if you just look at your one little test it has all the info you need. I don’t always call out name resolution being tested because it is implicit; it’s also testing TCP, UDP, and IP.

Finally: this post is more of a reference than my usual lighthearted fare. Do not operate heavy machinery while reading.

Initial Required Tests

This tests general connectivity and responsiveness of a DC, to include:

  • Verifying the DC can be located in DNS.
  • Verifying the DC responds to ICMP pings.
  • Verifying the DC allows LDAP connectivity by binding to the instance.
  • Verifying the DC allows binding to the AD RPC interface using the DsBindWithCred function.

The DNS test can be satisfied out of the client cache so restarting the DNS client service locally is advisable when running DCDIAG to guarantee a full test of name resolution. For example:

Net stop "dns client" & net start "dns client" & dcdiag /test:verifyreplicas /s:DC-01

The initial tests cannot be skipped.

The initial tests use ICMP, LDAP, DNS, and RPC on the network.

Editorial note: Blocking ICMP will prevent DCDIAG from working. While blocking ICMP is highly recommended at the Internet-edge of your network, internally blocking ICMP traffic mainly just leads to administrative headaches like breaking legacy group policy, breaking black hole router detection (or leading to highly inefficient MTU sizes due to lack of a discovery option), and breaking troubleshooting tools like ping.exe or tracert.exe. It creates an illusion of security; there are a great many other easy ways for a malicious internal user to locate computers.

Advertising

This test validates that the public DsGetDcName function used by computers to locate domain controllers will correctly locate any DCs specified on the command line with the /s, /a, or /e parameter. It checks that the server successfully reports itself with DS_Flags for:

  • DC
  • LDAP server
  • Writable or Read-Only DC
  • KDC
  • Time Server
  • GC or not (and if claiming to be a GC, whether the GC is ready to respond to requests)

Note that “advertising” is not the same as “working”. For instance, if the KDC service is stopped the Advertising test will fail since the flag returned from DsGetDcName will not include KDC. But if port 88 over TCP and UDP are blocked on a firewall, the Advertising test will pass – even though the KDC is not going to be able to answer requests for Kerberos tickets.

This test is done using RPC over SMB (using a Netlogon named pipe) to the DC plus LDAP to locate the DC's site information.
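To make the "advertising" idea concrete, here is a minimal sketch of decoding the flags bitmask that DsGetDcName returns. The flag values are the ones published in the Windows SDK header dsgetdc.h; the helper names and the choice of "required" flags are my own illustration, not DCDIAG's actual logic.

```python
# Flag values as published in the Windows SDK header dsgetdc.h.
DS_GC_FLAG = 0x00000004
DS_LDAP_FLAG = 0x00000008
DS_DS_FLAG = 0x00000010
DS_KDC_FLAG = 0x00000020
DS_TIMESERV_FLAG = 0x00000040
DS_WRITABLE_FLAG = 0x00000100

FLAG_NAMES = {
    DS_DS_FLAG: "DC",
    DS_LDAP_FLAG: "LDAP",
    DS_WRITABLE_FLAG: "Writable",
    DS_KDC_FLAG: "KDC",
    DS_TIMESERV_FLAG: "TimeServer",
    DS_GC_FLAG: "GC",
}

# Hypothetical "must advertise" set, chosen for illustration only.
REQUIRED = (DS_DS_FLAG, DS_LDAP_FLAG, DS_KDC_FLAG, DS_TIMESERV_FLAG)


def decode_dc_flags(flags):
    """Return the names of every known flag set in the bitmask."""
    return [name for bit, name in FLAG_NAMES.items() if flags & bit]


def missing_roles(flags, required=REQUIRED):
    """Return the names of required flags the DC is not advertising."""
    return [FLAG_NAMES[bit] for bit in required if not flags & bit]
```

A DC whose KDC service is stopped would come back without DS_KDC_FLAG, so missing_roles would report "KDC". A DC with port 88 blocked at a firewall would still return the flag, which is exactly the gap between advertising and working.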

CheckSDRefDom

This test validates that your application partition cross reference objects (located in “cn=partitions,cn=configuration,dc=<forest root domain>”) contain the correct domain names in their msDS-SDReferenceDomain attributes.

I find no history of anyone ever seeing the error message that can be displayed here.

The test uses LDAP.

CheckSecurityError

This test does a variety of checks around the security components of a DC like Kerberos. For it to be more specifically useful you should provide /replsource:<some partner DC> as the default checks are not as comprehensive.

This test:

  • Validates that at least one KDC is online for each domain and they are reachable (first in the same site, then anywhere in the domain if that fails)
  • Checks if packet fragmentation of Kerberos over UDP might be an issue based on current MTU size by sending non-fragmenting ICMP packets
  • Checks if the DC’s computer account exists in AD, if it’s within the default “Domain Controllers” OU, if it has the correct UserAccountControl flags for DCs, that the correct ServerReference attributes are set, and if the minimum Service Principal Names are set
  • Validates that the DC's computer object has replicated to other DCs
  • Validates that there are no replication or KCC connection issues for connected partners by querying the function DsReplicaGetInfo to get any security-related errors

When the /replsource is added, a few more tests happen. The partner is checked for all of the above also, then:

  • Time skew is calculated between the servers to verify it is less than 300 seconds for Kerberos. It does not check the Kerberos policy to see if allowed skew has been modified
  • Permissions are checked on all the naming contexts (such as Schema, Configuration, etc.) on the source DC to validate that replication and connectivity will work between DCs
  • Connectivity is checked to validate that the user running DCDIAG (and therefore in theory, all other users) can connect to and read the SYSVOL and NETLOGON shares without any security errors. It also checks IPC$, but inability to connect there would have broken many earlier tests
  • The "Access this computer from the network" privilege on the DC is checked to verify it is held by Administrators, Authenticated Users, and Everyone groups
  • The DC's computer object is checked to ensure it is the latest version on the DCs. This is done to prove replication convergence since a very stale DC might lead to security issues for users, problems with the DC's own computer account password, or secure channels to other servers. It checks versions, USNs, originating servers, and timestamps

These tests are performed using LDAP, RPC, RPC over SMB, and ICMP.
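The time skew comparison described above boils down to a fixed-threshold check. A minimal sketch, assuming you already have both clocks as datetime values (how DCDIAG obtains them is not shown here, and the function name is hypothetical):

```python
from datetime import datetime, timedelta

# The fixed 300 second limit described above. As noted, the real test does
# not consult the Kerberos policy, so a customized skew is not honored.
MAX_SKEW = timedelta(seconds=300)


def skew_ok(local_time, partner_time, max_skew=MAX_SKEW):
    """True when the two clocks differ by less than the allowed skew."""
    return abs(local_time - partner_time) < max_skew
```

The abs() matters: a partner that is ahead of you is just as much a Kerberos problem as one that is behind.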

Connectivity

No matter what you specify for tests, this always runs as part of Initial Required Tests.

CrossRefValidation

This test retrieves a list of naming contexts (located in “cn=partitions,cn=configuration,dc=<forest root domain>”) with their cross references and then validates them, similar to the CheckSDRefDom test above. It is looking at the nCName, dnsRoot, nETBIOSName, and systemFlags attributes to:

  • Make sure the names or DNs are not invalid or null
  • Confirm DNs are not otherwise mangled with CNF or 0ADEL (which happens during Conflict or Deletion operations)
  • Ensure the systemFlags are correct for that object
  • Call out any empty (orphaned) replica sets

The test uses LDAP.
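If you want to spot mangled DNs yourself in an LDAP export, the pattern is easy to match. This is a rough sketch, assuming the usual string form where the mangled RDN carries a literal \0ACNF: or \0ADEL: marker followed by a GUID; the function name and the sample DN are made up for illustration.

```python
import re

# Matches the marker AD appends to an RDN on conflict (CNF) or deletion (DEL),
# followed by the object's GUID in its 36-character string form.
MANGLED = re.compile(r"\\0A(CNF|DEL):[0-9a-fA-F-]{36}")


def mangle_kind(dn):
    """Return 'conflict', 'deleted', or None for a DN in string form."""
    match = MANGLED.search(dn)
    if match is None:
        return None
    return "conflict" if match.group(1) == "CNF" else "deleted"
```

For example, a cross reference named r"CN=Sales\0ACNF:6f9619ff-8b86-d011-b42d-00c04fc964ff,CN=Partitions,..." would be flagged as a conflict object.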

CutoffServers

Tests the AD replication topology to ensure there are no DCs without working connection objects between partners. Any servers that cannot replicate inbound or outbound from any DCs are considered “cut off”. It uses the function DsReplicaSyncAll to do this which means this “test” actually triggers replication on the DCs so use with caution if you are the owner of crud WAN links that you keep clean with schedules, and certainly consider this before using /e.

This test is rather misleading in its help description; if it cannot contact a server that is actually unavailable to LDAP on the network then it gives no error or test results, even if the /v parameter is specified. You have to notice that there is no series of “analyzing the alive system replication topology” or “performing upstream (of target) analysis” messages being printed for a cutoff server. However, the Connectivity test will fail if the server is unreachable so it’s a wash.

The test uses RPC.

DcPromo

The DCpromo test is one of the two oddballs in DCDIAG (the other is ‘DNS’). It is designed to test how well a DCPROMO would proceed if you were to run it on the server where DCDIAG is launched. It also has a number of required switches for each kind of promotion operation. All of the tests are against the server specified first in the client DNS settings. It tests:

  • If at least one network adapter has a primary DNS server set
  • If you would have a disjoint namespace based on the DNS suffix
  • That the proposed authoritative DNS zone can be contacted
  • If dynamic DNS updates are possible for the server’s A record. It checks both the setting on the authoritative DNS zone as well as the client registry configuration of DnsUpdateOnAllAdapters and DisableDynamicUpdate
  • If an LDAP DClocator record (i.e. “_ldap._tcp.dc._msdcs.<domain>”) is returned when querying for existing forests

The test uses DNS on the network.

DNS

This series of enterprise-wide DNS tests is already well documented here:

http://technet.microsoft.com/en-us/library/cc731968(WS.10).aspx

The tests use DNS, RPC, and WMI protocols.

FrsEvent

This test validates the File Replication Service’s health by reading (and printing, if using /v) FRS event log warning and error entries from the past 24 hours. It’s possible this service won’t be running or installed on Windows Server 2008 or later if SYSVOL has been migrated to DFSR. On Windows Server 2008, some events may be misleading as they may refer to custom replica sets and not necessarily SYSVOL; on Windows Server 2008 R2, however, FRS can be used for SYSVOL only.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.

DFSREvent

This test validates the Distributed File System Replication service’s health by reading (and printing, if using /v) DFSR event log warning and error entries from the past 24 hours. It’s possible this service won’t be running or installed on Windows Server 2008 if SYSVOL is still using FRS; on Windows Server 2008 R2 the service is always present on DCs. While this ostensibly tests DFSR-enabled SYSVOL, any errors within custom DFSR replication groups would also appear here, naturally.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.

SysVolCheck

This test reads the DC's Netlogon SysvolReady registry value to validate that SYSVOL is being advertised:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Netlogon\Parameters
SysvolReady=1

The value name has to exist with a value of 1 to pass the test. This test will work with either FRS or DFSR-replicated SYSVOLs. It doesn’t check if the SYSVOL and NETLOGON shares are actually accessible, though (CheckSecurityError does that).

The test uses RPC over SMB (through a named pipe to WinReg).

LocatorCheck

This test validates that DCLocator queries return the five “capabilities” that any DC must know of to operate correctly.

If not hosting one, the DC will refer to another DC that can satisfy the request; this means that you must carefully examine this under /v to make sure a server you thought was supposed to be holding a capability actually is correctly returned. If no DC answers or if the queries return errors then the test will fail.

The tests use RPC over SMB with the standard DsGetDcName DCLocator queries.

Intersite

This test uses Directory Replication Service (DRS) functions to check for conditions that would prevent inter-site AD replication within a specific site or all sites:

  • Locates and connects to the Intersite Topology Generators (ISTG)
  • Locates and connects to the bridgehead servers
  • Reports back any replication failures after triggering a replication
  • Validates that all DCs within sites with inbound connections to this site are available
  • Checks the KCC values for “IntersiteFailuresAllowed” and “MaxFailureTimeForIntersiteLink” overrides within the registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

You must be careful with this test’s command-line arguments and always provide /a or /e. Not providing a site means that the test runs but skips actually testing anything (you can see this under /v).

All tests use RPC over the network to test the replication aspects and will make registry connections (RPC over SMB to WinReg) to check for those NTDS settings override entries. LDAP is also used to locate connection info.

KccEvent

This test queries the Knowledge Consistency Checker on a DC for KCC errors and warnings generated in the Directory Services event log during the last 15 minutes. This 15 minute threshold is irrespective of the Repl topology update period (secs) registry value on the DC.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.

KnowsOfRoleHolders

This test returns the DC's knowledge of the five Flexible Single Master Operation (FSMO) roles. The test does not inherently check all DCs' knowledge for consistency, but using the /e parameter would provide data sufficient to allow comparison.

The test uses RPC to return DSListRoles within the Directory Replication Service (DRS) functions.

MachineAccount

This test checks if:

  • The DC's computer account exists in AD
  • It’s within the Domain Controllers OU
  • It has the correct UserAccountControl flags for DCs
  • The correct ServerReference attributes are set
  • The minimum Service Principal Names are set. For those paying close attention, this is identical to one test aspect of CheckSecurityError; this is because they use the same internal test

This test also mentions two repair options:

  • /RecreateMachineAccount will recreate a missing DC computer object. This is not a recommended fix as it does not recreate any child objects of a DC, such as FRS and DFSR subscriptions. The best practice is to use a valid SystemState backup to authoritatively restore the DC's deleted object and child objects. If you do use this /RecreateMachineAccount option then the DC should then be gracefully demoted and promoted to repair all the missing relationships
  • /FixMachineAccount will add the UserAccountControl flags to a DC's computer object for “TRUSTED_FOR_DELEGATION” and “SERVER_TRUST_ACCOUNT”. It’s safe to use as a DC missing those bit flags will not function and it does not remove other bit flags present. Using this repair option is preferred over trying to set these flags yourself through ADSIEDIT or other LDAP editors

This test uses LDAP and RPC over SMB.
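The UserAccountControl part of this test is plain bit arithmetic, which is why adding the two flags is non-destructive. Here is a small sketch using the documented bit values for the two flags; the helper names are hypothetical and this is not DCDIAG's code.

```python
# Documented userAccountControl bit values.
SERVER_TRUST_ACCOUNT = 0x2000
TRUSTED_FOR_DELEGATION = 0x80000
REQUIRED_DC_FLAGS = SERVER_TRUST_ACCOUNT | TRUSTED_FOR_DELEGATION


def has_dc_flags(uac):
    """True when both flags a DC computer account needs are present."""
    return (uac & REQUIRED_DC_FLAGS) == REQUIRED_DC_FLAGS


def add_dc_flags(uac):
    """OR the two flags in; every bit that was already set is preserved."""
    return uac | REQUIRED_DC_FLAGS
```

A healthy DC computer account typically shows 532480 (0x82000), which is just these two flags together. Because the repair is an OR rather than an overwrite, any other bits already on the account survive.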

NCSecDesc

This test checks permissions on all the naming contexts (such as Schema, Configuration, etc.) on the source DC to validate that replication and connectivity will work between DCs. It makes sure that “Enterprise Domain Controllers” and “Administrators” groups have the correct minimum permissions. This is the same test performed within CheckSecurityError.

The test uses LDAP.

NetLogons

This test is designed to:

  • Validate that the user running DCDIAG (and therefore in theory, all other users) can connect to and read the SYSVOL and NETLOGON shares without any security errors. It also checks IPC$, but inability to connect there would have broken many earlier tests
  • Verify that the Administrators, Authenticated Users, and Everyone groups have the “access this computer from the network” privilege on the DC. If not, you’d see a ton of other errors here though, naturally

Both of these tests are also performed by CheckSecurityError.

The tests use SMB and RPC over SMB (through named pipes).

ObjectsReplicated

This test verifies that replication of a few key objects and attributes has occurred and displays up-to-dateness info if replication is stale. By default the two objects validated are:

  • The “CN=NTDS Settings” object of each DC exists up to date on all other DCs.
  • The “CN=<DC name>” object of each DC exists up to date on all other DCs.

This test is not valuable unless run with /e or /a as it just asks the DC about itself when those are not specified. Using /v will give more details on objects thought to be stale based on version.

You can also specify arbitrary objects to test with /objectdn /n, which can be useful after creating a “canary” object to validate replication.

The tests are done using RPC with Directory Replication Service (DRS) functions.

OutboundSecureChannels

This test is designed to check external trusts. It will not run by default and will fail even when provided correct /testdomain parameters, validating the secure channel with NLTEST.EXE, and using a working external trust. It does state that the secure channel is valid but then mistakenly reports that there are no working trust objects. I’ll update this post when I find out more. This test should not be used.

RegisterLocatorDnsCheck

Validates many of the same aspects as the Dcpromo test. It requires the /dnsdomain switch to specify a domain that would be the target of registration; this can be a different domain than the current primary one. It specifically verifies:

  • If at least one network adapter has a primary DNS server set
  • If you would have a disjoint namespace based on the DNS suffix
  • That the proposed authoritative DNS zone can be contacted
  • If dynamic DNS updates are possible for the server’s A record. It checks both the setting on the authoritative DNS zone as well as the client registry configuration of DnsUpdateOnAllAdapters and DisableDynamicUpdate
  • If an LDAP DClocator record (i.e. “_ldap._tcp.dc._msdcs.<domain>”) is returned when querying for existing forests

The test uses DNS on the network.

Replications

This test checks all AD replication connection objects for all naming contexts on specified DC(s) to see:

  • If the last replication attempted was successful or returned an error
  • If replication is disabled
  • If replication latency is more than 12 hours

The tests are done with LDAP and RPC using DsReplicaGetInfo.
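The three conditions above can be modeled as a simple per-connection evaluation. This is only a sketch: the inputs loosely mirror the kind of data DsReplicaGetInfo returns (last result code, last success time), but the function and parameter names are mine, not DCDIAG's.

```python
from datetime import datetime, timedelta

# The 12 hour latency threshold described above.
LATENCY_LIMIT = timedelta(hours=12)


def replication_findings(last_result, last_success, now, disabled=False):
    """Return a list of problems for one inbound replication connection."""
    findings = []
    if disabled:
        findings.append("replication disabled")
    if last_result != 0:  # 0 means the last attempt succeeded
        findings.append("last attempt failed with error %d" % last_result)
    if now - last_success > LATENCY_LIMIT:
        findings.append("latency over 12 hours")
    return findings
```

An empty list means the connection would look healthy by these three measures. Note that a failed last attempt and high latency are independent findings; a link can fail once and still be well inside the 12 hour window.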

RidManager

This test validates that the RID Master FSMO role holder:

  • Can be located and contacted through a DsBind
  • Has valid RID pool values

This role must be online and accessible for DCs to be able to create security principals (users, computers, and groups) as well as for further DCs to be promoted within a domain.

The test uses LDAP and RPC.

Services

This test validates that various AD-dependent services are running, accessible, and set to specific start types:

  • RPCSS - Start Automatically - Runs in Shared Process
  • EVENTSYSTEM - Start Automatically - Runs in Shared Process
  • DNSCACHE - Start Automatically - Runs in Shared Process
  • NTFRS - Start Automatically - Runs in Own Process (if domain functional level is less than Windows Server 2008. Does not trigger on SYSVOL being replicated by FRS)
  • ISMSERV - Start Automatically - Runs in Shared Process
  • KDC - Start Automatically - Runs in Shared Process
  • SAMSS - Start Automatically - Runs in Shared Process
  • SERVER - Start Automatically - Runs in Shared Process
  • WORKSTATION - Start Automatically - Runs in Shared Process
  • W32TIME - Start Manually or Automatically - Runs in Shared Process
  • NETLOGON - Start Automatically - Runs in Shared Process

(If target is Windows Server 2008 or later)

  • NTDS - Start Automatically - Runs in Shared Process
  • DFSR - Start Automatically - Runs in Own Process (if domain functional level is Windows Server 2008 or greater. Does not trigger on SYSVOL being replicated by DFSR)

(If using SMTP-based AD replication)

  • IISADMIN - Start Automatically - Runs in Shared Process
  • SMTPSVC - Start Automatically - Runs in Shared Process

These are the “real” service names listed in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services. If this test is specified when targeting Windows Server 2003 DCs it is expected to fail on RpcSs. See KB2512643.

The test uses RPC and the Service Control Manager remote protocol.
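To show what "set to specific start types" means in practice, here is a sketch that compares a service's configuration against an expected table. The numeric values are the standard service Start and Type constants (2 = automatic, 3 = manual; 0x10 = own process, 0x20 = shared process). The table covers only four of the services listed above, and the function is a hypothetical illustration that ignores any extra Type bits.

```python
# Standard service configuration constants.
START_AUTO, START_MANUAL = 2, 3
TYPE_OWN_PROCESS, TYPE_SHARE_PROCESS = 0x10, 0x20

# Subset of the expectations listed above: (allowed start types, process type).
EXPECTED = {
    "KDC": ({START_AUTO}, TYPE_SHARE_PROCESS),
    "NETLOGON": ({START_AUTO}, TYPE_SHARE_PROCESS),
    "W32TIME": ({START_AUTO, START_MANUAL}, TYPE_SHARE_PROCESS),
    "NTFRS": ({START_AUTO}, TYPE_OWN_PROCESS),
}


def check_service(name, start, svc_type):
    """Return a list of mismatches between actual and expected configuration."""
    allowed_starts, expected_type = EXPECTED[name.upper()]
    problems = []
    if start not in allowed_starts:
        problems.append("unexpected start type")
    if svc_type != expected_type:
        problems.append("unexpected process type")
    return problems
```

W32TIME is the only entry with two acceptable start types, matching the "Manually or Automatically" line in the list.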

SystemLog

This test validates the System Event Log’s health by reading and printing entries from the past 60 minutes (stopping at computer startup timestamp if less than 60 minutes). Errors and warnings will be printed, with no evaluation done of them being expected or not – this is left to the DCDIAG user.

By default, remote connections to the event log are disabled by the Windows Server 2008/R2 firewall rules so this test will fail. KB2512643 covers enabling those rules to allow the test to succeed.

The test uses RPC, specifically with the EventLog Remoting Protocol.
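The "past 60 minutes, stopping at startup" window is worth spelling out, since a recently rebooted DC will show fewer events than you might expect. A minimal sketch, assuming you have the current time and the boot time as datetime values (the function names are hypothetical):

```python
from datetime import datetime, timedelta


def window_start(now, boot_time, lookback=timedelta(minutes=60)):
    """The window opens 60 minutes back, or at startup if that is more recent."""
    return max(now - lookback, boot_time)


def in_window(event_time, now, boot_time):
    """True when an event falls inside the window the test would report on."""
    return window_start(now, boot_time) <= event_time <= now
```

So a DC that booted 30 minutes ago only has 30 minutes of System log considered, which quietly hides whatever error caused the reboot.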

Topology

This test checks that a server has a fully-connected AD replication topology. This test must be explicitly run. It checks:

The test uses DsReplicaSyncAll with the DS_REPSYNCALL_DO_NOT_SYNC flag, meaning that it analyzes and validates replication topology without actually replicating changes. The test does not validate the availability of replication partners – having a partner offline will not cause failures in this test. Nor does it test whether the schedule is completely closed, preventing replication; to see those active replication results, use the Replications or CutoffServers tests.

The test uses RPC and LDAP.

VerifyEnterpriseReferences

This test verifies computer reference attributes for all DCs, including:

  • ServerReference attribute correct for a DC on cn=<DC name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
  • ServerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
  • frsComputerReference attribute correct for a DC site object on cn=domain system volume (sysvol share),cn=ntfrs subscriptions,cn=<DC Name>,ou=domain controllers,DC=<domain>
  • frsComputerReferenceBL attribute correct for a DC object on cn=<DC Name>,cn=domain system volume (sysvol share),cn=file replication service,cn=system,dc=<domain>
  • hasMasterNCs attribute correct for a DC on cn=ntds settings,cn=<DC Name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
  • nCName attribute correct for a partition at cn=<partition name>,cn=partitions,cn=configuration,dc=<domain>
  • msDFSR-ComputerReference attribute correct for a DC DFSR replication object on cn=<DC Name>,cn=topology,cn=domain system volume,cn=dfsr-globalsettings,cn=system,dc=<domain>
  • msDFSR-ComputerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>

Note that the two DFSR tests are only performed if domain functional level is Windows Server 2008 or higher. This means there will be an expected failure if SYSVOL has not been migrated to DFSR, as the test does not actually care if FRS is still in use.

The test uses LDAP. The DCs are not all individually contacted; only the specified DCs are.

VerifyReferences

This test verifies computer reference attributes for a single DC, including:

  • ServerReference attribute correct for a DC on cn=<DC name>,cn=<site>,cn=sites,cn=configuration,dc=<domain>
  • ServerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>
  • frsComputerReference attribute correct for a DC site object on cn=domain system volume (sysvol share),cn=ntfrs subscriptions,cn=<DC Name>,ou=domain controllers,DC=<domain>
  • frsComputerReferenceBL attribute correct for a DC object on cn=<DC Name>,cn=domain system volume (sysvol share),cn=file replication service,cn=system,dc=<domain>
  • msDFSR-ComputerReference attribute correct for a DC DFSR replication object on cn=<DC Name>,cn=topology,cn=domain system volume,cn=dfsr-globalsettings,cn=system,dc=<domain>
  • msDFSR-ComputerReferenceBL attribute correct for a DC site object on a DC on cn=<DC Name>,ou=domain controllers,dc=<domain>

This is similar to the VerifyEnterpriseReferences test except that it does not check partition cross references or all other DC objects.

The test uses LDAP.

VerifyReplicas

This test verifies that the specified server does indeed host the application partitions specified by its crossref attributes in the partitions container. It operates exactly like CheckSDRefDom except that it does not show output data and validates hosting.

This test uses LDAP.

 

That’s all folks.

- Ned “that was seriously un-fun to write” Pyle

  • It might have sucked to write but this is going to be one of the most useful blog entries on the site.  

    One issue I have with dcdiag is that it can output so much info sometimes people will get confused.  That is when you see those super long posts on the forums where people dump this huge dcdiag output file and they don't know where to start.

    Great work as usual!

    Thanks

    Mike

  • This is education! Very good post.

  • Thanks folks, the kind words always make it worth the effort. :)

  • Very good post!

  • Huzzah!  Thanks for the hard work, Ned.  Please feel free to continue to bring your "paricular brand of OCD" to other areas of Microsoft documentation.  The community will definitely appreciate it!  Pass word on to your bosses that you deserve a raise too...

    Thanks

    Steve