When we do initial interviews with candidates for a job on the CSS Directory Services team, a question we often start with is “How important is DNS to Active Directory?” The answer (if it is the correct one, “very important”) is a great starting point for more detailed DNS questions, such as “What is an SRV record and why is it important to domain controllers?”
As a recap, SRVs are the DNS records that domain controllers register in order to advertise the services they can provide to clients and to other domain controllers in the domain. These SRVs advertise general DC location, Kerberos, LDAP, Global Catalog and possibly other services, depending on your environment. Domain members and other domain controllers routinely query DNS for these records to locate the appropriate domain controller to send requests to.
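To make the lookup side concrete, here is a minimal Python sketch of the DNS names a domain member might query when looking for an LDAP domain controller: the site-specific record first (when the client knows its AD site), then the site-less record. The function name and the simplified two-step order are my own illustration, not the full DC Locator algorithm; the name patterns are the standard `_msdcs` locator names.

```python
from typing import Optional

def dc_locator_queries(domain: str, site: Optional[str] = None) -> list[str]:
    """SRV names to try, in order, when looking for a DC in `domain`.

    Simplified illustration: prefer DCs in the client's own site, then fall
    back to the site-less record that lists every DC in the domain.
    """
    queries = []
    if site:
        # Site-specific record: only DCs in (or covering) `site` register here.
        queries.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    # Site-less record: every DC in the domain registers here.
    queries.append(f"_ldap._tcp.dc._msdcs.{domain}")
    return queries

for name in dc_locator_queries("adatum.com", "Default-First-Site-Name"):
    print(name)
```

Note that every DC in the domain writes into that last, site-less name, which becomes important later in this post.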
Pretty straightforward stuff and something you can learn a lot about by reviewing it as it occurs in a network trace.
Things get a little more complicated when we place the DNS forward lookup zones that contain these SRVs into Active Directory, also known as AD integration. We are then relying on traditional dynamic DNS to register the DNS records for hosts and domain controller services, and also on AD replication to guarantee that every DC that should have a copy of a given record (depending on the zone's replication scope setting) does in fact have it. That’s the easy part to understand.
So consider a not-so-small environment of about 200 domain controllers distributed among about 50 AD sites (where the network topology is not hub and spoke but more distributed), where the forest's DNS is AD integrated and the zone's replication scope (configured in dnsmgmt.msc) includes every domain controller in the forest. In other words, DNS for the forest is stored in an application partition of its own and replicated to all 200 DCs whenever there’s an update. Also assume that these DCs all run the DNS service and are configured to query themselves for DNS, though that is not a vital criterion for this scenario.
Let’s take a domain controller called DC100 to start our scenario. When DC100 registers its own SRVs in DNS, it registers them with itself, and the local DNS service creates the AD objects that represent those DNS records. These new objects, or updates to existing ones, are then replicated to DC100’s AD replication partners and ultimately to the entire forest every time the DC dynamically registers its DNS records. By default that happens once every 24 hours, controlled by a setting confusingly called the DNS refresh interval, the same name as a DNS server-side setting in the DNS Management snap-in.
No problem yet, right folks? The problem lies in how certain records are stored. Records that would be distinct, separate entries in a standard DNS zone are instead stored in an Active Directory integrated zone as a single dnsNode object that holds the information for all of the hosts. This really matters only for SRV records shared by many hosts because they advertise the same service, such as Kerberos, GC, LDAP and the like. Rather than being separate objects in AD, the information about each domain controller the record represents is stored in one multivalued attribute of that object called dnsRecord.
Those who managed Windows 2000 AD environments are familiar with the way a multivalued attribute replicates in that version of AD: for any single update to a multivalued attribute, the entire attribute has to be replicated. Consider the simplistic example of a multivalued attribute containing 32K of data. When a 1K value was added, the data replicated to that DC’s partners was 33K. Server 2003 introduced Linked Value Replication (LVR), which, for attributes that support it and at the Windows Server 2003 forest functional level (FFL), replicates only that single 1K value in the scenario above. That feature saves a lot of processing and network bandwidth over time.
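The arithmetic in that example can be written as a tiny Python function. It is a simplified model (it ignores replication metadata and compression) and assumes one value is added per update; the function name is hypothetical.

```python
def replicated_kb(existing_kb: float, added_kb: float, lvr: bool) -> float:
    """Approximate data sent to each partner for one update to a multivalued attribute."""
    if lvr:
        # Linked Value Replication: only the changed value travels.
        return added_kb
    # Without LVR the whole attribute, old values plus the new one, is resent.
    return existing_kb + added_kb

print(replicated_kb(32, 1, lvr=False))  # 33
print(replicated_kb(32, 1, lvr=True))   # 1
```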
Unfortunately, dnsRecord is not an attribute that can be replicated via LVR in Server 2003 AD. That means you could have the following “Perfect Storm” scenario if your AD replication topology is large and distributed enough. The Netlogon service on each domain controller is responsible for updating DNS, and consequently the dnsRecord attribute, for that DC's own SRVs at the 24-hour interval mentioned earlier. Since all DCs in the forest update the same attribute, two domain controllers could each add themselves to the list at approximately the same time, with neither update including the other DC. Other DCs then receive these two updates, which carry the same version number, and keep the newer one as a “last writer wins” decision. The older update is discarded, and with it the registration of the DC that originated it.
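Here is a minimal Python sketch of how one registration gets lost. It models only the rule described above (equal versions, so the later originating write wins); real AD conflict resolution also uses the originating DSA GUID as a final tiebreaker, which is omitted here. The DC names, versions and timestamps are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttrUpdate:
    origin_dc: str          # DC that originated the write
    version: int            # attribute version from replication metadata
    origin_time: float      # originating timestamp (simplified to seconds)
    values: frozenset       # complete contents of the multivalued attribute

def resolve(a: AttrUpdate, b: AttrUpdate) -> AttrUpdate:
    """Pick the winning write: higher version first, then later originating time."""
    if a.version != b.version:
        return a if a.version > b.version else b
    return a if a.origin_time >= b.origin_time else b

# Both DCs start from the same replicated value (version 7)...
base = frozenset({"DC001", "DC002"})
# ...and each adds itself without having seen the other's change.
from_dc100 = AttrUpdate("DC100", 8, 1000.0, base | {"DC100"})
from_dc150 = AttrUpdate("DC150", 8, 1000.5, base | {"DC150"})

winner = resolve(from_dc100, from_dc150)
print(winner.origin_dc, sorted(winner.values))
# DC150 ['DC001', 'DC002', 'DC150']  (DC100's registration is gone)
```

Because each write carries the whole attribute, nothing merges the two additions; one of them simply disappears until that DC registers again.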
An ideal solution would be to replicate the multivalued dnsRecord attribute using LVR. This, or a similar method, will hopefully appear in a future product. In the meantime, our development team has changed the default DC Locator DNSRefreshInterval back to 1 hour for Server 2008 to help guard against this scenario.
A quick way to check which DC originated the dnsRecord update that may have won in that “last writer wins” situation is the tried and true repadmin.exe /showobjmeta command.
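For example, for the dnsNode object shown in the LDP sample in this post, the command takes the DC name and the object's distinguished name. The small Python helper below only builds that command line and is purely illustrative (DC100 is the hypothetical DC from the scenario). In the command's output, the dnsRecord row's Originating DSA, Org.Time/Date and Ver columns tell you which DC wrote the current value and when.

```python
import subprocess

def showobjmeta_args(dc: str, object_dn: str) -> list[str]:
    """Build: repadmin /showobjmeta <DC> <object DN>."""
    return ["repadmin", "/showobjmeta", dc, object_dn]

# DN of the site-less _ldap._tcp.dc._msdcs dnsNode from the LDP sample.
dn = ("DC=_ldap._tcp.dc._msdcs,DC=adatum.com,CN=MicrosoftDNS,"
      "DC=ForestDnsZones,DC=adatum,DC=com")
args = showobjmeta_args("DC100", dn)
print(subprocess.list2cmdline(args))  # Windows-style quoting of the arguments

# On a machine with repadmin installed you could run it and read the output:
# output = subprocess.run(args, capture_output=True, text=True, check=True).stdout
```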
But maybe you want to look at the dnsRecord attribute itself to see whether this is your problem. Here’s how. The dnsRecord attribute is stored in binary form and isn't human readable in most tools, but LDP.EXE will decode it for you. Here’s a sample from one of my test DCs of what you see in the DNS snap-in for the records...
...and this is how it is stored in AD, within one attribute of a single object:
Expanding base 'DC=_ldap._tcp.dc._msdcs,DC=adatum.com,CN=MicrosoftDNS,DC=ForestDnsZones,DC=adatum,DC=com'...
Getting 1 entries:
dnsRecord (2): wDataLength: 32 wType: 33; Version: 5 Rank: 240 wFlags: 0 dwSerial: 38 dwTtlSeconds: 1476526080 dwTimeout: 0 dwStartRefreshHr: 3572750 Data:
00 00 00 64 01 85 18 03 0B 61 64 66 73 61 63 63 ...d.…...adfsacc
6F 75 6E 74 06 61 64 61 74 75 6D 03 63 6F 6D 00 ount.adatum.com.
------------------------------------; wDataLength: 36 wType: 33; Version: 5 Rank: 240 wFlags: 0 dwSerial: 38 dwTtlSeconds: 1476526080 dwTimeout: 0 dwStartRefreshHr: 3572751 Data:
00 00 00 64 01 85 1C 03 0F 77 69 6E 2D 31 63 6D ...d.…...win-1cm
69 35 62 35 32 7A 71 63 06 61 64 61 74 75 6D 03 i5b52zqc.adatum.
63 6F 6D 00 com.
dSCorePropagationData: 0x0 = ( );
instanceType: 0x4 = ( WRITE );
objectClass (2): top; dnsNode;
whenChanged: 7/30/2008 8:31:44 AM Pacific Daylight Time;
whenCreated: 7/30/2008 8:27:38 AM Pacific Daylight Time;
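The Data bytes LDP prints are the SRV-specific part of each dnsRecord value. The Python sketch below decodes them, assuming the layout described in Microsoft's [MS-DNSP] protocol documentation: priority, weight and port as big-endian 16-bit integers, followed by the target name as a total-length byte, a label-count byte and length-prefixed labels. It also handles two header quirks: LDP prints dwTtlSeconds as if it were little-endian (the value is stored in network byte order), and dwStartRefreshHr counts hours since January 1, 1601 UTC. The function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta

def decode_srv_data(data: bytes) -> dict:
    """Decode the Data portion of a dnsRecord value with wType 33 (SRV)."""
    priority, weight, port = struct.unpack_from(">HHH", data, 0)
    # data[6] is the total name length; data[7] is the number of labels.
    label_count = data[7]
    offset, labels = 8, []
    for _ in range(label_count):
        length = data[offset]
        labels.append(data[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return {"priority": priority, "weight": weight, "port": port,
            "target": ".".join(labels)}

def ttl_from_ldp(dw_ttl_seconds: int) -> int:
    """Undo LDP's byte order so the TTL reads in seconds."""
    return int.from_bytes(dw_ttl_seconds.to_bytes(4, "little"), "big")

def refresh_hour_to_utc(hours: int) -> datetime:
    """Convert dwStartRefreshHr (hours since 1601-01-01 UTC) to a datetime."""
    return datetime(1601, 1, 1) + timedelta(hours=hours)

# The first value from the LDP output above.
first = bytes.fromhex(
    "00 00 00 64 01 85 18 03 0B 61 64 66 73 61 63 63 "
    "6F 75 6E 74 06 61 64 61 74 75 6D 03 63 6F 6D 00")
print(decode_srv_data(first))
# {'priority': 0, 'weight': 100, 'port': 389, 'target': 'adfsaccount.adatum.com'}
print(ttl_from_ldp(1476526080))        # 600
print(refresh_hour_to_utc(3572750))    # 2008-07-30 14:00:00
```

The second value decodes the same way to win-1cmi5b52zqc.adatum.com on port 389: two DCs, two values, one attribute on one object.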
Keep in mind that this scenario really applies only to the so-called site-less records, the DNS SRV records that are not in the site subcontainers in DNS. Other records, which are updated by domain controllers in the same AD site, are much less prone to this problem, since those DCs typically replicate within the site more frequently and in a more meshed fashion. This scenario can be made worse by having all DCs point to themselves as their primary DNS server.
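To see why the site-less records are the exposed ones, here is a small Python sketch that counts how many DCs write values into each dnsNode for the `_ldap._tcp.dc._msdcs` records. It uses the hypothetical 200-DC, 50-site layout from the scenario above, assumes an even spread of DCs and ignores AutoSiteCoverage; the function name is illustrative.

```python
from collections import Counter

def writers_per_node(dc_sites: dict[str, str], domain: str) -> Counter:
    """Count how many DCs write values into each dnsNode (one SRV owner name)."""
    counts = Counter()
    for dc, site in dc_sites.items():
        counts[f"_ldap._tcp.dc._msdcs.{domain}"] += 1                 # site-less
        counts[f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"] += 1   # per site
    return counts

# Hypothetical layout from the scenario: 200 DCs spread evenly over 50 sites.
dcs = {f"DC{i:03d}": f"Site{i % 50:02d}" for i in range(1, 201)}
counts = writers_per_node(dcs, "adatum.com")
print(counts["_ldap._tcp.dc._msdcs.adatum.com"])                 # 200
print(counts["_ldap._tcp.Site00._sites.dc._msdcs.adatum.com"])   # 4
```

Two hundred writers contending for one attribute across the whole forest, versus a handful of well-connected writers per site-specific node, is the heart of the problem.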
So what do you do in the situation where you see an update unexpectedly discarded like this? There are several things to consider.
The first is that the DnsAvoidRegisterRecords setting may be useful in your environment. This setting, which can be distributed via Group Policy, defines which records your domain controllers will not register in DNS. It may be a good idea in your environment not to register these site-less records after all. The trade-off is that domain clients and other DCs may struggle to locate DCs outside their own site if the local ones become inaccessible. So, if you choose to do this, you should also carefully configure AutoSiteCoverage settings as needed.
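DnsAvoidRegisterRecords takes a list of record mnemonics (it can also be set directly as the REG_MULTI_SZ value DnsAvoidRegisterRecords under HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters). The Python sketch below lists a subset of those mnemonics with the SRV names I understand them to control, and picks out the site-less ones that have a site-specific (…AtSite) counterpart, which are the candidates discussed here. Please verify the mnemonic list against Microsoft's Netlogon documentation for your OS version before deploying anything; the helper function is hypothetical.

```python
# Subset of Netlogon DNS record mnemonics and the SRV owner names they control.
# Verify against Microsoft's documentation for your OS version before use.
MNEMONICS = {
    "Ldap":             "_ldap._tcp.{domain}",
    "LdapAtSite":       "_ldap._tcp.{site}._sites.{domain}",
    "Dc":               "_ldap._tcp.dc._msdcs.{domain}",
    "DcAtSite":         "_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
    "Kdc":              "_kerberos._tcp.dc._msdcs.{domain}",
    "KdcAtSite":        "_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
    "Rfc1510Kdc":       "_kerberos._tcp.{domain}",
    "Rfc1510KdcAtSite": "_kerberos._tcp.{site}._sites.{domain}",
    "Gc":               "_ldap._tcp.gc._msdcs.{forest}",
    "GcAtSite":         "_ldap._tcp.{site}._sites.gc._msdcs.{forest}",
    "GenericGc":        "_gc._tcp.{forest}",
    "GenericGcAtSite":  "_gc._tcp.{site}._sites.{forest}",
}

def siteless_candidates() -> list[str]:
    """Site-less mnemonics that have an ...AtSite counterpart in MNEMONICS."""
    return [m for m in MNEMONICS
            if not m.endswith("AtSite") and m + "AtSite" in MNEMONICS]

for m in siteless_candidates():
    name = MNEMONICS[m].format(domain="adatum.com", forest="adatum.com",
                               site="Default-First-Site-Name")
    print(f"{m:<12} {name}")
```

Keeping the …AtSite records registered is what lets clients still find DCs in their own site once the site-less entries are gone.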
It’s also possible to arrive at this issue by altering the Netlogon DNSRefreshInterval setting your DCs use to register records, or by altering your DNS server-side settings. If that’s the case, consider the general formula (below) for configuring your DCs' DNSRefreshInterval together with the DNS server settings with the confusingly similar names “No Refresh Interval” and “Refresh Interval”.
Before getting into the specifics of the formula, though, let’s go over the settings it refers to.
DNSRefreshInterval is the domain controller-side setting for how frequently the DC attempts to register its records in DNS. I recommend changing it from the Server 2003 default of 24 hours to 1 hour, but this recommendation is not set in stone, so adjust it as you think best for your environment.
Here’s a picture of the DC specific setting of DNSRefreshInterval:
OK, so here’s the formula, which you can use as a guideline when deciding how to configure these DNS settings for your environment. Its result is the suggested value for both the “No Refresh Interval” and the “Refresh Interval” DNS server settings in the scavenging settings of the forward lookup zone.
“No Refresh Interval” and “Refresh Interval” (in days) must each be greater than:
(“Number of DCs” * DNSRefreshInterval in hours) / 24
Here’s an example for a forest with 300 domain controllers and a DNSRefreshInterval of 1 hour: (300*1)/24 = 12.5
We can round up to 13, and the unit is days. So, to help prevent updates from being lost because of replication topology, you would set your DCs to register once an hour and set both the “No Refresh Interval” and “Refresh Interval” DNS server settings to 13 days to start with, then see how that goes. The idea behind these settings is to strike a balance: give the DCs in the most "remote" sites enough time to register in DNS as they need to, while still scavenging often enough to prevent problems with stale, unexpected SRVs lingering in the forward lookup zone.
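The same guideline as a small Python function, so you can plug in your own DC count. Rounding up to whole days follows the example above; the function name is hypothetical. If the raw result is already a whole number, you may want to add a day so the setting stays strictly greater than it.

```python
import math

def scavenging_interval_days(num_dcs: int, dns_refresh_interval_hours: float) -> int:
    """Guideline from this post: (number of DCs * DNSRefreshInterval) / 24,
    rounded up to whole days, for both No Refresh Interval and Refresh Interval."""
    return math.ceil(num_dcs * dns_refresh_interval_hours / 24)

print(scavenging_interval_days(300, 1))   # 13  (12.5 rounded up)
print(scavenging_interval_days(200, 1))   # 9   (8.33 rounded up)
print(scavenging_interval_days(200, 24))  # 200 (with the old 24-hour refresh)
```

The last line shows why the shorter DC refresh interval matters: with a 24-hour DNSRefreshInterval, the formula pushes the scavenging intervals out to impractical lengths.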
Here’s a picture of the DNS server settings in the DNSMgmt.msc snapin which I’ve been talking about, to help eliminate any confusion about what settings are being discussed.
This issue is relatively uncommon, but even if you are not seeing it, it offers good insight into how AD and DNS work together and some things to keep an eye out for.
Special thanks to Nick Tuttle of our United Kingdom support team for pioneering this issue.
I wanted to point out a correction. Server 2008 does have altered behavior in this scenario, but it is an altered default refresh interval for the DC locator records, not making the dnsRecord attribute replicated via LVR.
This was a misread on my part. Thanks to John Gregson for setting me straight.