Introduction to the “Don’t be THAT guy” blog series:
I come in to work every day and get calls with these exact scenarios. Of course, the names have been changed to protect the innocent and to keep the guilty from having to find new jobs. I wanted to start detailing some of these cases to provide cautionary tales for anyone in the position of network or server administrator. So read this, which I hope will be the first of many blog entries, so that you might avoid being “THAT guy”.
Don’t be THAT guy… The case of the missing DNS zone.
The call came in from Scotland, and apart from enjoying the gentlemen’s accents and their appreciation for a good pint, I knew they had a pretty big problem. They told me that the DNS zone for their company’s domain (corp1.local) was found to be empty after name resolution issues were reported. Because their DNS servers were Active Directory integrated, the zone was likely blank throughout their domain due to replication, and indeed it was. This is definitely in the “not good” category.
With all of their zone information gone, it appeared that an Authoritative Restore of the AD partition containing the DNS zone was going to be necessary. Once we started looking more closely, we determined that the root cause of the problem was an Active Directory object collision. This collision was brought about by someone at the company in a remote location (neither of my new Scottish friends) building a Windows Server 2003 domain controller (DC) and installing DNS on it.
As you might have guessed, all of the DNS servers are Active Directory integrated and store the zone information in AD. This means that all of the DNS zones and records are stored as objects within AD. You can further specify where in AD you want these objects stored. You can choose to have them stored in the System container of the domain partition (where Windows 2000 servers stored the info by default, and what I will call the System partition from here on). Or, with Windows Server 2003, you can choose to store them in a domain-wide application partition (DomainDnsZones) or a forest-wide application partition (ForestDnsZones), depending on how you want the zone information deployed.
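To make those three storage locations concrete, here is a minimal Python sketch that builds the distinguished name a zone object would have in each one. The helper names (domain_to_dn, zone_locations) and the dictionary keys are made up for illustration, and the sketch assumes a single-domain forest unless a forest root is passed in. It only formats strings; it does not query AD.

```python
def domain_to_dn(dns_name):
    """Turn a DNS domain name such as 'corp1.local' into 'DC=corp1,DC=local'."""
    return ",".join(f"DC={label}" for label in dns_name.split("."))


def zone_locations(zone, domain, forest_root=None):
    """Return the DN the zone object would have in each possible location.

    forest_root defaults to the domain itself (assumes a single-domain forest).
    """
    forest_root = forest_root or domain
    return {
        # Windows 2000 style: System container of the domain partition
        "system": f"DC={zone},CN=MicrosoftDNS,CN=System,{domain_to_dn(domain)}",
        # Windows Server 2003: domain-wide application partition
        "domain": f"DC={zone},CN=MicrosoftDNS,DC=DomainDnsZones,{domain_to_dn(domain)}",
        # Windows Server 2003: forest-wide application partition
        "forest": f"DC={zone},CN=MicrosoftDNS,DC=ForestDnsZones,{domain_to_dn(forest_root)}",
    }


if __name__ == "__main__":
    for location, dn in zone_locations("corp1.local", "corp1.local").items():
        print(f"{location}: {dn}")
```

The point to notice is that the same zone name can exist as a separate object under each of these paths, which is exactly what sets up the collision described later.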
Now prior to the creation of this new DC and installation of DNS, all of their DNS servers (and by the way, to be AD integrated a DNS server must be a DC) were set to replicate to “All domain controllers in the Active Directory domain” as seen below:
This configuration places all of the DNS zone information in the System partition of AD under the MicrosoftDNS container as shown below.
Now when the customer’s technician (aka THAT guy) installed DNS, he apparently did two things that are a BIG no-no. First, after installing DNS, he actually created the zone for the domain that he knew he would need. Now the first time people hear that they say “so…”, but once you think about how Active Directory integration works with DNS you know this was a bad thing. Once this zone was created and left blank, this “change” was replicated out to all of the other DNS servers. Therefore, a blank zone for their domain was propagated through their environment, effectively wiping out name resolution for the domain in its wake.
Now, admittedly the first mistake was probably the biggest, since it can bring an entire network to its knees in one replication cycle, but the second mistake made it harder for us to correct the first one. When THAT guy was creating the new blank zone, he accepted all of the Windows Server 2003 DNS defaults, which include having the SECOND radio button (from figure 1 above) selected: “To all DNS servers in the Active Directory domain”. This forced the NEW zone to be stored in the DomainDnsZones partition (see fig 3 below) and not in the System partition, where all of the other DCs in the domain kept the zone.
Now I know what you’re thinking: if all of your other DCs are using the System partition to replicate the DNS zones, and that is also where their zone information is loaded from, then WHY would these DCs load the “blank” zone from the DomainDnsZones partition? I’m glad you asked, because I asked the same thing. When we looked at the System partitions on the DCs, we found that all of the correct zone information was there and had not been overwritten by the “blank” zone. So why did it not load into DNS?
When the new zone replicated to the DomainDnsZones partition, it created a duplicate object for the corp1.local zone within AD, which in turn caused a collision. This collision, or conflict, within AD kept the good zone data from being loaded. Once we determined that the empty zone existed in DomainDnsZones, we simply deleted it and restarted the DNS service on the DCs. That reloaded the proper zone info, and name resolution was restored throughout their network.
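The check we effectively did by hand, looking for the same zone name in more than one AD location, can be sketched in a few lines of Python. This is only an illustration: the function name find_zone_collisions and the sample inventory are hypothetical, and the inventory is typed in by hand here. In a real environment you would have to gather it yourself from each partition’s MicrosoftDNS container.

```python
def find_zone_collisions(inventory):
    """Return zones that appear in more than one AD storage location.

    inventory maps a location name to the zone names found there.
    The result maps each colliding zone to a sorted list of its locations.
    """
    seen = {}
    for location, zones in inventory.items():
        for zone in zones:
            # DNS names are case-insensitive, so compare in lower case
            seen.setdefault(zone.lower(), set()).add(location)
    return {zone: sorted(locs) for zone, locs in seen.items() if len(locs) > 1}


# Hypothetical inventory modelled on the case above: the good zone lives in
# the System container and the empty duplicate lives in DomainDnsZones.
inventory = {
    "System": ["corp1.local", "_msdcs.corp1.local"],
    "DomainDnsZones": ["corp1.local"],
    "ForestDnsZones": [],
}

if __name__ == "__main__":
    for zone, locations in find_zone_collisions(inventory).items():
        print(f"Zone {zone} exists in more than one location: {', '.join(locations)}")
```

A zone that shows up in this report is a candidate for the kind of collision that kept the good corp1.local data from loading. Which copy to delete is still a judgment call that needs a human looking at the contents of each one.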
So what could have prevented this poor soul from becoming THAT guy? Here are some possibilities:
1 – Having a test domain with either physical or virtual machines set up to mimic your corporate environment so upcoming changes can be demonstrated as safe. Even condensing a worldwide environment to a handful of VMs is better than shooting from the hip.
2 – Having a change control process where the changes to be made are reviewed before permission is granted to implement them.
3 – Having your buddy look over your shoulder and give you a reality check before you press “OK”.
If you would like to know more about the DNS scenario that took place above please follow the link below:
867464 Event ID 4515 is logged in the DNS Server log in Windows Server 2003
And remember: Don’t be THAT guy.
- Steven Martin
Great info - we had a similar event take place when an admin deleted the DNS zone from a DC that was thought to be having repl issues as a 'step' in troubleshooting. Well, it wasn't having repl issues, but shortly after his efforts there were LOTS of issues, with repl being just one. We quickly recovered functionality but in the wrong way: we had a lag site DC, made the AD-i zone there a standard pri zone, copied the text file back over to the PDCE and made a 'new' primary zone based on that file, then converted it to AD-i. In hindsight, the auth restore was the way to go, and we encountered issues from this 'fix' for quite some time. Unfortunately, we just didn't think it through while our world was crumbling around us. Keep up the great postings!