DNS Scavenging is a great answer to a problem that has been nagging everyone since RFC 2136 came out way back in 1997. Despite many clever methods of ensuring that clients and DHCP servers that perform dynamic updates clean up after themselves, sometimes DNS can get messy. Remember that old test server that you built two years ago that caught fire before it could be used? Probably not. DNS still remembers it, though. There are two big issues with DNS scavenging that seem to come up a lot:
"I'm hitting this 'scavenge now' button like a snare drum and nothing is happening. Why?"
"I woke up this morning, my DNS zones are nearly empty and Active Directory is sitting in a corner rocking back and forth crying. What happened?"
This post should help us figure out when the first issue will happen and completely avoid the second. We'll go through how scavenging is set up, and then I'll give you my best practices.
Scavenging will help you clean up old, unused records in DNS. Since "clean up" really means "delete stuff," a good understanding of what you are doing and a healthy respect for "delete stuff" will keep you out of the hot grease. Because deletion is involved, there are quite a few safety valves built into scavenging that take a long time to pop. When enabling scavenging, patience is required. It will work just fine, but not today!
Note: For purposes of this discussion we are going to concentrate on the most common Windows DNS scenario: Windows Server 2003 DNS servers hosting AD integrated zones.
Scavenging is set in three places on a Windows Server:
1. On the resource record
2. On the zone
3. On the server
It must be set in all three places or nothing happens.
To see the scavenging setting on a record, click View | Advanced in the DNS MMC, then bring up the properties of a record.
Scavenging gets set on a resource record in one of three ways. The first is by someone opening the record's properties, checking the "Delete this record when it becomes stale" checkbox, and hitting Apply. When you hit Apply, the time of day is rounded down to the nearest hour and applied as the timestamp on the record. Static records have a timestamp of 0, which means "do not scavenge."
The second is when a record gets created by a client machine registering via dynamic DNS. Windows clients will attempt to dynamically update DNS every 24 hours. All DDNS records get set to scavenge. When a client that has no existing record creates one, this is considered an "Update" and the timestamp is set. If the client has an existing host record and changes its IP address, this is also considered an "Update" and the timestamp is set. If the client has an existing host record with the same IP address, this is considered a "Refresh" and the timestamp may or may not get changed, depending on zone settings. More on this later.
The third way to set scavenging on records is by using DNScmd.exe with the /ageallrecords switch. Let's pause here for a few moments to consider a few important words: All, Records, Delete, Stuff. If you actually run this command against a zone, it will truly set scavenging and a timestamp on all records in the zone, including static records that you never want to be scavenged. Because scavenging takes so long to do its thing, people find this command and are tempted to give it a try. Do not. It will delete stuff. Have patience instead.
Once a timestamp is set on a record, it will replicate to all servers that host the zone. There is one caveat: if scavenging is not enabled on the zone that hosts the record, the record will never be scavenged, so the timestamp is essentially irrelevant. In that case the timestamp may get updated on the server where the client dynamically registers, but it will not replicate to the other servers hosting the zone.
Before a server will even look at a record to see whether it should be scavenged, the zone must have scavenging enabled. To access the scavenging settings for a zone, right-click the zone, select Properties, and then on the General tab click the "Aging" button. This screen is universal for the zone: if you view it on any DNS server where the zone is replicated, it will be the same.
When you first enable scavenging on a zone, the timestamp shown at the bottom of this dialog (reload the zone if you don't see it) is set to the current time of day rounded down to the nearest hour, plus the Refresh interval. This timestamp also gets reset any time the zone is loaded or dynamic updates are enabled on the zone.
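To make the rounding concrete, here is a minimal Python sketch of the timestamp a record ends up with when the checkbox is applied. The function name `record_timestamp` is hypothetical, and `None` stands in for the stored value of 0 on a static record; it models the behavior described above, not the DNS server's internal code.

```python
from datetime import datetime
from typing import Optional


def record_timestamp(applied_at: datetime, scavenge: bool) -> Optional[datetime]:
    """Model the timestamp a record gets when its properties are applied.

    Static records (checkbox cleared) carry a timestamp of 0, meaning
    "do not scavenge"; None stands in for that value here. Otherwise the
    time of day is rounded down to the nearest hour.
    """
    if not scavenge:
        return None
    return applied_at.replace(minute=0, second=0, microsecond=0)


# Checkbox applied at 14:37:12 -> timestamp of 14:00:00 the same day
print(record_timestamp(datetime(2008, 1, 2, 14, 37, 12), scavenge=True))
# Checkbox cleared -> static record, never scavenged
print(record_timestamp(datetime(2008, 1, 2, 14, 37, 12), scavenge=False))
```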
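The Update/Refresh distinction can be sketched as a small classifier. This illustrates the rules in the paragraph above, assuming a single A record per host; `classify_registration` is a hypothetical name, not part of any Windows API.

```python
from typing import Optional


def classify_registration(existing_ip: Optional[str], new_ip: str) -> str:
    """Classify a client's dynamic registration as "Update" or "Refresh".

    existing_ip is the IP currently in the client's host record, or None
    if the client has no record yet.
    """
    if existing_ip is None:
        return "Update"   # new record: timestamp is set
    if existing_ip != new_ip:
        return "Update"   # IP changed: timestamp is set
    return "Refresh"      # same IP: timestamp may or may not change (zone settings)


print(classify_registration(None, "10.0.0.5"))        # Update
print(classify_registration("10.0.0.5", "10.0.0.9"))  # Update
print(classify_registration("10.0.0.5", "10.0.0.5"))  # Refresh
```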
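As a worked example, the "zone can be scavenged after" time is simple arithmetic. The sketch below assumes a 7-day Refresh interval purely for illustration; `zone_can_be_scavenged_after` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def zone_can_be_scavenged_after(event_time: datetime, refresh: timedelta) -> datetime:
    """Compute the zone's "can be scavenged after" time.

    event_time is when scavenging was enabled on the zone (or the zone was
    loaded, or dynamic updates were enabled). It is rounded down to the
    nearest hour, then the zone's Refresh interval is added.
    """
    floored = event_time.replace(minute=0, second=0, microsecond=0)
    return floored + refresh


# Aging enabled at 9:45 on 1/2/2008 with a 7-day Refresh interval
print(zone_can_be_scavenged_after(datetime(2008, 1, 2, 9, 45), timedelta(days=7)))
# 2008-01-09 09:00:00
```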
The "zone can be scavenged after" timestamp is the first of your safety valves. It gives clients time to get their record timestamp updated before the big axe swings. Since new record timestamps are not replicated while zone scavenging is disabled this also gives replication time to get things in order.
The next safety valves are the Refresh and No-refresh intervals. Both of these must elapse before a record can be deleted.
The No-refresh interval is a period of time during which a resource record cannot be refreshed. Recall from earlier that a refresh is a dynamic update that doesn't change the host/IP of a resource record; it just touches the timestamp. If a client changes the IP of a host record, this is considered an "update" and is exempt from the No-refresh interval. The purpose of the No-refresh interval is simply to reduce replication traffic: a change to a record means a change that must be replicated.
After (Record Timestamp) + (No-refresh interval) elapses, we enter the Refresh interval. The Refresh interval is the time when refreshes to the timestamp are allowed. This is when good things must happen. The client is allowed to come in and update its timestamp. This timestamp will be replicated around and the No-refresh interval begins again. If for some reason the client fails to update its record during the Refresh interval, it becomes eligible to be scavenged. Will it disappear immediately? Probably not, but it is certainly possible.
Note: When setting Refresh and No-refresh intervals, be sure to allow enough time for clients to get several registration attempts during a Refresh interval. Failure to do so could allow a record to become eligible for scavenging simply because of a single failed refresh attempt.
One last thing before we leave the zone settings behind. If you right-click your server, you will see the option "Set Aging/Scavenging for All Zones...". Selecting this takes you to a screen similar to the zone's Aging dialog. What does this do? It sets the default settings that will be used when a new zone is created on this server. Unless you check the subsequent box, "Apply these settings to the existing Active Directory-integrated zones," it will not touch existing zones.
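The interplay of the two intervals for a single record can be sketched like this. It is a simplified model under stated assumptions: times are plain `datetime` values, a refresh that arrives during the No-refresh interval is simply ignored, an accepted refresh stamps the current time without hour rounding, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def refresh_allowed(record_ts: datetime, now: datetime, no_refresh: timedelta) -> bool:
    # Refreshes are only accepted once the No-refresh interval has elapsed.
    return now >= record_ts + no_refresh


def apply_refresh(record_ts: datetime, now: datetime, no_refresh: timedelta) -> datetime:
    # An accepted refresh resets the timestamp (restarting No-refresh);
    # a refresh during No-refresh leaves the timestamp untouched.
    return now if refresh_allowed(record_ts, now, no_refresh) else record_ts


def eligible_for_scavenging(record_ts: datetime, now: datetime,
                            no_refresh: timedelta, refresh: timedelta) -> bool:
    # No refresh made it in before No-refresh + Refresh ran out.
    return now > record_ts + no_refresh + refresh


ts = datetime(2008, 1, 1)
week = timedelta(days=7)
print(apply_refresh(ts, datetime(2008, 1, 5), week))   # ignored: 2008-01-01 00:00:00
print(apply_refresh(ts, datetime(2008, 1, 10), week))  # accepted: 2008-01-10 00:00:00
print(eligible_for_scavenging(ts, datetime(2008, 1, 15, 1), week, week))  # True
```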
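One way to sanity-check that guidance: a registration attempt that repeats every P hours lands at least floor(Refresh / P) times in any Refresh window, whatever the starting point. The sketch below uses the 24-hour Windows client re-registration interval mentioned earlier; the threshold of 3 attempts and the function names are assumptions for illustration, not a Microsoft recommendation.

```python
from datetime import timedelta


def min_attempts_in_refresh(refresh: timedelta, attempt_every: timedelta) -> int:
    # Any window of length `refresh` contains at least this many attempts
    # from a schedule that repeats every `attempt_every`.
    return refresh // attempt_every


def refresh_interval_ok(refresh: timedelta,
                        attempt_every: timedelta = timedelta(hours=24),
                        wanted: int = 3) -> bool:
    # `wanted` is an illustrative threshold, not a documented value.
    return min_attempts_in_refresh(refresh, attempt_every) >= wanted


print(min_attempts_in_refresh(timedelta(days=7), timedelta(hours=24)))  # 7
print(refresh_interval_ok(timedelta(days=7)))                           # True
print(refresh_interval_ok(timedelta(days=1)))                           # False
```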
So you now have a resource record set to scavenge and a zone set to scavenge. All that is left is for somebody to come along, check all the timestamps, and delete some stuff. This can be done by any server that hosts the AD-integrated zone.
Setting scavenging on the server is done by right-clicking the server in the MMC, selecting Properties, going to the Advanced tab, and checking the "Enable automatic scavenging of stale records" checkbox.
The Scavenging Period is how often this particular server will attempt to scavenge. When a server scavenges, it logs DNS event 2501 to indicate how many records were scavenged. Event 2502 is logged if no records were scavenged. Only one server needs to scavenge, since the zone data is replicated to all servers hosting the zone.
Tip: You can tell exactly when a server will attempt to scavenge by taking the timestamp on the most recent 2501/2502 event and adding the Scavenging period to it.
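The tip translates directly into a one-liner. In this sketch the event times are assumed to have been copied by hand from the DNS Server event log (the list is hypothetical input), and the 7-day Scavenging Period is just an example value.

```python
from datetime import datetime, timedelta
from typing import Iterable


def next_scavenge_attempt(event_times: Iterable[datetime], period: timedelta) -> datetime:
    # Most recent 2501/2502 event plus this server's Scavenging Period.
    return max(event_times) + period


events = [datetime(2008, 1, 1, 6, 0), datetime(2008, 1, 8, 6, 0)]
print(next_scavenge_attempt(events, timedelta(days=7)))  # 2008-01-15 06:00:00
```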
Although you can set every server hosting the zone to scavenge, I recommend having just one. The logic is simple: if the one server fails to scavenge, the world won't end. You'll have one place to look for the culprit and one set of logs to check. If, on the other hand, you have many servers set to scavenge, you have many logs to check when scavenging fails. Worse yet, if things start disappearing unexpectedly, you don't want to go hopping from server to server looking for 2501 events.
To maintain strict control over which server scavenges a zone, you can use DNSCmd.exe to specify exactly which servers may scavenge. For example, the following command allows only the 192.168.1.1 and 192.168.1.2 DNS servers to scavenge the contoso.com zone:
DNSCmd . /ZoneResetScavengeServers contoso.com 192.168.1.1 192.168.1.2
With the server now scavenging, zones enabled for scavenging, and resource records set, what actually happens when the server does its thing?
When the time of the last 2501/2502 event plus the server's Scavenging Period comes around, the server will make a scavenging attempt. You can also initiate an attempt manually by right-clicking the server and selecting "Scavenge Stale Resource Records". Note that a manual attempt in no way bypasses the safety valves. These are the final safety valves before we "delete stuff":
If all of the above checks are good then the zone is ready to be scavenged. At this point the scavenging server checks the timestamp on each individual resource record. If the current date/time is greater than the timestamp + No-refresh + Refresh then the record is deleted.
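Pulling together the conditions described in this post (server scavenging enabled, zone aging enabled, the zone's "can be scavenged after" time passed, the server allowed by /ZoneResetScavengeServers, and the per-record timestamp check), a scavenging pass can be modeled roughly as below. This is a simplified sketch, not the DNS server's actual algorithm: the data classes and record names are hypothetical, `None` stands for a static record's timestamp of 0, and an empty `scavenge_servers` list is assumed to mean any server may scavenge.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Record:
    name: str
    timestamp: Optional[datetime]  # None models a static record (timestamp 0)


@dataclass
class Zone:
    aging_enabled: bool
    can_be_scavenged_after: datetime
    no_refresh: timedelta
    refresh: timedelta
    scavenge_servers: List[str] = field(default_factory=list)  # empty: any server (assumption)
    records: List[Record] = field(default_factory=list)


def records_to_delete(zone: Zone, server_ip: str, server_scavenging: bool,
                      now: datetime) -> List[str]:
    # Server- and zone-level safety valves: fail any one and nothing is deleted.
    if not server_scavenging or not zone.aging_enabled:
        return []
    if now <= zone.can_be_scavenged_after:
        return []
    if zone.scavenge_servers and server_ip not in zone.scavenge_servers:
        return []
    # Per-record check: timestamp + No-refresh + Refresh must be in the past.
    limit = zone.no_refresh + zone.refresh
    return [r.name for r in zone.records
            if r.timestamp is not None and now > r.timestamp + limit]


week = timedelta(days=7)
zone = Zone(True, datetime(2008, 1, 9), week, week, ["192.168.1.1"],
            [Record("oldtest", datetime(2008, 1, 1)),
             Record("laptop7", datetime(2008, 1, 14)),
             Record("mailhost", None)])
print(records_to_delete(zone, "192.168.1.1", True, datetime(2008, 1, 20)))  # ['oldtest']
print(records_to_delete(zone, "192.168.1.2", True, datetime(2008, 1, 20)))  # []
```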
Here is how I set up scavenging on a preexisting zone. This procedure is designed for maximum safety. Using default settings, this process can take as long as 4-5 weeks (2 weeks for the Sanity phase, 2-3 weeks for the Enable phase).
Sift through your DNS records looking for any records older than the Refresh + No-refresh interval. If you see any, something has gone wrong with the dynamic registration process, and it must be corrected before proceeding. A thorough check at this point is the most important step in the setup.
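If you have extracted record names and aging timestamps from the zone into Python (how you do the extraction is up to you; the `(name, timestamp)` input shape here is an assumption), the sift itself is a simple filter. `find_outdated` and the sample record names are hypothetical, and `None` again stands for a static record.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


def find_outdated(records: Iterable[Tuple[str, Optional[datetime]]], now: datetime,
                  no_refresh: timedelta, refresh: timedelta) -> List[str]:
    # Any aged record whose timestamp is older than No-refresh + Refresh
    # should already have been refreshed; each one needs an explanation.
    cutoff = now - (no_refresh + refresh)
    return sorted(name for name, ts in records if ts is not None and ts < cutoff)


week = timedelta(days=7)
records = [("ws101", datetime(2008, 1, 10)),
           ("oldtest", datetime(2007, 11, 2)),
           ("fileserver", None)]
print(find_outdated(records, datetime(2008, 1, 15), week, week))  # ['oldtest']
```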
Things to check if you find old records:
Do not proceed unless you can explain any outdated records. In the next phase they will be deleted.
The final step is to actually enable scavenging. Enable scavenging on the single server you specified with the /ZoneResetScavengeServers command.
Once it's enabled, create a new test record and enable it for scavenging. Then map out the point in time when this record will disappear. Here is how:
Let's look at an example with the following assumptions:
Given these assumptions you can rub your temples for a bit and predict that the record will be deleted at approximately 6am on 1/10/2008.
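The temple-rubbing can be handed to a few lines of Python. This sketch assumes the test record is never refreshed (nothing re-registers a manually created test record), that the server scavenges exactly on schedule, and that `last_run` is the time of the most recent 2501/2502 event; the function name and the sample dates are hypothetical and are not the assumptions of the example above.

```python
from datetime import datetime, timedelta


def predict_deletion(record_ts: datetime, no_refresh: timedelta, refresh: timedelta,
                     zone_after: datetime, last_run: datetime,
                     period: timedelta) -> datetime:
    # The record must be past No-refresh + Refresh, and the zone past its
    # "can be scavenged after" time, before a scavenging pass can remove it.
    eligible_after = max(record_ts + no_refresh + refresh, zone_after)
    run = last_run
    # Step through the server's scheduled passes until one lands
    # strictly after the eligibility point.
    while run <= eligible_after:
        run += period
    return run


week = timedelta(days=7)
print(predict_deletion(datetime(2008, 1, 1), week, week,
                       zone_after=datetime(2008, 1, 8),
                       last_run=datetime(2008, 1, 1, 6), period=week))
# 2008-01-15 06:00:00
```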
Once scavenging is enabled you can check back periodically to look for the 2501 and 2502 events to see how things are going. You can also come back at the predicted date and time and see if your test record disappeared.
Thanks very much for a well written explanation of the failsafes and timeframes involved in enabling scavenging.
Scavenging is one of the most complicated aspects of MS DNS. It isn't simple to predict what will happen or when it will happen. Personally, I think it should be enabled by default and set fairly aggressively but I'm jaded from my personal experience. GREAT info in the blog; VERY helpful! Keep up the posting - this stuff is sooo helpful to the troops in the field.
During the sanity phase you advised looking at records older than the Refresh + No-refresh interval. Before checking, do I have to run scavenging manually from the GUI with the "Scavenge Stale Resource Records" option?
Thanks indeed for the doc; it is very helpful.
When doing the sanity check you (the admin) are really in read-only mode. You are just looking through your records. If you've done the previous steps then records should be getting updated regularly via DDNS. This point is your last chance to catch anything that's not working before scavenging truly begins.
It's a tedious but important step. If you are managing large DNS zones, you could automate it to some degree by using DNScmd to export the zone and then scrubbing it via script, Excel, or something similar.
Manual scavenging is often of little use. If scavenging is properly set up, it will work without intervention. If it is not properly set up, both automatic and manual attempts will fail. aka... "I'm hitting this 'scavenge now' button like a snare drum and nothing is happening. Why?" :)
I agree. The best way to do scavenging is to flip it on when the zone is first created and is still empty. Any RRs that appear in the zone from DDNS are obviously working ok. Any RRs that you create manually will have the scavenge checkbox cleared by default. Either way you are safe.
I'm not sure I would want the default settings to be more aggressive, though. The 7- and 7-day intervals seem rather arbitrary at first, but when you look at the default DHCP lease time of 8 days, they make more sense. DHCP attempts a renewal at half the lease interval, right? So 4 days + 2 days + 1 day = 3 attempts within a single 7-day interval.
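The commenter's arithmetic can be checked with a short sketch. It uses the commenter's simplified model (first renewal at half the lease, each retry at half of the remaining lease time); RFC 2131 defines the actual renewal timers (T1, T2) and retransmission rules in more detail, and this sketch does not model them. `renewal_attempts` is a hypothetical name.

```python
from typing import List


def renewal_attempts(lease_days: float, count: int) -> List[float]:
    # First attempt at half the lease; each later attempt halves the
    # time left before the lease expires.
    attempts = []
    t = lease_days / 2
    for _ in range(count):
        attempts.append(t)
        t += (lease_days - t) / 2
    return attempts


times = renewal_attempts(8, 3)
print(times)                            # [4.0, 6.0, 7.0]
print(sum(1 for t in times if t <= 7))  # 3 attempts within a 7-day interval
```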
Thanks for the feedback everyone!
I added a Host (A) record for my UNIX client (sunws1) and an Alias (CNAME) record (activedsvr) for the DC as the (Kerberos) KDC that my UNIX machine will authenticate to. After a week or so, I noticed I couldn't authenticate from the UNIX client to my DC, and all my DNS records I had added were gone - scavenged I believe. I only have one DC, and I'm not replicating to any other DC. It's a closed test network. I turned off scavenging, but why would I ever want my aliases and host records to disappear? Or is this only an issue when a system has only one (1) DC (and not replicating anywhere)?
Scavenging is a solution to some problems that crop up when using dynamic DNS. When clients update their own records, there are situations where incorrect or outdated records can be left behind. For example, mobile clients roaming between two DHCP servers can cause fits with the PTR records that some Unix apps use for "client verification".
Scavenging will never delete a static record and you would never want it to.
If you created a static record and found that it disappeared, then either the scavenge checkbox got checked or a client came along and updated the record, flipping it from static to dynamic in the process. Setting DNS to accept only secure dynamic updates will prevent this, as the DDNS client will have no permissions to the record (barring use of the DnsUpdateProxy group).
In the case of a Domain Controller (Microsoft KDC), it will continuously update its records, so even with scavenging they will not disappear. If for some reason the DC goes offline for an extended period, scavenging will take out the record, just as you would want it to. It will reappear if the server ever comes up again.
Thanks for this useful reply. I've got a question about record ownership in DNS. When I try to see who the owner of a dynamic record is, the Owner tab always displays "SYSTEM" in the "Current owner of this item:" field. Is that correct?
Thank you, great article! Only one correction. We had a problem with the "Setup Phase". You wrote
1. ... DNSCmd /ZoneResetScavengeServers can be used ...
2. Turn on scavenging on the zones ...
But the command in (1.) does not work until scavenging is turned on at the zone level.
Error, failed reset of scavenge servers on zone rsint.net. Status = 9611
Command failed: DNS_ERROR_INVALID_ZONE_TYPE 9611
We opened a call at microsoft and they told us we have to enable scavenging on the zone first. Not logical but that's it!
Very good explanation, especially compared to other articles including Microsoft Technet. Thank you.
Does anyone know of a CD/DVD step-by-step video instruction on DNS? I am a visual person and feel more comfortable.
Should I set all of the DNS records (servers, etc.) to no refresh, i.e., leave "Delete this record when it becomes stale" unchecked?
I personally would prefer a little more info out of the 2501 events. Aside from exporting the zones before and after, I have no way of telling what got axed. Is there any way to get some kind of report on what zones got scavenged?
Very well written and informative. Thanks.
Another tidbit: if you want to test by manually starting the scavenging process, you can choose "Scavenge Stale Resource Records" from the Action menu. The system will only allow you to do this once every 30 minutes.
It's super important, before enabling DNS scavenging, to verify that the DHCP Client service on DCs and all important member servers is up and running!
If it isn't, A and PTR records will not be refreshed (Note: this DNS registration function has been moved to the DNS Client in Windows 2008). You'll still have all the SRV records for the DCs thanks to the Netlogon service, but without the A records they will end up being useless...
This is another known reason for "I woke up this morning, and my Active Directory (and/or critical servers) was sitting in a corner rocking back and forth crying. What happened?"