John Howard - Senior Program Manager in the Hyper-V team at Microsoft

Senior Program Manager, Hyper-V team, Windows Core Operating System Division.

How to remove a failed server from DFS in Windows Server 2003 R2

This week has been a little strained, hence quiet on the blogging front. Apart from a hectic week at work (more to follow on that shortly), the reason was a "disaster" which happened late last Sunday evening - everything was working at home one moment, and dead the next.

Since the move over from the UK, I'm still in temporary accommodation. To save space, although my servers were couriered over, I didn't bring a monitor as all the monitors I owned only ran on 240V. The servers arrived a little shaken, but not too stirred - a few cards were loose, but no failed disks. Just a little prodding into place and they came back perfectly, and had been humming away for 6 weeks or so without fault.

If you've ever tried to figure out why a machine won't boot without a monitor attached, I know where you're coming from. Short answer is, it's next to impossible. It also happened that this machine was not just any machine, but a Domain Controller. And not just any domain controller, the domain controller holding all the FSMO roles for my home domain. Of course, it will probably come as no surprise to you it's also running a further 5 virtual machines including my website hosting, ISA and Exchange. So yes, it was somewhat of a disaster.

On Monday morning, I took the machine into the office and with a monitor attached, it was obvious it was continually rebooting (off both plexes in the boot mirror) before the GUI portion of the boot came up. Safe mode and last known good gave the same symptoms. Similarly, boot logging didn't help as the boot log doesn't get written to disk until the GUI part of the boot comes up.

I borrowed another disk from Ben, plugged it in and installed XP SP2 (only 32 bit OS immediately to hand). However, during the first boot, it blue-screened. Sure enough, there was a problem with the hardware - either motherboard or memory.

Running a memory tester showed something wrong with one or more of the (expensive!) ECC memory slots. I saw a big bill coming :(. It was a tedious process of elimination by swapping DIMMs around until the failed chip or chips was identified. That at least got to the point of XP booting. Attempting to boot back into the original installation with the failed DIMM removed (actually a pair, as the system needs matched pairs) gave the same symptoms as before. At this point, going back to XP, I discovered XP didn't have drivers for the RAID SCSI Controller for the system boot disk and worse, none were available. Onto plan B for recovery.


I re-installed Windows Server 2003 on the loan disk with the recovery console enabled to attempt to see what was going on. Chkdsk showed the SCSI disks to be corrupt and the mirror needing repair. Fixing those still didn't get me past the text mode part of the boot.

Not being one to give up, I took the machine home on Monday night. During the day, my wife had bought a second hand 17" monitor for $20.00 - given it's in next to perfect condition, I thought that was pretty good value.

From the recovery console of Windows Server installed on the loan disk, I spent two very long and tedious evenings going through disabling drivers one-by-one in the hope I'd find the driver failing to load - every time the same 0x0000007b with 0xc000007b in the parameter list - INACCESSIBLE_BOOT_DEVICE.

Well, two days later I did give up. In some ways I'm glad I did - when I took the decision to blow away the machine for real, I discovered the disks were also corrupt in some way - both of them. Blue screens on reinstall. Possibly the RAID controller? Nope, tried a spare one too :( Anyway, I've more disks on order and more memory on order - at least they're much cheaper in the US than in the UK.

In the meantime, with reduced RAM, on the loan disk I at least got the ISA server and the Exchange server back running. Cleaning up AD to seize the FSMO roles which were held by the previous installation is easy enough (http://support.microsoft.com/?id=255504). They're now safely on a Virtual domain controller running on another server.

However, there was one interesting side effect relating to DFS in Windows Server 2003 R2. Yes, the machine was also a file server replicating to another server using RDC with domain-based DFS. Some of the DFS roots had the now decommissioned server as the preferred target. What this unfortunately means is that when you go into the DFS console from another machine (either another server or an XP machine with the console installed) and examine the DFS Root, you get the error below: \\domain.com\share: The namespace cannot be queried. The RPC server is unavailable.

This only happens on roots which were configured to have the failed server as the preferred target. Clients were still OK accessing the still working server as they failed over automatically.

So, from the File Server Management Console, you're stuck - you can't remove the failed server. However, you can use the command line utility, dfsutil, to forcibly remove it.

First, run dfsutil /root:\\domain.com\share /export:share.txt

Share.txt will look something like

<?xml version="1.0"?>
<Root Name="\\DOMAIN\Share" State="1" Timeout="300" >
 <Target Server="FAILEDSERVER" Folder="Share" State="2"/>
 <Target Server="GOODSERVER" Folder="Share" State="2"/>
</Root>
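
Before removing anything, it can help to confirm exactly which servers the export lists as root targets. The following is only a rough sketch, assuming the export has the shape shown above (a Root element with Target children carrying Server and Folder attributes); the function name list_root_targets and the embedded sample text are my own illustration, and a real export may contain more elements than this.

```python
import xml.etree.ElementTree as ET

# Sample text mirroring the export shown above. In practice you would
# read the contents of share.txt instead (assumed to have this shape).
EXPORT = r'''<?xml version="1.0"?>
<Root Name="\\DOMAIN\Share" State="1" Timeout="300" >
 <Target Server="FAILEDSERVER" Folder="Share" State="2"/>
 <Target Server="GOODSERVER" Folder="Share" State="2"/>
</Root>'''


def list_root_targets(xml_text):
    """Return (server, folder) pairs for each direct Target of the Root."""
    root = ET.fromstring(xml_text)
    return [(t.get("Server"), t.get("Folder")) for t in root.findall("Target")]


if __name__ == "__main__":
    for server, folder in list_root_targets(EXPORT):
        print(server, folder)
```

If the failed server shows up in that list alongside at least one good server, you know which name to hand to dfsutil in the next step.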

To delete the failed server, and remember this is a last ditch thing, run (on one line)

dfsutil /unmapftroot /root:\\domain\share /server:failedserver /share:share
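
Since it is easy to get the arguments wrong, here is a small sketch that only assembles the command line as text so it can be checked before being pasted into a prompt. It does not run dfsutil. The helper name build_unmap_command is hypothetical, and the check on the share argument reflects my experience above that /share takes the bare share name; treat that as an assumption for your own environment.

```python
def build_unmap_command(root, server, share):
    """Assemble the dfsutil /unmapftroot command line as a single string.

    root   -- the namespace root, e.g. \\\\domain\\share
    server -- name of the failed root target server
    share  -- bare share name on that server (assumed: no \\\\server\\ prefix)
    """
    if "\\" in share or "/" in share:
        raise ValueError("share should be the bare share name, not a UNC path")
    return f"dfsutil /unmapftroot /root:{root} /server:{server} /share:{share}"


if __name__ == "__main__":
    # Prints the command for review; nothing is executed.
    print(build_unmap_command(r"\\domain\share", "failedserver", "share"))
```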

You're now close. To make this work, you must have access to the share on a good server. You must also bounce (at least I had to) the DFS Replication service on the good server AND restart the File Server Management Console. However, once done, everything will be good again. Just need to re-introduce the new server once the new disks arrive.

So now you know one reason why it's been a quiet week of blogging!
Cheers,
John.

Comments
  • John, sorry to hear your dilemma. I think in the future you ought to have WinPE on hand. I actually use BartPE; much better. I have the Windows 2003 I386 / BartPE source files staged on a machine and when I need to create a custom Windows 2003 boot disk (WinPE) with certain drivers, I just add what I need, create a CD, and boot the troubled machine. This saves a ton of time versus installing a parallel OS. I actually created a custom BartPE CD with over 30 RAID / SATA / NIC drivers integrated for this type of reason. You may also include your tools for diagnosis on this CD - instead of hours or even days of troubleshooting, it would have been just minutes. By the way, it's also great for RIS tools and VS migration (P2V). Just a thought...good luck.

  • Wow sounds like a bit of a disaster there. Did you have backups, or did you lose everything that was on that machine?

    And I agree, BartPE would have been useful.

  • The irony here is that I get the same error message from the command line that was the original problem:

    C:\Users\administrator.CBR>dfsutil /root:\\adam-old\dfs /view

    Could not execute the command successfully

    SYSTEM ERROR - The RPC server is unavailable.

  • Thanks for the info. I used this to kill a namespace that had its server disappear.

  • I had this problem -- I retired the domain controller that was hosting my DFS namespaces, and didn't move them across.

    In the end, I had to use ADSI Edit to delete the namespaces (under ..., CN=System, CN=Dfs-Configuration).

    Then I had to flush the caches, using

    dfsutil cache domain flush

    dfsutil cache referral flush

    ...then I could recreate the namespaces.

  • Thanks for sharing.

    I was unsuccessfully trying to remove a root target DC after it was demoted without first removing the DC from the root targets.

    Found out by reading your post that the /share parameter of dfsutil /unmapftroot should be the name of the share and not \\servername\share as it is listed in the Help and Support Center and the TechNet website and other places.

  • I too have had a similar problem. In trying your solution I receive the following:

    dfsutil /UnmapFtRoot /Root:\\domain.local\namespace /Server:deadserver /Share:(Share to be removed) reports system error 1169 "There was no match for the specified key in the index"

    Anyone else seen this error? I can't find any information on it.

    Thanks
