Brian Puhl's Weblog

These postings are provided "AS IS" with no warranties, and confer no rights...WHEW...glad we got that over with, let's get to the good stuff now...


What to do with FSMO roles...


We recently hired a new engineer onto a team that manages some of the internal MS environments... We were discussing FSMO role placement, and he sent me mail (snippet below, slightly edited) that I thought was interesting...

The reason why we separated the roles at my last company was due to the FSMO role seizure process. You are correct, although the server is still a single point of failure, we can mitigate this single point of failure by placing the forest roles on one box and the domain roles on another. In the event that we unexpectedly lose a DC that is either a forest or domain FSMO role holder, the process of seizing the roles is minimized (fewer roles to seize). Also, it had been our experience that the forest roles aren't really used that often. You are correct, FSMO roles are still a single point of failure, however, unless we really need to perform any forest related “stuff”, the single point of failure (from a forest FSMO perspective) is a non-issue. This is not the case with the domain FSMO roles, specifically the PDCE. At my last company, we felt that due to the PDCE functions it was necessary to place the domain FSMO roles on a separate box...

I wanted to share this because it reminded me of a FSMO-related interview question that I've used in one variation or another:

Suppose you're paged in the middle of the night and told that one of the 150 domain controllers in your single-domain forest has crashed. Your first thought is likely "So what, I'll deal with it in the morning," but then you remember it's the one holding all 5 FSMO roles. If you could only pick one FSMO role to seize, which one would let you go back to sleep without worrying about the next day?

I've asked this question of many people...the large majority of whom answered, "The Schema Master, because without the schema the AD can't function."... Hopefully they aren't reading this blog from whichever other job they landed...

So back to the whole FSMO single-point-of-failure and redundancy thing...

I figured there were 2 possible reasons they arrived at the idea that separating FSMO roles along the forest/domain division was logical:

  1. There was some sort of fault tolerance between FSMO roles which could be preserved in a failure
  2. There was some urgency (specifically user impact) to getting a role holder back online immediately should a failure occur

The first reason is obviously false. FSMO is "Flexible Single Master Operations," with the emphasis on "single"...the whole point of these roles is that even though Active Directory is a distributed system, there are some things that can only be done in one place at a time. So let's just take it as generally accepted knowledge that each FSMO role provides specific functionality which exists only in that role.
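To make the "single" part concrete, here's a rough Python sketch (my own illustration, not anything from the email) that models a single-domain forest's role layout as a mapping from DC name to the roles it holds. The server names DC1/DC2/DC3 and all of the helper functions are hypothetical. The command string uses the role names and -Force switch of Move-ADDirectoryServerOperationMasterRole from the newer ActiveDirectory PowerShell module (ntdsutil is the older route), where -Force seizes a role whose holder can't be reached; the sketch only builds that string, it doesn't run anything.

```python
# Hypothetical model of FSMO placement in a single-domain forest:
# DC name -> tuple of roles it holds. Role names match the
# -OperationMasterRole values of Move-ADDirectoryServerOperationMasterRole.

FOREST_ROLES = ("SchemaMaster", "DomainNamingMaster")
DOMAIN_ROLES = ("PDCEmulator", "RIDMaster", "InfrastructureMaster")
ALL_ROLES = FOREST_ROLES + DOMAIN_ROLES


def check_single_master(layout):
    """Raise if any role has zero holders or more than one holder."""
    held = [role for roles in layout.values() for role in roles]
    if sorted(held) != sorted(ALL_ROLES):
        raise ValueError("each FSMO role must have exactly one holder")


def roles_to_seize(layout, failed_dc):
    """Roles lost when failed_dc dies; there is no second holder to fall back on."""
    return layout.get(failed_dc, ())


def seize_command(target_dc, roles):
    """Build (but do not run) a PowerShell command that seizes roles onto target_dc."""
    if not roles:
        return ""
    return ("Move-ADDirectoryServerOperationMasterRole "
            f"-Identity {target_dc} -OperationMasterRole {','.join(roles)} -Force")


# The two layouts discussed above (server names are made up).
ONE_BOX = {"DC1": ALL_ROLES}
SPLIT = {"DC1": FOREST_ROLES, "DC2": DOMAIN_ROLES}

for name, layout in (("one box", ONE_BOX), ("split", SPLIT)):
    check_single_master(layout)
    for dc in layout:
        lost = roles_to_seize(layout, dc)
        print(f"{name}: lose {dc} -> seize {len(lost)} role(s)")
        print("   ", seize_command("DC3", lost))
```

Either way, whatever was on the dead box has no other holder until somebody seizes it; the split only shrinks the list.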

The second reason takes a bit more thought: what really happens when a FSMO role holder fails? Let's look at each role, the impact of it being offline, and the urgency:

Schema Master – Schema updates are not available – These are generally planned changes, and the first step in a schema change is normally something like "make sure your environment is healthy." There isn't any urgency if the schema master fails; having it offline is largely irrelevant until you want to make a schema change.

Domain Naming Master – No new domains or application partitions can be added – This falls into the same "healthy environment" bucket as the schema master. I don't know of anyone who has just randomly decided to add a new domain to a forest without much thought or planning...of course, then again, I don't know all that many people either... You might wonder why I mentioned app partitions as well...personal experience. When we upgraded the first DC to a beta of Server 2003, which included the code to create the DNS application partitions, we couldn't figure out why they weren't instantiated...until we realized that the server hosting the DNM was offline (being upgraded) at the same time. Sure enough, it came online and there they were... But I've never said we were perfect here...

Infrastructure Master – No cross-domain updates, can't run any domain preps – Domain preps are planned (again)...but no cross-domain updates. Hmm...that could be important if you have a multi-domain environment with a lot of changes occurring...but wait...the IM tasks are throttled to run over a 2-day period (by default), so how much urgency does that really imply? I guess you'd have to call it as you see it in your environment, but it's probably not 3am urgency...and my buddy the new engineer is only working in single-domain forests anyway, so for him urgency = zero.

RID Master – New RID pools can't be issued to DCs – This gets a bit more complicated, but let me see if I can make it easy. Every DC is initially issued 500 RIDs. When it gets down to 50% (250), it requests a second pool of RIDs from the RID master. So when the RID master goes offline, every DC has anywhere between 250 and 750 RIDs available (depending on whether it's hit 50% and received the new pool). So the urgency question is: how long will it take your environment to exhaust the RIDs on a given DC? My guess is that in most environments, this isn't that urgent either. Oh yeah, and don't forget that if you do seize the RID master during a failure...that's an automatic flatten and rebuild of the old server; you can't bring it back online.
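The RID math is easy to sketch. Below is a rough Python model that takes the figures above (500-RID pools, next pool requested at 50%) as assumptions; pool sizes may differ on other Windows versions, and the account-creation rate in the example is a made-up number you'd swap for your own. Each new security principal (user, group, computer) created on a DC consumes one RID from that DC's pool.

```python
# Rough model of one DC's RID headroom after the RID master goes offline.
# The 500 / 50% figures come from the post and are assumptions here.
POOL_SIZE = 500          # RIDs per pool
REQUEST_AT_USED = 250    # next pool is requested once 50% of the current pool is used


def rids_available(used_in_current_pool, next_pool_received):
    """RIDs this DC can still hand out without talking to the RID master."""
    if not 0 <= used_in_current_pool <= POOL_SIZE:
        raise ValueError("used_in_current_pool must be between 0 and POOL_SIZE")
    remaining = POOL_SIZE - used_in_current_pool
    return remaining + (POOL_SIZE if next_pool_received else 0)


def hours_until_exhausted(available, principals_per_hour):
    """Hours until this DC runs dry at a steady rate of new users/groups/computers."""
    if principals_per_hour <= 0:
        return float("inf")
    return available / principals_per_hour


# Worst case: the DC had just hit the 50% mark when the RID master died,
# so it never got its next pool. Best case: the next pool had just arrived.
worst = rids_available(REQUEST_AT_USED, next_pool_received=False)
best = rids_available(REQUEST_AT_USED, next_pool_received=True)
rate = 20  # hypothetical: 20 new accounts per hour created on this one DC
print(worst, hours_until_exhausted(worst, rate))  # 250 12.5
print(best, hours_until_exhausted(best, rate))    # 750 37.5
```

Even the worst case here buys half a day, and that's for a single DC; every other DC is drawing from its own pool.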

PDC – Time, logins, password changes, trusts – So we made it to the bottom of the list, and by this point you've figured that the PDC has to be the most urgent FSMO role holder to get back online...the rest of them can be offline for varying amounts of time with no impact at all...so what about this one? Yes, you should get the PDC back online whenever you can, but it's not even something that I'd jump out of bed to do...let's put it on the "first thing in the morning" list. Time syncs are important, but w32time does a pretty good job, and nothing's going to diverge between today and tomorrow enough to impact you...users may see funky behavior if they changed their password, but replication will probably have completed before they call the help desk, so nothing to worry about...and trusts go back to that whole "healthy forest" thing again... The biggest impact we see internally at Microsoft from the PDC being offline is all of the applications written in the NT 4.0 timeframe that are biased towards it. Now that's something to consider.

So when it really comes down to it, is there any benefit to separating the forest and domain roles onto separate servers? Probably not...is there any harm in it? Nahh...let's just chalk it up to "operational preference," since the guys who watch this stuff day to day need to be comfortable with the way the environments are configured.

Pop Quiz Time:

Raise your hand if, when your phone rings in the middle of the night and you get that call...you transfer the PDC role and go back to sleep...

...

...

Now keep your hand in the air if you also reconfigured the server you transferred the role to so that it's authoritative for time. I think I found a topic for my next blog...
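For anyone whose hand just went down, here's a hedged Python sketch that builds (and by default only prints) the w32tm /config command lines for that step. It assumes a single-domain forest, where the PDC emulator of the forest root domain sits at the top of the time hierarchy. The peer name time.example.com and the DC names are placeholders, and the helper functions are hypothetical; the flags used (/computer, /manualpeerlist, /syncfromflags, /reliable, /update) are standard w32tm /config options, but check them against the w32tm documentation for your OS version before running anything.

```python
import subprocess


def make_time_source_args(peers, computer=None):
    """w32tm args: sync from external NTP peers and advertise as a reliable time source."""
    if not peers:
        raise ValueError("need at least one external NTP peer")
    args = ["w32tm", "/config"]
    if computer:
        args.append(f"/computer:{computer}")
    args += [
        "/manualpeerlist:" + " ".join(peers),  # w32tm expects a space-separated list
        "/syncfromflags:manual",
        "/reliable:yes",
        "/update",
    ]
    return args


def make_domain_hierarchy_args(computer=None):
    """w32tm args: go back to syncing from the domain hierarchy (e.g. for the old PDC)."""
    args = ["w32tm", "/config"]
    if computer:
        args.append(f"/computer:{computer}")
    args += ["/syncfromflags:domhier", "/reliable:no", "/update"]
    return args


def run(args, dry_run=True):
    """Print the command line; only execute it when dry_run is False (Windows only)."""
    print(subprocess.list2cmdline(args))
    if not dry_run:
        subprocess.run(args, check=True)


# Hypothetical names: DC2 now holds the PDC role, DC1 is the old holder.
run(make_time_source_args(["time.example.com"], computer="DC2"))
run(make_domain_hierarchy_args(computer="DC1"))
```

The second command is for the old PDC once it's back on its feet, so you don't end up with two boxes both claiming to be the reliable source.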

If I don't see you before then, Merry Christmas, Happy Holidays...or, as that commercial says, Merry Chrismahanakwanzaka.

Comments
  • Now the way you told me your interview question, it involved the server crashing at 6:30pm when you had a 7pm dinner-date with a super-model... when did your blog get all politically correct?

  • Nice article, Brian. In reference to the whole Schema master role - I'm HUGELY in favor of making that role completely and totally "Install when needed, de-install when done".

    Make no mistake - this would have saved me many cross-forest migration hours, as the folks that couldn't have figured out how to install the role were typically the same folks who fire-bombed their schema.

  • Well, since I got dinged for competitive urgency on my last review (whatever the hell that is), I guess for me the right answer is never do I get to go back to sleep. But then I don't have 150 DCs either.

    Seriously, nice article Brian. I asked my team the question, and every one said PDCE. Makes me feel all warm inside :-)

  • I think the argument in the email you received kind of falls down based on its own info. If the forest roles aren't that important, so what if they are on the box with the domain roles and that box fails? In terms of critical workload, having them on the same box doesn't add anything, because, as he and you indicated, you can leave them for a while, possibly even a long while.

    I have never been a fan of spreading out the roles. When you put them together, you don't have an "all your eggs in one basket" situation like some people like to say because, IMO, they aren't all eggs. Or maybe they are all eggs, but not all raw eggs; some are hardboiled and some are plastic. Spreading out the roles just means you have to keep track of them all over the place. If you have them all on one server (with maybe the IM as an exception for GC reasons), you know right where they are.

    In a large org, it is generally a good idea to keep RID and PDC together, especially if you have legacy systems or, and this is important, newer systems that operate like legacy systems because that is how the developers knew how to write the code. I have seen apps that used LDAP but used PDC location API calls to find the PDC to contact for the LDAP calls. They had no idea that there were better ways of locating domain resources because they did a simple upgrade of their old NT4 app and tried to change as little as possible.

    When I did ops and we ran AD just for NOS, the PDC was the only DC I would run for. The Fortune 5 company I was lead for had too many things that used PDC-based functionality. Once we added Exchange 2K, I would run for PDCs and all DCs in the Exchange sites.

    On the time point... Obviously that is just for the forest root domain but I always set all DCs that could potentially become the PDC for the root domain to use the same AD-external time source. One less thing to think about when moving the role.

  • Alright, AD old-timers quiz: do you remember what FSMO _originally_ stood for? And why it was so whacked that dev had to change it?

  • Floating Single Master Operations was the original name.  I don't know the specifics as to why it was changed but I think it is pretty obvious.  Don't really want them to float -- rather have them flexible.
