
September, 2011

  • Accelerating Your IT Career

    Your career isn’t win or lose anymore, it is win or die. The days of guaranteed work, pensions, and sticking with one company for fifty years are gone. Success has returned to something Cro-Magnon man would recognize: if you’re good at what you do, you get to eat.

    I recently spoke to university graduates about their future as new Microsoft engineers. Preparing for it forced me, for the first time, to organize my beliefs. They distill into four simple pillars: discipline, technical powerhouse, communication, and legacy. In the tradition of Eric Brechner and in honor of Labor Day, I’d like to share my philosophy.

    Discipline

    Learn constantly, not just when life is forcing you. Read every trustworthy article you can get your hands on, before you need to know it; the time to learn AD replication is not when its failure blocks your schema upgrade. Understanding architecture is the key to deploying and troubleshooting any complex system. If you get nothing else from this post, remember that statement - it can alter your life. For Directory Services, start here.

    Don’t be good at one thing - be amazing at a few things, and good at the rest. We all know someone who's the expert on X. He guards X jealously, making sure he is “indispensable.” Notice how he’s always in a lousy mood: he's not allowing anyone to relieve his boredom and he lives in fear that if anyone does, he'll be replaced. Learn several components inside and out. When you get jaded, move on to a few more and give someone else a turn. You'll still be the expert while they learn and if things get gnarly, you can still save the day. Over time, you become remarkable in many areas. Keep your skills up on the rest so that you can pinch hit when needed. Surround yourself with smart people and absorb their knowledge.

    Admit your mistakes. The only thing worse than making a mistake is trying to cover it up. Eventually, everyone is caught or falls under a career-limiting cloud of suspicion. Now colleagues will remember times they trusted you, and won’t make that "mistake" again. Plead guilty and start serving community service, where you help the team fix the glitch.

    Get a grip. It's never as bad as you think. Losing your composure costs you concentration and brainpower. Remaining emotional and depressed makes you a poor engineer, and a lousy person to be around to boot. Learn how to relax so you can get back to business.

    Never surrender. Your career path is a 45-degree angle leading up to infinity, not an arc - arcs come back down! Keep learning, keep practicing, keep refreshing, keep growing. Keep a journal of "I don't know" topics, and then revisit it weekly to see what you've learned. IT makes this easy: it's the most dynamic industry ever created. In my experience, the Peter Principle is usually a self-induced condition and not the true limit of the individual.

    Technical Powerhouse

    Figure out what makes you remember long term. There is a heck-of-a-lot to know when dealing with complex distributed systems - you can't always stop to look things up. Find a recall technique that works for you and practice it religiously. You’re not cramming for a test; you’re building a library in your brain to serve you for fifty years. No amount of learning will help if you can’t put it to good use.

    Be able to repro anything. When I first came to Microsoft, people had fifteen computers at their desk. Thanks to free virtualization, that nonsense is over and you can run as many test environments as you need, all on one PC. "Oh, but Ned, those virtual machines will cost a fortune!" Gimme a break, it’s walking-around money. A lab pays for itself a thousand times every year, thanks to the rewards of your knowledge and time. It's the best investment you can make. Study and memory are powered by experience.

    Know your dependencies. What does the File Replication Service need to work? DNS, LDAP, Kerberos, RPC. What about AD replication? DNS, LDAP, Kerberos, RPC. Interactive user logon? DNS, LDAP, Kerberos, RPC. Windows developers tend to stick with trusted protocols. If you learn the common building blocks of one component, you become good at many other components. That means you can troubleshoot, design, teach, and recognize risks to them all.

    Understand network captures. It's hard to find an IT system talking only to itself. Notepad, maybe (until you save a file to a network share). There are many free network capture tools out there, and they all have their place. Network analysis is often the only way to know how something works between computers, especially when logging and error messages stink - and they usually do. I'd estimate that network analysis solves a quarter of cases worked in my group. Learn by exploring controlled, working scenarios; the differences become simple to spot in failure captures. Your lab is the key.

    Learn at least one scripting language. PowerShell, CMD, VBS, KiXtart, Perl, Python, WinBatch, etc. – any is fine. Show me an IT pro who cannot script and I'll show you one that grinds too many hours and doesn't get the bonus. Besides making your life easier, scripting may save your business someday and therefore, your career. An introductory programming course often helps, as such courses teach fundamental computer science and logic that apply to all languages. This also makes dependencies easier to grasp.

    Learn how to search and more importantly, how to judge the results. You can't know everything, and that means looking for help. Most people on the Internet are spewing uninformed nonsense, and you must figure out how to filter them. A vendor is probably trustworthy, but only when talking about their own product. TechNet and KB trump random blogs. Stay skeptical with un-moderated message boards and "enthusiast" websites. Naturally, search results from AskDS are to be trusted implicitly. ;-P

    Communication

    Learn how to converse. I don’t mean talk, I mean converse. This is the trickiest of all my advice: how to be both interesting and interested. The hermit geek in the boiler room - that guy does not get promotions, bonuses, or interesting projects. He doesn't gel with a team. He can't explain his plans or convince anyone to proceed with them. He can't even fill the dead air of waiting… and IT troubleshooting is a lot of waiting. Introverts don’t get the opportunities of extroverts. If I could learn to suppress my fear of heights, you can learn to chat.

    Get comfortable teaching. IT is education. You’re instructing business units in the benefits and behavior of software. You're schooling upper management on why they should buy new systems or what you did to fix a broken one. You're coaching your colleagues on network configuration, especially if you don’t want to be stuck maintaining it forever. If you can learn to teach effortlessly and likably, a new aspect to your career opens up. Moreover, there's a tremendous side effect: teaching forces you to learn.

    Learn to like an audience. The higher you rise in IT, the more often you find yourself speaking to larger groups. Over time they become upper management or experienced peers: an intimidating mix. If you let anxiety or poor skills get in the way, your career will stall. Arm yourself with technique and get out in front of people often. It's easier with practice. Do you think Mark Russinovich gets that fat paycheck for his immaculate hair?

    Project positive. Confidence is highly contagious. When the bullets are flying, people want to follow the guy with the plan and the grin. Even if deep down he's quivering with fear, it doesn’t show and he charges forward, knowing that everyone is behind him. People want to be alongside him when the general hands out medals. Self-assurance spreads throughout an organization and you'll be rewarded for it your whole career. Often by managers who "just can't put their finger" on why they like you.

    Be dominant without domineering. One of the hardest things to teach new employees in Microsoft Support is how to control a conference call. You’re on the phone with a half dozen scared customers, bad ideas are flying everywhere, and managers are interrupting for “status updates”. You can’t be rude; you have to herd the cats gently but decisively. Concentration and firmness are paramount. Not backing down comes with confidence. Steering the useless off to harmless tasks lets you focus (making them think the task is important is the sign of an artist). There's no reason to yell or demand; if you sound decisive and have a plan, everyone will get out of the way. They crave your leadership.

    Legacy

    Share everything. Remember "the expert?" He's on a desert island but doesn’t signal passing ships. Share what you learn with your colleagues. Start your own internal company knowledgebase then fill it. Have gab sessions, where you go over interesting topics you learned that week. Talk shop at lunch. Find a reason to hang out with other teams. Set up triages where everyone takes turns teaching the IT department. Not only do you grow relationships, you're leading and following; everyone is improving, and the team is stronger. A tight team won't crumble under pressure later, and that's good for you.

    Did you ever exist? Invent something. Create documentation, construct training, write scripts, and design new distributed systems. Don’t just consume and maintain - build. When the fifty years have passed, leave some proof that you were on this earth. If a project comes down the pipe, volunteer - then go beyond its vision. If no projects are coming, conceive them yourself and push them through. The world is waiting for you to make your mark.

    I used many synonyms in this post, but not once did I say “job.” Jobs end at quitting time. A career is something that wakes you up at midnight with a solution. I can’t guarantee success with these approaches, but they've kept me happy with my IT career for 15 years. I hope they help with yours.

    Ned "good luck, we're all counting on you" Pyle

  • Managing RID Pool Depletion

    Hiya folks, Ned here again. When interviewing a potential support engineer at Microsoft, we usually start with a softball question like “what are the five FSMO roles?” Everyone nails that. Then we ask what each role does. Their face scrunches a bit and they get less assured. “The RID Master… hands out RIDs.” Ok, what are RIDs? “Ehh…”

    That’s trouble, and not just for the interview. A poor understanding of the RID Master can leave you unable to add new users, computers, and groups, which disrupts your business. Uncontrolled RID creation can force you to abandon your domain, which will cost you serious money.

    Today, I discuss how to protect your company from uncontrolled RID pool depletion and keep your domain bustling for decades to come.

    Background

    Relative Identifiers (RID) are the incremental portion of a domain Security Identifier (SID). For instance:

    S-1-5-21-1004336348-1177238915-682003330-2100

    ==>

    S-1-5-Domain Identifier-Relative Identifier
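    To make the split concrete, here is a minimal Python sketch that separates a domain account SID string into the domain SID and the trailing RID. `split_sid` is a hypothetical helper, not a Windows API. It assumes the input is a domain-relative SID like the one above; well-known SIDs such as S-1-5-18 have no domain portion.

```python
def split_sid(sid: str) -> tuple[str, int]:
    """Split a domain account SID string into (domain SID, RID).

    The RID is the last dash-separated value; everything before it is the
    domain SID (the S-1-5 prefix plus the domain identifier).
    """
    domain_part, _, rid_text = sid.rpartition("-")
    if not domain_part.startswith("S-1-"):
        raise ValueError(f"not a SID string: {sid!r}")
    return domain_part, int(rid_text)


domain, rid = split_sid("S-1-5-21-1004336348-1177238915-682003330-2100")
print(domain)  # S-1-5-21-1004336348-1177238915-682003330
print(rid)     # 2100
```

    The same split explains why renaming an account does not change its identity. For example, the built-in Guest account always ends in RID 501.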

    A SID represents a unique trustee, also known as a "security principal" – typically users, groups, and computers – that Windows uses for access control. Without a matching SID in an access control list, you cannot access a resource or prove your identity. It’s the lynchpin.

    Every domain has a RID Master: a domain controller that hands each DC a pool of 500 RIDs at a time. A domain contains a single RID pool which generates roughly one billion SIDs (RIDs are 30 bits long, so the ceiling is 2^30 - 1, or 1,073,741,823 RIDs). Once issued, RIDs are never reused. You can’t reclaim RIDs after you delete security principals either, as that would lead to unintended access to resources that contained previously issued SIDs.

    Anytime you create a writable DC, it gets 500 new RIDs from the RID Master. Meaning, if you promote 10 domain controllers, you’ve issued 5000 new RIDs. If 8 of those DCs are demoted, then promoted back up, you have now issued 9000 RIDs. If you restore a system state backup onto one of those DCs, you’ve issued 9500 RIDs. The balance of any existing RIDs issued to a DC is never saved – once issued they’re gone forever, even if they aren’t used to create any users. A DC requests more RIDs when it gets low, not just when it is out, so when it grabs another 500 that becomes part of its "standby" pool. When the current pool is empty, the DC switches to the standby pool. Repeat until doomsday.
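    The bookkeeping above is simple multiplication. This minimal Python sketch reproduces the 5000, 9000, and 9500 totals. The names are hypothetical, and it follows the post's simplification of one 500-RID block per event, ignoring the standby pool.

```python
RID_BLOCK_SIZE = 500   # default RIDs handed to a DC per allocation
MAX_RID = 2**30 - 1    # 1,073,741,823: the domain-wide RID ceiling


def rids_issued(promotions: int, repromotions: int = 0, restores: int = 0,
                block_size: int = RID_BLOCK_SIZE) -> int:
    """Count RIDs handed out by DC lifecycle events alone.

    Each promotion, re-promotion, or system state restore costs one fresh
    block; the unused balance of the old block is never returned.
    """
    return (promotions + repromotions + restores) * block_size


print(rids_issued(10))                              # 5000
print(rids_issued(10, repromotions=8))              # 9000
print(rids_issued(10, repromotions=8, restores=1))  # 9500
```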

    Adding more trustees means issuing more blocks of RIDs. When you’ve issued the one billion RIDs, that’s it – your domain cannot create users, groups, computers, or trusts. Your RID Master logs event 16644: “The maximum domain account identifier value has been reached.” Time for a support case.

    You’re now saying something like, “One billion RIDs? Pffft. I only have a thousand users and we only add fifty a year. My domain is safe.” Maybe. Consider all the normal ways you “issue” RIDs:

    • Creating users, computers, and groups (both Security and email Distribution) as part of normal business operations.
    • The promotion of new DCs.
    • Gracefully demoting a DC discards its remaining RID pool.
    • System state restore on a DC invalidates the local RID pool.
    • Active Directory domains upgraded from NT 4.0 inherit all the RIDs from that old environment.
    • Seizing the RID Master FSMO role to another server.

    Now study the abnormal ways RIDs are wasted:

    • Provisioning systems or admin scripts that accidentally bulk create users, groups, and computers.
    • Attempting to create enabled users that do not meet password requirements.
    • DCs turned off longer than tombstone lifetime.
    • DC metadata cleaned.
    • Forest recovery.
    • The InvalidateRidPool operation.
    • Increasing the RID Block Size registry value.

    The normal operations are out of your control and unlikely to cause problems even in the biggest environments. For example, even though Microsoft’s Redmond AD dates to 1999 and holds the vast majority of our resources, it has only consumed ~8 million RIDs - that's 0.7%. In contrast, some of the abnormal operations can lead to squandered RIDs or even deplete the pool altogether, forcing you to migrate to a new domain or recover your forest. We’ll talk more about them later; regardless of how you are using RIDs, the key to avoiding a problem is observation.
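    To put the Redmond figure in perspective, the percentage is just RIDs consumed divided by the ceiling. Here is a quick Python check, where `percent_consumed` is a hypothetical helper:

```python
MAX_RID = 2**30 - 1  # 1,073,741,823


def percent_consumed(rids_used: int, ceiling: int = MAX_RID) -> float:
    """Return the share of the domain-wide RID space already issued, in percent."""
    return 100.0 * rids_used / ceiling


print(f"{percent_consumed(8_000_000):.1f}%")  # 0.7%
```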

    Monitoring

    You now have a new job, IT professional: monitoring your RID usage and ensuring it stays within expected patterns. KB305475 describes the attributes for both the RID Master and the individual DCs. I recommend giving it a read, as the data storage requires conversion for human consumption.
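    The conversion is less scary than it sounds: rIDAvailablePool is a single 64-bit integer holding two 32-bit halves. Here is a minimal Python sketch, assuming the same interpretation as the PowerShell samples later in this post: the high half is the total RIDs and the low half is the RIDs issued. `decode_rid_available_pool` is a hypothetical name.

```python
def decode_rid_available_pool(value: int) -> tuple[int, int, int]:
    """Split a raw rIDAvailablePool value into its two 32-bit halves.

    High 32 bits: the RID ceiling. Low 32 bits: the count treated as issued
    (same interpretation as the PowerShell samples). Returns
    (ceiling, issued, remaining).
    """
    ceiling = value >> 32
    issued = value & 0xFFFFFFFF
    return ceiling, issued, ceiling - issued


# The exhausted value quoted later in this post: nothing left.
print(decode_rid_available_pool(4611686015206162431))  # (1073741823, 1073741823, 0)

# A lab domain that has issued 3100 RIDs.
raw = (1073741823 << 32) | 3100
print(decode_rid_available_pool(raw))  # (1073741823, 3100, 1073738723)
```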

    Monitoring the RID Master in each domain is adequate, and we offer a simple command-line tool I’ve discussed before: DCDIAG.EXE. Part of Windows Server 2008+ or a free download for 2003, it has a simple test, RidManager, that translates rIDAvailablePool into the number of allocated RIDs:

    Dcdiag.exe /test:ridmanager /v

    For example, my RID Master has issued 3100 RIDs to my DCs and itself:


    If you just want the good bit, perhaps for batching:

    Dcdiag.exe /TEST:RidManager /v | find /i "Available RID Pool for the Domain"

    For PowerShell, here is a slightly modified version of Brad Rutkowski's original sample function. It converts the high and low parts of rIDAvailablePool into readable values:

    function Get-RIDsRemaining
    {
        param ($domainDN)
        # Bind to the RID Manager$ object, which stores rIDAvailablePool
        $de = [ADSI]"LDAP://CN=RID Manager$,CN=System,$domainDN"
        $return = New-Object System.DirectoryServices.DirectorySearcher($de)
        $property = ($return.FindOne()).properties.ridavailablepool
        # $() unrolls the single-value collection into its Int64 value.
        # High 32 bits = total RIDs in the domain; low 32 bits = RIDs issued.
        [int32]$totalSIDS = $($property) / ([math]::Pow(2,32))
        [int64]$temp64val = $totalSIDS * ([math]::Pow(2,32))
        [int32]$currentRIDPoolCount = $($property) - $temp64val
        $ridsremaining = $totalSIDS - $currentRIDPoolCount
        Write-Host "RIDs issued: $currentRIDPoolCount"
        Write-Host "RIDs remaining: $ridsremaining"
    }


    Another sample, if you want to use the Active Directory PowerShell module and target the RID Master directly:

    function Get-RIDsremainingAdPsh
    {
        param ($domainDN)
        # Query the RID Master itself so the value is authoritative
        $property = Get-ADObject "cn=rid manager$,cn=system,$domainDN" -Properties ridavailablepool -Server ((Get-ADDomain $domainDN).RidMaster)
        $rid = $property.ridavailablepool
        # High 32 bits = total RIDs in the domain; low 32 bits = RIDs issued
        [int32]$totalSIDS = $($rid) / ([math]::Pow(2,32))
        [int64]$temp64val = $totalSIDS * ([math]::Pow(2,32))
        [int32]$currentRIDPoolCount = $($rid) - $temp64val
        $ridsremaining = $totalSIDS - $currentRIDPoolCount
        Write-Host "RIDs issued: $currentRIDPoolCount"
        Write-Host "RIDs remaining: $ridsremaining"
    }


    Turn one of those PowerShell samples into a script that runs as a scheduled task that updates a log every morning and alerts you to review it. You can also use LDP.EXE to convert the RID pool values manually every day, if you are an insane person.

    You should also consider monitoring the RID Block Size, as any increase exhausts your global RID pool faster. Object Access Auditing can help here. There are legitimate reasons to increase this value on certain DCs. For example, you might be the US Marine Corps, with DCs in a warzone that may not be able to talk to the RID Master for weeks. Be smart about picking values - you are unlikely to need five million RIDs before talking to the master again; when the DC comes home, lower the value back to default.

    The critical review points are:

    1. You don’t see an unexpected rise in RID issuance.
    2. You aren’t close to running out of RIDs.
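    If you script the daily check, the two review points boil down to two comparisons. Here is a minimal Python sketch whose inputs are the "RIDs issued" numbers from two consecutive runs of one of the samples above. The function name and both thresholds (10,000 RIDs per day, 10% of the pool) are made-up placeholders, not Microsoft guidance; tune them to your own baseline.

```python
MAX_RID = 2**30 - 1  # 1,073,741,823


def review_rid_usage(issued_yesterday: int, issued_today: int,
                     max_daily_increase: int = 10_000,
                     warn_percent: float = 10.0) -> list[str]:
    """Apply the two review points to consecutive daily readings.

    Both thresholds are hypothetical placeholders.
    """
    alerts = []
    increase = issued_today - issued_yesterday
    if increase > max_daily_increase:  # review point 1: unexpected rise
        alerts.append(f"unexpected rise: {increase} RIDs issued since last check")
    used_percent = 100.0 * issued_today / MAX_RID
    if used_percent >= warn_percent:   # review point 2: nearing the ceiling
        alerts.append(f"{used_percent:.1f}% of the RID space is issued")
    return alerts


print(review_rid_usage(3100, 3600))       # []
print(review_rid_usage(3100, 2_003_100))  # ['unexpected rise: 2000000 RIDs issued since last check']
```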

    Let’s explore what might be consuming RIDs unexpectedly.

    Diagnosis

    If you see a large increase in RID allocation, the first step is finding what was created and when. As always, my examples are PowerShell. You can find plenty of others using VBS, free tools, and whatnot on the Internet.

    You need to return all users, computers, and groups in the domain – even if deleted. You need the SAM account name, creation date, SID, and USN of each trustee. There are going to be a lot of these, so filter the returned properties to save time and export to a CSV file for sorting and filtering in Excel. Here’s a sample (it’s one wrapped line):

    Get-ADObject -Filter 'objectclass -eq "user" -or objectclass -eq "computer" -or objectclass -eq "group"' -properties objectclass,samaccountname,whencreated,objectsid,uSNCreated -includeDeletedObjects | select-object objectclass,samaccountname,whencreated,objectsid,uSNCreated | Export-CSV riduse.csv -NoTypeInformation -Encoding UTF8

    Here I ran the command, then opened in Excel and sorted by newest to oldest:

    Errrp, looks like another episode of “scripts gone wild”…

    Now it’s noodle time:

    • Does the user count match actual + previous user counts (or at least in the ballpark)?
    • Are there sudden, massive blocks of object creation?
    • Is someone creating and deleting objects constantly – or was it just once and you need to examine your audit logs to see who isn’t admitting it?
    • Has your user provisioning system gone berserk (or run by someone who needs… coaching)?
    • Have you changed your password policy, and are you now trying to create enabled users that do not meet password requirements? (This uses up a RID during each failed creation attempt.)
    • Do you use a VDI system that constantly creates and deletes computer accounts when provisioning virtual machines? We’ve seen those too: in one case, a third-party solution was burning 4 million computer RIDs a month.
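    Excel works fine, but if you'd rather script the "sudden, massive blocks" check, this minimal Python sketch counts creations per day in the riduse.csv export. The helper names and the threshold are hypothetical. It assumes the whencreated column begins with the date followed by a space, which depends on the locale of the machine that ran Export-CSV.

```python
import csv
import io
from collections import Counter


def creations_per_day(csv_file) -> Counter:
    """Count trustees created per day from a riduse.csv-style export.

    Assumes whencreated looks like "9/19/2011 3:04:05 PM"; adjust the
    split for your locale.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        day = row["whencreated"].split(" ", 1)[0]
        counts[day] += 1
    return counts


def suspicious_days(counts: Counter, threshold: int) -> list[tuple[str, int]]:
    """Return days whose creation count exceeds the threshold, busiest first."""
    return [(day, n) for day, n in counts.most_common() if n > threshold]


# Synthetic sample in the same column layout as the export.
sample = io.StringIO(
    '"objectclass","samaccountname","whencreated","objectsid","uSNCreated"\n'
    '"user","alice","9/1/2011 9:00:00 AM","S-1-5-21-1-2-3-1105","20480"\n'
    '"user","test1","9/2/2011 1:00:00 AM","S-1-5-21-1-2-3-1106","20490"\n'
    '"user","test2","9/2/2011 1:00:01 AM","S-1-5-21-1-2-3-1107","20491"\n'
    '"user","test3","9/2/2011 1:00:02 AM","S-1-5-21-1-2-3-1108","20492"\n'
)
print(suspicious_days(creations_per_day(sample), threshold=2))  # [('9/2/2011', 3)]
```

    To run it against the real export, open the file with `open("riduse.csv", newline="", encoding="utf-8-sig")`. The `-Encoding UTF8` switch writes a byte order mark, which `utf-8-sig` strips.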

    If the RID allocations are growing massively, but you don’t see a subsequent increase in new trustees, it’s likely someone increased RID Block Size inappropriately. Perhaps they set hexadecimal rather than decimal values – instead of the intended 15,000 RIDs per allocation, for example, you’d end up with 86,016!
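    The 86,016 figure is simply the digits 15000 read as base 16. One easy way to make this mistake is that the Registry Editor's DWORD dialog defaults to the Hexadecimal base. A quick Python check of the arithmetic:

```python
intended = "15000"          # what the admin meant: decimal 15,000
as_hex = int(intended, 16)  # what the DC gets if the digits were entered as hex
print(as_hex)               # 86016
print(f"{as_hex / int(intended):.2f}x the intended block size")  # 5.73x the intended block size
```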

    It may also be useful to know where the updates are coming from. Examine each DC’s rIDAllocationPool for increases to see if something is running on, or pointed at, a specific domain controller.
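    Each DC's rIDAllocationPool is also a 64-bit value describing a block of RIDs. Here is a minimal Python sketch for decoding it. The layout (low 32 bits = first RID, high 32 bits = last RID) is an assumption you should confirm against KB305475, and the helper names are hypothetical.

```python
def decode_rid_allocation_pool(value: int) -> tuple[int, int]:
    """Return (first_rid, last_rid) for a DC's rIDAllocationPool value.

    ASSUMPTION: low 32 bits = first RID in the block, high 32 bits = last RID.
    """
    return value & 0xFFFFFFFF, value >> 32


def block_size(value: int) -> int:
    """Number of RIDs in the block, counting both ends."""
    first, last = decode_rid_allocation_pool(value)
    return last - first + 1


pool = (3099 << 32) | 2600  # a hypothetical DC holding RIDs 2600-3099
print(decode_rid_allocation_pool(pool))  # (2600, 3099)
print(block_size(pool))                  # 500
```

    Decode each DC's value on successive days. The DC whose pool advances fastest is where the creation is happening. A block much larger than 500 suggests someone changed RID Block Size on that DC.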

    Recovery

    You know there’s a problem. The next step is to stop things from getting worse (as you have no way to undo the damage without recovering the entire forest).

    If you identified the cause of the RID exhaustion, stop it immediately; your domain’s health is more important. If that system continues in high enough volume, it’s going to force you to abandon your domain.

    If you can’t find the cause and you are anywhere near the end of your one billion RIDs, get a system state backup on the RID Master immediately. Then transfer the RID Master role to a non-essential DC that you shut down to prevent further issuance. The allocated RID pools on your DCs will run out, but that stops further damage. This gives you breathing space to find the bad guy. The downside is that legitimate trustee creation stops also. If you don’t already have a Forest Recovery process in place, you had better get one going. If you cannot figure out what's happening, open a support case with us immediately.

    No matter what, you cannot let the RID pool run out. If you see:

    • SAM Event 16644
    • rIDAvailablePool is “4611686015206162431”
    • DCDIAG “Available RID Pool for the Domain is 1073741823 of 1073741823”

    ... it is too late. Like having a smoke detector that only goes off when the house has burned down. Now you cannot create a trust for a migration to another domain. If you reach that stage, open a support case with us immediately. This is one of those “your job depends on it” issues, so don’t try to be a lone gunfighter.

    Many thanks to Arren “cowboy killer” Connor for his tireless efforts and excellent internal docs around this scenario.

    Finally, a tip: know all the FSMO roles before you interview with us. If you really want to impress, know that the PDC Emulator does more than just “emulate a PDC”. Oy vey.

     

    UPDATE 11/14/2011:

    Our seeds to improve the RID Master have begun growing and here's the first ripe fruit - http://support.microsoft.com/kb/2618669

     

     

    Until next time.

    Ned “you can’t get RID of me that easily” Pyle

  • Windows 8 for the IT Pro: The New Plumbing

    Hi folks, Ned coming to you from the secret underground redoubt, where the cable is out, the wife is at grad school, and the dogs are napping as autumn finally reaches North Carolina.


    I’m not a fan of blog posts that only aggregate links and don’t offer original thought. Today I make an exception, as the first official bits of Windows 8 have hit the street. Like all Windows pre-releases, you notice two immediate problems:

    1. The consumer content overwhelms the IT Professional content.
    2. The Internet is a public toilet of misunderstanding, opinions masquerading as facts, and general ignorance.

    Nothing wrong with the first point; we’re outnumbered at least a thousand to one, so it’s natural for advertising to target the majority. The second point I cannot abide; I despise misinformation.

    Nothing has changed with my NDA - I cannot discuss Windows 8 in detail, speak of the future, or otherwise get myself fired. Nevertheless, I can point you to accurate content that’s useful to an IT Professional craving more than just the new touchscreen shell for tablets. My links talk a little Windows Server and show features that Mom won’t be using.

    So, in vague order and with no regard to the features being Directory Services or not, here are the goods. Some are movies and PowerPoint slides, some are text. Some are Microsoft and some are not. Many are buried in the //Build site. I added some exposition to each link so I don’t feel so dirty.

    Enjoy, it’s going to be a busy decade.

    Intro (good for basic familiarity)

    Security & Active Directory

    Interestingly, no mainstream websites have discovered many of the AD changes visible in the server preview build, or at least, not written about them. Aha! Here they come, thanks for the tip Sean:

    Virtualization, Networking, & High Availability

    Deployment & Performance

    Remember, everything is subject to change and refers only to the Developer Preview release from the //Build conference; Windows 8 isn’t even in beta yet. Grab the client or server and see for yourself.

    And no matter what link you click, I don’t recommend reading the comments. See point 2.

    Where do you want me to put this Internet?

    Ned “bowl o’ links” Pyle

  • Friday Mail Sack: Super Slo-Mo Edition

    Hello folks, Ned here again with another Mail Sack. Before I get rolling though, a quick public service announcement:

    Plenty of you have downloaded the Windows 8 Developer Preview and are knee-deep in the new goo. We really want your feedback, so if you have comments, please use one of the following avenues:

    I recommend sticking to IT Pro features; the consumer side’s covered and the biggest value is your Administrator experience. The NDA is not off - I still cannot comment on the future of Windows 8 or tell you if we already have plans to do X with Y. This is a one-way channel from you to us (to the developers).

    Cool? On to the sack. This week we discuss:

    Shake it.

    Question

    We were chatting here about password synchronization tools that capture password changes on a DC and send the clear text password to some third party app. I consider that a security risk...but then someone asked me how the password is transmitted between a domain member workstation and a domain controller when the user performs a normal password change operation (CTRL+ALT+DEL and Change Password). I suppose the client uses some RPC connection, but it would be great if you could point me to a reference.

    Answer

    Windows can change passwords many ways - it depends on the OS and the component in question.

    1. For the specific case of using CTRL+ALT+DEL because your password has expired or you just felt like changing your password:

    If you are using a modern OS like Windows 7 with AD, the computer uses the Kerberos protocol end to end. This starts with a normal AS_REQ logon, but to a special service principal name of kadmin/changepw, as described in http://www.ietf.org/rfc/rfc3244.txt.

    The computer first contacts a KDC over port 88, then communicates over port 464 to exchange the special AP_REQ and AP_REP. You are still using Kerberos cryptography and sending an encrypted payload containing a KRB_PRIV message with the password. Therefore, to get to the password, you have to defeat Kerberos cryptography itself: both the cipher and the key derived from the cryptographic hash of the user's original password. That has never happened in the history of Kerberos.
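    A capture of this exchange shows only framing and encrypted goo. This minimal Python sketch splits a kpasswd request into the pieces RFC 3244 lays out. It is an illustration under assumptions, not a NetMon or Wireshark parser. It expects just the kpasswd message body (for example, a UDP payload), treats the AP-REQ and KRB-PRIV as opaque bytes, and runs on synthetic placeholder data.

```python
import struct


def split_kpasswd_message(data: bytes) -> dict:
    """Split a kpasswd request into its framing fields and opaque parts.

    Layout assumed from RFC 3244: 16-bit message length (whole message),
    16-bit protocol version, 16-bit AP-REQ length, all big-endian, followed
    by the AP-REQ and then the KRB-PRIV carrying the encrypted password.
    """
    if len(data) < 6:
        raise ValueError("too short for a kpasswd header")
    msg_len, version, ap_req_len = struct.unpack("!HHH", data[:6])
    if msg_len != len(data):
        raise ValueError(f"length field says {msg_len}, got {len(data)} bytes")
    ap_req = data[6:6 + ap_req_len]
    krb_priv = data[6 + ap_req_len:]  # ciphertext: nothing readable here
    return {"version": version, "ap_req": ap_req, "krb_priv": krb_priv}


# Synthetic message: placeholder bytes stand in for the real ASN.1 blobs.
fake_ap_req = b"\x6e" + b"\x00" * 9  # 10 bytes of stand-in AP-REQ
fake_priv = b"\x75" + b"\xaa" * 15   # 16 bytes of stand-in KRB-PRIV
body = fake_ap_req + fake_priv
msg = struct.pack("!HHH", 6 + len(body), 0x0001, len(fake_ap_req)) + body  # arbitrary version value
parts = split_kpasswd_message(msg)
print(hex(parts["version"]), len(parts["ap_req"]), len(parts["krb_priv"]))  # 0x1 10 16
```

    The version field distinguishes request types; see RFC 3244 for its defined values.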


    The parsing of this kpasswd traffic is currently broken in NetMon's latest public parsers, but even when you parse it in Wireshark, all you can see is the encryption type and a payload of encrypted goo. For example, here is that Windows 7 client talking to a Windows Server 2008 R2 DC, which means AES-256:

    Aka: Insane-O-Cryption ™

    On the other hand, if using a crusty OS like Windows XP, you end up using a legacy password mechanism that worked with NT 4.0 – in this case SamrUnicodeChangePasswordUser2 (http://msdn.microsoft.com/en-us/library/cc245708(v=PROT.10).aspx).

    XP also supports the Kerberos change mechanism, but by default uses NTLM with CTRL+ALT+DEL password changes. Witness:


    This uses “RPC over SMB with Named Pipes” with RPC packet privacy. You are using NTLM v2 by default (unless you set LmCompatibilityLevel unwisely) and you are still double-protected (the payload and packets), which makes it relatively safe. Definitely not as safe as Win7 though – just another reason to move forward.


    If you have Win2008 R2 DCs, you can disable NTLM in the domain, and XP is smart enough to switch to using Kerberos here:


    ... but you are likely to break many other apps. Better to get rid of Windows XP.

    2. A lot of administrative code uses SamrSetInformationUser2, which does not require knowing the user’s current password (http://msdn.microsoft.com/en-us/library/cc245793(v=PROT.10).aspx). For example, when you use NET USER to change a domain user’s password:


    This invokes SamrSetInformationUser2 to set Internal4InformationNew data:


    So, doubly-protected (a cryptographically generated, key-signed hash covered by an encrypted payload). This is also “RPC over SMB using Named Pipes.”


    The crypto for the encrypted payload is derived from a key signed using the underlying authentication protocol, seen from a previous session setup frame (negotiated as Kerberos in this case):


    3. The legacy mechanisms to change a user password are NetUserChangePassword (http://msdn.microsoft.com/en-us/library/windows/desktop/aa370650(v=vs.85).aspx) and IADsUser::ChangePassword (http://msdn.microsoft.com/en-us/library/windows/desktop/aa746341(v=vs.85).aspx)

    4. A local user password change usually involves SamrUnicodeChangePasswordUser2, SamrChangePasswordUser, or SamrOemChangePasswordUser2 (http://msdn.microsoft.com/en-us/library/cc245705(v=PROT.10).aspx).

    There are other ways but those are mostly corner-case.

    Note: In my examples, I am using the most up to date Netmon 3.4 parsers from http://nmparsers.codeplex.com/.

    Question

    If I try to remove the AD Domain Services role using ServerManager.msc, it blocks me with this message:


    But if I remove the role using Dism.exe, it lets me continue:


    This completely hoses the DC and it no longer boots normally. Is this a bug?

    And - hypothetically speaking, of course - how would I fix this DC?

    Answer

    Don’t do that. :)

    Not a bug, this is expected behavior. Dism.exe is a pure servicing tool; it knows nothing more of DCs than the Format command does. ServerManager and servermanagercmd.exe are the tools that know what they are doing.
    Update: As Artem points out in the comments, we want you to use the Server Manager PowerShell cmdlets rather than servermanagercmd, which is on its way out.

    To fix your server, pick one:

    • Boot it into DS Repair Mode with F8 and restore your system state non-authoritatively from backup (you can also perform a bare metal restore if you have that capability - no functional difference in this case). If you do not have a backup and this is your only DC, update your résumé.
    • Boot it into DS Repair Mode with F8 and use dcpromo /forceremoval to finish what you started. Then perform metadata cleanup. Then go stand in the corner and think about what you did, young man!

    Question

    We are getting Event ID 4740s (account lockout) for the AD Guest account throughout the day, which is raising alerts in our audit system. The Guest account is disabled, expired, and even renamed. Yet various clients keep locking out the account and creating the 4740 event. I believe I've traced it back to occasional attempts by a local account to authenticate to the domain. Any thoughts?

    Answer

    You'll see that when someone has set a complex password on the Guest account, using NET USER for example, rather than leaving it at the null default. Clients never know the Guest password; they always assume it is null, like the default, so if you set a password on it, they will fail. Fail enough and the account locks out (unless you turn that policy off and replace it with intrusion detection and two-factor auth). Set it back to null and you should be OK. As you suspected, there are a number of times when Guest is used as part of a "well, let's try that" algorithm:

    Network access validation algorithms and examples for Windows Server 2003, Windows XP, and Windows 2000

    To set it back, use the Reset Password menu in Dsa.msc on the Guest account, leave the password fields blank, and click OK. You may have to adjust your domain password policy temporarily to allow this.

    As for why it's "locking out" even though it's disabled and renamed:

    • It has a well-known SID (S-1-5-21-domain-501) so renaming doesn’t really do anything except tick a checkbox on some auditor's clipboard
    • Disabled accounts can still lock out if you keep sending bad passwords to them. Usually no one notices though, and most people are more concerned about the "account is disabled" message they see first.
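    To make the renaming point concrete, here is a minimal Python sketch (an illustration only, not a Microsoft tool) that picks out the built-in Guest account purely by its well-known RID of 501, so the account's name never enters into it. The account names and SIDs are made-up sample data; in practice they might come from an event's Security ID field or an audit export.

```python
# Sketch: find the built-in Guest account by its well-known RID (501),
# regardless of what it has been renamed to.

def rid_of(sid: str) -> int:
    """Return the relative identifier (last sub-authority) of a SID string."""
    return int(sid.rsplit("-", 1)[1])


def is_builtin_guest(sid: str) -> bool:
    """True for a domain or local account SID of the form S-1-5-21-...-501."""
    return sid.upper().startswith("S-1-5-21-") and rid_of(sid) == 501


# Hypothetical (account name, SID) pairs, e.g. from an audit export.
accounts = [
    ("NotAGuest", "S-1-5-21-1111111111-2222222222-3333333333-501"),
    ("Administrator", "S-1-5-21-1111111111-2222222222-3333333333-500"),
    ("jsmith", "S-1-5-21-1111111111-2222222222-3333333333-1105"),
]

for name, sid in accounts:
    if is_builtin_guest(sid):
        print(f"{name} is the built-in Guest account ({sid})")
```

    However clever the new name, anything that looks the account up by SID still finds it.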

    Question

    What are the steps to change the "User Account" password set when the Network Device Enrollment Service (NDES) is installed?

    Answer

    When you first install the Network Device Enrollment Service (NDES), you have the option of setting the identity under which the application pool runs to the default application pool identity or to a specific user account. I assume that you selected the latter. The process to change the password for this user account requires two steps -- with 27 parts (not really…).

      1. First, you must reset the user account's password in Active Directory Users and Computers.

      2. Next, you must change the password configured in the application pool Advanced Settings on the NDES server.

    a. In IIS Manager, expand the server name node.

    b. Click on Application Pools.

    c. On the right, locate and highlight the SCEP application pool.

    image

    d. In the Action pane on the right, click on Advanced Settings....

    e. Under Process Model click on Identity, then click on the … button.

    image

    f. In the Application Pool Identity dialog box, select Custom account and then click on Set….

    g. Enter the custom application pool account name, and then set and confirm the password. Click Ok, when finished.

    image

    h. Click Ok, and then click Ok again.

    i. Back on the Application Pools page, verify that SCEP is still highlighted. In the Action pane on the right, click on Recycle….

    j. You are done.

    Normally, you would need to be careful about simply resetting the password of any service account that has digital certificates assigned to it, because resetting the password can cause the account to lose access to the private keys associated with those certificates. In the case of NDES, however, the certificates used by the NDES service are actually stored in the local computer's Personal store, and the custom application pool identity has only read access to those keys. Resetting the password of the custom application pool account therefore has no impact on the master key used to protect the NDES private keys.

    [Courtesy of Jonathan, naturally - Neditor]

    Question

    If I have only one domain in my forest, do I need a Global Catalog? Plenty of documents imply this is the case.

    Answer

    All those documents saying "multi-domain only" are mistaken. You need GCs - even in a single-domain forest - for the following:

    (Update: Correction on single-domain forest logon made, thanks for catching that Yusuf! I also added a few more breakage scenarios)

    • Perversely, if you have enabled IgnoreGCFailures (http://support.microsoft.com/kb/241789): turning it on removes universal groups from the user's security token if there is no GC, meaning users will log on but will not be able to access resources they previously accessed fine.
    • If your users log on with UPNs and try to change their passwords (they can still log on in a single-domain forest with UPN or NetBiosDomain\SamAccountName-style logons).
    • Even if you use Universal Group Membership Caching to avoid the need for a GC in a site, the DCs in that site still need to reach a GC to update the cache.
    • If MS Exchange is deployed (no version of Exchange will even start its services without a GC).
    • Using the built-in Find in the shell to search AD for published shares, published DFS links, or published printers will fail, as will any object picker dialog that offers the "Entire Directory" option.
    • DPM agent installation will fail.
    • AD Web Services (aka AD Management Gateway) will fail.
    • CRM searches will fail.
    • Probably other third parties of which I'm not aware.

    We stopped recommending that customers use only handfuls of GCs years ago. If you get an ADRAP or call MS support, we will recommend you make all DCs GCs, unless you have an excellent reason not to. Our BPA tool states that you should have at least one GC per AD site: http://technet.microsoft.com/en-us/library/dd723676(WS.10).aspx.

    Question

    If I use DFSR to replicate a folder containing symbolic links, will this replicate the source files or the actual symlinks? The DFSR FAQ says symlink replication is supported under certain circumstances.

    Answer

    The symlink replicates; however, the underlying data does not replicate just because there is a symlink. If the data is not stored within the replicated folder (RF), you end up with a replicated symlink to nowhere:

    Server 1, replicating a folder called c:\unfiltersub. Note how the symlink points to a file that is not in the scope of replication:

    image

    Server 2, the symlink has replicated - but naturally, it points to an un-replicated file. Boom:

    image

    If the source data is itself replicated, you’re fine. There’s no real way to guarantee that, though, except by preventing users from creating files outside the RF with permissions and FSRM screens. If your end users can only access the data through a share, they are in good shape. I'd imagine they are not the ones creating symlinks, though. ;-)
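    One way to reason about the "symlink to nowhere" case is to check whether a link's target resolves inside the replicated folder. The Python sketch below does that with Windows path rules from the standard ntpath module, so it runs on any OS. The RF and link paths are hypothetical, and the check ignores junctions, chained links, UNC targets, and DFSR file filters, any of which could still leave a target unreplicated.

```python
import ntpath  # Windows path rules, usable on any OS


def resolve_link_target(link_path: str, target: str) -> str:
    """Resolve a symlink target that is absolute or relative to the link's folder."""
    # ntpath.join discards the link's folder when target is already absolute.
    return ntpath.normpath(ntpath.join(ntpath.dirname(link_path), target))


def target_inside_rf(rf_root: str, link_path: str, target: str) -> bool:
    """True if the resolved target lives under the replicated folder root."""
    root = ntpath.normcase(ntpath.normpath(rf_root)).rstrip("\\")
    resolved = ntpath.normcase(resolve_link_target(link_path, target))
    return resolved == root or resolved.startswith(root + "\\")


# Hypothetical RF and link; only the first target is inside the RF.
rf = r"C:\unfiltersub"
link = r"C:\unfiltersub\link.txt"
for target in (r"data\file.txt", r"C:\outside\file.txt", r"..\outside\file.txt"):
    print(target, "->", target_inside_rf(rf, link, target))
```

    Normalizing before comparing matters: a relative target such as `..\outside\file.txt` looks like it starts inside the RF until the `..` is collapsed.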

    Question

    I read your post on career development. There are many memory techniques and I know everyone is different, but what do you use?

    [A number of folks asked this question - Neditor]

    Answer

    When I was younger, it just worked - if I was interested in it, I remembered it. As I get older and burn more brain cells though, I find that my best memory techniques are:

    • Periodic skim and refresh. When I have learned something through deep reading and hands on, I try to skim through core topics at least once a year. For example, I force myself to scan the diagrams in all the Win2003 Technical Reference A-Z sections, and if I can’t remember what the diagram is saying, I make myself read that section in detail. I don’t let myself get too stale on anything and try to jog it often.
    • Mix up the media. When learning a topic, I read, find illustrations, and watch movies and demos. When there are no illustrations, I use Visio to make them for myself based on reading. When there are no movies, I make myself demo the topics. My brain seems to retain more info when I hit it with different styles on the same subject.
    • I teach and publicly write about things a lot. Nothing hones your memory like trying to share info with strangers, as the last thing I want is to look like a dope. It makes me prepare and check my work carefully, and that natural repetition (rather than forced “read flash cards”-style repetition) really works for me. My brain runs best under pressure.
    • Your body is not a temple (of Gozer worshipers). Something of a cliché, but I gobble vitamins, eat plenty of brain foods, and work out at least 30 minutes every morning.

    I hope this helps and isn’t too general. It’s just what works for me.

    Other Stuff

    Have $150,000 to spend on a camera, a clever director who likes FPS gaming, and some very fit paintballers? Go make a movie better than this. Watch it multiple times.

    image
    Once for the chat log alone

    Best all-around coverage of the Frankfurt Auto Show here, thanks to Jalopnik.

    image
    Want!

    The supposedly 10 Coolest Death Scenes in Science Fiction History. But any list not including Hudson’s last moments in Aliens is fail.

    If it’s true… holy crap! Ok, maybe it wasn’t true. Wait, HOLY CRAP!

    So many awesome things combined.

    Finally, my new favorite time waster is Retronaut. How can you not like a website with things like “Celebrities as Russian Generals”?

    image
    No, really.

    Have a nice weekend folks,

    - Ned “Oh you want some of this?!?!” Pyle

  • Advanced XML filtering in the Windows Event Viewer

    Hi guys, Joji Oshima here again. Today I want to talk about using Custom Views in the Windows Event Viewer to filter events more effectively. The standard GUI allows some basic filtering, but you have the ability to drill down further to get the most relevant data.
    Starting in Windows Vista/2008, you have the ability to modify the XML query used to generate Custom Views.

    Limitations of basic filtering:

    Basic filtering allows you to display events that meet certain criteria. You can filter by the event level, the source of the event, the Event ID, certain keywords, and the originating user/computer.

    image
    Basic Filter for Event 4663 of the security event logs

    You can choose multiple events that match your criteria as well.

    image
    Basic filter for Event 4660 & 4663 of the security event logs

    A real limitation of this type of filtering is that the data inside each event can be very different, and basic filtering doesn't let you filter on it. For example, 4663 events appear when auditing users accessing objects; each one shows the account of the user and the object they were accessing.

    clip_image001 clip_image002
    Sample 4663 events for users ‘test5’ and ‘test9’

    If you want to see events that are only about user ‘test9’, you need a Custom View and an XML filter.

    Using XML filtering and Custom Views:

    Custom Views using XML filtering are a powerful way to drill through event logs and only display the information you need. With Custom Views, you can filter on data in the event. To create a Custom View based on the username, right click Custom Views in the Event Viewer and choose Create Custom View.

    image

    Click the XML Tab, and check Edit query manually. Click ok to the warning popup. In this window, you can type an XML query. For this example, we want to filter by SubjectUserName, so the XML query is:

          <QueryList>
               <Query Id="0">
                  <Select Path="Security">
                     *[EventData[Data[@Name='SubjectUserName'] and (Data='test9')]]
                   </Select>
               </Query>
          </QueryList>

    image

    After you type in your query, click the Ok button. A new window will ask for a Name & Description for the Custom View. Add a descriptive name and click the Ok button.

    image

    You now have a Custom View for any security events that involve the user test9.

    image

    Take It One Step Further:

    Now that we’ve gone over a simple example, let’s look at the query we are building and what else we can do with it. Using XML, we are building a SELECT statement to pull events that meet the criteria we specify. Using the standard AND/OR Boolean operators, we can expand upon the simple example to pull more events or to refine the list.

    Perhaps you want to monitor two users - test5 and test9 - for any security events. Inside the search query, we can use the Boolean OR operator to include users that have the name test5 or test9.

    The query below searches for any security events that include test5 or test9.

          <QueryList>
               <Query Id="0">
                  <Select Path="Security">
                     *[EventData[Data[@Name='SubjectUserName'] and (Data='test5' or Data='test9')]]
                   </Select>
               </Query>
          </QueryList>

    Event Metadata:

    At this point you may be asking, where did you come up with SubjectUserName and what else can I filter on? The easiest way to find this data is to find a specific event, click on the details tab, and then click the XML View radio button.

    image

    From this window, we can see the structure of the Event’s XML metadata. This event has a <System> tag and an <EventData> tag. Each of these data names can be used in the filter and combined using standard Boolean operators.

    With the same view, we can examine the <System> metadata to find additional data names for filtering.

    image

    Now let’s say we are only interested in a specific Event ID involving either of these users. We can incorporate an AND Boolean to filter on the System data.

    The query below looks for 4663 events for user test5 or test9.

          <QueryList>
               <Query Id="0">
                  <Select Path="Security">
                     *[EventData[Data[@Name='SubjectUserName'] and (Data='test5' or Data='test9')]]
                     and
                     *[System[(EventID='4663')]]
                   </Select>
               </Query>
          </QueryList>
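    For readers who want to see what the combined query is actually testing, here is a rough Python model of it, using xml.etree.ElementTree against trimmed, hypothetical events in the standard event XML namespace. It illustrates XPath 1.0 semantics; it is not how the Event Log service evaluates queries. It mirrors one subtlety: inside EventData[...], the two conditions are tested independently, so the first only requires that some Data element is named SubjectUserName, and the second is satisfied by any Data element, whatever its name, whose value is test5 or test9.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def make_event(event_id, user, obj=r"C:\Share\report.txt"):
    """Build a trimmed, hypothetical event with only the fields the query uses."""
    return (
        f'<Event xmlns="{NS["e"]}">'
        f"<System><EventID>{event_id}</EventID></System>"
        "<EventData>"
        f'<Data Name="SubjectUserName">{user}</Data>'
        f'<Data Name="ObjectName">{obj}</Data>'
        "</EventData></Event>"
    )


def event_matches(event_xml, field, values, event_id=None):
    """Model of *[EventData[Data[@Name=field] and (Data=v1 or ...)]],
    optionally combined with *[System[(EventID=event_id)]]."""
    root = ET.fromstring(event_xml)
    data = root.findall("e:EventData/e:Data", NS)
    # Condition 1: some <Data> element is named `field`.
    has_field = any(d.get("Name") == field for d in data)
    # Condition 2: some <Data> element, whatever its name, has one of `values`.
    has_value = any((d.text or "") in values for d in data)
    if event_id is not None:
        found_id = root.findtext("e:System/e:EventID", default="", namespaces=NS)
        if found_id != str(event_id):
            return False
    return has_field and has_value


events = [make_event(4663, "test5"), make_event(4663, "test9"),
          make_event(4660, "test9"), make_event(4663, "test1")]
print([event_matches(e, "SubjectUserName", {"test5", "test9"}, 4663) for e in events])
# [True, True, False, False]
```

    Because the two conditions are independent, an event whose SubjectUserName is someone else but whose TargetUserName is test9 would also match the query as written.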

    Broader Filtering:

    Say you wanted to filter on events involving test5 but were unsure whether it would appear in SubjectUserName, TargetUserName, or somewhere else. You don’t need to specify the name of the field that holds the value; you can simply search for any data in <EventData> that equals test5.

    The query below looks for events in which any data in <EventData> equals test5.

          <QueryList>
               <Query Id="0">
                  <Select Path="Security">
                     *[EventData[Data and (Data='test5')]]
                  </Select>
               </Query>
          </QueryList>

    Multiple Select Statements:

    You can also have multiple Select statements in your query to pull different data from the same log, or data from another log. You specify which log to pull from in the Path attribute of the <Select> element, and you can have multiple <Select> elements in the same <Query> element.

    The example below will pull 4663 events from the security event log and 1704 events from the application event log.

          <QueryList>
               <Query Id="0">
                  <Select Path="Security">*[System[(EventID='4663')]]</Select>
                 <Select Path="Application">*[System[(EventID='1704')]]</Select>
               </Query>
          </QueryList>

    image

    XPath 1.0 Limitations:

    Windows Event Log supports a subset of XPath 1.0, so not every function works in a query. For instance, you can use the position(), band(), and timediff() functions, but other functions such as starts-with() and contains() are not currently supported.

    Further Reading:

    Create a Custom View
    http://technet.microsoft.com/en-us/library/cc709635.aspx

    Event Queries and Event XML
    http://msdn.microsoft.com/en-us/library/bb399427(v=VS.90).aspx

    Consuming Events (Windows)
    http://msdn.microsoft.com/en-us/library/dd996910(VS.85).aspx

    Conclusion:

    Using Custom Views in the Windows Event Log can be a powerful tool to quickly access relevant information on your system. XPath 1.0 has a learning curve but once you get a handle on the syntax, you will be able to write targeted Custom Views.

    Joji "the sieve" Oshima

    [Check out pseventlogwatcher if you want to combine complex filters with monitoring and automation. It’s made by AskDS superfan Steve Grinker: http://pseventlogwatcher.codeplex.com/ – Neditor]

  • What the heck does /genmigxml do?

    Hello guys and gals, Kim Nichols here with my first AskDS post. While deciding on a title, I did a quick search on the word "heck" on our AskDS blog to see if Ned was going to give me any grief. Apparently, we "heck" a lot around here, so I guess it's all good. :-)

    I'm hoping to shed some light on USMT's /genmigxml switch and uncover the truth behind which XML files must be included for both scanstate and loadstate. I recently had a USMT 4 case where the customer was using the /genmigxml switch during scanstate to generate a custom XML file, mymig.xml. After creating the custom XML, the customer added the file to the scanstate command via the /i:mymig.xml switch, along with any other custom XML files. When the file was referenced again on loadstate, loadstate failed with errors similar to the following:

    2011-08-01 18:40:50, Info  [0x080000] Current XML stack: <component type="Documents" context="KIMN\test2" defaultSupported="Yes"> "External_UserDocs - KIMN\test2"
    2011-08-01 18:40:50, Error [0x08055d] MXE Agent: Migration XML C:\USMT\amd64\mig.xml is not properly formatted. Message: context attribute has an invalid value.
    2011-08-01 18:40:50, Error [0x000000] EngineStartup caught exception: FormatException: context attribute has an invalid value. class UnBCL::ArrayList<class Mig::CMXEXmlComponent *> *__cdecl Mig::CMXEMigrationXml::LoadComponents(class Mig::CPlatform *,class UnBCL::String *,class UnBCL::XmlNode *,class Mig::CMXEMigrationXml *,class Mig::CMXEXmlComponent *,class Mig::CUserContext *)
    2011-08-01 18:40:50, Info [0x080000] COutOfProcPluginFactory::FreeSurrogateHost: Shutdown in progress

    From this error, it appears that user KIMN\test2 is invalid for some reason. Interestingly, if that user logs on to the computer before loadstate runs, loadstate completes successfully. However, requiring all users to log on before migrating their data is not recommended and can cause issues with application migration.

    I did some research to get a better understanding of the purpose behind the /genmigxml switch and why we hadn't received more calls on this issue. Here's what I found:

    Technet: What's New in USMT 4.0 - http://technet.microsoft.com/en-us/library/dd560752(WS.10).aspx

    “This option specifies that the ScanState command should use the document finder to create and export an .xml file that defines how to migrate all of the files found on the computer on which the ScanState command is running. The document finder, or MigXmlHelper.GenerateDocPatterns helper function, can be used to automatically find user documents on a computer without authoring extensive custom migration .xml files.”

    Technet: Best Practices - http://technet.microsoft.com/en-us/library/dd560764(WS.10).aspx

    “You can utilize the /genmigxml command-line option to determine which files will be included in your migration, and to determine if any modifications are necessary.”

    Technet: Step-by-Step: Basic Windows Migration using USMT for IT Professionals - http://technet.microsoft.com/en-us/library/dd883247(WS.10).aspx

    "In USMT 4.0, the MigXmlHelper.GenerateDocPatterns function can be used to automatically find user documents on a computer without authoring extensive custom migration .xml files. This function is included in the MigDocs.xml sample file downloaded with the Windows AIK. "

    We can use /genmigxml to get an idea of what the migdocs.xml file is going to collect for a specific user. We don't specifically document what you should do with the generated XML besides review it. Logic might lead us to believe that, similar to the /genconfig switch, we should generate this XML file and include it on both our scanstate and our loadstate operations if we want to make modifications to which data is gathered for a specific user. This is where we run into the issue above, though.

    If we take a look inside this XML file, we see a list of locations from which scanstate will collect documents. This list includes the path for each user profile on the computer. Here's a section from mymigxml.xml in my test environment. Notice that this is the same user from my loadstate log file above.

    clip_image002

    So, if including this file generates errors, why use it? The answer is /genmigxml was only intended to provide a sample of what will be migrated using the standard XML files. The XML is machine-specific and not generalized for use on multiple computers. If you need to alter the default behavior of migdocs.xml to exclude or include files/folders for specific users on a specific computer, modify the file generated via /genmigxml for use with scanstate. This file contains user-specific profile paths so don't include it with loadstate.

    But wait… I thought all XML files had to be included in both scanstate and loadstate?

    The actual answer is: it depends. In the USMT 4.0 FAQ, we say to include the same XML files for both scanstate and loadstate. However, immediately after that sentence, we state that you don't have to include Config.xml on loadstate unless you want to exclude some files that were migrated to the store.

    The more complete answer is that the default XML files (migapp.xml & migdocs.xml) need to be included in both scanstate and loadstate if you want any of the rerouting rules to apply, for instance when migrating from one version of Office to another. Because migapp.xml & migdocs.xml transform OS and user data to be compatible with a different version of the OS/Office, you must include both files on scanstate and on loadstate.

    As for your custom XML files (aside from the one generated by /genmigxml), these only need to be specified in loadstate if you are rerouting files, or if you want to keep files that were migrated to the store from being restored to the new computer during loadstate.
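    The rules above can be summarized as a small decision function. This is a hypothetical Python helper, not part of USMT. It takes the files you passed to scanstate with /i: and returns the ones to pass to loadstate: it drops the /genmigxml output, always keeps MigApp.xml and MigDocs.xml, and keeps other custom files only when you are rerouting or excluding data. Config.xml is passed with /config: rather than /i:, so it is outside this sketch; custom.xml is an example name.

```python
import ntpath

# Default XML files that carry the rerouting/transform rules (per the post).
DEFAULT_XMLS = {"migapp.xml", "migdocs.xml"}


def loadstate_xmls(scanstate_xmls, genmigxml_output=None, reroute_or_exclude=False):
    """Hypothetical helper: pick loadstate's /i: files from scanstate's list."""
    keep = []
    for path in scanstate_xmls:
        name = ntpath.basename(path).lower()
        if genmigxml_output and name == ntpath.basename(genmigxml_output).lower():
            continue  # machine- and user-specific output: scanstate only
        if name in DEFAULT_XMLS or reroute_or_exclude:
            keep.append(path)  # defaults always; custom files only when needed
    return keep


scan = ["MigApp.xml", "MigDocs.xml", "mymig.xml", "custom.xml"]
print(" ".join(f"/i:{p}" for p in loadstate_xmls(scan, "mymig.xml")))
# /i:MigApp.xml /i:MigDocs.xml
print(" ".join(f"/i:{p}" for p in loadstate_xmls(scan, "mymig.xml", True)))
# /i:MigApp.xml /i:MigDocs.xml /i:custom.xml
```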

    To wrap this up: in most cases migdocs.xml migrates everything you need. If you are curious about what will be collected, you can run /genmigxml to find out, but the output is computer-specific, so you can’t reuse it without modification.

    - Kim "Boilermaker" Nichols

  • The PDCe with too much to do

    Hi. Mark again. As part of my role in Premier Field Engineering, I’m sometimes called upon to visit customers who have a critical issue being worked by CTS and need another set of eyes. For today’s discussion, I’m going to talk you through one such visit.

    It was a dark and stormy night …

    Well not really – it was mid-afternoon but these sorts of things always have that sense of drama.

    The Problem

    Custom applications were hard coded to use the PDC Emulator (PDCe) for authentication – a strategy the customer later abandoned to eliminate a single point of failure. The issue was hot because the PDCe was not processing authentication requests after a reboot.

    The customer had noticed lsass.exe consuming a lot of CPU and this is where CTS were focusing their efforts.

    The Investigation

    Starting with the Directory Service event logs, I noticed the following:

    Event Type:          Information
    Event Source:        NTDS Replication
    Event Category:      Replication
    Event ID:            1555
    Date:                <Date>
    Time:                <Time>
    User:                NT AUTHORITY\ANONYMOUS LOGON
    Computer:            <Name of PDCe>
    Description:
    The local domain controller will not be advertised by the domain controller locator service as an available domain controller until it has completed an initial synchronization of each writeable directory partition that it holds. At this time, these initial synchronizations have not been completed.

    The synchronizations will continue.

    also:

    Event Type:          Warning
    Event Source:        NTDS Replication
    Event Category:      Replication
    Event ID:            2094
    Date:                <Date>
    Time:                <Time>
    User:                NT AUTHORITY\ANONYMOUS LOGON
    Computer:            <Name of PDCe>
    Description:
    Performance warning: replication was delayed while applying changes to the following object. If this message occurs frequently, it indicates that the replication is occurring slowly and that the server may have difficulty keeping up with changes.

    Object DN: CN=<ClientName>,OU=Workstations,OU=Machine Accounts,DC=<Domain Name>,DC=com
    Object GUID: <GUID>
    Partition DN: DC=<Domain Name>,DC=com
    Server: <_msdcs DNS record of replication partner>
    Elapsed Time (secs): 440

    User Action
    A common reason for seeing this delay is that this object is especially large, either in the size of its values, or in the number of values. You should first consider whether the application can be changed to reduce the amount of data stored on the object, or the number of values. If this is a large group or distribution list, you might consider raising the forest version to Windows Server 2003, since this will enable replication to work more efficiently. You should evaluate whether the server platform provides sufficient performance in terms of memory and processing power. Finally, you may want to consider tuning the Active Directory database by moving the database and logs to separate disk partitions.

    If you wish to change the warning limit, the registry key is included below. A value of zero will disable the check.

    Additional Data
    Warning Limit (secs): 10
    Limit Registry Key: System\CurrentControlSet\Services\NTDS\Parameters\Replicator maximum wait for update object (secs)

    and:

    Event Type:          Warning
    Event Source:        NTDS General
    Event Category:      Replication
    Event ID:            1079
    Date:                <Date>
    Time:                <Time>
    User:                <SID>
    Computer:            <Name of PDCe>
    Description:
    Internal event: Active Directory could not allocate enough memory to process replication tasks. Replication might be affected until more memory is available.

    User Action
    Increase the amount of physical memory or virtual memory and restart this domain controller.

    In summary, the PDCe hasn’t completed initial synchronisation after a reboot and it’s having memory allocation problems while it works on sorting it out. Initial synchronisation is discussed in:

    Initial synchronization requirements for Windows 2000 Server and Windows Server 2003 operations master role holders
    http://support.microsoft.com/kb/305476

    With this information in hand, I had a chat with the customer hoping we’d identify a relevant change in the environment leading up to the outage. It became apparent they’d configured a policy for deploying RDP session certificates. Furthermore, they’d noticed clients receiving many of these certificates instead of the expected one.

    RDP session certificates are Secure Sockets Layer (SSL) certificates issued to Remote Desktop servers. It is also possible to deploy RDP session certificates to client operating systems such as Windows Vista and Windows 7. More on this later…

    The customer and I examined a sample client and found 285 certificates! In addition to this unusual behaviour, the certificates were being published to Active Directory. There were 3700 affected clients – approx. 1 million certificates published to AD!

    The Story So Far

    We’ve injected huge amounts of certificate data into the userCertificate attribute of computer objects, we’ve got a replication backlog due to memory allocation issues, and the DC won't advertise itself as a DC until it completes an initial sync that it can't finish.

    What Happened Next Uncle Mark?!

    The CTS engineer back at home base wanted to gather some debug logging of LSASS.exe. While attempting to gather such a log, the PDCe became completely unresponsive and we had to reboot.

    While the PDCe rebooted, the customer disabled the policy responsible for deploying RDP session certificates.

    After the reboot, the PDCe had stopped logging event 1079 (for memory allocation failures) but in addition to event 1555 and 2094, we were now seeing:

    Event Type:          Warning
    Event Source:        NTDS Replication
    Event Category:      DS RPC Client
    Event ID:            1188
    Date:                <Date>
    Time:                <Time>
    User:                NT AUTHORITY\ANONYMOUS LOGON
    Computer:            <Name of PDCe>
    Description:
    A thread in Active Directory is waiting for the completion of a RPC made to the following domain controller.

    Domain controller:
    <_msdcs DNS record of replication partner>
    Operation:
    get changes
    Thread ID:
    <Thread ID>
    Timeout period (minutes):
    5

    Active Directory has attempted to cancel the call and recover this thread.

    User Action
    If this condition continues, restart the domain controller.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    A bit more investigation with:

    Repadmin.exe /showreps (or /showrepl for later versions of repadmin)

    told us that all partitions were in sync except the domain partition – the partition with a million certificates attached to computer objects.

    We decided to execute:

    Repadmin.exe /replicate <Name of PDCe> <Closest Replication Partner> <Domain Naming Context> /force

    Next, we waited … for several hours.

    While waiting, we considered:

    • Disabling initial sync with:

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters]

    Repl Perform Initial Synchronizations = 0

    • Increasing the RPC timeout for NTDS with:

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;830746

    Both of these changes require a reboot. The customer was hesitant to reboot again and while they thought it over, initial sync completed.

    With the PDCe authenticating clients, I headed home to get some sleep. The customer had disabled the RDP session certificate deployment policy and was busy clearing the certificate data out of computer objects in Active Directory.

    Why?

    The next day, I went looking for root cause. The customer had followed some guidance to deploy the RDP session certificates. Some of the guidance noted during the investigation is posted here:

    http://blogs.msdn.com/b/rds/archive/2010/04/09/configuring-remote-desktop-certificates.aspx

    I set up a test environment and walked through the guidance. After doing so, I did not experience the issue. I was getting a single certificate no matter how often I would reboot or apply Group Policy. In addition, RDP session certificates were not being published in Active Directory. Publishing in Active Directory is easily explained by this checkbox:

    image

    An examination of the certificate template confirmed they had this checked.

    So why were clients in the customer environment receiving multiple certificates while clients in my test environment received just one?

    The Win

    I noticed the following point in the guidance being followed by the customer:

    image

    A bit of an odd recommendation. Sure enough, the customer’s template had different names for “Template display name” and “Template name”. I changed my test environment to make the same mistake and suddenly I had a repro – a new certificate on every reboot and policy refresh.

    Some research revealed that this was a known issue. One of these fields checks whether an RDP session certificate exists while the other field obtains a new certificate. Giving both fields the same name works around the problem.
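    A toy model helps show why the name mismatch produced a new certificate on every refresh. In the Python sketch below, one name is used to look for an existing certificate and the other is attached to the certificate that gets issued. The post doesn't say which template field plays which role, so the parameters are deliberately neutral, and the template names are hypothetical. The last line only checks the "approx. 1 million" figure, assuming every affected client looked like the 285-certificate sample.

```python
def refresh_policy(store, lookup_name, enroll_name):
    """One autoenrollment pass in a toy model: enroll only if no cert is found."""
    if lookup_name not in store:   # look for an existing certificate by one name
        store.append(enroll_name)  # ...but issue the new one under the other name


def certificates_after(lookup_name, enroll_name, refreshes):
    store = []  # stands in for the client's certificate store
    for _ in range(refreshes):
        refresh_policy(store, lookup_name, enroll_name)
    return len(store)


# Hypothetical template names.
print(certificates_after("RDPAuth", "RDPAuth", 10))             # 1
print(certificates_after("RDP Authentication", "RDPAuth", 10))  # 10
print(3700 * 285)  # 1054500, roughly a million if every client matched the sample
```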

    Conclusion

    So, in the aftermath of this incident, here are some general recommendations that anyone can follow to help avoid this kind of situation.

    • Follow our guidance carefully – even the weird stuff
    • Test before you deploy
    • Deploy the same way as you test
    • Avoid making critical servers more critical than they need to be

    - Mark “Falkor” Renoden

  • Friday Mail Sack: Robert Wagner Edition

    Hello folks, Ned here again. This week, we discuss:

    Things have been a bit quiet this month blog-wise, but we have a dozen new posts in the pipeline and coming your way next week. Some of them from infamous foreigners! It’s all very exciting, keep your RSS reader tuned to this station.

    On to the sackage.

    Question

    As far as I know, each computer name must be unique within a single domain. How do domain controllers check this uniqueness? Most applications (ADUC, ADSIEDIT, etc.) display the entry’s common name, which matches the computer account name but may not be unique.

    Answer

    The samaccountname attribute – often referred to as the “pre-Windows 2000 name” – is what needs to be unique, as it’s the real “user ID.” That uniqueness isn’t enforced by DCs when you create principals. You can create multiple computers, users, or groups with the same samaccountname. Well-written apps like DSA.MSC or DSAC.EXE will block you, but not because they are abiding by a DC’s rules:

    image

    If you use a less polite or more powerful app, a DC will let you create a duplicate samaccountname. At the first logon using that principal though, the DC will notice the duplicate and rename its samaccountname to “$DUPLICATE-<something>”.

    If you want to see this for yourself:

    1. Configure AD Recycle Bin in your lab.

    2. Create a user and then delete it.

    3. Recreate the user manually in another OU (same name, samaccountname, UPN – just in a different location).

    4. Restore the deleted user to its previous location using the recycle bin.

    5. Note how the identical users exist and have an identical samaccountname.

    6. Log on as that user, and the restored user will have its samaccountname mangled with $DUPLICATE.

    The “name” of the object is unique within its parent container because it forms the object’s relative distinguished name, so you get that for free thanks to LDAP. Only samaccountname and UPN will allow duplicates. And obviously, while I can create two computers with the same name in different OUs of the same domain, DNS is not going to be pleased and name resolution isn’t going to work – so this is all rather moot.
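    To hunt for duplicates that already slipped in, you could group an export of accounts by samaccountname. This is a minimal sketch, not a supported tool. It assumes you’ve already dumped distinguishedName/sAMAccountName pairs from an LDAP query into a list, and it compares names case-insensitively on the assumption that AD treats them that way. The DNs below are made-up examples.

```python
from collections import defaultdict

# Hypothetical export of (distinguishedName, sAMAccountName) pairs.
accounts = [
    ("CN=jsmith,OU=Sales,DC=contoso,DC=com", "jsmith"),
    ("CN=jsmith,OU=Restored,DC=contoso,DC=com", "JSmith"),
    ("CN=web01,OU=Servers,DC=contoso,DC=com", "WEB01$"),
]


def find_duplicate_sams(entries):
    """Group DNs by sAMAccountName (case-insensitively) and return only
    the names shared by more than one object."""
    groups = defaultdict(list)
    for dn, sam in entries:
        groups[sam.lower()].append(dn)
    return {sam: dns for sam, dns in groups.items() if len(dns) > 1}


for sam, dns in find_duplicate_sams(accounts).items():
    print(sam, "->", dns)
```

    Note that both jsmith objects have the same CN; that is allowed because they sit in different OUs.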

    Question

    When you were testing DFSR performance, what size file did you use for this statement?

    • Turn off RDC on fast connections with mostly smaller files - later testing (not addressed in the chart below) showed 3-4 times faster replication when using LAN-speed networks (i.e. 1 Gbps or faster) on Win2008 R2. This is because it was faster to send files in their entirety than to send deltas when the files were smaller and more dynamic and the network was incredibly fast. On Win2008 non-R2, replication was roughly twice as fast. This should absolutely not be done on WAN networks under 100 Mbps though, as it will likely have a very negative effect.

    Answer

    97,892 files in 32,772,081,549 total bytes for an average file size of 334,777 bytes. That test used a variety of real-world files, so there was no specific size, nor were they automatically generated with random contents like some of the tests.
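    For the arithmetic-inclined, here’s how that average falls out of the numbers above (whole bytes, truncated), plus the same figure in KiB:

```python
# Figures quoted in the answer above.
total_bytes = 32_772_081_549
file_count = 97_892

average = total_bytes // file_count  # integer division: whole bytes, truncated
print(average)  # 334777
print(round(total_bytes / file_count / 1024, 1))  # 326.9 (KiB)
```

    So the “average” file in that test was roughly a third of a megabyte.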

    Question

    When using AD Users and Computers, what is the difference for unlocking between this:

    image

    And this:

    image

    Answer

    The first one is sort of a "placeholder" (it would have been better as a button that grayed out when not needed, in my opinion) to let you know where unlocking happens. An actual account lockout displays the extra text, and only then does clicking that checkbox do something.

    I prefer the way AD Administrative Center handles this:

    image

    image

    Even better, I can just find the locked accounts and unlock them right there.

    image

    Or even betterer…er:

    image

    image
    Woot, let’s unlock everyone and hit the bar!

    Reminder: account lockouts are yuck. They’re just a way to create denial-of-service attacks. Use intrusion detection with auditing to find villains trying to brute-force passwords. Even better, use two-factor authentication with smart cards, which chemically neuters external brute force. If your security department thinks account lockout is better than this, get a new security department; yours is broken.
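    As a rough idea of what “intrusion detection with auditing” can look like, here is a minimal sliding-window sketch that flags accounts with many failed logons in a short period. It assumes you’ve already pulled failed-logon audit events into (timestamp, account) tuples sorted by time. The 10-failures-in-5-minutes threshold is an arbitrary illustrative value, not a recommendation, and a real deployment would lean on your IDS or SIEM rather than a script.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_bruteforce(events, threshold=10, window=timedelta(minutes=5)):
    """Return the set of accounts with at least `threshold` failed logons
    within any `window`-long span. `events` must be (timestamp, account)
    tuples sorted by timestamp."""
    recent = defaultdict(deque)  # account -> timestamps of its recent failures
    flagged = set()
    for ts, account in events:
        failures = recent[account]
        failures.append(ts)
        # Forget failures older than `window` relative to this event.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(account)
    return flagged


# Made-up sample: 12 failures for one account in under two minutes,
# and a single fat-fingered password for another.
start = datetime(2011, 9, 2, 8, 0)
failures = [(start + timedelta(seconds=10 * i), "administrator") for i in range(12)]
failures.append((start + timedelta(minutes=1), "ned"))
failures.sort()

print(flag_bruteforce(failures))  # {'administrator'}
```

    Unlike a lockout, this only raises a flag; nobody gets locked out while you go find the villain.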

    Question

    Are the RDC recursion depth, comparator buffer size, horizon size, and hash window size parameters configurable for DFSR?

    Answer

    No, no, no, and no. :) All you can choose is the minimum file size for using RDC, or whether to use RDC at all.
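    Put another way, DFSR’s RDC decision boils down to two inputs. This toy model is only a sketch. The 64 KB default minimum and the “at least the minimum” (>=) comparison are my assumptions, not documented internals, and the function name is made up.

```python
def uses_rdc(file_size_bytes, rdc_enabled=True, min_rdc_kb=64):
    """Toy model of the two RDC settings DFSR exposes: RDC on/off and a
    minimum file size in KB. Default and boundary are assumptions."""
    if not rdc_enabled:
        return False  # RDC turned off: no delta calculation at all
    return file_size_bytes >= min_rdc_kb * 1024


print(uses_rdc(334_777))                     # True: above the minimum
print(uses_rdc(10_000))                      # False: too small for RDC
print(uses_rdc(334_777, rdc_enabled=False))  # False: RDC disabled
```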

    image

    image

    That’s a great doc on how to write your own RDC application, by the way. It’s shocking how few such apps there have been; we have an internal RDC copy utility that is the bomb. I wish we’d sell it to you; I like money.

    image
    Not as much as this guy, obviously

    Question

    How can I use USMT offline migration with vendor-provided full disk encryption, like McAfee Endpoint Encryption, Symantec PGP Whole Disk Encryption, Check Point Full Disk Encryption, etc.? I already know that with Microsoft BitLocker I just need to suspend it temporarily.

    Answer

    Any official documentation on making WinPE mount a vendor-encrypted volume would come from the vendor, as they have to provide a driver and installation steps for WinPE to mount the volume, or steps on how to “suspend” encryption outside of WinPE like BitLocker. For example, McAfee’s tool of choice seems to be EETech (here is its user guide). I’d highly recommend opening a case with the vendor before proceeding, even if they provide online documentation. It’s easy to lose your data forever when you start screwing around with encrypted drives.

    USMT does not have any concept of an encrypted volume (any more than Notepad.exe would); he’s a user-mode app.

    Question

    We use DFS Namespace interlinks, where a domain-based namespace links to standalone namespaces which then link to file shares. When we restart a standalone namespace root server though, clients start trying to get referrals as soon as it is available through SMB paths and not when its DFS service is ready to accept referrals. Is this expected?

    Answer

    This is expected behavior and demonstrates why deploying standalone DFS roots on non-clustered servers goes against our best practices. The client bases server availability on SMB, which is ready at that point on the standalone server; it doesn’t know that this is yet another DFSN referral that isn’t going to work yet. Interlinks are gross for this reason. If you must use them, cluster the standalone servers so that they can survive a node reboot for Patch Tuesday without hurting your users’ feelings.

    This is also why Win2008 (V2) namespaces were invented: so that customers could stop creating complex and fragile interlinked domain-standalone DFS namespaces in order to get around scalability limits. V2 scales nearly infinitely and if you deploy it, you can cut out the middle layer of servers and hopefully, save a bunch of dough.

    Question

    Have you ever seen the DFSR ConflictAndDeleted folder grow larger than the quota set, even when the XML file is not corrupt? E.g. 5GB when quota is set to the default of 660MB.

    Answer

    Yes, starting in Windows Server 2008. Previously, a damaged conflictanddeletedmanifest.xml required manual deletion. In Win2008 and later, the DFSR service detects errors parsing that XML file. It writes “Deleting ConflictManifest file” in the DFSR debug log and automatically deletes the manifest file, then creates a brand new empty one. Any files that were previously noted in the deleted manifest are no longer tracked, so they become orphaned in the folder. Not an ideal solution, but now you’re less likely to run out of disk space due to a corrupt manifest. That’s the downside to using a non-transactional file like XML: if there’s a disk hiccup, voltage gremlin, or trans-dimensional rift, you get incomplete tags.
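    Here’s a toy Python model of that recovery behavior, just to make the orphaning clear. It is not DFSR’s code: the root element name is hypothetical, and the only thing borrowed from the real service is the debug log text quoted above. A truncated file (our “disk hiccup”) fails to parse, gets deleted, and is replaced by an empty manifest, so anything it used to track is no longer counted.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_or_reset_manifest(path):
    """Parse the manifest; if it's damaged, delete it and start an empty one.
    Files the old manifest tracked stay on disk but are no longer tracked."""
    try:
        return ET.parse(path).getroot()
    except ET.ParseError:
        print("Deleting ConflictManifest file")  # mimics the debug log line
        os.remove(path)
        root = ET.Element("ConflictAndDeletedManifest")  # hypothetical root element
        ET.ElementTree(root).write(path)
        return root


with tempfile.TemporaryDirectory() as folder:
    manifest = os.path.join(folder, "ConflictAndDeletedManifest.xml")
    with open(manifest, "w") as f:
        f.write("<ConflictAndDeletedManifest><Resource>")  # truncated mid-write
    root = load_or_reset_manifest(manifest)

print(root.tag, len(root))  # ConflictAndDeletedManifest 0
```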

    I bet a bunch of DFSR admins are now checking their ConflictAndDeleted folders…

    image
    Aha, there’s that spreadsheet I was looking for… eww, it’s got eggshell goop on it.
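    If you’re one of those admins, a quick way to total a folder’s contents and compare it with the quota is sketched below. It’s a minimal example: the 660 MB default comes from the question above, treating MB as MiB (1024 × 1024 bytes) is an assumption, and in real life you’d point it at the replicated folder’s DfsrPrivate\ConflictAndDeleted folder rather than the temporary demo directory used here.

```python
import os
import tempfile


def folder_size_bytes(path):
    """Sum the sizes of all files under `path`, including subfolders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def over_quota(path, quota_mb=660):
    """True if the folder's contents exceed the quota (MB treated as MiB)."""
    return folder_size_bytes(path) > quota_mb * 1024 * 1024


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "spreadsheet.xlsx"), "wb") as f:
        f.write(b"\0" * 1000)  # a small stand-in for a conflicted file
    size = folder_size_bytes(d)
    too_big = over_quota(d)

print(size, too_big)  # 1000 False
```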

    Other Stuff

    Black Hat put up their 2011 USA presentations, so make sure you browse around. The ones I found most interesting (each includes a whitepaper, slide deck, or video):

    • How a Hacker Has Helped Influence the Government - and Vice Versa (the writer of L0phtcrack talks about being a PM at DARPA)
    • Corporate Espionage for Dummies: The Hidden Threat of Embedded Web Servers (endless web servers you didn’t even know you had running)
    • Killing the Myth of Cisco IOS Diversity: Towards Reliable, Large-Scale Exploitation of Cisco IOS (he who controls the spice, controls the universe!)
    • Easy and quick vulnerability hunting in Windows (he points out how to examine your vendor apps carefully, as your vendor often isn’t)
    • Faces Of Facebook-Or, How The Largest Real ID Database In The World Came To Be (or: the reason Ned does not use social media)
    • OAuth – Securing the Insecure (or: the other reason Ned does not use social media)
    • Battery Firmware Hacking (Good lord, start FIRES?!)
    • Hacking Medical Devices for Fun and Insulin: Breaking the Human SCADA System (Never mind fires, hacking people into diabetic comas!)

    There were several Apple and iOS pwnage talks too. I don’t care about those but you might, especially if you’re the new “owner” of those unmanaged boxes in your environment, thanks to the Sales Borgs wanting iPads for no legitimate reason… Another hidden cost of “IT consumerization”.

    A few people asked for a DOCX copy of the Accelerating Your IT Career post. Grab it here.

    Free Artificial Intelligence class online from a Research Professor of Computer Science at Stanford University. Looks neato for the price.

    Did you watch the Futurama season finale last night? The badly dubbed anime was a gigantic trough of awesome. I was right next to Katey Sagal at my hotel at Comic-Con. She is teeny.

    Space.com has a terrific infographic of the 45 years of Star Trek.


    At Microsoft, you name your own computers and dogfooding means you can join as many to the domain as you like. My user domain alone has 58,420 computers and it’s a “small” one in our forest, so trying to control machine names is counterproductive even for bureaucratabots. I have a test server called Stink, and yesterday I needed to remote its registry. When I typed in the name, I found I wasn't the first to think of smelly nomenclature:

    image
    The last one should be a band name

    For MarkMoro and all those like him, the 10 best ’80s cop show opening credits (warning: a couple of sweary words in the text, but all the videos are totally SFW; this is old American network TV, after all).

    Finally, the best email conversation thread of last month:

    Jonathan: Darn you, Ned. I lost track of an hour on this ConceptRobots site.
    Ned: I should get a commission from the sites I push in Friday mail sacks.
    Jonathan: Yes… you wield your influence so adroitly.
    Ned: Or was it… androidly?

    ahahHAHAHAHAHAHHAHAHAHHAHAHAHA


    Ha.

    Jonathan: I'm going to destroy your cubicle when you go to lunch.
    Ned: That's fair.

     

    Have a great weekend, folks.

    Ned "still has a thing for Stefanie Powers" Pyle