Disclaimer: All postings are provided "AS IS" with no warranties, and confer no rights. This weblog does not represent the thoughts, intentions, plans or strategies of Microsoft. Because a weblog is intended to provide a semi-permanent point-in-time snapshot, you should not consider out of date posts to reflect current thoughts and opinions.
Hi folks, Ned here again. It’s been nearly a month since the last Mail Sack post so I’ve built up a good head of steam. Today we discuss FRS, FSMO, Authentication, Authorization, USMT, DFSR, VPN, Interactive Logon, LDAP, DFSN, MS Certified Masters, Kerberos, and other stuff. Plus a small contest for geek bragging rights.
Clickity Clackity Clack.
I’ve read TechNet articles stating that the PDC Emulator is contacted when authentication fails - in case a newer password is available - and the PDCE would know this. What isn't stated explicitly is whether the client contacts or the current DC contacts the PDCE on behalf of the client. This is important to us as our clients won’t always have a routable connection to the PDCE but our DCs will; a DMZ/Perimeter network scenario basically.
Excellent question! We document the password and logon behaviors here rather loosely: http://msdn.microsoft.com/en-us/library/cc223752(PROT.13).aspx. Specifically for the “bad password, let’s try the PDCE” piece, it works like this:
1. I use some bad credentials on my Windows 7 client (using RunAs to start notepad.exe as my Tony Wang account)
2. Then we see this conversation:
a. Frame 34, the client contacts his 02 DC with a Kerberos Logon request as Twang in the Contoso domain.
b. Frame 40, DC 02 knows the password is bad, so he then forwards the same Kerberos Logon request to the PDCE 01.
c. Frame 41, the PDCE 01 responds back to the 02 DC with KDC Error 24 (“bad password”).
d. Frame 45, the DC 02 responds back to the client with “bad password”.
3. User now gets:
I described the so-called “urgent replication” here: http://blogs.technet.com/b/askds/archive/2010/08/18/fine-grained-password-policy-and-urgent-replication.aspx. That covers how account lockout and password change processing work (that’s DC to PDCE too, so no worries there for you).
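The forwarding behavior in that capture can be modeled with a few lines of Python. This is only a toy sketch: the `DomainController` class, its fields, and the sample passwords are made up for illustration and have nothing to do with how the KDC is really implemented. The point it shows is that the client only ever talks to its own DC, and the DC is the one that retries against the PDCE.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

KDC_ERR_PREAUTH_FAILED = 24  # the "bad password" KDC error seen in frames 41 and 45


@dataclass
class DomainController:
    """Toy DC holding its own (possibly stale) copy of each user's password."""
    name: str
    passwords: Dict[str, str]
    pdce: Optional["DomainController"] = None  # None means this DC is the PDCE
    log: List[str] = field(default_factory=list)

    def logon(self, user: str, password: str) -> int:
        """Return 0 on success, or KDC error 24 on a bad password."""
        self.log.append(f"{self.name}: logon request for {user}")
        if self.passwords.get(user) == password:
            return 0
        if self.pdce is not None:
            # The local copy says "bad password", so the DC (not the client)
            # retries against the PDCE in case a newer password exists there.
            self.log.append(f"{self.name}: forwarding to {self.pdce.name}")
            return self.pdce.logon(user, password)
        return KDC_ERR_PREAUTH_FAILED


# Hypothetical two-DC domain: DC02 has not yet replicated the new password.
pdce = DomainController("DC01", {"twang": "NewPass1"})
dc02 = DomainController("DC02", {"twang": "OldPass1"}, pdce=pdce)

bad_result = dc02.logon("twang", "wrong")      # both DCs reject it
new_result = dc02.logon("twang", "NewPass1")   # only the PDCE knows this one
```

In this model a client in the perimeter network needs a route to DC02 only; DC02 is the machine that needs to reach DC01.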
Can you help me understand cached domain logons in more detail? At the moment I have many Windows XP laptops for mobile users. These users log on to the laptops using cached domain logons. Afterwards they establish a VPN connection to the company network. We have some third party software and group policies that don’t work in this scenario, but work perfectly if the user logs on to our corporate network instead of the VPN, using the exact same laptop.
We don’t do a great job in documenting how the cached interactive logon credentials work. There is some info here that might be helpful, but it’s fairly limited:
How Interactive Logon Works http://technet.microsoft.com/en-us/library/cc780332(v=WS.10).aspx
But from hearing this scenario many times, I can tell you that you are seeing expected behavior. In your scenario the user logs on interactively with cached creds (stored here in an encrypted form: HKEY_LOCAL_MACHINE\Security\Cache) while no DC is reachable, and only afterwards creates a network connection and accesses resources. That means anything that only happens at the interactive logon phase is not going to work. For example, logon scripts delivered by AD or group policy. Or security policies that apply when the computer is started back up (and won’t apply for another 90-120 minutes while VPN connected – which may not actually happen if the user only starts VPN for short periods).
I made a hideous flowchart to explain this better. It works – very oversimplified – like this:
As you can see, with a VPN not yet running, it is impossible to access a number of resources at interactive logon. So if your application’s “resource authentication” only works at interactive logon, there is nothing you can do unless the app changes.
This is why we created VPN at Logon and DirectAccess – there would be no reason to make use of those technologies otherwise.
How to configure a VPN connection to your corporate network in Windows XP Professional http://support.microsoft.com/kb/305550
Where Is “Logon Using Dial-Up Connections” in Windows Vista? http://blogs.technet.com/b/grouppolicy/archive/2007/07/30/where-is-logon-using-dial-up-connections-in-windows-vista.aspx
DirectAccess http://technet.microsoft.com/en-us/network/dd420463.aspx
If you have a VPN solution that doesn’t allow XP to create the “dial-up network” at interactive logon, that’s something your remote-access vendor has to fix. Nothing we can do for you I’m afraid.
Can DFSR use security protocols other than Kerberos? I see that it has an SPN registered but I never see that SPN used in my network captures or ticket cache.
DFSR uses Kerberos auth exclusively. The DFSR client’s TGS request does not contain the DFSR SPN, only the HOST computer name. So the special looking DFSR SPN is - pointless. It’s one of those “almost implemented” features you occasionally see. :)
Let’s look at this in action.
Two DFSR (06 and 07) servers doing initial sync, talking to their DC (01). TGS requests/responses, using only the computer HOST name SPNs:
Then DFSR service opens RPC connections between each server and uses Kerberos to encrypt the RPC traffic with RPC_C_AUTHN_LEVEL_PKT_PRIVACY, using RPC_C_AUTHN_GSS_NEGOTIATE and requiring RPC_C_QOS_CAPABILITIES_MUTUAL_AUTH. Since NTLM doesn’t support mutual authentication, DFSR can only use Kerberos:
If you block Kerberos from working (TCP/UDP 88), DFSR falls over and the service won’t start:
Event 1202 "Failed to contact domain controller..." with an extended error of "160 - the parameter is incorrect"
I am using the USMT scanstate /P option to get a size estimate of a migration. But I don’t understand the output. For example:
4096 434405376
0 426539816
512 427467776
1024 428611584
2048 430821376
4096 434405376
8192 446136320
16384 467238912
32768 512098304
65536 587988992
131072 812908544
262144 1266679808
524288 2189426688
1048576 4041211904
USMT is telling you the size estimate based on your possible NTFS cluster sizes. So 4096 means a 4096-byte cluster size will take 434405376 bytes (or 414MB) in an uncompressed store. Starting in USMT 4.0 though, the /P option was extended and now allows you to specify an XML output file. It’s a little more readable and includes temporary space needs:
scanstate c:\store /o /c /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml

<?xml version="1.0" encoding="UTF-8"?>
<PreMigration>
  <storeSize>
    <size clusterSize="4096">72669229056</size>
  </storeSize>
  <temporarySpace>
    <size>151299104</size>
  </temporarySpace>
</PreMigration>

scanstate c:\store /o /c /nocompress /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml

<?xml version="1.0" encoding="UTF-8"?>
<PreMigration>
  <storeSize>
    <size clusterSize="4096">92731744256</size>
    <size clusterSize="0">92511635806</size>
    <size clusterSize="512">92538449408</size>
    <size clusterSize="1024">92565861376</size>
    <size clusterSize="2048">92620566528</size>
    <size clusterSize="4096">92731744256</size>
    <size clusterSize="8192">92958539776</size>
    <size clusterSize="16384">93413900288</size>
    <size clusterSize="32768">94341398528</size>
    <size clusterSize="65536">96226705408</size>
    <size clusterSize="131072">100214767616</size>
    <size clusterSize="262144">108447399936</size>
    <size clusterSize="524288">125118185472</size>
    <size clusterSize="1048576">159657230336</size>
  </storeSize>
  <temporarySpace>
    <size>158364704</size>
  </temporarySpace>
</PreMigration>
Sheesh, 72GB compressed. I need to do some housecleaning on this computer…
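If you would rather not eyeball byte counts, the XML estimate is easy to read with a script. Here is a minimal sketch in Python, using the compressed sample output from above. The function names are my own, and the `on_disk_size` helper rests on an assumption: that the per-cluster estimates grow because each file is rounded up to whole clusters, and that a cluster size of 0 means the raw byte total. USMT's actual calculation may differ.

```python
import xml.etree.ElementTree as ET

# The compressed-store sample shown above, as it would sit in usmtsize.xml.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<PreMigration>
  <storeSize>
    <size clusterSize="4096">72669229056</size>
  </storeSize>
  <temporarySpace>
    <size>151299104</size>
  </temporarySpace>
</PreMigration>"""

GIB = 1024 ** 3


def summarize(xml_bytes):
    """Return ({cluster_size: store_bytes}, temporary_bytes) from /p output."""
    root = ET.fromstring(xml_bytes)
    store = {
        int(node.get("clusterSize")): int(node.text)
        for node in root.findall("./storeSize/size")
    }
    temporary = int(root.findtext("./temporarySpace/size"))
    return store, temporary


def on_disk_size(file_sizes, cluster_size):
    """Assumed model: round every file up to a whole number of clusters."""
    if cluster_size == 0:
        return sum(file_sizes)  # assumed to mean "no rounding"
    return sum(-(-size // cluster_size) * cluster_size for size in file_sizes)


store, temporary = summarize(SAMPLE)
for cluster, size in sorted(store.items()):
    print(f"cluster {cluster}: {size / GIB:.1f} GiB store")
print(f"temporary space: {temporary / 1024 ** 2:.1f} MiB")
```

Under that rounding model, lots of small files on a volume with huge clusters cost far more than their raw size, which would explain why the 1048576-byte row is so much larger than the 512-byte row.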
I was poking around with DFSRDIAG.EXE DUMPMACHINECFG and I noticed these polling settings. What are they?
Good eye. DFSR uses LDAP to poll Active Directory in two ways in order to detect changes to the topology:
1. Every five minutes (hard-coded wait time) light polling checks to see if subscriber objects have changed under the computer’s Dfsr-LocalSettings container. If not, it waits another five minutes and tries again. If there is something new, it does a full LDAP lookup of all the settings in the Dfsr-GlobalSettings and its Dfsr-LocalSettings container, slurps down everything, and acts upon it.
2. Every sixty minutes (configurable wait time) it slurps down everything just like a light poll that detected changes, no matter if a change was detected or not. Just to be sure.
Want to skip these timers and go for an update right now? DFSRDIAG.EXE POLLAD.
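To make the two timers concrete, here is a rough simulation in Python. It is a sketch under simplifying assumptions, not DFSR's real scheduler: time is counted in whole minutes from service start, both timers are aligned to the same clock, and the full-poll interval is a multiple of five. The function name and parameters are invented for the example.

```python
LIGHT_POLL_MINUTES = 5   # hard-coded wait time, per the description above
FULL_POLL_MINUTES = 60   # configurable wait time; 60 is the value described above


def full_lookups(change_minutes, total_minutes, full_interval=FULL_POLL_MINUTES):
    """Return the minutes at which a full LDAP lookup would run.

    change_minutes: minutes at which a subscriber object changed in AD.
    """
    pending = sorted(change_minutes)
    lookups = []
    for now in range(LIGHT_POLL_MINUTES, total_minutes + 1, LIGHT_POLL_MINUTES):
        # Light poll: did anything change since the previous light poll?
        changed = False
        while pending and pending[0] <= now:
            pending.pop(0)
            changed = True
        # A full lookup runs if the light poll saw a change, or if the
        # full-poll timer has expired, change or no change.
        if changed or now % full_interval == 0:
            lookups.append(now)
    return lookups


# A topology change at minute 12 is picked up by the light poll at minute 15;
# the hourly polls at 60 and 120 happen regardless.
schedule = full_lookups([12], 120)
```

In this model the worst case for noticing a topology change is just under five minutes, plus however long AD replication takes to deliver the change to the DC being polled.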
While reviewing FRS KB266679 I noted:
"The current VV join is inherently inefficient. During normal replication, upstream partners build a single staging file, which can source all downstream partners. In a VV join, all computers that have outbound connections to a new or reinitialized downstream partner build staging files designated solely for that partner. If 10 computers do an initial join from \\Server1, the join builds 10 files in stage for each file being replicated."
Is this true – even if the file is identical FRS makes that many copies? What about DFSR?
It is true. On the FRS hub server you need staging as large as the largest file x15 (if you have 15 or more spokes) or you end up becoming rather ‘single threaded’; a big file goes in, gets replicated to one server, then tossed. Then the same file goes in, gets replicated to one server, gets tossed, etc.
Here I create this 1 GB file with my staging folder set to 1.5 GB (hub and 2 spokes):
Note how filename and modified are changing here in staging as it goes through one at a time, as that’s all that can fit. If I made the staging 3GB, I’d be able to get both downstream servers replicating at once, but there would definitely be two identical copies of the same file:
Luckily, you are not using FRS to replicate large files anymore, right? Just SYSVOL, and you’re planning to get rid of that also, right? Riiiiiiiiggghhhht?
DFSR doesn’t do this – one file gets used for all the connections in order to save IO and staging disk space. As long as you don’t hit quota cleanup, a staged file will stay there until doomsday and be used infinitely. So when it works on say, 32 files at once, they are all different files.
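The staging arithmetic is simple enough to sketch in Python. This is back-of-the-envelope math only: it ignores staging file compression and overhead, and the function names are mine, not anything FRS or DFSR exposes.

```python
GB = 1024 ** 3


def frs_parallel_partners(staging_bytes, file_bytes, partners):
    """FRS VV join: each downstream partner needs its own staged copy,
    so staging space limits how many partners can be served at once."""
    return min(partners, staging_bytes // file_bytes)


def frs_staging_needed(file_bytes, partners):
    """Staging required for FRS to serve every partner at the same time."""
    return file_bytes * partners


def dfsr_staging_needed(file_bytes, partners):
    """DFSR: one staged file is shared by all connections."""
    return file_bytes


# The test above: a 1 GB file, a hub and 2 spokes.
one_at_a_time = frs_parallel_partners(int(1.5 * GB), 1 * GB, 2)  # 1.5 GB staging
both_at_once = frs_parallel_partners(3 * GB, 1 * GB, 2)          # 3 GB staging
```

With 15 spokes, `frs_staging_needed(1 * GB, 15)` comes out at 15 GB for that one file, which is where the "largest file x15" rule of thumb comes from.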
Are there any DFSR registry tuning options in Windows Server 2003 R2? This article only mentions Win2008 R2.
No, there are none. All of the OS non-specific ones listed are still valuable though:
Is there a scriptable way to do what DFSUTIL.EXE CLIENT PROPERTY STATE ACTIVE or the Set Active option on Windows Explorer’s DFS tab does? Perhaps with PowerShell?
In theory, they could implement what the DfsShlEx.dll is doing in Windows Explorer:
NetDfsSetClientInfo
Not a cmdlet (not even .NET), but could eventually be exposed by .NET’s DllImport and thusly, PowerShell. Which sounds really, really gross to me.
Or just drive DFSUTIL.EXE in your code. I hesitate to ask why you’d want to script this. In fact, I don’t want to know. :)
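If you do go the "drive DFSUTIL.EXE" route, a wrapper can be as small as the Python sketch below. Treat the argument layout as an assumption: I am assuming the command takes the DFS path followed by the target to make active, so check it against the help output of DFSUTIL on your own OS version before trusting it. The function names and the sample paths are hypothetical.

```python
import subprocess


def build_set_active_command(dfs_path, target):
    """Build the DFSUTIL command line as an argument list.

    Assumed layout: dfsutil client property state active <DFS path> <target>
    """
    return ["dfsutil.exe", "client", "property", "state", "active",
            dfs_path, target]


def set_active_target(dfs_path, target):
    """Run DFSUTIL on the local Windows client and return the finished process."""
    return subprocess.run(
        build_set_active_command(dfs_path, target),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode and stdout yourself
    )


# Hypothetical namespace link and folder target, shown without running anything.
command = build_set_active_command(r"\\contoso.com\public\software",
                                   r"\\server1\software")
```

Passing the arguments as a list rather than one string avoids quoting problems with paths that contain spaces.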
Are there problems with a user logging on to their new destination computer before USMT loadstate is run to migrate their profile?
Yes, if they then start Office 2007/2010 apps like Word, Outlook, Excel, etc., portions of their Office migration will not work. Office relies heavily on reusing its own built-in ‘upgrade’ code:
http://support.microsoft.com/kb/2023591
Note To migrate application settings, you must install applications on the destination computer before you run the loadstate command. For Office installations, you must run the LoadState tool to apply settings before you start Office on the destination computer for the first time by using a migrated user. If you start Office for a user before you run the LoadState tool, many settings of Office will not migrate correctly.
Other applications may be similarly affected, Office is just the one we know about and harp on.
I am seeing very often that a process named DFSFRSHOST.EXE is taking 10-15% CPU resources and at the same time the LAN is pretty busy. Some servers have it and some don’t. When the server is rebooted it doesn’t appear for several days.
Someone is running DFSR health reports on some servers and not others – that process is what gathers DFSR health data on a server. It could be that someone has configured scheduled reports to run with DFSRADMIN HEALTH, or is just running it using DFSMGMT.MSC and isn’t telling you. If you have an enormous number of files being replicated the report can definitely run for a long time and consume some resources; best to schedule it off hours if you’re in “millions of files” territory, especially on older hardware and slower disks.
FRS replication is not working for SYSVOL in my domain after we started adding our new Win2008 R2 DCs. I see this endlessly in my NTFRS debug logs:
Cmd 0039ca50, CxtG c2d9eec5, WS ERROR_INVALID_DATA, To DC2.mydomain.contoso.com Len: (436) [SndFail - rpc call]
Is FRS compatible between Win2003 and Win2008 R2 DCs?
That type of error makes me think you have some intrusion protection software installed (perhaps on the new servers, in a different version than on the other servers) or something is otherwise altering data on the network (such as when going through a packet-inspecting firewall).
We only ever see that issue when it is caused by a third party. There are no problems with FRS servers talking to each other on 2003, 2008, or 2008 R2. The FRS RPC code has not changed in many years.
You should get double-sided network captures and see if something is altering the traffic between the two servers. Everything RPC should look identical in both captures, down to a payload level. You should also try *removing* any security software from the 2 DCs and retesting (not disabling; that does nothing for most security products – their drivers are still loaded when their services are stopped).
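Once you have the RPC payloads exported from both captures, the comparison itself is mechanical. Here is a minimal Python sketch; it assumes you have already pulled the payloads out of each capture into two ordered lists of bytes (how you export them depends on your capture tool and is not shown), and the function name is made up.

```python
from itertools import zip_longest


def first_mismatch(sent_payloads, received_payloads):
    """Return the index of the first payload that differs between the
    sending side's capture and the receiving side's, or None if they match.

    A payload missing from either side also counts as a mismatch.
    """
    pairs = zip_longest(sent_payloads, received_payloads)
    for index, (sent, received) in enumerate(pairs):
        if sent is None or received is None or sent != received:
            return index
    return None


# Made-up payloads: the second one was altered somewhere on the wire.
sender = [b"\x05\x00\x0b\x03", b"\x05\x00\x00\x03payload"]
receiver = [b"\x05\x00\x0b\x03", b"\x05\x00\x00\x03pAyload"]
altered_at = first_mismatch(sender, receiver)
```

If this returns anything other than None, something between the two servers is rewriting or dropping traffic, and that is the thing to chase.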
When I run USMT 4.0 scanstate using /nocompress I see a catalog.mig created. It seems to vary in size a lot between various computers. What is that?
It contains all the non-file goo collected during the gather; mainly the migrated registry data.
James P Carrion has been posting a very real look into the MS Certified Masters program as seen through the eyes of a student working towards his Directory Services cert. If you’ve thought about this certification I recommend you read on, it’s fascinating stuff. Start at the oldest post and work forward; you can actually see his descent into madness…
----------
Microsoft uses a web-based system for facilities requests. The folks that run that department are excellent and the web system usually works great. Every so often though, you get something interesting like this…
Uuuhhh, I guess I can wait to see how that pans out.
-----------
And finally here is this week’s Stump the Geek contest picture:
Name both movies in which this picture appears. The first correct reply in the Comments gets the title of “Silverback Alpha Geek”. And nothing else… it’s a cruel world.
Have a good weekend folks.
- Ned “hamadryas baboon” Pyle
Stump the Geek: My guess, "Alien" and "Bladerunner"
And we have a winner! Congratulations Darkseid64. :-)
How can I tell which DC is currently authenticating the user? I guess the SET command displays the cached logon server, and the NSLOOKUP result also comes from cache. My question is how to make sure this data is real and not cached.
Thanks
Awinish
Awesome list of information as always! On a more serious note, hope all you folks are doing fine down there in NC after the tornadoes!!
Steve
Thanks Steve, we're all good here. You know how the press likes to... exaggerate. The so-called Charlotte tornado was in Salisbury, 40 miles north of the city center.
Hi Awinish. Is your question just that you want to see which DC authenticated a user? If so, you're right that the %logonserver% is not a reliable way to tell.
Getting a network capture is one reliable way. You will see which KDC was returned by DsGetDCName, then where the user sent their AS_REQ and got their TGT back successfully. Another (simpler) way is just to audit logons and Kerberos authentication on the DCs; you will see in plain English where a user logged on.
Press exaggerate... noooo! :) Every time it snows around here, you'd think they never saw the stuff before. "Quick, get 15 reporters stationed around the city! I have to know what this white stuff is!!"
Glad to hear all is well there. Scary situation for all those that were impacted by the madness out there. I'm not even sure now of how many tornadoes they said touched down, but I know it was an unfortunately high number... not to even mention the even more unfortunate death toll.
Thanks for clearing the doubt Ned.