Storage Replica (SR) is a new feature that enables storage-agnostic, block-level, synchronous replication between servers for disaster recovery, as well as stretching of a failover cluster for high availability. Synchronous replication mirrors data between physical sites with crash-consistent volumes, ensuring zero data loss at the file system level. Asynchronous replication allows site extension beyond metropolitan ranges, with the possibility of data loss.
Ned Pyle, the Product Manager for Storage Replica, has written a great “getting started” guide here:
http://social.technet.microsoft.com/Forums/windowsserver/en-US/f843291f-6dd8-4a78-be17-ef92262c158d/getting-started-with-windows-volume-replication?forum=WinServerPreview
I got mine going after adding the Windows Storage Replication feature in Server Manager:
It’s configured in Failover Clustering:
· The existing 2003 DFS server is FILE1. All the commands below are run on this server in C:\temp.
· The first new 2012 R2 DFS server is FILE2
· The domain is child.corp.contoso.com
· The DFS Namespace in the domain is called “Testing”, so the path is \\child.corp.contoso.com\Testing
· There are 3 target servers to which all DFS links point: TARGET1, TARGET2 and TARGET3
Hi,
Pat Fetty recently blogged about the new SCM baselines for Office 2013 going live.
I opened up my local copy of SCM and imported the content:
The .cab file contains the security settings. The “att” file contains the attachments which are Word documents describing the security baseline settings.
You may get prompted at this point to accept the security details of the package. Inspect the certificates to make sure they are issued by Microsoft and are trusted by your computer.
There are user and computer settings, separated by individual Office programs or core Office settings.
Done!
Browsing these new settings looks like this:
Once you export these settings into a GPO Backup and import them into an existing blank GPO in your domain, you’ll need the ADMX/ADML files which relate to the Office 2013 settings. You’ll probably want to save them into your PolicyDefinitions folder in SYSVOL:
\\your.domain.name\SYSVOL\your.domain.name\Policies\PolicyDefinitions
Get them here:
http://www.microsoft.com/en-us/download/details.aspx?id=35554
I’ve just finished working on a case with a customer that was so interesting that it deserved a blog post to round it off.
These were the symptoms:
Often, while logged in to the server, things would appear to freeze – no screen updates, little mouse responsiveness – and if you managed to start a program (Perfmon, Task Manager, Notepad etc.) you wouldn’t be able to type into it, and if you did, it would crash.
This Windows Server 2008 R2 server runs TSM backup software with thousands of servers on the network sending their backup jobs to it. At any one time there could be hundreds of backup jobs running. The load was lower during the day, but it was always working hard dealing with constant backups of database snapshots from servers. The backup clients are Windows, UNIX, Solaris, you name it…
When the server froze, you’d see 4 of the 24 logical CPUs lock at 100%, while the other 20 CPUs would saw-tooth between 100% and 20-30% usage. The freeze would last for minutes at a time.
There are 2 Intel 10Gb NICs in a team using Intel's teaming software. The team and the switches are set up with LACP to enable inbound load balancing and failover.
By running perfmon remotely before the freeze happens we could see that the 4 CPUs that are locked at 100% are locked by DPCs. We used the counter “Processor Information\% DPC Time”.
A DPC is best defined in Windows Internals 6th Ed. (Book 1, Chapter 3):
Because this is a backup server, we’re expecting that the bulk of our hardware DPCs will be generated by incoming network packets and raised by the NICs, though they could also have been coming from the tape library or the storage arrays.
To look into what exactly is generating DPCs and how long the DPCs last for, we need to run Windows Performance Toolkit, specifically WPR.exe (Windows Performance Recorder). We have to do this carefully. We don’t want to increase the load of the server by capturing the Network and CPU activity of a server which already has high activity on the CPU and Network, and has shown a past history of crashing. But we want to run the capture while the server is in a frozen state. A tricky thing. So we ran this batch file:
Start /HIGH /NODE 1 wpr.exe -start CPU -start Network -filemode -recordtempto S:\temp
ping -n 20 127.0.0.1 > nul
Start /HIGH /NODE 1 wpr.exe -stop S:\temp\profile_is_CPU_Network.etl
If the server you are profiling has a lot of RAM (24GB or more), you’ll want to protect your non-paged pool from growing and harming your server. To do that, you should review this blog and add this switch to the start command: -start "C:\Program Files (x86)\Windows Kits\8.0\Windows Performance Toolkit\SampleGeneralProfileForLargeServers.wprp"
We’re starting on NUMA node 1 as the NICs were bound to NUMA node 0 and the “Processor Information” perfmon trace we took earlier showed that the CPUs on NUMA node 0 were locked. We’re starting the recorder with a “high” prioritization so that we can be sure it gets the CPU time it needs to work. We’re not writing to RAM, we’re recording to disk in the hopes that if the trace crashes we’ll at least have a partial trace to use. We made sure that S: in this example was a SAN disk to ensure it had the required speed to keep up with the huge data we’re expecting. We’re pinging 20 times to make sure our trace is 20 seconds long. And finally we’re starting a trace of CPU and Network profiles.
Note that to gather stacks we first had to stop the Kernel (aka the Executive) from paging its own memory out of RAM to the pagefile, where we cannot analyze it. To do this, run wpr -disablepagingexecutive on and then reboot.
We retrieved 3 traces in all:
So this blog now becomes a short tutorial on how you can use WPA (Windows Performance Analyzer) to locate the source of DPC issues. WPA is a VERY powerful tool and diagnosing problems is part science, part art. Meaning that no two diagnoses are ever done in the same way. This is just how I used WPA in this case. For this analysis, you’ll need the debugging tools installed and symbols configured and loaded.
First I want to see which CPUs are pegged. For that we use “CPU Usage (Sampled)\Utilization By CPU”, then select a time range by right-clicking:
Choose a round number (10 seconds in my example) as it makes it easier to quickly calculate how many things happened per minute when comparing to the graphs for the later scenarios:
I chose 20 seconds to 30 seconds as it is a 10 second window where there was heavy load and not blips due to tracing starting or stopping. Then “Zoom” by right clicking again.
Now all your graphs will be focused on that time range.
Then shift-select the CPUs which are pegged. In this case it is CPUs 0, 2, 4 and 6. This is because the cores are Hyperthreaded and the NICs cannot interrupt a logical CPU which is the result of Hyperthreading (CPUs 1, 3, 5, 7 etc.). And they are low-numbered CPUs because they are located on NUMA node 0.
Once they are selected, right-click and choose “Filter to Selection”:
Next we want to add a column for DPCs so we can see how much of the CPUs’ time was spent locked processing DPCs. To add columns, just right-click on the column title bar in the centre of the right-hand pane (in the screen above it shows “Line # | CPU || Count | Weight (in view) | Timestamp”) and select the columns you want to display. Once the DPC/ISR column has been added, drag it to the left side of the yellow bar, next to the CPU column:
Expanding out the CPU items, we see that DPCs account for almost all of the CPU activity on these CPUs (each CPU shows 10 seconds of CPU time, and the DPC time beneath it is over 9 seconds).
The next WPA graph we need is the one which can show how long the DPCs last for. We drag in the first graph under “DPC/ISR” called “DPC duration by Module, Function”:
In the far-right column (“Duration”), we can see how long each module spends in DPCs. This says that 36.8 seconds were spent on DPCs for NDIS.SYS alone. How can it be 36.8 seconds if the sample window is 10 seconds? Well, it is CPU-seconds, and we have 24 CPUs, so we could potentially have 240 CPU-seconds in all.
The next biggest waiter for DPCs is storport.sys. But at 1 second, it’s not even close.
The column with the blue text is called “Duration (Fragmented) (ms) Avg” and is the average time a DPC lasts for during this sample window. The NDIS.SYS DPCs last around 0.22 milliseconds, or 220 microseconds. The count of DPCs for NDIS and storport are comparatively similar (163,000 and 123,000 respectively), but because NDIS took so long on each DPC on average, it ended up locking the CPU for longer than storport did.
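As a quick sanity check, the averages can be reproduced from the totals and counts quoted above:

```python
# Sanity-check the arithmetic behind the trace figures quoted above.
window_s = 10        # length of the zoomed sample window, in seconds
logical_cpus = 24    # so the window holds up to 240 CPU-seconds in total

ndis_dpc_s, ndis_dpc_count = 36.8, 163_000
storport_dpc_s, storport_dpc_count = 1.0, 123_000

total_cpu_seconds = window_s * logical_cpus                   # 240
ndis_avg_us = ndis_dpc_s / ndis_dpc_count * 1e6               # ~226 us
storport_avg_us = storport_dpc_s / storport_dpc_count * 1e6   # ~8 us

print(total_cpu_seconds, round(ndis_avg_us), round(storport_avg_us))
```

So NDIS DPCs average roughly 226 microseconds against storport’s 8, which is why NDIS dominates the CPU time despite the two drivers having similar DPC counts.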
So let’s add the CPU column and move it to the left side of the yellow line, making it the first column to pivot on:
We can see that our targeted CPUs (0, 2, 4 and 6) have very high durations of DPC waits (using the last “Duration” column again), with no other CPU spending much time in a DPC wait state. So we select these CPUs and filter.
Expanding out the CPUs, we see that there are many different sources of DPCs, but that NDIS is really the biggest source of DPC waits. So we will now move the “Module” column to be the left-most column and remove the CPU column from view. We then right click on NDIS.SYS and “Filter to Selection” again as we only want to focus on DPCs from NDIS on CPUs 0, 2, 4, 6:
One function, ndisInterruptDPC is causing our DPC waits. This is the one we’ll focus on. If we expand this, it will list every single DPC and how long that wait is. Select every single one of these rows by scrolling to the very bottom of the table (in this example there are 163,230 individual DPCs):
Right click on the column called “Duration” and choose “Copy Other” and then “Copy Column Selection”. This will copy only the values in the “Duration” column. We can paste this into Excel and create a graph which shows the duration of the DPCs as a function of the number of DPCs present:
I have added a red line at 0.1 milliseconds because, according to the hardware development kit for driver manufacturers, a DPC should not last longer than 100 microseconds. That means the DPCs above the red line are misbehaving, and that they make up the bulk of our time spent waiting on DPCs.
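The Excel step can also be sketched in a few lines of Python. The duration values below are made-up illustrations, not the real trace data:

```python
# Given per-DPC durations copied from WPA's "Duration" column, count how many
# exceed the 100-microsecond guideline and how much of the total DPC time
# they represent. The sample list is illustrative only.
THRESHOLD_MS = 0.1  # 100 us, the documented guideline for maximum DPC duration

durations_ms = [0.05, 0.08, 0.22, 0.31, 0.09, 0.45, 0.02, 0.27]  # sample data

slow = [d for d in durations_ms if d > THRESHOLD_MS]
share_of_wait = sum(slow) / sum(durations_ms)

print(f"{len(slow)} of {len(durations_ms)} DPCs exceed {THRESHOLD_MS} ms")
print(f"They account for {share_of_wait:.0%} of total DPC time")
```

With real trace data (163,230 rows in this case) the same two lines tell you immediately whether the slow DPCs also dominate the total wait time, as they did here.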
So, we have established that we have slow DPCs on NDIS, and lots of them, and that they are locking our 4 CPUs. Our NICs aren’t able to spread their DPCs to any other CPUs and Hyperthreading isn’t really helping our specific issue. But what is causing the networking stack to generate so many slow DPC locks?
The final graph in WPA will show us this. From the category “CPU Usage (Sampled)”, drag in a graph called “DPC/ISR Usage by Module, Stack”. Filter to DPC (which will exclude ISRs) and our top candidates are:
To see what these are doing, we simply expand the stack columns by clicking the triangle on the row with the highest count, looking for informative driver names and a large drop in the count – an indication that that particular function is consuming the CPU time.
NTOSKRNL is running high because we are capturing. The kernel is spending time gathering ETL data. This can be ignored.
NETIO is redirecting network packets to/from tcpip.sys for a function called InetInspectRecieve:
TCP/IP is dealing with the NETIO commands above to do this “Receive Inspection”:
NDIS.SYS is dealing with 2 main functions in tcpip.sys: TcpTcbFastDatagram and InetInspectRecieve again:
Other than ntoskrnl, these 3 Windows networking drivers all have entries for the drivers listed as 5, 6 and 7 above in their stacks.
Lots of DPCs are caused by 3 probable sources:
Our actions were to make changes over 2 separate outage windows:
Here is what the picture looked like after we dissolved the NIC team, updated the NIC driver and enabled Intel I/OAT in the BIOS.
In this 10 second sample we can see that the 4 CPU cores are still effectively locked, as the CPU time due to NDIS DPCs is 37.7 seconds (out of a possible maximum of 40 seconds). The number of DPCs has decreased by more than half to 55,000, meaning that the average DPC duration has become very long at 682 microseconds – triple the average time from before we removed the NIC team and enabled I/OAT.
The blue area of the graph above is the picture we had from before changes were made. The pink/orange area is the picture of DPC durations after removing NIC teaming and enabling I/OAT.
So why did the average duration of DPCs get longer?
It could be that the IDS software now does not need to relinquish its DPCs to make room on the same CPU cores as the DPCs for the NIC teaming driver. These 2 drivers must be locked to the same CPUs. With no need to relinquish a DPC due to another DPC of equal priority, the IDS DPCs are free to use the CPU for longer periods of time before being forced off.
At any rate, it certainly isn’t fixed yet.
And finally here’s what the picture looked like after we uninstalled the IDS portion of the Symantec package. Remember, this service was not configured to be enabled in any way.
You can see that the average time has dropped from 220 microseconds to 90 microseconds – below the 100 microsecond threshold required by the Driver Development Kit.
In this 10 second sample there were 127,000 DPCs from NDIS on the 4 heavily used CPUs, but the CPU time they consumed was 11 seconds, a reduction from 36.8 seconds.
The blue area of the graph above is the picture we had from before changes were made. The pink/orange area is the picture of DPC durations after removing NIC teaming and enabling I/OAT. And the green area is the picture after IDS is removed.
This is a dramatic improvement. Nearly all DPCs are below the 100 microsecond limit. The system is able to process the incoming load without locking up for high priority, long lasting DPCs.
We’re not quite done though. 4 of our CPUs are still working very hard, often pegged at 100%. But why only 4? This is a 2-socket system with 6 cores on each socket. That gives us 12 CPUs where we can run DPCs. DPCs from one NIC are bound to one NUMA node, and we already dissolved our NIC team, so with only 1 NIC in action we are limited to 6 cores. RSS can spread DPCs over CPUs in powers of 2 (1, 2, 4, 8, 16 or 32 cores), meaning we can use at most 4 CPUs per NIC.
To scale out, we would need to add more NICs and limit RSS on each of those NICs to 2 cores. We’d need to bind 3 NICs to NUMA node 0 and 3 to NUMA node 1, and set the starting CPUs for those NICs to cores 0, 2, 4, 6, 8 and 10. That way we can saturate every possible core.
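As a sketch of that layout (NIC names and the exact mapping are illustrative assumptions, not taken from the real server):

```python
# A sketch of the scale-out plan above: 6 NICs, each limited to 2 RSS
# processors, split across the 2 NUMA nodes so every physical core gets DPC
# work. Base CPUs are even because Hyperthreading is on, and the physical
# cores sit on the even-numbered logical CPUs.
def rss_plan(nic_count: int = 6) -> dict:
    plan = {}
    for i in range(nic_count):
        plan[f"NIC{i}"] = {
            "*NumaNodeId": 0 if i < nic_count // 2 else 1,  # 3 NICs per node
            "*RssBaseProcNumber": i * 2,                    # cores 0,2,4,6,8,10
            "*MaxRssProcessors": 2,                         # limit RSS spread
        }
    return plan

for nic, cfg in rss_plan().items():
    print(nic, cfg)
```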
But to do this, we’d need to ensure that we can have multiple NICs without using the teaming software, which means we’d need to assign each NIC a unique IP address. To do that, we need to make sure that the TSM clients can deal with targeting a server name that has multiple IP addresses in DNS, and that if connectivity to the first IP address is lost, TSM can fail over to one of the other IP addresses. We’ll test TSM and get back with our results later.
But we need one more fundamental check before doing that: we need to make sure that an incoming packet, hitting a specific NUMA node and core, is going to end up at the right thread of the TSM server where that packet will be processed and backed up. If we can’t align a backup client to the incoming NIC, and align that NIC to the backup software thread that should process it, then we’ll be causing inter-CPU interrupts, or worse yet, cross-NUMA interrupts. This would make the entire system much less scalable.
So this is how this would all look. The registry value to bind a NIC to a NUMA node is “*NumaNodeId” (including the * at the start). To set the base CPU, use “*RssBaseProcNumber”. To set the maximum number of processors to use, set “*MaxRssProcessors”.
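As a sketch, these are per-adapter registry values under the network adapter class key; the adapter subkey (0007 below) is illustrative and differs on every machine:

```
; Illustrative sketch only - the adapter subkey (0007) varies per machine.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\0007]
"*NumaNodeId"="0"
"*RssBaseProcNumber"="0"
"*MaxRssProcessors"="2"
```

A restart of the adapter (or a reboot) is typically needed for these to take effect.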
These keys are explained here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff570864(v=vs.85).aspx
and here: Performance Tuning Guidelines for Windows Server 2008 R2
And more general information on how RSS works in Windows Server 2008 is here: Scalable Networking- Eliminating the Receive Processing Bottleneck—Introducing RSS
Our problem in the above picture, however, is that our process doesn’t know to run its threads on the NUMA node and cores where the incoming packets are arriving. Had this been SQL server, we could have run separate instances configured to start using specific CPUs. Hopefully, one day, TSM will operate like this and become NUMA-node aware.
I know this has been a long post, but for those who have read down to here, I do hope this has helped you with your troubleshooting using WPT.
I have been helping a customer with a tricky issue recently regarding slow network performance for SMB file copies over their network.
It came about after they took the settings defined in Security Compliance Manager for their member servers and deployed them as a Group Policy to their server OU. After doing this, they saw an 80% reduction in the performance in SMB file copies. But when we used Ntttcp.exe to test the network throughput via a test data stream, the throughput was not affected. Only SMB was affected.
They had Windows Server 2008 R2 SP1 VMs on ESX with 1 virtual 10Gb NIC patched to a team of 2 physical 10Gb NICs. When 2 servers tried to copy a set of large test files without the SCM security settings applied, they could reach around 400Mbps. When we applied the settings, that dropped to around 80Mbps.
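Those numbers line up with the 80% reduction mentioned above:

```python
# A quick check of the throughput drop described above.
before_mbps, after_mbps = 400, 80
reduction = 1 - after_mbps / before_mbps
print(f"{reduction:.0%} reduction")  # prints "80% reduction"
```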
In the SCM security definitions, there are 234 settings defined. We had to find out which one of these settings caused their issue.
We could see that the CPUs of the VM were going nuts with a wild saw-tooth pattern of all CPUs. We tried adding more CPUs and the saw-tooth pattern simply spread without making any major change in achievable throughput.
The process consuming the CPU time in Task Manager was ‘System’.
So, to break into ‘System’ a little more, we ran Windows Performance Recorder (WPR) to get a trace of CPU activity, like this:
And in the trace, we expanded out “CPU Usage (Sampled)”, and added the graph for “DPC and ISR by Module, Stack”:
This showed us that all our CPU time was spent processing DPCs generated by a driver called cng.sys.
This is “Kernel Cryptography, Next Generation”, which relates to the server or client’s ability to perform cryptographic calculations in the kernel when doing things like sending or receiving encrypted or signed information. Signing in this case could be creating a signature hash for chunks of transmitted data to prove that it hasn’t been modified while on the wire.
This, combined with the fact that only SMB was affected, led us to think that SMB signing was our issue.
SMBv2 uses these 2 GPO settings to define SMB signing:
The settings relate to SMBv2. Note that they change the default, in-box setting from “Disabled” to the Microsoft recommended SCM setting of “Enabled”.
For SMBv1 on Windows 2003 and older, the GPO settings are:
Once we removed the “always” settings, the transfer speed returned back to the higher 400Mbps transfer speed we expected.
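For reference, the two “always” policies map to the RequireSecuritySignature registry value on each side of the connection. This is a sketch of the disabled, in-box default state; verify the paths against your own GPO results:

```
; Server side ("Microsoft network server: Digitally sign communications (always)")
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters]
"RequireSecuritySignature"=dword:00000000

; Client side ("Microsoft network client: Digitally sign communications (always)")
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters]
"RequireSecuritySignature"=dword:00000000
```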
We discussed the usefulness of this setting. In their network, it would be best to keep the “server” side setting enabled on DCs only, to ensure that the GPO files which clients download from the DCs during a Group Policy refresh have not been altered. These are security-sensitive files, but they are usually very small, so we don’t mind slightly slower transfer speeds for them.
Here’s some additional resources we used when investigating SMB signing:
http://blogs.technet.com/b/josebda/archive/2010/12/01/the-basics-of-smb-signing-covering-both-smb1-and-smb2.aspx
http://msdn.microsoft.com/en-us/library/a64e55aa-1152-48e4-8206-edd96444e7f7#id218
http://blogs.msdn.com/b/openspecification/archive/2009/07/06/negtokeninit2.aspx?Redirected=true
http://blogs.msdn.com/b/openspecification/archive/2009/04/10/smb-maximum-transmit-buffer-size-and-performance-tuning.aspx
http://blogs.technet.com/b/filecab/archive/2012/05/03/smb-3-security-enhancements-in-windows-server-2012.aspx
http://support.microsoft.com/kb/320829
http://blogs.technet.com/b/neilcar/archive/2004/10/26/247903.aspx
http://gallery.technet.microsoft.com/NTttcp-Version-528-Now-f8b12769
I recently had the pleasure to help one of our Premier customers with a query they have regarding saving images in Active Directory.
By default, users have permission to save a JPEG or BMP file to their own AD user account. This file can be up to 100KB in size. In a large AD with hundreds of thousands of users, this could quickly increase the size of the AD database. The increase in size can lengthen backup times, increase the backup size and slow down restores.
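A rough, illustrative estimate of the worst case (the user count here is an assumption for the example):

```python
# Back-of-the-envelope worst case: every user stores a full-size photo.
users = 300_000   # illustrative user count
photo_kb = 100    # the per-user size limit mentioned above

growth_gb = users * photo_kb / 1024 / 1024
print(f"~{growth_gb:.1f} GB of extra database data")  # ~28.6 GB
```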
This permission is granted via the constructed security principal, “SELF”.
“SELF” is given permission to a set of attributes, not to the individual attributes themselves. By combining attributes into groups of common attributes, you reduce the size of the ACL entry. These groups are called property sets. The attributes which relate to images are:
The attribute Picture is in a property set called Personal-Information. You can see the permission applied to all users like this:
They wanted to take away the permission for SELF to be able to write to the Picture attribute, but this shouldn’t be a high-level deny for Everyone to write to this attribute. It could be that some users somewhere, at some time, need to write to this attribute.
What I suggested they do was de-couple the Picture attribute from the Personal-Information property set, then apply an explicit permission to the root of the domain for a group which has write access to this attribute instead.
But how do you link (and therefore unlink) an attribute to a Property-Set?
The property sets are not found in the schema, but instead are found in the Configuration partition, under Extended-Rights.
Each of the property sets has an attribute identifying it, called rightsGuid. Attributes are pulled into the property set by setting the attribute’s own attributeSecurityGUID to the same GUID. If these 2 GUIDs match, the attribute is a member of the property set. By removing the attributeSecurityGUID value on the Picture attribute, it is no longer a member of the Personal-Information property set, and SELF loses permission to write to this attribute.
While this sounds very complicated, here’s a simple picture to explain it all:
The object on the left, “CN=Personal-Information”, is the property set. The object on the right, “CN=Picture”, is the attribute in the schema. Its lDAPDisplayName is thumbnailPhoto. The attributes of these objects, rightsGuid and attributeSecurityGUID, have the same value – a matching GUID.
To remove the attributeSecurityGUID, open the attribute and click the button on the bottom left called “Clear”, as shown below:
Notice also that the text in the attribute editor isn’t the same as the text you see in the window behind. The characters appear as pairs and the pairs in the blocks have been switched around.
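That switched-pair form is just the binary (mixed-endian) layout of a GUID, which Python’s uuid module can reproduce. The GUID below is, to my knowledge, the well-known rightsGuid of Personal-Information, but verify it against your own Configuration partition:

```python
import uuid

# rightsGuid of the Personal-Information property set (a well-known value,
# but verify it against your own forest before relying on it).
rights_guid = uuid.UUID("77b5b886-944a-11d1-aebd-0000f80367c1")

# attributeSecurityGUID stores the GUID in its binary, mixed-endian form:
# the first three fields are byte-swapped, the last two are not.
print(rights_guid.bytes_le.hex())
# -> 86b8b5774a94d111aebd0000f80367c1
```

Comparing the printed string with the string form of the GUID shows exactly the pair-swapping the attribute editor displays.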
In order to restore the GUID if you change your mind, you need to copy the same form of the GUID from another attribute. I chose Post-Code as this is also in the Personal-Information property set.
I hope this helps someone else to delegate their Active Directory if needed.
Craig
Hi again,
Another interesting case with a nice, easy solution.
While working with a Premier customer recently we found that the 2 local groups relating to DHCP, “DHCP Administrators” and “DHCP Users” didn’t get created on their new DHCP servers.
Only the role installation steps can create these groups for us, as that will make sure they are actually given the required rights to manipulate or view the service.
We couldn’t just remove and reinstall the role – there was too much configuration already done.
We couldn’t ignore it, as we were installing IPAM and it needs to place the IPAM server’s computer account into the “DHCP Users” group on the servers. It does this by nesting itself into a new universal group in the domain, IPAMUG. This group is the one which actually becomes a member of the “DHCP Users” group.
The role was installed by a “next-next” manual installation using Server Manager. So it wasn’t as if some PowerShell or DISM.exe switch was accidentally left off. And if we repeated the manual installation, we would likely just end up where we started.
At the end of the Server Manager wizard, you get this completion message (without the big red arrow, that’s my addition).
Inside there is a link to launch a wizard which will configure the DHCP server, called “Complete DHCP configuration”. This wizard does 2 things:
The authorization part is pretty nifty. Usually you do this by right-clicking on the server in the DHCP MMC console and selecting “Authorize”. This will create an object for the DHCP server in the Configuration partition of the Active Directory forest, under Services / NetServices. Only members of Administrators in the forest root domain or members of Enterprise Admins can create objects here. The new wizard lets you type alternate credentials to do this job:
My customer had authorized their DHCP servers the old way – in the DHCP MMC console, using an account with permission to do so.
They hadn’t noticed that small blue link in the image above. There is also an outstanding notification within any Server Manager console which connects to one of these DHCP servers (or on the local host itself). But that was also quite subtle, and requires that you click on it to see the same blue link:
In fact, we hadn’t even noticed any of this by the time I’d found an alternative way of creating these groups on their DHCP servers using netsh.exe:
netsh.exe dhcp add securitygroups
Had we run the wizard through to its completion, we would have got a success message like this, stating that the local groups were successfully created:
I hope this helps someone avoid some troubleshooting time when deploying DHCP on Windows Server 2012.
Just a quick note to publicise that MBAM 2.0 is now out, and AGPM 4.0, DaRT 8.0, App-V 5.0 and UE-V 1.0 have each received their own Service Pack 1 updates. They are bundled in the new MDOP 2013.
Read more about it here at the new home for the MDOP team: http://blogs.windows.com/windows/b/business/archive/2013/04/10/making-windows-8-even-more-manageable-with-mdop-2013.aspx
I recently became the proud owner of the fantastic Sonos PLAYBAR. And while the Sonos team is considering creating a Windows 8 App to control their devices, I found a neat little hack to get the DLNA portion of the Sonos to become a “Play To” device from within Windows 8 music apps.
See the blog post here:
http://digitalmediaphile.com/index.php/2013/03/30/using-uncertified-play-to-devices-on-surface-rt-w8-apps/
Here are the registry keys I created for the PLAYBAR:
One of my recent posts was recently polished up enough to appear on the MSPFE blog:
http://blogs.technet.com/b/mspfe/archive/2012/12/06/lots-of-ram-but-no-available-memory.aspx
That blog roll is a new initiative within the Premier Field Engineer community to “put our best foot forward”.
Posts cover all the Microsoft technologies supported by PFEs like me, who work every day with our customers to help them resolve their technical issues. I hope it’s useful to you.
I think the post title is pretty self explanatory.
Just to clarify it a little, the customer who hit this problem found that
OK, so what could be causing the problem?
Well, let’s first define what the 3 built-in security contexts for running services are and how they differ from each other:
Account          Local permissions    Network identity
LocalService     Limited              No
NetworkService   Limited              Yes
LocalSystem      Full                 Yes
So any service running as LocalService cannot use the computer’s identity on the network and cannot authenticate to domain-joined resources on the network (nor can networked computers authenticate to this service). A service running as NetworkService can do this authentication with remote resources. Both of these accounts have very limited permissions to access files and registry keys on the local system.
LocalSystem has no restrictions on the local computer and also has the ability to authenticate on the network and have networked computers authenticate with it. This is a bad context to use for the Hub Transport service as it has too much access on the local server.
So my first thought was: because LocalSystem works when sending messages and NetworkService does not, it shouldn’t be a problem with network authentication, since both of these contexts support authenticating on the network. So it would be a local permission problem. Process Monitor from Sysinternals is a great tool for highlighting missing permissions to local resources.
We shut down all but 1 Hub Transport server, and on the remaining Hub Transport server we started the Hub Transport service as NetworkService. We then started Process Monitor with a filter to show only events where RESULT = ACCESS DENIED, like this:
But when sending an email from one mailbox to another, it didn’t record any actions of interest.
So, back to the drawing board. What about the scenario we excluded at the start? Network authentication. Well, what is happening with authentication is that the Mailbox server is trying to authenticate to the Hub Transport server to let it know that there are new messages that it needs to process.
To get started, we need to know how it’s authenticating and whether there are any problems during authentication. We looked at the Security event logs on both the Mailbox server and the Hub Transport server, focusing on the time the test message was sent. What we saw were “Audit Failure” events with Event ID 4265. The interesting parts of the event were that Kerberos was attempted, the SID authenticating was NULL, and the error was “invalid key”.
We needed to know which Kerberos tickets were in use for the LocalSystem logon session on the Mailbox server (we know that the Information Store service starts as LocalSystem from here). We ran LogonSessions.exe from Sysinternals and got an output like this:
C:\>logonsessions.exe
Logonsessions v1.21
Copyright (C) 2004-2010 Bryce Cogswell and Mark Russinovich
Sysinternals - www.sysinternals.com

[0] Logon session 00000000:000003e7:
    User name:    CONTOSO\SERVER-1$
    Auth package: Negotiate
    Logon type:   (none)
    Session:      0
    Sid:          S-1-5-18
    Logon time:   10/10/2012 12:04:25
    Logon server:
    DNS Domain:   contoso.com
    UPN:          SERVER-1$@contoso.com

[1] Logon session 00000000:0000ae9f:
    User name:
    Auth package: NTLM
    Logon type:   (none)
    Session:      0
    Sid:          (none)
    Logon time:   10/10/2012 12:04:25
    Logon server:
    DNS Domain:
    UPN:

[2] Logon session 00000000:000003e4:
    User name:    CONTOSO\SERVER-1$
    Auth package: Negotiate
    Logon type:   Service
    Session:      0
    Sid:          S-1-5-20
    Logon time:   10/10/2012 12:04:26
    Logon server:
    DNS Domain:   contoso.com
    UPN:          CONTOS-1$@contoso.com

[3] Logon session 00000000:000003e5:
    User name:    NT AUTHORITY\LOCAL SERVICE
    Auth package: Negotiate
    Logon type:   Service
    Session:      0
    Sid:          S-1-5-19
    Logon time:   10/10/2012 12:04:26
    Logon server:
    DNS Domain:
    UPN:
The first entry [0] is LocalSystem using Kerberos. Next, [1] is an NTLM authentication. Then [2] is NetworkService, and lastly [3] is LocalService, which has no ability to authenticate. So the logon session ID we want to target is 0x3e7 (entry [0] above).
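As a quick aside (not from the original post), picking the right session ID out of logonsessions.exe output is easy to script. Here is a minimal, illustrative parser, assuming the standard multi-line record layout shown above:

```python
import re

def find_session_id(output, sid="S-1-5-18"):
    """Return the logon session ID whose record contains the given SID,
    scanning logonsessions.exe-style multi-line output. S-1-5-18 is the
    well-known SID for LocalSystem."""
    session = None
    for line in output.splitlines():
        # A record header looks like: [0] Logon session 00000000:000003e7:
        m = re.search(r"Logon session \d{8}:([0-9a-f]+):", line)
        if m:
            session = m.group(1)
        # The Sid: field of the current record
        if session and re.search(r"Sid:\s*" + re.escape(sid) + r"(\s|$)", line):
            return session
    return None
```

Feeding it the output above returns 000003e7, i.e. the hexadecimal 0x3e7 we want to target.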
We then ran klist tickets -li 0x3e7 on the Mailbox server to view the Kerberos service tickets held by the LocalSystem logon identity. This service will need a Kerberos ticket which is valid on the Hub Transport server. There was indeed such a service ticket: the encryption type (AES256) made sense as all Exchange servers were running on Windows Server 2008 SP2, the valid date range was correct, and the clocks were in sync. So everything looked OK, and the Mailbox server should have been able to authenticate with the NetworkService logon session on the Hub Transport server. But it couldn’t. Why? Because the key which NetworkService on the Hub Transport servers should have used to decrypt the incoming authentication message (the Kerberos service ticket) from the Mailbox server was broken for NetworkService, as explained here:
http://support.microsoft.com/kb/2566059
The domain functional level was Windows Server 2003 and there was one Windows Server 2003 DC remaining in the domain, meaning that the pre-authentication key is encrypted using RC4, as the newer AES128 and AES256 types are not understood by 2003 DCs. The first Windows Server 2008 member servers added to the domain were these Exchange 2007 servers. The 2003 DCs started logging errors each time one of these 2008 clients requested a TGT or a service ticket, because the client would request AES256, which the 2003 OS didn’t understand. The exchange would then negotiate down to RC4 and just work, but in the meantime the 2003 DCs logged an error in the System event log about not understanding AES256.
As a workaround to stop the errors from filling the event logs on the 2003 DCs (and the monitoring application window), they had implemented a registry key on the Exchange servers to force them to always request RC4-encrypted tickets. They found this hint on a third-party user forum.
We removed this key on the Mailbox servers:
HKLM\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters\DefaultEncryptionType
We then removed all the Kerberos tickets which were cached on the Mailbox server using this command:
klist purge -li 0x3e7
And we then verified that this hotfix was installed on the remaining 2003 DCs, so that they wouldn’t log the errors which had flooded the event viewer and prompted the customer to implement the key we just removed:
http://support.microsoft.com/kb/948963
We couldn’t install the hotfix mentioned in KB2566059 on the Mailbox servers: they were running Windows Server 2008 SP2, and the hotfix was only built and released for Windows Server 2008 R2 and was never back-ported to Windows Server 2008 SP2.
As a final note, why did internet messages work? Those messages come from unauthenticated senders on the internet, whereas messages travelling from one mailbox to another pass between authenticated users. So authentication had to work for mailbox-to-mailbox messaging, while unauthenticated messages from the internet just worked.
I hope this helps someone else in their troubleshooting in the future.
I struck a problem at a customer where the impact, while it seemed minor on the surface, was actually a big deal for their migration project. In fact, the large team they had assembled to migrate users from one forest to a new forest had stopped while this issue was investigated.
It relates to SID History and the way Windows queries for and caches Name-to-SID and SID-to-Name lookups from AD. This cache was causing SharePoint to think that a user who wanted to log on was actually a user from the wrong domain, and to create a new identity for that person within SharePoint.
The scenario is actually very close to this one:
http://blogs.technet.com/b/rgullick/archive/2010/05/15/sharepoint-people-picker.aspx
But the workaround that we found would resolve the problem while they were migrating was pretty cool, so I thought I’d save it for all eternity here as a blog.
It boils down to this:
The LsaCache stores previously looked-up domain user names and their SIDs. When you ask a DC about a user who carries both the new SID and the migrated SID (in sidHistory) at the same time, the DC always links the migrated SID to the new user name, not the old one. If we can artificially fill the LsaCache with mappings of OLD USERNAME = OLD SID on our servers, then we can act as though no resources have migrated yet.
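To make the mechanism concrete, here is a toy Python model of that cache behaviour (purely illustrative; the real cache lives inside lsass.exe, and the SIDs and names here are invented):

```python
class LsaCacheModel:
    """Toy model of the LsaLookupCache: SID -> username, falling back
    to a 'DC' lookup only on a cache miss."""

    def __init__(self, dc_lookup):
        self.cache = {}              # SID -> username
        self.dc_lookup = dc_lookup   # fallback: ask a DC

    def warm(self, sid, username):
        """Pre-populate the cache, e.g. OLD SID -> OLD USERNAME."""
        self.cache[sid] = username

    def sid_to_name(self, sid):
        if sid in self.cache:        # cache hit: the DC is never asked
            return self.cache[sid]
        name = self.dc_lookup(sid)   # miss: the DC resolves the migrated
        self.cache[sid] = name       # SID (in sidHistory) to the NEW name
        return name

def dc(sid):
    return "DOMAINB\\newuser"        # what a DC in the new forest answers

cache = LsaCacheModel(dc)
cache.warm("S-1-5-21-111-222-1001", "CHILD1\\olduser")   # hypothetical SID
print(cache.sid_to_name("S-1-5-21-111-222-1001"))        # CHILD1\olduser
```

With the warmed entry in place, the lookup never falls through to the DC, which is exactly the behaviour the workaround relies on.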
Here’s the scenario where users were migrated with SID History from child1.domainA.com to domainB.com
So we can see from the picture above that the LsaCache (the table in the bottom right of the drawing) has a mapping for NEW USERNAME = OLD SID but we want OLD USERNAME = OLD SID
So, let’s warm up the LsaCache so it looks the way we’d like it to:
Ah ha! Now our cache looks the way we’d like it, where OLD USERNAME = OLD SID. This way when a query for OLD SID is made, the result from cache will return OLD USERNAME.
The important step here is the red X where there IS NO STEP. What I mean is that the SharePoint server never talked to the DC to look up the OLD SID, meaning that we relied totally on the warmed-up cache on the SPS alone.
This relies on the LsaCache on the SPS server ALWAYS having the entry for the SID from the CHILD1 domain matching the CHILD1 username, and never matching the DOMAINB username. The only way to ensure this is:
To view the actions as they are performed by LSA Lookups, add these 2 DWORDs to the registry under HKLM\System\CurrentControlSet\Control\Lsa\:
These keys are explained here:
http://technet.microsoft.com/en-us/library/ff428139(v=ws.10).aspx
So, all in all a little complicated, but the workaround of increasing the value of LsaLookupCacheMaxSize and constantly running a script on the SPS server to query the SIDs of usernames in CHILD1 (with a filter to target only users which had been migrated to domainB) worked well for the customer.
##############################
### UPDATE (22 March 2013) ###
The ADMX and ADML files for Windows 8 and Windows Server 2012 are now available as a separate download. This includes 185 ADMX files, and is the complete set of all ADMX files for these OSes. Please use this download instead of the instructions in this post to create your super-set of updated ADMX/ADML files.
http://www.microsoft.com/en-us/download/details.aspx?id=36991
A while back I posted something similar regarding upgrading the PolicyDefinitions folder in SYSVOL from Windows Vista and Windows Server 2008 set of ADMX/ADML files to their newer versions in Windows 7 and Windows Server 2008 R2. That post is here.
Well, it’s now time to move that on as Windows 8 and Windows Server 2012 are now out.
First off, all ADMX/ADML files have had their dates updated. While I didn’t look to see if all the contents of the files have changed, it’s probably best to assume every file has changed and update all of them.
One of them (“InputPersonalization.admx”) has been removed since Windows 7. It controlled one setting, which has been moved into the larger ControlPanel.admx, so this admx/adml pair can be deleted once the newer ControlPanel.admx file is copied to the PolicyDefinitions folder.
Windows 8 and Windows Server 2012 offer a range of new features (he says, putting it mildly), and there are new admx/adml files for these. So make sure you include these in your update:
AppxPackageManager.admx AppXRuntime.admx DeviceCompat.admx DeviceSetup.admx EAIME.admx EdgeUI.admx EncryptFilesonMove.admx FileServerVSSAgent.admx FileServerVSSProvider.admx hotspotauth.admx LocationProviderAdm.admx msched.admx NCSI.admx NetworkIsolation.admx Printing2.admx Servicing.admx SettingSync.admx srm-fci.admx StartMenu.admx WCM.admx WinStoreUI.admx wlansvc.admx WPN.admx wwansvc.admx
As with the previous operating systems, there are some admx/adml files which exist only on the server SKU, and some which exist only on the client SKU.
Server-only:
adfs.admx FileServerVSSAgent.admx GroupPolicy-Server.admx MMCSnapIns2.admx NAPXPQec.admx PswdSync.admx Snis.admx TerminalServer-Server.admx WindowsServer.admx
Client-only:
DeviceRedirection.admx sdiagschd.admx
And the easy way to get all the possible ADMX/ADML files for a particular OS without having to install all the roles/features is to simply copy them out of the winsxs directory (replace en-US in the commands below if your OS is installed in a language other than English). Here is a sample set of commands which can do this for you. You’d need to run this on both a Windows 8 computer and a Windows Server 2012 computer to capture all possible admx/adml files.
cd /d %windir%\winsxs
dir *.admx /s /b > %USERPROFILE%\Desktop\admx.txt
dir *.adml /s /b | find /i "en-us" > %USERPROFILE%\Desktop\adml_en-us.txt
mkdir %USERPROFILE%\Desktop\PolicyDefinitions
mkdir %USERPROFILE%\Desktop\PolicyDefinitions\en-US
FOR /F %i IN (%USERPROFILE%\Desktop\admx.txt) DO copy %i %USERPROFILE%\Desktop\PolicyDefinitions\
FOR /F %i IN (%USERPROFILE%\Desktop\adml_en-us.txt) DO copy %i %USERPROFILE%\Desktop\PolicyDefinitions\en-US\
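If you prefer scripting this, the same collect-and-copy logic can be sketched in Python. This is a hypothetical equivalent (not from the original post); you would point source at %windir%\winsxs on each machine:

```python
from pathlib import Path
import shutil

def collect_policy_definitions(source, dest, culture="en-US"):
    """Copy every *.admx under 'source' into 'dest', and every *.adml
    belonging to 'culture' into dest\\culture, mirroring the batch
    commands above."""
    dest = Path(dest)
    (dest / culture).mkdir(parents=True, exist_ok=True)
    for admx in Path(source).rglob("*.admx"):
        shutil.copy2(admx, dest / admx.name)
    for adml in Path(source).rglob("*.adml"):
        # mirror the 'find /i "en-us"' filter on the path
        if adml.parent.name.lower() == culture.lower():
            shutil.copy2(adml, dest / culture / adml.name)
```

As with the batch version, run it once per OS and merge the results into your central PolicyDefinitions folder.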
I hope that helps you with your admx/adml upgrade.
This blog is about the ability in Windows 7 and Windows Server 2008 R2 to apply a SID to every scheduled task and use that SID to apply permissions elsewhere in the Operating System.
Services have had this feature since Vista. The idea is the same: take the simple name of the service (or, in the case of scheduled tasks on 7/R2, the path to the scheduled task) and compute a predictable SID based on that name. Have a look at the permissions applied to C:\Windows\System32\LogFiles\Firewall to see this in action. On the permissions of this folder, there is an ACE for a “group” called MpsSvc, which is the short name of the Windows Firewall service. In this way, even though the service is set to start as “Local Service”, the other services which also run as this same account cannot see into the firewall logs; only the firewall service itself has access.
So every scheduled task can have a SID computed for it – this new feature is described here:
http://msdn.microsoft.com/en-us/library/ee695875(v=vs.85).aspx
And the way to find the predictable SID for a given task name is to run:
schtasks /showsid /TN "TaskName"
With this SID, you can now assign permissions to resources. For example, you could use icacls to apply permissions to a folder; below we are granting an NT TASK SID (these start with S-1-5-87) Modify permission on the folder C:\SomeFolder:
icacls C:\SomeFolder /grant *S-1-5-87-xxxx-yyyy-zzzz:(M)
Now, if you go and set up your task in the GUI and run these commands, you will see icacls report back:
No mapping between account names and security IDs was done.
What went wrong?
First you need to make sure that the scheduled task is configured for Windows 7 or Windows Server 2008 R2 and is using either “Network Service” or “Local Service”:
Then you need to make the task use the Unified Scheduling Engine so that it registers the SID with the list of “well known SIDs” for the system. But there is no check-box for this setting, and it is disabled by default. What to do?
Export your task as an XML file, locate the line which reads:
<UseUnifiedSchedulingEngine>False</UseUnifiedSchedulingEngine>
And change that “False” to “True”:
<UseUnifiedSchedulingEngine>True</UseUnifiedSchedulingEngine>
With that changed, remove your task and import the XML file you modified above.
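If you script this step, a tiny helper can flip the flag in the exported XML before re-import (an illustrative sketch, not from the original post; a plain string replace leaves the file's encoding and namespaces untouched, and both casings of the value are handled since exports can vary):

```python
def enable_unified_scheduling(xml_text):
    """Set UseUnifiedSchedulingEngine to true in exported task XML."""
    for val in ("False", "false"):
        xml_text = xml_text.replace(
            "<UseUnifiedSchedulingEngine>%s</UseUnifiedSchedulingEngine>" % val,
            "<UseUnifiedSchedulingEngine>true</UseUnifiedSchedulingEngine>")
    return xml_text
```

Read the exported file, run it through this, write it back, then re-import the task.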
Because the SID is a predictable calculation over the path to the task, so long as you recreate the task with the same name and in the same folder, the SID will remain the same, your icacls command will now work as expected, and only that scheduled task will have access to the file or folder you specify.
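For the curious, the derivation itself can be sketched in a few lines. This is an assumption on my part (the post doesn't document the algorithm): I'm assuming task SIDs use the same SHA-1-over-uppercase-name scheme that NT SERVICE virtual accounts use, just under the S-1-5-87 authority and keyed on the task path. Always confirm real values with schtasks /showsid.

```python
import hashlib
import struct

def derive_task_sid(task_path):
    """Sketch of a derived task SID: SHA-1 of the upper-cased UTF-16LE
    task path, read as five little-endian DWORD sub-authorities under
    S-1-5-87. Assumption only; verify with 'schtasks /showsid'."""
    digest = hashlib.sha1(task_path.upper().encode("utf-16-le")).digest()
    subauths = struct.unpack("<5I", digest)  # 20 bytes -> 5 DWORDs
    return "S-1-5-87-" + "-".join(str(s) for s in subauths)
```

Whatever the exact derivation, the useful property is the one the paragraph above relies on: the SID is a pure function of the task path, so same path means same SID.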
The Unified Scheduling Engine leverages the Unified Background Process Manager (UBPM), which is described further here:
http://blogs.technet.com/b/askperf/archive/2009/10/04/windows-7-windows-server-2008-r2-unified-background-process-manager-ubpm.aspx
Hi
This error code is a very generic output when a KMS client is having problems activating. To view your output, run slmgr.vbs /ato
When troubleshooting this problem, we checked the following details – if any were a problem, they would generate this error code:
And the last one was the problem we hit: our KMS server was Windows Server 2008 R2, the same as the KMS clients. We’d crossed the activation threshold of 5 servers, but they still would not activate. The problem was that there are different license “channels” for Windows Server. They are described here: http://technet.microsoft.com/en-us/library/ff793411.aspx
Our servers which were having problems activating were all Windows Server 2008 R2 Datacenter Edition, and we had a “B Channel” KMS license installed on the KMS server.
We followed these steps on the KMS server to install the correct channel license:
After doing the above, we ran slmgr -ato on the Windows Server 2008 R2 Datacenter Edition servers. Note that “Channel C” is able to activate all lower-level channels.
Another quick post with a not-very-obvious solution, this time on a new Windows Server 2008 R2 cluster.
The case went like this:
The problem turned out to be that the built-in group “Authenticated Users” had been removed from the built-in group “Users” on the OS of each of the nodes. The customer didn’t want to add “Authenticated Users” back into this group, as that would have granted too many rights to too many accounts. The work-around we put in was to create a domain group and nest the newly created CNO into it. This group was then placed into the “Users” built-in group on all the cluster nodes. In this way, the CNO now has membership in the built-in group “Users” on each of the nodes.
We needed to reboot all of the nodes before this change would take effect.
I hope this helps someone out there.
Normally in AD, all attributes are readable by “Authenticated Users”. Some attributes should inherit permissions but should not be readable by just anyone. To protect attributes like this, they can be marked as “confidential”.
There are 3 attributes relating to BitLocker which are marked in the schema as “confidential”.
This is done by marking the searchFlags attribute as enabled for bit 7 (128 decimal) in the schema where the attribute is defined. See here for more information on searchFlags: http://support.microsoft.com/kb/922836
These attributes are:
· msTPM-OwnerInformation (applies to: the computer object): contains the owner information of a computer’s TPM.
· msFVE-KeyPackage (applies to: the msFVE-RecoveryInformation object): contains a volume’s BitLocker encryption key, secured by the corresponding recovery password.
· msFVE-RecoveryPassword (applies to: the msFVE-RecoveryInformation object): contains a password that can recover a BitLocker-encrypted volume.
An object of type “msFVE-RecoveryInformation” is created for every encrypted volume and is stored as a child object of the computer object where the volume was encrypted.
Simply granting “read” access to these attributes will not allow a user to read the information in these attributes. A user who wants to read the attribute must also have an Access Mask for “Control_Access”. This is a special type of ACE (Access Control Entry). See here for more information on Access Masks: http://msdn.microsoft.com/en-us/library/aa374896(v=vs.85).aspx
The only GUI tool which can set and view these special Control_Access ACEs is LDP.exe (using the version from Windows Server 2003 R2 ADAM or newer). This is shown below:
The "Control_Access" flag is needed in ADDITION to the normal "Read Propery" right. The "Control_Access" flag gets you past the confidentiality bit. You still need to be able to read the contents of the attribute.
Apply the permission once at the top of EACH DOMAIN where you need to delegate access to the recovery information of BitLocker volumes. Usually this does not include forest root domains or resource forests. Ensure the “inheritance” box is checked on each ACE so that it propagates to every msFVE-RecoveryInformation or Computer object and only to its relevant attributes.
(Note from Ryan's comment below: You can apply this permission anywhere in the OU structure if you'd like to split the delegation between groups - e.g. Help Desk users can access the keys for standard workstations and the Server Admins can access the keys for servers etc. You could apply the "Read Property" ACE at the top of the domain to a super-group for everyone who is allowed to access the keys, and then have different groups able to use the "Control_Access" flag for their particular OUs. This will help limit ACE bloat in the lsass.exe working set while still locking down the keys in the way you'd expect.)
Here are sample scripts to add the "Control_Access" flag to the top of the domain:
Taken from: http://technet.microsoft.com/en-us/library/cc771778(WS.10).aspx
'To refer to other groups, change the group name (ex: change to "DOMAIN\Help Desk Staff")
strGroupName = "BitLocker Recoverers"
' -----------------------------------------------------------
' Access Control Entry (ACE) constants
'- From the ADS_ACETYPE_ENUM enumeration
Const ADS_ACETYPE_ACCESS_ALLOWED_OBJECT = &H5 'Allows an object to do something
'- From the ADS_ACEFLAG_ENUM enumeration
Const ADS_ACEFLAG_INHERIT_ACE = &H2 'ACE applies to target and inherited child objects
Const ADS_ACEFLAG_INHERIT_ONLY_ACE = &H8 'ACE does NOT apply to target (parent) object
'- From the ADS_RIGHTS_ENUM enumeration
Const ADS_RIGHT_DS_CONTROL_ACCESS = &H100 'The right to view confidential attributes
Const ADS_RIGHT_DS_READ_PROP = &H10 ' The right to read attribute values
'- From the ADS_FLAGTYPE_ENUM enumeration
Const ADS_FLAG_OBJECT_TYPE_PRESENT = &H1 'Target object type is present in the ACE
Const ADS_FLAG_INHERITED_OBJECT_TYPE_PRESENT = &H2 'Target inherited object type is present in the ACE
' BitLocker schema object GUIDs
'- ms-FVE-RecoveryInformation object:
' includes the BitLocker recovery password and key package attributes
SCHEMA_GUID_MS_FVE_RECOVERYINFORMATION = "{EA715D30-8F53-40D0-BD1E-6109186D782C}"
'- ms-FVE-RecoveryPassword attribute: 48-digit numerical password
SCHEMA_GUID_MS_FVE_RECOVERYPASSWORD = "{43061AC1-C8AD-4CCC-B785-2BFAC20FC60A}"
'- ms-FVE-KeyPackage attribute: binary package for repairing damages
SCHEMA_GUID_MS_FVE_KEYPACKAGE = "{1FD55EA8-88A7-47DC-8129-0DAA97186A54}"
'- Computer object
SCHEMA_GUID_COMPUTER = "{BF967A86-0DE6-11D0-A285-00AA003049E2}"
'Reference: "Platform SDK: Active Directory Schema"
' Set up the ACE to allow reading of all BitLocker recovery information properties
Set objAce1 = createObject("AccessControlEntry")
objAce1.AceFlags = ADS_ACEFLAG_INHERIT_ACE + ADS_ACEFLAG_INHERIT_ONLY_ACE
objAce1.AceType = ADS_ACETYPE_ACCESS_ALLOWED_OBJECT
objAce1.Flags = ADS_FLAG_INHERITED_OBJECT_TYPE_PRESENT
objAce1.Trustee = strGroupName
objAce1.AccessMask = ADS_RIGHT_DS_CONTROL_ACCESS + ADS_RIGHT_DS_READ_PROP
objAce1.InheritedObjectType = SCHEMA_GUID_MS_FVE_RECOVERYINFORMATION
' Note: ObjectType is left blank above to allow reading of all properties
' Connect to Discretional ACL (DACL) for domain object
Set objRootLDAP = GetObject("LDAP://rootDSE")
strPathToDomain = "LDAP://" & objRootLDAP.Get("defaultNamingContext") ' e.g. string dc=fabrikam,dc=com
Set objDomain = GetObject(strPathToDomain)
WScript.Echo "Accessing object: " + objDomain.Get("distinguishedName")
Set objDescriptor = objDomain.Get("ntSecurityDescriptor")
Set objDacl = objDescriptor.DiscretionaryAcl
' Add the ACEs to the Discretionary ACL (DACL) and set the DACL
objDacl.AddAce objAce1
objDescriptor.DiscretionaryAcl = objDacl
objDomain.Put "ntSecurityDescriptor", Array(objDescriptor)
objDomain.SetInfo
WScript.Echo "SUCCESS!"
'To refer to other groups, change the group name (ex: change to "DOMAIN\TPM Owners")
strGroupName = "TPM Owners"
' ------------------------------------------------------------
' TPM and FVE schema object GUIDs
'- ms-TPM-OwnerInformation attribute: SHA-1 hash of the TPM owner password
SCHEMA_GUID_MS_TPM_OWNERINFORMATION = "{AA4E1A6D-550D-4E05-8C35-4AFCB917A9FE}"
' Set up the ACE to allow reading of TPM owner information
objAce1.Flags = ADS_FLAG_OBJECT_TYPE_PRESENT + ADS_FLAG_INHERITED_OBJECT_TYPE_PRESENT
objAce1.ObjectType = SCHEMA_GUID_MS_TPM_OWNERINFORMATION
objAce1.InheritedObjectType = SCHEMA_GUID_COMPUTER
And this script can help pull the assigned ACEs out to show you who has been delegated access: http://gallery.technet.microsoft.com/ScriptCenter/0bd4af9e-968a-4ae6-9950-2b2450afda37/
I faced this problem recently at a customer.
They had pure Windows XP with Office 2003 deployed to their clients, accessing a SharePoint 2003 site. When they started deploying new Windows 7 clients with Office 2007, they found that when users clicked links to Office files to which they had read-only permissions, they were prompted to enter credentials. But entering credentials didn’t work. If they hit cancel or escape, the prompt would disappear and the file would open as expected.
Being a good PFE, the first place I started was with Network Monitor traces. I was looking for any strange “access denied” messages, authentication attempts with mismatched methods, bad HTTP redirections, DNS problems, that sort of thing.
Here’s what I found:
So what is going on?
WebClient is trying to take a write lock on the file. But the file is read-only to the user, so this fails. We see 4 requests to GET the file, each of which gets a reply saying “unauthorized”:
Then I found this article:
http://support.microsoft.com/kb/955375
This says that by setting the registry value UseWinINETCache = 1 you instruct Office to always open web-based files as read-only. Files that you need to edit on a SharePoint site will be opened as read-only too, so editing will fail. To work around this limitation you must do one of the following when editing a file:
Note this limitation applies to ALL web-based files opened by Office, even those on SharePoint 2007 and 2010, which do not experience this problem. Therefore, this is only a work-around until you are able to upgrade your SharePoint 2003 sites to 2007 or 2010. Note that Internet Explorer 8 is NOT a supported browser for accessing SharePoint 2003, for this reason and others.
Hi, another juicy customer question with a cool solution. The problem is this: on all workstations, the built-in Administrator account is disabled. Restricted Groups are used to populate the “Built-in\Administrators” group with domain groups. No “back-door” local administrator accounts exist. So, when the desktop support team is troubleshooting connectivity problems with a machine, they may remove the computer from the domain. Once they do this, there is no way to log back on. Things were complicated even further because the workstations involved had their install partition encrypted with BitLocker, meaning any data on the workstation is also lost.
So, without exposing a back-door account, we needed a way to prevent the workstation administrator from removing a workstation from the domain, but not in a “permanent” way: slow them down in such a way that undoing the prevention reminds them to add a temporary back-door account before removing the machine from the domain. Here’s what we came up with:
We ran Sysinternals Process Monitor while removing the computer from the domain to see what changes are made to the system. If we can set a “deny” to the first action, then all the other actions will also be prevented from happening.
The first change made to the system during a removal from the domain is to set the “Netlogon” service to start up as “manual” instead of “automatic”. So by setting a deny in the registry for the user “SYSTEM”, we prevent the first action. But the domain-removal process is clever: if it detects that it cannot change the value, it will attempt to take ownership of the registry key. So we also need to prevent modifying the owner. All of that looks like this:
So when the workstation administrator attempts to remove the computer from the domain, they hit an immediate “access denied” and all further processing stops. Now the administrator must remove the explicit denies and that will hopefully be enough of a reminder to add a local administrator before rebooting the machine after removing it from the domain.
Please note that this is not a supported or recommended method to perform this job, but it did fit well for the customer who was in that tricky situation.
I had a question from a customer recently which needed some investigation, as the seemingly “easy” steps to export and import DFS-N configurations didn’t do what either of us expected.
KB969382 lists the actions to take in the event of your DFS Namespace going west. Option 2 was the one we were looking at as we wanted to create regular DFS-N backups to be used in any DFS-N related emergency.
It seemed simple enough, run this command to backup your configuration:
dfsutil root export \\domain.name\DFSN DFSN-root.txt
And when disaster strikes, just run this command to put it all back again:
dfsutil root import set DFSN-Root.txt \\domain.name\DFSN
However, no matter what the DFS-N emergency we created in the lab, the import would always fail citing “element not found”.
The problem was that we were breaking the DFS-N root (on purpose), but the export/import scenario requires you to have a working DFS-N root. And to get that, you’d need good system-state backups of both a DC and a DFS Namespace server. Which isn’t going to provide for a fast, efficient restore scenario in a large organisation.
So I started experimenting, and it seems that the objects in AD are easily copied and imported again using ldifde – there is no attachment to the object GUIDs (like there is say in a failover cluster). And once all the objects are back in AD, all the links and targets start working again as expected.
The same applies to the share and DFS-N root information in the registry – a simple ‘reg save’ followed by a ‘reg restore’ will get that information back with the registry ACLs intact.
So, I wrote 2 scripts (each fires off a second script to run directly on the DFS Namespace servers):
Now, while the restore could be more targeted to allow you to choose the scenario to recover from (e.g. restore ONLY the objects in AD, or only the DFS-N registry information on one DFS-N server, or only one DFS-N root), I’ll leave it to you, good reader, to add that intelligence. This restore script restores the entire DFS-N configuration for all roots and to all DFS-N servers.
This will back up and restore both Windows 2000/2003 roots and Windows Server 2008 roots. It uses psexec from Sysinternals, available here. The reason for psexec is to run reg save/reg restore locally, which captures the ACLs on the registry keys and restores exactly the configuration which was backed up, rather than merging it. While my testing shows that these reg keys do not have explicit permissions defined, you’re better safe than sorry.
Make sure to change any instances of “dc=domain,DC=name” and “\\domain.name” to the domain name in your environment.
Main Job
rem Setup input file
if not exist .\backup-files mkdir .\backup-files
if exist root-servers.txt del root-servers.txt
setlocal
if exist servers.txt del servers.txt
dsquery * "CN=DFS-Configuration,CN=System,DC=domain,dc=name" -filter "(|(objectClass=fTDfs)(objectClass=msDFS-Namespacev2))" -attr name > allRoots.txt
for /F "tokens=1-3 skip=1 delims= " %%i IN (allRoots.txt) DO (
    dfsutil root \\domain.name\%%i | find /i "target" | find /i "%%i" >> %%i-serversRAW.txt
    for /F "tokens=2 delims=\" %%u IN (%%i-serversRAW.txt) do echo %%i;%%u >> root-servers.txt
    del %%i-serversRAW.txt
)
for /F "tokens=1,2 delims=; " %%i IN (root-servers.txt) DO echo %%j>> serversRAW.txt sort serversRAW.txt /O serversSORTED.txt for /F "Tokens=*" %%s in ('type serversSORTED.txt') do set record=%%s&call :output
del serversRAW.txt
del serversSORTED.txt
endlocal
rem Backup
for /F %%i IN (servers.txt) DO (
    if not exist \\%%i\c$\temp mkdir \\%%i\c$\temp
    copy .\NSserverBackup.bat \\%%i\c$\temp /Y
    psexec \\%%i C:\temp\NSserverBackup.bat
    copy \\%%i\c$\TEMP\%%i-dfsroots.hiv .\backup-files\%%i-dfsroots.hiv /Y
    copy \\%%i\c$\TEMP\%%i-CCS-shares.hiv .\backup-files\%%i-CCS-shares.hiv /Y
    copy \\%%i\c$\TEMP\%%i-CS1-shares.hiv .\backup-files\%%i-CS1-shares.hiv /Y
)
ldifde -f .\backup-files\dfs-export.ldf -v -d "CN=Dfs-Configuration,CN=System,DC=domain,dc=name" -l objectClass,remoteServerName,pKTGuid,pKT,msDFS-SchemaMajorVersion,msDFS-SchemaMinorVersion,msDFS-GenerationGUIDv2,msDFS-NamespaceIdentityGUIDv2,msDFS-LastModifiedv2,msDFS-Propertiesv2,msDFS-TargetListv2,msDFS-Ttlv2,msDFS-LinkPathv2,msDFS-LinkSecurityDescriptorv2,msDFS-Ttlv2,msDFS-Commentv2,msDFS-ShortNameLinkPathv2,msDFS-LinkIdentityGUIDv2 > .\backup-files\ldf-export.log
goto :EOF
:output
if not defined previous_record goto write
if "%record%" EQU "%previous_record%" goto :EOF

:write
@echo %record%>>servers.txt
set previous_record=%record%
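In case the batch subroutine is hard to follow: the sort plus the :output/:write pair simply emits each server name once. The equivalent logic, sketched in Python purely for illustration, is:

```python
def unique_servers(names):
    """Sort, then drop adjacent duplicates: what 'sort' plus the
    :output/:write subroutine above does to build servers.txt."""
    out, previous = [], None
    for record in sorted(names):   # 'sort serversRAW.txt /O serversSORTED.txt'
        if record != previous:     # the previous_record comparison
            out.append(record)
        previous = record
    return out

print(unique_servers(["FILE2", "FILE1", "FILE2", "TARGET1"]))
# ['FILE1', 'FILE2', 'TARGET1']
```

Sorting first guarantees duplicates are adjacent, so a single one-line look-behind is enough to deduplicate.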
NSserverBackup.bat
C:
cd \
cd temp
reg save HKLM\Software\Microsoft\Windows\DFS\Roots C:\temp\%COMPUTERNAME%-dfsroots.hiv /y
reg save HKLM\System\CurrentControlSet\Services\lanmanserver\shares C:\temp\%COMPUTERNAME%-CCS-shares.hiv /y
reg save HKLM\System\ControlSet001\Services\lanmanserver\shares C:\temp\%COMPUTERNAME%-CS1-shares.hiv /y
The main backup job copies NSserverBackup.bat to each Namespace server and runs it from there.
rem Check input files
if not exist allRoots.txt goto :EOF
if not exist servers.txt goto :EOF
rem clean up before restore
dsquery * "CN=DFS-Configuration,CN=System,DC=DC=domain,dc=name" -filter "(|(objectClass=fTDfs)(objectClass=msDFS-NamespaceAnchor))" | dsrm -q -subtree -noprompt for /F %%i IN (servers.txt) DO ( reg delete \\%%i\HKLM\Software\Microsoft\Windows\DFS\Roots /f reg delete \\%%i\HKLM\System\CurrentControlSet\Services\lanmanserver\shares /f reg delete \\%%i\HKLM\System\ControlSet001\Services\lanmanserver\shares /f reg add \\%%i\HKLM\Software\Microsoft\Windows\DFS\Roots /f reg add \\%%i\HKLM\System\CurrentControlSet\Services\lanmanserver\shares /f reg add \\%%i\HKLM\System\ControlSet001\Services\lanmanserver\shares /f )
rem restore
ldifde -i -f .\backup-files\dfs-export.ldf -k -v > .\backup-files\dfs-import.log
for /F %%i IN (servers.txt) DO (
    copy .\backup-files\%%i-dfsroots.hiv \\%%i\c$\temp\%%i-dfsroots.hiv /Y
    copy .\backup-files\%%i-CCS-shares.hiv \\%%i\c$\temp\%%i-CCS-shares.hiv /Y
    copy .\backup-files\%%i-CS1-shares.hiv \\%%i\c$\temp\%%i-CS1-shares.hiv /Y
    copy .\NSserverRestore.bat \\%%i\c$\temp\NSserverRestore.bat /Y
    copy .\allRoots.txt \\%%i\c$\temp\allRoots.txt /Y
)
psexec @servers.txt C:\temp\NSserverRestore.bat
NSserverRestore.bat
reg restore HKLM\Software\Microsoft\Windows\DFS\Roots C:\temp\%COMPUTERNAME%-dfsroots.hiv
reg restore HKLM\System\CurrentControlSet\Services\lanmanserver\shares C:\temp\%COMPUTERNAME%-CCS-shares.hiv
reg restore HKLM\System\ControlSet001\Services\lanmanserver\shares C:\temp\%COMPUTERNAME%-CS1-shares.hiv
for /F "tokens=1-3 skip=1 delims= " %%i IN (allRoots.txt) DO dfsutil root forcesync \\domain.name\%%i
net stop dfs && net start dfs
The main restore job copies NSserverRestore.bat to each Namespace server and runs it from there.
I had a question that I thought I would share the answer for.
A customer was deploying multiple identical servers, each with multiple NICs, into a testing lab as virtual machines. They needed a way to beat plug-and-play detection of the NICs so that they could set the correct static IP on the NIC which is “patched” to a given virtual NIC port. The only static information they could use was to give all the identical, isolated VMs the same MAC addresses from within Hyper-V.
In each identical VM (VM Guest 1, 2, 3 in the picture below), there are 4 NICs. 1 NIC is enabled for DHCP with a Hyper-V dynamic MAC address. The other 3 NICs have 1 of 3 known MAC addresses. The 3 NICs with known static MAC addresses all need static IP addresses. All the servers which share static MAC addresses must also share static IP addresses. And the name of each NIC must be changed to make it clear in the VM's installation of RRAS (and to the administrators) which NIC is patched to which Hyper-V network. In this way the servers have identical, non-overlapping networks for administrators to test on – and one additional network where all the VMs can contact each other for sharing files.
The routine was this:
wmic nicconfig where MACAddress="00:12:34:56:78:9A" call EnableStatic ("1.2.3.4"), ("255.255.255.0")
wmic /output:NICNameUNICODE.txt nic where MACAddress="00:12:34:56:78:9A" get NetConnectionID /FORMAT:LIST
type NICNameUNICODE.txt > NICName.txt
for /F "skip=2 tokens=1,2 delims==" %%i IN (NICName.txt) do netsh interface set interface name="%%j" newname="Some Name"
We have to output WMIC to a text file instead of piping, because piped WMIC output ends each line with an extra <CR> before the <CRLF>, which breaks the coming FOR /F command.
But WMIC saves the resulting file in Unicode format, which FOR /F cannot read, so we run it through TYPE with redirection to get the output converted to ANSI.
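For illustration, here is a small Python sketch of parsing WMIC's /FORMAT:LIST output (Name=Value lines) once the encoding has been normalized; the sample string is hypothetical:

```python
# Sketch: parse WMIC /FORMAT:LIST output (Name=Value lines), tolerating
# any stray carriage returns that WMIC is known to emit.
def parse_wmic_list(text):
    values = {}
    for raw in text.splitlines():
        line = raw.rstrip("\r")        # drop any leftover carriage return
        if "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    return values

sample = "\r\nNetConnectionID=Local Area Connection 2\r\r\n"
print(parse_wmic_list(sample))   # {'NetConnectionID': 'Local Area Connection 2'}
```

The same split-on-first-`=` logic is what the FOR /F with "delims==" is doing in the batch file.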
The resulting NICName.txt looks like this:
There was also 1 additional NIC installed which did not have a static MAC address assigned by Hyper-V, and was enabled for DHCP. This NIC also needed renaming:
wmic /output:DHCPNameUNICODE.txt nic where "MACAddress!='00:12:34:56:78:9A' AND MACAddress!='00:12:34:56:78:9B' AND MACAddress!='00:12:34:56:78:9C' AND AdapterType='Ethernet 802.3'" get NetConnectionID /FORMAT:LIST
type DHCPNameUNICODE.txt > DHCPName.txt
for /F "skip=2 tokens=1,2 delims==" %%i IN (DHCPName.txt) do netsh interface set interface name="%%j" newname="DHCP LAN"
I hope this helps someone one day with their deployments.
If you enable /3GB in the boot.ini of a Windows Server 2003 x86 server, you risk running out of address space for the kernel.
You can tweak this by adding the switch /USERVA=wxyz where wxyz is the number of megabytes that should be allocated to the user mode processes. This will give more address space back to the kernel.
But how should you choose the correct value for /USERVA?
Here's the easiest way. This doesn't involve pool monitoring applications, debugging tools or enabling Free System Page Table Entries (FSPTEs) tracking registry keys (trackPTEs).
[Boot Loader]
Timeout=5
Default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[Operating Systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows Server 2003, Enterprise" /fastdetect /NoExecute=OptOut /3GB
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows Server 2003, Enterprise without 3GB" /fastdetect /NoExecute=OptOut
How to calculate the right value for /USERVA
Take your value for FSPTEs you got when you rebooted with only /3GB defined in the boot.ini. In our example we'll use 6,400 for the number of FSPTEs.
6,400 is a count of the number of free 4KB memory "pages" available to the kernel. So the kernel has 6,400 * 4KB = 25,600KB = 25MB of free address space to use.
(We need to reboot the system because we don't know how many PTEs are used when the system loads all the required kernel resources with /3GB specified, as this switch changes the sizes of non-paged pool (NPP) and paged pool (PP) memory.)
FSPTEs must be greater than 10,000 at all times. So let's just make that 15,000 to ensure that future changes to required kernel resources (e.g. new video or network drivers) will not fail.
15,000 * 4KB = 60,000KB ≈ 59MB
In this example we have 25MB of free kernel memory and need 59MB, so we must add 59-25 = 34MB
So we need to set USERVA to be 3072MB – 34MB = 3038
So we add the switch /USERVA=3038 to boot.ini
This will give every user-mode application 3038MB of address space, and give the kernel the 59MB of free address space it needs to be happy.
This means that USERVA and FSPTEs are directly related: each 1MB removed from USERVA gives the kernel 1,024KB / 4KB = 256 more FSPTEs.
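The whole calculation above can be sketched in a few lines of Python (the 15,000-FSPTE target and the 3,072MB /3GB base are the values used in this example):

```python
# Sketch of the /USERVA arithmetic: each FSPTE maps a 4KB page, so 1MB
# of address space moved from user mode to the kernel buys 256 FSPTEs.
PTE_PER_MB = 1024 // 4   # 256

def userva_for(current_fsptes, target_fsptes=15000):
    needed_ptes = max(0, target_fsptes - current_fsptes)
    extra_mb = -(-needed_ptes // PTE_PER_MB)   # round up to whole MB
    return 3072 - extra_mb                     # /3GB gives user mode 3072MB

print(userva_for(6400))   # 3038, matching the worked example
```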
But will there be any other negative impact to any other critical kernel resource (like Pool of Non-Paged memory or Pool of Paged Memory)?
In my tests, there will be no change in these resources, and this whitepaper confirms that adding or tuning /USERVA will only have an impact on the count of FSPTEs:
http://www.microsoft.com/downloads/details.aspx?FamilyID=ed0e8084-abf7-4c00-ba6a-7d658cdb052a&DisplayLang=en
With the "/USERVA" boot.ini switch, you can customize how the memory is allocated when you use the /3GB switch. The number following /Userva= is the amount of virtual memory address space in megabytes (MB) that will be allocated to each user process. If you set /3gb /Userva=3030 in Boot.ini, 3,030 MB of memory is reserved to the process space, as compared to 3,072 MB when you use the /3GB switch alone. The 42 MB that is saved when you set /Userva=3030 is used to increase the kernel memory space and free system page table entries (PTEs). The PTE memory pool is increased by the difference between 3 GB (specified by the /3GB switch) and the value that is assigned to the /Userva switch. There is no reduction in any other kernel resource as a result of this switch.
Here are the results I found when changing USERVA on a Windows Server 2003 SP2 x86 server with 3,582MB RAM (4GB installed, but the video card claimed 400MB at power-on):
/3GB State   /USERVA Value   Free System Page Table Entries   Free Kernel Memory
OFF          (not set)       175,000                          684MB
ON           (not set)       19,900 (1)                       78MB
ON           3030            30,700                           120MB
ON           2900            64,000                           250MB
ON           2800            89,800 (2)                       351MB
ON           2650            128,000                          500MB
ON           2500            166,500                          650MB
ON           2466            (missing)                        (missing)
(1) = This is the lowest value for FSPTEs on this system, which is higher than 15,000, so no action is needed for this server.
(2) = Exchange mailbox servers should never have a /USERVA lower than 2800 as this will cause problems for store.exe
Below: Vertical Axis = Free System Page Table Entries, Horizontal Axis = Value for USERVA
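The measured rows in the table fit the 256-FSPTEs-per-MB relationship nicely. A quick sketch to check the linear model (taking 19,900 as the measured baseline with /3GB alone, and allowing for rounding in the measurements):

```python
# Sketch: the table rows fit "FSPTEs ~= baseline + (3072 - USERVA) * 256",
# where 19,900 is the measured baseline with /3GB alone (no /USERVA).
BASELINE = 19900
measured = {3030: 30700, 2900: 64000, 2800: 89800, 2650: 128000, 2500: 166500}

for userva, fsptes in measured.items():
    predicted = BASELINE + (3072 - userva) * 256
    # measurements are rounded, so allow a small tolerance
    assert abs(predicted - fsptes) < 500, (userva, predicted, fsptes)
print("model matches the measured table")
```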
I had a question from a customer and thought I’d share the answer with everyone. They asked “I want to upgrade our Central Store of ADMX/ADML files for Group Policy from Windows Vista SP2/Windows Server 2008 SP2 to Windows 7/Windows Server 2008 R2. What do we need to worry about?”. So I redirected them to this blog:
http://blogs.technet.com/b/askds/archive/2009/12/09/windows-7-windows-server-2008-r2-and-the-group-policy-central-store.aspx
But we found that there were differences between the ADMX files available in C:\Windows\PolicyDefinitions on Windows 7 and Windows Server 2008 R2. One such difference is highlighted here:
http://blogs.technet.com/b/askds/archive/2008/07/18/enabling-group-policy-preferences-debug-logging-using-the-rsat.aspx
I wondered if there were more differences, so I went through all of the ADMX files of:
Here are the results:
Here is a list of all files in the PolicyDefinitions folder, collected from both Windows 7 and Windows Server 2008 R2 (with every role and feature installed), with their dates and sizes:
10-06-2009 23:04 4,717 ActiveXInstallService.admx
10-06-2009 22:53 4,714 AddRemovePrograms.admx
10-06-2009 22:49 1,249 adfs.admx
10-06-2009 22:30 5,393 AppCompat.admx
10-06-2009 22:36 5,965 AttachmentManager.admx
10-06-2009 22:53 3,391 AutoPlay.admx
10-06-2009 22:52 2,968 Biometrics.admx
10-06-2009 22:53 49,181 Bits.admx
10-06-2009 23:01 1,749 CEIPEnable.admx
10-06-2009 22:53 1,361 CipherSuiteOrder.admx
10-06-2009 22:43 1,329 COM.admx
10-06-2009 22:42 13,967 Conf.admx
10-06-2009 22:53 2,600 ControlPanel.admx
10-06-2009 22:53 10,099 ControlPanelDisplay.admx
10-06-2009 22:53 1,293 Cpls.admx
10-06-2009 22:53 1,933 CredentialProviders.admx
10-06-2009 23:00 10,779 CredSsp.admx
10-06-2009 22:53 1,746 CredUI.admx
10-06-2009 23:04 2,141 CtrlAltDel.admx
10-06-2009 22:43 2,437 DCOM.admx
10-06-2009 22:53 13,576 Desktop.admx
10-06-2009 23:07 18,551 DeviceInstallation.admx
10-06-2009 22:50 2,391 DeviceRedirection.admx
10-06-2009 22:59 1,093 DFS.admx
10-06-2009 22:37 1,992 DigitalLocker.admx
10-06-2009 22:52 3,034 DiskDiagnostic.admx
10-06-2009 23:08 2,758 DiskNVCache.admx
10-06-2009 22:38 6,123 DiskQuota.admx
10-06-2009 22:54 989 DistributedLinkTracking.admx
10-06-2009 22:30 10,290 DnsClient.admx
10-06-2009 23:01 7,656 DWM.admx
10-06-2009 22:53 962 EncryptFilesonMove.admx
10-06-2009 22:40 5,097 EnhancedStorage.admx
10-06-2009 23:01 21,737 ErrorReporting.admx
10-06-2009 22:56 1,996 EventForwarding.admx
10-06-2009 22:56 12,429 EventLog.admx
10-06-2009 22:58 2,528 EventViewer.admx
10-06-2009 22:53 3,836 Explorer.admx
10-06-2009 22:51 2,141 FileRecovery.admx
10-06-2009 22:38 6,172 FileSys.admx
10-06-2009 22:45 2,342 FolderRedirection.admx
10-06-2009 22:53 1,517 FramePanes.admx
10-06-2009 22:52 2,229 fthsvc.admx
10-06-2009 22:38 2,256 GameExplorer.admx
10-06-2009 23:10 26,800 Globalization.admx
10-06-2009 22:42 1,485 GroupPolicy-Server.admx
10-06-2009 22:42 23,507 GroupPolicy.admx
10-06-2009 22:42 100,025 GroupPolicyPreferences.admx
10-06-2009 22:40 2,647 Help.admx
10-06-2009 22:40 2,830 HelpAndSupport.admx
10-06-2009 22:37 1,701 HotStart.admx
10-06-2009 22:44 32,865 ICM.admx
10-06-2009 22:43 1,243 IIS.admx
10-06-2009 22:48 3,076,705 inetres.admx
10-06-2009 23:08 1,787 InkWatson.admx
10-06-2009 23:08 3,327 InputPersonalization.admx
10-06-2009 22:41 6,868 iSCSI.admx
10-06-2009 23:01 1,980 kdc.admx
10-06-2009 23:01 3,709 Kerberos.admx
10-06-2009 23:02 1,912 LanmanServer.admx
10-06-2009 22:52 2,205 LeakDiagnostic.admx
10-06-2009 22:39 3,681 LinkLayerTopologyDiscovery.admx
10-06-2009 22:44 7,130 Logon.admx
10-06-2009 23:01 1,786 MediaCenter.admx
10-06-2009 22:31 3,580 MMC.admx
10-06-2009 22:42 56,928 MMCSnapins.admx
10-06-2009 22:42 6,994 MMCSnapIns2.admx
10-06-2009 22:37 1,890 MobilePCMobilityCenter.admx
10-06-2009 22:37 1,986 MobilePCPresentationSettings.admx
10-06-2009 22:49 3,626 MSDT.admx
10-06-2009 22:52 2,147 Msi-FileRecovery.admx
10-06-2009 22:40 16,466 MSI.admx
10-06-2009 22:58 1,298 NAPXPQec.admx
10-06-2009 22:34 3,615 NCSI.admx
10-06-2009 22:47 17,738 Netlogon.admx
10-06-2009 22:31 17,024 NetworkConnections.admx
10-06-2009 22:52 2,443 NetworkProjection.admx
10-06-2009 23:01 25,505 OfflineFiles.admx
10-06-2009 22:54 8,498 P2P-pnrp.admx
10-06-2009 22:44 1,381 ParentalControls.admx
10-06-2009 22:46 9,071 pca.admx
10-06-2009 22:56 3,648 PeerToPeerCaching.admx
10-06-2009 23:08 1,773 PenTraining.admx
10-06-2009 22:33 2,292 PerfCenterCPL.admx
10-06-2009 23:07 7,555 PerformanceDiagnostics.admx
10-06-2009 23:07 1,939 PerformancePerftrack.admx
10-06-2009 23:08 35,966 Power.admx
10-06-2009 22:41 2,029 PowerShellExecutionPolicy.admx
10-06-2009 22:44 6,901 PreviousVersions.admx
10-06-2009 23:01 30,822 Printing.admx
10-06-2009 22:53 3,239 Programs.admx
10-06-2009 23:08 3,344 PswdSync.admx
10-06-2009 22:50 13,257 QOS.admx
10-06-2009 23:08 1,273 RacWmiProv.admx
10-06-2009 22:52 1,972 Radar.admx
10-06-2009 22:52 1,236 ReAgent.admx
10-06-2009 22:57 3,722 Reliability.admx
10-06-2009 22:51 7,150 RemoteAssistance.admx
10-06-2009 23:07 23,268 RemovableStorage.admx
10-06-2009 22:53 6,292 RPC.admx
10-06-2009 22:42 6,991 Scripts.admx
10-06-2009 22:48 2,519 sdiageng.admx
10-06-2009 22:49 2,027 sdiagschd.admx
10-06-2009 22:34 43,882 Search.admx
10-06-2009 23:08 11,602 SearchOCR.admx
10-06-2009 23:01 1,370 Securitycenter.admx
10-06-2009 22:34 3,888 Sensors.admx
10-06-2009 22:48 3,334 ServerManager.admx
10-06-2009 23:04 1,588 Setup.admx
10-06-2009 23:08 1,187 ShapeCollector.admx
10-06-2009 22:54 1,634 SharedFolders.admx
10-06-2009 22:53 1,985 Sharing.admx
10-06-2009 22:53 3,466 Shell-CommandPrompt-RegEditTools.admx
10-06-2009 22:53 1,157 ShellWelcomeCenter.admx
10-06-2009 22:58 5,039 Sidebar.admx
10-06-2009 22:31 7,397 Sideshow.admx
10-06-2009 23:03 9,691 Smartcard.admx
10-06-2009 23:08 2,057 Snis.admx
10-06-2009 23:00 2,307 Snmp.admx
10-06-2009 23:01 1,943 SoundRec.admx
10-06-2009 22:53 25,663 StartMenu.admx
10-06-2009 23:01 2,833 SystemResourceManager.admx
10-06-2009 23:08 1,716 SystemRestore.admx
10-06-2009 22:46 12,737 TabletPCInputPanel.admx
10-06-2009 23:08 12,313 TabletShell.admx
10-06-2009 22:53 9,365 Taskbar.admx
10-06-2009 22:58 5,520 TaskScheduler.admx
10-06-2009 22:49 10,059 tcpip.admx
10-06-2009 22:39 17,774 TerminalServer-Server.admx
04-11-2010 17:56 83,116 TerminalServer.admx
10-06-2009 22:53 2,352 Thumbnails.admx
10-06-2009 23:05 2,726 TouchInput.admx
10-06-2009 23:04 3,409 TPM.admx
10-06-2009 23:08 8,101 UserDataBackup.admx
10-06-2009 22:56 15,021 UserProfiles.admx
10-06-2009 23:04 40,554 VolumeEncryption.admx
10-06-2009 23:04 6,277 W32Time.admx
10-06-2009 22:49 2,512 WDI.admx
10-06-2009 22:52 1,768 WinCal.admx
10-06-2009 22:42 14,532 Windows.admx
10-06-2009 22:53 1,265 WindowsAnytimeUpgrade.admx
10-06-2009 23:08 3,702 WindowsBackup.admx
10-06-2009 22:45 2,024 WindowsColorSystem.admx
10-06-2009 22:39 4,085 WindowsConnectNow.admx
10-06-2009 23:04 5,115 WindowsDefender.admx
10-06-2009 22:53 35,942 WindowsExplorer.admx
10-06-2009 23:08 3,000 WindowsFileProtection.admx
10-06-2009 22:45 27,019 WindowsFirewall.admx
10-06-2009 22:46 2,767 WindowsMail.admx
10-06-2009 23:01 1,254 WindowsMediaDRM.admx
10-06-2009 23:01 22,974 WindowsMediaPlayer.admx
10-06-2009 22:44 2,903 WindowsMessenger.admx
10-06-2009 22:42 7,203 WindowsProducts.admx
10-06-2009 23:00 9,878 WindowsRemoteManagement.admx
10-06-2009 23:00 4,338 WindowsRemoteShell.admx
10-06-2009 22:42 1,314 WindowsServer.admx
10-06-2009 22:59 19,272 WindowsUpdate.admx
10-06-2009 23:04 1,955 WinInit.admx
10-06-2009 23:04 5,237 WinLogon.admx
10-06-2009 22:42 1,342 Winsrv.admx
10-06-2009 22:53 1,406 WordWheel.admx
160 Files
I was working on a case with a customer for something that was too weird to ignore.
We wanted to use DNS Suffix Search Orders on the clients so that clients could query using short names for servers in DNS domains which weren’t their own.
e.g. A PC in the domain child-dom-1.corp.contoso.com wanted to ping the short name “serverX”.
ServerX had registered its name in the DNS zone matching its primary DNS Suffix: child-dom-2.corp.contoso.com
So the answer is to set the DNS Suffix Search Order list. Prior to this the customer had configured the DNS zone child-dom-2.corp.contoso.com to use WINS forwarders, pointing to a WINS server which serverX was also using. But WINS was on the way out (see the previous blog for details on how to decommission WINS).
DNS Suffix Search Order is configured on the properties of the NIC or in Group Policies (for all NICs):
We tried both methods, but the client was still unable to resolve names in any domain except its Primary DNS Suffix domain.
Once a DNS Suffix Search Order list is defined, Windows must use that list instead of the single Primary DNS Suffix. So what was going on?
When we ran nslookup and set debug=2 we could see that queries for a non-existent host (e.g. mickeymouse) would reply back with a SUCCESS message for the A record, but no IP address in the answer.
The solution:
In the zone child-dom-1.corp.contoso.com there was a record called * with a type of MX. This record makes queries for any otherwise-unknown name, with ANY record type (A, AAAA, CNAME, etc.), return success. And because the DNS client was getting back successes, it never needed to try the alternate DNS Suffixes.
The wildcard MX record was, of course, deleted, and everything works as expected.
But why have wildcard MX records?
Wildcard MX records are good for when you have a large number of hosts which are not directly Internet-connected (for example, behind a firewall) and for administrative or political reasons it is too difficult to have individual MX records for every host, or to force all e-mail addresses to be "hidden" behind one or more domain names. In that case, you must divide your DNS into two parts, an internal DNS, and an external DNS. The external DNS will have only a few hosts and explicit MX records, and one or more wildcard MXs for each internal domain. Internally the DNS will be complete, with all explicit MX records and no wildcards.
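The failure mode above can be illustrated with a small sketch of wildcard matching (a simplified model of the standard DNS wildcard rules, not a real resolver; the zone contents are hypothetical):

```python
# Sketch: why a wildcard record answers queries for *any* unknown name
# in the zone -- any owner name with no explicit records matches "*".
def lookup(zone, name, rtype):
    if (name, rtype) in zone:
        return zone[(name, rtype)]
    # no explicit records exist for this name at all: the wildcard matches
    if not any(n == name for n, _ in zone):
        if ("*", "MX") in zone:
            # the query *succeeds* (NOERROR) even though there is no
            # record of the requested type -- just an empty answer
            return []
    return None   # NXDOMAIN

zone = {("*", "MX"): ["10 mail.child-dom-1.corp.contoso.com"]}
print(lookup(zone, "mickeymouse", "A"))   # [] -> "success" with no A answer
```

That empty-but-successful answer is exactly what nslookup with debug=2 showed: a SUCCESS for the A record with no IP address, which stops the client from trying the next suffix.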
I’ve been working on helping remove WINS from a customer’s network. One of the big problems was identifying the remaining clients still using WINS, and just what they were using it for.
We used Network Monitor to capture WINS name resolution queries on the WINS to see which clients were querying for which server names.
What we found was quite interesting.
When a client is configured with a WINS server (via DHCP or statically), it will always attempt to resolve queries for SHORT names (i.e. names without dots in them) via both WINS and DNS at the same time. When it formulates the first DNS query to send out, it uses this logic:
It sends out BOTH a WINS query and a FQDN query to DNS at the same time because it doesn’t know which service can resolve the name, and rather than prefer one over the other and incur the delay, it just blasts both out at the exact same time.
If both replies result in an answer (i.e. an IP address) then the client will use the result from the service which happens to reply back the fastest.
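That "blast both, take the fastest answer" behavior can be modeled with two resolver functions racing in threads. This is an illustration only; the `wins_query` and `dns_query` stubs are hypothetical stand-ins for the real services:

```python
# Sketch: race two resolvers and use the first successful reply,
# as the client does with simultaneous WINS and DNS queries.
import concurrent.futures

def race(resolvers, name):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(r, name) for r in resolvers]
        for fut in concurrent.futures.as_completed(futures):
            answer = fut.result()
            if answer is not None:     # first successful reply wins
                return answer
    return None                        # neither service could resolve it

def wins_query(name):  return None         # pretend WINS has no record
def dns_query(name):   return "10.1.2.3"   # pretend DNS answers

print(race([wins_query, dns_query], "serverX"))
```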
If neither query comes back with a successful result, the DNS client takes over. It will either try DNS devolution on the primary DNS suffix (enabled by default), or will start walking down the DNS Suffix Search Order, if that is configured. DNS devolution is the process of shortening the primary DNS suffix by dropping the leftmost parts of the suffix until there is only 1 dot left.
An example of DNS devolution:
The primary DNS Suffix of the client is child.corp.contoso.com. The client is looking for the server called someserver.contoso.com by asking for server by the short name: someserver.
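The devolution example above can be sketched as a few lines of Python:

```python
# Sketch of DNS devolution: drop the leftmost label of the primary DNS
# suffix until only one dot remains.
def devolution_suffixes(primary_suffix):
    labels = primary_suffix.split(".")
    suffixes = []
    while len(labels) >= 2:            # stop once only one dot is left
        suffixes.append(".".join(labels))
        labels = labels[1:]
    return suffixes

# A query for "someserver" with suffix child.corp.contoso.com tries:
print(devolution_suffixes("child.corp.contoso.com"))
# ['child.corp.contoso.com', 'corp.contoso.com', 'contoso.com']
```

So the client would eventually try someserver.contoso.com and find the record.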
(Note that DNS wildcard records can mess this logic up – but that’s the topic of my next blog.)
What does this matter for removing WINS?
Well, in our case we started looking at all the WINS queries hitting the server before we started. And there were lots of them. This confused us a bit as all the clients should be Windows XP or newer, they should all be domain joined and should all use DNS. We were seeing the WINS queries because of the method described above where the client will send out BOTH WINS and DNS at the same time when querying for a short name.
Step 1 in removing WINS from our clients was to export the static WINS entries and create static DNS records for them instead. This removed the reliance on WINS for the clients. There are still other devices (notably printers) which register in WINS and need WINS so the print operators can locate the new print devices appearing on the network. The DNS zones only allow secure updates, so without some other method, WINS will still be needed for these devices. Altering the process for deploying print servers, by identifying them before they hit the field, will solve that.
Once that was done we installed Network Monitor 3.3 on the WINS server, and used this capture filter to show the successful answers the WINS server is giving back to the WINS clients:
NbtNs.Flag.R == 0x1
AND NbtNs.Flag.AA == 0x1
AND NbtNs.AnswerCount > 0x0
AND (IPv4.DestinationAddress < 10.1.0.0 OR IPv4.DestinationAddress > 10.1.255.255)
AND (IPv4.DestinationAddress < 169.254.0.0 OR IPv4.DestinationAddress > 169.254.255.255)
AND NbtNs.AnswerRecord.RRName.Name != "*<00><00><00><00><00><00><00><00><00><00><00><00><00><00><00>"
Line-by-line this says: show all responses which are authoritative answers, where there is more than 0 answers, where I am not replying to a client in the server subnet (10.1.0.0/16), nor to APIPA-assigned addresses (169.254.0.0/16), and the answer is not a response to a master browser announcement. While WINS uses port 42, that is for WINS server replication; WINS queries happen on 137/UDP.
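The same tests can be expressed over pre-parsed response records, which helps when post-processing an exported capture. The record layout here is a hypothetical dictionary, not Network Monitor's real object model:

```python
# Sketch: the capture filter's tests applied to hypothetical pre-parsed
# WINS response records (dicts), mirroring the Network Monitor filter.
import ipaddress

SERVER_NET = ipaddress.ip_network("10.1.0.0/16")
APIPA_NET  = ipaddress.ip_network("169.254.0.0/16")

def is_interesting(rec):
    dest = ipaddress.ip_address(rec["dest_ip"])
    return (rec["is_response"] and rec["authoritative"]
            and rec["answer_count"] > 0
            and dest not in SERVER_NET            # not the server subnet
            and dest not in APIPA_NET             # not APIPA clients
            and not rec["name"].startswith("*"))  # not browser announcements

rec = {"is_response": True, "authoritative": True, "answer_count": 1,
       "dest_ip": "10.2.3.4", "name": "SERVERX"}
print(is_interesting(rec))   # True
```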
We went through the results looking for names which weren’t in DNS. Which is like trying to find a straw in a great big stack of needles.
Then we disabled the WINS entries in the DHCP scopes for the clients.
Now we can see which clients are statically configured to use WINS. We’ll locate them first and correct them. Finding out exactly which host names they are relying on WINS for is still tricky, especially as the clients send out WINS and DNS queries simultaneously. But we’re on the right track.
We can then focus the filter on the server subnets to locate servers which are configured to register records in WINS:
(IPv4.SourceAddress > 10.1.0.0 AND IPv4.SourceAddress < 10.1.255.255)
AND NbtNs.Flag.OPCode == 0x8
AND NbtNs.NbtNsQuestionSectionData.QuestionName.Name != "CORP.CONTOSO.COM "
AND NbtNs.NbtNsQuestionSectionData.QuestionName.Name != "<01><02>__MSBROWSE__<02><01>"
Which says: limit the traffic to source IP addresses within the server range (10.1.0.0 – 10.1.255.255) which are WINS Name Registration requests, but exclude domain browser election requests for the domain corp.contoso.com (the 2 spaces at the end are important), and also exclude master browser announcements. What remains are the servers still registering records in WINS.
I hope this helps you in your project to decommission WINS.