• SCSI vs IDE disks in Domain Controllers running as Virtual Machines on Hyper-V in Windows Server 2008 R2

    When installing Windows Server 2008 R2 Domain Controllers on Hyper-V R2, Active Directory generated events warning that it was unable to disable write caching for the disks where the AD files (database, logs, SYSVOL) were stored. No events were logged for the boot disk. The boot disk is presented to the VM as IDE; the other volumes were presented as SCSI disks. SCSI disks cannot have their write cache disabled by the Operating System.

    My recommendation: on Domain Controllers (or any other server role requiring that the write cache is disabled) which are running as virtual machines, hosted on Windows Server 2008 R2 Hyper-V, use ONLY IDE disks, not SCSI disks, for critical AD files. Note this only affects the way the disks are presented to the VM, not to the host Operating System.

    If the server is a multi-role server (e.g. AD and File Services), the data disks for the second role can be SCSI disks. Check that this other role does not require write caching to be disabled on the data disks.

    Synthetic vs Emulated

    Synthetic

    Synthetic drivers are drivers which bypass the normal driver layering system between user-mode/kernel-mode in both the child and parent partitions. Instead they use the VMBus to talk more directly to the hardware.

    This requires that the OS in the child partition has started up enough to be running the VMBus drivers and services. Until this happens, emulated drivers are used. In this way the OS is said to be “enlightened” to the fact that it is virtualised.

    Emulated

    Emulated drivers have no idea the hardware has been virtualized and function in the classic way of attempting to talk to the HAL and eventually “ring 0”. This requires either binary rewriting of these calls (à la ESX) or hardware virtualization (à la Hyper-V) to create a “ring -1”.

    There is a performance hit in using emulated drivers compared to their equivalent synthetic counterparts.

    SCSI vs IDE

    When booting a VM, it is not possible to use SCSI disks as those require the OS to be enlightened, which it is not during installation or boot-up.

    So, the boot (and system) partition must be presented to the VM as IDE.

    All other partitions can be either SCSI or IDE.

    IDE is limited to 3 connected disks (1 port is retained for the CD-ROM, which is required for updating the integration components).

    SCSI can have 64 connected disks per controller and 4 controllers per VM, giving a total of 256 SCSI disks per VM.

    SCSI also supports hot-add/removal of disks, which IDE disks do not.
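For a quick sanity check, the disk limits above work out like this (a sketch using the figures stated in this post):

```python
# Hyper-V R2 VM storage limits as described above (quick-reference sketch).
IDE_PORTS = 4              # 2 IDE controllers x 2 devices each
IDE_DVD_RESERVED = 1       # one port kept for the CD/DVD drive (integration components)
ide_disks = IDE_PORTS - IDE_DVD_RESERVED

SCSI_CONTROLLERS = 4       # maximum synthetic SCSI controllers per VM
DISKS_PER_CONTROLLER = 64  # maximum disks per synthetic SCSI controller
scsi_disks = SCSI_CONTROLLERS * DISKS_PER_CONTROLLER

print(ide_disks)   # 3 IDE disks available to the VM
print(scsi_disks)  # 256 SCSI disks available to the VM
```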

    Performance

    Once the OS in the child partition (the VM) has started, there is no perceptible difference between IDE and SCSI for Windows 7/Windows Server 2008 R2 guests.

    The results of tests of this are located here:

    http://www.nodnarb.net/post/2009/11/30/Microsoft-Hyper-V-2008-R2-IDE-vs-SCSI-Performance.aspx


    Write Caching

    SCSI disks presented to a VM do NOT support disabling the write cache.


    Some roles or applications may require that the disk write cache is disabled. An example of this is Active Directory. When installed on a Windows Server 2008 R2, AD will attempt to tell the OS to disable the write cache of disks holding critical data related to AD:

    • Ntds.dit (AD database)
    • AD Database logs
    • SYSVOL

    A warning event will be generated in the “Directory Services” log (ID 1539) for each component described above.

    As an example: if the AD database and SYSVOL are on the same SCSI disk (D:) and the AD logs are on another SCSI disk (E:), 3 events will be generated: D: will have 2 events, E: will have 1 event. If the logs are on C:, which is the boot disk and therefore IDE, only 2 events will be generated in all.
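To make the event arithmetic concrete, here is a small sketch (the helper and drive letters are illustrative, not a real AD API) that predicts how many 1539 warnings to expect:

```python
# Sketch: predict how many Directory Services 1539 warnings to expect - one per
# AD component (database, logs, SYSVOL) stored on a disk whose write cache
# cannot be disabled (here: any SCSI-attached disk). Illustrative only.
from collections import Counter

def expected_1539_events(placement, scsi_disks):
    """placement maps AD component -> drive letter; returns events per disk."""
    return Counter(drive for drive in placement.values() if drive in scsi_disks)

# AD database and SYSVOL on D: (SCSI), logs on E: (SCSI) -> 3 events in all.
events = expected_1539_events(
    {"ntds.dit": "D:", "sysvol": "D:", "logs": "E:"}, scsi_disks={"D:", "E:"})
print(sum(events.values()), dict(events))  # 3 events: 2 for D:, 1 for E:

# Logs moved to C: (the IDE boot disk) -> only 2 events in all.
events = expected_1539_events(
    {"ntds.dit": "D:", "sysvol": "D:", "logs": "C:"}, scsi_disks={"D:", "E:"})
print(sum(events.values()))  # 2
```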


  • Upgrading the ADMX Central Store files from Vista to Windows 7

    I had a question from a customer and thought I’d share the answer with everyone. They asked “I want to upgrade our Central Store of ADMX/ADML files for Group Policy from Windows Vista SP2/Windows Server 2008 SP2 to Windows 7/Windows Server 2008 R2. What do we need to worry about?”. So I redirected them to this blog:

    http://blogs.technet.com/b/askds/archive/2009/12/09/windows-7-windows-server-2008-r2-and-the-group-policy-central-store.aspx

    But we found that there were differences between the ADMX files available in C:\Windows\PolicyDefinitions on Windows 7 and Windows Server 2008 R2. One such difference is highlighted here:

    http://blogs.technet.com/b/askds/archive/2008/07/18/enabling-group-policy-preferences-debug-logging-using-the-rsat.aspx

    I wondered if there were more differences, so I went through all of the ADMX files of:

    • a Windows Server 2008 R2 server with no roles or features installed
    • a Windows Server 2008 R2 server with EVERY role and feature installed
    • a Windows 7 RTM client
    • all of the above with Windows 7 / Windows Server 2008 R2 SP1 installed

    Here are the results:

    • The only ADMX/ADML files modified by SP1 were for TerminalServer.admx, which was updated with changes relating to Calista/RemoteFX. No other ADMX/ADML files were changed by SP1.
    • Applications like AGPM and Office can add their own ADMX files to the local PolicyDefinitions folder on the server or workstation they are installed on. Make a note to add ALL the ADMX/ADML files you need to \\FQDN\SYSVOL\FQDN\policies.
    • Installing Windows Search on the server will add the ADMX/ADML files for it on the server. Adding any other role/feature does NOT add ADMX/ADML files to the server's local PolicyDefinitions folder.
    • Get your ADMX/ADML files from a server with all the roles and features installed and a Windows 7 client with all the features installed, and create a “super-set” of all ADMX/ADML files in your Central Store.
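As a sketch of how such a super-set could be built (paths and the helper are illustrative; it assumes you keep the newest copy when file names collide):

```python
# Sketch: merge several PolicyDefinitions folders into a Central Store
# "super-set". When the same file exists in more than one source, the copy
# with the newest modification time wins. Paths below are illustrative.
import shutil
from pathlib import Path

def build_superset(sources, central_store):
    central = Path(central_store)
    central.mkdir(parents=True, exist_ok=True)
    for src in map(Path, sources):
        for f in src.rglob("*"):
            if f.suffix.lower() not in (".admx", ".adml"):
                continue                              # skip non-policy files
            dest = central / f.relative_to(src)       # keeps en-US etc. subfolders
            dest.parent.mkdir(parents=True, exist_ok=True)
            if not dest.exists() or f.stat().st_mtime > dest.stat().st_mtime:
                shutil.copy2(f, dest)                 # copy2 preserves timestamps

# Illustrative usage (UNC paths are examples only):
# build_superset(
#     [r"\\server\c$\Windows\PolicyDefinitions",
#      r"\\win7client\c$\Windows\PolicyDefinitions"],
#     r"\\contoso.com\SYSVOL\contoso.com\policies\PolicyDefinitions")
```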

     

    ADMX/ADML files in Windows Server 2008 R2, which are missing in Windows 7

    • adfs
    • GroupPolicy-Server
    • GroupPolicyPreferences
    • kdc
    • mmcsnapins2
    • NAPXPQec
    • PowerShellExecutionPolicy
    • PswdSync
    • SearchOCR (if Handwriting Recognition is installed)
    • ServerManager
    • Snis
    • TerminalServer-Server
    • WindowsServer

     

    ADMX/ADML files in Windows 7, which are missing in Windows Server 2008 R2

    • DeviceRedirection
    • sdiagschd
    • Search (if not installed on the server)
    • ShapeCollector

    Here is a list of all files in the PolicyDefinitions folders collected from both Windows 7 and a Windows Server 2008 R2 server (with every role and feature installed), with their dates and sizes:

    10-06-2009  23:04             4,717 ActiveXInstallService.admx

    10-06-2009  22:53             4,714 AddRemovePrograms.admx

    10-06-2009  22:49             1,249 adfs.admx

    10-06-2009  22:30             5,393 AppCompat.admx

    10-06-2009  22:36             5,965 AttachmentManager.admx

    10-06-2009  22:53             3,391 AutoPlay.admx

    10-06-2009  22:52             2,968 Biometrics.admx

    10-06-2009  22:53            49,181 Bits.admx

    10-06-2009  23:01             1,749 CEIPEnable.admx

    10-06-2009  22:53             1,361 CipherSuiteOrder.admx

    10-06-2009  22:43             1,329 COM.admx

    10-06-2009  22:42            13,967 Conf.admx

    10-06-2009  22:53             2,600 ControlPanel.admx

    10-06-2009  22:53            10,099 ControlPanelDisplay.admx

    10-06-2009  22:53             1,293 Cpls.admx

    10-06-2009  22:53             1,933 CredentialProviders.admx

    10-06-2009  23:00            10,779 CredSsp.admx

    10-06-2009  22:53             1,746 CredUI.admx

    10-06-2009  23:04             2,141 CtrlAltDel.admx

    10-06-2009  22:43             2,437 DCOM.admx

    10-06-2009  22:53            13,576 Desktop.admx

    10-06-2009  23:07            18,551 DeviceInstallation.admx

    10-06-2009  22:50             2,391 DeviceRedirection.admx

    10-06-2009  22:59             1,093 DFS.admx

    10-06-2009  22:37             1,992 DigitalLocker.admx

    10-06-2009  22:52             3,034 DiskDiagnostic.admx

    10-06-2009  23:08             2,758 DiskNVCache.admx

    10-06-2009  22:38             6,123 DiskQuota.admx

    10-06-2009  22:54               989 DistributedLinkTracking.admx

    10-06-2009  22:30            10,290 DnsClient.admx

    10-06-2009  23:01             7,656 DWM.admx

    10-06-2009  22:53               962 EncryptFilesonMove.admx

    10-06-2009  22:40             5,097 EnhancedStorage.admx

    10-06-2009  23:01            21,737 ErrorReporting.admx

    10-06-2009  22:56             1,996 EventForwarding.admx

    10-06-2009  22:56            12,429 EventLog.admx

    10-06-2009  22:58             2,528 EventViewer.admx

    10-06-2009  22:53             3,836 Explorer.admx

    10-06-2009  22:51             2,141 FileRecovery.admx

    10-06-2009  22:38             6,172 FileSys.admx

    10-06-2009  22:45             2,342 FolderRedirection.admx

    10-06-2009  22:53             1,517 FramePanes.admx

    10-06-2009  22:52             2,229 fthsvc.admx

    10-06-2009  22:38             2,256 GameExplorer.admx

    10-06-2009  23:10            26,800 Globalization.admx

    10-06-2009  22:42             1,485 GroupPolicy-Server.admx

    10-06-2009  22:42            23,507 GroupPolicy.admx

    10-06-2009  22:42           100,025 GroupPolicyPreferences.admx

    10-06-2009  22:40             2,647 Help.admx

    10-06-2009  22:40             2,830 HelpAndSupport.admx

    10-06-2009  22:37             1,701 HotStart.admx

    10-06-2009  22:44            32,865 ICM.admx

    10-06-2009  22:43             1,243 IIS.admx

    10-06-2009  22:48         3,076,705 inetres.admx

    10-06-2009  23:08             1,787 InkWatson.admx

    10-06-2009  23:08             3,327 InputPersonalization.admx

    10-06-2009  22:41             6,868 iSCSI.admx

    10-06-2009  23:01             1,980 kdc.admx

    10-06-2009  23:01             3,709 Kerberos.admx

    10-06-2009  23:02             1,912 LanmanServer.admx

    10-06-2009  22:52             2,205 LeakDiagnostic.admx

    10-06-2009  22:39             3,681 LinkLayerTopologyDiscovery.admx

    10-06-2009  22:44             7,130 Logon.admx

    10-06-2009  23:01             1,786 MediaCenter.admx

    10-06-2009  22:31             3,580 MMC.admx

    10-06-2009  22:42            56,928 MMCSnapins.admx

    10-06-2009  22:42             6,994 MMCSnapIns2.admx

    10-06-2009  22:37             1,890 MobilePCMobilityCenter.admx

    10-06-2009  22:37             1,986 MobilePCPresentationSettings.admx

    10-06-2009  22:49             3,626 MSDT.admx

    10-06-2009  22:52             2,147 Msi-FileRecovery.admx

    10-06-2009  22:40            16,466 MSI.admx

    10-06-2009  22:58             1,298 NAPXPQec.admx

    10-06-2009  22:34             3,615 NCSI.admx

    10-06-2009  22:47            17,738 Netlogon.admx

    10-06-2009  22:31            17,024 NetworkConnections.admx

    10-06-2009  22:52             2,443 NetworkProjection.admx

    10-06-2009  23:01            25,505 OfflineFiles.admx

    10-06-2009  22:54             8,498 P2P-pnrp.admx

    10-06-2009  22:44             1,381 ParentalControls.admx

    10-06-2009  22:46             9,071 pca.admx

    10-06-2009  22:56             3,648 PeerToPeerCaching.admx

    10-06-2009  23:08             1,773 PenTraining.admx

    10-06-2009  22:33             2,292 PerfCenterCPL.admx

    10-06-2009  23:07             7,555 PerformanceDiagnostics.admx

    10-06-2009  23:07             1,939 PerformancePerftrack.admx

    10-06-2009  23:08            35,966 Power.admx

    10-06-2009  22:41             2,029 PowerShellExecutionPolicy.admx

    10-06-2009  22:44             6,901 PreviousVersions.admx

    10-06-2009  23:01            30,822 Printing.admx

    10-06-2009  22:53             3,239 Programs.admx

    10-06-2009  23:08             3,344 PswdSync.admx

    10-06-2009  22:50            13,257 QOS.admx

    10-06-2009  23:08             1,273 RacWmiProv.admx

    10-06-2009  22:52             1,972 Radar.admx

    10-06-2009  22:52             1,236 ReAgent.admx

    10-06-2009  22:57             3,722 Reliability.admx

    10-06-2009  22:51             7,150 RemoteAssistance.admx

    10-06-2009  23:07            23,268 RemovableStorage.admx

    10-06-2009  22:53             6,292 RPC.admx

    10-06-2009  22:42             6,991 Scripts.admx

    10-06-2009  22:48             2,519 sdiageng.admx

    10-06-2009  22:49             2,027 sdiagschd.admx

    10-06-2009  22:34            43,882 Search.admx

    10-06-2009  23:08            11,602 SearchOCR.admx

    10-06-2009  23:01             1,370 Securitycenter.admx

    10-06-2009  22:34             3,888 Sensors.admx

    10-06-2009  22:48             3,334 ServerManager.admx

    10-06-2009  23:04             1,588 Setup.admx

    10-06-2009  23:08             1,187 ShapeCollector.admx

    10-06-2009  22:54             1,634 SharedFolders.admx

    10-06-2009  22:53             1,985 Sharing.admx

    10-06-2009  22:53             3,466 Shell-CommandPrompt-RegEditTools.admx

    10-06-2009  22:53             1,157 ShellWelcomeCenter.admx

    10-06-2009  22:58             5,039 Sidebar.admx

    10-06-2009  22:31             7,397 Sideshow.admx

    10-06-2009  23:03             9,691 Smartcard.admx

    10-06-2009  23:08             2,057 Snis.admx

    10-06-2009  23:00             2,307 Snmp.admx

    10-06-2009  23:01             1,943 SoundRec.admx

    10-06-2009  22:53            25,663 StartMenu.admx

    10-06-2009  23:01             2,833 SystemResourceManager.admx

    10-06-2009  23:08             1,716 SystemRestore.admx

    10-06-2009  22:46            12,737 TabletPCInputPanel.admx

    10-06-2009  23:08            12,313 TabletShell.admx

    10-06-2009  22:53             9,365 Taskbar.admx

    10-06-2009  22:58             5,520 TaskScheduler.admx

    10-06-2009  22:49            10,059 tcpip.admx

    10-06-2009  22:39            17,774 TerminalServer-Server.admx

    04-11-2010  17:56            83,116 TerminalServer.admx

    10-06-2009  22:53             2,352 Thumbnails.admx

    10-06-2009  23:05             2,726 TouchInput.admx

    10-06-2009  23:04             3,409 TPM.admx

    10-06-2009  23:08             8,101 UserDataBackup.admx

    10-06-2009  22:56            15,021 UserProfiles.admx

    10-06-2009  23:04            40,554 VolumeEncryption.admx

    10-06-2009  23:04             6,277 W32Time.admx

    10-06-2009  22:49             2,512 WDI.admx

    10-06-2009  22:52             1,768 WinCal.admx

    10-06-2009  22:42            14,532 Windows.admx

    10-06-2009  22:53             1,265 WindowsAnytimeUpgrade.admx

    10-06-2009  23:08             3,702 WindowsBackup.admx

    10-06-2009  22:45             2,024 WindowsColorSystem.admx

    10-06-2009  22:39             4,085 WindowsConnectNow.admx

    10-06-2009  23:04             5,115 WindowsDefender.admx

    10-06-2009  22:53            35,942 WindowsExplorer.admx

    10-06-2009  23:08             3,000 WindowsFileProtection.admx

    10-06-2009  22:45            27,019 WindowsFirewall.admx

    10-06-2009  22:46             2,767 WindowsMail.admx

    10-06-2009  23:01             1,254 WindowsMediaDRM.admx

    10-06-2009  23:01            22,974 WindowsMediaPlayer.admx

    10-06-2009  22:44             2,903 WindowsMessenger.admx

    10-06-2009  22:42             7,203 WindowsProducts.admx

    10-06-2009  23:00             9,878 WindowsRemoteManagement.admx

    10-06-2009  23:00             4,338 WindowsRemoteShell.admx

    10-06-2009  22:42             1,314 WindowsServer.admx

    10-06-2009  22:59            19,272 WindowsUpdate.admx

    10-06-2009  23:04             1,955 WinInit.admx

    10-06-2009  23:04             5,237 WinLogon.admx

    10-06-2009  22:42             1,342 Winsrv.admx

    10-06-2009  22:53             1,406 WordWheel.admx

                 160 Files

  • Decommissioning WINS

    I’ve been working on helping remove WINS from a customer’s network. One of the big problems was identifying the remaining clients still using WINS, and just what they were using it for.

    We used Network Monitor to capture WINS name resolution queries on the WINS server to see which clients were querying for which server names.

    What we found was quite interesting.

    When a client is configured with a WINS server (via DHCP or statically), it will always attempt to resolve queries for SHORT names (i.e. names without dots in them) via both WINS and DNS at the same time. When it formulates the first DNS query to send out, it uses this logic:

    • If DNS Suffix Search Order list is empty, then use the primary DNS Suffix (typically the DNS name of the domain the client is joined to).
    • If there is a DNS Suffix Search Order list, then use the first entry.

    It sends out BOTH a WINS query and a DNS query (for the FQDN) at the same time because it doesn’t know which service can resolve the name, and rather than prefer one over the other and incur the delay, it just blasts both out at the exact same time.

    If both replies result in an answer (i.e. an IP address) then the client will use the result from the service which happens to reply back the fastest.
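The “both at once, fastest answer wins” behaviour can be modelled with a toy race (the two lookup functions below are illustrative stand-ins with made-up delays, not real resolvers):

```python
# Toy model of the resolver race described above: fire the WINS and DNS
# lookups simultaneously and take whichever successful answer lands first.
import concurrent.futures
import time

def wins_lookup(name):
    time.sleep(0.05)               # pretend WINS answers in 50 ms
    return ("WINS", "10.1.2.3")

def dns_lookup(name):
    time.sleep(0.02)               # pretend DNS answers in 20 ms
    return ("DNS", "10.1.2.3")

def resolve_short_name(name):
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(wins_lookup, name), pool.submit(dns_lookup, name)]
        for fut in concurrent.futures.as_completed(futures):
            result = fut.result()
            if result is not None:  # first successful answer wins
                return result

print(resolve_short_name("someserver"))  # ('DNS', '10.1.2.3') - DNS replied faster
```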

    If neither query comes back with a successful result, the DNS client takes over. It will either try DNS devolution on the primary DNS suffix (enabled by default), or will start walking down the DNS suffix search order, if that is configured. DNS devolution is the process of shortening the primary DNS suffix by dropping the left-most parts of the suffix until there is only 1 dot left.

    An example of DNS devolution:

    The primary DNS Suffix of the client is child.corp.contoso.com. The client is looking for the server called someserver.contoso.com by asking for server by the short name: someserver.

    1. someserver.child.corp.contoso.com [fails to resolve]
    2. someserver.corp.contoso.com [fails to resolve]
    3. someserver.contoso.com [success!]

    (Note that DNS wildcard records can mess this logic up – but that’s the topic of my next blog.)
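The devolution walk can be expressed as a short helper (a sketch; the real client also honours the suffix search list and registry policy):

```python
# Sketch: generate the FQDN candidates the DNS client tries for a short name
# via devolution - drop the left-most label of the primary suffix until only
# two labels (one dot) remain.
def devolution_candidates(short_name, primary_suffix):
    labels = primary_suffix.split(".")
    out = []
    while len(labels) >= 2:
        out.append(short_name + "." + ".".join(labels))
        labels = labels[1:]        # drop the left-most label and try again
    return out

print(devolution_candidates("someserver", "child.corp.contoso.com"))
# ['someserver.child.corp.contoso.com',
#  'someserver.corp.contoso.com',
#  'someserver.contoso.com']
```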

    Why does this matter for removing WINS?

    Well, in our case we started looking at all the WINS queries hitting the server before we started. And there were lots of them. This confused us a bit as all the clients should be Windows XP or newer, they should all be domain joined and should all use DNS. We were seeing the WINS queries because of the method described above where the client will send out BOTH WINS and DNS at the same time when querying for a short name.

    Step 1 in removing WINS from our clients was to export the static WINS entries and create static DNS records for them instead. This removed the clients’ reliance on WINS. There are still other devices (notably printers) which register in WINS and need WINS so the print operators can locate the new print devices appearing on the network. The DNS zones only allow secure updates, so without some other method, WINS will still be needed for these devices. Altering the process for deploying print servers, by identifying them before they hit the field, will solve that.
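A sketch of automating that step: turn the exported static WINS entries into dnscmd commands that create matching static A records (the zone name and entry list below are illustrative):

```python
# Sketch: convert exported static WINS entries (name, IP pairs) into
# "dnscmd /recordadd" commands that create matching static A records.
# The zone and entries are illustrative examples.
def wins_to_dnscmd(entries, zone, dns_server="."):
    cmds = []
    for name, ip in entries:
        cmds.append(f"dnscmd {dns_server} /recordadd {zone} {name.lower()} A {ip}")
    return cmds

static_entries = [("PRINTSRV01", "10.1.5.20"), ("LEGACYAPP", "10.1.5.21")]
for cmd in wins_to_dnscmd(static_entries, "corp.contoso.com"):
    print(cmd)
# dnscmd . /recordadd corp.contoso.com printsrv01 A 10.1.5.20
# dnscmd . /recordadd corp.contoso.com legacyapp A 10.1.5.21
```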

    Once that was done we installed Network Monitor 3.3 on the WINS server, and used this capture filter to show the successful answers the WINS server is giving back to the WINS clients:

    NbtNs.Flag.R == 0x1

    AND NbtNs.Flag.AA == 0x1

    AND NbtNs.AnswerCount > 0x0

    AND (IPv4.DestinationAddress < 10.1.0.0 OR IPv4.DestinationAddress > 10.1.255.255)

    AND (IPv4.DestinationAddress < 169.254.0.0 OR IPv4.DestinationAddress > 169.254.255.255)

    AND NbtNs.AnswerRecord.RRName.Name != "*<00><00><00><00><00><00><00><00><00><00><00><00><00><00><00>"

    Line-by-line this says: show all responses, which are authoritative answers, where there is more than 0 answers, where the server is not replying to a client in the server subnet (10.1.0.0/16), nor replying to APIPA-assigned addresses (169.254.0.0/16), and where the answer is not a response to a master browser announcement. While WINS uses port 42, that is for WINS server replication; WINS queries happen on 137/UDP.
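If you post-process a saved capture instead, the same filter logic can be prototyped as a predicate (the record fields below are illustrative, not Network Monitor's API):

```python
# Sketch: the capture filter above, re-expressed as a predicate for
# post-processing parsed WINS responses. Field names are illustrative.
import ipaddress

SERVER_NET = ipaddress.ip_network("10.1.0.0/16")     # the server subnet
APIPA_NET = ipaddress.ip_network("169.254.0.0/16")   # APIPA-assigned addresses
MASTER_BROWSER = "*" + "<00>" * 15                   # master browser announcement name

def interesting_answer(resp):
    dest = ipaddress.ip_address(resp["dest_ip"])
    return (resp["is_response"] and resp["authoritative"]
            and resp["answer_count"] > 0
            and dest not in SERVER_NET and dest not in APIPA_NET
            and resp["rr_name"] != MASTER_BROWSER)

resp = {"is_response": True, "authoritative": True, "answer_count": 1,
        "dest_ip": "10.2.33.7", "rr_name": "SOMESERVER"}
print(interesting_answer(resp))  # True - an answer sent to a client outside 10.1.0.0/16
```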

    We went through the results looking for names which weren’t in DNS. Which is like trying to find a straw in a great big stack of needles.

    Then we disabled the WINS entries in the DHCP scopes for the clients.

    Now we can see which clients are statically configured to use WINS. We’ll locate them first and correct them. Finding out exactly which host names they are relying on WINS for is still tricky, especially as the clients send out WINS and DNS queries simultaneously. But we’re on the right track.

    We can then focus the filter on the server subnets to locate servers which are configured to register records in WINS:

    (IPv4.SourceAddress > 10.1.0.0 AND IPv4.SourceAddress < 10.1.255.255)

    AND NbtNs.Flag.OPCode == 0x8

    AND NbtNs.NbtNsQuestionSectionData.QuestionName.Name != "CORP.CONTOSO.COM  "

    AND NbtNs.NbtNsQuestionSectionData.QuestionName.Name != "<01><02>__MSBROWSE__<02><01>"

    Which says: limit the traffic to source IP addresses within the server range (10.1.0.0 – 10.1.255.255) which are WINS Name Registration requests, but exclude domain browser election requests for the domain corp.contoso.com (the 2 spaces at the end are important), and also exclude master browser announcements. What remains are the servers still configured to register their records in WINS.

    I hope this helps you in your project to decommission WINS.

  • A backup server flooded by DPCs

    Hi,

    I’ve just finished working on a case with a customer that was so interesting that it deserved a blog post to round it off.

    These were the symptoms:

    Often, while logged in to the server, things would appear to freeze – no screen updates, little mouse responsiveness; if you could start a program (perfmon, Task Manager, Notepad etc.) you wouldn’t be able to type into it, and if you tried, the program would crash.

    This Windows Server 2008 R2 server runs TSM backup software with thousands of servers on the network sending their backup jobs to it. At any one time there could be hundreds of backup jobs running. The load was lower during the day, but it was always working hard dealing with constant backups of database snapshots from servers. The backup clients are Windows, UNIX, Solaris, you name it…

    When the server froze, you’d see 4 of the 24 logical CPUs lock at 100% and the other 20 CPUs would saw-tooth from locking at 100% to using 20-30%. The freeze would happen for minutes at a time.

    CPUs 0,2,4,6 locked at 100%, others saw-tooth

    There are 2 Intel 10Gb NICs in a team using Intel's teaming software. The team and the switches are set up with LACP to enable inbound load balancing and failover.

    By running perfmon remotely before the freeze happens we could see that the 4 CPUs that are locked at 100% are locked by DPCs. We used the counter “Processor Information\% DPC Time”.

    A DPC is best defined in Windows Internals 6th Ed. (Book 1, Chapter 3):

    A DPC is a function that performs a system task—a task that is less time-critical than the current one. The functions are called deferred because they might not execute immediately. DPCs provide the operating system with the capability to generate an interrupt and execute a system function in kernel mode. The kernel uses DPCs to process timer expiration (and release threads waiting for the timers) and to reschedule the processor after a thread’s quantum expires. Device drivers use DPCs to process interrupts.

    Because this is a backup server, we’re expecting that the bulk of our hardware DPCs will be generated by incoming network packets and raised by the NICs. Though they could have been coming from the tape library or the storage arrays.
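As a toy illustration of the ISR/DPC split described in that definition (pure pseudocode-in-Python, nothing like how the kernel is actually written):

```python
# Toy model of the ISR/DPC split: the interrupt service routine does the
# minimum work and queues a deferred routine; the kernel drains the DPC queue
# later, before returning to normal thread scheduling. Illustration only.
from collections import deque

dpc_queue = deque()
processed = []

def isr(packet):
    # ISR: acknowledge the device quickly, defer the heavy lifting to a DPC.
    dpc_queue.append(lambda: processed.append(packet.upper()))

def drain_dpc_queue():
    # The kernel runs queued DPCs before resuming ordinary threads.
    while dpc_queue:
        dpc_queue.popleft()()

for pkt in ["syn", "ack", "data"]:
    isr(pkt)                 # nothing heavy happens at interrupt time
drain_dpc_queue()            # the deferred work runs here
print(processed)             # ['SYN', 'ACK', 'DATA']
```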

    To look into what exactly is generating DPCs and how long the DPCs last for, we need to run Windows Performance Toolkit, specifically WPR.exe (Windows Performance Recorder). We have to do this carefully. We don’t want to increase the load of the server by capturing the Network and CPU activity of a server which already has high activity on the CPU and Network, and has shown a past history of crashing. But we want to run the capture while the server is in a frozen state. A tricky thing. So we ran this batch file:

    Start /HIGH /NODE 1 wpr.exe -start CPU -start Network -filemode -recordtempto S:\temp

    ping -n 20 127.0.0.1 > nul

    Start /HIGH /NODE 1 wpr.exe -stop S:\temp\profile_is_CPU_Network.etl

    If the server you are profiling has a lot of RAM (24GB or more), you’ll want to protect your non-paged pool from increasing and harming your server. To do that you should review this blog and add this switch to the start command: -start "C:\Program Files (x86)\Windows Kits\8.0\Windows Performance Toolkit\SampleGeneralProfileForLargeServers.wprp"

    We’re starting on NUMA node 1 as the NICs were bound to NUMA node 0 and the “Processor Information” perfmon trace we took earlier showed that the CPUs on NUMA node 0 were locked. We’re starting the recorder with a “high” prioritization so that we can be sure it gets the CPU time it needs to work. We’re not writing to RAM, we’re recording to disk in the hopes that if the trace crashes we’ll at least have a partial trace to use. We made sure that S: in this example was a SAN disk to ensure it had the required speed to keep up with the huge data we’re expecting. We’re pinging 20 times to make sure our trace is 20 seconds long. And finally we’re starting a trace of CPU and Network profiles.

    Note that to gather stacks we first had to disable the ability of the Kernel (aka the Executive) to send its own pages of memory from RAM out to the pagefile, where we cannot analyze them. To do this run wpr -disablepagingexecutive on and then reboot.

    We retrieved 3 traces in all:

      1. The first trace, to diagnose the problem
      2. The second trace, after 2 changes were made which removed about 50% of the problem
      3. The final trace, after the final change was made which removed the other 50% of the problem

        Diagnosis

        So this blog now becomes a short tutorial on how you can use WPA (Windows Performance Analyzer) to locate the source of DPC issues. WPA is a VERY powerful tool and diagnosing problems is part science, part art. Meaning that no two diagnoses are ever done in the same way. This is just how I used WPA in this case. For this analysis, you’ll need the debugging tools installed and symbols configured and loaded.

        CPU Usage (Sampled)\Utilization By CPU

        First I want to see which CPUs are pegged. For that we use “CPU Usage (Sampled)\Utilization By CPU”, then select a time range by right-clicking:

        Choose a round number (10 seconds in my example) as it makes it easier to quickly calculate how many things happened per minute when comparing to the graphs for the later scenarios:

        Select Time Range

        I chose 20 seconds to 30 seconds as it is a 10 second window where there was heavy load and not blips due to tracing starting or stopping. Then “Zoom” by right clicking again.

        Now all your graphs will be focused on that time range.

        Then shift-select the CPUs which are pegged. In this case it is CPUs 0, 2, 4 and 6. This is because the cores are Hyperthreaded and the NICs cannot interrupt a logical CPU which is the result of Hyperthreading (CPUs 1, 3, 5, 7 etc.). And they are low-numbered CPUs because they are located on NUMA node 0.

        Once they are selected, right-click and choose “Filter to Selection”:

        Filter to Selection

        Next we want to add a column for DPCs so we can see how much of the CPUs time was spent locked processing DPCs. To add columns, just right click on the column title bar (in the screen above this has “Line # | CPU || Count | Weight (in view) | Timestamp”) on the centre of the right hand pane and select the columns you want to display. Once the DPC/ISR column has been added, drag it to the left side of the yellow bar, next to the CPU column:

        Choose columns

        Expanding out the CPU items, we see that DPCs account for almost all of the CPU activity on these CPUs (the count figure for each CPU’s activity is 10 seconds of CPU time, and the count of CPU time for DPCs under this is over 9 seconds).

        DPC duration by Module, Function

        The next WPA graph we need is the one which can show how long the DPCs last for. We drag in the first graph under “DPC/ISR” called “DPC duration by Module, Function”:

        DPC duration by Module, Function

        In the far right column (“Duration”), we can see how long each module spends processing DPCs. This says that 36.8 seconds were spent on DPCs for NDIS.SYS alone. How can it be 36.8 seconds if the sample window is 10 seconds? Well, it is CPU seconds, and we have 24 CPUs, so we could potentially have 240 CPU seconds in all.
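That arithmetic can be checked quickly (using the figures reported in this trace):

```python
# Sanity-check the CPU-seconds arithmetic: a 10 second window across 24
# logical CPUs offers up to 240 CPU-seconds, and 36.8 s of NDIS DPC time
# concentrated on 4 CPUs means those CPUs spent ~92% of the window in DPCs.
window_s = 10
logical_cpus = 24
ndis_dpc_s = 36.8
pegged_cpus = 4

total_cpu_s = window_s * logical_cpus
share_of_pegged = ndis_dpc_s / (pegged_cpus * window_s)
print(total_cpu_s)                    # 240 CPU-seconds available in the window
print(round(share_of_pegged * 100))   # 92 (% of the 4 pegged CPUs' time)
```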

        The next biggest waiter for DPCs is storport.sys. But at 1 second, it’s not even close.

        The column with the blue text is called “Duration (Fragmented) (ms) Avg” and is the average time a DPC lasts for during this sample window. The NDIS.SYS DPCs last around 0.22 milliseconds, or 220 microseconds. The count of DPCs for NDIS and storport are comparatively similar (163,000 and 123,000 respectively), but because NDIS took so long on each DPC on average, it ended up locking the CPU for longer than storport did.
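Cross-checking those counts and averages against the Duration column (figures as reported by WPA in this trace):

```python
# Cross-check: DPC count x average duration should reproduce the totals seen
# in the Duration column for the two biggest sources.
ndis_count, ndis_avg_ms = 163_000, 0.22
storport_count, storport_total_s = 123_000, 1.0

ndis_total_s = ndis_count * ndis_avg_ms / 1000
print(round(ndis_total_s, 1))     # ~35.9 s, in line with the ~36.8 s shown for NDIS

storport_avg_us = storport_total_s / storport_count * 1_000_000
print(round(storport_avg_us, 1))  # ~8.1 us per storport DPC - well under 100 us
```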

        So let’s add the CPU column and move it to the left side of the yellow line, making it the first column to pivot on:

        Filter to busy CPUs

        We can see that our targeted CPUs 0, 2, 4, 6 have very high durations of DPC waits (using the last column, “Duration”, again) with no other CPU spending very much time in a DPC wait state. So we select these CPUs and filter.

        Expanding out the CPUs, we see that there are many different sources of DPCs, but that NDIS is really the biggest source of DPC waits. So we will now move the “Module” column to be the left-most column and remove the CPU column from view. We then right click on NDIS.SYS and “Filter to Selection” again as we only want to focus on DPCs from NDIS on CPUs 0, 2, 4, 6:

        Filter to NDIS

        One function, ndisInterruptDPC, is causing our DPC waits. This is the one we’ll focus on. If we expand it, it will list every single DPC and how long each wait is. Select every one of these rows by scrolling to the very bottom of the table (in this example there are 163,230 individual DPCs):

        Copy Column Selection

        Right click on the column called “Duration” and choose “Copy Other” and then “Copy Column Selection”. This will copy only the values in the “Duration” column. We can paste this into Excel and create a graph which shows the duration of the DPCs as a function of the number of DPCs present:

        Taken from Excel

        I have added a red line at 0.1 milliseconds because, according to the hardware development kit for driver manufacturers, a DPC should not last longer than 100 microseconds. That means DPCs above the red line are misbehaving, and they are the bulk of our time spent waiting on DPCs.
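Once the Duration column is in hand, the share of misbehaving DPCs can be computed directly (a sketch; the duration values below are illustrative stand-ins for the real exported column):

```python
# Sketch: given the copied Duration column, compute how many DPCs exceed the
# 100 us (0.1 ms) guideline and how much of the total DPC time they represent.
def misbehaving_share(durations_ms, limit_ms=0.1):
    over = [d for d in durations_ms if d > limit_ms]
    return len(over) / len(durations_ms), sum(over) / sum(durations_ms)

durations_ms = [0.05, 0.08, 0.25, 0.30, 0.42]   # illustrative values only
count_share, time_share = misbehaving_share(durations_ms)
print(f"{count_share:.0%} of DPCs are over the limit")  # 60% in this toy data
```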

        So, we have established that we have slow DPCs on NDIS, and lots of them, and that they are locking our 4 CPUs. Our NICs aren’t able to spread their DPCs to any other CPUs and Hyperthreading isn’t really helping our specific issue. But what is causing the networking stack to generate so many slow DPC locks?

        DPC/ISR Usage by Module, Stack

        The final graph in WPA will show us this. From the category “CPU Usage (Sampled)”, drag in a graph called “DPC/ISR Usage by Module, Stack”. Filter to DPC (which will exclude ISRs) and our top candidates are:

        DPC/ISR Usage by Module, Stack

        1. ntoskrnl.exe (the Windows Kernel)
        2. NETIO.SYS (Network IO operations)
        3. tcpip.sys (TCP/IP)
        4. NDIS.SYS (Network layer standard interface between OS and NIC drivers)
        5. IDSvia64.sys (Symantec Intrusion Detection System)
        6. ixn62x64.sys (Intel NIC driver for NDIS 6.2, x64)
        7. iansw60e.sys (Intel NIC teaming software driver for NDIS 6.0)

        To see what these are doing, we simply expand the stack columns by clicking the triangle on the row with the highest count, looking for informative driver names and for a large drop in the count, which indicates that a particular function is consuming CPU time.

        NTOSKRNL is running high because we are capturing. The kernel is spending time gathering ETL data. This can be ignored.

        NETIO is redirecting network packets to/from tcpip.sys for a function called InetInspectReceive:

        NETIO.sys stack expansion

        TCP/IP is dealing with the NETIO commands above to do this “Receive Inspection”:

        TCPIP.SYS stack expansion

        NDIS.SYS is spending its time in 2 main functions in tcpip.sys: TcpTcbFastDatagram and, again, InetInspectReceive:

        NDIS.SYS stack expansion

        Other than ntoskrnl, these 3 Windows networking drivers all have entries for the drivers listed as 5, 6 and 7 above in their stacks.

        Diagnosis Summary

        Lots of slow DPCs are being generated by 3 probable sources:

        1. Incoming packet inspection by the Symantec IDS system.
           • The IDS has to take every packet, compare it against its signature definitions and, if the packet is clean, allow it to pass. This work is causing slow DPCs.
        2. The NIC driver could be stale or buggy and generating slow DPCs.
           • There is no evidence for this, but it’s usually a good place to start. There could also be TCP offloading or acceleration features in the NIC and/or driver which haven’t been enabled but may improve network performance.
        3. The NIC teaming software is getting in between the NICs and the CPUs.
           • That is, after all, the job of NIC teaming software: to trick Windows into thinking that the incoming packets from 2 distinct hardware devices are actually coming from 1 device. The problem here, however, is that this insertion into the networking stack is pure software, and it is likely causing very slow DPCs.

        Action Plan

        Our actions were to make changes over 2 separate outage windows:

        1. Update the NIC driver and enable Intel I/OAT in the BIOS of the server.
           • I/OAT is described in the spec sheet for the NIC like this: “When enabled within multi-core environments, the Intel Ethernet Server Adapter X520-T2 offers advanced networking features. Intel I/O Acceleration Technology (Intel I/OAT), for efficient distribution of Ethernet workloads across CPU cores. Load balancing of interrupts using MSI-X enables more efficient response times and application performance. CPU utilization can be lowered further through stateless offloads such as TCP segmentation offload, header replications/splitting and Direct Cache Access (DCA).”
        2. Uninstall the NIC teaming software.
           • 3rd party NIC teaming software inhibits many TCP offloading features, and in this case it was generating large numbers of slow DPCs.
        3. On the second outage, uninstall the IDS system.
           • IDS was not configured on this (or any other) server, but because the software had the potential to become enabled, it was grabbing every incoming packet for inspection, despite not being configured to inspect packets or act on violations in any way. Stopping the service is insufficient: the driver must be removed from the hidden, non-Plug and Play section of Device Manager. Manually removing the driver isn’t sufficient either, as the software will reinstall it at the next boot. Only a full uninstall will do.

        After dissolving the NIC Team

        Here is what the picture looked like after we dissolved the NIC team, updated the NIC driver and enabled Intel I/OAT in the BIOS.

        DPC duration - No teaming, I/OAT enabled

        In this 10 second sample we can see that the 4 CPU cores are still effectively locked, as the CPU time due to NDIS DPCs is 37.7 seconds (out of a possible maximum of 40 seconds). The number of DPCs has decreased by more than half, to 55,000, meaning that the average DPC duration has become very long at 682 microseconds – triple the average from before we removed the NIC team and enabled I/OAT.
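        The average is just total DPC CPU time divided by DPC count. A quick sanity check on the figures above (the counts quoted are rounded, so the result lands near the 682 microsecond figure):

        ```python
        # Average NDIS DPC duration in the 10-second sample after the first change:
        # 37.7 seconds of DPC CPU time spread over roughly 55,000 DPCs.
        dpc_cpu_seconds = 37.7
        dpc_count = 55_000

        avg_us = dpc_cpu_seconds / dpc_count * 1_000_000  # convert seconds to microseconds
        print(f"Average DPC duration: {avg_us:.0f} microseconds")  # ~685 us
        ```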

        Taken from Excel

        The blue area of the graph above is the picture we had from before changes were made. The pink/orange area is the picture of DPC durations after removing NIC teaming and enabling I/OAT.

        So why did the average duration of DPCs get longer?

        It could be that the IDS software no longer needs to relinquish its DPCs to make room on the same CPU cores for the DPCs of the NIC teaming driver. These 2 drivers must have been locked to the same CPUs. With no competing DPC of equal priority, the IDS DPCs are free to use the CPU for longer periods of time before being forced off.

        At any rate, it certainly isn’t fixed yet.

        After uninstalling Symantec IDS

        And finally here’s what the picture looked like after we uninstalled the IDS portion of the Symantec package. Remember, this service was not configured to be enabled in any way.

        DPC duration - no IDS

        You can see that the average time has dropped from 220 microseconds to 90 microseconds – below the 100 microsecond threshold required by the Driver Development Kit.

        In this 10 second sample there were 127,000 DPCs from NDIS on the 4 heavily used CPUs, but the CPU time they consumed was 11 seconds, a reduction from 36.8 seconds.
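        Running the same arithmetic on the post-IDS-removal numbers confirms the average is now under the guideline (the quoted counts are rounded, which is why the text says roughly 90 microseconds):

        ```python
        # Average NDIS DPC duration after uninstalling the IDS: 11 seconds of DPC
        # CPU time over roughly 127,000 DPCs in the 10-second sample.
        dpc_cpu_seconds = 11.0
        dpc_count = 127_000

        avg_us = dpc_cpu_seconds / dpc_count * 1_000_000
        print(f"Average DPC duration: {avg_us:.0f} microseconds")  # ~87 us, under the 100 us limit
        ```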

        Taken from Excel

        The blue area of the graph above is the picture we had from before changes were made. The pink/orange area is the picture of DPC durations after removing NIC teaming and enabling I/OAT. And the green area is the picture after IDS is removed.

        This is a dramatic improvement. Nearly all DPCs are below the 100 microsecond limit. The system is able to process the incoming load without locking up for high priority, long lasting DPCs.

        What about RSS?

        We’re not quite done, though. 4 of our CPUs are still working very hard, often pegged at 100%. But why only 4? This is a 2-socket system with 6 cores on each socket, giving us 12 CPUs on which we could run DPCs. DPCs from one NIC are bound to one NUMA node, and since we already dissolved our NIC team we only have 1 NIC in action, limiting us to 6 cores. RSS can spread DPCs over a power-of-2 number of CPUs: 1, 2, 4, 8, 16 or 32 cores. The largest power of 2 that fits in 6 cores is 4, so we can use at most 4 CPUs per NIC.
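        A tiny sketch of that limit: RSS spreads DPCs over a power-of-2 number of CPUs, and with one NIC bound to one 6-core NUMA node, only the largest power of 2 that fits in 6 is usable.

        ```python
        # Largest power-of-2 CPU count that fits within one NUMA node,
        # which is the most RSS can use for a single NIC bound to that node.
        cores_per_numa_node = 6

        rss_cpus = 1
        while rss_cpus * 2 <= cores_per_numa_node:
            rss_cpus *= 2

        print(rss_cpus)  # 4 -> at most 4 of the node's 6 cores service RSS DPCs
        ```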

        To scale out, we would need to add more NICs and limit RSS on each of those NICs to 2 cores. We’d need to bind 3 NICs to NUMA node 0 and 3 to NUMA node 1, and set the starting CPUs for those NICs to cores 0, 2, 4, 6, 8 and 10. In that way we can saturate every possible core.
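        That layout can be tabulated. A sketch of the proposed plan for this 2-socket, 6-cores-per-socket box (the NIC names are hypothetical placeholders):

        ```python
        # Proposed scale-out layout: 6 NICs, 2 RSS cores each, 3 NICs per NUMA node,
        # base processors staggered so every core 0..11 is covered.
        cores_per_node = 6
        rss_cores_per_nic = 2
        nics = [f"NIC{i}" for i in range(6)]  # hypothetical names

        plan = []
        for i, nic in enumerate(nics):
            base_proc = i * rss_cores_per_nic        # 0, 2, 4, 6, 8, 10
            numa_node = base_proc // cores_per_node  # NICs 0-2 -> node 0, NICs 3-5 -> node 1
            plan.append((nic, numa_node, base_proc, rss_cores_per_nic))

        for nic, node, base, n in plan:
            print(f"{nic}: NUMA node {node}, base CPU {base}, {n} RSS CPUs")
        ```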

        But to do this, we’d need to ensure that we can run multiple NICs without the teaming software, which means assigning each NIC a unique IP address. To do that, we need to make sure that the TSM clients can deal with targeting a server name that has multiple IP addresses in DNS, and that if connectivity to the first IP address is lost, TSM can fail over to one of the other addresses. We’ll test TSM and get back with our results later.

        But we need one more fundamental check before doing that: we need to make sure that an incoming packet, hitting a specific NUMA node and core, is going to end up hitting the right thread of the TSM server where that packet is going to be dealt with and backed up. If we can’t align a backup client to the incoming NIC, and align that NIC to the backup software thread that should process it, then we’ll be causing inter-processor interrupts, or worse yet, cross-NUMA interrupts. This would make the entire system much less scalable.

        image

        So this is how this would all look. The registry value for binding a NIC to a NUMA node is “*NumaNodeId” (including the * at the start). To set the base CPU, use “*RssBaseProcNumber”. To set the maximum number of processors to use, set “*MaxRssProcessors”.
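        As a sketch, the per-NIC values could be scripted. Note two assumptions here: “*MaxRssProcessors” is the standard NDIS keyword for the RSS processor count (the post lists the base-processor key twice, which looks like a typo), and the “0007” registry subkey index is hypothetical, as each NIC’s actual subkey under the network class key must be identified first.

        ```python
        # Illustrative only: build the per-NIC NDIS advanced-keyword values.
        # The subkey index "0007" is a placeholder - find the real one for each NIC
        # under the network adapters class key before running anything like this.
        settings = {
            "*NumaNodeId": "0",          # NUMA node to bind the NIC to
            "*RssBaseProcNumber": "2",   # first CPU that RSS may use
            "*MaxRssProcessors": "2",    # how many CPUs RSS may spread over
        }

        nic_subkey = (r"HKLM\SYSTEM\CurrentControlSet\Control\Class"
                      r"\{4d36e972-e325-11ce-bfc1-08002be10318}\0007")

        for name, value in settings.items():
            print(f'reg add "{nic_subkey}" /v {name} /t REG_SZ /d {value} /f')
        ```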

        These keys are explained here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff570864(v=vs.85).aspx

        and here: Performance Tuning Guidelines for Windows Server 2008 R2

        And more general information on how RSS works in Windows Server 2008 is here: Scalable Networking- Eliminating the Receive Processing Bottleneck—Introducing RSS

        Our problem in the above picture, however, is that our process doesn’t know to run its threads on the NUMA node and cores where the incoming packets are arriving. Had this been SQL server, we could have run separate instances configured to start using specific CPUs. Hopefully, one day, TSM will operate like this and become NUMA-node aware.

        I know this has been a long post, but for those who have read down to here, I do hope this has helped you with your troubleshooting using WPT.

      • Getting started with Storage Replica in Windows Server Technical Preview

        Storage Replica (SR) is a new feature that enables storage-agnostic, block-level, synchronous replication between servers for disaster recovery, as well as stretching of a failover cluster for high availability. Synchronous replication enables mirroring of data in physical sites with crash-consistent volumes ensuring zero data loss at the file system level. Asynchronous replication allows site extension beyond metropolitan ranges with the possibility of data loss.

        Ned Pyle, the Product Manager for Storage Replica, has written a great “getting started” guide here:

        http://social.technet.microsoft.com/Forums/windowsserver/en-US/f843291f-6dd8-4a78-be17-ef92262c158d/getting-started-with-windows-volume-replication?forum=WinServerPreview

        I got mine going after adding the Windows Storage Replication feature in Server Manager:

        image

        It’s configured in Failover Clustering:

        image

        I’m working with a customer who is really excited that in-box volume replication has come to Windows Server. It’s going to be interesting to discover best practices and ideal use cases for Storage Replica as we get closer to the final release.