
October 2008

  • Troubleshooting KCC Event Log Errors

    My name is David Everett and I’m a Support Escalation Engineer on the Directory Services Support team.

    I’m going to discuss a recent trend I’ve seen where Active Directory replication appears to be fine, but a single DC in one (or more) sites begins logging Knowledge Consistency Checker (KCC) Warning and Error events in the Directory Service event log. I have included sample events below.

    For those not familiar with the KCC, it is a distributed application that runs on every domain controller. The KCC creates the connections between domain controllers that collectively form the replication topology. It uses Active Directory data to determine where (from which source domain controller to which destination domain controller) to create these connections.

    In some cases these errors are logged constantly; in others they are logged at regular intervals, clear on their own, and then reappear like clockwork. Typically, other DCs in the same site(s), perhaps even in the whole forest, report no KCC errors at all. In some cases the DC logging these errors has a small number of connection objects compared with its peer DCs in the same site:

    Event Type: Warning
    Event Source: NTDS KCC
    Event Category: (1)
    Event ID: 1566
    Date: 5/14/2008
    Time: 1:51:23 PM
    Computer: DC1X
    All domain controllers in the following site that can replicate the
    directory partition over this transport are currently unavailable.

    Directory partition:
    CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=contoso,DC=com


    Event Type: Error
    Event Source: NTDS KCC
    Event Category: (1)
    Event ID: 1311
    Date: 5/14/2008
    Time: 1:51:23 PM
    Computer: DC1X
    The Knowledge Consistency Checker (KCC) has detected problems with the
    following directory partition.

    Directory partition:

    There is insufficient site connectivity information in Active Directory
    Sites and Services for the KCC to create a spanning tree replication topology.
    Or, one or more domain controllers with this directory partition are unable
    to replicate the directory partition information. This is probably due to
    inaccessible domain controllers.

    User Action
    Use Active Directory Sites and Services to perform one of the following
    - Publish sufficient site connectivity information so that the KCC can
    determine a route by which this directory partition can reach this site. This is
    the preferred option.
    - Add a Connection object to a domain controller that contains the directory
    partition in this site from a domain controller that contains the same
    directory partition in another site.

    If neither of the Active Directory Sites and Services tasks correct this
    condition, see previous events logged by the KCC that identify the
    inaccessible domain controllers.

    In some cases this event is also seen; it suggests name resolution is working but a network port is blocked:

    Event Type: Warning
    Event Source: NTDS KCC
    Event Category: (1)
    Event ID: 1865
    Date: 5/14/2008
    Time: 1:51:23 PM
    Computer: DC1X
    The Knowledge Consistency Checker (KCC) was unable to form a complete
    spanning tree network topology. As a result, the following list of sites
    cannot be reached from the local site.


    If you encounter this issue, it could be that the DC logging the errors holds the Intersite Topology Generator (ISTG) role for its site. The ISTG is responsible for maintaining all of the inter-site connection objects for the site. It polls each DC in its site for connection objects that have failed, and if failures are reported by the peer DCs, the ISTG logs these events to indicate that something is wrong with connectivity.

    For those wondering what these events mean here is a quick rundown:

    • The 1311 event indicates the KCC could not connect all the sites into a spanning tree replication topology.
    • The 1566 event indicates the DC could not replicate from any server in the site identified in the event description.
    • When logged, the 1865 event contains secondary information about the failure to connect the sites, and it lists which sites are disconnected from the site where the KCC errors are occurring.

    Ok, I’ll get to the point and explain how to identify the root cause and correct it. These errors point to a topology or connectivity issue: either there are not enough site links to connect all the sites or, more likely, network connectivity is failing for one of a number of reasons.

    If your network is not fully routed (that is, not every DC in the forest can perform an RPC bind to every other DC in the forest), make certain Bridge All Site Links (BASL) is unchecked. If BASL is unchecked, Site Links and/or Site Link Bridges must be configured. Site Links and Site Link Bridges provide the KCC with the information it needs to build connections over existing network routes. If the network is fully routed and BASL is checked, no topology changes are needed.

    While the network routes may exist, the ports needed for Active Directory replication must not be restricted.

    The assumption in this post is that these errors continue to be logged even though the site listed in the 1566 event has been added to a site link object and the AD topology is correctly configured.

    To locate the source of the KCC events and identify the root cause, run the following commands while the KCC events are being logged.

    1) Identify the ISTG covering each site by running this command:

    repadmin /istg

    The output will list all sites in the forest and the ISTG for each site:

    repadmin running command /istg against server localhost

    Gathering topology from site Default-First-Site-Name (

         Site                ISTG
    ================    ================
         SiteX               DC1X
         SiteY               DC1Y

    NOTE: Determine from the output whether the DC logging these events (DC1X) is the ISTG.

    2) If the DC logging the events is the ISTG, any one of the DCs in the same site as the ISTG could have connectivity issues to the site identified in the 1566 event. You can identify which DC(s) are failing to replicate from that site by running a command that targets all DCs in the site where the ISTG resides. For example, DC1X is logging the events and is the ISTG for siteX. To identify which DCs in siteX are failing to replicate from siteY, run this command:

    repadmin /failcache site:siteX >siteX-failcache.txt

    The failcache output shows two DCs in siteX:

    repadmin running command /failcache against server

    ==== KCC CONNECTION FAILURES ===========================
    (none)

    ==== KCC LINK FAILURES =================================
    SiteY\DC1Y
        DC object GUID: 7c2eb482-ad81-4ba7-891e-9b77814f7473
        No Failures.

    repadmin running command /failcache against server

    ==== KCC CONNECTION FAILURES ===========================
    (none)

    ==== KCC LINK FAILURES =================================
    SiteY\DC1Y
        DC object GUID: 7c2eb482-ad81-4ba7-891e-9b77814f7473
        46 consecutive failures since 2008-08-12 22:14:39.
    SiteZ\DC1Z
        DC object GUID: fh3h8bde-a928-466a-97b0-39a507acbe54
        No Failures.

    The output above identifies the Destination DC (DC2X) in siteX that is failing to inbound replicate from siteY. In some cases the DC name is not resolved and shows as a GUID. If the DC name is not resolved, determine the hostname of the Destination DC by pinging the fully qualified CNAME, which is the DC object GUID from the output followed by ._msdcs.<forest root domain> (for example, 7c2eb482-ad81-4ba7-891e-9b77814f7473._msdcs.contoso.com).


    NOTE: DC2X may or may not be logging Error events in its Directory Services event log the way DC1X, the ISTG, is.
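    As a side note, if you have many sites to check, the failcache text can be tallied by script. Here is a hypothetical Python sketch (not a Microsoft tool, and not from the original post) that assumes the output format shown above and pulls out only the links reporting consecutive failures:

```python
import re

def failing_links(failcache_text):
    """Pull (source DC, failure count) pairs out of repadmin /failcache
    output -- only the entries reporting consecutive failures.

    Assumes the textual format shown above: a 'Site\\DC' line, then a
    'DC object GUID:' line, then either 'No Failures.' or
    'N consecutive failures since ...'.
    """
    pattern = re.compile(
        r"(\S+\\\S+)\s*"                    # Site\DC line
        r"DC object GUID: \S+\s*"           # GUID line
        r"(\d+) consecutive failures",      # only failing entries match
        re.DOTALL)
    return pattern.findall(failcache_text)

sample = """\
==== KCC LINK FAILURES =================================
SiteY\\DC1Y
    DC object GUID: 7c2eb482-ad81-4ba7-891e-9b77814f7473
    46 consecutive failures since 2008-08-12 22:14:39.
SiteZ\\DC1Z
    DC object GUID: fh3h8bde-a928-466a-97b0-39a507acbe54
    No Failures.
"""
print(failing_links(sample))   # [('SiteY\\DC1Y', '46')]
```

    Feed it the saved siteX-failcache.txt and only the problem links are left to investigate.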

    3) Log on to the Destination DC identified in the previous step and determine whether RPC connectivity from the Destination DC to the Source DC (DC1Y) is working:

    repadmin /bind DC1Y

    • If “repadmin /bind DC1Y” from the Destination DC succeeds:

    Run “repadmin /showrepl <Destination DC>” and examine the output to determine if Active Directory Replication is blocked. The reason for replication failure should be identified in the output. Take the appropriate corrective action to get replication working.

    • If “repadmin /bind DC1Y” from the Destination DC fails:

    Verify firewall rules are not interfering with connectivity between the Destination DC and the Source DC. If the port blockage between the Destination DC and the Source DC cannot be resolved, configure the other DCs in the site where the errors are logged to be Preferred Bridgeheads and force the KCC to build new connection objects with the Preferred Bridgeheads only.

    NOTE: Running "repadmin /bind DC1Y" from the ISTG logging the KCC errors may reveal no connectivity issues to DC1Y in the remote site. As noted earlier, the ISTG is responsible for maintaining inter-site connectivity and may not be the DC having the problem. For this reason, the command must be run from the Destination DC that repadmin /failcache identified as failing to inbound replicate.

    A successful bind looks similar to this:

    C:\>repadmin /bind DC1Y
    Bind to DC1Y succeeded.
    NTDSAPI V1 BindState, printing extended members.
        bindAddr: DC1Y
    Extensions supported (cb=48):
        BASE                             : Yes
        ASYNCREPL                        : Yes
        REMOVEAPI                        : Yes
        MOVEREQ_V2                       : Yes
        GETCHG_COMPRESS                  : Yes
        DCINFO_V1                        : Yes
        RESTORE_USN_OPTIMIZATION         : Yes
        KCC_EXECUTE                      : Yes
        ADDENTRY_V2                      : Yes
        LINKED_VALUE_REPLICATION         : Yes
        DCINFO_V2                        : Yes
        CRYPTO_BIND                      : Yes
        GET_REPL_INFO                    : Yes
        STRONG_ENCRYPTION                : Yes
        DCINFO_VFFFFFFFF                 : Yes
        TRANSITIVE_MEMBERSHIP            : Yes
        ADD_SID_HISTORY                  : Yes
        POST_BETA3                       : Yes
        GET_MEMBERSHIPS2                 : Yes
        NONDOMAIN_NCS                    : Yes
        GETCHGREQ_V8 (WHISTLER BETA 1)   : Yes
        XPRESS_COMPRESSION               : Yes
        DRS_EXT_ADAM                     : No
    Site GUID: stn45bf5-f33f-4d53-9b1b-e7a0371f9a3d
    Repl epoch: 0
    Forest GUID: idk4734-eeca-11d2-a5d8-00805f9f21f5
    Security information on the binding is as follows:
        SPN Requested:  LDAP/DC1Y
        Authn Service:  9
        Authn Level:  6
        Authz Service:  0

    4) If these events occur at specific periods of the day or week and then resolve on their own, verify DNS Scavenging is not set too aggressively. It could be that DNS Scavenging is so aggressive that SRV, A, CNAME, and other valid records are purged from DNS, causing name resolution between DCs to fail. If this is the behavior you are seeing, verify scavenging settings on these DNS zones:

    • Scavenging settings need to be checked on child domains if the Source or Destination DCs are in child domains.

    Example: if Scavenging is set this way, the outage will occur every 24 hours:

    Non-refresh period: 8 hours
    Refresh period: 8 hours
    Scavenging period: 8 hours

    To correct this, change the Refresh and Non-refresh periods to 1 day each and set the Scavenging period to 3 days. See "Managing the aging and scavenging of server data" on TechNet for how to configure these settings for the DNS server and/or zones.
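    To see why those settings produce a daily outage, here is a rough sketch (in Python, not part of the original post) of the timing math: a record becomes eligible for deletion once the Non-refresh and Refresh periods have both elapsed since its last refresh, and the scavenging cycle can add up to one more period on top of that.

```python
def scavenge_window_hours(non_refresh, refresh, scavenging_period):
    """Earliest and latest time (in hours after a record's last refresh)
    at which a DNS scavenging pass can delete the record."""
    earliest = non_refresh + refresh            # record now eligible
    latest = earliest + scavenging_period       # worst case: just missed a pass
    return earliest, latest

# The aggressive settings from the example above: 8 + 8 + 8 hours.
print(scavenge_window_hours(8, 8, 8))       # (16, 24) -- gone within a day
# The suggested fix: 1-day intervals with a 3-day scavenging period.
print(scavenge_window_hours(24, 24, 72))    # (48, 120)
```

    With 8-hour periods all around, a record that stops being refreshed can vanish in as little as 16 hours and no more than 24, which matches the clockwork daily outages described above.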

    Hopefully this clears up the mysterious KCC errors on that one DC.

    - David Everett

  • Port Exhaustion and You (or, why the Netstat tool is your friend)

    Hi, David here. Today I wanted to talk about something that we see all the time here in Directory Services, but that doesn’t usually get a lot of press. It’s a condition we call port exhaustion, and it’s a problem that will cause TCP and UDP communications with other machines over the network to fail.

    Port exhaustion can cause all kinds of problems for your servers. Here’s a list of some symptoms:

    - Users won’t be able to connect to file shares on a remote server
    - DNS name registration might fail
    - Authentication might fail
    - Trust operations might fail between domain controllers
    - Replication might fail between domain controllers
    - MMC consoles won’t work or won’t be able to connect to remote servers.

    That’s just a sample of the most common symptoms that we see. But here’s the big one: You reboot the server(s) involved, and the problem goes away - temporarily. A few hours or a few days later, it comes back.

    So what is port exhaustion? You might think that it’s where the ports on the computer get tired and just start responding slower over time – but, well, computers aren’t human, and they certainly aren’t supposed to get tired. The truth is much more insidious. What port exhaustion really means is that we don’t have any more ports available for communication.

    Now, some administrators out there are going to suspect a memory leak of some kind when this problem happens, and it’s true that memory leaks can cause the same type of issues (I’ll explain why in a moment). But usually we find that most of the time, memory isn’t the issue, and you can end up trying to troubleshoot memory problems that aren’t there.

    In order to understand port exhaustion, you need to first understand that everything I listed above requires servers to be able to initiate outbound connections to other servers. It’s the word outbound that’s important. We usually think of network connectivity requirements in inbound terms – our clients need to connect to a server on a specific TCP or UDP port, like port 80 for web browsing or port 445 for file shares (SMB). But we very rarely think about the other side of that, which is that the communication has to have a source port available to use.

    As you might know, there are 65,535 ports available for TCP and UDP connections in TCP/IP. The first 1,024 of those are reserved for specific services and protocols to use as senders or listeners. For example, DHCP requests always come from port 68 on a client, and the DHCP service (the server component) always listens on port 67. That means these services listen on well-known ports for inbound communications. Beyond that, ports get dynamically assigned to services and applications for either inbound or outbound use as needed. A port can normally only do one thing – we can either use it to listen for connections from other machines on the network, or we can use it to initiate connections to other machines on the network, but we usually can’t do both (some services cheat and use ports bi-directionally, but this is relatively rare).
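    To make the "source port" idea concrete, here is a small illustrative Python sketch (not part of the original post): the client code never asks for a source port, but the OS must hand one out from the dynamic range for every outbound connection it makes.

```python
import socket

# Stand up a throwaway listener on localhost just so we have
# something to connect to (this plays the "server" role).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # port 0 = let the OS pick one
listener.listen(1)
server_port = listener.getsockname()[1]

# An outbound connection: we only name the destination. The OS must
# still assign a source port from the dynamic (ephemeral) range --
# and it is exactly these ports that run out during port exhaustion.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", server_port))
source_port = client.getsockname()[1]

print(f"outbound connection used dynamic source port {source_port}")

client.close()
listener.close()
```

    Run it a few times and you'll see a different high-numbered source port each time; every one of those is a slot that a leaking application could hold onto and never give back.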

    So 65535–1024 is still 64511 ports. That’s a lot! We should almost never run out, right? You’d think so, but there’s another limitation here that you might not be aware of, and that limitation is that we don’t actually use the full range of ports for any dynamic communications. Dynamic communication is any sort of network communication that doesn’t already have a port specifically reserved for sending or receiving it – in other words, the vast majority of network traffic that a Windows computer generates.

    By default in the Windows operating system, we only have a limited number of ports available for outbound communications. We sometimes call these user ports, because user-mode processes are what we really expect to be using these things most often. For example, when you connect to a file server to access a file, you’re connecting to (usually) either port 445 or port 139 on the other side to retrieve that file. However, in order to negotiate the session, you need a client port on your computer to use for this, and so the application making the connection (Windows Explorer, in the case of browsing files) gets a dynamically-assigned port to use.

    Since we only have a limited number of ports available by default, you can run out of them – and when you run out, you’re no longer able to make new outbound connections from your computer to other computers on the network. This can cause an awful lot of communication to break down – including the communication that’s needed to authenticate users with Kerberos.

    In Windows XP/2003 (and earlier), the dynamic port range we used for this was 1025-5000 by default. So you had a little less than 4,000 ports available for outbound network communication. Ports above that range were generally reserved for application listeners. In Windows Vista and 2008, we changed that range to be more in line with IANA recommendations. If you’re curious, you can read the KB article here. The upshot of the changes is that we actually have a larger default dynamic range in Vista and 2008, but we also messed up everyone who’s ever configured internal firewalls to block high ports (which, by the way, is something we don’t recommend doing on an internal network). Either way, the end result is that you’ve got a few more ports available to use by default in Vista and 2008.

    Even so, it’s still possible to run out of ports. And when this happens, communication starts to break down. We run into this scenario a lot more often than you might think, and it causes the types of issues I detailed above. 99% of the time when someone has this problem, it happens because an application has been grabbing those ports and not releasing them properly. So, over time, it uses up more and more ports from the dynamic range until we run out.

    In most networks there are potentially dozens, if not hundreds, of different applications that might be communicating with other servers over the network – security tools, management and monitoring tools, line of business applications, internal server processes, and so on. So when you have a problem like this, narrowing down which application is causing the problem can be a challenge. Fortunately, there are a couple of tools that make this easier, and the best part is, they come with the operating system.

    The first tool is NETSTAT. Netstat queries the network stack and shows you the state of your network connection, including the ports you’re using. Netstat can tell you which ports are in use, where the communication is going, and what application has the port open.

    Another cool tool is Port Reporter. Port Reporter does everything that Netstat does, but it runs in real time rather than providing the point-in-time snapshot that Netstat does. Netstat is included in Windows, but you can download Port Reporter for free from our website. (All my examples in this blog will use Netstat.)

    So, if you suspect that you might have a port exhaustion problem, then you’d want to run this command:

    netstat -anob > netstat.txt

    This runs Netstat and dumps the output to a text file. You’d want to use a text file since trying to look at the output inside a command prompt is a quick way to give yourself a migraine. Once you’ve done this, you can examine the text file output, and you’ll be able to see what processes are using up ports. What you want to look for is entries where the same process is using a lot of ports on the machine. That is the most likely culprit.

    Here’s an example of what you get with netstat (I’ve snipped it for length):

    C:\Windows\System32>netstat -ano


    Notice that you can see the port you’re using locally, the one you’re talking to remotely, and what the state of the connection is. You can also get the process ID (that’s the -o switch in the netstat command), and you can even have netstat try to grab the name of the process (use netstat -anob).

    C:\Windows\System32>netstat -anob


    What you’re looking for in the output is a single process that is using up a large number of ports locally. So for example, on my machine above we can see that PID 608 is using several ports. Usually what will happen when you run into port exhaustion is that you will see that one (or two) processes are using 90-95% of the dynamic range. The other piece of information to look at is where they’re talking to remotely, and what the state of the connection is. So, if you see a process that’s using up a lot of ports, talking to a single remote address or several remote addresses, and the state of the connection is something like TIME_WAIT, that’s usually a dead giveaway that this process is having a problem and not releasing those ports properly.
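    If you'd rather not eyeball a huge text file, a quick script can do the tallying for you. This is a hypothetical Python sketch (not a Microsoft tool) that assumes the standard Windows `netstat -ano` column layout and counts how many local ports each PID is holding:

```python
from collections import Counter

def ports_per_pid(netstat_text):
    """Count how many local ports each PID holds in `netstat -ano` output.

    Assumes the Windows column layout, where the PID is the last column:
      Proto  Local Address  Foreign Address  State  PID   (TCP)
      Proto  Local Address  Foreign Address  PID          (UDP)
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        if not parts or parts[0] not in ("TCP", "UDP"):
            continue                      # skip headers and blank lines
        counts[parts[-1]] += 1            # PID is the last column
    return counts

# A hypothetical snippet of netstat output, mimicking the leak pattern
# described above (one PID piling up TIME_WAIT connections):
sample = """\
  TCP    10.0.0.5:49201   10.0.0.9:445    TIME_WAIT    608
  TCP    10.0.0.5:49202   10.0.0.9:445    TIME_WAIT    608
  TCP    10.0.0.5:135     0.0.0.0:0       LISTENING    900
"""
for pid, n in ports_per_pid(sample).most_common():
    print(pid, n)
```

    A real netstat.txt fed through this will surface the one or two PIDs sitting on most of the dynamic range, which is exactly the smoking gun described above.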

    Once you have this information, you can usually get things working again by turning off the offending process – but that’s only a temporary fix. Odds are, whatever was causing the problem was a legitimate piece of software that you want to have running. Usually when you get to this stage we recommend contacting the vendor of that application, or taking a look at whatever other servers the application might be communicating with, in order to get a permanent fix.

    I mentioned above that memory leaks can cause this behavior too – why is that exactly? What happens is that in order to get a port to use for an outbound connection, processes need to acquire a handle to that port. That handle comes out of non-paged pool memory. So, if you have a memory leak, and you run out of non-paged pool, processes that need to talk to other machines on the network won’t be able to get the handle, and therefore won’t be able to get the port they need. So if you’re looking at that Netstat output and you’re just not seeing anything useful, you might still have a memory issue on the server.

    At this point you really should be contacting us, since finding and fixing it is going to require some debugging. Cases that get this far are rare however, and most of the time, the Netstat output is going to give you the smoking gun you need to find the offending piece of software.

    - David “Fallout 3 Rules” Beach

  • File Server Migration Toolkit (FSMT) 1.1 Released

    Ned here. The Remote File System developer team wanted us to let you know about the release of FSMT 1.1. Here's their 'press release'. :-) 




    Microsoft is glad to announce the release of Microsoft File Server Migration Toolkit 1.1.  With this version you will be able to migrate and consolidate shared folders from servers running Windows NT Server 4.0, Windows 2000 family of servers, Windows 2003 family of servers, Windows Server 2008, or Windows Storage Server 2008 to a server running Windows Server 2003, Windows Storage Server 2003, Windows Server 2008 or Windows Storage Server 2008.  


    This new version adds support for installation on Windows Server 2003 as well as Windows Server 2008, both on x86 and x64 systems, and it's available in 5 languages (English, French, German, Japanese and Spanish).


    FSMT 1.1 can be downloaded from the Microsoft Download Center site.


    For more information about FSMT, please visit the Microsoft File Server Migration Toolkit Web site.


    Read the updated FSMT whitepaper, which includes FSMT 1.1 information.




    Make sure you stop by their blog if you have questions or comments.


    - Ned Pyle

  • New KB Articles 10/19-10/26

    New KB articles related to Directory Services for the week of 10/19-10/26.


    Certification Authority Service Startup Failure


    W32time Service does not start with the Error "System Error 126 has Occurred" "The Specified module could not be found"


    When you enable field engineering on an AD LDS or AD AM directory service on a Windows Server 2003-based or Windows Server 2008-based computer, an LDAP query is executed more slowly than expected, and Event ID 1699 is logged


    You may not be able to add or remove additional namespace servers using the DFS management console in Windows Server 2003 R2


    Error message when you try to store a security descriptor by using an administration tool or a script in Windows Server 2003: "The security ID structure is invalid Facility: Win32 ID no: 80070539"


    Software Restriction Policy Enforcement set to “All Software Files” causes checks against paths/files that are invalid


    A cross-domain Web single sign-on fails if there is a small time difference between Active Directory Federation Services in Windows Server 2003 R2 systems and IBM Tivoli Federated Identity Manager


    Error event IDs 2014 and 2004 and other Error events may be logged when you try to perform a replication on a Windows Server 2003 R2-based server that has DFSR installed


    You are prompted unexpectedly to enter your credentials when you access a SharePoint Server site from a Windows Vista-based or Windows Server 2008-based client computer that has a proxy server configured


    A user encounters an offline file sync conflict shortly after a successful synchronization on a Windows Vista-based or a Windows Server 2008-based client computer


    On a Windows-based computer, NTFS alternate data streams are lost on a shared folder that has the Offline Files feature enabled


    Domain local group from foreign domain can be added using "net localgroup" and GC search


    Installation of applications from network share results in an error: "Windows cannot access the specified device, path, or file"


    Cannot find the certificate request associated with this certificate file. A certificate request must be completed on the computer where it was created


    The "Active Directory Users and Computers" MMC snap-in crashes when you create a computer account in this MMC snap-in on a computer on which Windows Server 2003 was installed by using installation media that has SP2 slipstreamed


    Error message when you try to open some MMC 3.0 snap-ins in a localized version of Windows XP Service Pack 3: "MMC could not create the snap-in. The snap-in might not have been installed correctly."


    AD LDS service start fails with error "setup could not start the service..." + error code 8007041d


    DFSR may not operate correctly when used in conjunction with FSRM file screens


    Windows Server system software that is not supported in a Hyper-V virtual machine environment


    Error message when you try to install the certification authority role on a Windows Server 2008-based computer: "Cannot install Certification Authority"


    The "Set roaming profile path for all users logging onto this computer" Group Policy setting also applies to local user accounts in Windows Server 2008


    Moving a DFSR Migration to the ELIMINATED state logs a misleading event regarding read-only domain controller objects


    USMT fails to install on Windows Server


  • SSL/TLS Record Fragmentation Support

    This is Jonathan Stephens from the Directory Services team, and I wanted to share with you a recent interoperability issue I encountered. An admin had set up an Apache web server with the OpenSSL mod for SSL/TLS support. Users were able to connect to the secure web site using Firefox, but when they tried to use Internet Explorer the connection failed with the following error: The page cannot be displayed. We were asked to investigate what was happening and fix it if possible.

    When connecting to an SSL-enabled web site with Internet Explorer, the client and server must negotiate an SSL session during a process called the SSL (or TLS) Handshake. The client and server exchange what are called records, each record containing information relevant to a step in the negotiation process. Describing the entire Handshake process is beyond the scope of this post, but you can find more information here.

    Note: SSL 3.0 is a proprietary protocol developed by Netscape Communications. TLS 1.0 is an Internet Standard (RFC 2246) based upon that proprietary protocol. Functionally, there is little difference between SSL 3.0 and TLS 1.0, and for the purposes of this discussion the two are identical.

    As part of the handshake process, the server sends its list of trusted root certificates to the client in the form of a non-encrypted record. This is done so that if the server requires that the client have a digital certificate for authentication, the client is able to select one that will chain up to a root certificate trusted by the server. While there is no defined limit on the number of root certificates that can be in this list, there is a limitation on the size of the records exchanged between the client and the server. RFC 2246 defines this limit as 16,384 bytes.

    So how does the Handshake protocol handle those scenarios where the list of trusted root certificates exceeds 16,384 bytes? RFC 2246 describes a process called record fragmentation, where any data that would exceed the 16KB record limit is split across multiple fragments. These fragments must be merged into one record by the client in order to retrieve the data.
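    To illustrate the mechanics, here is a hypothetical Python sketch (not actual SChannel or OpenSSL code) of what fragmentation and reassembly look like under the RFC 2246 limit:

```python
MAX_FRAGMENT = 16384  # 2**14 bytes: the record-size limit from RFC 2246

def fragment(handshake_data: bytes):
    """Split handshake data into record-sized fragments, as a sending
    implementation that supports fragmentation would."""
    return [handshake_data[i:i + MAX_FRAGMENT]
            for i in range(0, len(handshake_data), MAX_FRAGMENT)]

def reassemble(fragments):
    """Merge the fragments back into one message -- the step that
    SChannel (before Windows 7 / 2008 R2) did not perform."""
    return b"".join(fragments)

# A root-certificate list slightly over one record, like the trace
# described later in this post (16,384 bytes + 153 bytes):
data = b"\x30" * (16384 + 153)
parts = fragment(data)
print([len(p) for p in parts])       # [16384, 153]
assert reassemble(parts) == data     # a fragmentation-aware client recovers it
```

    A client that cannot do the `reassemble` step only ever sees the first 16,384 bytes, which is precisely the failure mode described below.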

    Let’s set that aside for a moment and talk briefly about SSL/TLS in Windows. The SSL/TLS protocol is implemented as a security package in Windows; this package is called SChannel, and the associated library is schannel.dll. A Windows application that needs to support SSL/TLS as either a client or a server can use Windows-integrated authentication to leverage the capabilities of the SChannel security package. Two such applications are Internet Explorer (IE) and Internet Information Services (IIS), the Windows web server. Other non-Microsoft products may have their own implementations of SSL/TLS and so would not use SChannel.

    This is precisely what our admin discovered while he was investigating this issue. He found that while users were unable to connect to the web site with IE, they could connect successfully with a third party browser – Firefox.

    To understand exactly what was happening, we took a network trace between IE and the Apache server. In that trace, we could clearly see that the list of root certificates sent to the client by the server was split across two records. The first was 16,384 bytes and the second was 153 bytes.

    The problem here is that SChannel does not support record fragmentation. When it receives data split across multiple records, SChannel is not able to merge the data, so when record fragmentation is encountered the Handshake fails, resulting in a failed connection. On the server side (for example, IIS), SChannel truncates data above 16,384 bytes in order to fit it into one record. There are other implementations of SSL/TLS that do support record fragmentation, such as OpenSSL and Firefox, which explains why this problem wasn’t seen when Firefox was used.

    In the vast majority of cases, this does not present a problem. Most of the record data exchanged during the Handshake process is considerably smaller than the 16KB limit defined in the RFC. The potential exception is the trusted root certificate list record. If a server trusts more than approximately 100 root certificates, the root certificate list could exceed the 16KB limit. Please note the use of the word “approximately”. The actual number of root certificates can vary from environment to environment and should be determined by testing. Microsoft cannot provide a precise number because the limitation is based solely on the total size of the data in the record rather than the number of entries, which can vary in length.

    In the case of IIS, where SChannel is leveraged for the server side of the Handshake, SChannel will truncate the list of trusted root certificates as I mentioned above. This behavior is described in the following KB article:

    933430 Clients cannot make connections if you require client certificates on a Web site or if you use IAS in Windows Server 2003

    The above article describes a 12,288 byte limit for the root certificate list. The hotfix described in that article simply increased that limit to the full 16,384 byte limit defined by the RFC. In those cases, however, where the root certificate list exceeds 16KB, the list will still be truncated by SChannel before the record is sent from the server to the client.

    When using IIS, the above article describes some specific steps an administrator can take to work around this limitation in SChannel. In cases such as this one, where the web server supports fragmentation but the client does not, the only option is to reduce the number of trusted root certificates to get the size under the 16KB limit for a single record.

    In some environments, the lack of support for record fragmentation in SChannel can lead to interoperability problems – failed connections, invalid client certificates, etc. Identifying problems associated with fragmentation is pretty simple; analyzing a brief network trace is usually sufficient to pinpoint instances of fragmentation. As I stated earlier, we usually see this problem in relation to the number of root certificates that are trusted by the server, and currently, the only way we have to resolve this issue is to remove unneeded roots from the server side. We hope to eliminate this problem completely in a future version of Windows.
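    If you want to eyeball a raw capture yourself, the record boundaries are easy to walk: each SSL/TLS record starts with a 5-byte header (content type, version, length). Here is a hedged Python sketch; it is a simplification, since a real Handshake legitimately contains several handshake records, so treat it as an illustration of the header layout rather than a ready-made detector:

```python
import struct

# Walk TLS record headers in raw bytes: each record is a 5-byte header
# (content type: 1 byte, version: 2 bytes, length: 2 bytes) followed by
# `length` bytes of payload.
def record_lengths(data):
    """Yield (content_type, length) for each record in `data`."""
    offset = 0
    while offset + 5 <= len(data):
        ctype, _version, length = struct.unpack_from("!BHH", data, offset)
        yield ctype, length
        offset += 5 + length

def looks_fragmented(data, handshake_type=22):
    """Naive heuristic: more than one handshake record back-to-back may
    mean one message was split across records (like the 16,384 + 153 byte
    pair seen in the trace described above)."""
    lengths = [length for ctype, length in record_lengths(data)
               if ctype == handshake_type]
    return len(lengths) > 1

# Two synthetic back-to-back handshake records:
rec1 = b"\x16\x03\x01" + struct.pack("!H", 5) + b"A" * 5
rec2 = b"\x16\x03\x01" + struct.pack("!H", 3) + b"B" * 3
print(looks_fragmented(rec1 + rec2))  # True
```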

    UPDATE 8/25/2010: Someone pointed out that I should update this blog post to make clear that the "future version of Windows" referenced above is Windows 7. Sort of. In order to support interoperability with other implementations of SSL/TLS, Windows 7 and Windows Server 2008 R2 both support coalescing fragmented SSL/TLS records on the receiving side, but Windows does not support fragmenting records on the sending side. Any outbound record that exceeds 16KB will still be truncated as described above.

    - Jonathan Stephens

  • DFSDIAG in a nutshell

    Ned here. Our developer team colleagues at the File Cabinet have posted an interesting article on the DFSDIAG tool. Introduced with Windows Server 2008, this utility is excellent for testing, documenting, and troubleshooting your DFS Namespaces environment. Make sure you give the article a read.

    What Does DFSDIAG Do? (FileCabinet Blog) 

    PS: not to be confused with the DFSRDIAG tool, which is used with DFSR. Don't worry, I mix them up all the time myself. :-)

    - Ned Pyle

  • New KB Articles 10/12-10/19

    New KB articles related to Directory Services for the week of 10/12-10/19.


    An SSL connection may fail when you use Internet Explorer to make an SSL connection to an HTTPS Web site that is certified by a Digital Signature Standard (DSS) certificate on a Windows XP-based computer


    Copy process is very slow when you copy large files from one computer to another computer in a high-bandwidth network environment if both computers are running either Windows Vista or Windows Server 2008


    Windows Search may fail if you search a network folder from the toolbar in the Windows Explorer while offline on a computer that is running Windows Vista or Windows Server 2008


    A Windows Vista-based or Windows Server 2008-based computer behind a NAT device cannot communicate with another computer through an IPsec tunnel-mode connection


    How to configure DFSR logging


    The LSASS.exe process crashes and the computer restarts when you try to start the Network Access Protection Agent service on a Windows XP Service Pack 3-based client computer


    You cannot enroll for a certificate that is larger than 4096 bits on an SCEP client in Windows Server 2008


  • Getting a CMD prompt as SYSTEM in Windows Vista and Windows Server 2008

    Ned here again. In the course of using Windows, it is occasionally useful to be someone besides… you. Maybe you need to be an Administrator temporarily in order to fix a problem. Or maybe you need to be a different user as only they seem to have a problem. Or maybe, just maybe, you want to be the operating system itself.


    Think about it. What if you are troubleshooting a problem where an agent process like the SMS Client isn’t working? Or an anti-virus service is having issues reading the registry? If only we had some way to look at things while logged in as SYSTEM.

    What is SYSTEM and why is Vista/2008 special?

    SYSTEM is actually an account; in fact, it’s a real honest-to-goodness user. Its real name is “NT Authority\Local System” and it has a well-known SID of S-1-5-18. All Windows computers have this account and they always have the same SID. It’s there for user-mode processes that will be executed as the OS itself.

    This is a bit tricky in Windows Vista and Windows Server 2008 though. In previous operating systems you could simply schedule a CMD prompt as a task and have it interact with the desktop. Some people considered this a security hole, so in Vista/2008 it’s not possible anymore.

    So how can we take off our glasses and put on the cape with the big red S?

    Method one - PSEXEC

    An easy way to get a CMD prompt as SYSTEM is to grab PSEXEC from Microsoft Sysinternals:

    1. Download PSEXEC and unzip to some folder.

    2. Open an elevated CMD prompt as an administrator.

    3. Navigate to the folder where you unzipped PSEXEC.EXE

    4. Run:

         PSEXEC -i -s -d CMD

    5. You will have a new CMD prompt open, as though by magic.

    6. Type the following in the new CMD prompt to prove who you are:

         WHOAMI /USER


    There you go – anything that happens in that CMD prompt or is spawned from that prompt will be running as SYSTEM. You could run regedit from here, start explorer, or whatever you need to troubleshoot as that account.

    That was pretty easy – why do I have some more methods below? Unfortunately, in several previous versions of the PSEXEC tool the -s (system) switch has not worked. As of version 1.94 it does work again, but that is no guarantee for the future. This brings us to a more iron-clad technique:

    Method two - REMOTE

    We can use the REMOTE.EXE tool which comes as part of the Windows Debugger. While it’s a bit more cumbersome, it will always work:

    1. Download the Windows Debugger (x86 or x64) and install it anywhere (we just need its copy of REMOTE.EXE, so feel free to copy that file elsewhere and uninstall the debugger when done; in the example below I installed to “c:\debuggers”).

    2. Open an elevated CMD prompt as an administrator.

    3. Run:

      AT <one minute from now> c:\debuggers\remote.exe /s cmd SYSCMD

    Where you use 24-hour clock notation (aka ‘military time’). For example, right now it is 3:57PM, so I type:

      AT 15:58 c:\debuggers\REMOTE.EXE /s cmd SYSCMD

    4. Then once 15:58 (3:58PM) is reached, you can run:

      C:\debuggers\REMOTE.EXE /c <your computer> SYSCMD

    Where you type your computer’s own NetBIOS name. So for example:

      C:\debuggers\remote.exe /c nedpyle04 SYSCMD


    Neato. I used REMOTE to connect to REMOTE on the same computer. This is a good example of a client-server RPC application. The SYSCMD option I keep using is just a marker that identifies the remote session. Technically you could have lots of these going at once, each with a different marker.
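    The “one minute from now” arithmetic from step 3 is easy to fumble when the clock is about to roll over. Here's a small hedged Python sketch of it; the debugger path is just the one assumed in the example above:

```python
from datetime import datetime, timedelta

# Build the AT command line from method two, computing "one minute from
# now" in 24-hour notation. The c:\debuggers path is the example's
# assumption; adjust it to wherever REMOTE.EXE actually lives.
def build_at_command(now=None, session="SYSCMD"):
    now = now or datetime.now()
    run_at = (now + timedelta(minutes=1)).strftime("%H:%M")
    return f"AT {run_at} c:\\debuggers\\remote.exe /s cmd {session}"

# For 3:57 PM this reproduces the command shown above:
print(build_at_command(datetime(2008, 10, 20, 15, 57)))
# AT 15:58 c:\debuggers\remote.exe /s cmd SYSCMD
```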

    If I then use WHOAMI /USER again, the proof:


    To leave, just type EXIT.


    Method two and a half – REMOTE and the Task Scheduler

    Maybe you want to have REMOTE ready to go at a moment’s notice (you plan to do this a lot, eh?). Or what if you want to use one of the other SYSTEM-type accounts, like “Local Service” and “Network Service”? PSEXEC can’t do that and neither can the old AT command.

    Here’s some XML and commands you can use to make the server portion of REMOTE be ready at an instant for various accounts. This time we’ll use the newer, slicker SCHTASKS tool:

    1. Copy the following sample into notepad and save as <something>.xml (in my sample below, I save to “c:\temp\RaS.xml”)

    <?xml version="1.0" encoding="UTF-16"?>
    <Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
      <Triggers />
      <Principals>
        <Principal id="Author">
          <UserId>NT AUTHORITY\SYSTEM</UserId>
        </Principal>
      </Principals>
      <Actions Context="Author">
        <Exec>
          <Command>c:\debuggers\remote.exe</Command>
          <Arguments>/s cmd SYSCMD</Arguments>
        </Exec>
      </Actions>
    </Task>

    Note the Command and UserId elements above. You will need to make sure that the Command path matches where REMOTE.EXE is located. Also, the UserId can be set to anything you like, including “nt authority\local service” or “nt authority\network service”.

    2. Open an elevated CMD prompt as an administrator.

    3. Run:

       SCHTASKS /create /tn <some task name> /xml <path to xml file>

    Where you provide a real task name and XML file. For example:

       SCHTASKS /create /tn RemoteAsSystem /xml c:\temp\RaS.xml


    4. This created a scheduled task with all the REMOTE info filled out.

    5. Now we can run the REMOTE server piece anytime we want, as often as we want with:

       SCHTASKS /run /tn RemoteAsSystem


    6. Now we can connect just like we did back in method two:


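    If you find yourself creating several variants of this task (one per account, say), the XML can be generated rather than hand-edited. Here's a hedged Python sketch; the schema namespace is the standard Task Scheduler one, and the path and session marker just mirror the example above:

```python
import xml.etree.ElementTree as ET

# Generate the SCHTASKS import XML for an arbitrary account and REMOTE.EXE
# path, then check that it is well-formed before handing it to
# "SCHTASKS /create /xml". Defaults mirror the example above.
TEMPLATE = """<?xml version="1.0" encoding="UTF-16"?>
<Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Triggers />
  <Principals>
    <Principal id="Author">
      <UserId>{user}</UserId>
    </Principal>
  </Principals>
  <Actions Context="Author">
    <Exec>
      <Command>{remote_path}</Command>
      <Arguments>/s cmd {session}</Arguments>
    </Exec>
  </Actions>
</Task>"""

def make_task_xml(user="NT AUTHORITY\\SYSTEM",
                  remote_path="c:\\debuggers\\remote.exe",
                  session="SYSCMD"):
    return TEMPLATE.format(user=user, remote_path=remote_path, session=session)

# Parse the UTF-16 bytes to confirm well-formedness (ElementTree rejects a
# plain str that carries an encoding declaration).
xml_text = make_task_xml(user="nt authority\\local service")
ET.fromstring(xml_text.encode("utf-16"))
```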
    That’s it. Hopefully you find this useful someday (or maybe I should hope you never have to find it useful). Got a comment, or another way to do this? Let us know.

    - Ned “Nubbin” Pyle