December, 2007

  • Troubleshooting networks without NetMon

    Hi, Ned here. You may already be asking yourself why I’m writing about network troubleshooting. Isn’t this the Directory Services blog? Don’t we just care about Kerberos and group policies and the like? Shouldn’t the Networking team do all this heavy TCP/IP lifting?

    Well, without the network, Active Directory and all its little pieces don’t really amount to much. We are a customer of networking ourselves and that means to be effective DS engineers we have to understand the infrastructure that moves all our data around. Otherwise when this important component fails we can’t really determine if DS is having issues or the underlying structure it relies on is in trouble. To be frank, we work a lot of cases here in 3rd tier support that came in as Directory Services symptoms and left resolved as network issues. At one point, 80% of all our DS cases could be tracked back to DNS configuration problems!

    We can’t all be network trace gurus though – it takes a lot of time and experience to get to the point where you can look at a capture in NetMon3.1 (or Wireshark, Ethereal, Packetyzer, etc.) and make meaningful sense of all the details. So what are your options if you suspect a networking problem and you don’t feel that NetMon is in your league? You can call us in Microsoft support, or you can use other tools that are simpler and often just as effective to figure out your issue. That’s what we’ll do today.

    One quick note – I’m sticking with IPv4 here since that’s 99.999∞% of what you’ll see.

    Network troubleshooting from 30,000 feet

    Here’s an extremely unattractive flowchart I put together that covers the basic process. We are going into a great deal more detail below.

    [Image: network troubleshooting flowchart]

    At its core, we will always troubleshoot the same way:

    1. What’s our symptom and failing component?
    2. Do we have basic network connectivity?
    3. Do we have good name resolution?
    4. Can we test our failing component using reliable tools?

    You may be saying ‘What the heck? Does this guy think I was born yesterday?’ but trust me – plenty of engineers who should know better rush into step 4 without a good understanding of step 1 and without trying the basics in steps 2 and 3. Especially when servers are down, the boss is screaming, and the company is losing money.

    Note: Unless specified, everything we do here will be from the computer that is reporting the problem or having the symptom. In all examples the network settings are:

    IP address – 10.10.0.128 (SRC-CLIENT-01.contoso.com)
    Subnet Mask - 255.255.0.0
    Default Gateway - 10.10.0.1
    DNS Server - 10.20.0.20 (DNS-01.contoso.com)
    WINS Server - 10.20.0.30
    Our Destination DC - 10.30.0.166 (DEST-DC-01.contoso.com)
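
    With the example settings above, it helps to know which addresses sit on our own subnet and which can only be reached through the default gateway. Here is a minimal Python sketch of that check (Python is my choice for illustration, not something these tools need; the function name is hypothetical):

```python
import ipaddress

# Example client address and subnet mask, copied from the settings list above.
LOCAL = ipaddress.ip_interface("10.10.0.128/255.255.0.0")


def is_on_link(target, local=LOCAL):
    """Return True when target is inside our own subnet (no gateway needed)."""
    return ipaddress.ip_address(target) in local.network


if __name__ == "__main__":
    examples = [
        ("Default gateway", "10.10.0.1"),
        ("DNS server", "10.20.0.20"),
        ("WINS server", "10.20.0.30"),
        ("Destination DC", "10.30.0.166"),
    ]
    for label, address in examples:
        path = "same subnet" if is_on_link(address) else "via the default gateway"
        print(f"{label:16} {address:12} {path}")
```

    In this example only the gateway is on the local subnet, so DNS, WINS and the destination DC all depend on routing.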

    1. What’s our symptom and failing component?

    We’re troubleshooting something that isn’t working – but what exactly? Since this is a Directory Services blog I’m going to be greedy and focus on DS components. Are domain controllers not replicating SYSVOL? Are users unable to log on? Is group policy not applying? You need to understand the component in question in order to test it at the Application layer of OSI-TCP/IP.

    [Image: flowchart snippet]

    2. Do we have basic network connectivity?

    Next we will determine if the lower layers are working ok. It’s very possible that our component is just one of many victims, but no one else is complaining as loudly. Let’s break out a snippet from the flowchart and follow it with some utilities.

    [Image: flowchart snippet covering the connectivity checks]

    Connectivity test with PING – built-in tool in all supported Windows versions

    • Can we verify our own local networking with:

    PING 127.0.0.1
    PING 10.10.0.128
    PING 10.10.0.1

    All should return:

    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss)

    This tests whether our local TCP/IP stack responds at all, whether our own IP address works, and whether we can reach our gateway. If we can’t even reach our gateway but the local pings succeed, we probably have a local software firewall issue. Also keep in mind that most hardware firewalls (which are often the default gateways in customer environments these days) do not allow you to ping their interfaces. If you know for sure that the firewall’s private network interface is working, it is OK if it fails to respond to a ping.

    • Can we ping between our problem computer and the destination that our component is trying to reach with:

    PING 10.30.0.166
    PING DEST-DC-01.contoso.com
    PING DEST-DC-01

    This proves that we can get to the machine at all on the wire, both with and without name resolution. We can only use this test if your network allows ICMP – some customers decide to turn it off internally on routers and private firewalls (and no, I really haven’t ever heard a good reason why – the days of malware/hackers using ICMP to find machines on a LAN are ten years behind us; I welcome comments on this). If a ping fails, it’s important to read the error. DESTINATION UNREACHABLE or REQUEST TIMED OUT means routing is having issues and we should move to the routing tests. COULD NOT FIND HOST (which you will only see when pinging by name) means name resolution is broken and we should move to the name resolution tests. You may also want to ping with the -f -l 1472 switches to verify that a full 1500-byte packet (1472 bytes of data plus 28 bytes of IP and ICMP headers) gets through without fragmentation.
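
    The 1472 figure and the "read the error" advice can both be written down in a few lines. This is only a sketch in Python: the header sizes assume plain IPv4 with no IP options, the function names are mine, and the error strings are matched loosely because the exact wording differs between Windows versions.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_unfragmented_ping_payload(mtu=1500):
    """Largest PING -l value that still fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def next_step(ping_error):
    """Map a failed ping's message to the test group to try next."""
    text = ping_error.lower()
    if "could not find host" in text:
        return "name resolution tests"
    if "unreachable" in text or "timed out" in text:
        return "routing tests"
    return "unknown - read the message carefully"


if __name__ == "__main__":
    print(max_unfragmented_ping_payload())
    print(next_step("Request timed out."))
```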

    Routing tests with TRACERT/PATHPING/ARP/ROUTE - built-in tools in all supported Windows versions

    • Can we check which routes we’re taking and where the traffic dies with:

    PATHPING 10.30.0.166
    or
    TRACERT 10.30.0.166

    Both tools accomplish basically the same thing – letting you know where you travel on the network to reach your destination, and where the journey fails. TRACERT shows fairly quick, basic info:

    Tracing route to DEST-DC-01.contoso.com [10.30.0.166] over a maximum of 30 hops:

    1 1 ms 1 ms <1 ms router1.network.contoso.com [10.10.0.1]
    2 <1 ms 1 ms <1 ms router2.network.contoso.com [10.30.0.1]
    3 <1 ms <1 ms <1 ms DEST-DC-01.contoso.com [10.30.0.166]

    Whereas PATHPING trades speed for more details:

    Tracing route to DEST-DC-01.contoso.com [10.30.0.166] over a maximum of 30 hops:

    0 SRC-CLIENT-01.contoso.com [10.10.0.128]
    1 router1.network.contoso.com [10.10.0.1]
    2 router2.network.contoso.com [10.30.0.1]
    3 DEST-DC-01.contoso.com [10.30.0.166]

    Computing statistics for 75 seconds...

    Source to Here This Node/Link
    Hop RTT Lost/Sent = Pct Lost/Sent = Pct Address
    0 SRC-CLIENT-01.contoso.com [10.10.0.128]
    0/ 100 = 0% |
    1 0ms 0/ 100 = 0% 0/ 100 = 0% router1.network.contoso.com [10.10.0.1]
    0/ 100 = 0% |
    2 0ms 0/ 100 = 0% 0/ 100 = 0% router2.network.contoso.com [10.30.0.1]
    0/ 100 = 0% |
    3 0ms 0/ 100 = 0% 0/ 100 = 0% DEST-DC-01.contoso.com [10.30.0.166]
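
    When a trace does die partway, the hop you care about is the last one that answered. As a rough Python sketch (not part of the original toolset), the function below pulls that hop out of saved TRACERT output. The sample text is invented in TRACERT's format, with hop 3 made to time out.

```python
import re

# An IPv4 address anywhere in the hop line, bracketed or bare.
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
# Hop lines start with the hop number.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")

# Invented sample: the trace dies after the second router.
SAMPLE = """\
Tracing route to DEST-DC-01.contoso.com [10.30.0.166] over a maximum of 30 hops:

  1     1 ms     1 ms    <1 ms  router1.network.contoso.com [10.10.0.1]
  2    <1 ms     1 ms    <1 ms  router2.network.contoso.com [10.30.0.1]
  3     *        *        *     Request timed out.
"""


def last_responding_hop(tracert_output):
    """Return (hop_number, address) of the last hop that answered, or None."""
    last = None
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header and blank lines
        addresses = IPV4.findall(match.group(2))
        if addresses:  # timed-out hops carry no address
            last = (int(match.group(1)), addresses[-1])
    return last


if __name__ == "__main__":
    print(last_responding_hop(SAMPLE))
```

    In the sample, the problem sits at or just beyond router2, which is where you would start asking questions.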

    • It’s not usually necessary, but you can also see further routing details with:

    ARP -a
    and
    ROUTE PRINT

    3. Do we have good name resolution?

    We’re only in this step if we failed some of our earlier checks, or if we suspect that we only have partial name resolution (for example, a DC might have its A record but be missing the CNAME and SRV records needed for functionality). So now we’ll run through some tests to see why our name resolution isn’t working or to verify that we have all the records we need for our component.

    [Image: flowchart snippet]

    Note: It’s important that before you do any name resolution testing you always start with the following commands to ensure that you are not using cached information:

    IPCONFIG /flushdns
    NBTSTAT -R

    Name resolution tests with NSLOOKUP - built-in tool in all supported Windows versions

    • Can we get the DNS server to give us back the A record with:

    NSLOOKUP DEST-DC-01.contoso.com 10.20.0.20

    This will return:

    Server: DNS-01.contoso.com
    Address: 10.20.0.20

    Name: DEST-DC-01.contoso.com
    Address: 10.30.0.166

    Using the fully qualified domain name lets us know A record lookups are working. The important part about using NSLOOKUP is that it actually performs DNS lookups over UDP, whereas the DNSCMD command below makes an RPC connection to the DNS server to return data, and isn’t a valid test of the DNS protocol itself.
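
    To make "an actual UDP DNS lookup" concrete, here is a small Python sketch that hand-builds a single-question DNS query and sends it to UDP port 53. It illustrates the wire protocol only and is not how NSLOOKUP is implemented; the function names and the fixed query ID are my own choices, and the reply is returned raw rather than decoded.

```python
import socket
import struct


def build_dns_query(name, qtype=1, query_id=0x1234):
    """Encode a one-question DNS query (qtype 1 = A, 33 = SRV)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def send_query(server, packet, timeout=2.0):
    """Send the packet over UDP and return the raw reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(4096)
        return reply


if __name__ == "__main__":
    query = build_dns_query("DEST-DC-01.contoso.com")
    print(len(query), "byte query built")
    # send_query("10.20.0.20", query) would send it to the example DNS server.
```

    If the DNS server never answers this kind of query, DNS traffic is not getting through, whatever an RPC-based tool reports.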

    Name resolution tests with DNSCMD and NSLOOKUP (if appropriate) – support tools download for Windows 2000/XP/2003

    • Can we get the DNS server to give us back the CNAME and SRV records of our DCs with:

    DNSCMD /EnumRecords _msdcs.contoso.com @ /Type CNAME

    and

    NSLOOKUP

    >set type=all

    _ldap._tcp.dc._msdcs.contoso.com

    _kerberos._tcp.dc._msdcs.contoso.com

    This is usually important for Directory Services engineers because the A record is only part of the puzzle. We also care about SRV records and CNAME records. That’s how AD works when it comes to LDAP, Kerberos, replication, and so on. So if you suspect one of those technologies has a name resolution issue this is appropriate to test.
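
    If you check these records in more than one domain, it is handy to generate the query names rather than retype them. A minimal Python sketch, assuming only the two SRV names shown above; the helper names are hypothetical, and it uses NSLOOKUP's non-interactive -type=SRV form instead of the interactive session.

```python
def dc_srv_names(dns_domain):
    """SRV names to check for domain controllers in the given domain."""
    domain = dns_domain.strip(".").lower()
    return [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
    ]


def nslookup_srv_commands(dns_domain, dns_server):
    """Argument lists for non-interactive NSLOOKUP SRV queries."""
    return [
        ["nslookup", "-type=SRV", name, dns_server]
        for name in dc_srv_names(dns_domain)
    ]


if __name__ == "__main__":
    for command in nslookup_srv_commands("contoso.com", "10.20.0.20"):
        print(" ".join(command))
```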

    Name resolution tests with NBTSTAT (if appropriate) - built-in tool in all supported Windows versions

    • Can we get WINS to give us back the records with:

    NBTSTAT -c
    NBTSTAT -n

    This is important since despite all efforts to the contrary, WINS and NetBIOS name resolution are still part of many products, including DFS Namespaces, Netlogon, Terminal Services licensing, and much more.

    If all these name resolution steps check out, it’s time to move to the Application layer testing phase.

    4. Can we test our failing component using reliable tools?

    The one you’ve been waiting for. At this stage we’ve eliminated general network connectivity issues, and we suspect that just our component is a victim. If the network is fine, the most likely problems are firewall rules filtering specific traffic and the application layer itself. Let’s go down some common paths to figure it out.

    [Image: flowchart snippet]

    LDAP tests with LDP and PORTQRY – support tools download for Windows 2000/XP/2003; download Portqry.

    • Can we verify that LDAP is listening on DC/GC’s with:

    PORTQRY -n DEST-DC-01.contoso.com -p tcp -e 389
    PORTQRY -n DEST-DC-01.contoso.com -p tcp -e 636
    PORTQRY -n DEST-DC-01.contoso.com -p both -e 3268
    PORTQRY -n DEST-DC-01.contoso.com -p tcp -e 3269

    Here’s a sample of working output from the first command:

    TCP port 389 (ldap service): LISTENING

    Using ephemeral source port
    Sending LDAP query to TCP port 389...

    LISTENING is good. :-) TCP-based LDAP ports should always be listening on DC/GC’s and never return NOT LISTENING or FILTERED. UDP-based ports should return LISTENING or FILTERED (as they are connectionless). Seeing TCP as FILTERED or anything as NOT LISTENING should be a red flag to find out why someone has configured a firewall to block or manipulate LDAP traffic.

    NOTE: You should see more data than what is listed in the blog example.
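
    The LISTENING / NOT LISTENING / FILTERED rules above are easy to encode. The Python sketch below is an approximation, not PORTQRY itself: probe_tcp mimics the TCP case with a plain connect (refused means NOT LISTENING, no answer before the timeout means FILTERED), and is_red_flag applies the rules from the paragraph above. Both function names are mine.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Rough TCP equivalent of a PORTQRY check, using a plain connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "LISTENING"
    except ConnectionRefusedError:
        return "NOT LISTENING"  # the host answered, nothing is bound to the port
    except socket.timeout:
        return "FILTERED"       # silence usually means a firewall dropped it


def is_red_flag(protocol, status):
    """Apply the rules above: TCP must be LISTENING, UDP may also be FILTERED."""
    protocol, status = protocol.upper(), status.upper()
    if status == "NOT LISTENING":
        return True
    if protocol == "TCP":
        return status != "LISTENING"
    return status not in ("LISTENING", "FILTERED", "LISTENING OR FILTERED")


if __name__ == "__main__":
    print(is_red_flag("tcp", "FILTERED"))
    print(is_red_flag("udp", "FILTERED"))
```

    Other failures, such as a name that does not resolve, are deliberately left to raise so they are not mistaken for a firewall.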

    • Can we connect to the domain controllers with LDP:

    LDP
    Connection --> Connect --> DEST-DC-01.contoso.com
    Connection --> Bind
    View --> Tree --> Select the domain naming context
    Browse a few levels deep.

    By doing the above with a reliable tool (i.e. not an application that does many things unspecific to LDAP and often uses ADSI rather than pure LDAP) we can see if unadulterated LDAP binds and queries are working. We also know that authentication is working.

    SMB tests with NET USE and PORTQRY - download Portqry.

    • Can we verify that SMB is listening on ports 138 and 445 with:

    PORTQRY -n DEST-DC-01.contoso.com -p udp -e 138
    PORTQRY -n DEST-DC-01.contoso.com -p both -e 445

    The same diatribe above applies here for LISTENING versus FILTERED. If we cannot get to 138 and 445 over the network, endless zillions of components will fail – follow that link to see what I mean, it’s a good one. If firewall rules block SMB, then file sharing, group policy, named pipes, and many other applications will fail.

    • Can we connect over SMB (as an administrator) with:

    NET USE \\DEST-DC-01.contoso.com\C$ /p:n

    This simple and reliable test tells us that we can map a drive through SMB to the server. It also validates that at least NTLM authentication is working (to only use NTLM, use an IP address). You could use KLIST or KERBTRAY from the Resource Kit to confirm if there’s a Kerberos TGS ticket for that connection as well.

    RPC tests with COMPMGMT and PORTQRY - download Portqry.

    • Can we verify the endpoint mapper is available and returning data with:

    PORTQRY -n DEST-DC-01.contoso.com -p tcp -e 135

    The endpoint mapper should always be LISTENING on TCP 135 (never FILTERED or NOT LISTENING) and should return all of its registered endpoint ports and named pipes. If the endpoint mapper is blocked due to firewall rules, a great many applications will fail.

    • Can we connect to the destination server with:

    COMPMGMT.MSC
    Computer Management --> Connect to another computer
    Expand ‘System Tools’

    COMPMGMT is an included app with simple RPC connectivity needs at startup. This will generate several MSRPC binds, query and respond to several RPC endpoints, and generally is a good test of basic RPC functionality. The list of RPC-based applications (from Microsoft and elsewhere) is a mile long and includes such things as AD replication, FRS replication, DFS Replication, and more.

    PORTQRY scripting

    Finally, here’s a little batch file you can use to run PORTQRY with a set of standard DS-related queries and output to a file. This is a useful way to see if any ports are looking troublesome even if you’re not sure which ones to be looking for. For the sharp-eyed, yes HTTP/HTTPS is included. Why? Certificate Authority Web Enrollment issues – we do a lot more in MS DS support than deal with account lockouts. :-)

    @echo off
    REM Sample batch wrapper script for portqry.exe
    REM Designed to verify responsiveness of remote server specified on commandline
    REM Requires PORTQRY.EXE in same directory as script

    REM Example: checkports.cmd DEST-DC-01.contoso.com

    REM Please note that this script is provided "AS IS" with no warranties, and confers no rights.
    REM Use of included script sample is subject to the terms specified at
    REM http://www.microsoft.com/info/cpyright.htm

    ECHO Querying DNS
    Portqry -n %1 -p both -e 53 > %1_checkports.txt

    ECHO Querying DHCP
    Portqry -n %1 -p udp -e 67 >> %1_checkports.txt

    ECHO Querying HTTP
    portqry -n %1 -p tcp -e 80 >> %1_checkports.txt

    ECHO Querying Kerberos KDC Service
    portqry -n %1 -p both -e 88 >> %1_checkports.txt

    ECHO Querying NTP Time Service
    Portqry -n %1 -p udp -e 123 >> %1_checkports.txt

    ECHO Querying RPC EndPoint Mapper Service
    portqry -n %1 -p tcp -e 135 >> %1_checkports.txt

    ECHO Querying NetBIOS Name Service (WINS)
    portqry -n %1 -p both -e 137 >> %1_checkports.txt

    ECHO Querying NetBIOS Datagram Service
    portqry -n %1 -p udp -e 138 >> %1_checkports.txt

    ECHO Querying NetBIOS Session Service
    portqry -n %1 -p tcp -e 139 >> %1_checkports.txt

    ECHO Querying LDAP
    portqry -n %1 -p tcp -e 389 >> %1_checkports.txt

    ECHO Querying HTTP over SSL
    portqry -n %1 -p both -e 443 >> %1_checkports.txt

    ECHO Querying SMB
    portqry -n %1 -p both -e 445 >> %1_checkports.txt

    ECHO Querying Kerberos Logon
    portqry -n %1 -p both -e 464 >> %1_checkports.txt

    ECHO Querying LDAP over SSL
    portqry -n %1 -p tcp -e 636 >> %1_checkports.txt

    ECHO Querying Win2000/2003 AD Logon and Directory Replication
    portqry -n %1 -p tcp -o 1025,1026 >> %1_checkports.txt

    ECHO Querying Global Catalog
    portqry -n %1 -p both -e 3268 >> %1_checkports.txt

    ECHO Querying Global Catalog over SSL
    portqry -n %1 -p tcp -e 3269 >> %1_checkports.txt

    ECHO Querying Terminal Server / Remote Desktop
    Portqry -n %1 -p tcp -e 3389 >> %1_checkports.txt

    start notepad %1_checkports.txt
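
    Once the batch file has produced its output file, you may prefer a short summary to scrolling through Notepad. The Python sketch below scans for result lines shaped like the "TCP port 389 (ldap service): LISTENING" sample shown earlier and lists the worrying ones. The sample text, including the service names and the "LISTENING or FILTERED" wording for UDP, is my assumption about the format, so check it against your own output.

```python
import re

# Matches lines such as: TCP port 389 (ldap service): LISTENING
RESULT_LINE = re.compile(
    r"^\s*(TCP|UDP) port (\d+) \(([^)]*)\): (.+?)\s*$", re.MULTILINE
)

# Invented sample in the assumed PORTQRY result format.
SAMPLE = """\
TCP port 389 (ldap service): LISTENING
TCP port 636 (ldaps service): FILTERED
UDP port 138 (netbios-dgm service): LISTENING or FILTERED
"""


def parse_portqry(text):
    """Return (protocol, port, status) for every result line found."""
    return [
        (protocol, int(port), status)
        for protocol, port, _service, status in RESULT_LINE.findall(text)
    ]


def troublesome(results):
    """TCP ports that are not LISTENING, plus anything NOT LISTENING."""
    return [
        (protocol, port, status)
        for protocol, port, status in results
        if status == "NOT LISTENING"
        or (protocol == "TCP" and status != "LISTENING")
    ]


if __name__ == "__main__":
    for protocol, port, status in troublesome(parse_portqry(SAMPLE)):
        print(f"{protocol} {port}: {status}")
```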

    Further reading:

    http://blogs.technet.com/networking/ Official MS Support blog of networking

    http://blogs.technet.com/netmon/ Official Dev blog of NetMon

    Download NetMon3.1

    Service overview and network port requirements for the Windows Server system

    Happy hunting.

    - Ned Pyle

  • An old-new way to get Group Policy Results

    Hi, Mike again. Here is the scenario: you’re sitting in front of a workstation that has been diagnosed with a Group Policy problem. You scurry to a command prompt, type the ever-familiar GPRESULT.EXE, and redirect the output to a text file. Then you open the file in your favorite text editor and start scrolling through the text to begin your adventure in troubleshooting Group Policy. But what if you could get an RSOP report like the one from the Group Policy Management Console (GPMC), HTML based with sorted headings and the works? Well, you can!

    Let’s face it: the output of GPRESULT.EXE is not pleasing to the eye. However, Windows Server 2008 and Windows Vista SP1 change this by including a new version of GPRESULT that allows you to produce nicely formatted HTML output of Group Policy results, just like the report created by GPMC.

    Your new GPRESULT command is GPRESULT /H rsop.html. Running this command creates an .html file in the current directory that contains Group Policy results for the currently logged on user and computer. You can also add the /F argument to force Group Policy Results to overwrite the file name, should the file exist from a previous instance of GPRESULT. Also, if you or someone who signs your paycheck loves reporting and data mining, then GPRESULT has another option you’ll enjoy: change the /H argument to a /X and GPRESULT will provide Group Policy Results in .xml format (yes change the file extension to .XML too). You can then take this output (conceivably from many workstations) and store it in SQL and voila—reporting heaven.

    Figure 1 - HTML output from GPRESULT

    Figure 2 - XML output from GPRESULT

    All you text-based report lovers can relax because the new version still defaults to text-based reporting.
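
    If you plan to collect these reports from scripts, the /H, /X and /F switches described above are all you need to assemble. A small Python sketch (the helper name is hypothetical) that builds the command line; actually running it only makes sense on a Windows version whose GPRESULT supports these switches.

```python
def gpresult_command(report_path, fmt="html", overwrite=False):
    """Build a GPRESULT argument list for an HTML (/H) or XML (/X) report."""
    switch = {"html": "/H", "xml": "/X"}[fmt.lower()]
    args = ["gpresult", switch, report_path]
    if overwrite:
        args.append("/F")  # replace a report left over from an earlier run
    return args


if __name__ == "__main__":
    print(" ".join(gpresult_command("rsop.xml", fmt="xml", overwrite=True)))
    # To run it on a supported Windows machine:
    #   import subprocess
    #   subprocess.run(gpresult_command("rsop.html", overwrite=True), check=True)
```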

    I know I know… what about Windows Server 2003 and Windows XP? No worries, we can accomplish the same task, from the command line. We can use VBScript and the GPMC object model to provide a similar experience for those still using Windows Server 2003 or Windows XP. Both Windows Server 2003 and Windows XP are able to launch VBScripts. However, GPMC is a separate download for Windows Server 2003 and Windows XP (http://www.microsoft.com/downloads/details.aspx?FamilyID=0a6d4c24-8cbd-4b35-9272-dd3cbfc81887&displaylang=en). GPMC is a feature included in Windows Server 2008 that you can install through Server Manager.

    Here is the code for the script. Copy and paste this code into a text file. Be sure to save the text file with a .vbs extension or it will not run correctly.

    '=====================================================================
    ' VBScript Source File
    ' NAME:
    ' AUTHOR: Mike Stephens, Microsoft Corporation
    ' DATE : 11/15/2007
    ' COMMENT:
    '=====================================================================

    Set oGpm = CreateObject("GPMGMT.GPM")
    Set oGpConst = oGpm.GetConstants()

    Set oRSOP = oGpm.GetRSOP( oGpConst.RSOPModeLogging, "" , 0)
    strpath = Left(Wscript.ScriptFullName, InStrRev(Wscript.ScriptFullName,"\", -1, vbTextCompare) )

    oRSOP.LoggingFlags = 0

    oRSOP.CreateQueryResults()
    Set oResult = oRSOP.GenerateReportToFile( oGpConst.ReportHTML, strPath & "rsop.html")
    oRSOP.ReleaseQueryResults()

    WScript.Echo "Complete"

    WScript.Quit()

    Figure 3 - VBScript code to save Group Policy results to an HTML file

    The code shown in figure 3 does not require any modification to work in your environment. Its only requirement is that the computer from which the script runs must have GPMC installed. Now, let’s take a closer look at the script, which is a good introduction to GPMC scripting. (Please note that this posting is provided "AS IS" with no warranties, and confers no rights. Use of included script sample is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.)

    Line 1: Set oGpm = CreateObject("GPMGMT.GPM")

    This line is responsible for making the GPMC object model available to the VBScript. If you are going to use the functions and features of GPMC through scripting, then you must include this line in your script. Also, if your script reports an error on this line, then it is a good indication that you do not have GPMC installed on the computer from which you are running the script.

    Line 2: Set oGpConst = oGpm.GetConstants()

    The GPMC object model has an object that contains constants. Constants are nothing more than keywords that typically describe an option that you can use when calling one or more functions. You’ll see in Line 3 and Line 7 where we use the constant object to choose the RSOP mode and the format of the output file.

    Line 3: Set oRSOP = oGpm.GetRSOP( oGpConst.RSOPModeLogging, "" , 0)

    The RSOP WMI provider makes Group Policy results possible. Each client-side extension records its policy-specific information using RSOP as it applies policy. GPMC and GPRESULT then query RSOP and present the recorded data as the results of Group Policy processing. RSOP has two processing modes, Logging mode and Planning mode. Planning mode allows you to model “what if” scenarios with Group Policy and is commonly surfaced in the Group Policy Modeling node in GPMC. Logging mode reports the captured results from the last application of Group Policy processing. You can see the first parameter passed to GetRSOP is the constant RSOPModeLogging. This constant directs the GetRSOP method to retrieve logging data and not planning data, which is stored in a different section within RSOP. The remaining parameters are the default values for the GetRSOP method. This function returns an RSOP object, from which we can save RSOP data to a file.

    Line 4: strpath = Left(Wscript.ScriptFullName, InStrRev(Wscript.ScriptFullName,"\", -1, vbTextCompare) )

    This line simply gets the name of the folder from which the script is running and saves it into the variable strpath. This variable is used in line 7, when we save the report to the file system.

    Line 5: oRSOP.LoggingFlags = 0

    LoggingFlags is a property of the RSOP object. Typically, you use this property to exclude user or computer from the reporting results. Most of the time and for this example, you want to set LoggingFlags equal to zero (0). This is a perfect opportunity to use a constant (created in line 2). However, some of the values are not included in the constant object and LoggingFlags happens to be one of them. If you want to exclude computer results from the report data, then set LoggingFlags equal to 4096. If you want to exclude user results from the report data, then set LoggingFlags equal to 8192.

    Line 6: oRSOP.CreateQueryResults()

    The CreateQueryResults method actually copies the RSOP data logged from the last processing of Group Policy into a temporary RSOP WMI namespace. This makes the data available for us to save as a report.

    Line 7: Set oResult = oRSOP.GenerateReportToFile( oGpConst.ReportHTML, strPath & "rsop.html")

    The script retrieved RSOP information in line six. In this line, we save the retrieved RSOP information into a file. The first parameter in the GenerateReportToFile method is a value that represents the report format used by the method. This value is available from the constant object as ReportHTML. The second parameter is the path and filename of the file to which the method saves the data, here rsop.html. Later, I’ll show you how you can change this line to save the report to XML. Remember, the script creates the RSOP.HTML file in the same folder from which you started the script.

    Line 8: oRSOP.ReleaseQueryResults()

    The ReleaseQueryResults method clears the temporary RSOP namespace that was populated with the CreateQueryResults method. Group Policy stores the actual RSOP data in a different WMI namespace, and CreateQueryResults copies this data into a temporary namespace. This is done to prevent a user from reading RSOP data while Group Policy is refreshing the data. You should always call the ReleaseQueryResults method when you are done using the RSOP data. The remainder of the script is self-explanatory.

    HTML or XML

    I mentioned earlier that you could also save the same data in XML as opposed to HTML. This is a simple modification to line seven.

    Set oResult = oRSOP.GenerateReportToFile( oGpConst.ReportXML, strPath & "rsop.xml")

    Saving the report in XML is easy. Change the first argument to use the ReportXML constant, and change the file name (most importantly, the file extension) to reflect the proper file format type.

    Summary

    Group Policy Resultant Set of Policy (RSoP) data is critical information when you believe you are experiencing a Group Policy problem. Text formats provide most of the information you need, but at the expense of manually parsing through the data. HTML formats have the same portability as text formats and provide a better experience for navigating directly to the information you are looking for. Also, they look much better than text, so they are good for reports and presentations. Lastly, the XML format is awesome for finding things programmatically. You can also store this same information in a SQL database (for multiple clients) and run custom SQL queries to analyze Group Policy processing across multiple clients.
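
    As a very rough illustration of the "store it in SQL" idea, here is a Python sketch that keeps one raw XML report per computer and collection date in SQLite and searches across them. Everything in it is an assumption for demonstration: the table layout, the function names, and SQLite standing in for whatever SQL database you actually use. It does not rely on the real GPRESULT XML schema; it only checks that each report is well-formed and then searches the stored text.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store(path=":memory:"):
    """Open (or create) the report database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rsop_reports ("
        " computer TEXT NOT NULL,"
        " collected TEXT NOT NULL,"
        " report_xml TEXT NOT NULL,"
        " PRIMARY KEY (computer, collected))"
    )
    return conn


def store_report(conn, computer, collected, report_xml):
    """Save one report; raises ET.ParseError if the XML is not well-formed."""
    ET.fromstring(report_xml)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO rsop_reports VALUES (?, ?, ?)",
            (computer, collected, report_xml),
        )


def computers_mentioning(conn, text):
    """Computers with at least one stored report containing the given text."""
    rows = conn.execute(
        "SELECT DISTINCT computer FROM rsop_reports"
        " WHERE instr(report_xml, ?) > 0 ORDER BY computer",
        (text,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = open_store()
    # Made-up XML; a real report would come from a file such as rsop.xml.
    store_report(store, "PC1", "2007-12-01", "<Rsop><Name>Password Policy</Name></Rsop>")
    print(computers_mentioning(store, "Password Policy"))
```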

    - Mike Stephens