
Expert to Decrypt TLS/SSL Traffic

One of the most popular requests we've had is to provide a way to view encrypted traffic. The new Decryption expert aims to solve this problem for TLS/SSL traffic.

Using the Decryption Expert

The purpose of encrypting data in the first place is to hide private information from a third party who has intercepted your network traffic. At first the ability to decrypt this traffic might seem like a violation of this tenet. However, in order to decrypt the traffic you will need to acquire the certificate which contains the private server key. So you can't use this to decrypt just any traffic; you'll need the private key.

After downloading and installing the expert from CodePlex, you will see an option "NmDecrypt" in the expert menu the next time you open a saved trace. Next, narrow down the traffic to the TCP conversation you want to decrypt. You can do this with a filter on the TCP port or by choosing the conversation in the tree. If you have already found an encrypted frame, you can use the Find Conversation feature to locate the conversation for you.

Now, run the expert from the main menu or by right-clicking the frame. Once you open the expert you will be presented with a dialog where you can enter the certificate, password, target output capture file, and optionally a log file. The capture file source will automatically be filled in for you.


Once you are done entering the information, hit Start and the expert will attempt to decrypt the selected conversation. If an error is reported, you can provide a log file name to get more detailed information, which can help you understand why the decryption failed.

Viewing the Resulting Trace

When NmDecrypt completes, the resulting trace is automatically opened. One advantage of creating a new capture file is that you can send it to another user. This means the owner of the private key can decrypt the trace and share the result without having to hand over the key.

The resulting trace will contain all of the original information plus new frames with a protocol header called DecryptedPayloadHeader. Thus you can find all inserted packets by applying this protocol as a filter. You can also create a color filter if you want to easily identify them among the encrypted and inserted defragmented frames.

The Decryption expert will also insert fragmented frames, which can for the most part be ignored. These frames are created in the first pass for the expert and provide some level of transparency if you need to troubleshoot this transformation.

Finally, there may be some cases where multiple SSL messages are combined in one frame. In these cases the expert won't split them into multiple frames. While this might be possible to do, we'll leave it as an exercise for the open source community.

NmDecrypt Documentation

The documentation contains more information about using the expert, such as the encryption algorithms that are supported and typical errors you might encounter. You can access the documentation through the expert menu. We also describe how to extract the certificate for Windows machines in the appendix.

NmDecrypt is Open Source

The best part of all of this is that we've released the expert and all the source code on CodePlex. We encourage you to extend and improve this expert. In fact there are known deficiencies (some might call them bugs :) ) that you could help to resolve. These have been listed on the issues tab in the CodePlex project. Plus there's no reason this same technique could not be extended for other encryption schemes. More info on developing your own experts is available on our CodePlex Expert Site, and feel free to view our new expert integration video on Channel 9. Please download the expert, give it a try, and enjoy!

Posted by PaulELong | 0 Comments

Measuring Response Times

It's often useful to understand how long it takes for a request to receive a response. This helps you gauge how well a client or server is keeping up. This type of measurement can also be done at different layers; however, there are some tricks you'll have to learn.

FrameVariable.TimeDelta

In order to filter on the difference in time, you can use the FrameVariable.TimeDelta property. This value represents the time elapsed since the previous physical frame in the trace. One side effect of this is that you can't filter on the time delta between two filtered frames or two frames in a specific conversation. Perhaps adding to the confusion, the Time Delta column you see is updated based on the filtered information.

The following filter will find any frame with a time delta greater than 1 second.

FrameVariable.TimeDelta > 10000000

First you'll notice that the value is expressed in units of 0.1 microseconds (100 nanoseconds), so you have to convert. In the example above, 1 second = 1,000,000 microseconds = 10,000,000 units. Second, if you view the Time Delta column, you might see some inconsistencies. The time we portray there is based on the last visible frame, but the filter works on the last physical frame. So as soon as any other filter is applied, including clicking in the conversation tree, the values you see in the Time Delta column will not match a Time Delta filter you apply on top of it. Finally, responses don't always immediately follow requests, so this method doesn't always work.
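To make the unit conversion less error prone, here is a minimal C sketch that turns a threshold in milliseconds into the filter text shown above. The helper names time_delta_units_from_ms and format_time_delta_filter are made up for illustration and are not part of Network Monitor.

```c
#include <stdio.h>
#include <stdint.h>

/* TimeDelta is expressed in 100-nanosecond (0.1 microsecond) units. */
#define TIME_DELTA_UNITS_PER_SECOND 10000000ULL

/* Convert whole milliseconds to TimeDelta units (1 ms = 10,000 units). */
uint64_t time_delta_units_from_ms(uint64_t milliseconds)
{
    return milliseconds * (TIME_DELTA_UNITS_PER_SECOND / 1000ULL);
}

/* Write a "greater than" TimeDelta filter for the given threshold into buf.
   Returns the number of characters written, or -1 if buf is too small. */
int format_time_delta_filter(char *buf, size_t size, uint64_t milliseconds)
{
    int n = snprintf(buf, size, "FrameVariable.TimeDelta > %llu",
                     (unsigned long long)time_delta_units_from_ms(milliseconds));
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Calling format_time_delta_filter with 1000 milliseconds yields the same filter text as the one-second example above.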

Response Times for a Specific Protocol

The fact that we can only filter TimeDelta based on the last physical frame reveals a problem if you want to determine response times for a specific protocol. To get around this problem, save a filtered version of your trace so frames you want to filter on are in your saved file. For instance if you want to see SMB response times, find a specific SMB conversation and save that out to a separate file. Then open that new capture file and use the time delta filter to find your longer response times.
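The difference between the two kinds of delta can be illustrated with a small C model. This is only a sketch: frame_t and both counting functions are hypothetical, timestamps are assumed to be in 100-nanosecond units and never decreasing, and none of this uses the Network Monitor API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: a capture timestamp in 100-nanosecond units and
   a flag saying whether the frame belongs to the conversation of interest. */
typedef struct {
    uint64_t timestamp;
    int in_conversation;
} frame_t;

/* Count conversation frames whose delta from the previous *physical* frame
   exceeds the threshold. This models filtering in the original trace. */
size_t count_slow_physical(const frame_t *frames, size_t n, uint64_t threshold)
{
    size_t count = 0;
    for (size_t i = 1; i < n; i++) {
        if (frames[i].in_conversation &&
            frames[i].timestamp - frames[i - 1].timestamp > threshold)
            count++;
    }
    return count;
}

/* Count conversation frames whose delta from the previous frame *in the same
   conversation* exceeds the threshold. This models filtering after the
   conversation has been saved to its own capture file. */
size_t count_slow_filtered(const frame_t *frames, size_t n, uint64_t threshold)
{
    size_t count = 0;
    int have_prev = 0;
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        if (!frames[i].in_conversation)
            continue;
        if (have_prev && frames[i].timestamp - prev > threshold)
            count++;
        prev = frames[i].timestamp;
        have_prev = 1;
    }
    return count;
}
```

With a request at 0 seconds, an unrelated frame at 0.9 seconds and the response at 1.5 seconds, a one-second threshold misses the slow response in the original trace but finds it once the conversation is isolated.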

Finding Slow Servers

Using the TimeDelta filter to find slow responding servers and services at any protocol layer is a great way to locate performance issues. Just remember to first save a filtered version of your trace based on the protocol and connection, then type in your FrameVariable.TimeDelta filter. Another great option here is using the Network Monitor API to programmatically analyze a trace for response times. A great example of this is vRTA, which I reference in this blog, though it goes beyond just response times.


Annotated Traces for Windows System Behavior

Microsoft publishes protocol documentation on MSDN that is intended to make it easier for others to develop interoperable implementations. “System Documents” provide overviews of system behavior for key systems such as Active Directory, File Sharing and Windows Security. The MSDN documentation for each of the System Documents is available here. We've recently released sets of annotated network captures on the SysDoc CodePlex Site which cover a subset of scenarios for each of the System Documents.

What Kind of Behavior?

For each system component a few choice scenarios were captured and annotated. For example, File Systems have annotated traces for finding a file and configuring a server. Obviously, it would be quite an undertaking to annotate every scenario, but these annotations attempt to cover typical scenarios or a breadth of components.

What's an Annotated Trace?

Starting with Network Monitor 3.3, we can annotate a trace with comments. For more info about trace commenting please reference our blog called Frame Commenting is Here. Frame annotation provides a convenient way to describe what is happening at specific frames in a trace. Each commented frame has a # symbol next to the frame number. Clicking on a frame with comments populates the Frame Comments window in the UI. There are also ways to go to the next comment, search for a comment, and add a comment title column to the Frame Summary window.

Learning by Example

Besides helping you to understand a specific scenario, these annotated traces can be used to get a feel for how you might dissect a trace with your own scenarios. Getting oriented in a trace for an unfamiliar protocol is one of the first steps. With these annotated traces, you have some well documented examples to get you started. We hope you find them useful.


Capturing a Trace at Boot Up

Capturing a trace during a boot is a common task that can be difficult to accomplish. In fact the most foolproof way to capture all traffic at boot is to capture the traffic from a third-party capturing machine in promiscuous mode. But this requires you to mirror or span a port on your switch, or insert a simple hub into your network so that you can see the traffic from the booting machine. For Windows 7 and Windows Server 2008 R2, you might be better off using the Netsh trace capture option, netsh trace start capture=yes (see Windows and Network Monitor Event Tracing). But there is another possibility using NMCap as a service, which I will unveil to you now.

SRVANY and INSTSRV

These two old resource kit utilities can be used to start any application as a service. And while they were designed for XP and Win2003, I successfully installed and ran my tests on Vista as well. Keep in mind that there isn't much support available for these tools, and your mileage might vary. The Windows 2003 resource kit which contains these tools is available here.

Generic instructions are available in this KB article which is what I used as a template. So you can reference it for more details.

The first step is to create a batch file that starts NMCap. I stored my batch file in c:\bootcap and configured NMCap to store the capture files in 5 MB chunks in this same location. This way I can access each new capture as it is created. If you don't use chained captures, accessing them becomes tricky as you might not want, or be able, to stop NMCap when running as a service. I will speak to that a bit more in the next section.

My batch file, c.cmd, consists of this one line:

"c:\Program Files\Microsoft Network Monitor 3\nmcap" /network * /capture /file c:\bootcap\bootcap.chn:5M > "c:\bootcap\out.txt"

Feel free to test and see that it works properly by running at a command prompt before moving to the next steps.

Now that I have a working NMCap batch file, we follow the instructions in the referenced KB and set up this batch file as a service. I followed these steps.

1. At an elevated command prompt type the following, where path points to the location of the resource kit tools:

path\INSTSRV.EXE NmCapBoot path\SRVANY.EXE

2. Edit the Registry to add the application path. I'll echo the warning in the KB about messing with the Registry. If you don't know what you are doing, be careful and do a backup if you are fond of this machine.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NmCapBoot

3. From the Edit menu, click “Add Key”. Type the following and click OK:

Key Name: Parameters
Class : <leave blank>

4. Select the Parameters key.

5. From the Edit menu, click “Add Value”. Type the following and click OK:

Value Name: Application
Data Type : REG_SZ
String : c:\bootcap\c.cmd

6. Close Registry Editor.

You now have a service that will start automatically on reboot. You can also manually start it right now by typing "Net Start NmCapBoot" at the command prompt. This is a good step to prove everything is working. Check the output of c:\bootcap\out.txt to make sure there aren't any problems. Also verify that a capture file gets created. You will see the bootcap.cap appear, but it is not accessible until you fill the first 5 Mbytes and the next chained capture file is created. I find viewing a video on the web is one easy way to achieve this. Once you see bootcap(1).cap get created, you should be able to access the first file, bootcap.cap.

Properly Stopping your Capture

One problem with this method is that shutting the service down with "Net Stop NmCapBoot" doesn't properly close NMCap. No capture file will be created for the last chunk of trace data in the buffer that hasn't already been written to a capture file. In fact, stopping the service with "Net Stop" will leave NMCap running. So you'll have to use Task Manager, select "show processes from all users", and stop NMCap manually.

If there's no easy way to kill the capture, then you might need to trigger the capture to stop. However, using a filter requires you to load the parsers, which won't work as our installed parsers can't be compiled and loaded under the non-user account running the service. Also, filtering is expensive, as we have to parse each frame, so frames might be dropped on a busy network or when disk space is low.

One simple solution to both problems is to create some simple parser code with NPL and provide an Offset, Length, Pattern type of match instead. This parsing is much quicker and the parser code to support this is trivial. So create a new sparser.npl text file and place this in c:\bootcap which will be accessed by using the /SetNPLPath parameter with NMCap.


UnsignedNumber blob(n)
{
Size = n;
}

Protocol Frame
{
}

This lets you type a filter like "blob(FrameData, 30, 4)==0x01020304". This particular offset on my network is the IPv4 Destination address for ICMP. Now you can stop NMCap by running "ping 1.2.3.4", though keep in mind you still need to stop the service with the Net Stop command. Mind you, you could come up with other patterns to stop the trace if the ping doesn't work for you.

Your new batch file, c.cmd, looks like this:

"c:\Program Files\Microsoft Network Monitor 3\nmcap" /SetNPLPath c:\bootcap
"c:\Program Files\Microsoft Network Monitor 3\nmcap" /network * /capture /file c:\bootcap\bootcap.chn:5M /stopwhen /frame blob(FrameData, 30, 4)==0x01020304> "c:\bootcap\out.txt"

This actually calls NMCap twice. The difference is that the first time we set a new NPL path to our sparser.npl which we put in the root of the c:\bootcap directory. This tells NMCap to rebuild and save this as the parser set to use.

Consequently, this changes the parser path for all instances of the UI and NMCap, so you'll have to revert this change if you need to use the normal parsers on this machine. In the UI, you can do this by going to Tools, Options from the menu and opening the Parsers tab. Then select the Restore Defaults button and follow that by pressing the Save and Reload Parsers button to rebuild the restored default parser set.

Using Alternate Stop Patterns

In the example above, we use the blob type to specify the frame data as the first parameter and then the offset and size. So you can provide a different offset and size if you need to use a different type of frame to stop your trace. The easiest way to do this is to look at an existing trace and use the Offset and Selected Bytes displayed in the Hex details. Then you can create a display filter using the notation above to test and make sure you are triggering the right frame.

Here are a few more examples with offsets. Keep in mind these offsets are specific to my network which is IPv4 on Ethernet for these examples.

 

Pattern Description: ICMPv4 Length - Use the length of IPv4 and the fact that IPv4.NextProtocol is ICMP
Blob Filter        : "blob(framedata, 16, 2)==0x97 AND blob(framedata, 23, 1)==1"
Command to Stop    : ping /l 123 /n 1 1.2.3.4

Pattern Description: ICMPv4 Data - Search for the "abcd" pattern
Blob Filter        : blob(framedata, 42, 4)==0x61626364
Command to Stop    : ping /n 1 1.2.3.4

Pattern Description: DNS Name Pattern Match - look for the name "stopme" at a particular offset
Blob Filter        : blob(framedata, 55, 6)==0x73746F706D65
Command to Stop    : Nslookup stopme
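If you want to work out your own offsets and patterns, the arithmetic behind these examples can be sketched in C. This is an illustration only: the offsets assume untagged Ethernet (14 byte header) carrying IPv4 without options (20 byte header), the helper names are made up, and I would assume a pattern longer than 8 bytes will not fit the unsigned number that blob returns.

```c
#include <stdio.h>
#include <string.h>

/* Offsets assume untagged Ethernet II carrying IPv4 with no IP options. */
enum {
    ETH_HEADER_LEN = 14,
    IPV4_HEADER_LEN = 20,
    ICMP_HEADER_LEN = 8,
    OFFSET_IPV4_TOTAL_LENGTH = ETH_HEADER_LEN + 2,   /* 16 */
    OFFSET_IPV4_PROTOCOL     = ETH_HEADER_LEN + 9,   /* 23 */
    OFFSET_IPV4_DESTINATION  = ETH_HEADER_LEN + 16,  /* 30 */
    OFFSET_ICMP_DATA = ETH_HEADER_LEN + IPV4_HEADER_LEN + ICMP_HEADER_LEN /* 42 */
};

/* IPv4 total length for an ICMP echo carrying payload_len bytes (ping /l). */
unsigned ipv4_total_length_for_ping(unsigned payload_len)
{
    return IPV4_HEADER_LEN + ICMP_HEADER_LEN + payload_len;
}

/* Build a filter of the form blob(FrameData, offset, len)==0x<hex of text>.
   Returns the number of characters written, or -1 if buf is too small. */
int format_blob_filter(char *buf, size_t size, unsigned offset, const char *text)
{
    size_t len = strlen(text);
    int n = snprintf(buf, size, "blob(FrameData, %u, %u)==0x", offset, (unsigned)len);
    if (n < 0 || (size_t)n >= size)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Append each byte of the pattern as two uppercase hex digits. */
        int m = snprintf(buf + n, size - (size_t)n, "%02X", (unsigned char)text[i]);
        if (m < 0 || (size_t)m >= size - (size_t)n)
            return -1;
        n += m;
    }
    return n;
}
```

A ping with /l 123 gives an IPv4 total length of 20 + 8 + 123 = 151, which is the 0x97 used in the first pattern, and passing "stopme" with offset 55 reproduces the DNS filter.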

Caveats and Pitfalls

Captures Can Get Overwritten - Since we are using the same NMCap command line, restarting the service or rebooting will cause capture files to get overwritten.

Service Start Dependencies - What if the service that sends traffic starts before the NmCapBoot service? In some cases you must set the dependencies of other services to wait for NMCap to start running. Another consideration is that NMCap might also depend on some services, like the capture driver. If you are not capturing the information you want, you may have to play around with the dependencies for the services installed on your machine and OS.

Can't Apply Capture Filter - As we mentioned above we don't have access to the full parser set in this configuration. You could solve this problem other ways, like copying the parsers somewhere and pointing directly to them. However, this is still a problem of performance which you will have to gauge yourself. A simple test is to run NMCap with your filters and watch the pending count during high traffic. If the pending count continues to grow, then you might not be able to keep up with the traffic.

Unable to Stop with Ping - There may be situations when you can't provide a ping to stop the trace. For instance if you wanted to trace a shutdown of a machine, it might be difficult to get NMCap to stop properly thus losing the frames you want to see.

SrvAny Saves the Day

Srvany and InstSrv allow for a unique way to run NMCap as a service to capture logon/logoff type traffic OR longer term monitoring across logins. Using the steps above should provide you with enough information to solve this difficult capturing scenario.


No Frames Captured Due to Disk Quota

In certain instances, you start a capture and no frames are captured. Or perhaps the UI suddenly stops displaying new frames. The display doesn't indicate any dropped frames and you've already verified that your selected adapter is the one that should see the traffic. Mysteriously, this worked in the past, or maybe it never worked at all. What could be wrong?

Disk Quota Comes Into Play

We have a concept of a disk quota with Network Monitor. The idea is to protect you from filling up your disk drive. In some cases, a user might not be prepared for the fire hose of traffic that can flood the disk when capturing from a 1 Gbps network. By default the quota is set to 2% of your disk space, which means that with a 100 MB disk, we try to leave you 2 MB free. For example, on a 1 Gbps network, the amount of data you are capturing could easily be 100 MB a second or more. So our intent is to protect the user from a low disk situation. This is especially critical on servers where low disk space can cause havoc.
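To get a feel for how quickly the quota can be reached, here is a back-of-the-envelope sketch in C. It assumes the quota is simply a percentage of the total disk size; the function names are hypothetical and this is not how Network Monitor itself is implemented.

```c
#include <stdint.h>

/* Free space (bytes) that a percentage quota tries to preserve. */
uint64_t quota_reserved_bytes(uint64_t disk_size_bytes, unsigned percent)
{
    /* Split the division so very large disk sizes do not overflow. */
    return disk_size_bytes / 100 * percent + disk_size_bytes % 100 * percent / 100;
}

/* Bytes that can still be captured before the quota stops the capture. */
uint64_t capture_headroom_bytes(uint64_t disk_size_bytes, uint64_t free_bytes,
                                unsigned percent)
{
    uint64_t reserved = quota_reserved_bytes(disk_size_bytes, percent);
    return free_bytes > reserved ? free_bytes - reserved : 0;
}

/* Rough seconds until the quota is hit at a steady capture rate. */
uint64_t seconds_until_quota(uint64_t headroom_bytes, uint64_t bytes_per_second)
{
    return bytes_per_second ? headroom_bytes / bytes_per_second : 0;
}
```

For instance, a 500 GB disk with 12 GB free reserves 10 GB under a 2% quota, so at 100 MB a second the remaining 2 GB of headroom lasts about 20 seconds.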

An unintended outcome of the disk quota is that frames in the UI and NMCap won't get captured once this quota is met. Furthermore no appropriate error message is displayed leaving you befuddled. In the UI the conversation tree will state "waiting for network traffic...", but no frames ever appear. For NMCap with a filter you will see the same kind of behavior and the saved frame count never increments. For NMCap and no filter, the symptom is somewhat different. Instead, once you reach the limit, we will continue to process the remaining frames. However, the pending frame count never returns to zero.

Changing the Disk Quota

You can change the disk quota. In some circumstances, 2% can represent a large amount of disk space since it is a percentage of your total disk size. We allow you to set the quota based on an absolute disk value as well as a percentage. In the UI this can be done by going to the Tools menu, Options, and clicking on the Capture tab.


If you are using NMCap, there is a command line option for either choice: /MinDiskQuotaPercentage and /MinDiskQuota. The default here is also 2%.

Wrap Up

So, if you are taking a capture and find the display is not showing any new frames, make sure you have the correct adapter selected, and also check that more than 2% of your disk space is showing as free. If it isn't, or if 2% is not appropriate for your disk size, adjust the disk quota setting.


When You Can't Save Frames From the UI

You might have run into an occasion when doing a capture from the UI that you are unable to save your capture. You might receive a message like "Not enough storage is available to process this command". The UI tends to eat up a lot of resources as it saves conversation information and builds the conversation tree. This is why we recommend you use NMCap, the command line capture utility included with Network Monitor, if you are going to capture a considerable amount of data. But if you do get into this situation, there might be a way to save the trace using the Frame Buffer Manager.

Frame Buffer Manager to Save the Day

Frame Buffer Manager is a tool in Network Monitor that allows you to select frames in any order and from multiple capture files and add them to a new capture file. You can sometimes use this feature to get around this problem. Just follow these steps:

  1. Go to File, Frame Buffer Manager
  2. Select the New File button
  3. In the file save dialog, type in a capture file name and hit save
  4. Hit OK to exit the Frame Buffer Manager window
  5. Select all frames in frame summary (Ctrl+A)
  6. Go to File, Frame Buffer Manager again
  7. Select the file you created in step 3 and Hit OK
  8. Go to File, Frame Buffer Manager again
  9. Select the file you created in step 3
  10. Select the Close File button to save the file

The file will now be saved. If the capture size is larger than 20 MB, then it will be split into multiple files labeled, for instance, out.cap, out(1).cap, out(2).cap and so on. You can stitch these back together using NMCap. Please refer to this blog on stitching chained captures back together.

What You Learned

First, use NMCap if you need to capture for long periods of time. Second, if you do happen to find yourself in this situation with the UI, Frame Buffer Manager can possibly provide a way to save your data.


Adapters Are Missing After Upgrading to Windows 7

If you have just upgraded to Windows 7, you might notice that you no longer see any adapters listed in your Select Networks selection. There is a very simple way to fix this problem.

First run CMD as administrator. If you have not done this before, you can use the search option in the start menu to find CMD. Then right click it and select "Run as Administrator". Now type "nmconfig /install" and press Enter. This will re-bind the Network Monitor Driver to the adapters. Next time you run Network Monitor, the adapters should show up again.

For more information, please see this KB article.


Reassembling Packets with the Network Monitor API

Network traffic by nature is fragmented. Limits of various network packet sizes force protocols to chop up data into multiple frames. When you capture data or read it from a trace with the API (NMAPI) you see only the fragments by default. But as the engine is collecting packets, it can be configured to pass up the reassembled payloads as well. For an intro to how assembly works in the UI, please see the video on reassembly. We also released a recent video on Channel 9 which has some information about the API and reassembly. I would also recommend reading the "Introduction to the Network Monitor API" in the help file for a general background.

Configuring the Parser

The first step is to configure your parser to reassemble. Your API tool for breaking apart a frame is called the Frame Parser object. But to create a frame parser, you start by creating a Frame Parser Configuration. This configuration allows you to add data fields and properties. But it also allows you to configure your parser for Reassembly and Conversations. In this case Reassembly might depend on Conversations, so we will enable them both. Here's how I set up my Parser Configuration and Frame Parser.

// Returns a frame parser configured for conversations and reassembly,
// with one property added. INVALID_HANDLE_VALUE indicates failure.
HANDLE
MyLoadNPL(void)
{
    HANDLE myFrameParser = INVALID_HANDLE_VALUE;
    ULONG ret;

    // Use NULL to load default NPL set.
    ret = NmLoadNplParser(NULL, NmAppendRegisteredNplSets, MyParserBuild, 0, &g_NplParser);

    if(ret == ERROR_SUCCESS)
    {
        ret = NmCreateFrameParserConfiguration(g_NplParser, MyParserBuild, 0, &g_FrameParserConfig);

        if(ret == ERROR_SUCCESS)
        {
            // Order is important here, must turn on Conversations before Reassembly.
            ret = NmConfigConversation(g_FrameParserConfig, NmConversationOptionNone, TRUE);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to config conversations, error 0x%X\n", ret);
            }

            ret = NmConfigReassembly(g_FrameParserConfig, NmReassemblyOptionNone, TRUE);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to config reassembly, error 0x%X\n", ret);
            }

            // Property so we can show the highest protocol description.
            ret = NmAddProperty(g_FrameParserConfig, L"property.Description", &g_DescPropID);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to add property, error 0x%X\n", ret);
            }

            ret = NmCreateFrameParser(g_FrameParserConfig, &myFrameParser, NmParserOptimizeNone);

            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to create frame parser, error 0x%X\n", ret);
                NmCloseHandle(g_FrameParserConfig);
                NmCloseHandle(g_NplParser);
                return INVALID_HANDLE_VALUE;
            }
        }
        else
        {
            wprintf(L"Unable to load parser config, error 0x%X\n", ret);
            NmCloseHandle(g_NplParser);
            return INVALID_HANDLE_VALUE;
        }
    }
    else
    {
        wprintf(L"Unable to load NPL\n");
        return INVALID_HANDLE_VALUE;
    }

    return(myFrameParser);
}

After creating your Frame Parser Configuration Object, you'll want to set any options first. This will let the engine optimize properly when adding other things like properties and data fields. It's also important that you turn on conversations before reassembly. Placing them in the wrong order will turn off Reassembly due to a bug in our API.

Above we also added a property so that I can show the description of the current frame. This is not necessary for reassembly to work, but it helps us understand the example.

Parsing the Frames

It is up to the parsers (NPL) to mark each frame's fragment type: First=1, Middle=2, Last=3 or None=0. The engine tracks these fragments and returns a new inserted raw frame once a Last fragment is detected for a specific protocol.
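To picture what the engine does with those fragment types, here is a toy model in C. It is only a sketch of the idea for a single stream: reassembler_t, reassembler_feed and the fixed 256 byte buffer are invented for illustration, and the real engine's internals are not exposed through the API.

```c
#include <stddef.h>
#include <string.h>

/* Fragment types as marked by the parsers. */
enum fragment_type { FRAG_NONE = 0, FRAG_FIRST = 1, FRAG_MIDDLE = 2, FRAG_LAST = 3 };

#define REASSEMBLY_CAPACITY 256

/* Hypothetical reassembly state for one stream. */
typedef struct {
    unsigned char data[REASSEMBLY_CAPACITY];
    size_t length;
    int active;
} reassembler_t;

/* Feed one fragment payload. Returns 1 when a Last fragment completes a
   payload (the point where the engine would hand back an inserted raw frame),
   0 when nothing is ready yet, and -1 on an out-of-sequence fragment or when
   the buffer would overflow. */
int reassembler_feed(reassembler_t *r, enum fragment_type type,
                     const void *payload, size_t len)
{
    if (type == FRAG_NONE)
        return 0;                       /* unfragmented: nothing to collect */
    if (type == FRAG_FIRST) {
        r->length = 0;                  /* start a new payload */
        r->active = 1;
    } else if (!r->active) {
        return -1;                      /* Middle or Last without a First */
    }
    if (len > REASSEMBLY_CAPACITY - r->length) {
        r->active = 0;
        return -1;
    }
    memcpy(r->data + r->length, payload, len);
    r->length += len;
    if (type == FRAG_LAST) {
        r->active = 0;                  /* data[0..length) is now complete */
        return 1;
    }
    return 0;
}
```

Frames marked None pass through without touching the collected payload, which is why an unrelated frame in the middle of a fragmented sequence does not disturb the result.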

When you parse a raw frame using NmParseFrame, the last parameter passed is a pointer to a HANDLE that will contain an InsertedRawFrame if one is present. Otherwise this value will be set to INVALID_HANDLE_VALUE for any frame that doesn't return a reassembled payload. For frames that do have a reassembled payload, the handle returned will contain a raw frame. You can now use your frame parser to parse this raw frame.

The main part of my code simply retrieves frames from the capture file iteratively and calls ParseFrame, which does all the work. If an inserted frame is found, the function calls itself. The function is recursive because the handles for a RawFrame, ParsedFrame and InsertedRawFrame have to be closed in the order they were opened. There are other ways to do this, but for this example a recursive routine was the easiest. You will also want to ensure the frames are in order. For instance you could use NmOpenCaptureFileInOrder to make sure the TCP frames are ordered correctly.

In my case I parse and display all the frames so that you can get a feel for the pattern that occurs as frame fragments are marked by the engine. It also helps show how fragmentation looks at different protocol layers. If you were interested in only the reassembled frames or frames that are not fragmented to begin with, you could identify those as having a fragment type of None and no InsertedRawFrame.

Here's the recursive frame parsing routine:

// Recursive parsing routine. If an inserted frame is found, the routine calls itself. This
// allows us to close our handles in the order they were created.
void
MyParseFrame(HANDLE frameParser, HANDLE rawFrame, ULONG curFrame, PULONG reassembleFrames, int reassembleCount)
{
    ULONG ret;
    HANDLE ParsedFrame = INVALID_HANDLE_VALUE;
    HANDLE InsRawFrame = INVALID_HANDLE_VALUE;

    // NmUseFrameNumberParameter and valid unique frame numbers are necessary for Reassembly to work properly.
    ret = NmParseFrame(frameParser, rawFrame, curFrame + *reassembleFrames, NmFieldDisplayStringRequired | NmUseFrameNumberParameter, &ParsedFrame, &InsRawFrame);
    if(ret == ERROR_SUCCESS)
    {
        // Returns the highest level protocol description just to show which
        // frame we are working on.
        PBYTE buf = GetDescription(frameParser);

        // Get the fragment information which helps understand what is happening,
        // but not needed for reassembly to work.
        NM_FRAGMENTATION_INFO FragInfo;
        GetFragType(ParsedFrame, &FragInfo);

        wprintf(L"%5d-%d: %5d %-5.5s-%d %-.45s\n", curFrame+1, reassembleCount, curFrame+(*reassembleFrames)+1, FragInfo.FragmentedProtocolName, FragInfo.FragmentType, buf);

        free(buf);

        if(InsRawFrame != INVALID_HANDLE_VALUE)
        {
            (*reassembleFrames)++;
            MyParseFrame(frameParser, InsRawFrame, curFrame, reassembleFrames, reassembleCount+1);

            NmCloseHandle(InsRawFrame);
            // Mark the handle as closed so it is not closed a second time below.
            InsRawFrame = INVALID_HANDLE_VALUE;
        }
    }

    // Close whatever is still open.
    if(ParsedFrame != INVALID_HANDLE_VALUE)
    {
        NmCloseHandle(ParsedFrame);
    }
    if(InsRawFrame != INVALID_HANDLE_VALUE)
    {
        NmCloseHandle(InsRawFrame);
    }
}

When doing reassembly you must add the Frame Number parameter. It must also be unique, so you have to remember to increment when adding and parsing the reassembled frames. GetFragType uses the NmGetFrameFragmentInfo API call to determine the fragment type and protocol. You can look at the full example below to see how it works in detail, but those ancillary pieces are pretty straightforward.

Looking at an Example

Below is the partial output for an example capture. In my notation, the Frame# contains a number after the dash that shows when multiple iterations occur on a frame. The Reassem# is the frame number that would appear in a reassembled trace in the UI and is what is used to seed each frame with a unique frame number.

 Frame#  Reassem# FragType Description
    5-0:     5 TCP  -1 HTTP:Response, HTTP/1.1, Status: Bad gateway,
    6-0:     6 TCP  -2 TCP:[Continuation to #5]Flags=...A...., SrcPo
    7-0:     7      -0 TCP:Flags=...A...., SrcPort=49382, DstPort=HT
    8-0:     8 TCP  -3 TCP:[Continuation to #5]Flags=...AP..., SrcPo
    8-1:     9      -0 HTTP:Response, HTTP/1.1, Status: Bad gateway,
...

In original frames 5-8, you can see a typical TCP fragmentation. Frame 5 is a TCP First fragment. Frame 6 is a Middle fragment, and frame 7 is traveling in the opposite direction so it's not part of this reassembly stream. Frame 8 is the last frame in the reassembled TCP payload, which is marked as the Last fragment. This is where the Inserted Raw Frame is valid and the recursive call to parse the frame would occur. Frame 8-1 is the parsed inserted frame, which you can see matches the description of frame #5, but if you looked at it, there would be two differences.

First, since it's an inserted frame it will have a PayloadHeader structure as its top protocol. This is a protocol we manufactured to take the place of the carrying protocol, in this case TCP. Having a duplicate TCP frame would confuse our parsers and perhaps the user as well. So this header takes its place and calls HTTP directly.

Second, this frame will have a larger payload. It will consist of all the payload data from frames 5, 6, and 8.

Two Level Reassembly

In this next example, both TCP and HTTP have fragmented data.

...
   33-0:    36 TCP  -1 HTTP:Response, HTTP/1.1, Status: Ok, URL: htt
   34-0:    37 TCP  -2 TCP:[Continuation to #36]Flags=...A...., SrcP
   35-0:    38      -0 TCP:Flags=...A...., SrcPort=49384, DstPort=HT
   36-0:    39 TCP  -3 TCP:[Continuation to #36]Flags=...AP..., SrcP
   36-1:    40 HTTP -1 HTTP:Response, HTTP/1.1, Status: Ok, URL: htt
   37-0:    41 TCP  -1 HTTP:HTTP Payload, URL: http://www.google.com
   38-0:    42      -0 TCP:Flags=...A...., SrcPort=49384, DstPort=HT
   39-0:    43 TCP  -2 TCP:[Continuation to #41]Flags=...A...., SrcP
   40-0:    44 TCP  -2 TCP:[Continuation to #41]Flags=...A...., SrcP
   41-0:    45      -0 TCP:Flags=...A...., SrcPort=49384, DstPort=HT
   42-0:    46 TCP  -3 TCP:[Continuation to #41]Flags=...AP..., SrcP
   42-1:    47 HTTP -3 HTTP:HTTP Payload, URL: http://www.google.com
   42-2:    48      -0 HTTP:Response, HTTP/1.1, Status: Ok, URL: htt
...

Frames 33-36 make up the first HTTP fragment. As you can see, the inserted frame at 36-1 is a First fragment, but the protocol is now HTTP. Frames 37-42 make up the next HTTP fragment, which is inserted at frame 42-1. This inserted frame is the HTTP Last fragment, so now there is yet another inserted raw frame that we must iterate through and parse. Frame 42-2 is the final reassembled frame and contains the original HTTP Response in its entirety. The description matches frame 33 because the data starts with the payload in that frame, but it also includes the payloads from frames 34, 36, 37, 39, 40, and 42. However, from the engine's point of view, it really collects the payloads from frames 36-1 and 42-1. But each of these is made up from the fragmented frames mentioned above.
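The Reassem# column in both listings can be reproduced with a small model of the counter the recursive routine keeps. This C sketch is only an illustration: assign_frame_numbers and its inputs are made up, and the count of inserted frames per original frame would really come from the engine.

```c
#include <stddef.h>

/* Model of the numbering used by the recursive parse routine: every original
   frame and every inserted (reassembled) frame gets the next unique number.
   inserted_per_frame[i] is how many inserted frames original frame i yields.
   first_frame is the 1-based number of the first original frame, and
   inserted_so_far is how many inserted frames came before it.
   Writes the number seeded for each original frame to seeds[i]; an inserted
   frame at recursion depth d then gets seeds[i] + d.
   Returns the updated count of inserted frames. */
unsigned long assign_frame_numbers(const int *inserted_per_frame, size_t n,
                                   unsigned long first_frame,
                                   unsigned long inserted_so_far,
                                   unsigned long *seeds)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long cur = first_frame + (unsigned long)i;
        /* Same idea as curFrame + *reassembleFrames in the routine above. */
        seeds[i] = cur + inserted_so_far;
        /* One increment per recursive call made for this frame. */
        inserted_so_far += (unsigned long)inserted_per_frame[i];
    }
    return inserted_so_far;
}
```

For the second listing, starting at frame 33 with three earlier inserted frames, frame 36 is seeded with 39 and its inserted frame gets 40, while frame 42 is seeded with 46 and its two inserted frames get 47 and 48.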

The Whole Shebang

Below I've placed the entire source code for the example described in this blog. While it depends on which protocols you are interested in, having access to the reassembled data can provide you with the big picture especially when focusing on application layer traffic.

#include "stdafx.h"
#include "windows.h"
#include "stdio.h"
#include "stdlib.h"
#include "objbase.h"
#include "ntddndis.h"
#include "NMApi.h"

HANDLE g_NplParser = INVALID_HANDLE_VALUE;
HANDLE g_FrameParserConfig = INVALID_HANDLE_VALUE;

ULONG g_DescPropID = 0; // Global Description Property ID.

// Callback for parser building messages
void __stdcall
MyParserBuild(PVOID Context, ULONG StatusCode, LPCWSTR lpDescription, ULONG ErrorType)
{
wprintf(L"%s\n", lpDescription);
}

// Returns a frame parser with a filter and one data field.
// INVALID_HANDLE_VALUE indicates failure.
HANDLE
MyLoadNPL(void)
{
HANDLE myFrameParser = INVALID_HANDLE_VALUE;
ULONG ret;

// Use NULL to load default NPL set.
ret = NmLoadNplParser(NULL, NmAppendRegisteredNplSets, MyParserBuild, 0, &g_NplParser);

if(ret == ERROR_SUCCESS){
ret = NmCreateFrameParserConfiguration(g_NplParser, MyParserBuild, 0, &g_FrameParserConfig);

if(ret == ERROR_SUCCESS)
{
// Order is important here, must turn on Conversations before Reassembly.
ret = NmConfigConversation(g_FrameParserConfig, NmConversationOptionNone , TRUE);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to config conversations, error 0x%X\n", ret);
}

ret = NmConfigReassembly(g_FrameParserConfig, NmReassemblyOptionNone , TRUE);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to config reassembly, error 0x%X\n", ret);
}

// Property so we can show the highest protocol description.
ret = NmAddProperty(g_FrameParserConfig, L"property.Description", &g_DescPropID);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to add field, error 0x%X\n", ret);
}

ret = NmCreateFrameParser(g_FrameParserConfig, &myFrameParser, NmParserOptimizeNone);

if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to create frame parser, error 0x%X\n", ret);
NmCloseHandle(g_FrameParserConfig);
NmCloseHandle(g_NplParser);
return INVALID_HANDLE_VALUE;
}
}
else
{
wprintf(L"Unable to load parser config, error 0x%X\n", ret);
NmCloseHandle(g_NplParser);
return INVALID_HANDLE_VALUE;
}

}
else
{
wprintf(L"Unable to load NPL\n");
return INVALID_HANDLE_VALUE;
}

return(myFrameParser);
}

void
UnLoadNPL(void)
{
NmCloseHandle(g_NplParser);
NmCloseHandle(g_FrameParserConfig);
}

ULONG
GetFragType(HANDLE parsedFrame, NM_FRAGMENTATION_INFO *FragInfo)
{
ULONG ret;

// Size must be the size of the structure, not the size of the pointer.
FragInfo->Size = sizeof(*FragInfo);
ret = NmGetFrameFragmentInfo(parsedFrame, FragInfo);

return ret;
}

PBYTE
GetDescription(HANDLE frameParser)
{
ULONG ret;
NM_PROPERTY_INFO PropInfo;

// Find out the size of the description property so we can allocate a buffer.
// MUST initialize the size and name pointer or NmGetPropertyInfo will fail.
PropInfo.Size = sizeof(PropInfo);
PropInfo.Name = NULL;
ret = NmGetPropertyInfo(frameParser, g_DescPropID, &PropInfo);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Error calling NmGetPropertyInfo, %d\n", ret);
return NULL;
}

ULONG retlen = 0;
NmPropertyValueType propType;
// Add size of WCHAR for the null terminator. calloc zero-fills, so the string is always terminated.
PBYTE buf = (PBYTE)calloc(1, PropInfo.ValueSize + sizeof(WCHAR));
if(buf == NULL)
{
wprintf(L"Out of memory allocating description buffer\n");
return NULL;
}
ret = NmGetPropertyValueById(frameParser, g_DescPropID, PropInfo.ValueSize, buf, &retlen, &propType);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Error calling NmGetPropertyValueById, %d\n", ret);
free(buf);
return NULL;
}

return buf;
}

// Recursive parsing routine. If an inserted frame is found, the recursive routine is called again. This
// allows us to close our handles in the order they were created.
void
MyParseFrame(HANDLE frameParser, HANDLE rawFrame, ULONG curFrame, PULONG reassembleFrames, int reassembleCount)
{
ULONG ret;
HANDLE ParsedFrame = INVALID_HANDLE_VALUE;
HANDLE InsRawFrame = INVALID_HANDLE_VALUE;

// NmUseFrameNumberParameter and valid unique frame numbers are necessary for Reassembly to work properly.
ret = NmParseFrame(frameParser, rawFrame, curFrame + *reassembleFrames, NmFieldDisplayStringRequired | NmUseFrameNumberParameter, &ParsedFrame, &InsRawFrame);
if(ret == ERROR_SUCCESS)
{
// Returns the highest level protocol description just to show which
// frame we are working on.
PBYTE buf = GetDescription(frameParser);

// Get the fragment information which helps understand what is happening,
// but not needed for reassembly to work.
NM_FRAGMENTATION_INFO FragInfo;
GetFragType(ParsedFrame, &FragInfo);

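// GetDescription can return NULL on failure, so fall back to an empty string.
wprintf(L"%5d-%d: %5d %-5.5s-%d %-.45s\n", curFrame+1, reassembleCount, curFrame+(*reassembleFrames)+1, FragInfo.FragmentedProtocolName, FragInfo.FragmentType, buf ? (LPCWSTR)buf : L"");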

free(buf);

if(InsRawFrame != INVALID_HANDLE_VALUE)
{
(*reassembleFrames)++;
MyParseFrame(frameParser, InsRawFrame, curFrame, reassembleFrames, reassembleCount+1);

NmCloseHandle(InsRawFrame);
}
}

// InsRawFrame was already closed above, so only the parsed frame remains.
if(ParsedFrame != INVALID_HANDLE_VALUE)
{
NmCloseHandle(ParsedFrame);
}
}

int __cdecl wmain(int argc, WCHAR* argv[])
{
ULONG ret = ERROR_SUCCESS;
// The first parameter should be a file.
if(argc <= 1){
wprintf(L"Expect a file name as the only command line parameter\n");
return -1;
}

// Open the specified capture file.
HANDLE myCaptureFile = INVALID_HANDLE_VALUE;
if(ERROR_SUCCESS == NmOpenCaptureFile(argv[1], &myCaptureFile))
{
// Initialize the parser engine and return a frame parser.
HANDLE myFrameParser = MyLoadNPL();
if(myFrameParser != INVALID_HANDLE_VALUE)
{
ULONG myFrameCount = 0;
ret = NmGetFrameCount(myCaptureFile, &myFrameCount);
if(ret == ERROR_SUCCESS)
{
ULONG totReassembledFrames = 0;
HANDLE myRawFrame = INVALID_HANDLE_VALUE;

wprintf(L"Frame# Reassem# FragType Description\n");
for(ULONG i = 0; i < myFrameCount; i++)
{
HANDLE myParsedFrame = INVALID_HANDLE_VALUE;
ret = NmGetFrame(myCaptureFile, i, &myRawFrame);
if(ret == ERROR_SUCCESS)
{
MyParseFrame(myFrameParser, myRawFrame, i, &totReassembledFrames, 0);

NmCloseHandle(myRawFrame);
}
else
{
// Print an error, but continue to loop.
wprintf(L"Errors getting raw frame %d\n", i+1);
}
}
}

NmCloseHandle(myFrameParser);
}
else
{
wprintf(L"Errors creating frame parser\n");
}

NmCloseHandle(myCaptureFile);
}
else
{
wprintf(L"Errors opening capture file\n");
}

// Release global handles.
UnLoadNPL();

return 0;
}

Posted by PaulELong | 0 Comments
Network Monitor Videos on Channel 9

We posted some videos to Channel 9 in the last 6 months or so, and I wanted to let everybody know about them.

We have one set of videos that provide some insight into the Network Monitor API and the process of creating experts. This series provides an overview of the API and dives deeply into various aspects like Live Capturing, Parser Engine, and API Overview, as well as a general Expert Story to understand the big picture. We plan to release a few more in the upcoming months, so stay tuned.

We also have some videos from our Plug Fests, which is where we invite partners to get information on specific technologies with which they want to interoperate.

So please visit the Channel9 (http://channel9.msdn.com/tags/Netmon/) site and learn more about Network Monitor.

Posted by PaulELong | 0 Comments

Using NMAPI to Access TCP Payload

The TCP Payload often carries data that you want to access directly using the Network Monitor API. Below I will detail how to do this using a simple C++ example and the NMAPI.

Why Not add a TCP.Payload Field?

The TCP Payload can carry all types of payloads depending on the protocol that rides on top of TCP. Most often these represent other protocols, but you might not care about the protocol and instead want to see the payload size or payload data directly. You might think that you could access TCP.Payload to access this data, as this is a valid data field. However, TCP.Payload is only instantiated when no other protocol consumes the data. And in most cases, our parsers are complete enough to attempt to parse the data further. This is a limitation of how NPL works, and means we need to find another way to get the payload data.

Why Not use Property.TCPPayload?

There is a property called Property.TCPPayload that you could potentially use (see this blog for more info on properties). The limitation is that it only works with ASCII or UNICODE data, so binary information does not read properly into the property.

The Solution

The solution is to find the TCP payload based on the TCP header location and size. We can use Property.TCPPayloadLength to obtain the total length of the payload. To get the offset of the payload within the TCP frame we use the TCP header length (TCP.DataOffset.DataOffset). Finally, to get the start of the TCP frame we use the offset of TCP.SrcPort, which is the first field in a TCP frame. With these pieces of information, we can use the NmGetPartialRawFrame API to grab the raw data from the frame.

So here's the code snippet:

void
GetFramePayload(HANDLE ParsedFrame, HANDLE FrameParser, HANDLE RawFrame)
{
ULONG ret;
UINT32 PayloadLen = 0;
ULONG retlen;
NmPropertyValueType PropType;

UINT8 TCPHeaderSize;
ULONG TCPSrcOffset, TCPSrcSize;


// Get Payload Length
ret = NmGetPropertyValueById(FrameParser, TCPPayloadLengthID, sizeof(PayloadLen), (PBYTE)&PayloadLen, &retlen, &PropType);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Error retrieving TCP Payload Length Property, err=%d\n", ret);
return;
}

if(PayloadLen > 0)
{
// Get the Data Offset, used to determine the TCP header size
ret = NmGetFieldValueNumber8Bit(ParsedFrame, TCPDataOffsetID, &TCPHeaderSize);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Error retrieving TCP Header Length Field, err=%d\n", ret);
return;
}

// Get the Offset of TCP.SrcPort which is the first field in TCP.
ret = NmGetFieldOffsetAndSize(ParsedFrame, TCPSrcPortID, &TCPSrcOffset, &TCPSrcSize);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Error retrieving TCP SRC Header/Offset, err=%d\n", ret);
return;
}

wprintf(L"Offset: %d, Length: %d, HeaderLen: %d\n", TCPSrcOffset/8, PayloadLen, TCPHeaderSize*4);

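// Allocate a zero-filled buffer with one extra byte so the data is always null terminated.
PBYTE buf = (PBYTE)calloc(1, PayloadLen + 1);
if(buf == NULL)
{
wprintf(L"Out of memory allocating payload buffer\n");
return;
}

// Read in the partial frame. The offset is in bits, so divide by 8. DataOffset counts
// 32-bit words, so multiply by 4 to get the header size in bytes.
ret = NmGetPartialRawFrame(RawFrame, TCPSrcOffset/8 + TCPHeaderSize*4, PayloadLen, buf, &retlen);
if(ret == ERROR_SUCCESS)
{
// Do whatever you want with buf now. I'll assume it's ASCII and print it.
wprintf(L"%S", buf);
}

free(buf);
}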
}

And here is the initialization code for our frame parser, so you can see how each data field and property was added:

HANDLE
MyLoadNPL(void)
{
HANDLE myFrameParser = INVALID_HANDLE_VALUE;
ULONG ret;

// Use NULL to load default NPL set.
ret = NmLoadNplParser(NULL, NmAppendRegisteredNplSets, MyParserBuild, 0, &NplParser);

if(ret == ERROR_SUCCESS){
ret = NmCreateFrameParserConfiguration(NplParser, MyParserBuild, 0, &FrameParserConfig);

if(ret == ERROR_SUCCESS)
{

ret = NmAddProperty(FrameParserConfig, L"Property.TCPPayloadLength", &TCPPayloadLengthID);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to add Property.TCPPayloadLength, error 0x%X\n", ret);
}

ret = NmAddField(FrameParserConfig, L"TCP.SrcPort", &TCPSrcPortID);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to add field, TCP.SrcPort, error 0x%X\n", ret);
}

ret = NmAddField(FrameParserConfig, L"TCP.DataOffset.DataOffset", &TCPDataOffsetID);
if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to add field, TCP.DataOffset, error 0x%X\n", ret);
}

ret = NmCreateFrameParser(FrameParserConfig, &myFrameParser);

if(ret != ERROR_SUCCESS)
{
wprintf(L"Failed to create frame parser, error 0x%X\n", ret);
NmCloseHandle(FrameParserConfig);
NmCloseHandle(NplParser);
return INVALID_HANDLE_VALUE;
}
}
else
{
wprintf(L"Unable to load parser config, error 0x%X\n", ret);
NmCloseHandle(NplParser);
return INVALID_HANDLE_VALUE;
}

}
else
{
wprintf(L"Unable to load NPL\n");
return INVALID_HANDLE_VALUE;
}

return(myFrameParser);
}

By using TCP.SrcPort, we get rid of any dependency on the rest of the stack. This will work on IPv4, IPv6 or any tunneled protocols. Also, Property.TCPPayloadLength is computed by the parsers, which again is agnostic to the carrying protocols.
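The offset arithmetic in GetFramePayload is easy to get wrong, so here is a small sketch that isolates it from the API calls. ComputeTcpPayloadSpan and PayloadSpan are hypothetical names of my own and are not part of NMAPI. The function takes the three values the snippet reads (the bit offset of TCP.SrcPort, the DataOffset field, and the payload length property) plus the number of bytes available in the frame, and it rejects inputs that would read outside the frame. The 5 to 15 range check comes from the TCP header format, where DataOffset is a 4-bit count of 32-bit words and the fixed header is 20 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Where the TCP payload sits inside a raw frame.
struct PayloadSpan {
    std::size_t offset;  // Byte offset of the first payload byte in the frame.
    std::size_t length;  // Payload length in bytes.
};

// tcpStartBits    : offset of TCP.SrcPort in the frame, in bits.
// dataOffsetWords : value of TCP.DataOffset.DataOffset, in 32-bit words.
// payloadLen      : value of Property.TCPPayloadLength, in bytes.
// frameLen        : number of bytes available in the raw frame.
// Returns false if the values are inconsistent with each other.
bool ComputeTcpPayloadSpan(std::uint32_t tcpStartBits,
                           std::uint8_t dataOffsetWords,
                           std::uint32_t payloadLen,
                           std::size_t frameLen,
                           PayloadSpan* out)
{
    assert(out != nullptr);
    if (tcpStartBits % 8 != 0) {
        return false;  // A TCP header always starts on a byte boundary.
    }
    if (dataOffsetWords < 5 || dataOffsetWords > 15) {
        return false;  // Header is at least 20 bytes; the field is 4 bits wide.
    }
    const std::size_t offset =
        tcpStartBits / 8 + static_cast<std::size_t>(dataOffsetWords) * 4;
    if (offset > frameLen || payloadLen > frameLen - offset) {
        return false;  // The payload would run past the end of the frame.
    }
    out->offset = offset;
    out->length = payloadLen;
    return true;
}
```

For a plain Ethernet and IPv4 frame with no TCP options, TCP.SrcPort sits at byte 34 (bit 272) and DataOffset is 5, so the payload starts at byte 54. A check like this before calling NmGetPartialRawFrame is one way to guard against truncated captures.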

Party on Your Payload

Now that you have your payload in a BYTE buffer, you can do whatever you want with it. For instance, if you wanted to create an expert to show each payload and response as text, you could simply take the frame number that is referenced and use that to determine the conversation key for the TCP conversation, i.e. using a property Conversation.ID.TCP. Then you can use this to filter all other packets in the same trace with the same TCP Conversation ID. This would give you a high level view of text based traffic like HTTP and FTP. Of course there is a little more work to deal with fragmented data, but the API gives you all the tools to accomplish this.
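As a rough sketch of that idea, the following groups payload buffers by a conversation ID and concatenates them in capture order. PayloadChunk and GroupByConversation are hypothetical names, and I'm assuming you have already read the conversation ID and the payload bytes for each frame using the techniques above. It makes no attempt to handle retransmissions, out of order segments, or the two directions of a conversation separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One TCP payload pulled out of a frame. The conversation ID is assumed to
// have been read from a conversation property for that frame.
struct PayloadChunk {
    std::uint32_t frameNumber;
    std::uint32_t conversationId;
    std::string   bytes;
};

// Concatenates payloads per conversation, in the order the frames were captured.
std::map<std::uint32_t, std::string>
GroupByConversation(const std::vector<PayloadChunk>& chunks)
{
    std::map<std::uint32_t, std::string> streams;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        // The input is expected in capture order.
        assert(i == 0 || chunks[i - 1].frameNumber <= chunks[i].frameNumber);
        streams[chunks[i].conversationId] += chunks[i].bytes;
    }
    return streams;
}
```

Each entry in the returned map is then one conversation's text, which is about as much as a simple "show me the HTTP or FTP exchange" expert would need to display.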

Posted by PaulELong | 2 Comments

SMB Opportunistic Locking Behavior

Behold the mysterious world of OpLocks (Opportunistic Locking). Often OpLocks will be disabled by a user or system administrator in order to help address a performance problem. And this practice might not always be the best course of action. Understanding how OpLocks behave in a trace can provide you more information so you can properly diagnose an OpLock issue.

What is an OpLock

OpLocks are used to enhance performance on a network where multiple people are accessing the same file. By the way, these are somewhat different than the notion of "optimistic locking" in databases. Imagine that you are the only person editing a file on a server. Because nobody else has the file open, you could cache your changes locally for both reads and writes. This would improve your performance because you wouldn't have to go over the network for any of this cached information.

Now imagine somebody else opens the file after you do. If you have changes in your local cache, this new user won't see those changes. OpLocks, or more specifically a break of an OpLock in this case, is how your computer is told to flush its local cache.

In general there are different levels of OpLocks, like Batch, Exclusive, and Level 2 which define how a file can be shared with respect to this local caching. But rather than go into a lot of detail about the specifics, let me point you to some references which do a good job of describing more detail.

Example OpLocks in a Trace

In this example we have two clients - Windows XP (SMB) and Windows Vista (SMB2) viewing the same directory on a 3rd computer using explorer. As explorer reads the data, file collisions occur which cause various OpLock traffic. We will focus on a piece of this traffic and describe how the OpLock behavior is working. Once you see what normal traces look like, you can use this information to troubleshoot issues with OpLocks.

Setting up the Trace in Network Monitor

One nice feature I like to use is aliases. This gives me the ability to change IP addresses to something I can better recognize, especially when working with 3 machines as in this case. By right clicking on an address in the source or destination column, I can select "Create Alias for..." and then provide a friendly name. In my case I will call them SRV for the server, and Vista and XP for each client.

The second thing I'll do is add the display filter "SMB or SMB2" so that I only see these protocols. This will get rid of any TCP or unrelated traffic for this demonstration.

Finally, I also added comments to this particular trace. Comments are an easy way to document the traffic that occurs for others to learn from. By adding the "Comment Title" as a column, these comments show up and provide some commentary about what is going on. By the way, the # next to the frame number signifies which frames have a comment. Alternatively you can keep the comment tab open to see each comment as you click on each frame. Using the latter method enables you to see more detail in the description column.

Traffic Analysis

I copied and pasted the data from the Network Monitor summary view. Here is the traffic that occurs between the 3 machines:

 

Frame Number | Source | Destination | Description | Comment Title
3110# | Vista | SRV | SMB2:C CREATE (0x5), Context=DHnQ,Create Durable Open Handle, Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID , FileName = ...\Documents\desktop.ini@#3110 | Vista Client Opens desktop.ini, request oplock batch
3111 | XP | SRV | SMB:C; Transact2, Query Path Info, Query File Basic Info, Pattern = \...\Documents\desktop.ini |
3112 | SRV | XP | SMB:C; Locking Andx, FID = 0x400E (\...\Documents\desktop.ini@#2519) |
3113 | SRV | XP | SMB:R; Transact2, Query Path Info, Query File Basic Info |
3114# | SRV | Vista | SMB2:R CREATE (0x5) Interim Response, FileName = ...\Documents\desktop.ini@#3110 | Server response that this command is Pending
3116# | XP | SRV | SMB:C; Close, FID = 0x400E , FileName=\...\Documents\desktop.ini@#2519 | XP Client closes desktop.ini
3117 | SRV | XP | SMB:R; Close, FID = 0x400E , FileName=\...\Documents\desktop.ini@#2519 |
3118# | SRV | Vista | SMB2:R CREATE (0x5), Context=MxAc,Maximal Access, Context=DHnQ,Create Durable Open Handle, Context=QFid,Request Unique File ID, FID=0xFFFFFFFF002000C5(...\Documents\desktop.ini@#3110) | Server responds to the Vista client with batch oplock granted
3119# | XP | SRV | SMB:C; Nt Create Andx, FileName = \...\Documents\desktop.ini | XP client wants to open desktop.ini again
3120# | SRV | Vista | SMB2:N OPLOCK BREAK (0x12), Oplock Level II Notification, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110 | Server send Oplock break to Level 2 Notification to Vista client
3122 | Vista | SRV | SMB2:C CREATE (0x5), Context=DHnQ,Create Durable Open Handle, Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID , FileName = ...\Links@#3122 |
3123 | SRV | Vista | SMB2:R CREATE (0x5), Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID, FID=0xFFFFFFFF002000CD(...\Links@#3122) |
3124# | Vista | SRV | SMB2:A OPLOCK BREAK (0x12), Oplock Level II Acknowledgment, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110 | Vista Client sends Oplock Level 2 Acknowledge to Server
3125# | SRV | Vista | SMB2:R OPLOCK BREAK (0x12), Oplock Level II Response, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110 | Server sends break OpLock break to Level 2 response
3126 | SRV | XP | SMB:R; Nt Create Andx, FID = 0x8008 (\...\Documents\desktop.ini@#3119) |

As we start in frame 3110, we see that the Vista client opens desktop.ini and requests a Batch OpLock. Since the OpLock request is part of the SMB Create, the actual request is buried in the frame details.

Frame: Number = 3110, Captured Frame Length = 386, MediaType = ETHERNET

...

+ SMBOverTCP: Length = 264

- SMB2: C CREATE (0x5), Context=DHnQ,Create Durable Open Handle, Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID , FileName = paullo\Documents\desktop.ini@#3110

SMBIdentifier: SMB

+ SMB2Header: C CREATE (0x5),TID=0x0009, MID=0x04F2, PID=0xFEFF, SID=0x0001

- CCreate: 0x1

StructureSize: 57 (0x39)

SecurityFlags: 0 (0x0)

RequestedOplockLevel: SMB2_OPLOCK_LEVEL_BATCH - A batch oplock is requested.

...

 

Frames 3111-3113 contain other traffic our XP client is doing which also happens to touch desktop.ini.

In frame 3114 the server returns a STATUS_PENDING because the server is not yet ready to respond.

Frame: Number = 3114, Captured Frame Length = 194, MediaType = ETHERNET 

...

+ SMBOverTCP: Length = 73

- SMB2: R CREATE (0x5) Interim Response, FileName = paullo\Documents\desktop.ini@#3110

SMBIdentifier: SMB

- SMB2Header: R CREATE (0x5),TID=0x0000, MID=0x04F2, PID=0x0000, SID=0x0001

StructureSize: 64 (0x40)

Epoch: 0 (0x0)

+ Status: 0x103, Facility = FACILITY_SYSTEM, Severity = STATUS_SEVERITY_SUCCESS, Code = (259) STATUS_PENDING

Command: CREATE (0x5)

...

The XP client is closing desktop.ini, so the server will wait for that to complete first. This way it can grant the Batch OpLock the Vista client is requesting. If the XP client had kept the file open, the OpLock might have been denied. Once the close completes, the SMB2 Create response is finally returned and the Batch OpLock is granted in frame 3118.

Frame: Number = 3118, Captured Frame Length = 394, MediaType = ETHERNET 

...

+ SMBOverTCP: Length = 272

- SMB2: R CREATE (0x5), Context=MxAc,Maximal Access, Context=DHnQ,Create Durable Open Handle, Context=QFid,Request Unique File ID, FID=0xFFFFFFFF002000C5(paullo\Documents\desktop.ini@#3110)

SMBIdentifier: SMB

+ SMB2Header: R CREATE (0x5),TID=0x0000, MID=0x04F2, PID=0x0000, SID=0x0001

- RCreate: 0x1

StructureSize: 89 (0x59)

OplockLevel: SMB2_OPLOCK_LEVEL_BATCH - A batch oplock was granted.

...

Next, another create request for desktop.ini appears in frame 3119 as the XP client wants to reopen the file. Since this is a second open of the same file, the server has to notify the Vista client to break its OpLock to Level 2 in frame 3120.

 

3119# | XP | SRV | SMB:C; Nt Create Andx, FileName = \...\Documents\desktop.ini | XP client wants to open desktop.ini again
3120# | SRV | Vista | SMB2:N OPLOCK BREAK (0x12), Oplock Level II Notification, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110 | Server send Oplock break to Level 2 Notification to Vista client

The exact algorithm for breaking an OpLock is explained in the system documents referenced above and is related to the file system, so I won't go over those specifics. But in general, since two clients have the same file open, the local client caching algorithm has to change. The Vista client can no longer assume the file won't be changed and therefore can't cache the file locally.

In frame 3124, the "notify" is acknowledged, and now the server can respond back to the Vista client in frame 3125 that the OpLock was broken to Level 2. Finally, frame 3126 is the response back to the XP client that the open on desktop.ini has been completed.

 

3124# | Vista | SRV | SMB2:A OPLOCK BREAK (0x12), Oplock Level II Acknowledgment, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110 | Vista Client sends Oplock Level 2 Acknowledge to Server
3125# | SRV | Vista | SMB2:R OPLOCK BREAK (0x12), Oplock Level II Response, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110 | Server sends break OpLock break to Level 2 response
3126 | SRV | XP | SMB:R; Nt Create Andx, FID = 0x8008 (\...\Documents\desktop.ini@#3119) |

Troubleshooting Performance and OpLocks

The previous example worked smoothly as it usually does. But in some instances an OpLock request does not get a response in a timely fashion. In those cases you might see a 35 second delay which is the default timeout for an OpLock. This could cause application timeouts or what seems like a hanging application from the user’s perspective. Also this 35 second delay is a sure sign OpLocks are involved in a performance issue. Just remember that as shown in the example above, multiple clients are probably involved. And it's this type of interaction you must learn to recognize in order to troubleshoot a performance problem with OpLocks.
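If you export the OpLock break frames with their timestamps, spotting that delay can be automated. The sketch below is only an illustration: OplockFrame, OplockEvent and FindSlowBreaks are hypothetical names, the timestamps are assumed to be seconds relative to the start of the capture, and pairing a notification with its acknowledgment by file ID alone is a simplification. It flags any break notification whose acknowledgment arrives 35 seconds or more later, or that is still unacknowledged 35 seconds before the end of the capture.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class OplockEvent { BreakNotification, BreakAcknowledgment };

// One OpLock break frame taken from a trace (all values supplied by the caller).
struct OplockFrame {
    unsigned      frameNumber;
    double        seconds;  // Time since the start of the capture.
    std::uint64_t fileId;   // FID the break applies to.
    OplockEvent   event;
};

// Returns the frame numbers of break notifications that were acknowledged
// late, or not at all, relative to thresholdSeconds.
std::vector<unsigned> FindSlowBreaks(const std::vector<OplockFrame>& frames,
                                     double thresholdSeconds = 35.0)
{
    assert(thresholdSeconds > 0.0);
    std::map<std::uint64_t, OplockFrame> pending;  // Keyed by file ID.
    std::vector<unsigned> slow;

    for (const OplockFrame& frame : frames) {
        if (frame.event == OplockEvent::BreakNotification) {
            pending[frame.fileId] = frame;
            continue;
        }
        const auto it = pending.find(frame.fileId);
        if (it == pending.end()) {
            continue;  // Acknowledgment without a notification in this trace.
        }
        if (frame.seconds - it->second.seconds >= thresholdSeconds) {
            slow.push_back(it->second.frameNumber);
        }
        pending.erase(it);
    }

    // Notifications that never saw an acknowledgment before the trace ended.
    if (!frames.empty()) {
        const double captureEnd = frames.back().seconds;
        for (const auto& entry : pending) {
            if (captureEnd - entry.second.seconds >= thresholdSeconds) {
                slow.push_back(entry.second.frameNumber);
            }
        }
    }
    std::sort(slow.begin(), slow.end());
    return slow;
}
```

The frames it returns are good starting points for the kind of multi-client analysis shown earlier: look at what the other clients were doing with the same file around that time.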

Posted by PaulELong | 0 Comments

Delayed Write Failure Trace Study

In this "Trace Study", we'll look at a case where the customer is seeing delayed write failures logged in the event log. Delayed write failures are reported when a file being written over the network is inaccessible for a time. Based on a trace taken at the same time as the error was logged, we will determine the cause.

Zooming In

Since we know the file name reported in the event log error, we'll use that name to find where in the trace we are accessing this file. We start by building a filter that uses a property we set for any SMB frame which references a file.

Property.SMBFileName.Contains("dir.txt")

This displays a bunch of frames that reference the "dir.txt" file, but this does not represent the entire conversation. To get the entire conversation, right click any frame and select Find Conversation->SMB. Then remove your display filter and now you will see all the frames associated with this particular SMB conversation. An SMB conversation is usually all operations involving a single file.

The next step is to look for an error of some kind. We do this by creating a color filter (http://blogs.technet.com/netmon/archive/2007/06/28/color-filtering-error-messages.aspx) to make SMB error frames stand out. We'll use this color filter:

(smb.DOSError.Error != 0 AND smb.DOSError.Error != 22) OR (smb.NTStatus.Code != 0 AND smb.NTStatus.Code != 22)

I made my color filter have a red background and a white foreground, a color scheme I use to identify errors.

With this color filter enabled, I simply scroll through the trace looking for a red frame to stand out. As they pop up you'll have to look at the specific error and see if it applies. In my case I see a STATUS_NETWORK_SESSION_EXPIRED. Right after this error I see a Session Setup, with SMB Writes both before the error and after the Session Setup.

 

SMB:C; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes at Offset 32780

SMB:R; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes

SMB:C; Transact2, Query File Info, Query File Standard Info, FID = 0x400C (\files\dir.txt@#1644)

SMB:R; Transact2, Query File Info, FID = 0x400C (\files\dir.txt@#1644) - NT Status: System - Error, Code = (860) STATUS_NETWORK_SESSION_EXPIRED

SMB:C; Session Setup Andx, Krb5ApReq (0x100)

SMB:R; Session Setup Andx, Krb5ApRep (0x200)

SMB:C; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes at Offset 32780

SMB:R; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes

Obviously this is not normal traffic for SMB. Session Setups occur when you first make a connection to a share, but not in the middle of a file transfer. What caused this session to expire?

Zooming Out

When we used the "Find Conversation->SMB" above, we narrowed down the traffic to just one SMB conversation. But something happened on another network conversation in between the error and our Session Setup. To figure out where to go next, we'll have to zoom out and look at the rest of the traffic around the error in question. I'll select the error frame to keep my context and then click on "All Traffic" at the top of the conversation tree to remove the SMB conversation filter. When I do, I see the following traffic:

 

SMB:C; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes at Offset 32780

SMB:R; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes

SMB:C; Transact2, Query File Info, Query File Standard Info, FID = 0x400C (\files\dir.txt@#1644)

SMB:R; Transact2, Query File Info, FID = 0x400C (\files\dir.txt@#1644) - NT Status: System - Error, Code = (860) STATUS_NETWORK_SESSION_EXPIRED

KerberosV5:TGS Request Realm: CORP1.LOCAL Sname: cifs/c01e3n01ads.corp1.local

TCP:Flags=...A...., SrcPort=1162, DstPort=Microsoft-DS(445), PayloadLen=0, Seq=1084491174, Ack=239237167, Win=4163

KerberosV5:TGS Response Cname: Kevin

KerberosV5:TGS Request Realm: CORP1.LOCAL Sname: krbtgt/CORP1.LOCAL

KerberosV5:TGS Response Cname: Kevin

KerberosV5:AS Request Cname: Kevin Realm: CORP1.LOCAL Sname: krbtgt/CORP1.LOCAL

KerberosV5:AS Response Ticket[Realm: CORP1.LOCAL, Sname: krbtgt/CORP1.LOCAL]

KerberosV5:TGS Request Realm: CORP1.LOCAL Sname: krbtgt/CORP1.LOCAL

KerberosV5:TGS Response Cname: Kevin

SMB:C; Session Setup Andx, Krb5ApReq (0x100)

Kerberos Ticket Expired

Once the UI has completed updating the frame summary, my current selection remains on the SMB Error frame which keeps my place. But now some new Kerberos frames show up. This information together with the "Session Expired" message tells us the whole story.

The expired SMB session means we need to re-authenticate. In this case the Kerberos ticket expired and a new ticket had to be issued to us by the server. If we had the original setup traffic, we would be able to see the initial Kerberos ticket with its expiration time. Once this Kerberos negotiation completes, the SMB session is reset using the new Kerberos ticket and the SMB traffic continues where it left off. This authentication interruption in the traffic is what caused our "Delayed Write Failure" event log error message in the first place.

Getting to the Bottom of Things

In this case the delayed write failure is easily explained. But there are many ways a delayed write failure can be triggered. You can use these same steps to zoom in and zoom out of a trace to understand this type of problem. Next time you see a delayed write failure in your event log, I hope you can use these steps to figure out why it occurred.

Posted by PaulELong | 0 Comments

Chained Captures and Stitching Them Back Together

When you use NMCap to capture data you have an option to save the capture files as a chain. As the current capture file format has a limited size, this option allows you to continually capture the data in successive files. This also gives you some flexibility to limit the size. If you are sending files to another person for analysis, you could send only the files that relate to the time period where a problem occurred. After using this feature, however, it might be useful to filter and re-stitch these capture files back together.

Capturing Chained Files with NMCap

You can capture to chained files with NMCap by naming the file with a .chn extension. The resulting files are named .cap, but there'll be a "capfile(#).cap" for every chained capture file after the first one. For instance, take the following command:

NMCap /network * /capture ipv4.address==1.2.3.4 /file foo.chn:1M

This will produce capture files which are 1 MB in size and have the following names in this order: foo.cap, foo(1).cap, foo(2).cap and so on. I've also provided a capture filter to limit the traffic to just one address. However, for the best performance I would leave any filtering out.
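If you need to predict those names from a program, for example to check which files of a chain exist, the naming pattern is simple enough to sketch. ChainFileName is a hypothetical helper of my own and not something NMCap provides. It just encodes the pattern described above, where the first file has no counter and later files get a counter in parentheses.

```cpp
#include <cassert>
#include <string>

// Returns the name of the Nth file in a capture chain, given the base name
// used on the command line without its .chn extension (e.g. "foo").
// Index 0 is the first file; later files carry a counter in parentheses.
std::string ChainFileName(const std::string& baseName, unsigned index)
{
    assert(!baseName.empty());
    if (index == 0) {
        return baseName + ".cap";
    }
    return baseName + "(" + std::to_string(index) + ").cap";
}
```

So ChainFileName("foo", 0) gives foo.cap and ChainFileName("foo", 2) gives foo(2).cap, matching the order NMCap writes them.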

Combining Captures with NMCap

Using NMCap, you can recombine these to create one large capture file. To do this use the /InputCapture option as follows:

NMCap /InputCapture foo.cap foo(1).cap foo(2).cap /Capture /File out.cap

You could additionally add a filter to limit the information that gets transferred. For instance, say I only wanted to see port 80 traffic in the resulting trace. In that case the following NMCap will get the job done.

NMCap /InputCapture foo.cap foo(1).cap foo(2).cap /Capture tcp.port==80 /File out.cap

Using a Script to Combine Many Capture Files

Now, this might get somewhat tedious the more files you have. We can solve this problem by using a simple CMD script to collect all the files for us. Just create a file called stitch.CMD using Notepad and place these contents in it:

REM Usage: stitch InCapFileBaseName OutCapFile.cap [Filter]

REM Creates flat output of capture files by date

dir /b /od %1*.cap > %TEMP%\captures.txt

REM Stores ordered file list in environment variable

SET INCAP=/InputCapture

for /f %%c in (%TEMP%\captures.txt) do call :addCap %%c

REM Calls NMCap to combine files

NMCap %INCAP% /capture %3 /file %2.chn:500M

goto :eof

REM Routine to append a file to the environment variable

:addCap

SET INCAP=%INCAP% %1

goto :eof

The CMD script file takes three parameters: the first is the original file name without the .cap extension, the second is the output capture file, and the third is the filter, which is optional. You'll also want to run the script in the directory where all your captures are. Since it searches for *.cap, make sure there aren't any extraneous captures.
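If you would rather drive NMCap from your own tool than from a CMD script, the same command line can be assembled in code. This is a minimal sketch with a hypothetical function name, BuildStitchCommand. It uses only the NMCap options already shown in this post (/InputCapture, /Capture and /File), and it does no quoting, so file names containing spaces would need extra handling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds an NMCap command line that combines inputFiles into outputFile.
// An empty filter copies every frame; otherwise only matching frames are kept.
std::string BuildStitchCommand(const std::vector<std::string>& inputFiles,
                               const std::string& outputFile,
                               const std::string& filter = "")
{
    assert(!inputFiles.empty());
    assert(!outputFile.empty());

    std::string command = "NMCap /InputCapture";
    for (const std::string& file : inputFiles) {
        command += ' ';
        command += file;
    }
    command += " /Capture";
    if (!filter.empty()) {
        command += ' ';
        command += filter;
    }
    command += " /File ";
    command += outputFile;
    return command;
}
```

Passing foo.cap, foo(1).cap and foo(2).cap with the filter tcp.port==80 and the output out.cap reproduces the port 80 command line from the previous section. The caller is responsible for listing the input files in capture order.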

Posted by PaulELong | 0 Comments

I Can't View My Windows Home Server at Home

I have a friend who just received his Windows Home Server. Home Server allows you to access it remotely so you can share photos, use Remote Desktop and back up documents. The provided documentation includes details on how to set up your router, open ports, and set up an external name like "myhomesrv.homeserver.com." The problem was, when he went to test this out by typing the address in his web browser, he was shown his router's administrative web page instead of his Windows Home Server web page. Yet, I was able to access the web page fine from my work machine.

Collecting Evidence

I told my friend to download Network Monitor and get a trace. I also asked that he clear his local DNS cache by typing "ipconfig /flushdns". This is important because if a name is already cached, the machine won't try to resolve it again. This step ensures the resolution traffic will be captured when we reproduce the problem. In just a few minutes he sent me the capture file, and I opened it up.

Filtering on the External Name

I start by opening the trace and looking for DNS traffic by applying the display filter "DNS". In this particular trace there's a bunch of DNS traffic, but by looking at the summary line I can see the name my friend was trying to resolve.

Source | Destination | Description
192.168.2.2 | 192.168.2.1 | DNS:QueryId = 0x847E, QUERY (Standard query), Query for myhomesrv.homeserver.com of type Host Addr on class Internet
192.168.2.1 | 192.168.2.2 | DNS:QueryId = 0x847E, QUERY (Standard query), Response - Success, Array[xxx.143.174.204,yyy.46.154.126]

I see the query for "myhomesrv.homeserver.com" and then look for the matching response. In this case it was the next frame, but if you had a lot of traffic you could do a search for a DNS frame with the matching Query ID. And if you didn't know how to create a filter for the QueryID, you could right click on it in the frame details and “add to display filter” to understand how it should look.
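The matching step can be sketched in a few lines of Python. This is only an illustration of pairing by Query ID: the frame records, frame numbers, and the second Query ID (0x1234) are hypothetical, and only 0x847E comes from the trace above.

```python
def pair_dns_by_query_id(frames):
    """Pair each DNS query with the first later response sharing its QueryId."""
    pending = {}  # QueryId -> frame number of the query still awaiting a response
    pairs = []
    for frame in frames:
        query_id = frame["query_id"]
        if frame["is_response"]:
            if query_id in pending:
                pairs.append((pending.pop(query_id), frame["number"]))
        else:
            pending[query_id] = frame["number"]
    return pairs


# Hypothetical frame list: two queries, one of them answered.
frames = [
    {"number": 1, "query_id": 0x847E, "is_response": False},
    {"number": 2, "query_id": 0x1234, "is_response": False},
    {"number": 3, "query_id": 0x847E, "is_response": True},
]
print(pair_dns_by_query_id(frames))  # [(1, 3)]
```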

Without even having to dig into the frame, you can see the response has all IP address info bubbled to the summary line. (By the way, I've obscured the address with xxx and yyy, but normally these would show as real numbers.) The proof I was looking for was to make sure the name, myhomesrv.homeserver.com, was being resolved to the external IP address of the router. Indeed the IP addresses matched, so I know that the name is resolving properly.

Next, I looked for the TCP setup and HTTP request that should occur since we were trying to browse his personal page. This occurs right after the DNS traffic as well.

Source | Destination | Description
192.168.2.2 | myhomesrv.homeserver.com | TCP:Flags=......S., SrcPort=60824, DstPort=HTTP(80), PayloadLen=0, Seq=2533385604, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192
myhomesrv.homeserver.com | 192.168.2.2 | TCP:Flags=...A..S., SrcPort=HTTP(80), DstPort=60824, PayloadLen=0, Seq=113434048, Ack=2533385605, Win=5840 ( Negotiated scale factor 0x0 ) = 5840
192.168.2.2 | myhomesrv.homeserver.com | TCP:Flags=...A...., SrcPort=60824, DstPort=HTTP(80), PayloadLen=0, Seq=2533385605, Ack=113434049, Win=16425 (scale factor 0x2) = 65700
192.168.2.2 | myhomesrv.homeserver.com | HTTP:Request, GET /
myhomesrv.homeserver.com | 192.168.2.2 | TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=60824, PayloadLen=0, Seq=113434049, Ack=2533386251, Win=7106 (scale factor 0x0) = 7106
myhomesrv.homeserver.com | 192.168.2.2 | HTTP:Response, HTTP/1.0, Status Code = 200, URL: /

We see that the client connects to myhomesrv.homeserver.com, which is the same name we saw resolved by DNS in the earlier traffic. The Network Monitor parsers will automatically resolve names for you when they see name resolution traffic, but you can always add different columns or simply dig into the frame to verify the IP address.

Now, we see that the traffic is going to the right address. It appears that the name resolution is working correctly and doing what we want. However, the response shows information that looks like my friend’s router’s web page.

Of course this isn't a surprise because this is what we see in the browser as well. Then what happened? Why did the web page from his router appear instead of his home server?

Doing Some Homework

We've identified some strange behavior, so what next? A trace from the ISP might give us more information. Personally, I can't even get my ISP to answer simple billing questions, so asking for a trace would probably be fruitless. But maybe we can see if other people are experiencing the same problem. After doing some Bing searches, I came across this blog (http://www.myhomeserver.com/?page_id=67). In particular, Step 7 mentions the "loopback issue".

It appears that some routers don't know what to do with an external address when sent from the inside. As we see, this matches the behavior in the trace. The DNS request returns the address we expect, and the following HTTP request is also sent to the right place. However, we see that the response from the router comes back with the router’s web page. Instead we should have seen the HTTP request get bounced to our Home Server’s internal address.

Buy a New Router?

Well, maybe that's extreme. I would suggest checking for a firmware upgrade first. A simpler, less expensive solution is to use the Home Server machine name in these circumstances. In any case, my friend is now able to access his Home Server’s website internally by using http://myhomesrv and externally with the address http://myhomesrv.homeserver.com.

Posted by PaulELong | 1 Comments

TCP Analyzer Expert: Make Your Network Run Faster

Performance problems suck...time! But years of "Where's Waldo" have trained our brains in preparation for this moment. The TCP Analyzer expert, available from our Experts Download Page (http://go.microsoft.com/fwlink/?LinkID=133950), takes advantage of that training by graphically representing TCP traffic. By looking at normal traffic or comparing the graph to pictures of some known TCP issues, you can easily diagnose performance problems.

With the TCP Analyzer Expert you can load a trace, use the conversation tree to locate a TCP stream, and run the expert. If you don't have anything selected, the expert will use the first TCP conversation in the trace. Once it's run, it presents you with a UI which allows you to graph the TCP traffic, analyze Round Trip Time, and do some high-level diagnosis based on some known issues.

How to Analyze Traffic

Say you suspect a problem or want to analyze some traffic. The first thing you need to do is collect a trace using Network Monitor. TCP Analyzer can try to "guess" the general problem and describe the issue. But for this to work properly you will need to take the trace from the machine initiating the connection. Also it helps to have the entire TCP connection as the window size is negotiated during the TCP 3-way handshake.
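To see why the handshake matters, here is a minimal Python sketch of how the window field in a frame turns into the real window size. The function name is invented for illustration; the numbers come from the Home Server trace earlier on this page, where the summary reads Win=16425 (scale factor 0x2) = 65700. The shift count is only exchanged in the SYN frames, so a trace that starts mid-connection leaves the analyzer without it.

```python
def effective_window(advertised, scale_shift):
    """Apply the TCP window scale option: shift the 16-bit field left."""
    return advertised << scale_shift


# Client ACK from the earlier trace: Win=16425, scale factor 0x2.
print(effective_window(16425, 2))  # 65700
# Server ACK from the earlier trace: Win=7106, scale factor 0x0.
print(effective_window(7106, 0))   # 7106
```

Note that the window carried in a SYN frame itself is never scaled, which is why the first frame in that trace shows Win=8192 = 8192 even though it is negotiating a scale factor of 2.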

Once you start a trace, rerun the performance test and then stop the capture. Then save the capture, as Experts can only be run on saved traces. Go back to the start page, where you'll see the file you just saved in the recent capture list, and open it up.

Finding the TCP Conversation

The next trick is locating which TCP stream you want to run the expert on. In this case I copied a file using Explorer and I knew the name of the file I copied. So I created the following filter.

ContainsBin(FrameData, UTF16BE, "myfile")

It could have potentially been ASCII as well, but with SMB I knew it would probably occur as Unicode. BTW, UTF16BE stands for Unicode 16 Big Endian. These days Unicode has many flavors, but UTF16BE is the most common one for Windows machines.
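A quick Python sketch shows why the encoding choice matters for a binary search like this. The helper name is made up for illustration. The same six characters produce different byte patterns in ASCII and in the two UTF-16 byte orders, so a search in the wrong encoding simply finds nothing. Which byte order a given capture uses, and how it maps onto Network Monitor's encoding names, is worth confirming against a frame's hex view.

```python
def search_patterns(text):
    """Return the byte pattern a binary search would need for each encoding."""
    return {enc: text.encode(enc) for enc in ("ascii", "utf-16-le", "utf-16-be")}


patterns = search_patterns("myfile")
for encoding, raw in patterns.items():
    print(f"{encoding:10} {raw.hex(' ')}")
# ascii      6d 79 66 69 6c 65
# utf-16-le  6d 00 79 00 66 00 69 00 6c 00 65 00
# utf-16-be  00 6d 00 79 00 66 00 69 00 6c 00 65

# The ASCII bytes never appear contiguously inside the UTF-16 forms.
print(patterns["ascii"] in patterns["utf-16-le"])  # False
```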

This filter located a bunch of SMB frames, which meant I was on the right track. I right clicked a frame, selected Find Conversation, and chose TCP. This locates all other frames in the same conversation, which the TCP Analyzer will use to determine which stream to use when it runs. Remember, to see the full stream in Network Monitor, remove the display filter you used to find the frame originally.

Now with the correct conversation selected, I run the TCP Analyzer Expert from the Experts menu. This runs the expert, but in order to get a graph to show up I have to press the graphing button on the toolbar.

Since there is traffic flowing in both directions, you need to determine which direction you want to concentrate on. You can use the port or IP address to figure this part out. Once you make this determination, click the graph. This will display the graph in the main window, allowing you to zoom in and out with the mouse wheel and to pan by dragging the main graph around.

You can also analyze the Round Trip Time, which is the graph in the middle. However there are some restrictions that have to be met before any information will be available. We won't cover RTT in this blog, but you can see the help for the expert for more information.

Decoding the Graph

The Axes

The Y axis shows the sequence numbers for the given direction. TCP picks the starting sequence number when a session initializes, and the number then advances by one for every byte transmitted. So sequence 1000-2000 represents 1000 bytes.

The X axis is time, measured in milliseconds (ms). This matches the time offset as displayed in Network Monitor.
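The Y axis arithmetic can be sketched in Python. The function name is invented for illustration. Sequence numbers are 32-bit values that wrap around, so the subtraction is done modulo 2^32. The second example uses numbers from the Home Server trace earlier on this page: the client's data starts at Seq=2533385605 and the server later acknowledges up to 2533386251.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def bytes_between(start_seq, end_seq):
    """Number of bytes covered from start_seq up to (not including) end_seq."""
    return (end_seq - start_seq) % SEQ_SPACE


print(bytes_between(1000, 2000))              # 1000
print(bytes_between(2533385605, 2533386251))  # 646 bytes acknowledged for the GET
print(bytes_between(4294967000, 704))         # 1000, across the wrap-around
```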

Legend Details

On the time-sequence graphs there are various symbols which can occur. Here's a list of what they mean.

·  Receiver Window - Receiver is telling the sender it is currently willing to receive up to this point in the data stream.

·  Acknowledged - Receiver is telling the sender it has successfully received all the data up to this point in the data stream.

· Data - The point in the data stream the sender is currently sending.

· SYN - The SYNchronize packet sent at the start of the connection.

· FIN - The FINish packet sent at the end of the connection.

· Discontinuity - Any break in the data stream where the data in the indicated packet doesn't sequentially follow the data in the previous packet. Out-of-order, lost, or retransmitted packets can all cause discontinuities, as can gaps in the capture.

· Presumed Lost - A packet that was later retransmitted (if a sequential group of packets are all later retransmitted, only the first one will be indicated this way).

· Retransmission - A packet that is a retransmission of another packet in the capture.

Understanding Bandwidth-Delay Product

The speed at which you can send data in TCP is dependent on both the bandwidth of your network and the delay. The bandwidth is often referred to in terms like 10Mbps or 100Mbps, which is in bits per second. The delay is how long it takes for data to travel from one place to another and back. While this is related to the speed of light, other things like routers and the computers that are communicating can increase this delay as it takes time to process packets.

By multiplying bandwidth and delay together, we get the maximum amount of data that can be "in flight" over one connection between two computers. As you'll see, whether this maximum is utilized depends on how well TCP is tuned. It's important to understand that as the delay gets longer, it becomes more important to fill the available window.
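As a worked example, here is a small Python sketch of the multiplication. The function name and the 50 ms round trip are assumptions chosen for illustration; the 100Mbps figure is one of the bandwidths mentioned above.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_ms):
    """Bytes that can be in flight: bits per second times round-trip seconds, over 8."""
    return bandwidth_bps * rtt_ms / 8000  # /1000 for ms to s, /8 for bits to bytes


# Hypothetical link: 100 Mbps with a 50 ms round trip.
print(bandwidth_delay_product_bytes(100_000_000, 50))  # 625000.0
# The same link with a 1 ms round trip needs far less data in flight.
print(bandwidth_delay_product_bytes(100_000_000, 1))   # 12500.0
```

On the 50 ms link, a window of around 64 KB covers only about a tenth of the 625,000 bytes the path can hold, which is why long delays make window size so important.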

Pictures of Wrong Behavior

In TCP there are some typical problems that crop up over and over. Sometimes these are configuration issues with the client/server TCP stack or application, and in some cases they can be easily fixed by adjusting the application or TCP window size. Of course, the problem may also be caused by your network, which may require more drastic measures.

The best way to understand right from wrong is to baseline your network when it is working properly. This way you can look at the bandwidth numbers alone and understand whether performance has degraded. But in the absence of this data, you can use the following pictures as a reference in order to identify some common problems.

Bandwidth Limited:

[Graph: bandwidth-limited connection]

In this case you see that the sent data fills up the window as the data packets (blue X) approach the receive window (red X). The packets are sent at a regular interval, so the only thing limiting your throughput is the available bandwidth. This is normally what you want to see, as your throughput will always be limited by something.

Receiver Limited:

[Graph: receiver-limited connection]

The packets fill the receiver window, but they go out in bursts as fresh acknowledgement packets arrive and open up the window. This burstiness is an indicator that the window is smaller than the bandwidth-delay product, and thus the protocol can't keep the data stream flowing smoothly.
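The cost of a too-small window can be estimated with a short Python sketch. A sender can have at most one window of data outstanding per round trip, so the window divided by the round-trip time caps throughput. The function name and the 50 ms round trip are assumptions for illustration; the 65700-byte window is the one seen in the Home Server trace earlier on this page.

```python
def window_limited_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


# 65700-byte window over a hypothetical 50 ms round trip.
print(window_limited_throughput_bps(65700, 50))  # 10512000.0, about 10.5 Mbps
```

On a 100Mbps link that connection would use roughly a tenth of the available bandwidth, no matter how fast the wire is.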

Sender Limited:

[Graph: sender-limited connection]

This indicates that one end's window size is less than the bandwidth-delay product. However, unlike the receiver-limited case above, the data packets fall well short of filling the receiver's advertised window. This is a good indicator that the sender's window was the limiting factor. In some cases this is because the application doesn't fill the window completely. As this often does not show up under low latency, a developer might not detect this type of problem in testing.

Congestion Limited:

[Graph: congestion-limited connection]

The earlier data points (lower left) look like a bandwidth-limited connection, until two lost packets cause TCP to severely limit the sender's congestion window after recovering from the losses. Note that the last data points (upper right) show the data packets aren't filling the receiver's advertised window as TCP is limiting the sender to a smaller congestion window.

It's important to note that the pictures were created in test environments. Real-world applications tend to be more conversational, and you'll often have to narrow down the part of the picture you need to focus on. For instance, when you start a file copy with Explorer, there's a lot of traffic that goes back and forth as you browse for the folder, select the file, and then finally drag and drop it on the destination folder. You'll have to learn how to differentiate the actual transfer from the rest of the traffic.

Power of the Picture

TCP Analyzer does an awesome job of taking a lot of information and summarizing it in a picture that gives a good overview of your network’s performance. It can take practice to learn how to read the graphs and to recognize the scenarios presented here, as well as others. But as you learn, you'll find that this is a powerful tool in your tool belt.

Posted by PaulELong | 0 Comments