Hi there,

 

In today’s blog post, I’m going to talk about how to troubleshoot a WAN performance problem by using simultaneous network traces.

 

One of customers reported that they were having performance problems when retrieving data from a SQL server running in a remote subnet. Interestingly the problem wasn’t in place when the client and SQL server were running in the same subnet J. Even though it was obvious from that testing that the WAN connectivity had something to do with the problem, our customer wanted to make sure that there were no issues at the client side and server side.

 

In such problem scenarios, the first thing we ask for is simultaneous network traces collected while reproducing the problem. Generally just looking at a one side network trace might yield incorrect results and may result in looking at the wrong direction. From that perspective, simultaneous network traces are a must.

 

Provided that simultaneous network traces are collected, how are you going to troubleshoot this problem? Well, there could be a number of approaches but I generally use the following method:

 

a) Find the same session (this will be most likely a TCP or UDP session) at both side traces. This may be a bit harder than expected especially if there’s NAT between the client and server side. Otherwise it should be an easy one.

 

b) After correctly identifying the same session at both sides, your next task will be to find the packet delays in the traces. The most appropriate way to do so is to order packets in the network trace based on time delta (time between two packets that are displayed back to back in a network trace).

 

c) After that, you can identify the highest time deltas within the trace (for example you can re-order packets based on time delta). If the problem is a WAN delay issue, this will generally look like that:

 

=> At the client side trace, most of the delays are seen in packets that are sent by the server side

=> At the server side trace, most of the delays are seen in packets that are sent by the client side.

 

In other words, from client perspective, it’s Server that is causing the delays and hence the performance problem. From server perspective, it’s Client that’s causing the delays and hence the performance problem. Actually both sides are WRONG! This stems from the WAN delay J. I think this may also underline the importance of collecting simultaneous network traces in such scenarios otherwise you will get the wrong results..

 

Well, now let’s take a look at an example to better understand this:

 

EXAMPLE:

========

Client is having performance problems in retrieving data from the SQL server and this is only visible when the client and server are running at different subnets. There’s also NAT (network address translation between the client and server). We have the IP address information and simultaneous network traces collected while reproducing the problem:

 

A. IP address information:

 

=> From client side perspective:

 

Client IP:             192.168.1.66

Server IP:            10.1.1.7

 

=> From server side perspective:

 

Client IP:             172.16.5.14

Server IP:            192.168.200.148

 

Note: IP address information is changed deliberately for data privacy

 

 

B. The session in which we see the performance problem:

 

Since this was a SQL server connectivity problem, it was obvious that the client should be contacting the SQL server at TCP port 1433 (if the default configuration wasn’t changed). After a few checks I found the following session in the simultaneous network traces:

 

Wireshark filter for Client side session:

(ip.addr eq 192.168.1.66 and ip.addr eq 10.1.1.7) and (tcp.port eq 46739 and tcp.port eq 1433)

 

Wireshark filter for the same session seen at Server side trace:

(ip.addr eq 172.16.5.14 and ip.addr eq 192.168.200.148) and (tcp.port eq 63584 and tcp.port eq 1433)

 

 

Now let’s look at how the communication is seen from the client perspective and server perspective:

 

C. From client perspective:

 

No.     Time            Source                    Destination             Protocol Info

    406 20.541677   192.168.1.66           10.1.1.7                  TCP      46739 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=11322755 TSER=0 WS=6

    407 0.089037    10.1.1.7                    192.168.1.66          TCP      1433 > 46739 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1380 WS=8 TSV=359429749 TSER=11322755

    408 0.000036    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

    409 0.000142    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

    410 0.089013    10.1.1.7                    192.168.1.66          TDS      Response Packet

    411 0.000058    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

    412 0.000861    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

    413 0.091710    10.1.1.7                    192.168.1.66          TDS      TDS7/8 0x12 Packet

    414 0.006916    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

    415 0.093413    10.1.1.7                    192.168.1.66          TDS      TDS7/8 0x12 Packet

    416 0.002583    192.168.1.66            10.1.1.7                  TDS      Unknown Packet Type: 23[Unreassembled Packet]

    425 0.091253    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

    426 0.000434    192.168.1.66            10.1.1.7                  TDS      Query Packet

    427 0.089305    10.1.1.7                    192.168.1.66          TDS      Response Packet

    428 0.000330    192.168.1.66            10.1.1.7                  TDS      Query Packet

    429 0.088216    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

    431 0.040908    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

    432 0.174717    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet

   433 0.103898    10.1.1.7                     192.168.1.66          TDS      Response Packet[Unreassembled Packet]

    434 0.000056    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

    467 0.062827    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet[Unreassembled Packet]

    473 0.091735    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

    474 0.000061    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

    475 0.008514    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet

    476 0.093648    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

    477 0.001301    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet[Unreassembled Packet]

    478 0.087454    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

    479 0.039780    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

...

 

From client’s perspective, responses from SQL server side come with a delay approximately 0.08 seconds or so. The client sends the next query without waiting...

 

D. From server perspective:

 

No.     Time        Source                Destination           Protocol Info

     44 21.123385   172.16.5.14              192.168.200.148       TCP      63584 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1380 TSV=11322755 TSER=0 WS=6

     45 0.000039    192.168.200.148       172.16.5.14TCP      1433 > 63584 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8 TSV=359429749 TSER=11322755

     46 0.089015    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

     47 0.000461    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

     48 0.000117    192.168.200.148       172.16.5.14TDS      Response Packet

     49 0.087810    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

     50 0.001553    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

     51 0.000427    192.168.200.148       172.16.5.14TDS      TDS7/8 0x12 Packet

     52 0.099177    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

     53 0.004255    192.168.200.148       172.16.5.14TDS      TDS7/8 0x12 Packet

     54 0.091772    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 23[Unreassembled Packet]

     55 0.001104    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     56 0.089833    172.16.5.14               192.168.200.148       TDS      Query Packet

     57 0.000183    192.168.200.148       172.16.5.14TDS      Response Packet

     58 0.089154    172.16.5.14               192.168.200.148       TDS      Query Packet

     59 0.000150    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     60 0.128460    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

     61 0.178927    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

     62 0.001507    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     63 0.098217    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

     64 0.066525    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     65 0.000168    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     66 0.089069    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

     67 0.011578    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

     68 0.001386    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     70 0.089248    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     71 0.000147    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     72 0.129948    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

     73 0.002327    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

     74 0.001011    192.168.200.148       172.16.5.14TDS      Response Packet

     75 0.088519    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=3323 Ack=2497 Win=12608 Len=0 TSV=11323115 TSER=359429884

     76 0.003928    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     77 0.000144    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     78 0.097206    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

     79 0.001297    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     80 0.089896    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     81 0.000147    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     82 0.099239    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     83 0.002980    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

     84 0.000025    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=2869 Ack=6121 Win=65536 Len=0 TSV=359429923 TSER=11323187

     85 0.000272    192.168.200.148       172.16.5.14TDS      Response Packet

     86 0.091009    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     87 0.000145    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     88 0.125878    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     89 0.002955    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

     90 0.000022    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=3783 Ack=8147 Win=65536 Len=0 TSV=359429945 TSER=11323242

     91 0.000233    192.168.200.148       172.16.5.14TDS      Response Packet

     92 0.095498    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     93 0.000187    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

     94 0.111501    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

     95 0.005919    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

     96 0.000019    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=4760 Ack=10928 Win=65536

 

When checking the trace from server perspective, SQL server returns TDS responses almost always immediately. But the TDS requests from the client side come to SQL server with an average delay of 0.08 seconds or so.

 

RESULTS:

========

1) From client’s perspective, SQL server is slow in sending TDS responses

 

2) From SQL server’s perspective, client is slow in sending TDS requests.

 

3) The real problem is that TDS performance issue was stemming from WAN connectivity delay (like link delay/network device delays etc). In this instance the average WAN delay was about 80 milliseconds (0.08 seconds). Even though it looks like a small number at first look, considering that you may be exposed to that small delay let’s say 100 times while making a transaction, this could easily become an issue (like 8 seconds). So never underestimate such small values J

 

Hope this helps

 

Thanks,

Murat