We often have customers who want to better understand the resolution to their networking support issue and why what we did fixed it. Depending on the issue, this explanation may be quite involved and complex. This would be like asking the pilot on the way off the airplane to quickly and briefly explain the aerodynamics, electronics, hydraulics, etc. that went into landing the plane and how they all work together. Since there is no brief explanation of the in-depth workings of how computer networks do all the things they do, I will be covering this topic in a series of blog posts in which I hope to give at least a basic understanding of network protocols and architecture. My intention here is to lay a foundation from which a deeper understanding of Windows networking can be gained. We will discuss IPv4 over Ethernet only, and will not be delving into IPv6 or other physical network topologies.
In order to continue we need to at least mention the seven layers of the International Organization for Standardization (ISO) Open Systems Interconnection (OSI) model. This is the model where we get the reference for Layer 2 and Layer 3 when referring to types of switches and routers, for example.
This model consists of the following seven layers:
Layer 1 is the Physical layer and this consists of the Network Interface Card (NIC) and other components that allow a system to physically and logically connect to a network. This is as deep as we need to go into Layer 1 for this discussion.
We will focus more on Layer 2 in this blog post, specifically starting with Layer 2 routing, and then get into Layer 3 routing in the next blog entry.
For more information on the OSI model and Windows Network Architecture see the following:
Windows Network Architecture and the OSI Model http://msdn.microsoft.com/en-us/library/aa938287.aspx
TCP/IP Architecture http://technet.microsoft.com/en-us/library/cc751234.aspx
The first thing we will need in order to communicate on a computer network is some method for structuring what is being sent. This is important not only for the computer itself; it also allows other devices on the network, such as routers and switches, to properly handle network traffic. The two standards we are using to accomplish this for TCP/IP are Ethernet II and IEEE 802.3, and they are found in Layer 2 of the OSI model. These standards define what is included when data is "framed" to be sent, so data sent on a network will often be referred to as frames. For more in-depth discussion on how this is structured and the inner workings of Windows networking the following books are excellent references:
"Microsoft® Windows® Server 2003 TCP/IP Protocols and Services Technical Reference" http://www.microsoft.com/learning/en/us/books/5030.aspx
"Windows Server® 2008 TCP/IP Protocols and Services" http://www.microsoft.com/learning/en/us/Books/11630.aspx
Now that we have a standard for constructing a frame to put on the wire, we will need a way to determine where we are going to send the frame. In order to communicate with other computers on the network a system or "Node" must have a way of identifying itself and other systems within the local subnet or "Broadcast Domain", a Broadcast Domain being the network that is reachable by broadcast. This identification is done using the MAC address of the network adapter. A MAC address may also be referred to as an Ethernet Address or Physical Address. This address is assigned by the manufacturer at the time the network adapter, also known as the Network Interface Controller (NIC), is created. It is possible to find NICs that allow the MAC address to be changed manually but care should be taken in doing this as this could cause addressing problems on the local subnet.
A MAC address consists of 6 Bytes, the first 3 of which are used for the Organizationally Unique Identifier (OUI) which is unique to the manufacturer of the NIC.
You can determine the manufacturer of your NIC by running ipconfig /all on your system from a command prompt. Next, take the first 3 bytes of the Physical Address and plug them into the "Search For" box under "Search the public OUI listing" at the following link:
http://standards.ieee.org/regauth/oui/index.shtml
The last 3 bytes of the MAC are specific to the NIC. Together they provide a unique local address.
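The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_mac` and the sample address are made up for the example, and the address does not belong to any real adapter.

```python
def split_mac(mac):
    """Split a 6-byte MAC address into its OUI and NIC-specific halves."""
    # Accept both "-" (the ipconfig style) and ":" as separators.
    octets = mac.strip().replace(":", "-").split("-")
    if len(octets) != 6 or not all(len(o) == 2 for o in octets):
        raise ValueError("not a 6-byte MAC address: %r" % mac)
    for octet in octets:
        int(octet, 16)  # raises ValueError if an octet is not hexadecimal
    octets = [o.upper() for o in octets]
    # The first 3 bytes identify the manufacturer, the last 3 the individual NIC.
    return "-".join(octets[:3]), "-".join(octets[3:])


# Hypothetical address, used only to show the split.
oui, nic_specific = split_mac("00-1A-2B-3C-4D-5E")
print("OUI:", oui)
print("NIC-specific:", nic_specific)
```

The OUI half is the value you would paste into the IEEE search page.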
As I mentioned, this gives a system a way to identify itself and other systems on the local network. However, for this to be useful we need a way to discover what the MAC addresses of other systems are, as well as to allow our own address to be discovered by other systems. To this end, Address Resolution Protocol (ARP) was developed, and is described in RFC 826.
So that sounds pretty good, you might say. I have my MAC address. I'm all set to talk on the network. You would be mostly correct. From a Layer 2 standpoint you do have everything you need to communicate; however, most operating systems, including the Windows operating systems, allow communication to take place with systems beyond the Broadcast Domain. For this reason we have Layer 3, the Internet Protocol Layer, where we have Internet Protocol (IP) addresses. This is significant because the system cannot just always assume that traffic will be on the local subnet. The concept I want you to grasp here is that MAC addresses are for "local" routing and IP addresses are for "global" routing. This is a bit oversimplified but let's run with this for now and it will all get clearer as we get farther into IP routing. For now, we just need to know that each system will have both an IP address as well as a MAC address. In order to communicate with other systems, we will need a way to match the IP address of the system we want to communicate with to its MAC address. In addition, the source system will want to share its own MAC and IP address so that the target node will know how to communicate back. In order to accomplish this matching of MAC to IP addresses we use Address Resolution Protocol (ARP).
You will notice as we discuss ARP that the requests are structured a certain way. This is because we are conforming to standards such as RFC 826 which defines ARP, and RFC 5227 which defines IPv4 Address Conflict Detection. ARP is used to resolve the next hop IP address of a node to its corresponding MAC address. This is significant because the next hop IP address is not necessarily the destination IP address. Remember the concept I mentioned earlier that the IP address can allow for global routing.
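To make the next hop idea concrete, here is a minimal sketch of how a sender might choose which IP address to resolve with ARP. It assumes a single subnet mask and a single default gateway, both of which are illustrative assumptions (routing is covered in the next post), and the function name and addresses are made up for the example.

```python
import ipaddress


def next_hop(source_ip, netmask, gateway, destination_ip):
    """Return the IP address that ARP would need to resolve."""
    # Build the local subnet from the sender's own address and mask.
    local_subnet = ipaddress.ip_network(
        "%s/%s" % (source_ip, netmask), strict=False
    )
    if ipaddress.ip_address(destination_ip) in local_subnet:
        # Destination is on the local subnet: ARP for it directly.
        return destination_ip
    # Destination is elsewhere: ARP for the gateway instead (assumed setup).
    return gateway


# Hypothetical addresses for illustration only.
print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.1", "192.168.1.20"))
print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.1", "10.0.0.5"))
```

In the second call the destination is off the local subnet, so the address being resolved is the gateway's rather than the destination's.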
When looking at the ARP in a network capture you will see that there are four fields used to identify the source and target IP and MAC address.
In the ARP Request the fields are filled in as follows:
When the broadcast ARP request is received on Node 2, Node 2 updates its ARP cache with the information it received in the request. In the ARP reply, notice that the SHA and SPA are updated to match the correct information for the sending system. This is the information used by Node 1 to update its ARP cache. Once this is done, both systems will have the MAC and IP information they need to communicate with each other.
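The four address fields can be seen by packing and unpacking an ARP payload by hand. The sketch below follows the RFC 826 layout for IPv4 over Ethernet, with SHA/SPA as the sender hardware and protocol addresses and THA/TPA as the target ones. The helper names and all addresses are hypothetical, and the code only builds the bytes in memory; it does not put anything on the wire.

```python
import socket
import struct

# htype, ptype, hlen, plen, oper, SHA, SPA, THA, TPA (network byte order)
ARP_LAYOUT = "!HHBBH6s4s6s4s"
ARP_REQUEST = 1
ARP_REPLY = 2


def mac_to_bytes(mac):
    return bytes.fromhex(mac.replace("-", "").replace(":", ""))


def build_arp(oper, sha, spa, tha, tpa):
    """Pack an ARP payload for Ethernet (htype 1) and IPv4 (ptype 0x0800)."""
    return struct.pack(
        ARP_LAYOUT, 1, 0x0800, 6, 4, oper,
        mac_to_bytes(sha), socket.inet_aton(spa),
        mac_to_bytes(tha), socket.inet_aton(tpa),
    )


def parse_arp(payload):
    """Unpack an ARP payload into the four address fields plus the opcode."""
    _, _, _, _, oper, sha, spa, tha, tpa = struct.unpack(ARP_LAYOUT, payload)
    return {
        "oper": oper,
        "SHA": sha.hex("-"), "SPA": socket.inet_ntoa(spa),
        "THA": tha.hex("-"), "TPA": socket.inet_ntoa(tpa),
    }


# Node 1 asks for Node 2's MAC: the target hardware address is not yet known.
request = build_arp(ARP_REQUEST, "00-11-22-33-44-55", "192.168.1.10",
                    "00-00-00-00-00-00", "192.168.1.20")
# Node 2 answers: sender and target swap, and SHA now carries Node 2's MAC.
reply = build_arp(ARP_REPLY, "66-77-88-99-aa-bb", "192.168.1.20",
                  "00-11-22-33-44-55", "192.168.1.10")
print(parse_arp(request))
print(parse_arp(reply))
```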
Once an address gets put in the ARP cache it is maintained for a set amount of time. The default behavior is a two minute timer that is reset every time the destination MAC is used for a total of 10 minutes. After 10 minutes of use the destination MAC address is discarded and must be resolved again with a new ARP request. If after two minutes the destination MAC has not been used it is discarded.
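The aging rule above can be modeled with two timestamps per entry. This is a simplified sketch of the behavior as described here, not the actual Windows implementation; the class name is invented, and treating the limits as inclusive is an assumption.

```python
IDLE_LIMIT_SECONDS = 2 * 60       # discarded if unused for two minutes
MAX_LIFETIME_SECONDS = 10 * 60    # discarded after ten minutes even if in use


class ArpCacheEntry:
    """One resolved MAC address with the two timers described in the text."""

    def __init__(self, mac, now):
        self.mac = mac
        self.created = now
        self.last_used = now

    def expired(self, now):
        idle_too_long = now - self.last_used >= IDLE_LIMIT_SECONDS
        too_old = now - self.created >= MAX_LIFETIME_SECONDS
        return idle_too_long or too_old

    def use(self, now):
        """Reset the two-minute timer; returns False if already expired."""
        if self.expired(now):
            return False
        self.last_used = now
        return True


# Times are seconds since the entry was created.
entry = ArpCacheEntry("66-77-88-99-AA-BB", now=0)
for t in (100, 200, 300, 400, 500):
    entry.use(t)                  # regular use keeps resetting the idle timer
print(entry.expired(599))         # still inside the ten-minute window
print(entry.expired(600))         # ten minutes reached: resolve again
```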
It is important to remember that in order for ARP to work, the requester must already have a destination IP address that it will request the MAC address for. The IP address for the destination may be entered manually or may be discovered through name resolution.
Note that starting with Windows Vista we no longer refer to the cache as the ARP cache. We will discuss this further later in this blog post.
ARP is also used to detect IP address conflicts. Address conflict detection is used to ensure that a system that is brought up on the network or that is assigned a new IP address does not have an address that conflicts with a system already on the network.
In address conflict detection, we use what is known as a Gratuitous ARP. When a system is configured with an IP address either manually or by DHCP it will send a Gratuitous ARP to ensure that another node on the network is not already configured with this IP address. In the case of a conflict the two nodes are defined as follows. The Offending Node is the node that is sending the gratuitous ARP, and the Defending Node is a system already configured with the IP Address in question. The contents of this request and how this affects the ARP cache on other systems on the network differs depending on the OS.
In Windows XP and Windows Server 2003 the Gratuitous ARP request is sent with the Sender's MAC filled in with the MAC of the sending system and the Target MAC set to 0's, but the Sender's and Target IP addresses are both set to the address of the sending system. If a conflict is detected then the defending system replies with its IP and MAC address.
Example:
The problem with this method is that all the nodes that receive this broadcast and have an ARP cache entry for this IP address will update their ARP cache with invalid data. So the defending node will now need to send its own Gratuitous ARP to correct the cache on the other systems on the network. Because of this, starting with Windows Vista the Gratuitous ARP is handled differently.
In Windows Vista and Windows Server 2008, ARP Cache is now known as Neighbor Cache. The ARP -a command will still display the legacy ARP Cache and we can still add static ARP entries.
The contents of the neighbor cache can be displayed with the following netsh command.
netsh interface ipv4 show neighbors
When this command is run you will notice that we have different states for neighbors. The following states are possible:
In Windows Vista and Windows Server 2008 there are some built in protections that reduce the chance of the Neighbor cache getting updated with incorrect information. This also helps keep the requesting system from incorrectly updating other systems.
First, a Windows Vista or Windows Server 2008 system will not update the Neighbor cache if an ARP broadcast is received unless it is part of a broadcast ARP request for the receiver. What this means is that when a gratuitous ARP is sent on a network with Windows Vista and Windows Server 2008, these systems will not update their cache with incorrect information if there is an IP address conflict.
Additionally, when a gratuitous ARP is sent by a Windows Vista or Windows Server 2008 system, the following change has been made: the SPA field in the initial request is set to 0.0.0.0. This way the ARP or neighbor caches of systems receiving this request are not updated. So, if there is a duplicate IP address, the receivers do not need to have their cache corrected.
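The difference between the two gratuitous ARP styles can be compared side by side in a small sketch. The field values follow the descriptions above; the all-zero THA in the newer style is an assumption, and the cache check models a receiver that naively learns the sender's IP-to-MAC mapping from any broadcast. All names and addresses are hypothetical.

```python
ZERO_MAC = "00-00-00-00-00-00"


def gratuitous_arp(own_mac, own_ip, vista_style):
    """Return the four ARP address fields of a gratuitous ARP request."""
    return {
        "SHA": own_mac,
        # Older style advertises the sender's own IP; newer style sends 0.0.0.0.
        "SPA": "0.0.0.0" if vista_style else own_ip,
        "THA": ZERO_MAC,          # assumed to be zeros in both styles
        "TPA": own_ip,
    }


def would_corrupt_cache(cache, arp):
    """True if learning SPA -> SHA would overwrite a different cached MAC."""
    cached_mac = cache.get(arp["SPA"])
    return cached_mac is not None and cached_mac != arp["SHA"]


# A bystander already knows the defending node's mapping for 192.168.1.20.
bystander_cache = {"192.168.1.20": "66-77-88-99-AA-BB"}
offender_mac = "00-11-22-33-44-55"

legacy = gratuitous_arp(offender_mac, "192.168.1.20", vista_style=False)
newer = gratuitous_arp(offender_mac, "192.168.1.20", vista_style=True)
print(would_corrupt_cache(bystander_cache, legacy))
print(would_corrupt_cache(bystander_cache, newer))
```

With the older style the bystander's entry would be replaced by the offending node's MAC; with the SPA set to 0.0.0.0 there is no mapping to learn, so the cached entry is left alone.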
There will be times when a system needs to resolve the MAC address of a system that is not reachable within the Broadcast Domain. When this happens, we can use another device on the network to answer the ARP request; this is known as Proxy ARP. Proxy ARP is the answering of ARP requests on behalf of another system. One example of this is when a remote client connects to a Windows Routing and Remote Access (RRAS) server. When the client connects to a RAS server it is assigned an IP address from the server, and the server keeps track of which client was assigned the IP address. When clients on the internal network and remote clients attempt to communicate with each other, the RAS server will use Proxy ARP to reply with its own MAC address. As far as the client sending the ARP request is concerned, it has successfully resolved the IP to the MAC of the remote client. In the example, the LAN client is sending an ARP request for the IP of the Remote Access Client. Notice that the ARP reply comes from the RAS server using its own MAC.
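The Proxy ARP decision can be reduced to a short sketch: the server answers only for addresses it has handed out to remote clients, and it puts its own MAC in the reply. This is a simplified model with invented names and addresses, not the RRAS implementation.

```python
def proxy_arp_reply(server_mac, remote_client_ips, request):
    """Return the ARP reply fields the server would send, or None."""
    if request["TPA"] not in remote_client_ips:
        return None  # not one of our remote clients: stay silent
    return {
        "SHA": server_mac,        # the server's own MAC, not the remote client's
        "SPA": request["TPA"],    # the IP address that was asked about
        "THA": request["SHA"],    # addressed back to the requester
        "TPA": request["SPA"],
    }


# Hypothetical values: one remote client has been assigned 192.168.1.200.
ras_server_mac = "AA-BB-CC-DD-EE-FF"
assigned_to_remote_clients = {"192.168.1.200"}
lan_client_request = {
    "SHA": "00-11-22-33-44-55", "SPA": "192.168.1.10",
    "THA": "00-00-00-00-00-00", "TPA": "192.168.1.200",
}
print(proxy_arp_reply(ras_server_mac, assigned_to_remote_clients,
                      lan_client_request))
```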
Next time we will discuss IP routing and get deeper into IP addresses.
- Clark Satter
Lately, I have been seeing a number of issues/concerns from people where they manually stop the Firewall service and lose connectivity to the machine. They always seem surprised when I explain that it is by design.
In versions of Windows XP prior to Windows XP SP2, there is a window of time between when the network stack starts and when the Windows Firewall Service (ICF) starts to provide protection. The firewall driver does not start to filter TCP/IP packets until the service is loaded and the appropriate policy is applied. The firewall service depends on several functions and must wait until those functions clear before the service pushes the policy to the driver. During this window of time, a packet could be received and delivered to a service without being filtered. This could potentially leave the computer vulnerable to an attack by exposing ports that would otherwise be protected by the firewall.
Note: The time period is based on the speed of the computer.
In Windows XP SP2, the firewall driver has a new static policy rule called the boot-time policy. The boot-time policy performs stateful filtering and eliminates the window of vulnerability when the computer is starting. The boot-time policy enables the computer to open ports so that basic networking tasks such as Domain Name System (DNS) and Dynamic Host Configuration Protocol (DHCP) can occur. The boot-time policy also enables the computer to communicate with a domain controller to obtain appropriate policies. As soon as the firewall service is running, the run-time policy is loaded, applied, and the boot-time filters are removed. The boot-time policy cannot be configured.
There was another security feature added so that if the firewall service is stopped or crashes, the boot-time filters are again loaded to protect the computer. This would prevent an attacker from crashing the firewall service and exposing the machine.
This can cause confusion if you are not aware of it and try to simply stop the firewall service to eliminate it as a potential cause while troubleshooting a connectivity issue.
In Windows Vista the boot-time policy functions the same as it does in Windows XP SP2 except that the service is MPSSVC.
There are multiple ways to manually stop the Windows Firewall:
One of the more common methods to use to stop the firewall service as a test is to use Net stop MPSSVC (for Windows Vista) or Net stop SharedAccess (for Windows XP) but both of these will cause the boot-time filters to load. The proper way to completely stop the firewall is by setting the service to disabled in Services Manager and then stopping the service through one of the GUIs or Netsh. This will prevent the boot-time filters from loading when the firewall service is stopped.
Figure 1. Setting the firewall service to disabled in Services manager.
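For a scripted troubleshooting test, the same order of operations (disable the service first, then stop it) can be expressed as commands. The sketch below is an assumption-laden illustration: it swaps in the built-in `sc config <service> start= disabled` command for the Services manager GUI step, it only prints the commands unless `dry_run` is turned off, and the function names are made up. Running the commands for real requires an elevated prompt on the Windows machine being tested.

```python
import subprocess

# Service names as given above for each operating system.
FIREWALL_SERVICE = {"vista": "MPSSVC", "xp": "SharedAccess"}


def firewall_stop_commands(os_name):
    """Commands that disable the firewall service before stopping it."""
    service = FIREWALL_SERVICE[os_name]
    return [
        # Disable first so the boot-time filters do not load on stop.
        ["sc", "config", service, "start=", "disabled"],
        ["net", "stop", service],
    ]


def run(commands, dry_run=True):
    for command in commands:
        if dry_run:
            print(" ".join(command))   # show what would be executed
        else:
            subprocess.run(command, check=True)


run(firewall_stop_commands("vista"))
```

Remember to set the service back to its original startup type and start it again once the test is finished.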
It is worth noting that when you stop the MPSSVC service, IPSec policies are no longer in effect.
This could be a potential issue for third-party firewall services that want to replace the Windows Firewall but don't provide IPSec functionality. The recommended way to resolve this situation is to set the firewall to allow all traffic and leave the service running. Microsoft provides an API call that third-party services can use to stop the Firewall Service. This call sets the firewall to allow all traffic while leaving the service running so IPsec can still function and is the expected method for third-parties to use.
In Windows 7 and Windows Server 2008 R2, you first need to disable and stop the “Base Filtering Engine” service. Only stopping the Firewall service as described above will put you in block mode.
Another option is to stop the “Network List Service”. This will not allow the Firewall service to associate a profile and therefore it will be unable to block any traffic.
You will also want to investigate this MSDN link: I Need to Disable Windows Firewall
Microsoft does not recommend you stop the firewall service (or a third-party firewall service) except for troubleshooting even if you are behind another edge/perimeter firewall. If another machine on the local subnet gets infected, a machine that is not running a host firewall is vulnerable.
- David Pracht
I just wanted to provide a quick “what to do” when having an IP address conflict. I’ve had a few of these cases this year. I’m not sure that one needs to open a support case with us to resolve this. In some instances, critical applications/resources are down due to IP address conflicts. I’ll provide a few steps one should take in troubleshooting this situation.
IP address conflicts happen within a subnet. I had a call recently in which a cluster resource wouldn’t come online. When the cluster resource attempted to come online, it issued an ARP request for its IP address. In a working scenario with no conflict, there would be no response. In this instance, a VMware server responded. The cluster resource reported the conflict and failed to initiate. The customer was unable to locate the VMware server, so they changed the IP address on the cluster resource.
Basic steps:
It will help if there is a record of MAC addresses. Use Excel to keep a list of the MAC addresses in your environment. You can then easily use SQL to query for the MAC address returned by the "ARP -a" command. Other monitoring software such as System Center Operations Manager will also record and store MAC addresses.
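As one way to picture the lookup, the sketch below keeps a small MAC inventory in an in-memory SQLite database and queries it for the MAC taken from an "ARP -a" entry. The table name, columns, host name, and addresses are all hypothetical sample data, and the line parser assumes the usual three-column layout (IP address, physical address, type) of the Windows output.

```python
import sqlite3


def normalize_mac(mac):
    """Use one spelling (upper case, dash separated) for every MAC."""
    return mac.strip().replace(":", "-").upper()


def parse_arp_entry(line):
    """Return (ip, mac) from one 'arp -a' entry line, or None for other lines."""
    parts = line.split()
    if len(parts) != 3:
        return None               # headers and blank lines do not match
    ip, mac, _entry_type = parts
    return ip, normalize_mac(mac)


# Hypothetical inventory; in practice this would be loaded from your records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (mac TEXT PRIMARY KEY, hostname TEXT, location TEXT)"
)
conn.execute(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    ("00-0C-29-AA-BB-CC", "vmhost01", "Rack 4"),
)


def who_owns(connection, mac):
    """Look up the machine recorded for a MAC address, if any."""
    return connection.execute(
        "SELECT hostname, location FROM inventory WHERE mac = ?",
        (normalize_mac(mac),),
    ).fetchone()


entry = parse_arp_entry("  192.168.1.50          00-0c-29-aa-bb-cc     dynamic")
if entry is not None:
    ip, mac = entry
    print(ip, mac, who_owns(conn, mac))
```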
Hopefully this saves you a support call down the road. Archive it if you must. Inevitably, you’ll have an IP address conflict. This will save you some time.
- Rich Chambers
Last month, one of my colleagues wrote an article to share his experience with changing a setting on Server Core (http://blogs.technet.com/networking/archive/2009/01/08/configuring-advanced-network-card-settings-in-windows-server-2008-server-core.aspx). I had a call with a similar experience where I found myself asking “where do I find that in Server Core?” I’d like to share it with you so you don’t get caught in the same bind.
Hyper-V has the ability to support VLANs (802.1q) for the Host or the Guest Virtual Machines. Within the Hyper-V Manager, you open the Virtual Network Manager and can add a VLAN tag to a particular virtual network (Figure 1) or Guest (Figure 2):
Figure 1
Figure 2
This is all fine and good when you have the Hyper-V Manager GUI available to you. The question comes into play when you are running Server Core. First of all, to manage Hyper-V on that server, you can run the Hyper-V Manager MMC remotely on another 2008 server or on a Vista box after installing the following download (http://support.microsoft.com/kb/952627). In my case, a customer was using another 2008 server for management.
What went wrong?
The exact problem occurred when the customer accidentally enabled VLAN tagging on the External Network without having a VLAN-capable switch. As soon as that change was made, they lost all communication with the Hyper-V server and all of its guests. Fortunately, they had an iLO connection to the server so we were still able to get on the box, but the question then was: how do we remove the VLAN tag from the virtual switch without Hyper-V Manager?
The solution
The solution is best understood if you navigate the registry and look at some of the registry keys created for the Virtual Switches. If you navigate to HKLM\SYSTEM\CCS\Services\VMSMP\Parameters (CCS here is shorthand for CurrentControlSet), you’ll see some interesting keys there (Figure 3). These keys show all of the NICs on the host server (NicList) and all of the Virtual Networks that have been created (SwitchList).
Figure 3
Now that you know where the information is stored, it’s actually really logical how to find the External Network and remove the VLAN setting. First, under HKLM\SYSTEM\CCS\Services\VMSMP\Parameters\NicList, open the individual GUIDs and examine the "FriendlyName" value until you find the name of the External Virtual network. The results from my lab machine can be seen below for my network called “MSNET” (Figure 4):
Figure 4
Once you find the right virtual network, make note of the GUIDs under SwitchName and PortName in that network. Next, expand HKLM\SYSTEM\CCS\Services\VMSMP\Parameters\SwitchList and find and expand the GUID referenced in SwitchName above. Under that switch GUID, find the PortName GUID referenced above. This is the port in the virtual switch that the host is connected to. I’ve selected that port in Figure 5 below, so you can see the values for that port.
Figure 5
You will see a DWORD called AccessVlanId. Change the value to 0 on your Hyper-V box and reboot. Once it comes back up, the VLAN tags will be removed and you should be able to communicate with the server.
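The lookup chain above (FriendlyName, then SwitchName and PortName, then the port key under SwitchList) can be traced in a short sketch. It works on a plain dictionary standing in for the NicList key rather than on the live registry, spells out CCS as CurrentControlSet, and uses placeholder GUID strings; all of these are illustrative assumptions, as is the function name.

```python
BASE_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters"


def host_port_key(nic_list, friendly_name):
    """Return the registry path of the switch port that holds AccessVlanId."""
    for _nic_guid, values in nic_list.items():
        if values.get("FriendlyName") == friendly_name:
            # SwitchName and PortName point at the port under SwitchList.
            return "\\".join(
                [BASE_KEY, "SwitchList", values["SwitchName"], values["PortName"]]
            )
    raise KeyError("no virtual network named %r" % friendly_name)


# Stand-in for the NicList key; the GUID strings are placeholders, not real.
nic_list = {
    "{NIC-GUID-1}": {
        "FriendlyName": "MSNET",
        "SwitchName": "{SWITCH-GUID-1}",
        "PortName": "{PORT-GUID-1}",
    },
}
print(host_port_key(nic_list, "MSNET"))
```

The printed path is the key where the AccessVlanId value would be set to 0.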
Conclusion
VLAN support is a great feature of Hyper-V, but one that should be implemented slowly and thoughtfully if never done before. As always, you shouldn’t go messing with the registry unless you absolutely need to and be sure to test in the lab before rolling out any changes to your production network. You’ll thank me later.
One great article I found on VLAN support in Hyper-V that I wanted to share can be found here:
http://blogs.msdn.com/adamfazio/archive/2008/11/14/understanding-hyper-v-vlans.aspx
- Michael Rendino
One of the most common calls we get is that file transfers are “slow,” causing us to collect network traces and perfmon logs in order to determine if the transfer is slower than it should be and, if so, what is the cause. I had a case recently where a little-known and rarely-experienced change to Windows Server 2003 Service Pack 2 caused some pain for a customer.
The customer had a cluster of Server 2003 Terminal Servers, which then accessed a cluster of file servers on the backend. The customer noticed that after they upgraded some of their Terminal Servers to Service Pack 2, things had significantly “slowed down” as a result. We did some standard troubleshooting and made changes to things like the Scalable Networking Pack, but it didn’t seem to make a difference. A process that used to take 6-7 seconds was now taking 25 seconds to perform. The strangest thing about it was that regular file copies from the file servers were fine. The slowdown occurred when the users logged in and a configuration file was read from the file server line-by-line. They had a script that reproduced the issue consistently by accessing a file and doing that line-by-line transfer. We took network traces and saw the following:
Service Pack 1
In the above transfer, you can see that the SMB buffer in the Read AndX Request starts at 4096 bytes, but quickly increases to 32K, resulting in the fast response. However, when we looked at the post SP2 server, it looked different:
Service Pack 2
This time, the buffer didn’t increase beyond 4K throughout the transfer, even though during the Session Setup AndX Request, we said our Max Buffer size was 16K. We found it especially strange that full file copies didn’t have this same behavior.
Based on that information, I did some research and found out that from Service Pack 1 to Service Pack 2, we disabled a ReadAhead feature since hotfix 894463 is included in SP2. (Please note that while hotfix 894463 makes the change, KB328237 discusses ReadAhead and provides the registry settings to re-enable the feature).
This ReadAhead feature doesn’t benefit all file transfers, but does impact performance with this small, line-by-line transfer and sometimes when Office files are opened over the network. While the issue appeared on a Windows Server 2003 box in my case, it can also affect other clients (by client, I mean the computer initiating the file transfer) such as Windows XP and Windows 2000. In order to return to their previous performance, on the Terminal Server we added the ReadAheadGranularity DWORD referenced in 328237 and set it to 8 for eight pages to read ahead (equaling 32 KB). Once the customer rebooted their server, the issue was resolved and their rollout to SP2 could continue.
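The arithmetic behind the setting, and why the buffer size matters for this kind of transfer, can be worked through quickly. The 4 KB page size is inferred from the figures above (eight pages equaling 32 KB), and the 1 MB file size is an arbitrary example rather than a number from the case.

```python
import math

PAGE_SIZE_BYTES = 4096            # 4 KB pages, consistent with 8 pages = 32 KB


def read_ahead_bytes(granularity_pages):
    """Bytes read ahead for a given ReadAheadGranularity value."""
    return granularity_pages * PAGE_SIZE_BYTES


def read_requests_needed(file_size_bytes, buffer_bytes):
    """Number of read requests needed to pull the whole file."""
    return math.ceil(file_size_bytes / buffer_bytes)


example_file = 1024 * 1024        # arbitrary 1 MB file for illustration
print(read_ahead_bytes(8))
print(read_requests_needed(example_file, 4096))
print(read_requests_needed(example_file, read_ahead_bytes(8)))
```

Each read request costs a network round trip, so an eightfold drop in the number of requests is consistent with the kind of slowdown the customer saw when the buffer stayed at 4 KB.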
Here’s some other helpful information for tweaks to improve file transfers:
Thanks and happy copying!
-Michael Rendino
One of our customers reported an issue after applying the update per KB958687 (MS09-001), which installs a new version of SRV.SYS. The customer was experiencing the following symptoms:
While troubleshooting the issue, we used the System Configuration Utility (MSCONFIG) to disable all non-Microsoft software and utilities. Once this was done, the symptoms were no longer seen. In order to narrow the problem down, we started enabling the third-party services and applications one after the other.
We went through the list one by one and were able to reproduce the issue when the third-party antivirus service on the customer’s machine was started. In order to confirm this, we disabled the antivirus application and started the rest of the applications. After this the customer indicated the problem was no longer happening.
We enabled the AV service once again and the issue returned. We then created a new user on the local machine and logged in as the new user, but the results were the same.
We logged back with domain credentials, disabled the AV application again, and the machine came up quickly and the user experience was as expected.
This confirmed that the issue was being caused by the antivirus software on the customer machine. Now that we knew the cause, we had to find a solution.
The next step that we took was to update the antivirus software, which in turn downloaded the latest antivirus signatures. After this was complete, we rebooted the machine. The antivirus service was running this time and the symptoms were no longer seen.
The takeaway: Always keep your machines updated.
- Firasat Ali Mirza