If you run NLB in Multicast mode on Windows Server 2008, this is a must-read: you may run into the issue described below.
Symptoms: Clients on other subnets cannot connect to the NLB Virtual IP (VIP) Address when NLB is running in Multicast mode on Windows Server 2008. Pinging the VIP Address from a client on a different subnet returns “Request Timed Out” errors.
In the following scenario, the client is on subnet 172.32.x.x/24 and the Windows Server 2008 NLB node is on subnet 172.33.x.x/24:
Note: A static ARP entry is in place on the router that maps the Virtual IP Address to the Multicast Virtual MAC Address. We generally add this entry because most routers will not resolve a Unicast IP Address to a Multicast MAC Address through ARP.
Analysis of the network capture will show that the NLB Node sends an ARP Request to the IP address of the Default Gateway, but there is no response.
In the ARP request, we see the Sender IP Address as 172.33.X.100 (VIP), Sender MAC Address as 03-bf-ac-20-10-64 (Cluster Multicast MAC), Target IP Address as 172.33.X.1 (Default Gateway) and Target MAC Address as 00-00-00-00-00-00. See the ARP packet below:
- Arp: Request, 172.33.X.100 asks for 172.33.X.1
ProtocolType: Internet IP (IPv4)
HardwareAddressLen: 6 (0x6)
ProtocolAddressLen: 4 (0x4)
OpCode: Request, 1(0x1)
SendersMacAddress: 03-bf-ac-20-10-64 <-- Multicast MAC address
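To see the same request at the byte level, here is a minimal Python sketch that packs the fields from the capture into the 28-byte ARP payload laid out in RFC 826 (hardware type 1 for Ethernet, protocol type 0x0800 for IPv4) and decodes it again. The helper names are hypothetical, and the masked octet X is replaced with 10 purely so the code runs.

```python
import ipaddress
import struct

# RFC 826 ARP payload for Ethernet/IPv4, network byte order:
# htype, ptype, hlen, plen, opcode, sender MAC, sender IP, target MAC, target IP.
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_to_bytes(mac: str) -> bytes:
    """Convert '03-bf-ac-20-10-64' (dashes or colons) to 6 raw bytes."""
    octets = mac.replace(":", "-").split("-")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    return bytes(int(o, 16) for o in octets)


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    return struct.pack(
        ARP_FORMAT,
        1,       # HardwareType: Ethernet
        0x0800,  # ProtocolType: Internet IP (IPv4)
        6,       # HardwareAddressLen
        4,       # ProtocolAddressLen
        1,       # OpCode: Request
        mac_to_bytes(sender_mac),
        ipaddress.IPv4Address(sender_ip).packed,
        bytes(6),  # Target MAC unknown: 00-00-00-00-00-00
        ipaddress.IPv4Address(target_ip).packed,
    )


def describe(packet: bytes) -> dict:
    _, _, _, _, opcode, smac, sip, tmac, tip = struct.unpack(ARP_FORMAT, packet)
    return {
        "opcode": opcode,
        "sender_mac": "-".join(f"{b:02x}" for b in smac),
        "sender_ip": str(ipaddress.IPv4Address(sip)),
        "target_mac": "-".join(f"{b:02x}" for b in tmac),
        "target_ip": str(ipaddress.IPv4Address(tip)),
    }


# Values from the Windows Server 2008 capture (X replaced by a hypothetical 10).
request = build_arp_request("03-bf-ac-20-10-64", "172.33.10.100", "172.33.10.1")
print(len(request))  # 28
print(describe(request))
```

The decoded dictionary shows the combination that causes the trouble: a unicast sender IP next to the cluster's multicast sender MAC.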
The Router did not respond to the above ARP request, so the ARP resolution to Gateway IP fails, causing the ping to the Virtual IP address from the client to fail.
The ARP request above has a Unicast Sender IP Address and a Multicast Sender MAC Address (the Multicast Cluster MAC). Most routers do not respond to ARP requests that pair a Unicast Sender IP with a Multicast Sender MAC. The NLB node therefore gets no response, cannot resolve the MAC address of the Gateway, and the ping to the NLB Virtual IP Address fails.
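The router behaviour described here can be written as a small predicate. This is a minimal Python sketch, not any vendor's actual implementation: it treats a MAC address as multicast when the I/G bit (the lowest bit of the first octet) is set, and assumes a router that ignores ARP requests pairing a unicast Sender IP with such a MAC. The function names and the second MAC address are hypothetical, and X is replaced by 10.

```python
import ipaddress


def is_group_mac(mac: str) -> bool:
    """True if the I/G bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.replace(":", "-").split("-")[0], 16)
    return bool(first_octet & 0x01)


def router_would_reply(sender_ip: str, sender_mac: str) -> bool:
    """Hypothetical router policy: ignore an ARP request whose sender pairs
    a unicast IPv4 address with a group (multicast) MAC address."""
    ip = ipaddress.IPv4Address(sender_ip)
    ip_is_unicast = not ip.is_multicast and ip != ipaddress.IPv4Address("255.255.255.255")
    return not (ip_is_unicast and is_group_mac(sender_mac))


# Windows Server 2008, Multicast mode: VIP paired with the cluster multicast MAC.
print(router_would_reply("172.33.10.100", "03-bf-ac-20-10-64"))  # False
# Dedicated IP paired with a hypothetical unicast interface MAC.
print(router_would_reply("172.33.10.50", "00-15-5d-0a-0b-0c"))   # True
```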
If NLB is running in Unicast Mode, everything works fine. The reason is that in Unicast NLB, the node sends ARP Requests with a Unicast Sender IP Address and a Unicast Sender MAC Address; the router will respond to ARP Requests such as these.
These destination MAC addresses are allowed through the transparent firewall. Any MAC address not on this list is dropped.
When the router receives the ARP request from the NLB node, it has to send an ARP response. The response is addressed back to the requester, so its Destination MAC address is the Sender MAC address from the request: the cluster's Multicast MAC address, which is not in the list of allowed MAC addresses above. This explains why the router drops the packet.
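The drop can be modelled in a few lines. This minimal Python sketch assumes a firewall that forwards any unicast destination and only the listed group destinations; because the real list depends on the firewall, the allowlist below (broadcast plus the RFC 1112 IPv4 multicast block) is a hypothetical example, as are the function names.

```python
# Hypothetical allowlist of group (non-unicast) destination MACs, as inclusive
# integer ranges. Check your firewall's documentation for the real list.
ALLOWED_GROUP_RANGES = [
    (0xFFFFFFFFFFFF, 0xFFFFFFFFFFFF),  # broadcast
    (0x01005E000000, 0x01005E7FFFFF),  # IPv4 multicast (RFC 1112 mapping)
]


def mac_to_int(mac: str) -> int:
    return int(mac.replace("-", "").replace(":", ""), 16)


def arp_reply_destination(request_sender_mac: str) -> str:
    # The reply goes back to whoever asked, so its destination MAC is the
    # Sender MAC address carried in the request.
    return request_sender_mac


def firewall_forwards(dst_mac: str) -> bool:
    value = mac_to_int(dst_mac)
    if not (value >> 40) & 0x01:
        return True  # I/G bit clear: ordinary unicast destination
    return any(low <= value <= high for low, high in ALLOWED_GROUP_RANGES)


reply_dst = arp_reply_destination("03-bf-ac-20-10-64")
print(reply_dst, firewall_forwards(reply_dst))  # 03-bf-ac-20-10-64 False
```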
In the following scenario, the client is on subnet 172.32.x.x/24 and the Windows Server 2003 NLB node is on subnet 172.33.x.x/24:
Analysis of the network capture will show that the NLB Node sends an ARP Request to the IP address of the Default Gateway and the router sends the ARP response.
In the ARP request, we see the Sender IP Address as 172.33.X.50 (Dedicated IP), Sender MAC Address as 01-0A-0B-0C-0D (Interface MAC), Target IP Address as 172.33.X.1 (Gateway), and Target MAC Address as 00-00-00-00-00-00. See the ARP packet below:
Arp: Request, 172.33.X.50 asks for 172.33.X.1
Since Windows Server 2003 sends the ARP Request with a Unicast Sender IP Address and Unicast Sender MAC (Interface MAC) Address, the router sends an ARP response.
NLB itself works the same way on Windows Server 2008 as on Windows Server 2003. What changed in Windows Server 2008 is the TCP/IP stack, specifically which source address it uses for the ARP request to the Default Gateway.
In Windows Server 2003, assume that the interface has both the Virtual IP Address and a dedicated primary IP address. Whenever you ping the Virtual IP Address from a client, the Windows Server 2003 NLB node sends an ARP request for the Default Gateway IP address. This ARP request always comes from the primary IP address, which is a dedicated IP address with a Unicast MAC (Interface MAC) Address.
On Windows Server 2008 NLB nodes operating in Multicast mode, the ARP request for the Default Gateway IP Address is sent from the Virtual IP Address, with the Multicast MAC Address as the Sender's MAC Address. The router (gateway device) never responds to an ARP request whose Sender's MAC Address field contains a Multicast MAC Address.
Workaround: add a static ARP table entry for the Default Gateway IP address on each NLB node.
Command to add a static ARP entry:
arp -s <IP address> <MAC address>
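If you script this across several nodes, a small helper can validate the values before they reach arp -s. The Python sketch below only builds the argument list; run the resulting command from an administrative command prompt. The helper name and the example gateway MAC are hypothetical. It rejects a MAC with the multicast (I/G) bit set, since the gateway's interface address should be unicast.

```python
import ipaddress
import re

# Six hex octets separated consistently by '-' or ':'.
_MAC_RE = re.compile(r"[0-9a-fA-F]{2}([-:])[0-9a-fA-F]{2}(?:\1[0-9a-fA-F]{2}){4}")


def static_arp_command(gateway_ip: str, gateway_mac: str) -> list[str]:
    """Build the argv for 'arp -s <IP address> <MAC address>' (hypothetical helper)."""
    ipaddress.IPv4Address(gateway_ip)  # raises ValueError for a malformed IPv4 address
    if not _MAC_RE.fullmatch(gateway_mac):
        raise ValueError(f"not a 6-octet MAC address: {gateway_mac!r}")
    mac = gateway_mac.replace(":", "-").lower()  # Windows arp uses dash-separated MACs
    if int(mac[:2], 16) & 0x01:
        raise ValueError("the gateway MAC should be unicast (I/G bit clear)")
    return ["arp", "-s", gateway_ip, mac]


cmd = static_arp_command("172.33.10.1", "00:15:5D:0A:0B:0C")
print(" ".join(cmd))  # arp -s 172.33.10.1 00-15-5d-0a-0b-0c
```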
Note: Microsoft is aware of this issue and we will keep you posted regarding its status.
For more information regarding use of the ARP command in Windows, see the following:
- Saravanan N
Thanks for a good article - this helped me fix an Exchange 2007 Client Access/Hub Transport NLB cluster on W2k8.
I have two tips for others having this problem:
- If you use redundant HSRP routers, both routers' dedicated addresses as well as the virtual IP should be in the static ARP table.
- Use "netsh int ipv4 set neighbors "NICNAME" x.x.x.x xx-xx-xx-xx-xx-xx" instead of the ARP command. This way the entries remain after reboot without using "Scheduled Task" and other workarounds to repopulate the ARP cache.
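Combining both tips, this minimal Python sketch prints one netsh line per gateway address for an HSRP pair. The syntax is copied from the tip above; confirm it with netsh interface ipv4 set neighbors /? on your build. All addresses and the two router MACs are hypothetical; 00-00-0c-07-ac-01 is the HSRP version 1 virtual MAC for standby group 1.

```python
import ipaddress


def netsh_neighbor_commands(interface: str, neighbors: dict[str, str]) -> list[str]:
    """Build one 'netsh interface ipv4 set neighbors' line per gateway address."""
    lines = []
    for ip, mac in neighbors.items():
        ipaddress.IPv4Address(ip)  # reject malformed addresses early
        mac = mac.replace(":", "-")
        lines.append(f'netsh interface ipv4 set neighbors "{interface}" {ip} {mac}')
    return lines


# Hypothetical HSRP pair: both routers' dedicated addresses plus the virtual IP.
hsrp = {
    "172.33.10.2": "00-1a-2b-3c-4d-02",  # router A (hypothetical MAC)
    "172.33.10.3": "00-1a-2b-3c-4d-03",  # router B (hypothetical MAC)
    "172.33.10.1": "00-00-0c-07-ac-01",  # HSRP virtual IP, standby group 1
}
for line in netsh_neighbor_commands("Local Area Connection", hsrp):
    print(line)
```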
With best regards
/ Stefan Alkman, Kontract
The sender MAC listed in your NLB 2003 example is incorrect. 01-0A-0B-0C-0D is also a multicast MAC.
This is exactly the problem we've been banging our heads over for the past two months. After digging deeper we noticed that the router was not responding to ARP requests sent from a unicast IP and multicast MAC address, but assumed it was a router bug because we never had the chance to set up a Windows Server 2003 test cluster to see if the problem was there -- that was next on the list.
Our workaround was to ping the default gateway once a minute from each host using a scheduled task, to keep the ARP cache current. We would have gone the static ARP entry route, but redundant routers mean the MAC address could change in the future without warning.
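The keep-alive idea above can be sketched as a loop. This minimal Python version assumes Windows ping syntax (-n for the count, -w for the timeout in milliseconds) and a 60-second interval; the function names are hypothetical. Injecting the pinger and sleep functions lets the loop be exercised without touching the network.

```python
import subprocess
import time
from typing import Callable, List, Optional


def ping_once(gateway_ip: str) -> bool:
    """Send one Windows ping (-n 1) with a 1000 ms timeout (-w 1000).

    A zero exit code is treated as success; this is only a rough signal.
    """
    result = subprocess.run(["ping", "-n", "1", "-w", "1000", gateway_ip], capture_output=True)
    return result.returncode == 0


def keep_gateway_arp_fresh(
    gateway_ip: str,
    interval_s: float = 60.0,
    iterations: Optional[int] = None,
    pinger: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Ping the gateway every interval_s seconds; run forever if iterations is None."""
    results: List[bool] = []
    while iterations is None or len(results) < iterations:
        results.append(pinger(gateway_ip))
        if iterations is None or len(results) < iterations:
            sleep(interval_s)
    return results
```

A scheduled task that runs ping once a minute, as described, avoids keeping a process alive; the loop simply puts the same idea in one place.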
Can I please ask for a clarification?
In the 2008 part of this post, you say:
SendersMacAddress: 03-bf-ac-20-10-64 <-- Multicast MAC address.
To me multicast MAC are in the range:
0100.5E00.0000 to 0100.5EFE.FFFF
Why do you say 03-bf-ac-20-10-64 is a Multicast MAC?
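On this question: whether a MAC address is multicast is decided by the I/G bit, the least significant bit of the first octet, not by the 01-00-5E prefix. 01-00-5E-00-00-00 through 01-00-5E-7F-FF-FF is only the block RFC 1112 uses to map IPv4 multicast groups. 0x03 is binary 00000011, so 03-bf-ac-20-10-64 has the I/G bit set (and the 0x02 bit, which marks it as locally administered). A minimal Python sketch with hypothetical helper names:

```python
import re


def parse_mac(mac: str) -> int:
    """Accept 03-bf-ac-20-10-64, 03:bf:ac:20:10:64, or Cisco-style 0100.5E00.0000."""
    digits = re.sub(r"[-:.]", "", mac)
    if not re.fullmatch(r"[0-9a-fA-F]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(digits, 16)


def is_group(mac: str) -> bool:
    return bool((parse_mac(mac) >> 40) & 0x01)  # I/G bit of the first octet


def is_locally_administered(mac: str) -> bool:
    return bool((parse_mac(mac) >> 40) & 0x02)  # U/L bit of the first octet


def is_ipv4_multicast_mapped(mac: str) -> bool:
    return 0x01005E000000 <= parse_mac(mac) <= 0x01005E7FFFFF


for mac in ["03-bf-ac-20-10-64", "0100.5E00.0001", "00-15-5d-0a-0b-0c"]:
    print(mac, is_group(mac), is_locally_administered(mac), is_ipv4_multicast_mapped(mac))
```

As printed, the Windows Server 2003 example MAC 01-0A-0B-0C-0D has only five octets, so parse_mac rejects it; its first octet, 0x01, does have the I/G bit set, as an earlier comment points out.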
I am facing a similar problem, but in unicast mode. The cluster is configured in unicast mode with 2 NICs, and I am not able to reach the cluster IP from the client subnet. You can see a detailed explanation of the issue I am facing at the following link.
I'm glad this was posted, I've been trying to track down this issue for a while now.
This should be posted in the TechNet NLB Step-by-Step guide as a notation.
I would add that it can be necessary to add the static entry by using netsh. When I tried arp -s, I got this error: "The ARP entry addition failed: 5"
netsh interface ipv4 set neighbors "Local Area Connection" "<ip-Address>" "<Mac-Address>"
I am running MWS2003R2EE SP2 with a XenServer 5 two-node NLB cluster, with a VIP for 2 WI servers, and I am having the same problem. When you say "TCP/IP functionality has been changed in Windows Server 2008," is this true for my version of Server 2003?
I am seeing a problem with my 2 CAS servers in an NLB cluster on Windows 2008, with two NICs configured in Unicast mode. Internally, I can access OWA through the NLB cluster name. Externally (from the Internet) I can't access OWA, even though I have NATed the public IP to the NLB VIP.
I have NATed the dedicated IP of one of the CAS servers, and that works with no issues. Why isn't the NLB VIP NATing or responding correctly?
Good writeup. I've been troubleshooting this for the past couple of weeks, and this supports all of my findings so far.
Was this included in 2008 SP2?
When I try to install the hotfix on an SP2 server, I get "Update does not apply to your system"
I still cannot ping the 2nd LAN adapter that has NLB on it. I can ping it from inside the server, but that is it.
Answer: This issue has been resolved and here is the KB