Unable to connect to Windows Server 2008 NLB Virtual IP Address from hosts in different subnets when NLB is in Multicast Mode

If you run NLB in Multicast mode on Windows Server 2008, this is a must-read, because you may run into this issue.

Symptoms: When NLB runs in Multicast Mode on Windows Server 2008, clients on other subnets cannot connect to the NLB Virtual IP (VIP) Address. Pinging the VIP from a client on a different subnet returns "Request Timed Out" errors.

Steps to reproduce the issue

In the following scenario, the client is on subnet 172.32.x.x/24 and the Windows Server 2008 NLB node is on subnet 172.33.x.x/24.

Note: The router has a static ARP entry that maps the Virtual IP Address to the cluster's multicast MAC address. This is the usual configuration, because most routers will not dynamically learn an ARP entry that maps a unicast IP address to a multicast MAC address.

  1. Clear the ARP cache on the NLB node using the command arp -d *
  2. Try to ping the NLB VIP from the client machine; this ping will fail.
  3. Try to ping the dedicated IP address from the client machine; this works fine.
  4. Again try to ping the NLB VIP from the client machine; this ping works fine.
  5. Clear the ARP cache on the NLB node and ping the NLB VIP from the client again; the ping fails. This shows that the problem lies in ARP resolution on the NLB node.
  6. Clear the ARP cache again on the NLB node as in step 1.
  7. Start a Network Monitor network capture on the NLB node.
  8. Try to ping the NLB VIP from the client machine; the ping request will fail.
  9. Stop the network capture.
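The repro steps above can be modeled as a small state machine, which makes it easier to see why pinging the dedicated IP "fixes" the VIP until the cache is cleared. This is a minimal Python sketch, not Windows code. The addresses are hypothetical documentation-range stand-ins for the elided 172.33.X.x values. The two behaviours it encodes are assumptions taken from this post: the Windows Server 2008 node sources its gateway ARP request from the address being replied from, and the router ignores ARP requests whose sender MAC is multicast.

```python
from dataclasses import dataclass, field

# Hypothetical addresses from documentation ranges (RFC 5737 / RFC 7042);
# the article elides its real octets as "X".
GATEWAY_IP = "192.0.2.1"
VIP = "192.0.2.100"
DIP = "192.0.2.50"
CLUSTER_MAC = "03-bf-c0-00-02-64"    # multicast-mode cluster MAC for this VIP
INTERFACE_MAC = "00-00-5e-00-53-50"  # hypothetical adapter (unicast) MAC
GATEWAY_MAC = "00-00-5e-00-53-01"    # hypothetical router MAC


def is_multicast(mac: str) -> bool:
    # A MAC is a group (multicast) address when the low bit of its first octet is set.
    return (int(mac.split("-")[0], 16) & 1) == 1


@dataclass
class NlbNode:
    arp_cache: dict = field(default_factory=dict)

    def clear_arp_cache(self) -> None:
        # Same effect as running "arp -d *" on the node.
        self.arp_cache.clear()

    def sender_mac_for(self, src_ip: str) -> str:
        # Assumption: as described for Windows Server 2008, the ARP request for the
        # gateway is sourced from the address the echo reply comes from.
        return CLUSTER_MAC if src_ip == VIP else INTERFACE_MAC

    def can_reply(self, pinged_ip: str) -> bool:
        """True if the node can route an echo reply from pinged_ip to an off-subnet client."""
        if GATEWAY_IP not in self.arp_cache:
            # Router model from the article: it ignores ARP requests with a multicast sender MAC.
            if is_multicast(self.sender_mac_for(pinged_ip)):
                return False
            self.arp_cache[GATEWAY_IP] = GATEWAY_MAC
        return True


node = NlbNode()
node.clear_arp_cache()
print("step 2, ping VIP:", node.can_reply(VIP))  # False
print("step 3, ping DIP:", node.can_reply(DIP))  # True
print("step 4, ping VIP:", node.can_reply(VIP))  # True, gateway MAC now cached
node.clear_arp_cache()
print("step 5, ping VIP:", node.can_reply(VIP))  # False
```

The key point of the model is that the cache entry created by the dedicated-IP ping is shared. Any later reply from the VIP reuses it without sending a new ARP request.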

Analysis of the network capture will show that the NLB Node sends an ARP Request to the IP address of the Default Gateway, but there is no response.

In the ARP request, the Sender IP Address is 172.33.X.100 (VIP), the Sender MAC Address is 03-bf-ac-20-10-64 (Cluster Multicast MAC), the Target IP Address is 172.33.X.1 (Default Gateway), and the Target MAC Address is 00-00-00-00-00-00. See the ARP packet below:

 

- Arp: Request, 172.33.X.100 asks for 172.33.X.1
    HardwareType: Ethernet
    ProtocolType: Internet IP (IPv4)
    HardwareAddressLen: 6 (0x6)
    ProtocolAddressLen: 4 (0x4)
    OpCode: Request, 1(0x1)
    SendersMacAddress: 03-bf-ac-20-10-64   <-- Multicast MAC address
    SendersIp4Address: 172.33.X.100
    TargetMacAddress: 00-00-00-00-00-00
    TargetIp4Address: 172.33.X.1

The router does not respond to the ARP request above, so ARP resolution of the gateway IP address fails and the NLB node cannot send its echo reply. As a result, the client's ping to the Virtual IP address times out.
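To make the field layout of the captured request concrete, here is a minimal Python sketch that packs the same fields into an ARP packet body. It uses the RFC 826 layout, which is 28 bytes for Ethernet and IPv4. The IP addresses are hypothetical documentation-range stand-ins for the elided 172.33.X.100 and 172.33.X.1, and the sender MAC is the multicast cluster MAC that would correspond to that hypothetical VIP. The helper names are illustrative only.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    octets = mac.split("-")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac}")
    return bytes(int(o, 16) for o in octets)


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Pack an Ethernet/IPv4 ARP request body (28 bytes, RFC 826 layout)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # HardwareType: Ethernet
        0x0800,                       # ProtocolType: Internet IP (IPv4)
        6,                            # HardwareAddressLen
        4,                            # ProtocolAddressLen
        1,                            # OpCode: Request
        mac_to_bytes(sender_mac),     # SendersMacAddress
        socket.inet_aton(sender_ip),  # SendersIp4Address
        bytes(6),                     # TargetMacAddress: 00-00-00-00-00-00 (unknown)
        socket.inet_aton(target_ip),  # TargetIp4Address
    )


# Hypothetical stand-ins for 172.33.X.100 (VIP) and 172.33.X.1 (gateway).
pkt = build_arp_request("03-bf-c0-00-02-64", "192.0.2.100", "192.0.2.1")
print(len(pkt))                      # 28
print(pkt[8:14].hex("-"))            # 03-bf-c0-00-02-64 (sender MAC field)
print(socket.inet_ntoa(pkt[24:28]))  # 192.0.2.1 (target IP field)
```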

Why does the router not respond to the ARP request?

The ARP request above has a unicast Sender IP Address and a multicast Sender MAC Address (the cluster multicast MAC). A MAC address is multicast when the least-significant bit of its first octet is set, as it is in 0x03. Most routers do not respond to ARP requests that combine a unicast sender IP with a multicast sender MAC. The NLB node therefore gets no response, cannot resolve the gateway's MAC address, and the reply from the NLB Virtual IP Address is never sent.

If NLB is running in Unicast Mode, everything works fine.  The reason is that in Unicast NLB, the node sends ARP Requests with a Unicast Sender IP Address and a Unicast Sender MAC Address; the router will respond to ARP Requests such as these.
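The cluster MAC that NLB derives from the VIP (W.X.Y.Z) shows why the two modes differ. In unicast mode it is 02-BF-W-X-Y-Z, and in multicast mode it is 03-BF-W-X-Y-Z. Only the multicast form has the group (I/G) bit set. The minimal Python sketch below uses a hypothetical VIP, and the function names are illustrative only.

```python
import ipaddress


def nlb_cluster_mac(vip: str, mode: str) -> str:
    """Return the NLB cluster MAC for VIP W.X.Y.Z: 02-BF-W-X-Y-Z (unicast) or 03-BF-W-X-Y-Z (multicast)."""
    prefix = {"unicast": 0x02, "multicast": 0x03}[mode]
    octets = [prefix, 0xBF, *ipaddress.IPv4Address(vip).packed]
    return "-".join(f"{o:02x}" for o in octets)


def is_group_address(mac: str) -> bool:
    """True if the I/G bit (least-significant bit of the first octet) is set."""
    return (int(mac.split("-")[0], 16) & 0x01) == 0x01


# Hypothetical VIP from the documentation range.
for mode in ("unicast", "multicast"):
    mac = nlb_cluster_mac("192.0.2.100", mode)
    print(mode, mac, "group" if is_group_address(mac) else "individual")
```

The first octet 0x02 also has the locally administered bit set, but its I/G bit is clear. That is why a router treats the unicast-mode address as an ordinary individual MAC.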

Allowed MAC Addresses (from Cisco’s website)

These destination MAC addresses are allowed through the transparent firewall. Any MAC address not on this list is dropped.

  • TRUE broadcast destination MAC address equal to FFFF.FFFF.FFFF
  • IPv4 multicast MAC addresses from 0100.5E00.0000 to 0100.5EFE.FFFF
  • IPv6 multicast MAC addresses from 3333.0000.0000 to 3333.FFFF.FFFF
  • BPDU multicast address equal to 0100.0CCC.CCCD
  • AppleTalk multicast MAC addresses from 0900.0700.0000 to 0900.07FF.FFFF

When the router receives the ARP request from the NLB node, it has to send an ARP response addressed to the requester's MAC address. Here that address is the cluster multicast MAC, which is not in the Allowed MAC Addresses list above. This explains why the packet is dropped.
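The destination check described above can be sketched as a simple range lookup. This is a minimal Python sketch, assuming the bounds in the Cisco list are inclusive. It only models the broadcast and multicast ranges in that list and does not model normal unicast forwarding. The function names and example MACs are hypothetical.

```python
from typing import Optional


def mac_to_int(mac: str) -> int:
    # Accept both "01-00-5e-..." and Cisco's dotted "0100.5E00.0000" notation.
    digits = mac.replace("-", "").replace(".", "").replace(":", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(digits, 16)


# Destination MAC ranges from the Cisco list above (inclusive bounds).
ALLOWED_RANGES = {
    "broadcast": ("FFFF.FFFF.FFFF", "FFFF.FFFF.FFFF"),
    "IPv4 multicast": ("0100.5E00.0000", "0100.5EFE.FFFF"),
    "IPv6 multicast": ("3333.0000.0000", "3333.FFFF.FFFF"),
    "BPDU": ("0100.0CCC.CCCD", "0100.0CCC.CCCD"),
    "AppleTalk multicast": ("0900.0700.0000", "0900.07FF.FFFF"),
}


def allowed_range(dst_mac: str) -> Optional[str]:
    """Return the name of the list entry that matches dst_mac, or None if it would be dropped."""
    value = mac_to_int(dst_mac)
    for name, (low, high) in ALLOWED_RANGES.items():
        if mac_to_int(low) <= value <= mac_to_int(high):
            return name
    return None


print(allowed_range("03-bf-c0-00-02-64"))  # None: an NLB multicast cluster MAC is not listed
print(allowed_range("01-00-5e-7f-02-64"))  # IPv4 multicast
```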

Why does Multicast NLB work fine on Windows Server 2003?

In the following scenario, the client is on subnet 172.32.x.x/24 and the Windows Server 2003 NLB node is on subnet 172.33.x.x/24.

Steps to see the scenario succeed when Windows Server 2003 is used on the NLB node

  1. Clear the ARP cache on the NLB node using the command arp -d *
  2. Try to ping the NLB VIP from the client machine; this ping will succeed.
  3. Clear the ARP cache on the NLB node as in step 1.  This ensures that the network trace to be captured in the next steps will show ARP communication.
  4. Start a Network Monitor network capture on the NLB node.
  5. Try to ping the NLB VIP; the ping request will succeed.
  6. Stop the network capture.

Analysis of the network capture will show that the NLB Node sends an ARP Request to the IP address of the Default Gateway and the router sends the ARP response.

In the ARP request, the Sender IP Address is 172.33.X.50 (Dedicated IP), the Sender MAC Address is 01-0A-0B-0C-0D (Interface MAC), the Target IP Address is 172.33.X.1 (Gateway), and the Target MAC Address is 00-00-00-00-00-00. See the ARP packet below:

 

Arp: Request, 172.33.X.50 asks for 172.33.X.1 
    HardwareType: Ethernet
    ProtocolType: Internet IP (IPv4)
    HardwareAddressLen: 6 (0x6)
    ProtocolAddressLen: 4 (0x4)
    OpCode: Request, 1(0x1)
    SendersMacAddress: 01-0A-0B-0C-0D
    SendersIp4Address: 172.33.X.50
    TargetMacAddress: 00-00-00-00-00-00
    TargetIp4Address: 172.33.X.1

Since Windows Server 2003 sends the ARP Request with a Unicast Sender IP Address and Unicast Sender MAC (Interface MAC) Address, the router sends an ARP response.

What's the difference between Windows Server 2003 and Windows Server 2008 NLB?

NLB itself works the same way on Windows Server 2008 as on Windows Server 2003. What changed is the TCP/IP stack in Windows Server 2008, specifically the sender address it uses for the ARP request to the default gateway.

Assume a Windows Server 2003 NLB node has both the Virtual IP address and the dedicated (primary) IP address on its interface. Whenever a client pings the Virtual IP address and the gateway is not already in the node's ARP cache, the node sends an ARP request for the Default Gateway IP address. This ARP request is always sourced from the primary IP address, which is the dedicated IP address, and it carries the interface's unicast MAC address as the sender MAC.

On Windows Server 2008 NLB nodes operating in Multicast Mode, the ARP request for the Default Gateway IP address is sourced from the Virtual IP address, and the cluster's multicast MAC address is the sender MAC. The router (gateway device) does not respond to an ARP request whose Sender MAC Address field contains a multicast MAC address.
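The difference between the two versions comes down to which sender IP and MAC end up in the gateway ARP request. The minimal Python sketch below encodes the behaviour described in the two paragraphs above; it is a model of that description, not of the Windows TCP/IP code. All addresses and function names are hypothetical.

```python
from typing import NamedTuple


class ArpSender(NamedTuple):
    ip: str
    mac: str


# Hypothetical node addressing (documentation ranges).
VIP, DIP = "192.0.2.100", "192.0.2.50"
CLUSTER_MAC = "03-bf-c0-00-02-64"    # multicast-mode cluster MAC for VIP
INTERFACE_MAC = "00-00-5e-00-53-50"  # hypothetical adapter MAC


def arp_sender_for_gateway(os_version: str, reply_src_ip: str) -> ArpSender:
    """Model which sender IP/MAC the node puts in its ARP request for the gateway."""
    if os_version == "2003":
        # Always sourced from the primary (dedicated) IP and the adapter MAC.
        return ArpSender(DIP, INTERFACE_MAC)
    if os_version == "2008":
        # Sourced from the address being replied from; the VIP carries the cluster MAC.
        if reply_src_ip == VIP:
            return ArpSender(VIP, CLUSTER_MAC)
        return ArpSender(reply_src_ip, INTERFACE_MAC)
    raise ValueError(f"unknown version: {os_version}")


def router_replies(sender: ArpSender) -> bool:
    # Per this post: the router ignores requests whose sender MAC is multicast (I/G bit set).
    return (int(sender.mac.split("-")[0], 16) & 1) == 0


for version in ("2003", "2008"):
    sender = arp_sender_for_gateway(version, VIP)
    print(version, sender, "reply" if router_replies(sender) else "no reply")
```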

What is the workaround to resolve this issue?

As a workaround, add a static ARP entry for the Default Gateway IP address on each NLB node, so that the node never has to send an ARP request for the gateway.

Command to add static ARP entry

arp -s <IP address> <MAC address>
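If you script this across several nodes, it helps to validate the gateway MAC before running anything, because arp -s expects six dash-separated octets. The minimal Python sketch below only builds the command lines and does not run them. It produces both the arp -s form and the netsh form suggested in the reader comments. The interface name, gateway IP and gateway MAC are hypothetical, and both commands will typically need an elevated prompt.

```python
import ipaddress
import string


def normalize_mac(mac: str) -> str:
    """Return a MAC in the dash-separated form used by arp -s (e.g. 00-aa-00-62-c6-09)."""
    octets = mac.replace(":", "-").split("-")
    if len(octets) != 6 or any(len(o) != 2 or not set(o) <= set(string.hexdigits) for o in octets):
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    return "-".join(o.lower() for o in octets)


def static_gateway_commands(interface: str, gateway_ip: str, gateway_mac: str) -> dict:
    """Build (but do not run) the two command lines for a static gateway ARP entry."""
    ip = str(ipaddress.IPv4Address(gateway_ip))  # raises ValueError on a bad address
    mac = normalize_mac(gateway_mac)
    return {
        "arp": ["arp", "-s", ip, mac],
        "netsh": ["netsh", "interface", "ipv4", "set", "neighbors", interface, ip, mac],
    }


# Hypothetical interface name, gateway IP and gateway MAC.
cmds = static_gateway_commands("Local Area Connection", "192.0.2.1", "00:00:5E:00:53:01")
for argv in cmds.values():
    print(argv)
```

On the node itself, each argument list could be passed to subprocess.run(argv, check=True). Keeping the command as a list rather than a single string means the interface name is quoted correctly even though it contains spaces.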

Note: Microsoft is aware of this issue and we will keep you posted regarding its status.

For more information regarding use of the ARP command in Windows, see the following:

http://technet.microsoft.com/en-us/library/bb490864.aspx

- Saravanan N


  • Hi,

    Thanks for a good article - this helped me solve an issue with an Exchange 2007 Client Access/Hub Transport NLB cluster on W2k8.

    I have two tips for others having this problem:

    - If using redundant HSRP routers - both routers dedicated addresses as well as the virtual IP should be in the static ARP table.

    - Use "netsh int ipv4 set neighbors "NICNAME" x.x.x.x xx-xx-xx-xx-xx-xx" instead of the ARP command. This way the entries remain after reboot without using "Scheduled Task" and other workarounds to repopulate the ARP cache.

    With best regards

    / Stefan Alkman, Kontract

  • The sender MAC listed in your NLB 2003 example is incorrect.  01-0A-0B-0C-0D is also a multicast MAC.

  • This is exactly the problem we've been banging our heads over for the past two months. After digging deeper we noticed that the router was not responding to ARP requests sent from a unicast IP and multicast MAC address, but assumed it was a router bug because we never had the chance to set up a Windows Server 2003 test cluster to see if the problem was there -- that was next on the list.

    Our workaround was to ping the default gateway once a minute from each host, using a scheduled task, to keep the ARP cache current. We would have gone with a static ARP entry, but with redundant routers the gateway MAC address could change in the future without warning.

  • Hi,

    Can I please ask for a clarification?

    in the 2008 part of this post, you say:

    ---

    SendersMacAddress: 03-bf-ac-20-10-64   <-- Multicast MAC address.

    ---

    To me multicast MAC are in the range:

    0100.5E00.0000 to 0100.5EFE.FFFF

    Why do you say 03-bf-ac-20-10-64 is a Multicast MAC?

  • Hi,

    I am facing a similar problem, but in unicast mode. The cluster is configured in unicast mode with 2 NICs. I am not able to reach the cluster IP from the client subnet. You can find a detailed explanation of the issue I am facing at the following link.

    http://social.technet.microsoft.com/Forums/en-US/winserverPN/thread/373eb373-56ba-4d0c-9e32-4b67a8a1e894/?ffpr=0

    Any suggestion?

    Thanks!

  • I'm glad this was posted, I've been trying to track down this issue for a while now.  

    This should be posted in the TechNet NLB Step-by-Step guide as a notation.  

  • I would add that it can be necessary to add the static entry by using netsh. When I try arp -s, I get this error: "The ARP entry addition failed: 5"

    netsh interface ipv4 set neighbors "Local Area Connection" <ip-Address> <Mac-Address>

  • I am running MWS2003R2EE SP2 with a XenServer 5 two-node NLB cluster that has a VIP for 2 WI servers, and I am having the same problem. When you say "TCP/IP functionality has been changed in Windows Server 2008," does that also apply to my version of Server 2k3?

  • I am seeing a problem with my two CAS servers in an NLB cluster on Windows 2008, with two NICs configured in Unicast mode. Internally, I can access OWA through the NLB cluster name. Externally (from the Internet), I can't access OWA, even though I have NATed the public IP to the NLB VIP.

    When I NAT the dedicated IP of one of the CAS servers instead, I have no issues. Why is the NLB VIP not NATing or responding correctly?

    Any suggestions

  • Fixed with:

    http://support.microsoft.com/kb/960916

  • Good writeup. I've been troubleshooting this for the past couple of weeks, and this supports all of my findings so far.

  • Was this included in 2008 SP2?

    When I try to install the hotfix on an SP2 server, I get "Update does not apply to your system"

  • I still cannot ping the 2nd LAN adapter with my NLB on it.  I can ping from inside the server...but that is it

  • Reply to Stefan Alkman's comment: This issue has been resolved. Here is the KB article:

    http://support.microsoft.com/kb/960916