Nodes being removed from Failover Cluster membership on VMware ESX?

Welcome to the AskCore blog. Today, we are going to talk about nodes being removed from active Failover Cluster membership when the nodes are hosted on VMware ESX. I have documented node membership problems in a previous blog post:

Having a problem with nodes being removed from active Failover Cluster membership?
http://blogs.technet.com/b/askcore/archive/2012/02/08/having-a-problem-with-nodes-being-removed-from-active-failover-cluster-membership.aspx

This is a sample of the event you will see in the System Event Log in Event Viewer:

[Screenshot: sample node-removal event in the System Event Log]

One specific problem I have seen several times recently is VMXNET3 adapters dropping inbound network packets because the receive buffer is too small to handle large amounts of traffic. You can easily check for this problem in Performance Monitor by adding the “Network Interface\Packets Received Discarded” counter.

[Screenshot: Performance Monitor showing the Network Interface\Packets Received Discarded counter]

Once you have added this counter, look at the Average, Minimum and Maximum values. If any of them is greater than zero, the adapter is discarding inbound packets and its receive buffer needs to be increased. This problem is documented in VMware’s Knowledge Base:

Large packet loss at the guest OS level on the VMXNET3 vNIC in ESXi 5.x / 4.x
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2039495
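If you want to run the same check on several cluster nodes, it can be scripted. The following is a minimal sketch in Python, assuming the counter has first been captured to CSV with the built-in Windows typeperf tool (for example, typeperf "\Network Interface(*)\Packets Received Discarded" -si 5 -sc 60 -o drops.csv). It also assumes typeperf's usual CSV layout: a "(PDH-CSV 4.0)" header row listing the counter paths, followed by one timestamped row per sample. The function name adapters_dropping_packets is hypothetical. It applies the rule from this post and reports every adapter whose counter rose above zero during the capture.

```python
import csv
import io


def adapters_dropping_packets(typeperf_csv: str) -> dict[str, float]:
    """Return {counter_path: highest_value} for every adapter whose
    'Packets Received Discarded' counter was above zero in the capture.

    Hypothetical helper; assumes typeperf's CSV layout: a header row whose
    first cell starts with "(PDH-CSV", then rows of timestamp + values.
    """
    rows = [row for row in csv.reader(io.StringIO(typeperf_csv)) if row]

    # Locate the header row that lists one counter path per adapter.
    header_idx = next(
        (i for i, row in enumerate(rows) if row[0].startswith("(PDH-CSV")), None
    )
    if header_idx is None:
        raise ValueError("no typeperf (PDH-CSV) header row found")
    paths = rows[header_idx][1:]

    # Track the highest sample seen for each counter path.
    highest = {path: 0.0 for path in paths}
    for row in rows[header_idx + 1:]:
        if len(row) != len(paths) + 1:
            continue  # skip trailer text such as "The command completed successfully."
        for path, cell in zip(paths, row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # blank or non-numeric cell
            highest[path] = max(highest[path], value)

    # Any value above zero means the receive buffer should be increased.
    return {path: value for path, value in highest.items() if value > 0}
```

On a node you could also feed the function the live output of typeperf, for example the stdout of subprocess.run(["typeperf", r"\Network Interface(*)\Packets Received Discarded", "-sc", "12"], capture_output=True, text=True). Any adapter it returns is a candidate for the receive-buffer change described in the VMware KB article above.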

I hope that this post helps you!

Thanks,

James Burrage
Senior Support Escalation Engineer
Windows High Availability Group

  • Nice tip, thanks for sharing James!

  • Hi there.

    I tried to find the recommended values but couldn't find them in the VMware docs.

    Can you point me to these values?

    Thanks

    Shimon

  • Wow, nice tip. I'm seeing huge numbers for packet drops on the replication network. The default is "not present", so what number should we start with? Thanks.

  • Let me add my thanks too. We have been experiencing failovers that we could not explain. I think this is the smoking gun we have been looking for.