In a previous blog post I outlined a failure behavior of the file share witness resource on Windows 2008 clusters.
Ultimately, the cluster could enter a lost quorum state because the File Share Witness resource was in a FAILED status, even though the witness directory was fully available.
In Windows 2008 R2 a design change was made. When the File Share Witness host server becomes unavailable, the File Share Witness resource still fails in the cluster and causes the Cluster Core Resources to move between nodes. If the host server remains unavailable, the resource stays in a failed state. The difference is what happens when the File Share Witness is needed to maintain quorum while the witness resource is in a failed state: the cluster attempts to bring the witness resource online. If the online attempt succeeds, the witness share is alive and accessible, and quorum is maintained. If the online attempt fails, the witness share is not accessible, and a lost quorum condition is encountered.
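The decision described above can be sketched as a small model. This is only an illustration of the logic, not cluster code: the names `WitnessResource`, `witness_vote_available`, `quorum_maintained`, and the `retry_on_demand` flag are hypothetical, and the vote count assumes a simple majority where each node and the witness hold one vote.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WitnessResource:
    # Hypothetical state labels for this sketch: "ONLINE" or "FAILED".
    state: str


def witness_vote_available(
    resource: WitnessResource,
    try_online: Callable[[], bool],
    retry_on_demand: bool,
) -> bool:
    """Return True if the witness can contribute its vote.

    retry_on_demand=False models the behavior before the change:
    a FAILED resource simply means no witness vote.
    retry_on_demand=True models the changed behavior: the cluster
    attempts to online the resource at the moment the vote is needed.
    """
    if resource.state == "ONLINE":
        return True
    if not retry_on_demand:
        return False
    if try_online():  # succeeds only if the share is actually reachable
        resource.state = "ONLINE"
        return True
    return False


def quorum_maintained(nodes_up: int, total_nodes: int, witness_ok: bool) -> bool:
    """Simplified majority check: every node plus the witness has one vote."""
    votes = nodes_up + (1 if witness_ok else 0)
    total_votes = total_nodes + 1
    return votes > total_votes // 2
```

For a two-node cluster with one node down and a reachable share, the model keeps quorum only when the failed witness resource is retried on demand; if the share is truly unreachable, quorum is lost in both cases.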
A hotfix has been released (KB978790) that backports this change from Windows 2008 R2 to Windows 2008.
You can download this hotfix by navigating to http://support.microsoft.com and requesting knowledge base article 978790. There is a link to request and download the fix in the upper left-hand corner. (Note: When downloading the fix, the package is marked for the product Windows Vista, but it is actually for Windows 2008.)
This hotfix is applicable to Exchange 2007 and Exchange 2010 when utilized with Windows 2008.
Very nice change. Instead of a hotfix, this should be an option in R2 (enabled by default).
This is the default behavior in R2 without a hotfix.
What if this still happens? We have Windows Server 2008 R2 SP1. The FSW failed, and the server holding the quorum failed later. No other server was able to form a cluster, because it complained that no witness resource was online to maintain the cluster.
Is there a hotfix for it yet? Or do we need to make sure we have two CAS/Hub servers instead of just one?
Appreciate your help.
In terms of this particular issue, if you have the fix installed you should not encounter it.
Based on your description I believe you've got potentially a different issue.
There can only be one witness at a time. The alternate witness has no effect on a running cluster.
I still see this warning and error on a Windows 2008 R2 server. Any fix for this?
This fix does not correct an error. It corrects the invalid detection of a lost quorum condition.
Does this mean that if the Hub Transport server is rebooted during the patching cycle, the Exchange Server CCR cluster can only last for less than one hour before split-brain syndrome occurs?
@Server Support specialist...
No - this has nothing to do with split brain. Prior to the hotfix you could inadvertently enter a lost quorum condition. With this hotfix, as long as the witness server is actually online, regardless of the state of the witness resource in the cluster, a lost quorum condition will be avoided.