Tim McMichael

Navigating the world of high availability...and occasionally sticking my head in the cloud...

Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes.

Although Exchange 2010 no longer deploys a cluster resource model, we still use the Windows Failover Clustering service for certain functions.

When a Windows 2008 / 2008 R2 cluster is created, the cluster core resources are grouped together in the ‘Cluster Group’.  The Cluster Group is a hidden group that contains the following resources:

  • Cluster Name:  This is the cluster name object (CNO).  Exchange 2010 uses the name of the DAG to create this resource.  The name of the DAG is always the name of the cluster and the CNO.
  • Cluster IPv4 Addresses:  These are the IPv4 addresses that are associated with the DAG.  If the members of the DAG span multiple subnets, there will be multiple IPv4 resources.
  • File Share Witness:  This is the quorum resource that is created using the witness server and witness directory settings of the DAG.  This resource should only be present when there is an even number of DAG members.

You can see the cluster core resources in failover cluster manager by selecting the cluster name in the upper left hand pane.  In the center pane, expand the cluster core resources section.

image

The cluster core resource group can also be seen using cluster.exe (or, in Windows 2008 R2, the failover clustering PowerShell cmdlets).

Windows 2008 / Windows 2008 R2:  Cluster.exe DAG.company.com group

cluster.exe dag.company.com group
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-1           Online
Available Storage    DAG-1           Offline

Windows 2008 R2:  Get-ClusterGroup -Cluster DAG.company.com

PS C:\Users\Administrator> Get-ClusterGroup -Cluster DAG.company.com

Name                   OwnerNode        State
----                   ---------        -----
Cluster Group          dag-1            Online
Available Storage      dag-1            Offline

From an Exchange 2010 perspective, you do not really need to manage the cluster core resources.  As members join and depart the cluster, this resource group is automatically moved to a remaining member.  Each member of the DAG should be able to arbitrate and fully bring online the cluster core resources.

When a cluster is created in Windows 2008 or Windows 2008 R2, the cluster service enumerates all network ports found on the nodes.  These network ports are then combined into cluster networks.  You can view the cluster networks in failover cluster manager by expanding the cluster name and expanding networks.

image

You can also view the cluster networks using cluster.exe or PowerShell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network

cluster.exe dag.company.com network
Listing status for all available networks:

Network                                  Status
---------------------------------------- -----------
Cluster Network 2                        Up
Cluster Network 4                        Up
Cluster Network 1                        Up

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com

Get-ClusterNetwork -Cluster DAG.company.com

Name                                State
----                                -----
Cluster Network 1                   Up
Cluster Network 2                   Up
Cluster Network 4                   Up

A cluster network has three settings:

  • Do not allow cluster network communications on this network
  • Allow cluster network communications on this network
    • Allow clients to connect through this network

You can see these settings in failover cluster manager by getting the properties of a cluster network.

image

You can also view the network role using either cluster.exe or PowerShell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network "Cluster Network 1" /prop

cluster dag.company.com network "Cluster Network 1" /prop

Listing properties for 'Cluster Network 1':

T  Network              Name                           Value
-- -------------------- ------------------------------ -----------
SR Cluster Network 1    Name                           Cluster Network 1
MR Cluster Network 1    IPv6Addresses
MR Cluster Network 1    IPv6PrefixLengths
MR Cluster Network 1    IPv4Addresses                  10.0.0.0
MR Cluster Network 1    IPv4PrefixLengths              24
SR Cluster Network 1    Address                        10.0.0.0
SR Cluster Network 1    AddressMask                    255.255.255.0
S  Cluster Network 1    Description
D  Cluster Network 1    Role                           3 (0x3)
D  Cluster Network 1    Metric                         1200 (0x4b0)
D  Cluster Network 1    AutoMetric                     1 (0x1)

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com | fl name,role

Get-ClusterNetwork -Cluster DAG-1.company.com | fl name,role

Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1

The role of the networks can also be viewed in the registry of each node.  This information is located at:  HKEY_LOCAL_MACHINE\Cluster\Networks.  Each cluster network is represented by a subkey named with the GUID of the network.  Under each GUID subkey you will see values including Name and Role.

[HKEY_LOCAL_MACHINE\Cluster\Networks\2cd2b920-0a2a-4851-bb24-de02d4a70b7e]
@="class mscs::TmNetworkInfo"
"Id"="2cd2b920-0a2a-4851-bb24-de02d4a70b7e"
"Name"="Cluster Network 2"
"Signature"="NETW"
"Description"=""
"Role"=dword:00000001
"Priority"=dword:ffffffff
"Transport"="TCP/IP"
"Ignore"=dword:00000000
"Address"="192.168.0.0"
"AddressMask"="255.255.255.0"
"IPv6Address"=""
"State"=dword:00000003
"Metric"=dword:0000044c
"AutoMetric"=dword:00000001

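If you export this key to a .reg file, the Name and Role of each network can be pulled out with a short script.  What follows is a minimal Python sketch, not part of any Exchange or cluster tooling.  The function name parse_cluster_networks and the SAMPLE text (an abbreviated copy of the export shown above) are assumptions made for illustration.

```python
import re

# Matches value lines such as "Name"="Cluster Network 2" or "Role"=dword:00000001
VALUE_RE = re.compile(r'^"([^"]+)"=(.*)$')

# Abbreviated copy of the export shown above (illustrative input only)
SAMPLE = r'''
[HKEY_LOCAL_MACHINE\Cluster\Networks\2cd2b920-0a2a-4851-bb24-de02d4a70b7e]
@="class mscs::TmNetworkInfo"
"Name"="Cluster Network 2"
"Role"=dword:00000001
"Address"="192.168.0.0"
'''


def parse_cluster_networks(reg_text):
    """Map each network name to its integer Role from .reg export text."""
    sections = []
    for raw in reg_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            sections.append({})  # a new subkey, one per network GUID
        elif sections:
            match = VALUE_RE.match(line)
            if match:
                sections[-1][match.group(1)] = match.group(2)

    roles = {}
    for values in sections:
        name = values.get("Name", "").strip('"')
        role = values.get("Role", "")
        if name and role.startswith("dword:"):
            roles[name] = int(role[len("dword:"):], 16)  # dword values are hex
    return roles


print(parse_cluster_networks(SAMPLE))
```

Files exported by regedit are normally UTF-16 encoded, so when reading a real export you would likely need to open it with encoding="utf-16" before passing the text to this function.
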
The Role value can be one of three values, depending on the cluster network settings:

  • 0:  Do not allow cluster network communications on this network
  • 1:  Allow cluster network communications on this network
  • 3:  Allow cluster network communications on this network and allow clients to connect through this network

In order for an IPv4 resource to be brought online, it must be associated with a network that is configured to “Allow cluster network communications on this network” and to “Allow clients to connect through this network”.  If for any reason the “Allow clients to connect through this network” option is not enabled, the IPv4 resource associated with that network cannot be brought online.
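
The rule above can be expressed as a small check on the Role value.  This is a minimal Python sketch that reads the values 0, 1 and 3 as two bit flags.  That reading, and the names CLUSTER_COMMUNICATION, CLIENT_ACCESS, describe_role and can_host_ip_resource, are assumptions made for illustration rather than anything the cluster tools expose under those names.

```python
CLUSTER_COMMUNICATION = 0x1  # "Allow cluster network communications on this network"
CLIENT_ACCESS = 0x2          # "Allow clients to connect through this network"


def describe_role(role):
    """Return the settings that are enabled for a cluster network Role value."""
    settings = []
    if role & CLUSTER_COMMUNICATION:
        settings.append("cluster communications")
    if role & CLIENT_ACCESS:
        settings.append("client access")
    return settings


def can_host_ip_resource(role):
    """True only when both settings are enabled, which is a Role value of 3."""
    required = CLUSTER_COMMUNICATION | CLIENT_ACCESS
    return (role & required) == required


for role in (0, 1, 3):
    print(role, describe_role(role), can_host_ip_resource(role))
```

Under this reading, only a network whose Role is 3 passes the check.  A network with Role 1 carries cluster traffic but cannot host an IPv4 resource.
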

On an Exchange 2010 DAG member, when attempting to move the cluster core resources to another DAG member, the resources may fail to come online.  Specifically, the IPv4 resource fails to come online, which causes the network name resource to fail as well because it depends on the IPv4 resource.

If you use Failover Cluster Manager and attempt to bring online the IPv4 resource in the cluster core resources group, the following pop-up error is displayed:

image

A review of the system log shows event 1223:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          5/10/2010 1:14:42 PM
Event ID:      1223
Task Category: IP Address Resource
Level:         Error
Keywords:
User:          SYSTEM
Computer:      dagNode.company.com
Description:
Cluster IP address resource 'IPv4 Static Address 2 (Cluster Group)' cannot be brought online because the cluster network 'Cluster Network 2' is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.

Event 1223 indicates that the effective setting for Cluster Network 2 is “Allow cluster network communications on this network” without “Allow clients to connect through this network”.  However, when you review the settings for Cluster Network 2 in Failover Cluster Manager, you might see that both “Allow cluster network communications on this network” and “Allow clients to connect through this network” are enabled.

The Microsoft Exchange Replication service helps maintain the cluster network configuration.  There is an issue in the current Replication service where settings are not changed as expected.  This causes a difference between the setting in effect inside the cluster and the setting displayed in the Failover Cluster Management tools.

Workaround:

A quick and easy workaround for this issue is to simply reset the state of the network.  There are multiple ways to accomplish this and I will outline each below.  Step zero, before proceeding with any other steps, is to note the cluster network named in the above event, since that is the network that will need to be reset (in this example, Cluster Network 2).
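
If you have the event description as text, the network name for step zero can be extracted with a regular expression.  This is a minimal Python sketch that assumes the description follows the wording of the event shown above.  The names EVENT_1223 and network_from_event are hypothetical.

```python
import re

# Pattern based on the wording of the event 1223 description shown above
EVENT_1223 = re.compile(
    r"Cluster IP address resource '(?P<resource>[^']+)' cannot be brought online "
    r"because the cluster network '(?P<network>[^']+)' is not configured to allow "
    r"client access"
)


def network_from_event(description):
    """Return the cluster network named in an event 1223 description, or None."""
    match = EVENT_1223.search(description)
    return match.group("network") if match else None


description = (
    "Cluster IP address resource 'IPv4 Static Address 2 (Cluster Group)' cannot be "
    "brought online because the cluster network 'Cluster Network 2' is not "
    "configured to allow client access."
)
print(network_from_event(description))
```
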

Windows 2008 / Windows 2008 R2 – Using Failover Cluster Management Tool

The network state can be reset using Failover Cluster Manager.

  • Launch Failover Cluster Management
  • Expand the cluster \ networks.

image

  • Get the properties of the cluster network in question.
  • Uncheck the box to “Allow clients to connect through this network”.

image

  • Press <apply> - you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.
  • The network is disabled for “Allow clients to connect through this network”. 

Next we need to enable the network for “Allow clients to connect through this network”.

  • Get the properties of the cluster network.
  • Check the box to “Allow clients to connect through this network”.

image

  • Press <apply> – you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.

The network has been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

Windows 2008 / Windows 2008 R2:  Using cluster.exe

  • Launch a command prompt with administrative privileges.
  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=1

  • The network is disabled for “Allow clients to connect through this network”. 

Next, we need to enable the network for “Allow clients to connect through this network”.

  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=3

  • The network is enabled for “Allow clients to connect through this network”.

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.
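
The two cluster.exe steps above can be generated for any cluster and network name.  This is a minimal Python sketch that only builds and prints the command lines shown above and does not run them.  The function name reset_commands is hypothetical.

```python
def reset_commands(cluster, network):
    """Return the two cluster.exe command lines in the order they must run."""
    template = 'cluster.exe {cluster} network "{network}" /prop role={role}'
    # Role 1 clears client access on the network; role 3 turns it back on.
    return [
        template.format(cluster=cluster, network=network, role=role)
        for role in (1, 3)
    ]


for command in reset_commands("dag.company.com", "Cluster Network 2"):
    print(command)
```

Note that the network name is wrapped in plain double quotes.  Curly quotes pasted from a web page will not be treated as quoting characters by the command prompt.
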

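Windows 2008 R2:  Using PowerShell

  • Launch PowerShell with administrative privileges.  If the failover clustering cmdlets are not already loaded, run Import-Module FailoverClusters first.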
  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=1}

  • The network is disabled for “Allow clients to connect through this network”. 

Next, enable the network for “Allow clients to connect through this network”.

  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=3}

  • The network is enabled for “Allow clients to connect through this network”.

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

LONG TERM FIX

This issue will be fixed in Exchange 2010 Service Pack 1.  The issue will not be fixed in Exchange 2010 RTM.

==========================================

Updated – 6/2/2010

Updated to note that Exchange 2010 SP1 is confirmed to contain the fix.

==========================================

Comments
  • Can anyone confirm this has been 100% fixed in E2010 SP1?

  • This is not fixed in SP1. I have tested this myself.

  • @Gaz:

    Can you provide some more information for me.  This is not something that you can reproduce.  If you manually change the cluster settings you can force this issue to occur but it is not covered under the fix described in this blog post.  Are you saying that you have experienced this issue on an SP1 DAG?

    TIMMCMIC

  • Tim,

    This is still a problem with Service pack 1.  Today I had to go to the failover cluster manager and remove the check, click apply, then add the check and click apply and finally I can bring the DAG online to both ping it and use Backup Exec to select the DAG.  

    McCue

  • @McCue

    Thanks for posting.  Can you confirm whether or not the issue was present prior to upgrading to SP1?

    TIMMCMIC

  • @Tim

    Unfortunately no, I had already installed EX2010-SP1 when I installed BE2010 which is to be my last step before moving real mailboxes to the EX2010 server.

    McCue

  • This is an issue before SP1 as well.  I have Exchange 2010 and BE2010 R2, and cluster was offline after a reboot.

  • To set an IP address for the DAG, use the following exchange shell command:

    Set-DatabaseAvailabilityGroup -identity DAGGroupName -databaseavailabilitygroupipaddress 192.168.x.x

    Confirm with Get-DatabaseAvailabilityGroup -Identity DagGroupName | fl

  • @Joe:

    This is correct but you should not have to reset the IP address to correct the issue outlined in this blog.

    TIMMCMIC

  • Is this related to BE2010? I had this same issue with Exchange 2010 (not SP1) but not until after BE2010 was installed.

    Thanks for the fix !

  • @Phil:

    I am not aware of what BE2010 is.

    TIMMCMIC

  • @Tim

    I'm pretty sure BE2010 is Backup Exec 2010

  • With Exchange 2010 SP1 UR0 this is not fixed.

  • If the issue existed prior to upgrading then you will have to follow the workaround.  SP1 will prevent the issue from recurring.

    TIMMCMIC

  • I have run the fix a number of times after installing SP1 for Exchange 2010 and one of my two DAG addresses are reporting as offline. Is there any other fix available?
