<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.technet.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Rob's SQL Server Blog : Clustering</title><link>http://blogs.technet.com/rob/archive/tags/Clustering/default.aspx</link><description>Tags: Clustering</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>SQL Server 2005 Multi-Site Clustering with Windows Server 2008</title><link>http://blogs.technet.com/rob/archive/2009/03/15/sql-server-2005-multi-site-clustering-with-windows-server-2008.aspx</link><pubDate>Mon, 16 Mar 2009 01:12:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3213295</guid><dc:creator>robcarrol</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.technet.com/rob/comments/3213295.aspx</comments><wfw:commentRss>http://blogs.technet.com/rob/commentrss.aspx?PostID=3213295</wfw:commentRss><description>&lt;P&gt;I was working recently with a customer who was looking to deploy a SQL Server 2005 cluster across 2 geographically dispersed sites using Windows Server 2008. They were looking to utilise the new clustering improvements in Windows Server 2008 to build a highly available SQL Server solution. The customer required automatic failover between the sites in the event of a disaster, but their current solution required manual intervention by an administrator in order to fail over to the disaster recovery site. Automatic failover would increase application availability and reduce the complexity of the solution. Each site has its own SAN storage and the customer planned to replicate data between the sites using SRDF replication.&lt;/P&gt;
&lt;P&gt;&lt;BR&gt;This led me to do further research into clustering SQL Server in this type of environment. Windows Server 2008 introduces greater flexibility in the choice of Quorum configuration. The concept of quorum moves away from the requirement for a shared storage resource and now refers to the number of votes needed to establish a majority. All nodes and a witness resource can get a vote, which removes the disk as the single point of failure that it was in previous clustering models. The 4 Quorum Models available are:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Node and Disk Majority &lt;/LI&gt;
&lt;LI&gt;Disk Only &lt;/LI&gt;
&lt;LI&gt;Node Majority &lt;/LI&gt;
&lt;LI&gt;Node and File Share Majority &lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;As there is no shared storage between the nodes in a multi-site cluster, 2 of these Quorum models are suitable for multi-site clustering: Node Majority and Node and File Share Majority. Node and Disk Majority and Disk Only should only be used in a multi-site cluster if specifically directed by your storage vendor as your disk replication software needs to support these configurations.&lt;/P&gt;
&lt;P&gt;&lt;BR&gt;&lt;STRONG&gt;Node and File Share Majority:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This allows the creation of clusters of up to 16 nodes with no shared disks. A file share acts as a witness, meaning that a 2-node cluster will have 3 votes, so connectivity can be lost by either one of the nodes or by the witness and the cluster can still continue to function. &lt;/P&gt;
&lt;P&gt;A cluster quorum configured to use a node-and-file-share majority is a great solution for multi-site clusters. The file share witness can reside at a third site independent of either site hosting a cluster node for high disaster resilience. A single file server can serve as a witness to multiple clusters (with each cluster using a separate file share witness on the file server).&lt;/P&gt;
&lt;P&gt;&lt;A href="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_4.png" mce_href="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_4.png"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH: 0px; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" border=0 alt=image src="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_thumb_1.png" width=240 height=170 mce_src="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_thumb_1.png"&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;This configuration gives the highest resilience as the cluster can automatically recover from the loss of any one site without manual intervention.&lt;/P&gt;
&lt;P&gt;The File Share Witness (FSW) needs to be in the same forest as the nodes and be running Windows Server 2003 or Windows Server 2008. For maximum resilience, it is best to locate the FSW at a 3rd site separate from the cluster nodes. The FSW does not need to be attached to shared storage and should NOT be a node in the same cluster.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;BR&gt;Node Majority:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A node-majority cluster consists of 3 or more nodes without shared storage. Each of the nodes has a vote and there is no shared disk vote. A majority of votes is necessary to operate the cluster: if 2 nodes fail in a 3-node cluster, the remaining node drops out of the cluster. An administrator can manually override this and force the remaining node to start. When the other nodes come back, majority quorum is achieved again and the cluster comes back online seamlessly.&lt;/P&gt;
&lt;P&gt;This configuration works best with an odd number of cluster nodes, because having exactly half of the cluster nodes functioning is not enough in this model. If four nodes were set up in a node-majority configuration, the cluster would continue to operate with the loss of one node but not with the loss of two. In other words, 4 nodes can only survive 1 failure, which is the same as 3 nodes.&lt;/P&gt;
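&lt;P&gt;The vote arithmetic behind this rule can be sketched in a few lines of Python. This is only an illustration of the majority rule described above, not a cluster API; the function names votes_needed and tolerated_failures are made up for the example.&lt;/P&gt;
&lt;PRE&gt;
```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return total_votes - votes_needed(total_votes)


# Compare cluster sizes under the Node Majority model.
for nodes in (3, 4, 5):
    print(nodes, "nodes: majority is", votes_needed(nodes),
          "votes, tolerates", tolerated_failures(nodes), "failure(s)")
```
&lt;/PRE&gt;
&lt;P&gt;Under this sketch, 3 and 4 voters both tolerate a single failure, while 5 voters tolerate two, which is why adding a fourth node buys no extra resilience.&lt;/P&gt;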
&lt;P&gt;&lt;A href="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_2.png" mce_href="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_2.png"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH: 0px; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" border=0 alt=image src="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_thumb.png" width=240 height=133 mce_src="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_thumb.png"&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;The node-majority quorum configuration can work when there is more than one cluster node at each site. Consider a multi-site cluster consisting of five nodes, three of which reside at Site A and the remaining two at Site B. With a break in connectivity between the two sites, Site A can still communicate with three nodes (which is greater than 50 percent of the total), so all of the nodes at Site A stay up. The nodes in Site B are able to communicate with each other, but with no one else. Since the two nodes at Site B cannot communicate with the majority, they drop out of cluster membership. (Were Site A to go down in this case, bringing up the cluster at Site B would require manual intervention to override the non-majority.)&lt;/P&gt;
&lt;P&gt;&lt;A href="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_6.png" mce_href="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_6.png"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH: 0px; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" border=0 alt=image src="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_thumb_2.png" width=240 height=143 mce_src="http://blogs.technet.com/blogfiles/rob/WindowsLiveWriter/SQLServer2005GeoClusteringwithWindowsSer_14C48/image_thumb_2.png"&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;As a result, the Node Majority configuration does not give automatic failover between sites, as the two nodes at Site B (nodes 4 and 5) cannot achieve quorum on their own. In this situation, you would need to manually force a failover. &lt;/P&gt;
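&lt;P&gt;The two multi-site layouts discussed in this post can be compared with a small Python sketch. It assumes one vote per node and one vote for the file share witness, and the site labels and the function name survives_site_loss are hypothetical, chosen only for this illustration.&lt;/P&gt;
&lt;PRE&gt;
```python
def survives_site_loss(votes_by_site: dict, lost_site: str) -> bool:
    """True if the remaining sites still hold a strict majority of votes."""
    total = sum(votes_by_site.values())
    remaining = total - votes_by_site[lost_site]
    return remaining * 2 > total


# Node Majority: five nodes, three at Site A and two at Site B.
node_majority = {"Site A": 3, "Site B": 2}

# Node and File Share Majority: one node per site, witness at a third site.
node_and_file_share = {"Site A": 1, "Site B": 1, "Site C (witness)": 1}

for name, layout in (("Node Majority", node_majority),
                     ("Node and File Share Majority", node_and_file_share)):
    for site in layout:
        print(name, "- lose", site, "- keeps quorum:",
              survives_site_loss(layout, site))
```
&lt;/PRE&gt;
&lt;P&gt;In this model, the five-node Node Majority layout keeps quorum only if Site B is the one that is lost, while the layout with a witness at a third site keeps quorum after the loss of any single site.&lt;/P&gt;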
&lt;P&gt;&lt;STRONG&gt;SQL Server Networking Considerations:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Windows Server 2008 now allows nodes in the same cluster to reside in different network subnets and communicate across network routers. &lt;STRONG&gt;However, be aware that SQL Server 2005 and 2008 still require all cluster nodes to reside in the same network subnet, so you will still need to set up virtual local area networks (VLANs) to connect geographically separated cluster nodes. &lt;/STRONG&gt;This can have some benefits with regard to client response times though, as DNS replication may impact client re-connection times in the event of a failover from one site to another. VLANs allow DNS names to stay the same, which can increase availability.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Storage Considerations:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;As there is no shared storage between the nodes in a multi-site cluster, the main consideration is how to keep the data replicated between the sites. The choice of 3rd-party replication solution is important and can have a major effect on how you deploy your cluster. As such, you should work closely with your storage vendor from an early stage in the design process.&lt;/P&gt;
&lt;P&gt;Synchronous replication results in no data loss, but requires shorter distances between nodes and higher bandwidth to prevent write latency from impacting performance. Asynchronous replication allows you to stretch cluster nodes across longer distances; however, there is a potential for data loss in the event of a failure. Asynchronous data replication also assumes enough network bandwidth to keep up with data changes, and it does not significantly impact application performance.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;As data replication is key in a multi-site SQL Server cluster, work with your storage vendor from an early stage to ensure they support your cluster configuration. Multi-site clustering allows you to achieve high availability and disaster recovery; however, it can be a costly and complex solution. You should evaluate your business requirements first and then decide on the best technology to meet them. Database Mirroring, for example, might give you the required level of resilience across geographical sites.&lt;/P&gt;
&lt;P&gt;In this case, the customer chose to implement a 2-node, 3-site solution using the Node and File Share Majority quorum model, with a File Share Witness located in the 3rd site. This gives site-level resilience in the event of a disaster and also allows automatic failover between the cluster nodes without having to re-write client applications, meeting the business requirements.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Additional Resources:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.microsoft.com/downloads/details.aspx?familyid=75566F16-627D-4DD3-97CB-83909D3C722B&amp;amp;displaylang=en" target=_blank mce_href="http://www.microsoft.com/downloads/details.aspx?familyid=75566F16-627D-4DD3-97CB-83909D3C722B&amp;amp;displaylang=en"&gt;Windows Server 2008 Multi-Site Clustering Whitepaper&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&amp;amp;EventID=1032364834&amp;amp;CountryCode=US" target=_blank mce_href="http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&amp;amp;EventID=1032364834&amp;amp;CountryCode=US"&gt;TechNet Webcast: Geographically Dispersed Failover Clustering in Windows Server 2008&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&amp;amp;EventID=1032364842&amp;amp;CountryCode=US" target=_blank mce_href="http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&amp;amp;EventID=1032364842&amp;amp;CountryCode=US"&gt;TechNet Webcast: Failover Clustering and Quorum in Windows Server 2008 Enterprise&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.microsoft.com/default.aspx?scid=kb;en-us;953170&amp;amp;sd=rss&amp;amp;spid=2855" target=_blank mce_href="http://support.microsoft.com/default.aspx?scid=kb;en-us;953170&amp;amp;sd=rss&amp;amp;spid=2855"&gt;Support Webcast: Microsoft SQL Server 2005 Failover Clustering on Windows Server 2008&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://msdn.microsoft.com/en-us/library/ms179530(SQL.90).aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/ms179530(SQL.90).aspx"&gt;How to: Create a New SQL Server 2005 Failover Cluster (Setup)&lt;/A&gt;&lt;/P&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3213295" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/rob/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://blogs.technet.com/rob/archive/tags/Windows/default.aspx">Windows</category><category domain="http://blogs.technet.com/rob/archive/tags/High+Availability/default.aspx">High Availability</category><category domain="http://blogs.technet.com/rob/archive/tags/Clustering/default.aspx">Clustering</category><category domain="http://blogs.technet.com/rob/archive/tags/2005/default.aspx">2005</category><category domain="http://blogs.technet.com/rob/archive/tags/2008/default.aspx">2008</category></item><item><title>Support Webcast: Microsoft SQL Server 2005 Failover Clustering on Windows Server 2008</title><link>http://blogs.technet.com/rob/archive/2008/05/19/support-webcast-microsoft-sql-server-2005-failover-clustering-on-windows-server-2008.aspx</link><pubDate>Mon, 19 May 2008 22:53:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3057489</guid><dc:creator>robcarrol</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.technet.com/rob/comments/3057489.aspx</comments><wfw:commentRss>http://blogs.technet.com/rob/commentrss.aspx?PostID=3057489</wfw:commentRss><description>&lt;P&gt;A new Level 300 support webcast has been scheduled for Monday the 9th of June, 10:00 AM Pacific Time. The summary of the session is as follows:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;This Support WebCast focuses on how to plan, implement, and administer a Microsoft SQL Server 2005 failover cluster on Windows Server 2008. This session provides step-by-step instructions about how to install SQL Server 2005 clustered instance on a Windows Server 2008 cluster. It also discusses the options you can use to move SQL Server 2005 failover cluster from Windows Server 2003 to Windows Server 2008.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.microsoft.com/default.aspx?scid=kb;en-us;953170&amp;amp;sd=rss&amp;amp;spid=2855"&gt;http://support.microsoft.com/default.aspx?scid=kb;en-us;953170&amp;amp;sd=rss&amp;amp;spid=2855&lt;/A&gt;&lt;BR&gt;&lt;/P&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3057489" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/rob/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://blogs.technet.com/rob/archive/tags/Windows/default.aspx">Windows</category><category domain="http://blogs.technet.com/rob/archive/tags/Clustering/default.aspx">Clustering</category><category domain="http://blogs.technet.com/rob/archive/tags/2005/default.aspx">2005</category><category domain="http://blogs.technet.com/rob/archive/tags/2008/default.aspx">2008</category></item><item><title>Windows Failover Clustering Overview</title><link>http://blogs.technet.com/rob/archive/2008/05/07/failover-clustering.aspx</link><pubDate>Wed, 07 May 2008 19:07:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3051657</guid><dc:creator>robcarrol</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.technet.com/rob/comments/3051657.aspx</comments><wfw:commentRss>http://blogs.technet.com/rob/commentrss.aspx?PostID=3051657</wfw:commentRss><description>&lt;P&gt;The host node in the failover cluster performs a "looks alive" check every 5 seconds. An IsAlive check is performed every 60 seconds using SELECT @@SERVERNAME. If this fails the IsAlive retries 5 times and then attempts to reconnect to the instance of SQL. If all fail, then the SQL Server resource fails. Depending on the failover threshold, configuration of SQL resource, Windows Clustering will either attempt to restart on same node or failover to another available node.&lt;/P&gt;
&lt;P&gt;During failover, Windows Clustering starts the SQL Server service for that instance on the new node, and goes through the recovery process to start the databases. After the service is started and the master database is online, the SQL Server resource is considered to be up. User databases will then go through the normal recovery process: any completed transactions in the transaction log are rolled forward, and any incomplete transactions are rolled back. The length of the recovery process is dependent on how much activity must be rolled forward or rolled back upon startup.&lt;/P&gt;
&lt;P&gt;Set the recovery interval of the server to a low number to avoid long recovery times and to speed up the failover process. SQL Server generates automatic checkpoints based on the "recovery interval" setting. Long-running transactions can lead to much longer restart times than specified in the recovery interval option.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Failover/Failback Strategies&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The cluster group containing SQL Server can be configured for automatic failback to the primary node when it becomes available again. By default, this is set to off.&lt;/P&gt;
&lt;P mce_keep="true"&gt;To Configure:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Right-click the group containing SQL Server in the cluster administrator, select 'properties' then 'failback' tab.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;To prevent an auto-failback, select 'Prevent Failback', to allow select 'Allow Failback' then one of the following options:&lt;/DIV&gt;&lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Immediately: Not recommended as it can disrupt clients&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Failback between n and n1 hours: allows a controlled failback to a preferred node (if it's online) during a certain period.&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Configure Node Failover Preferences&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;When you use more than 2 nodes, it's important to consider which nodes should own resources in the event of a failover. For example, in an n+1 configuration, each SQL Server group should have the idle node second in the list of preferred owners. &lt;STRONG&gt;N.B. Do not use cluster admin to remove nodes from the resource definition. USe SQL Server setup for that functionality.&lt;BR&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;To Configure:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Right-click SQL Server group in the cluster administrator and select properties&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;On the&amp;nbsp;'General' tab, the preferred owners list box contains all cluster nodes that can potentially own resources in that group, and the current order in which they will failover&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Click 'Modify' to change this order&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Configure Thresholds for a Resource&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Right-click the cluster resource and then select 'Propereties'&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Click 'Advanced'&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Select 'Do not restart' if the cluster service should not attempt to restart. Restart is the default&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;If 'Restart' is selected:&lt;/DIV&gt;&lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Affect the Group: uncheck to prevent a failure of the selected resource from causing the SQL Server group to failover&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Threshold: number of times the cluster service will attempt to restart the resource, and period is the amount of time in seconds between retries&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Do not modify the 'LooksAlive' and 'IsAlive' settings&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Unless necessary, do not alter the 'Pending Timeout'. This is the amount of time the resource is either in the online or pending or offline pending states before the the cluster service puts it in either offline or failed state&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Configure Thresholds for a Group&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Right-click the group containing the&amp;nbsp;SQL Server virtual server then click properties&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Click the failover tab&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;to configure the failover policy, in the threshold box enter the number of times the group is configured to failover within a set span of hours. In the period box, entrer the set span of hours&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Once the resource group reaches the set number of failovers, it will stay offline. However, other cluster resources, such as cluster IP, could be left online&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&lt;BR&gt;&lt;STRONG&gt;&lt;FONT size=3&gt;Cluster Resource Dependencies&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;TABLE class="" border=1&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class=""&gt;Resource&lt;/TD&gt;
&lt;TD class=""&gt;Dependency&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;SQL IP Address (Virtual Server Name)&lt;/TD&gt;
&lt;TD class=""&gt;NONE&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;SQL Network Name (Virtual Server Name)&lt;/TD&gt;
&lt;TD class=""&gt;SQL IP Address&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;SQL Server&lt;/TD&gt;
&lt;TD class=""&gt;Disk Resource(s),&lt;BR&gt;SQL Network Name&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;SQL Server Agent&lt;/TD&gt;
&lt;TD class=""&gt;SQL Server&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;SQL Server Full Text&lt;/TD&gt;
&lt;TD class=""&gt;Disk Resource(s)&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;Analysis Services (2005 only)&lt;/TD&gt;
&lt;TD class=""&gt;Disk Resource(s),&lt;BR&gt;Network Name&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size=3&gt;&lt;STRONG&gt;Cluster Heartbeat&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Cluster nodes use the "heartbeat" signal to check whether each node is alive at both the OS level and SQL Server level. The node hosting the SQL Server resources uses the Service Control Manager to check every 5 seconds whether the SQL Server service appears to be running. This "LooksAlive" check does not impact performance but does not perform a thorough check; the check will succeed if the service appears to be running even though it might not be operational. As a result, a deeper check must be performed; this "IsAlive" check runs every 60 seconds.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;IsAlive:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Runs every 60 seconds&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Runs an @@SERVERNAME T-SQL query against SQL Server to determine whether the server can respond to requests&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Does not gaurantee that all user databases are available or are performing within necessary performance/response-time requirements&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;If IsAlive Check fails:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Retried 5 times and then it attempts to reconnect to the instance of SQL Server&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;If all 5 retries fail, the server resource fails&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Depending on the failover threshold config, the failover cluster will either restart the resource on the same node or it will failover to another available node&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;The IsAlive query tolerates a few errors, but ultimately it fails if it's threshold is exceeded&lt;/P&gt;
&lt;P mce_keep="true"&gt;During failover of the SQL Server instance, SQL Server resources start up on the new node and SQL Server goes through the recovery process to restart the databases. After the service is started and the master database is alive, the SQL Server resource is considered to be up. User databases will go through the normal recovery process. Completed transactions in the transaction log are rolled forward (the Redo phase), incomplete transactions are rolled back (the Undo phase).&lt;/P&gt;
&lt;P mce_keep="true"&gt;In SQL Server 2005 Enterprise Edition, each user database is available to the user once the Redo phase is complete. For all other editions (and all 2000 editions), each user database is unavailable until the Undo phase completes. Length of recovery process depends on how much activity needs to be rolled forward or back upon startup. &lt;/P&gt;
&lt;P mce_keep="true"&gt;The 'recovery interval' sp_configure option of the server can be set to a low number to avoid longer Redo recovery times and to speed up the failover process. Undo recovery time can be reduced by using shorter transactions so that uncommitted transactions do not have much to roll back.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;BR&gt;&lt;FONT size=3&gt;&lt;STRONG&gt;Recommended Heartbeat Configurations&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Two or more independent networks must connect the nodes of the cluster to avoid a single point of failure&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Use of 2 LAN's is typical (MS PSS does NOT support the config of a cluster with nodes connected by only one network)&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;At least two of the cluster networks must be configured to support heartbeat communications between the cluster nodes to avoid a single point of failure&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;To do so, configure the roles of these networks as either "Internal Cluster Communications Only" or "All Communications" for the cluster service&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Typically, one of these networks is a PRIVATE INTERCONNECT dedicated to internal cluster communication.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Each cluster network must fail independently of all other cluster networks. &lt;/DIV&gt;&lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;The cluster networks must not have a component in common that&amp;nbsp;can cause&amp;nbsp;both to fail simultaneously. &lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;The use of a multiport network adapter, for example to attach a node to two cluster networks would not satisfy this requirement in most cases as the ports are not independent&amp;nbsp;&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Remove all unnecesary network traffic from the network adapter that is set to INTERNAL CLUSTER COMMUNICATIONS ONLY (also known as the "heartbeat" or "private" network adapter, to eliminate possible communication issues&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Clustering communicates using Remote Procedure Calls (RPC) on IP sockets with User Datagram Protocol (UDP) packets&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&lt;A class="" href="http://support.microsoft.com/kb/258750" mce_href="http://support.microsoft.com/kb/258750"&gt;Recommended Configuration for Private Adaptor in Windows 2000 and Windows 2003&lt;/A&gt;&lt;/P&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3051657" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/rob/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://blogs.technet.com/rob/archive/tags/Windows/default.aspx">Windows</category><category domain="http://blogs.technet.com/rob/archive/tags/High+Availability/default.aspx">High Availability</category><category domain="http://blogs.technet.com/rob/archive/tags/Clustering/default.aspx">Clustering</category></item></channel></rss>