It is no secret that HPC Server 2008 will offer the option to make the head node of an HPC cluster highly available. This feature is not in beta 1, but it is being developed. It will exploit the fail-over mechanisms provided by Windows Server 2008 (Enterprise edition or higher), so I thought I'd mention some highlights in this area too.
High-availability clusters are difficult to set up and troubleshoot on several platforms. With Windows Server 2003 we made progress in simplifying them, but significant limitations remain.
These limitations may hamper adoption, especially in environments such as HPC, where Windows has not been popular.
Windows Server 2008 introduces some significant improvements that address most of those issues:
1. Configuration validation
A test tool is built into the product. It analyzes nodes and shared storage (if any) before they join a cluster. It can also be used as a troubleshooting tool, as long as the storage you want to analyze is offline. The tool points out any issues with the hardware and the configuration that may make them unsuitable for a fail-over cluster. It will finally replace the cluster HCL: if the hardware passes validation, the configuration is officially supported.
2. Simplified resource setup
A wizard-driven process allows you to select which roles you want to cluster (e.g. file server, print server, virtual server), then sets up cluster resources and dependencies appropriately.
3. Improved SAN support
Windows Server 2008 issues persistent reservations on shared storage to establish ownership of LUNs; it no longer uses bus resets. Bus resets are disruptive on SANs where several systems on several platforms may share the same storage bus. This implies that the storage must support persistent reservations. Shared parallel SCSI is deprecated.
4. Changed quorum model
The administrator can choose the most appropriate quorum model for the configuration. Several models are possible.
5. Improved networking
The nodes no longer need to be on the same private subnet, and the timeout of the “ping” (heartbeat) among nodes is configurable. This makes it possible to route private traffic between locations and removes any a-priori restrictions on distance. Obviously, practical restrictions remain and will depend on how much latency the clustered applications and their users will tolerate.
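To make the configurable “ping” timeout from point 5 a bit more concrete, here is a minimal Python sketch of the general idea: a node is declared down only after a configurable number of consecutive heartbeats has been missed. The function names, parameters and default values are my own assumptions for illustration; they are not the actual Windows Server 2008 settings or implementation.

```python
def missed_heartbeats(last_seen_s: float, now_s: float, interval_s: float) -> int:
    """Count whole heartbeat intervals elapsed since the peer was last heard from."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    elapsed = max(0.0, now_s - last_seen_s)
    return int(elapsed // interval_s)


def is_node_down(last_seen_s: float, now_s: float,
                 interval_s: float = 1.0, threshold: int = 5) -> bool:
    """Declare the peer down once `threshold` heartbeats in a row were missed.

    interval_s and threshold are hypothetical tunables (illustrative defaults):
    a routed link between two sites would typically get a longer interval
    and/or a higher threshold than a local private subnet.
    """
    return missed_heartbeats(last_seen_s, now_s, interval_s) >= threshold


if __name__ == "__main__":
    # Same silence of 8 seconds, judged with two different configurations.
    print(is_node_down(0.0, 8.0))                    # local settings
    print(is_node_down(0.0, 8.0, interval_s=2.0))    # relaxed settings for a remote site
```

The point of the sketch is that the tolerance to network latency becomes a configuration choice: relaxing the interval or the threshold avoids spurious fail-overs over long links, at the price of a slower reaction to real failures.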
These are just a few of the innovations available in Windows Server 2008 clusters. You may want to try them out for yourself by building a simple cluster on a set of virtual machines. You don’t need shared storage any longer, but if you want to try a quorum with a witness disk, you can set up one of those machines as an iSCSI target.

Get started
If you want to try Windows Server 2008 clusters, a virtual lab is available on TechNet Events. There is also an excellent screencast by David Northey at http://edge.technet.com/Media/Windows-Server-2008-Failover-Clustering/
Official training will be launched shortly.
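If you wonder why a witness disk is worth the trouble in a small lab cluster, the arithmetic of majority voting explains it. The following Python sketch is a simplified model, assuming each node and the optional witness carry one vote each and that the cluster stays up only while a strict majority of votes is reachable. The function names are hypothetical and this is not how the cluster service is actually coded.

```python
def total_votes(node_count: int, witness: bool = False) -> int:
    """One vote per node, plus one for the witness if present (assumed model)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return node_count + (1 if witness else 0)


def has_quorum(votes_online: int, votes_total: int) -> bool:
    """True when the reachable votes are a strict majority of all votes."""
    if not 0 <= votes_online <= votes_total:
        raise ValueError("votes_online must be between 0 and votes_total")
    return 2 * votes_online > votes_total


if __name__ == "__main__":
    # Two nodes, no witness: losing one node leaves 1 vote out of 2.
    print(has_quorum(1, total_votes(2)))
    # Two nodes plus a witness: the surviving node and the witness hold 2 votes out of 3.
    print(has_quorum(2, total_votes(2, witness=True)))
```

Under these assumptions a two-node cluster without a witness cannot survive the loss of either node, while adding the witness lets the surviving node keep a majority.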