VIRTUALBOY BLOG
As hardware increases in scale, and new capabilities such as Dynamic Memory are introduced in Hyper-V R2 SP1, more and more customers are going to start to encroach on the supported limits of Hyper-V cluster nodes. As of May 2010, those supported limits stood at 64 VMs per cluster node, across up to 15+1 nodes, giving a total of 960 VMs. This contrasts considerably with the 384 VMs supported on a non-clustered host, yet it would still be more than enough headroom for most customers. Even so, at TechEd 2010 we announced an increase to the limits on cluster nodes. The increase is pretty considerable too, helping customers to scale to much greater levels, especially on smaller clusters, assuming the underlying hardware has the resources!
So, in a nutshell, we now support 1000 VMs per cluster, provided you don’t exceed the 384 VMs per node limit, which will still be enforced. In tabular form:
Number of Nodes in Cluster            Max Number of VMs per Node   Max # VMs in Cluster
2 Nodes (1 active + 1 failover)       384                          384
3 Nodes (2 active + 1 failover)       384                          768
4 Nodes (3 active + 1 failover)       333                          1000
5 Nodes (4 active + 1 failover)       250                          1000
6 Nodes (5 active + 1 failover)       200                          1000
7 Nodes (6 active + 1 failover)       166                          1000
8 Nodes (7 active + 1 failover)       142                          1000
9 Nodes (8 active + 1 failover)       125                          1000
10 Nodes (9 active + 1 failover)      111                          1000
11 Nodes (10 active + 1 failover)     100                          1000
12 Nodes (11 active + 1 failover)     90                           1000
13 Nodes (12 active + 1 failover)     83                           1000
14 Nodes (13 active + 1 failover)     76                           1000
15 Nodes (14 active + 1 failover)     71                           1000
16 Nodes (15 active + 1 failover)     66                           1000
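The per-node figures in the table above appear to follow a simple rule: divide the 1000 VM cluster limit by the number of active nodes, round down, and cap the result at 384. The Python sketch below reproduces that arithmetic. It is an illustration only, not an official sizing tool; the function name `cluster_limits` and the constant names are my own, and it assumes one node reserved for failover unless you say otherwise.

```python
# Supported limits quoted in this post (Hyper-V R2 failover clusters).
MAX_VMS_PER_CLUSTER = 1000
MAX_VMS_PER_NODE = 384
MAX_NODES_PER_CLUSTER = 16


def cluster_limits(total_nodes, failover_nodes=1):
    """Return (max VMs per active node, max VMs in cluster).

    Hypothetical helper: assumes VMs are spread evenly across the
    active nodes and that failover_nodes are kept idle.
    """
    if not 2 <= total_nodes <= MAX_NODES_PER_CLUSTER:
        raise ValueError("a cluster has between 2 and 16 nodes")
    active = total_nodes - failover_nodes
    if active < 1:
        raise ValueError("at least one node must be active")

    # Per-node ceiling: even share of the cluster limit, capped at 384.
    per_node = min(MAX_VMS_PER_NODE, MAX_VMS_PER_CLUSTER // active)
    # Cluster ceiling: the lower of the cluster limit and what the
    # active nodes could hold at 384 VMs each.
    per_cluster = min(MAX_VMS_PER_CLUSTER, MAX_VMS_PER_NODE * active)
    return per_node, per_cluster


if __name__ == "__main__":
    for nodes in range(2, MAX_NODES_PER_CLUSTER + 1):
        per_node, per_cluster = cluster_limits(nodes)
        print(f"{nodes:>2} nodes: {per_node:>3} VMs per node, "
              f"{per_cluster:>4} VMs in cluster")
```

Note that rounding down means an even spread on a 3+1 cluster actually lands at 999 VMs (3 x 333); the 1000 in the table is the supported ceiling rather than the product of the two columns.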
And from TechNet:

Component: Nodes per cluster
Maximum: 16
Notes: Consider the number of nodes you want to reserve for failover, as well as maintenance tasks such as applying updates. We recommend that you plan for enough resources to allow for 1 node to be reserved for failover, which means it remains idle until another node is failed over to it. (This is sometimes referred to as a passive node.) You can increase this number if you want to reserve additional nodes. There is no recommended ratio or multiplier of reserved nodes to active nodes; the only specific requirement is that the total number of nodes in a cluster cannot exceed the maximum of 16.

Component: Running virtual machines per cluster and per node
Maximum: 1,000 per cluster, with a maximum of 384 on any one node
Notes: Several factors can affect the real number of virtual machines that can be run at the same time on one node, such as:
· Amount of physical memory being used by each virtual machine.
· Networking and storage bandwidth.
· Number of disk spindles, which affects disk I/O performance.
Obviously many of you will look at that and say “We don’t leave 1 node free for ‘failover’”, whereas some of you will always do this, to ensure there’s enough resource for failing over VMs in the event of an issue. Now, I’m not going to say that you absolutely have to have a +1 node, but it is best practice nonetheless, and something that should be considered in mission-critical deployments. So, looking at the table, even on a 4 node cluster (3+1) you can hit the big 1000, which shows huge scalability and consolidation. If you went from 1000 physical servers down to 4 hosts, that would be a reduction of over 99% (99.6%, assuming my aging maths is correct there). I’m going to say something now, and you should listen carefully.
Just because you can, doesn’t mean you should.
If you’re going to put that many eggs in so few baskets, you’re going to have to ensure that the underlying infrastructure is rock solid and extremely well architected and capacity planned. From networking requirements (a LOT of NICs would be needed in those hosts, I imagine!) through to storage (how much I/O!?), and memory (DM will help!) through to CPU (8-12 core will help!), every little decision could be amplified up to 333 times on a single host, so you have to nail it with detailed and thorough planning and comprehensive testing.
Perhaps an area where you’re more likely to hit this limit is when virtualising desktops, rather than servers. In most organisations, desktops typically outnumber servers, so hitting the previous limits was much more achievable with virtual desktops. The new limits give any organisation that was creeping closer to them a bit of breathing room.