Contoso Labs Series - Table of Contents
In the prior post, we discussed how our major decision revolved around storage. That choice would dictate the cost of the rest of the solution, so we had to make it first, and carefully. Once we knew we'd be using Scale-Out File Servers and a converged fabric, we had to evaluate what that meant for our network.
Our first inclination was to get excited by all the possibilities and performance that the SOFS+SMB 3.0 stack could enable. How awesome is your life when you have technologies like SMB Multichannel and SMB Direct (RDMA) at your disposal? Once our budget numbers came back, we realized we had to scale back our ambitions a little. While SMB Direct is amazing, it requires special RDMA-enabled 40GbE, 10GbE, or InfiniBand cards. That presented a few serious problems.
We decided to do the math to see if we could get away with using the existing connectivity in our scenario. The compute nodes currently have four 1GbE ports each. Frankly, that's not very much. 4Gbit of throughput, split between management, storage, AND tenant traffic, is almost criminally low. We had to debate long and hard about whether we could get away with it. After long deliberation, we decided that our users and workload could manage with this limited throughput, since steady-state operation won't generate heavy stress. Our worst-case scenario is boot storms, which we think we can manage. However, we're acknowledging there's serious risk here. Until we get users on and load test, we can't be sure what kind of user base we can support. Just like our customers, sometimes we need to learn useful things by doing them and seeing what happens.
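To make that "do the math" step concrete, here's a rough sketch of the kind of back-of-envelope arithmetic involved. The numbers below (the fraction of bandwidth reserved for storage, the VM counts) are illustrative assumptions, not Contoso's actual figures:

```python
# Hypothetical back-of-envelope check: how much storage bandwidth does each
# VM get on a host with 4x 1GbE, once management and tenant traffic take
# their share of the converged pipes?

def per_vm_storage_mbps(total_gbps=4.0, storage_share=0.5, vms_per_host=20):
    """Estimate storage throughput available per VM, in Mbps.

    total_gbps    -- aggregate NIC bandwidth on the host (4x 1GbE = 4.0)
    storage_share -- fraction of bandwidth assumed reserved for storage
    vms_per_host  -- VMs competing for that storage bandwidth at once
    """
    storage_gbps = total_gbps * storage_share
    return storage_gbps * 1000 / vms_per_host

# Steady state with modest VM density: workable for light workloads.
print(per_vm_storage_mbps())                    # 100.0 Mbps per VM

# A boot storm, where twice as many VMs read from the SOFS simultaneously,
# halves the per-VM share and shows why this design is risky.
print(per_vm_storage_mbps(vms_per_host=40))     # 50.0 Mbps per VM
```

Even with generous assumptions, the per-VM numbers get thin quickly under concurrent load, which is exactly the bottleneck risk acknowledged above.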
In all honesty: This is not an architecture we'd recommend in any production environment meant for serious workloads. It does not provide enough room for I/O growth, and could too easily produce bottlenecks. On the flip side, we think it's an excellent dev/lab design that could make good use of older hardware and provide a good experience. Keep that in mind as you read more about our architecture in future posts.
Are you going to discuss the network equipment that you selected?
@Tristan Yes, definitely. I'll start covering the network equipment on Friday and continue through next week. The actual design and configuration of the L2/L3 parts of the network will come later on, and in much more detail.
A 288-node dev/lab is a pretty big playground :) It makes me very curious about two things:
1. Can you simulate an IaaS cloud in a much smaller lab, with only a couple of Hyper-V hosts, in order to learn this?
2. What would be the minimum number of servers to start a real cloud provider business?
This blog series is awesome so far!
@Istvan Szarka The absolute smallest set of hosts you could use would be 2-3. At minimum you'd need one host for the network virtualization gateways, and one for the fabric AND tenant VMs. The easier option would be 3: splitting tenant and fabric VMs between hosts, plus the aforementioned NV gateway host. The fabric VM host would have to be generously provisioned as well.
As for the business side of things, that depends entirely on how you'd run and market your business. For a service provider offering serious levels of support and service to customers, you could manage capacity very ad hoc and stay small but profitable. There is definitely no set answer to that question.
Hi, can you explain whether any specific feature set is needed on the fabric switches? For example: will an L2 switch do? Do I need BGP-capable switches (assuming I have a dedicated NVGRE gateway as the Internet access point)? What about VLAN capability?
I'm planning an even smaller dev lab and I'm especially confused about the BGP part. Thanks for your help.