Contoso Labs-Fabric Choices (Network)

Contoso Labs Series - Table of Contents

In the prior post, we discussed how our major decision revolved around storage. The choices we made there would dictate the cost of the rest of the solution, so we had to make a decision there first, and carefully at that. Once we knew we'd be using Scale-Out File Servers and a converged fabric, we had to evaluate what that meant for our network.

Speeds and Feeds

Our first inclination was to get excited by all the possibilities and performance that the SOFS+SMB 3.0 stack could enable. How awesome is your life when you have technologies like SMB Multichannel and SMB Direct (RDMA) at your disposal? Once our budget numbers came back, we realized we had to scale back our ambitions a little. While SMB Direct is amazing, it requires special RDMA-enabled 40GbE, 10GbE, or InfiniBand cards. That presented a few serious problems.

  • Dual-port, RDMA-capable cards are expensive. Well over $500 apiece for a 10GbE example, with limited options because it is newer technology. Do the math for one card in each of 288 nodes, and you see a tremendous cost.
  • High-speed NICs require drastically more capable and expensive switching and routing infrastructure. 48-port 10GbE switches cost 2-4x more than their 1GbE counterparts. SFP+ cables often cost 5-10x or more what an equivalent-length CAT-6 cable does. When you need a thousand of them? Ouch.
  • On top of that, because our compute nodes are Gen6 servers, we learned that it's actually frighteningly easy to overwhelm the PCIe bus in some RDMA scenarios. Our network could be TOO FAST FOR OLDER PCIe. What a great problem to have.
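
To put rough numbers on the first two bullets, here is a minimal sketch of the arithmetic. The node count, the $500 card floor, the 5-10x cable multiplier, and the thousand-cable count come from the text above. The CAT-6 unit price is a hypothetical placeholder, not a quote we received, and the function names are ours for illustration only.

```python
# Back-of-the-envelope cost floor for the RDMA option.
NODES = 288                 # compute nodes in the lab
CARD_COST_FLOOR_USD = 500   # "well over $500" apiece, so treat this as a floor


def nic_cost_floor(nodes: int = NODES, card_cost: int = CARD_COST_FLOOR_USD) -> int:
    """Minimum spend for one dual-port RDMA card per node."""
    return nodes * card_cost


def sfp_cable_cost_range(cat6_price_usd: float, cables: int = 1000,
                         low_mult: int = 5, high_mult: int = 10):
    """Low/high SFP+ cabling spend, given a CAT-6 price for the same length.

    cat6_price_usd is an ASSUMPTION supplied by the caller; the post only
    gives the 5-10x multiplier, not an actual cable price.
    """
    return (low_mult * cat6_price_usd * cables,
            high_mult * cat6_price_usd * cables)


if __name__ == "__main__":
    print(f"NIC floor: ${nic_cost_floor():,}")
    low, high = sfp_cable_cost_range(cat6_price_usd=5.0)  # hypothetical $5 cable
    print(f"SFP+ cabling: ${low:,.0f} to ${high:,.0f}")
```

Even at the floor price, the cards alone come to $144,000 before a single switch or cable is bought, and the switch premium (2-4x) would stack on top of that.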

We decided to do the math to see if we could get away with using the existing connectivity in our scenario. The compute nodes currently have four 1GbE ports each. Frankly, that's not very much. 4 Gbit/s of throughput, split between management, storage, AND tenant traffic, is almost criminally low. We had to debate long and hard about whether we could get away with it. After long deliberation, we decided that our users and workloads could manage with this limited throughput, since they won't generate heavy load during steady state. Our worst-case scenario is boot storms, which we think we can manage. However, we're acknowledging there's serious risk here. Until we get users on and load test, we can't be sure what kind of user base we can support. Just like our customers, sometimes we need to learn useful things by doing them and seeing what happens.
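
For a feel of how thin that budget is, the following sketch works through the per-node numbers. The port count and speed are from the paragraph above. The even three-way split is purely an assumption for illustration (a real deployment would weight the traffic classes differently), and the figures are raw line rate with no protocol overhead accounted for.

```python
# Per-node bandwidth budget with the existing 4 x 1GbE ports.
PORTS_PER_NODE = 4
PORT_SPEED_GBIT = 1.0
TRAFFIC_CLASSES = ("management", "storage", "tenant")


def node_bandwidth_gbit(ports: int = PORTS_PER_NODE,
                        speed_gbit: float = PORT_SPEED_GBIT) -> float:
    """Raw aggregate line rate for one node, in Gbit/s."""
    return ports * speed_gbit


def even_split(total_gbit: float, classes=TRAFFIC_CLASSES) -> dict:
    """ASSUMED even split across traffic classes; illustrative only."""
    share = total_gbit / len(classes)
    return {name: share for name in classes}


def gbit_to_mbyte_per_s(gbit: float) -> float:
    """Convert Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit * 1000 / 8


if __name__ == "__main__":
    total = node_bandwidth_gbit()
    print(f"Per node: {total} Gbit/s ({gbit_to_mbyte_per_s(total):.0f} MB/s raw)")
    for name, share in even_split(total).items():
        print(f"  {name}: {share:.2f} Gbit/s")
```

Under that even-split assumption, storage traffic gets roughly 1.33 Gbit/s per node, which is why boot storms are the scenario we worry about.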

In all honesty: This is not an architecture we'd recommend in any production environment meant for serious workloads. It does not provide enough room for I/O growth, and could too easily produce bottlenecks. On the flip side, we think it's an excellent dev/lab design that could make good use of older hardware, and provide a good experience. Keep that in mind as you read more of our architecture in the future.

Leave a Comment
  • Are you going to discuss the network equipment that you selected?

  • @Tristan

    Yes, definitely. I'll start covering the network equipment on Friday, through next week. The actual design and configuration of the L2/L3 parts of the network will come later on, and in much more detail.

  • A 288 node dev/lab is a pretty big playground :)

    It makes me very curious about two things:
    1. Can you simulate an IaaS cloud in a much smaller lab, only with a couple of Hyper-V hosts, in order to learn this?
    2. What would be the minimum number of servers to start a real cloud providing business?

    This blog series is awesome so far!

  • @Istvan Szarka

    The absolute smallest set of hosts you could use would be 2-3. At minimum you'd need one host for the network virtualization gateways, and one for the fabric AND tenant VMs. The easier option is 3: split the tenant and fabric VMs between two hosts, plus the aforementioned NV gateway host. The fabric VM host would have to be generously provisioned as well.

    As for the business side of things, that depends entirely on how you'd run and market your business. For a service provider who is offering some serious levels of support and service to customers, you could manage capacity very ad-hoc and stay small but profitable. There is definitely no set answer to that question.

  • Hi, can you explain whether any specific feature set is needed on the fabric switches? For example: Will an L2 switch do? Do I need BGP-capable switches (assuming I have a dedicated NVGRE gateway which is the Internet access point)? VLAN capability? I'm planning an even smaller dev lab and I'm especially confused about the BGP part. Thanks for your help.