Last week I was part of a team that traveled To San Jose, CA. Our mission was to get some hands on experience, and build a test lab that we can use to demo System Center, Private Cloud, VDI, Clustering, etc. etc., etc. for our events over the few months.
Our plan was to build out these 5 Dell Servers we got. Document what we did, so the next two team could come in and repeat our steps.
Monday we installed Windows Server 2008 R2 SP1 on all 5 boxes, we thought we were cool doing for Thumb drives (much faster) than DVD installs. Basically we took the defaults, didn’t really think about the drives, or partitions, or much. The Dell R710’s came with 2 or 6 TWO Terabyte drives. We just did install. Then realized the ones with 2 drives were striped, but why couldn’t we see the other drive? Yep Server has a 2 Terabyte boot partition limit. Oops. So we wanted to move around a couple of drives (add more to one machine, only leave one in others). So it was off to “undo the stripping” move the drives, and yes now Tuesday, start over with the installs, but BEFORE we did, we created a 2 Terabyte partition for the OS, and another partition for the rest of the storage. Ah learning's (or being stupid from the cold).
We also took the chance to make sure hardware virtualization was installed on all 5 servers. It wasn’t we thought we got them all. On Friday we had issues with the second server Hyper-v2. With a little research, Hyper-v service was failing (we even re-installed Hyper-v roll) Yes it would let us install the role of Hyper-v without hardware virtualization turned on. So always double check your BIOS and make sure hardware virtualization is turned on.
Things like this continued all week. Read Kevin’s Post below or on his blog about what we did with our Domain Controls that, um um caused us some issues!!!!
I will post more on the fun week, I learned a lot. I have more appreciation for all you Server Admins out there. Makes me realize how much worry the cloud will take out of this stuff.
We survived “being Stupid” from the cold, Yes our test lab is up and running now. More later!
The rest of this is a post from Kevin Remde on his learning's that week. Yes Don’t put your DC in your Cluster!
Okay.. I feel like sharing this because it’s pretty stupid, but in a geeky-sort-of-way the solution was interesting enough to share. Think Chicken & Egg. (or “Catch-22”).
As the title of this post suggests, the subject is Windows Failover Clustering. For those of you who are not familiar with it, Windows Failover Clustering is a built-in feature available in Windows Server 2008 R2 Enterprise and Datacenter editions. Along with shared storage (for which we used the free iSCSI Software Target from Microsoft to implement), it provides a very easy-to-configure and use cluster for serving up highly available services. In our case, this would be virtual machines running on two clustered virtualization hosts.
As a training platform, but primarily for use as a demonstration platform for our presentations (and certainly more real-world than one laptop alone can demonstrate), our team received budget to acquire several Dell servers. We found a partner (Thank you Groupware Technology!) who was willing to house the servers for us. The idea was that we, the 12 IT Pro Evangelists (ITEs) in the US would travel to San Jose in groups of 3-4 and do the installation of a solid private cloud platform, using Microsoft’s current set of products (Windows Server 2008 R2 and System Center). This past week I was fortunate enough to be a member of the first wave, along with my good buddies Harold Wong, Chris Henley, and John Weston. The goal was to build it, document it, and then hand if off to the next groups to use our documentation and start from scratch, eventually leaving us with great documentation, and a platform to do demonstrations of Microsoft’s current and future management suites.
We all arrived in San Jose Monday morning, and installed all 5 server operating systems in the afternoon. We installed them again Tuesday morning.
It’s a long story involving how Dell had configured the storage we ordered. We needed to swap some drives between machines and set up RAID and partitioning in a way that was more workable to our goals. I’ll leave that discussion for one of my teammates to blog about.
Anyway, once we had the servers up, I installed and configured the Microsoft iSCSI software target on our “VMSTORAGE” server, and configured two other servers as Hyper-V Hosts in a host cluster, with Windows Failover Clustering and CSV storage. By the end of the week we had overcome hardware, networking, missed-BIOS-checkmarks (did you know that Hyper-V will install, but you can’t actually use it if you somehow miss enabling Virtualization support on the CPU on one of the host cluster machines? Who’da thunk it?!) , we had 5 physical and a half-dozen virtual servers installed and running, with Live Migration enabled for the VMs in the cluster. Our domain had two domain controllers; one as a clustered, highly-available VM, and the other as a VM that was not-clustered, but still living in the CSV volume; C:\ClusterStorage\Volume1 in our case. (That’s a hint, by the way. Do you see the problem yet?)
One of the many hurdles we had to overcome early on was an inadequate network switch for our storage network. 100Mbps wasn’t going to cut it, so until our Gig-Ethernet switch arrived on Friday, Harold used his personal switch that he carries with him. On Friday before we left for the airport, we shut down the servers and let the folks there install the new switch. Harold need his switch back at home.
But in restarting the servers, here’s the catch: Windows Failover Clustering requires Active Directory. The storage mount-point (C:\ClusterStorage\Volume1) on our cluster nodes requires the Failover Clustering. And remember where I said our domain controllers were?
“Um.. So… Your DCs couldn’t start, because their location wasn’t available. And their location wasn’t available, because the DC’s hadn’t started. And your DC’s couldn’t start, because their storage location wasn’t available, and… !!”
Bingo. Exactly. Chicken, meet Egg. It was our, “Oh shoot!” moment. (Not exactly what I said, but you get the idea.)
“So how did you fix it?”
I’ll tell you…
Our KVM was a Belkin unit that supports IP connections and access to the machines through a browser. We configured it to be externally accessible. So I was able to use that to get in to the physical servers and try to solve this “missing DCs” puzzle; though to make matters much more difficult, the web interface for that KVM is really, REALLY horrible. The mouse didn’t track to my mouse directly, no ALT+ key support, TAB key didn’t work.. I ended up doing a lot of the work from a command-line simply because it was easier than trying to line up and click on things! Perhaps in a future blog post I will give Belkin a piece of my mind regarding this piece-of-“shoot” device…
So, my solution was based on two important facts:
“Ah ha! So on the storage machine, you mounted the .VHD file that was your cluster storage disk, and you copied out the .VHD file from one of the domain controller VMs!”
Yeah.. that’s basically it. Though I did have one problem. The .VHD file was in-use; probably by the iSCSI Software Target service. So when I tried to attach it, the OS wouldn’t let me.
Fortunately I found that by stopping that “Microsoft iSCSI Software Target” service on the storage server (I also stopped the “Cluster Service” on the two Hyper-V cluster nodes), I was able to attach to the .VHD, navigate into it, and copy out the .VHD disk for the needed Domain Controller. (Actually, I also removed the .VHD from its original location. I didn’t want the DC to come alive again when the storage came back online, if the identical DC was already awake and functioning.)
So after that, it was as simple (?) as this:
Everything came back to life almost immediately; including the Remote Desktop Gateway that we had configured so that we could remotely connect to the machines in a more meaningful, functional way.
So the moral of the story is: When you’re building your own test lab, or even considering where to put your DCs in your production environment, make sure you have at least one DC that comes online without depending upon other services (such as high-availability solutions) that, in turn, require a DC to be functioning.
All-in-all, it was a great week.