• Not all Orchestration tools are created equal

    I’ve been doing a lot of work with the Release Candidate of System Center Orchestrator, so it’s always interesting to see what other orchestration products are capable of.  I recently read a blog post on Creating Workflow Loops in vCenter Orchestrator and I was struck by just how complicated it is to do relatively simple tasks, with lots of really arcane syntax to work with. 

    It’s probably worth taking a look at how System Center Orchestrator would accomplish a similar task.  First, we start with a new empty runbook.  I’m going to use a text file as an input into the runbook, but we could just as easily prompt for user input, or even better store the list of servers to be patched in a change control record in a service desk system (of course it doesn’t have to be the Microsoft service desk, that’s one of the strengths of the Orchestrator platform).

    We’re going to build a runbook that looks like this.

    Snapshot Final

    First we drag out a standard Read Line activity from the Text File Management category in the palette on the right.  Then we’ll grab a Get VM List activity from the VMware vSphere category.  We can then join the two activities together by hovering over the first activity, clicking the arrow that appears and dragging it to the next activity.  We’ll repeat the process, dragging out a Compare Values activity and finally a Take VM Snapshot activity.  We’ve now built the structure of our runbook, and we can go about customising the activities.

    First we’ll specify the text file to read in.  Double click the Read Line activity, and you can enter the properties.  I’ve used an ASCII format text file that just contains a list of VM Names, and the 1-END tells the activity to read from the first line to the end of the file.

    SnapshotReadLine

    The Get VM List activity doesn’t need much customisation; it just needs to be told which vSphere connection to use (defined in the Runbook Designer under Options…VMware vSphere).

    SnapshotGetVMList

    The Compare Values activity allows us to compare two text or numeric values – we are going to use this to match the list of VMs to snapshot from the text file against the list of VMs returned from vSphere.  To do that we’ll use one of the real key features of Orchestrator: published data.  Each activity preceding this one publishes data onto the databus, and any activity that follows can subscribe to it.  We’re going to use the Line Text returned from the Read Line activity, and the VM Names returned from the Get VM List activity.  In the Test area, we right click and choose Subscribe…Published Data.  From the drop-down at the top we select the Read Line activity, and choose the Line Text option.

    SnapshotReadLinePD

    We then right click in the pattern field and subscribe Published Data again, and choose the Get VM List activity.

    SnapShotGetVMListPD

    We’ll end up with a Compare Values task that looks like this.  (By default we do a text comparison; if you want to compare numbers, the General tab allows you to select that.)

    SnapshotCompareValue

    We can then customise the behaviour of the Take VM Snapshot activity.  Again, we’ll use some published data to identify which VMs to snapshot.  Up to now we’ve been working with VM names, but the snapshot activity actually requires a VM Path parameter – not to worry, the VM Path is returned along with the VM Name as part of the Get VM List published data.

    SnapshotVMSnapshot

    The final step in the puzzle is to customise our link so that the snapshot task only runs for VMs that match the list of VMs in the text file.  To do this, we double click the link object between the Compare Values & Take VM Snapshot activities.  This brings up a dialog that allows us to set the conditions under which we will execute the next step – by default it will proceed should the task succeed, but success in this case simply means that the task ran.  We will change it by clicking the text that says “Compare Values” and selecting the Comparison result published data.  We will then change the criteria to true by clicking the text that says value, and entering true in the popup.

    SnapShotLinkProperties

    The great thing about this runbook is that it loops by itself: you don’t need to keep track of the loops (although you might still want to record the state of the loop so you can restart the runbook should it strike an error).  Without any strange syntax we’ve built a runbook that is easy to understand and debug.  The looping was handled automatically as part of the runbook, and we performed a relatively complex text comparison very quickly.
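
    For anyone who finds code easier to follow than screenshots, the logic this runbook ends up performing can be sketched in a few lines of Python.  This is only an illustration, not something Orchestrator generates or runs: the function name, the sample VM names and the sample VM paths are all made up, and it assumes an exact, case-sensitive text match for the comparison.

```python
def plan_snapshots(requested_names, inventory):
    """Return the VM paths that would be snapshotted.

    requested_names: lines read from the text file (the Read Line step).
    inventory: (vm_name, vm_path) pairs standing in for Get VM List output.
    """
    to_snapshot = []
    for line_text in requested_names:           # one pass per line in the file
        for vm_name, vm_path in inventory:      # one pass per VM from vSphere
            matched = line_text.strip() == vm_name  # Compare Values (assumed exact match)
            if matched:                         # link condition: Comparison result is true
                to_snapshot.append(vm_path)     # Take VM Snapshot would run here
    return to_snapshot


# Hypothetical sample data, for illustration only.
requested = ["web01", "db01"]
inventory = [
    ("web01", "/dc1/vm/web01"),
    ("app01", "/dc1/vm/app01"),
    ("db01", "/dc1/vm/db01"),
]
print(plan_snapshots(requested, inventory))
```

    The nested loops are written out explicitly here; in the runbook they fall out of the way each activity runs once for every item of published data it receives.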

    Compare this to the vCenter Orchestrator example, and you’ll see that using System Center Orchestrator you’ll be up and running with your runbooks much faster.

  • Disingenuous Cost Comparisons

    I was inspired by Eric Gray’s recent post “Disingenuous cost comparisons” which purports to show all the ways that VMware costs less than Microsoft.  I thought I’d take Eric’s advice and see for myself by trying the VMware cost comparison calculator, to find out exactly how VMware are trying to reduce the high cost of their hypervisor.

    I entered the data below as my starting point:

    vmwarelicensingcalc

    This gives the rather interesting claim that vSphere supports 50% more VMs than Hyper-V:

    appsperhost

    That’s a pretty interesting claim, and in the spirit of transparency VMware are good enough to give the reasons they make this assumption.  It’s interesting to see what those claims are, and how they stack up against reality.

    VMwareMemoryMgmt

    From what I know of VMware’s memory overcommitment methods, there are 4 techniques they use:

    1. Transparent Page Sharing (TPS)
    2. Memory Ballooning
    3. Memory Compression
    4. Host Paging/Swapping

    The bottom two methods are only used once the host is under memory pressure, which isn’t a great place to be.  If the host is at the point of paging memory out to disk, that disk (even an SSD) is still orders of magnitude slower than real memory, and there will be a performance impact.  VMware point this out in their documentation:

    “While ESX uses page sharing and ballooning to allow significant memory overcommitment, usually with little or no impact on performance, you should avoid overcommitting memory to the point that it requires host-level swapping.” – Page 23

    TPS is the concept of finding identical pages of memory across multiple VMs and collapsing them down to a single copy stored in physical memory.  Ballooning allows the host to ask the guest to release memory it can spare, reducing the overall memory footprint of that guest.  So of the “multiple methods” of oversubscribing memory, only two are really used in day to day production.  What VMware don’t talk about much is that most of the benefit of TPS comes from sharing blank memory pages (i.e. memory that is allocated to a VM, but isn’t being used).  There is an incremental benefit from shared OS memory pages as well, but the majority of the benefit is from those blank pages.  TPS is also affected by large memory pages & ASLR in modern versions of the Windows OS, and it isn’t immediate – TPS takes time to identify the shared pages, and runs on a periodic basis.
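
    The page-sharing idea can be illustrated with a toy Python model.  This is not how ESX implements TPS, and a real hypervisor does considerably more work than this.  The page size, the helper names and the contents of the two pretend VMs are all invented for the example, and the mix of pages was chosen deliberately to mirror the point above about blank pages rather than measured from anything.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; used here purely for illustration


def count_pages(vms):
    """Return (total_pages, unique_pages) across every VM.

    Pages with identical content are only counted once in unique_pages,
    which is the basic idea behind content-based page sharing.
    """
    seen = set()
    total = 0
    for pages in vms:
        for page in pages:
            total += 1
            seen.add(hashlib.sha256(page).digest())
    return total, len(seen)


def fake_page(tag):
    """Build a non-blank page whose content depends on the tag (hypothetical)."""
    return tag.encode().ljust(PAGE_SIZE, b"\x01")


blank = bytes(PAGE_SIZE)  # an allocated but unused (all-zero) page

# Two pretend VMs: mostly blank pages, one common OS page, one private page each.
vm_a = [blank, blank, blank, fake_page("os-kernel"), fake_page("app-a")]
vm_b = [blank, blank, fake_page("os-kernel"), fake_page("app-b")]

total, unique = count_pages([vm_a, vm_b])
print(f"{total} pages allocated, {unique} needed after sharing")
```

    In this made-up layout, five of the nine pages are blank and collapse into one, while the shared OS page saves only a single extra page.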

    Hyper-V has Dynamic Memory functionality that allows machines to be allocated the memory that they need, and ballooning to reclaim memory that isn’t in use.  In practice, this has similar benefits to TPS – blank pages are simply not allocated to VMs until they need them so they can be used elsewhere as a scheduled resource.  And it’s faster than TPS, as it is immediately responsive to VM demand.  So on a direct comparison, TPS may save slightly more memory due to shared OS pages, but ultimately TPS & Dynamic Memory solve the blank memory pages issue in different ways.

    VMwareDirectDriverModel

    I think in this case it’s best to let the facts speak for themselves.  It’s pretty clear from the sorts of IO numbers in those articles that the indirect driver model (parent partition architecture) doesn’t impose a bottleneck on Hyper-V IO performance.  And it certainly appears that Hyper-V support is a requirement of WHQL.  And it’s not like VMware can claim to have no driver problems.  It also always makes me laugh when VMware claim that their drivers are optimised for virtualisation.  What does that really mean?  From what I can see, optimised for virtualisation means that you’ve got drivers that can deliver massive amounts of IO to your hardware – which is exactly what Windows Server 2008 R2 has.

    VMwareGangScheduler

    Wouldn’t it be embarrassing if a company had invested so much time and money in building a highly optimised gang scheduler, and then someone came along with a general purpose OS scheduler and the independent test results showed that the general purpose OS scheduler performed as well as, or in some cases better than that gang scheduler (or even, that the vSphere results were so bad that they released a hotfix specifically for that issue)?  The simple fact is that regardless of what VMware claim, Hyper-V does not use a general purpose OS scheduler – the claim just shows that they fundamentally don’t understand the Hyper-V architecture.  The only place a general purpose OS scheduler is involved is inside the parent partition & the running guests.  The Hyper-V scheduler is not the parent partition scheduler – it is its own scheduler, optimised for virtualisation.  So in terms of performance, Hyper-V more than holds its own against vSphere, and it’s clear that VMware are just throwing this out there and hoping their customers don’t look into it.

    VMwareDRS

    DRS is certainly a useful technology for adding flexibility to virtual environments, and Microsoft ships a comparable technology with System Center (called PRO) that performs a similar function.  So if VMware’s cost calculator adds in extra virtual machines to allow for running System Center, surely they should also allow for the fact that this brings the extra functionality with it?  And without wanting to talk about the future, Virtual Machine Manager 2012 offers Dynamic Optimisation as an alternative to PRO if you don’t have System Center Operations Manager – give it a try.

     

    So looking through the claims for a 50% increase in VMs per host we get:

    1. Memory overcommit – very slight advantage to VMware because of shared OS pages
    2. Direct Driver Model – no advantage
    3. High performance gang scheduler – no advantage
    4. DRS – similar functionality in System Center, and really contributes to flexibility, not a higher consolidation ratio.

    Based on that, I’m struggling to see how VMware can make these claims with a straight face.  And overall, my rating of VMware’s licensing calculator – nice try, but you’re smart guys and I’m sure you can do better.

  • Controlling HP ILO with Opalis

    Back to my long neglected blog…

    Inspired by Adam’s post about building his own Opalis integration pack I thought I’d give it a go myself.  I’ve seen the sessions at MMS, but never actually sat down to do this.  I thought I’d start with a scenario from my demo environment, which is all based on HP equipment: I wanted a way to bring the environment up in an orderly fashion, and to control when hosts start.  The first step in all of this is controlling the power state of the HP servers using ILO.

    HP publish tools to work with the ILO controller from the command line, and also a set of sample scripts that allow you to control the machine using the command line tools.

    To get things going, I downloaded the HP tools and installed them in the default location (C:\program files\HP Lights-Out Configuration Utility).  I created a C:\scripts directory and extracted the sample scripts into this directory.  I took the “Set_host_Power.xml” file and made two copies – one called “poweron.xml” & one called “poweroff.xml”.  The script shipped from HP turns the power off, so I left the poweroff.xml file alone, and edited the “poweron.xml” file so that the line that read:
    <SET_HOST_POWER HOST_POWER="No" />
    was changed to:
    <SET_HOST_POWER HOST_POWER="Yes" />
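
    If you would rather script that one-line edit than make it by hand, something along these lines would do it in Python.  It is a minimal sketch that works on text only: the function name is made up, and it assumes the line appears in the file exactly as quoted above, so it raises an error rather than guessing if the line isn’t found.

```python
OLD_LINE = '<SET_HOST_POWER HOST_POWER="No" />'
NEW_LINE = '<SET_HOST_POWER HOST_POWER="Yes" />'


def make_power_on(script_text):
    """Return a copy of the script text with HOST_POWER switched from No to Yes."""
    if OLD_LINE not in script_text:
        # Refuse to continue if the file doesn't look the way we expect.
        raise ValueError("SET_HOST_POWER line not found in the expected form")
    return script_text.replace(OLD_LINE, NEW_LINE)


# Demonstrate on the single line quoted above rather than on a real file.
print(make_power_on("    " + OLD_LINE))
```

    To apply it to the real file, read poweroff.xml into a string, pass it through the function and write the result out as poweron.xml.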

    I also copied “get_host_power.xml” and named the new copy “getpower.xml” – not essential though.

    Login username & password are defined in the file, but we will override them at the command line we specify in our Opalis objects so we can leave them as default.

    You can test the command line control by issuing the following command at the command line:

    cpqlocfg -s <ILO IP Address> -u Administrator -p <password> -f <input file from above>
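
    The same call can be wrapped in a small Python helper, which is handy for testing outside Opalis.  This is a sketch under a few assumptions: cpqlocfg is on the PATH, the function names are invented, and the IP address and password in the example are placeholders.  Only the four switches shown above are used.

```python
import subprocess


def build_cpqlocfg_args(ilo_address, username, password, input_file, exe="cpqlocfg"):
    """Build the argument list for the command line shown above."""
    return [exe, "-s", ilo_address, "-u", username, "-p", password, "-f", input_file]


def run_cpqlocfg(ilo_address, username, password, input_file):
    """Run the tool and return whatever it wrote to standard output."""
    args = build_cpqlocfg_args(ilo_address, username, password, input_file)
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout


# Placeholder values; this only prints the command, it does not run it.
print(build_cpqlocfg_args("192.0.2.10", "Administrator", "secret", r"C:\scripts\getpower.xml"))
```

    Passing the arguments as a list rather than one long string avoids most of the quoting problems that come with paths and passwords containing spaces.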

    What I’ve done with my integration pack is simply wrap the command line & input files together into a single pack that I can then use to control my host power, and created some published data.

    Adam went through how to use QIK in its basic form, so I’m not going to cover that here.  What I’ll cover is the basic commands I added, and the published data I added, along with a couple of limitations I found.

    I’ll go through the most detailed one, which is the GetPower command, as this returns published data.  I added a new command in QIK which runs the cpqlocfg tool.

    01. GetPower

    I then added my arguments as below.  It’s pretty straightforward – we have three parameters that we pass into the tool (ILO IP, Username, Password), and we define the password as encrypted text so it isn’t exposed in the GUI.  I’ve also hard-coded the path to the getpower.xml input file – one of the issues I found in QIK is that if you want to pass a filename or path containing spaces you need to put double quotes around it, but this makes the QIK compilation process fail.  I’ve worked around this by using an 8.3 path to the file.

    02. GetPower

    What I also want this to do is to return the power state in a form I can use, and that’s where the published data tab comes in.  I’m using the “Extract Group” option when defining published data.  This pulls out the matching text from the output, and makes it available as published data.  It uses standard .NET regular expressions.  My output text looks like HOST_POWER="ON" or HOST_POWER="OFF".  The regular expression I have below matches the = sign, followed by any character, followed by an O and then one or more other characters.  This is specific enough to match only my text, and not anything else (I could also use =.(ON|OFF)).  The brackets around the text tell the regular expression to extract the actual data that matches, so my PowerStatus will pass data back that is either ON or OFF.  Ideally I would have used ="(O\w+), but again with QIK using double quotes causes some compilation issues that I needed to work around.
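
    The extraction is easy to try outside QIK.  The Python sketch below uses the pattern =.(O\w+), which is reconstructed from the description above (the real expression is only visible in the screenshot), and Python’s re module stands in for .NET regular expressions; for a pattern this simple the two should behave the same way.  The function name is made up.

```python
import re

# "=" then any one character (the opening quote), then a captured word starting with O.
POWER_PATTERN = re.compile(r"=.(O\w+)")


def extract_power_status(output_text):
    """Return the captured group (e.g. ON or OFF), or None if nothing matches."""
    match = POWER_PATTERN.search(output_text)
    return match.group(1) if match else None


for sample in ('HOST_POWER="ON"', 'HOST_POWER="OFF"'):
    print(sample, "->", extract_power_status(sample))
```

    The dot takes the place of the double quote that QIK would not compile, and the capture group plays the same role as the “Extract Group” option.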

    04. GetPower

    So that’s it, I added a couple of other tasks – one to power the server on, and the other to power the server off – using the same command line & just pointing at the appropriate input files.  I didn’t add any published data to these tasks at this point.

    I also haven’t added any additional error handling in the tasks to extract error conditions.  If I require this I’ll extract further information from the XML output.

    The last step was just to compile the dll into an Integration Pack as per the instructions in Adam’s blog.  I included the contents of C:\scripts to simplify matters.  I won’t attach the OIP file to this blog, as I don’t want to distribute HP’s files – you can download them yourselves.  I will attach the dll that I generated with QIK though so you can download and see what I’ve done.

  • Hotfixes for DPM Protected Hyper-V Guests

    One of the things that isn’t made 100% clear in the DPM documentation is what you need to do for Hyper-V VMs that you are protecting at the Hyper-V host layer.  The “Protected Computer Software Requirements” document on TechNet tells you that you need to apply certain hotfixes to the Hyper-V host, and you could be forgiven for thinking that was all you needed to do.

    However, remember that when DPM uses the Hyper-V VSS Writer to take a snapshot of a running VM, it also leverages the in-guest VSS Writers to ensure that the guest itself is consistent inside, which is what gives us application consistent backups.

    What’s the implication of this?  If there is a hotfix required in the physical world (for instance a file server running Windows Server 2003 requires hotfixes 940349 & 975759) then you should also have that hotfix applied to your protected VMs, even though you aren’t running a DPM agent in that VM.  Essentially, treat any machine you are protecting the same according to the “Protected Computer Software Requirements” document, regardless of whether it is physical or virtual, and whether it is protected at the host or guest level.

  • Server Cluster Patching with Opalis – Part 2

    In my last post on this subject I set up the cluster patching framework in Opalis.  In this post I’m going to modify the policy initiation phase to use PowerShell & some text manipulation to make it more dynamic.

    I’d set this up originally to simply read the cluster nodes from a text file.  There are more options available to us though – we could use PowerShell to query the cluster for a list of cluster nodes, or in this case I’m going to use PowerShell to query Virtual Machine Manager to get the list of Hyper-V cluster nodes.

    I’ve got a simple PowerShell script which is as follows:

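    # Load the VMM snap-in (needs the VMM console installed on this machine)
    Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager
    # Look up the cluster, then list its hosts, keeping only the Name property
    $cluster = Get-VMHostCluster -Name "democluster.contoso.com" -VMMServer "vmm.contoso.com"
    $vmhosts = Get-VMHost -VMHostCluster $cluster | Select-Object -Property Name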

    Running this script will return me a variable $vmhosts which contains text as follows:

    @{Name=clusternode1.contoso.com}

    @{Name=clusternode2.contoso.com}

    That’s not quite in the format that I want and I could do some further processing in PowerShell to get it right, but I’m lazy, and also want to show off some of the text manipulation features in Opalis.

    First thing though, I want to execute this in Opalis.  Because I’m using Virtual Machine Manager objects I’ll need to have the VMM 2008 R2 console installed on my Opalis machine.

    Once I’ve got it installed I can then modify my policy.  I’m going to add a “Run .NET Script” object.  “Run .NET Script” is pretty powerful and lets us run scripts in C#, JScript, VB.NET or PowerShell.  I just drag and drop my object into my policy, and then double click it to open the properties.

    My policy looks like this:

    overall policy

    When I open the properties, I can simply paste my script in there, and change the language to PowerShell:

    net script

    Now I need some way to get the output from the script.  The great thing about Opalis is that it makes this really easy.  I go to the “Published Data” area and tell the policy which variable I want to return as published data.  In this case my vmhosts variable is the data I want, so I simply add that in.  Now when I need to retrieve the published data from the data bus, vmhosts will appear as available.

    published data

    As I said before the data that comes out of that script isn’t exactly in the format I want – I only want hostnames, not all the other text so I need some way of stripping that out.  Fortunately this is another area that Opalis makes really easy.

    In my “Trigger Policy” task I previously set it up to pass the Computer Name parameter through to the next policy.  I’m going to continue to do this, but manipulate the text that I pass through.

    There is a screen shot below that shows the start of this, but I will expand further.  Opalis will treat information in square brackets as data manipulation functions, so I start with that.  In this case there is a consistent set of data coming out of the script – there is effectively a header (“@{Name=”), the data (“clusternodex.contoso.com”) and a footer (“}”).  I’m going to use a combination of three functions – the Mid function, the Sum function and the Len function.  Mid allows me to retrieve text from the middle of a string, and Len will allow me to get the length of a string.  Sum just allows me to add two numbers together.

    Because my data is in a consistent format the information I want always starts at the 8th character, and the data I need to retrieve is from that point through to the second last character.  The length of text I need to grab is the overall length of the string less 8 characters (the 7 in the header plus the 1 in the footer).  For this I use the Sum function to add -8 to the length of the string.  I could equally use the Diff function to subtract 8.

    I then build my function as:

    [Mid('<vmhosts published data>',8,Sum(Len('<vmhosts published data>'),-8))]

    I insert my published data by right clicking where I want to insert it and choosing “Subscribe…published data”.  I choose the “Run .NET Script” task, and the vmhosts data.

    Trigger Policy

    So now when my trigger policy task runs it will pass the ComputerName parameter as clusternodex.contoso.com, having stripped off the header and footer.
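
    The arithmetic is easy to check outside Opalis.  Here is a short Python sketch of the same calculation: the mid helper imitates a Mid function that counts characters from 1, which is inferred from the “8th character” description above rather than taken from the Opalis documentation, and both function names are invented for the example.

```python
def mid(text, start, length):
    """Return `length` characters of `text`, counting `start` from 1."""
    return text[start - 1:start - 1 + length]


def strip_wrapper(value):
    """Drop the 7-character header '@{Name=' and the 1-character footer '}'."""
    return mid(value, 8, len(value) + (-8))  # mirrors Sum(Len(...), -8)


print(strip_wrapper("@{Name=clusternode1.contoso.com}"))
```

    The sample string is 32 characters long, so the call asks for 24 characters starting at position 8, which is exactly the hostname.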

    If you wanted to simply use the Windows Server 2008 R2 Failover Cluster PowerShell cmdlets to get the cluster nodes you could do this:

    Import-Module -Name FailoverClusters
    Get-ClusterNode -Cluster democluster | Format-Table -Property Name -HideTableHeaders

    This actually gives nicer output than the VMM Cmdlets but the VMM ones were slightly easier to work with in my case (I can execute them locally on the Opalis server).

    Next time we’ll look at some of the pre-patching checks.

    Edit: Should have made clear, the Opalis service account needs to have permission in VMM to execute this script.  As there is no concept of an "Operator" in VMM the Opalis account will need to be a delegated administrator.  To run the standalone Failover Cluster scripts the Opalis account simply needs to have Read-only permissions on the cluster.