Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

• How to train your MAML

In this fourth post in my Azure Machine Learning series we are actually going to do the training itself.  I have tidied up my experiment from last time to get rid of the modules that export data to SQL Azure, as they have now served their purpose..

Before we get stuck into training there’s still a bit more tidying up to do:

• We can get rid of the year, month and day columns as they were only needed to join our datasets and don’t have any relevance in predicting flight delay (actually they might do but I am keeping things simple here).
• We can put the weather reading data into discrete ranges rather than leaving them as continuous variables.  For example temperature could be aggregated into groups, say less than –10, –10 to 0, 0 to 10, 10 to 20, 20 to 40 and 40 plus.  This is called quantization, in the same way that the quantum in quantum physics recognises that particles like electrons have discrete amounts of energy.  There’s a special Quantize module to help us do this in ML Studio.
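As an aside, the binning logic itself is simple enough to sketch in a few lines of code. This is just an illustration of the idea using the temperature groups above, not what the Quantize module actually runs:

```python
# Illustrative sketch of quantization with custom bin edges
# (not the ML Studio Quantize module itself).
from bisect import bisect_right

BIN_EDGES = [-10, 0, 10, 20, 40]  # the edges between the temperature groups

def quantize(temp_c: float) -> int:
    """Return the index of the bin the temperature falls into:
    0: < -10, 1: -10..0, 2: 0..10, 3: 10..20, 4: 20..40, 5: 40+"""
    return bisect_right(BIN_EDGES, temp_c)

print(quantize(-15))  # 0 (colder than -10)
print(quantize(5))    # 2 (the 0..10 group)
print(quantize(45))   # 5 (40 plus)
```

Each continuous reading is replaced by a small group number, which is exactly the effect we want on our weather columns.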

So let’s do those two steps in ML Studio. First add in another Project Columns module under the join to remove the time based columns..

Now add in a Quantize module underneath that.  Quantization can work in several ways depending on the binning and quantile normalization options we select in this module.  We can design our own bins into which each value falls by setting the bin edges (like the temperature groups I just mentioned).  If we decide to let the Quantize module set the bin edges automatically then we can choose the algorithm it will use and the quantile normalization method, and we can set how many bins we want.  For our initial training we’ll select 10 bins for all of the numerical data we are going to base our predictions on, and we’ll overwrite the columns with the new values..

If we now visualize the left hand output of the Quantize module we can see that there are now 10 unique values for each of our numerical columns..

Now we can actually get on with the training itself, although in the real world we may have to revisit some of the earlier decisions we made to improve the output.  The process of training is much like learning simple arithmetic: we are introduced to a new symbol, say +, and given examples of how it works.  To confirm we’ve got it we use it ourselves and compare our results against some more examples.  In machine learning we do the initial training by carving out a sample from our data set.  In MAML we can use the Split module to do this – we used it in my last post just to get the data from one airport, and now we are going to use it to take a random sample.

To create our training set we drag a Split module to the bottom of the design surface, connect its input to the left hand output of the Quantize module and set its properties as follows to give a 20% random sample we can use for training..
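The idea behind the module can be sketched outside ML Studio too. This is just an illustrative sketch of a seeded random 20/80 split, not the Split module's actual implementation:

```python
# Illustrative sketch of a 20% random split: shuffle the rows with a
# fixed seed and carve off the first 20% as the training sample.
import random

def split(rows, fraction=0.2, seed=42):
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # deterministic for a fixed seed
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]       # (training sample, the rest)

train, rest = split(range(100))
print(len(train), len(rest))  # 20 80
```

The fixed seed matters: it means a rerun of the experiment carves out the same sample, so results are repeatable.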

The question now is which algorithm to use to make our flight delay prediction? This is a huge topic in its own right and requires prior knowledge of statistics to make sense of what some of these do.  You can also elect to ignore all the good work Microsoft have done in providing some really powerful algorithms used in Xbox and Bing, and bring your own algorithm written in the open source R language. In MAML there are three types of built-in algorithm – Classification, Clustering and Regression – so what exactly do those terms mean?

• Classification is where we want to determine which group something belongs to – in our case we want to determine whether a flight is in the delayed group or the on time group.
• Clustering is similar to that but this time we don’t know what the groups are.  For example we might take a collection of purchases from a shopping web site and MAML would work out what things were bought with what other things to provide recommendations when a customer places something in their basket.
• Regression is the process of fitting points to a curve or line (hence linear regression), so here we are looking at predicting a continuous variable, say house price, based on attributes of a property such as age, area, distance to the nearest station etc.

So if we expand out the Machine Learning | Evaluate | Classification section on the left of ML Studio we can see some curiously named modules like Multiclass Decision Jungle which we can use..

Module – Description

Multiclass Decision Forest – Create a multiclass classification model using a decision forest
Multiclass Decision Jungle – Create a multiclass classification model using a decision jungle
Multiclass Logistic Regression – Create a multiclass logistic regression classification model
Multiclass Neural Network – Create a multiclass classifier using a neural network algorithm
One-vs-All Multiclass – Create a one-vs-all classification model
Two-Class Averaged Perceptron – Create an Averaged Perceptron binary classification model
Two-Class Bayes Point Machine – Create a Bayes Point Machine binary classification model
Two-Class Boosted Decision Tree – Create a binary classifier using a boosted decision tree algorithm
Two-Class Decision Forest – Create a two-class classification model using a decision forest
Two-Class Decision Jungle – Create a two-class classification model using a decision jungle
Two-Class Logistic Regression – Create a two-class logistic regression model
Two-Class Neural Network – Create a binary classifier using a neural network algorithm
Two-Class Support Vector Machine – Create a Support Vector Machine binary classification model

At the time of writing the help on MSDN on what these do is pretty opaque to mere mortals, and to add to the fun many of these have parameters that can be tweaked as well.  At this point it’s worth recapping what data we have and what we are trying to do. In our case there is probably some sort of relationship between some of our columns, for example visibility, humidity, wind speed and temperature, and our two outcomes are that the flight is delayed or it isn’t.  So let’s jump in, as we can’t really break anything, and take Microsoft’s advice that the Two-Class Boosted Decision Tree is really effective at predicting one of two outcomes, especially when the data is sort of related.

The other thing we can do while training our model is work out which of the columns are actually informing the decision making process.  Eliminating non-significant columns both improves computation time (which we are paying for!) and improves accuracy by eliminating spurious noise.  In our experiment it might be that the non-weather related attributes are a factor, such as the carrier (airline) and the originating and departing airports.  Rather than guessing or randomly eliminating columns with yet another Project Columns module we can get MAML to do the work for us by using the Sweep Parameters module. Let’s see this in action – drag the Sweep Parameters module onto the design surface. Note it has three inputs which we connect as follows:

• the algorithm we want to use, which is the Two-Class Boosted Decision Tree module
• the training data set, which is the 20% split from the Split module – the left hand output
• the data used to evaluate whether the sweep is working, which is simply the rest of the data from the Split module

So how does the evaluation bit work? Simply by selecting the column that has the answer we are looking for – whether or not a flight is delayed, which is the ArrDel15 column.  We can simply leave the rest of the sweep parameters alone for now and rerun the model.

Each of the outputs from the sweep looks very different from what we have in our data set, and although accuracy (at an average of 0.91) is in one of the columns, it’s difficult for a data science newbie like me to work out how good the predictions are. Fortunately we can get a better insight into what’s going on by using the Score and Evaluate modules.  To do this we connect the Score module to the left output of the Sweep Parameters module, drag the Evaluate module on underneath, connect the output of the Score module to its left hand input and rerun the experiment (neither of these has any settings).

If we now visualize the output of the Evaluate module we get this graph..

What is this telling us? The key numbers are the figures just under the graph, where we can see that 302,343 flights were correctly predicted as late, 98,221 were identified as being late but were on time, 1,466,087 flights were correctly predicted as being on time, and 61,483 were predicted to be on time but were late.  The ROC graph is also an indication of accuracy: the greater the area between the blue curve and the straight diagonal line the better, so the closer the blue curve is to an upside down L shape the better it is.
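For anyone who, like me, needs to see the arithmetic, here is a sketch of the standard figures that can be derived from the four counts in such a graph. The numbers below are toy values for illustration, not the experiment's actual counts:

```python
# Sketch of the figures behind the Evaluate module's output: given the
# four cells of a confusion matrix, compute accuracy, precision and recall.
def metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,  # all correct predictions
        "precision": tp / (tp + fp),    # of flights flagged late, how many were
        "recall": tp / (tp + fn),       # of the truly late, how many we caught
    }

# Toy numbers for illustration (not the experiment's exact counts):
m = metrics(tp=80, fp=20, tn=880, fn=20)
print(m)  # accuracy 0.96, precision 0.8, recall 0.8
```

Note how accuracy alone can look flattering when most flights are on time, which is why the Evaluate module shows all four counts.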

Could we improve on this score and how might we do that?  We could tweak the parameters for the Two-Class Boosted Decision Tree and what parameters we sweep for, or we could see if there is a better algorithm.  For this I would use the technique I use to find a good wine: blind taste two wines, select my favourite, hang on to that, then compare it with the next wine and continue until I have my favourite of a given group.  In MAML we can do this by adding another branch to the second input of the Evaluate module and comparing how they do.

By way of an example we can drag on the Two-Class Logistic Regression module, copy and paste the Score Model and Sweep Parameters modules, and connect these up as shown below..

and if we visualize the Evaluate module we now get two ROC curves, and by highlighting them we can see the detailed results, where the red curve is the right hand input – our Two-Class Logistic Regression module

We can see that the Two-Class Boosted Decision Tree module is slightly outperforming Two-Class Logistic Regression and we should stick with that or compare it to something else.

So not only is MAML really easy to set up, it also provides sophisticated tools to help you evolve to the right algorithm to meet your business need, and we still haven’t had to enter any code or get too deep into maths & stats.  What now? Well, we could output the data to something like Power BI to show the results, but we might also want to use the experiment we have made to predict flights on a departure by departure basis in some sort of web site, and that’s what we’ll look at in my next post in this series.

• How to train your MAML – Looking at the data in SQL Azure

In my last post we saw how to clean, transform and join datasets. I also mentioned I had trouble doing the join at all, and even now it’s not quite right, so how can we look at the data and find out what’s going on?  The visualisation option only shows a few rows of the 2 million in the dataset and there’s not really anything in ML Studio to help.  However we can export the data in a number of ways: to Azure storage, either as a table or blob, or directly into SQL Azure.  The latter seems more useful as we need to investigate what’s going on when the data is joined; however a little bit of pre-work is required as ML Studio won’t create the tables for us – we’ll have to do that ourselves.  We also don’t need all the data, just a consistent set to work on, so let’s start by understanding the Split module in ML Studio, which we’ll need later on to train our experiment.

Search for the Split module in ML Studio and drag it onto the design surface, connect it to the last Apply Math Operation in the flight delay data set process and set the split properties as shown..

What the split does is to send the filtered data that meets our criteria to result dataset1 (on the left) and the data that doesn’t to the result set on the right (result dataset 2). As we’ll see later, normally we would use this to split out some of the data at random, but in this case we are using a relative expression – "DestAirportID" > 11432 & <11434 (I did try "DestAirportID" = 11433 but I got an error!) – to just give us the flights arriving at one airport. If we run the model now and visualise the data on the left output of the Split module we’ll see just one value (11433).
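The logic of that relative expression can be sketched in a few lines of code. This is an illustration only, not how the Split module is implemented:

```python
# Sketch of the Split module's relative-expression behaviour:
# rows matching "DestAirportID" > 11432 & <11434 go to the left output,
# everything else goes to the right output.
def split_by_expression(rows, key="DestAirportID"):
    left = [r for r in rows if 11432 < r[key] < 11434]
    right = [r for r in rows if not (11432 < r[key] < 11434)]
    return left, right

rows = [{"DestAirportID": i} for i in (11432, 11433, 11434)]
left, right = split_by_expression(rows)
print([r["DestAirportID"] for r in left])  # [11433]
```

With integer IDs the strict > and < comparisons leave only 11433 in the left output, which is why the expression works even though equality gave an error.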

 Note when we rerun the experiment only the bits that have changed, and whatever they affect, are actually run – in this case only the split itself is being run, the rest is being read from cache (in the storage account)

Now we need somewhere to put the data which in this case will be SQL Azure.  As with MAML everything can be done from the browser.

Go to the Azure management portal, select the SQL Databases option and at the bottom of the screen click on the plus to create a new database with quick create. Mine’s in Northern Europe, it’s basic with no replication and is 1 GB (you won’t need that much)..

You will also be prompted for an administrator ID and password. Once the database is created we need to be able to access it remotely, and so we need to open up the firewall, much as we would do if the database was on any remote server in our data centre.  To do this in SQL Azure click on the configure option..

You will see your current IP address here, which you can then use to make a rule (I have hidden mine in the screen shot above). Now we can go back to the dashboard and click on the hyperlink to design or run queries against our new database from the SQL management portal (you’ll be asked to login first).  Now we can add in a table for our flight delay data as ML Studio won’t do that for us.  We need it to have the right data types, and rather than you doing it manually, here is the query you can run to create it.

USE ML

CREATE TABLE [dbo].[FlightDelay] (
[ID]              INT          IDENTITY (1, 1) NOT NULL,
[Month]           INT          NOT NULL,
[DayofMonth]      INT          NOT NULL,
[Carrier]         NVARCHAR (5) NOT NULL,
[OriginAirportID] INT          NOT NULL,
[DestAirportID]   INT          NULL,
[DepDelay]        INT          NULL,
[CRSArrTime]      INT          NULL,
[ArrDel15]        INT          NULL
);

CREATE CLUSTERED INDEX [IX_FlightDelay_0]
ON [dbo].[FlightDelay]([ID] ASC);

Notice that there is a separate ID column with the identity type so that we have a primary key for each row.

Now we can see how to export data to SQL Azure from ML Studio.  Drag the Data Writer module onto the design surface and connect it to the left hand output of the Split module..

Set the module properties as follows;

• Data destination:  SQL Azure
• Database server name: your database server name
• Database name: ML (in my case)
• Check accept any server certificate
• Comma separated list of columns to be saved:   Month,DayofMonth,Carrier,OriginAirportID,DestAirportID,CRSArrTime,DepDelay,ArrDel15
• Data table name: FlightDelay
• Comma separated list of datatable columns: Month,DayofMonth,Carrier,OriginAirportID,DestAirportID,CRSArrTime,DepDelay,ArrDel15
• Number of rows to be written per SQL Azure operation: 50 (the default)
 Note the column names above are case sensitive and the number of columns input and output must be the same.  Also be aware that if you run the experiment again you’ll add more rows to the SQL tables each time, so remember to empty the tables before a run – truncate table FlightDelay; truncate table AirportWeather

If we run the experiment now we will populate the FlightDelay table in SQL Azure and each rerun will truncate the table and repopulate this (I couldn’t see how to override this).  Once that’s working OK we can then repeat the exercise for the weather data:

• In the SQL Azure management portal create a new table AirportWeather

USE [ML]
CREATE TABLE [dbo].[AirportWeather] (
[ID]               INT IDENTITY (1, 1) NOT NULL,
[AirportID]        INT NULL,
[Month]            INT NULL,
[Day]              INT NULL,
[Time]             INT NULL,
[TimeZone]         INT NULL,
[Visibility]       INT NULL,
[DryBulbCelsius]   INT NULL,
[WetBulbCelsius]   INT NULL,
[DewPointCelsius]  INT NULL,
[RelativeHumidity] INT NULL,
[WindSpeed]        INT NULL
);

CREATE CLUSTERED INDEX [IX_AirportWeather_0]
ON [dbo].[AirportWeather]([ID] ASC);

• Copy and paste the existing Split module, connect it to the last Apply Math Operation in the weather data process and change the relative expression to "AirportID" > 11432 & <11434
• Copy and paste the existing write module and connect it to the new Split module for the weather data. Change the two settings for the columns to be used to AirportID,Month,Day,Time,TimeZone,Visibility,DryBulbCelsius,WetBulbCelsius,DewPointCelsius,RelativeHumidity,WindSpeed

Now we’ll leave ML Studio and look at what we have in the SQL Azure portal. Click on new query and paste the following in..

Select F.Month, F.DayofMonth, F.CRSArrTime, F.DestAirportID, F.DepDelay, F.OriginAirportID, A.DewPointCelsius, A.DryBulbCelsius, A.RelativeHumidity, A.Visibility, A.WetBulbCelsius, A.WindSpeed
from FlightDelay F
inner join AirportWeather A
on F.DestAirportID = A.AirportID  and F.Month = A.Month  and F.DayofMonth = A.Day
and F.CRSArrTime = A.Time
order by
F.Month, F.DayofMonth, F.CRSArrTime

and notice that we get back 61,231 rows compared to the row count of 62,211 for the FlightDelay table, which means we are losing 980 rows of data.  887 of these rows are down to the fact that there are no rows in the weather data for hour 0 (midnight) but there are in the flight data (midnight in the weather data shows as 24). So something is wrong with the mechanism for matching the flight arrival time with the nearest timed weather reading.  This was not a deliberate error on my part – I just based this all on the example in ML Studio – but it does show two things:

• Always check your findings – in this case the wrong weather data was being used, which will affect the accuracy of our predictions
• Don’t blindly trust someone else’s work!

Anyway in this case it’s easy to fix: all we need to do is subtract one hour from the weather reading time by adding in another Apply Math Operation module as shown below..
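The fix can be sketched in code to see why it works. This is an illustration with a hypothetical weather_hour helper, not anything from ML Studio:

```python
# Sketch of the fix: a weather reading taken at 56 minutes past the hour
# rounds up to the *next* hour (so a 23:56 reading becomes 24, which is why
# midnight never matched). Subtracting one hour lines the weather readings
# up with flight arrival hours 0-23.
import math

def weather_hour(reading_time_hhmm: int) -> int:
    hour = math.ceil(reading_time_hhmm / 100)  # e.g. 2356 -> 24
    return hour - 1                            # shift back onto the 0-23 scale

print(weather_hour(2356))  # 23 - now matches flights arriving in hour 23
print(weather_hour(56))    # 0  - the old midnight gap is filled
```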

If we run our query in SQL Azure again we will get back 62,159 rows, which means we are only missing a few rows of weather data, and so certain flights are being dropped when we do the join.  If we were to use this in BI then we would need to fix this, but what we need for machine learning is clean data, and now we have made this fix we have a good set of data on which to make predictions. That's what we will start to look at in my next post.

• How to Train your MAML–Refining the data

In my last post we looked at how to load data into Microsoft Azure Machine Learning using the browser based ML Studio.  We also started to look at the data around predicting delayed flights and identified some problems with it and this post is all about getting the data into the right shape to ensure that the predictive algorithms in MAML have the best chance of giving us the right answer.

Our approach is threefold:

• To discard the data we don’t need – either columns that aren’t relevant or that are derived from other data – and to discard rows where there is missing data for the columns (features in machine learning speak)
• To tag the features correctly as being numbers or strings and whether they are categorical or not.  Categorical in this context means that the value puts the row in a group rather than being continuous, so AirportID is categorical as it puts a row into a group of rows for the same airport (AirportID 1 has nothing to do with ID 3 or 4), whereas temperature is a continuous variable whose numbers do represent points on a line.
• To join the flight delay dataset to the weather dataset on the airport and the date/time. In my last post I mentioned that we could join the weather data in twice, once to the departure airport and once to the arriving airport, and indeed the sample experiment on flight delay prediction does exactly this, but I think a simpler approach is to just model the arrival delay on the fact that some flights have a delayed departure time, which may or may not be influenced by the weather at the departure airport.

Let’s get started..

Open ML Studio, create a new experiment, give it a suitable name and drag the Flight Delays and the Weather datasets onto the design surface so it looks like this..

Clean the data

As before we can right click on the circle at the bottom of the data set and select visualize data to see what we are working with- for example here’s the weather data.

What is odd here is that the data is not properly typed, in that some of the numeric data is in a column marked string, such as the weather data set temperature columns.  I spent ages trying to work out how to fix this and the answer turns out to be to use the Convert to Dataset module, which does this automatically.  So our first step is to drag two of them onto the design surface and connect them to each of our data sets..

If we run our model (run is at the bottom of the screen) we can then visualize the output of the convert to dataset steps and now our data is correctly identified as being numeric etc.

The next step is to get rid of any unwanted columns and this is simply a case of using the Project Columns module (to find it just use the search at the top of the modules list).  You can either start with a full list of columns and remove what you don’t need, or start with an empty list and add in what you do need. So let’s drag it onto the design surface and then drag a line from the Flight Delays Data to it.  It’ll have a red X against it as it’s not configured, and we can do this from the select columns option in the task pane

Here I have selected all columns and then excluded Year, Cancelled, ArrDelay, DepDel15, and CRSDepTime.  At this point we can check that what we get is what we wanted by clicking the run button at the bottom of the screen.

 Note It’s only when we run stuff in ML Studio that we are being charged for computation time using this service, the rest of the time we are just charged for the storage we are using (for our own data sets and experiments)

As before at each stage we can visualize the data that’s produced by right clicking on its output node..

Here we can see that we have one column, DepDelay, that has missing values, so the next thing we need to do is get rid of those rows. We can use the Missing Values Scrubber module for this, so search for that, drag it onto the design surface and drag a connector from the output of the Project Columns module to it.  We then need to set its properties to determine how to deal with the missing values.  As we have such a lot of clean data we can simply ignore any rows with missing values by setting the top option to remove the entire row..
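The remove-entire-row behaviour amounts to something like this sketch, with a hypothetical scrub helper standing in for the module:

```python
# Sketch of the Missing Values Scrubber's "remove entire row" option:
# any row with a missing (None) value in the selected columns is dropped.
def scrub(rows, columns):
    return [r for r in rows if all(r.get(c) is not None for c in columns)]

rows = [
    {"DepDelay": 5, "ArrDel15": 0},
    {"DepDelay": None, "ArrDel15": 1},   # dropped: missing DepDelay
]
clean = scrub(rows, ["DepDelay", "ArrDel15"])
print(len(clean))  # 1
```

Dropping rows outright is the right call here only because the missing rows are a tiny fraction of a large dataset; with scarcer data we would want one of the module's substitution options instead.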

We can now run the experiment again to check we have no more missing values.

Now we need to do some of this again for the weather dataset. We can then add in another Project Columns module to select the columns we need – this time I am starting with an empty list and specifying which columns to add..

and the Missing Values Scrubber module again, set to remove the entire row…

Tag the Features

Now we need to change the metadata about some of the columns to ensure ML Studio handles them properly. Here I cheated, which shows you another feature of ML Studio.  Remember that some of the numbers in our data are codes rather than continuous numbers, for example the airport codes and the airline code. We need to tell ML Studio that these are categorical by using the Metadata Editor module. To do this we are going to cheat by simply copying that module from another experiment.  Open another browser window, go into the ML Studio home page and navigate to the flight delay prediction sample.  Find the Metadata Editor module in there, copy it to the clipboard, then go back into the browser with our experiment and paste it in, and you should see that this module is set to make Carrier, OriginAirportID and DestAirportID categorical...

Join the datasets

Now we have two sets of clean data we need to join them.  They both have an airport ID, month and day, and the flight delay data set has an arrival time to the nearest minute.  However the weather data is taken at 56 minutes past the hour every hour and is in local time with a separate time zone column. So what we need to do is round the flight arrival time down to the nearest hour and round the weather reading time up, as follows:

For the flight delay arrival time

1. Divide the arrival time by 100

2. Round down the arrival time to the nearest hour

For the weather data time

3. Divide the weather reading time by 100 to give the local time in hours

4. Round up to the nearest hour

So how do we do that in ML Studio? The answer is one step at a time, making repeated use of the Apply Math Operation module.  Help is pretty non-existent for most of these modules at the time of writing, so experimentation is the name of the game, and I hope I have done that for you. We’ll place four copies of the Apply Math Operation module on the design surface, one for each step above (so two linked to the weather dataset and two to the flight delay set)..
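The four steps above can be sketched in a few lines of code. This is illustrative only; the helper names are my own, not ML Studio's:

```python
# Sketch of the four Apply Math Operation steps: times are stored as HHMM
# integers (e.g. 1437 for 14:37), and we reduce both datasets to whole
# hours so they can be joined.
import math

def flight_hour(crs_arr_time: int) -> int:
    return math.floor(crs_arr_time / 100)   # steps 1 and 2: divide, round down

def weather_hour(reading_time: int) -> int:
    return math.ceil(reading_time / 100)    # steps 3 and 4: divide, round up

print(flight_hour(1437))   # 14
print(weather_hour(1356))  # 14 - the 13:56 reading pairs with hour-14 arrivals
```

Because the readings are always taken at 56 minutes past the hour, rounding up always moves them to the next whole hour, giving us a clean integer key on both sides of the join.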

Step 1

Note the output mode of Inplace, which means that the value is overwritten and we get all the other columns in the output as well, so make sure this is set for each of the four steps.

Step 2

Step 3

Step 4

Now we can use the Join module (again just search for Join and drag it onto the design surface) to connect our data sets together.  Not surprisingly this module has two inputs and one output and we’ll see several modules with multiple inputs and outputs in future.  Connect the last module in each of our data set chains into the join and set the properties for the join as shown..

so on the left (flight data ) we have Month,DayofMonth,CRSArrTime,DestAirportID and on the right (the weather data) we have Month,Day,Time,AirportID.

I have to be honest, it took a while to get here and initially I got zero rows back.  Even now it’s not quite perfect as I have slightly more rows than I started with, which I have tracked down to the odd hour in the weather data having two readings.  Finding that kind of data problem is beyond what you can do in ML Studio in the preview, so in my next post I’ll show you your options for examining this data outside of ML Studio.

• How to train your MAML – Importing data

In my last post I split the process of using Microsoft Azure Machine Learning (MAML) down to four steps:

• Import the data
• Refine the data
• Build a model
• Put the model into production.

Now I want to go deeper into each of these steps so that you can start to explore and evaluate how this might be useful for your organisation.  To get started you’ll need an Azure subscription; you can use the free trial, your organisation’s Azure subscription or the one that you have with MSDN.  You then need to sign up for MAML as it’s still in preview (note using MAML does incur charges but if you have MSDN or a trial these are capped so you don’t run up a bill without knowing it)..

You’ll now have an extra icon for MAML in your Azure Management Portal, and from here you’ll need to create an ML Workspace to store your Experiments (models).  Here I have one already, called HAL, but I can create another if I need to by clicking on New at the bottom of the page, selecting Machine Learning and then following the Quick Create wizard..

Notice that as well as declaring the name and owner I also have to specify a storage account where my experiments will reside and at the moment this service is only available in Microsoft’s South Central US data centre.  Now I have somewhere to work I can launch ML Studio from the link on the Dashboard..

This is simply another browser based app which works just fine in modern browsers like Internet Explorer and Chrome..

There’s a lot of help on the home page, from tutorials and samples to a complete list of functions and tasks in the tool.  However this seems to be aimed at experienced data scientists who are already familiar with the concepts of machine learning.  I am not one of those, but I think this stuff is really interesting, so if this is all new to you too then I hope my journey through this will be useful.  I won’t be offended if you break off now and check these resources because you went to university and not to art college like me!

In my example we are going to look at predicting flight delays in the US based on one of the included data sets.  There is an example experiment for this but there isn’t an underlying explanation on how to build up a model like this so I am going to try and do that for you. The New option on the bottom of this ML studio screen allows you to create a new experiment and if you click on this you are presented with the actual ML studio design environment..

ML studio works much like Visio or SQL Server Integration Services, you just drag and drop the boxes you want on the design surface and connect them up but what do we need to get started?

MAML needs data and there are two ways we can import this – either by performing a data read operation from some source or by creating a dataset. At this point you’ll realise there are lots of options in ML Studio, so the search option is a quick way of getting to the right thing if you know it’s there.  If we type reader into the search box we can drag the Reader module onto the design surface to see what it does..

The Reader module comes up with a red X as it’s not configured, and to do that there is a list of properties on the right hand side of the screen.  For example if the data we want to use is in Azure blob storage then we can enter the path and credentials to load that in.  There are also options for an HTTP feed, SQL Azure, Azure Table Storage, HiveQuery (to access Hadoop and HDInsight) and PowerQuery. The PowerQuery name is a bit misleading as it’s actually a way of reading OData, of which Power Query is just one example. Having looked at this we’ll delete it and work with one of the sample data sets.

Expand the data sources option on the left and you’ll see a long list of samples, from IMDB film titles to flight delays and astronomy data. If I drag the Flight Delays Data dataset onto the design surface I can then examine it by right clicking on the output node at the bottom of it and selecting Visualize..

This is essential as we need to know what we are dealing with, and ML Studio gives us some basic stats on what we have..

MAML is fussy about its diet and here are a few basic rules:

• We just need the data that is relevant to making a prediction.  For example all the rows have the same value for year (2013) so we can exclude that column.
• There shouldn’t be any missing values in the data we are going to use to make a prediction, and 27,444 rows of this 2,719,418 row data set have missing departure values, so we will want to exclude those.
• No feature should be dependent on another feature, much as in good normalisation techniques for database design.  DepDelay and DepDel15 are related in that if DepDelay is greater than 15 minutes then DepDel15 = 1.  The question is which one is the best at predicting the arrival delay, specifically ArrDel15, which is whether or not the flight is more than 15 minutes late.
• Each column (feature in data science speak) should be of the same order of magnitude.
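The DepDelay/DepDel15 relationship mentioned in the rules above can be written down directly, which is exactly what makes the second column redundant (a sketch; the function name is mine):

```python
# Sketch of the dependency between the two departure-delay features:
# DepDel15 is derived entirely from DepDelay, so keeping both adds
# no new information to the model.
def dep_del_15(dep_delay_minutes: int) -> int:
    return 1 if dep_delay_minutes > 15 else 0

print(dep_del_15(30))  # 1 - flight left more than 15 minutes late
print(dep_del_15(5))   # 0
```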

However even after cleaning this up there is also some key data missing to answer our question “why are flights delayed?” It might be problems associated with the time of day or the week, the carrier, or difficulties at the departing or arriving airport, but what about the weather, which isn’t in our data set?  Fortunately there is another data set we can use for this – the appropriately named Weather dataset.  If we examine this in the same way we can see that it is for the same time period and has a feature for airport, so it will be easy to join to our flight delay dataset. The bad news is that most of the data we want to work with is of type string (like the temperatures) and there is redundancy in it as well, so we’ll have some clearing up to do before we can use it.

Thinking about my flying experiences, it occurred to me that we might need to work the weather dataset in twice, to get the conditions at both the departing and the arriving airport. Then I realised that any delay at the departing airport might already be dependent on the weather there, and we already have data for the departure delay (DepDelay), so all we need is one join – which we’ll look at in the next post in this series, where we prepare the data based on what we know about it.

Now we know more about our data we can start to clean and refine it, and I’ll get stuck into that in my next post. Just one thing before I go – we can’t save our experiment yet, as we haven’t got any modules on there to do any processing, so don’t panic, we’ll get to that next.

• Learning Machine Learning using a Jet Engine

In this post I want to try and explain what machine learning is and put it into context with what we used to do when analysing and manipulating data.  When I started doing all this the hot phrase was Business Intelligence (BI), but this was preceded by EIS (Executive Information Systems) and DSS (Decision Support Systems).  All of these were to a large extent looking backwards, like driving a car using just the rear-view mirror.  There was some work being done on looking ahead (predictive analysis), and this was typically achieved by applying data mining techniques like regression (fitting points to a line or curve).

Nowadays the hot topic is Machine Learning (ML), so is that just a new marketing phrase or is this something that’s actually new?  Data mining and ML have a lot in common: complex algorithms that have to be trained to build a model that can then be used against some set of data to derive something that wasn’t there before. This might be a numeric (continuous) value or a discrete value like the name of a group or a simple yes/no.  What I think makes ML different is that it’s there for a specific purpose, where data mining is more like a fishing exercise to discover hidden relationships, one use of which is predictive analytics.  So data mining could be considered a subset of ML, and one use of ML is to make predictions.

If I look at how Microsoft has implemented ML in Azure (which I will refer to as MAML), then there are a lot of processes around data acquisition before training and publishing occur.  In this regard we might compare it to a modern commercial jet engine.

• Suck the data into the ML workbench..

This can be done either from a raw source or via HDInsight (Hadoop on Azure), which means MAML can work with big data.  Note that in the jet engine diagram much of the air bypasses the engine; similarly, if we are using big data in ML we may ignore large chunks of it as not being relevant to what we are doing.  A good example is Twitter – most tweets aren’t relevant because they don’t mention my organisation.

• Squeeze the data.

If we haven’t sourced the data from something like HDInsight we need to clean it up to ensure a high quality output, and to understand its structure – type, cardinality etc.

• Bang.

In a jet engine we introduce fuel and get a controlled explosion; in ML this is where we apply our analysis to get results and provide momentum for the organisation.  Specifically, this is where we build and refine our analytical model by taking an algorithm, such as one used in data mining, and training it against a sample set of data. MAML has specific features to support this: a special split task for carving out data for model training, and an evaluation task that tells you how well your model is performing by visualising the output against an ideal (an upside-down L curve).
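The split-then-evaluate loop is worth seeing in miniature outside ML Studio. A hedged Python sketch — the “model” here is just a threshold on departure delay, a deliberately naive stand-in for the real algorithms, and the 70/30 split ratio is an assumption:

```python
import random

# Toy data: (departure delay in minutes, was the arrival > 15 min late).
# We know the rule that generated it; pretend we have to learn it.
random.seed(42)
data = [(dep, int(dep > 15)) for dep in range(0, 120, 3)]
random.shuffle(data)

# Like MAML's split task: carve out 70% for training, hold back 30%.
split = int(len(data) * 0.7)
train, test = data[:split], data[split:]

# "Training": pick the threshold that best separates the training labels.
best = max(range(60),
           key=lambda t: sum((dep > t) == bool(late) for dep, late in train))

# "Evaluation": score only the held-out rows, as the evaluation task
# would, so we measure the model rather than memorisation.
accuracy = sum((dep > best) == bool(late) for dep, late in test) / len(test)
print(best, accuracy)
```

The important habit is the same one MAML enforces: never judge a model on the data it was trained on.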

• Blow.

Having established that your model works you’ll want to unleash it on the world and share it, which is analogous to the thrust or work produced by a jet engine.  In MAML we can expose the model as a web service or a data set which can be used in any number of ways to drive a decision making process.
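Consuming such a web service is plain HTTP with a JSON body. A stdlib sketch of what a caller might build — the URL, key, and payload shape below are hypothetical placeholders for illustration, not the real MAML request format, which is defined when you publish:

```python
import json
import urllib.request

# Hypothetical endpoint and key - placeholders, not real MAML values.
SCORING_URL = "https://example.com/score"
API_KEY = "your-api-key-here"

# One "transaction": a single row of features to score.
payload = json.dumps({"Inputs": {"DepDelay": 20, "Month": 4}}).encode("utf-8")

request = urllib.request.Request(
    SCORING_URL,
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
)
# urllib.request.urlopen(request) would send it; the response body
# carries the prediction back to the calling application.
print(request.get_method())
```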

Jet engines don’t exist in isolation; typically they are attached to aircraft, and it is the aircraft that controls where the engine goes.  In ML we are going to have a project to set our direction, and ML might only be a small part of this, in the way that the product recommendation engine on a web site like Amazon is only a small part of the customer experience.  So, as with all the varying names for projects that use data to drive businesses forward, we need to be aware that the analytics bit is a small part of the whole; we need to be aware of data quality, customer culture and all of the other baggage that is essential to a successful implementation.  There is also one extra complication that comes from the data mining world: it’s not always possible to see how a result was arrived at, so we have to build trust in the results, particularly if they contradict our intuition and experience.

So the good news for us experienced BI professionals is that we can apply many of the lessons we have already learnt to ML, and with MAML we don’t need to know too much about the data mining algorithms unless we want to.

My title for this post is a pun on the Adventure Works databases and samples that have been in SQL Server for as long as I can remember.  There were also some data mining examples (as referenced in this old post), but these have not really moved on since 2011, when I last wrote about them, so you might be forgiven for thinking that data mining is dead as far as Microsoft is concerned.

However, since that time two big things have happened: the hyper-scale of cloud, and the rise of social media as a business tool rather than just a bit of fun for sharing strange pictures and meaningless chat.  Coupled together this is big data: masses of largely useless data, being produced faster than it can be downloaded, in a variety of semi- and unstructured formats – so Volume, Velocity and Variety.  Hidden in this data are nuggets of gold, such as brand sentiment, how users navigate our web sites and what they are looking at, and patterns that we can’t immediately recognise. Up until now, processing and analysing big data has really only been possible for large corporates and governments, as they have the resources and expertise to do it. However, as well as storing big data, the cloud can also be used to make big data analysis available to anyone who has the sense of adventure to give it a try – all that’s needed is access to the data and an understanding of how to mine the information.  The understanding bit of the equation is still a problem, though: this expertise, aka data science, is the bottleneck, and a quick search on your favourite jobs board for jobs in this area will confirm it.

So what has Microsoft done about this? What they have always done – simplify it, commoditise it, and integrate it.  If I go back to SQL Server 2000 we had DTS to load and transform data from any source and Analysis Services to slice and dice it from Excel, and then we got Reporting Services in 2002, all in one product.  In 2014 we have a complete set of tools to mash, hack and slice data into submission from any source, but these tools are no longer in SQL Server – they are in the cloud, specifically Azure and Office 365.   So what are the tools?

• HDInsight which is Hadoop running as a service in Azure  where you can build a  cluster as large as you need and feed it data with all the industry standard tools you are used to (Mahout and Pig for example).
• Microsoft Azure Machine Learning (MAML) can take data from anywhere, including HDInsight, and do the kind of predictive analytics that data mining promised but without the need to be a data scientist yourself.  This is because the MAML studio has a raft of the best publicly available algorithms and is also very easy to use from IE or Chrome – actually it reminds me a bit of SQL Server Integration Services, which is no bad thing.

Once you have trained your experiment (as they are called) you can expose it as a web service, which can then be consumed on a transaction by transaction basis to score credit, advise on purchase decisions etc. within your own web sites.

• Office 365 provides the presentation layer on the tools above with access to HDInsight data and machine learning datasets from the Power BI suite of tools.

In order to play with any of these there are two other things you’ll need: an MSDN subscription, which gives you free hours on Azure to try this out, and a copy of Office 2013 for the Power BI stuff. You’ll also want to watch the Microsoft Virtual Academy for advice and guidance, although at the time of writing there aren’t any courses on MAML as it’s so new.

Finally, a word of warning before you start on your own adventure – these tools can all encode a certain amount of business logic, so it’s important to understand the end-to-end changes you have made in building your models from source to target, and to consider where and when to use which tool.  For example, Power Pivot can itself do quite a lot of data analysis, but in a big data world it is best used as a front end for HDInsight or a machine learning experiment.

I will be going deeper into this in subsequent posts, as this stuff is not only jolly interesting, it’s also a huge career opportunity for anyone who loves mucking around with data.

• VDI–Shameless plug

I have spent the last six months writing my first book, Getting Started with Windows VDI, for Packt Publishing, basically because I wanted to create a one stop repository of the key elements you need to do this. The basic idea I had was to use only the Microsoft stack and to explore the wider aspects of VDI in the process:

• Building the VDI template using the Microsoft Deployment Toolkit (MDT) where this is normally used to create images that are deployed to physical desktops.
• Use of the Microsoft Desktop Optimization Pack (MDOP) to virtualize applications (App-V) and user settings (UE-V) so that users can get the resources they need whether they are using VDI or not.
• How to enable high availability for each of the component parts of VDI.  For example, more than one broker can be used by setting up a shared SQL Server database to store metadata about the deployment, such as who is logged into which desktop and which RemoteApp applications they can use.
• Using WSUS to update the deployment by integrating it with MDT to refresh the virtual desktops and also what needs to be done to patch the role services and hosts.

I have tried to be as honest as possible, as I would rather maintain my integrity and stop you wasting time trying to get stuff to work which is not optimal.  A good example is the big debate of Terminal Services (which I refer to as Session Virtualization) vs a collection of Virtual Machines each providing a virtual desktop, which is what most people mean by VDI.  Session Virtualization is still much more efficient than using individual VMs to provide virtual desktops – by a factor of 10 or 20 – so you won’t find me defending VDI. Rather, I have tried to explain when to use what, and that using a blend of both with a common infrastructure is the best approach, as this is very easy to do in Remote Desktop Services in Windows Server 2012R2.

I also wanted to make it a practical guide, but you’ll need a fairly beefy box (an i7 with 16GB RAM) and ideally an SSD to run all the VMs if you want to try it.  I have included as much PowerShell as possible, as you’ll want to automate as much of this as you can. I have also included a basic guide to licensing VDI, as this is one of the main barriers to adoption.

So I hope you get as much out of it as I did in writing it and just one thing to note  - If you buy the paper edition the code samples you need are here as some of my PowerShell scripts are too long to be inline in the book and  I am old enough to remember how tedious it was copying lines of code out of a magazine to get a game to run!

• Lab Ops – PowerShell 4 and PKI

In a Remote Desktop Services lab setup involving a gateway, to allow connections to our virtual desktops over the internet we are going to need an SSL certificate to secure traffic between the remote devices and the gateway.  Typically, in the past, we would have gone onto a server running IIS and made a self-signed certificate in IIS Manager. While this still works in IIS 8.5 (the one that ships with Windows Server 2012R2), there is a better way, and that is PowerShell 4, also included in Server 2012R2.   Why is it better? More control: specifically, we can add alternate names to correspond to all the DNS entries we might need in something like RDS.  We also get all the other benefits of using PowerShell, like repeatability, and like all new PowerShell commands it’s easy to understand what it’s doing..

To make a certificate:

New-SelfSignedCertificate `
-CertStoreLocation cert:\LocalMachine\My `
-DnsName rds-Broker.contoso.com,rds.contoso.com,rds-gateway.contoso.com

where the new certificate is going to be stored on the local machine in the “My” store; the first DNS name becomes the subject and the other two are alternate names. By the way, not everyone knows this, but the ` character is the line continuation character, which means I can format the code so that it’s easier for you to read.

The new certificate is not much use where it is, so we can also export it, and if we want the private key we’ll need to password protect it. First we need to find it, and to do that we can simply browse the certificates just as we might for any other kind of file on a hard drive, except that for certificates we are navigating the cert: drive rather than the C: drive.  Get-ChildItem is the proper PowerShell command for navigating folders of any kind, but it also has aliases (dir, ls, gci etc.).  I mention this because in some IT departments it is considered best practice to use the full PowerShell commands rather than aliases, to provide clarity – perhaps in the same way as we don’t use the word ain’t in an English class!

$Cert = Get-ChildItem -Path cert:\localmachine\My | `
where Subject -EQ "CN=rds-Broker.contoso.com"

Now we can set up a password and export it:

$pwd = ConvertTo-SecureString -String "Passw0rd!" -Force -AsPlainText
Export-PfxCertificate `
-Cert $Cert `
-FilePath '\\rds-DC\c$\RDS-Web self signed certificate.pfx' `
-Password $pwd

We could then import this into our VDI deployment with PowerShell as well, but there is a simple UI in the deployment properties in Server Manager.

where we can check our certificates are installed and recognised as part of the deployment.  Of course there is proper PowerShell for this..

Set-RDCertificate `
-Role RDGateway `
-ConnectionBroker RDS-Broker.contoso.com `
-ImportPath 'C:\RDS-Web self signed certificate.pfx' `
-Password $pwd

which not only puts the certificate where it needs to go but also makes the RD Broker aware of it.

Creating certificates is also going to be useful if we want to manage any evals, demos and labs we might want to work with in Azure, but that is a story for another post, not least because I need to spend more time in the cloud myself!

In the meantime do have a look at the PowerShell content in MVA, and if you aren’t running Server 2012R2 you can install PowerShell 4 on older operating systems, back to Windows Server 2008 R2/Windows 7, from here

• Microsoft VDA and Windows Embedded – A game changer for VDI?

If you are following me on Twitter then you may have noticed that a few of my friends are reviewing a book I am writing on VDI for Packt. I wanted to write it as a one stop shop on all you need to know about implementing Microsoft based VDI, using the various Remote Desktop Services server roles in Windows Server 2012R2 with Windows 8.1 as the virtual desktop OS. You would be forgiven for thinking that the hardest part of this book would be the last chapter, Licensing, which I wanted to include precisely because it is widely touted as being hard and because you do need to know this stuff to implement it.

In the course of my research and my day job I have been doing some work with 10ZiG, and in fact it was as a result of some events we did together that I thought Microsoft VDI needed to be written up properly and in one place. Prior to this I thought that thin clients were pretty much of a muchness, and I certainly didn’t appreciate that they came in different types or how far they have come on since I joined Microsoft. So I have asked Kevin Greenway from 10ZiG to explain the world of the thin client and how your choice might affect the licensing..

The ever growing debate over which flavour of Thin Client is the right choice in relation to cost, performance, user experience, peripheral support, management, etc, etc, continues to fill blogs and review sites. Recent news from Microsoft on changes to Virtual Desktop Access (VDA) licensing to Windows Embedded throws a really interesting perspective on using a Windows Embedded based Thin Client for Virtual Desktop Infrastructure (VDI).

VDA, Thin Clients and PC’s

As of today, if you want to use a Thin Client – which includes ‘Zero’, ‘Linux’ or ‘Windows Embedded’ based Thin Clients – in conjunction with Virtual Desktop Infrastructure (VDI), you are also required to use Virtual Desktop Access (VDA) licensing. VDA licensing is charged at \$100 per year for each device using this license. Couple that with the initial acquisition cost of any of the above types of Thin Client, and the costs suddenly start to rack up and can exceed those of a traditional PC.
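To see how the costs rack up, here is a back-of-the-envelope comparison. The hardware prices below are invented purely for illustration; the only real figure is the \$100-per-year VDA charge from the licensing described above:

```python
# Hypothetical five-year cost per endpoint. The device prices are
# made-up illustrative figures; $100/year VDA is from the licensing
# terms described above.
YEARS = 5
VDA_PER_YEAR = 100

thin_client_hw = 300   # assumed thin client acquisition price
pc_hw = 500            # assumed PC price (covered by SA, so no VDA)

thin_client_total = thin_client_hw + VDA_PER_YEAR * YEARS
pc_total = pc_hw

# Despite the cheaper hardware, the recurring VDA charge can push the
# thin client's total cost past the PC's.
print(thin_client_total, pc_total)
```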

The VDA licensing does not apply if you use Microsoft Software Assurance (SA), which also includes some desktop virtualisation rights. Most commonly if a Windows PC is used as your endpoint for connecting to VDI and you have SA, this negates the need for VDA licensing. The downside to this is that by using a Windows PC as an endpoint, this doesn’t leverage any of the true benefits of VDI, most specifically the system administrator is now administering two desktops instead of one. There are options to alleviate this somewhat with products, such as Windows Thin PC*, but what if you want to realise the benefits of Thin Clients? These include Manageability, Ease of Deployment, Small Footprint, Power Savings, and Lifespan due to no moving parts.

*This only applies to repurposing windows OS, not OS agnostic repurposing solutions.

This is where the recent news from Microsoft comes into play.

VDA, Software Assurance and Windows Embedded Thin Clients

If you use Microsoft Software Assurance and you have a Windows Embedded Thin Client (Windows Embedded 7/Windows Embedded 8), then you can now move this into your SA, thus negating the need for VDA licensing. This means the total acquisition cost is potentially lower than that of equivalent PCs, and even Linux based Thin Clients or Zero Clients, due to the annual charge of VDA.

Windows Embedded, Linux or Zero

As mentioned there is always the ongoing debate over which flavour of Thin Client is right. One of the biggest arguments against Windows Embedded based Thin Clients has traditionally been the Cost, User Experience (We’ll cover this one in a moment) and Manageability. However the biggest argument for use of Windows Embedded Thin Clients, is that typically the majority of VDI vendors including Microsoft, Citrix, and VMware will focus their efforts primarily around the compatibility, support and feature set available within the Windows version of their clients for connecting to VDI.

You typically see a greater feature set and an earlier adoption of new features within the relevant Windows clients than you will against, say, Linux, Android or iOS equivalents. Additionally, the ever growing list of peripherals required to either work locally within the Thin Client or be redirected into the VDI session will typically be more pain free from a Windows based Thin Client than from an equivalent Linux or Zero Client.

More recently we’ve also started to see more and more emergence of client based plugins to redirect Real Time Transport Protocols (RTP/RTVP), such as the Microsoft Lync 2013 VDI plugin, which is only available as a Windows Plugin. We also have the various Media offloading capabilities such as those within the RemoteFX, Citrix HDX and VMware PCoIP stacks, which often still result in a greater tick list of features within these stacks from a Windows Client than their Linux or Zero counterparts.

Let’s shift back to the topic of User Experience. Whilst adopting VDI, one of the easiest ways to get user buy-in is ensuring that their user experience, including access and entry, remains the same as it was with a PC. You want to ensure ultimately that the user only has access to their VDI desktop, not the local operating system. This way there is no retraining involved, or risk that the user can access/break the local operating system. The end goal is that their access device serves simply as a launch pad to their VDI desktop, whilst also matching the performance that they achieved from their local PC.

This is where Windows Embedded 8 really comes into great effect. Windows Embedded 8 includes some really great features including HORM (Hibernate Once, Resume Many) for shrinking boot speeds, as well as Unified Write Filter (UWF) for protecting Disk Volumes/Folders/Registry. Embedded Lockdown Manager is also available for Windows Embedded 8 and includes several lockdown features. Examples of features include Keyboard Filter, Gesture Filter, USB Filter and most prolifically the Shell Launcher. The Shell Launcher can be configured to provide the user with a launch pad to their desired VDI session and restricts access to the underlying local operating system.

There are several management tools available for managing Windows Embedded 8, including Microsoft based and 3rd Party tools such as those provided by 10ZiG Technology within Windows Embedded 8 based Thin Clients (5818V/6818V) and the Enterprise Free 10ZiG Manager.

[added 12/05/14 by Andrew] the official Microsoft guidance on licensing Windows 8  embedded can be found here.

Conclusion

To conclude: this exciting news from Microsoft adds Windows Embedded into Software Assurance (SA), negating the need for a Virtual Desktop Access (VDA) license and thus dramatically reducing the overall acquisition cost per Thin Client device against a PC. Additionally, the adoption of new and exciting features into Windows Embedded 8 heralds a potential shift in the VDI/Thin Client space towards Windows Embedded 8.

Couple it all together with HORM, UWF, ELM and client offload techniques, powered by modern multi core CPU Thin Clients such as those from 10ZiG Technology (5818V/6818V), and we are now equipped with what is fast becoming a hybrid of traditional Thin Client and PC technology in one.

It’s an interesting space to watch in the ever evolving world of VDI and Thin Client Technology.

For more on VDI there’s some great content on the Microsoft Virtual Academy.

• Lab Ops Redux setting up Hyper-V in a hurry

I am at Manchester United as I write this and one of the delegates wanted to quickly try out Hyper-V in Windows Server 2012R2.  I thought I had an all up post on that but it turns out I don’t, so Nicholas Agbaji this is just for you!

You’ll need a laptop/desktop to work on, running Windows 7/Windows Server 2008R2 or later, that’s capable of running Hyper-V and has virtualization enabled in the BIOS

• You’ll need to download a copy of Windows Server 2012R2 and a special PowerShell script Convert-WindowsImage.ps1 from the TechNet Gallery.
• Run the PowerShell script as follows.

.\Convert-WindowsImage.ps1 -SourcePath <Path to your iso> -Size 50GB -VHDFormat VHD -VHD "C:\WS2012R2.VHD" -Edition "ServerStandardEval"

Note: If you are running on Windows 8 or Windows Server 2012 (or later) you can use the newer VHDX format for virtual hard disks

• We now have a VHD with a sysprepped clean copy of Windows Server 2012R2, and Windows 7/2008R2 and later allow us to boot from a VHD just like the one we just made.
• To boot from VHD we need to mount the VHD. In Windows 8/2012 you can simply click on a VHD to mount it, however in Windows 7/2008R2 then we’ll need to open an elevated command prompt and do this manually:
• diskpart
• select vdisk file="<path to your VHD>"
• attach vdisk
• We’ll now get an additional drive, say drive H:, and then we’ll need to edit the boot database from an elevated command prompt to add a new entry registering the VHD:
• bcdboot h:\windows
• We also need to edit the BCD so that Hyper-V is enabled in our VHD:
• bcdedit /set "{default}" hypervisorlaunchtype auto
• Optionally you could describe your new boot entry with
• bcdedit /set "{default}" description "Windows Server 2012R2 Lab Ops"
• Reboot your server/laptop and you’ll have an extra boot option to boot into Windows Server.
• The final step is to add in the Hyper-V role, either from Server Manager or with PowerShell..

Once you have this VHD set up you can boot into your OS, back up the VHD you made, and possibly reuse it on another machine.  So good luck Nicholas, and thanks for spending the day with us!

• Lab Ops 19 – Software Defined Networking - where’s my stuff

Well done if you got through my last post – I bolted a lot of stuff together and it all looks a bit confusing, particularly if you are moving from Hyper-V to System Center Virtual Machine Manager 2012 (VMM).  For a start, where are my network settings? When I was testing my rig I found I couldn’t swap the network connectivity on a test VM from an internal virtual switch to my shiny new VM Network, and when I looked at the error it said I couldn’t change the MAC address of a running VM.  What was going on here was that Hyper-V had assigned a static MAC address at random to my VM, and by assigning the VM to my new VMM infrastructure, VMM wanted to assign a new one.  This got me wondering about MAC address spoofing, which can be important for legacy applications; in Hyper-V you can set this from the advanced features of the network adapter on a VM. In VMM, MAC address spoofing is part of a Virtual Port Profile, which also allows us to set the other advanced networking features in Hyper-V, like bandwidth management, guest NIC teaming, and router and DHCP guard..

Virtual Port Profiles are just logical groupings of these things, which you can assign to a logical switch, and there are off the shelf ones like Guest Dynamic IP and the Host Management profile we used last time.  This might seem an unnecessary extra step, but now that we live in a world of NIC teaming we need to identify the different types of traffic flowing down a fat connection.  We can also see Uplink Port Profiles in this list..

such as the RDS-FabricUplink I created in my last post, which allows us to connect a logical network and specify how that port gets teamed.  A logical switch has one or more uplink port profiles connected to it, and has virtual ports that specify one or more port classifications to describe what traffic the switch can handle.  At this point we can assign the switch to as many hosts as we want and each one will inherit all these properties:

• The logical switch appears as a Virtual Switch in Hyper-V and is bound to a NIC or team on that host.  When we do that we can see it’s the Uplink Port Profile that is passed to the NIC
• Underneath that we have one or more Virtual Network adapters which are associated to a VM Network and a Virtual Port Profile.

When we attach a VM to a VM Network many of the properties are now greyed out (like those MAC address settings).

Anyway, I am now ready to create a couple of VM Networks for Dev and Test on which I can put identical VMs without them seeing each other, but which also allow them to span hosts ..

To do this I need to create the two VM Networks and integrate them into the networking fabric, by associating them with my logical network (RDS-FabricNet) I created in my last post, and here’s the PowerShell:

Import-Module VirtualMachineManager
$logicalNetwork = Get-SCLogicalNetwork -Name "RDS-FabricNet"

# 2 x VM Networks, each with the same subnet
$vmNetwork1 = New-SCVMNetwork -Name "DevVMNet" -LogicalNetwork $logicalNetwork -IsolationType "WindowsNetworkVirtualization" -Description "Developer Virtual Network" -CAIPAddressPoolType "IPV4" -PAIPAddressPoolType "IPV4"
$subnet1 = New-SCSubnetVLan -Subnet "10.10.10.0/24"
New-SCVMSubnet -Name "DevVMNet Subnet 10" -VMNetwork $vmNetwork1 -SubnetVLan $subnet1
$vmNetwork2 = New-SCVMNetwork -Name "ProductionVMNet" -LogicalNetwork $logicalNetwork -IsolationType "WindowsNetworkVirtualization" -Description "Production Virtual Network" -CAIPAddressPoolType "IPV4" -PAIPAddressPoolType "IPV4"
$subnet2 = New-SCSubnetVLan -Subnet "10.10.10.0/24"
New-SCVMSubnet -Name "ProductionVMNet Subnet 10" -VMNetwork $vmNetwork2 -SubnetVLan $subnet2

At this point you might be wondering what to do about DNS and AD in this situation, as we would normally assign fixed IP addresses to these.  The answer is to start these VMs first and they’ll get the lowest addresses by default, where x.x.x.1 is reserved on the subnet for the switch. This is similar to Azure, except that Azure hands out x.x.x.4 as the lowest address, as there are three reserved addresses on a subnet.
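The address arithmetic is easy to check with Python's ipaddress module — a neutral illustration of the subnet above, nothing VMM- or Azure-specific:

```python
import ipaddress

# The 10.10.10.0/24 subnet used for the VM Networks above.
subnet = ipaddress.ip_network("10.10.10.0/24")
hosts = list(subnet.hosts())      # usable addresses .1 through .254

first = hosts[0]       # 10.10.10.1 - reserved for the switch in VMM
first_vm = hosts[1]    # so the first VM started gets 10.10.10.2
azure_first = hosts[3] # Azure reserves three, so VMs start at 10.10.10.4

print(first, first_vm, azure_first)
```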

Anyway, the other thing we’ll want to do is specify the new traffic that will be carried over our virtual switch by these VM Networks, and to do that we’ll add in another port profile.

# $logicalSwitch is the logical switch created in the last post
$portClassification = Get-SCPortClassification -Name "Guest Dynamic IP"
$nativeProfile = Get-SCVirtualNetworkAdapterNativePortProfile -Name "Guest Dynamic IP"
New-SCVirtualNetworkAdapterPortProfileSet -Name "Guest Dynamic IP" -PortClassification $portClassification -LogicalSwitch $logicalSwitch -RunAsynchronously -VirtualNetworkAdapterNativePortProfile $nativeProfile

Our design now looks like this..

We can then assign these VM Networks to new or existing VMs, and they will be available on any hosts we manage in VMM, provided we connect those hosts to our virtual switch. To do that we need a process to create VMs, and for that we need somewhere to put them – so next up, storage in VMM.

• Lab Ops 18 Getting started with Software Defined Networking

In my last post I finished up with my host under the management of Virtual Machine Manager (VMM), and that was about it. As with Hyper-V, I can’t really use VMM until I have my fabric configured, and after adding in hosts the first thing we need to do is look at networking.  To recap, my current setup looks like this

Where RDS-Switch is an internal virtual switch, and my RDS-DC is my DC & DHCP server with one scope of 192.168.10.200-254.  VMM has a dynamic ip address and is also hosting SQL Server for its own database.

If I go to VMM and go to VMs & Services | All Hosts | Contoso | Orange (which is my Dell Orange laptop) I can right click and select View Networking. If I look at the Host Networks all I can see are the physical NICs; if I look at VM Networks all I see is my VMs but no networks, and the Network Topology screen is entirely blank – so what’s going on?  Basically things are very different in VMM than they are in Hyper-V, and frankly we don’t want to use Hyper-V for managing our networks any more than a VMware expert would configure networks on individual ESXi hosts.  In our case we use VMM to manage virtual switches centrally, where in VMware distributed switches are controlled in vCenter.  So my plan is to use VMM to create a network topology that reflects what my VMs above are for: to manage my datacentre.  Later on I’ll add in more networking which will enable me to isolate my services and applications from this, in the same way that cloud providers like Azure hide the underlying infrastructure from customers.

If we look at the Fabric in VMM and expand Networking we have 8 different types of object we can create, and there is a ninth, VM Networks, that shows up under VMs and Services – so where to begin?

Your starter for ten (or nine in this case) is the TechNet guide Configuring Networking in VMM, and once you dig into that you realise that VMM wants to control not just the switching but IP management as well.  The core of all of this is the Logical Network and Virtual Networks, which are just containers for various properties including sites and IP pools.   I am going to start simple: as I only have one host, I’ll just create the first object we need, a Logical Network that has one connected network.    For now I am going to ignore the sub-options to get us started.

note this is a screen grab from the end of the process

I can’t create a Logical Network without a Network Site, which has a specific subnet and optionally a VLAN set..

The small print. As per usual in this series I am going to share the PowerShell to do this.  VMM is very good at letting you see the equivalent script when doing something, however I have modified what VMM spits out to make it easier to read, while ensuring it still works.  This is a good thing as it’s easy to cut and paste from this post and get exactly the same results, and you can see how the variables are passed and related to each other.  Note the raw PowerShell from VMM often runs everything as a job, which is a useful trick it has, and in all cases all our work gets logged in VMM whether we use the UI or the VMM cmdlets. Note the segments in this post all use the same variables, so you will need to run them in the order shown.

The equivalent PowerShell is:

$logicalNetwork = New-SCLogicalNetwork -Name "RDS-FabricNet" -LogicalNetworkDefinitionIsolation $false -EnableNetworkVirtualization $true -UseGRE $true -IsPVLAN $false
$allSubnetVlan = New-SCSubnetVLan -Subnet "192.168.10.0/24" -VLanID 0
$allHostGroups = Get-SCVMHostGroup -Name "All Hosts"
$logicalNetworkDefinition = New-SCLogicalNetworkDefinition -Name "RDS-FabricNet_Site192" -LogicalNetwork $logicalNetwork -VMHostGroup $allHostGroups -SubnetVLan $allSubnetVlan -RunAsynchronously

Note that in the code above a network site is referred to as a Logical Network Definition.

VMs are connected to virtual machine networks, and we had the option to create one of these with the same name when we created the logical network. In this case that would have been fine for what I am doing here, as my two VMs are actually there to manage the hosts in much the same way as a VMware appliance does.   So I am going to create a virtual network that is directly connected to the logical one..

$vmNetwork = New-SCVMNetwork -Name "RDS-FabricVNet" -LogicalNetwork $logicalNetwork -IsolationType "NoIsolation"

However this and the logical network and site are just containers in which we put our settings as points of management.  We now need to create an uplink port profile from Fabric | Networking | Port Profiles.  This needs to be an Uplink Port Profile, and when we select that option we can describe how the underlying NIC can be teamed directly from here, rather than doing it in Server Manager on each host.  We then simply select our network site (RDS-FabricNet_Site192) and we are done..

The one line of PowerShell for this is..
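Something along these lines should do it, assuming the network site created above; the profile name and the teaming options below are my own choices rather than values from VMM:

```powershell
# Create an uplink port profile tied to our network site
# ("RDS-FabricUplink" and the teaming settings are illustrative)
$uplinkPortProfile = New-SCNativeUplinkPortProfile -Name "RDS-FabricUplink" `
    -LogicalNetworkDefinition $logicalNetworkDefinition `
    -EnableNetworkVirtualization $true `
    -LBFOLoadBalancingAlgorithm "HostDefault" `
    -LBFOTeamMode "SwitchIndependent"
```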

The next piece of the puzzle is to create a Logical Switch.  This is a logical container that emulates a top of rack switch in a real server room.  It can have a number of virtual ports but, unlike VMware, these aren’t there to limit numbers; they are there to manage traffic through the use of port classifications.  We’ll need at least one of these and I am going for Host management for the port classification as that is what all of this is for..

The PowerShell is:

$virtualSwitchExtensions = Get-SCVirtualSwitchExtension -Name "Microsoft Windows Filtering Platform"
$logicalSwitch = New-SCLogicalSwitch -Name "RDS_FabricSwitch" -Description "" -EnableSriov $false -SwitchUplinkMode "NoTeam" -VirtualSwitchExtensions $virtualSwitchExtensions

We should also create a virtual port with the Host management port classification:

The PowerShell is..

$portClassification = Get-SCPortClassification -Name "Host management"
$nativeProfile = Get-SCVirtualNetworkAdapterNativePortProfile -Name "Host management"
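Those two Get- lines just fetch the building blocks; they are then tied to a virtual port on the logical switch with something like this (the port name is my choice):

```powershell
# Attach a virtual port with the Host management classification
# to our logical switch (the name here is illustrative)
New-SCVirtualNetworkAdapterPortProfileSet -Name "Host management" `
    -PortClassification $portClassification `
    -VirtualNetworkAdapterNativePortProfile $nativeProfile `
    -LogicalSwitch $logicalSwitch
```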

We can now apply this logical switch to our hosts by going to a host’s properties and navigating to Hardware | Virtual Switches and adding a new Virtual Switch | New Logical Switch.  Immediately our RDS-FabricSwitch is selected and we can see that our adapter (physical NIC) is connected to the uplink we have created through this switch.

However that is just like using virtual switches in Hyper-V Manager; what we also need to do is add in a virtual network adapter as in the diagram above.  This picks up the VM Network we already created (RDS-FabricVNet). Notice I can have all kinds of fun with IP addresses here..

BTW I should have set the Port Profile to the only option available, Host Management, in the above screen shot. If I look at Hardware | Network adapters I can also see my logical network and site..

The equivalent PowerShell to connect the logical switch virtual network adapter to the host is ..

#My Host is called Orange

$vmHost = Get-SCVMHost -ComputerName Orange

#Note you’ll need to at least change the Get-SCVMHostNetworkAdapter –Name to reflect the NIC in your host.
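A sketch of the rest, assuming the logical switch from earlier; both the NIC name and the uplink port profile set name below are placeholders for whatever yours are called:

```powershell
# Find the physical NIC on the host (change the name to match your hardware)
$networkAdapter = Get-SCVMHostNetworkAdapter -VMHost $vmHost -Name "Intel(R) Ethernet Connection"
# Bind the NIC to the uplink port profile set created for our network site
$uplinkPortProfileSet = Get-SCUplinkPortProfileSet -Name "RDS-FabricUplink"   # your uplink profile's name here
Set-SCVMHostNetworkAdapter -VMHostNetworkAdapter $networkAdapter -UplinkPortProfileSet $uplinkPortProfileSet
# Deploy the logical switch to the host on that NIC
New-SCVirtualNetwork -VMHost $vmHost -VMHostNetworkAdapters $networkAdapter -LogicalSwitch $logicalSwitch
```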

Now we can finally see what on earth we have been doing as this Logical switch we have created is now visible in Hyper-V Manager..

and if we look at its extensions we can see a new Microsoft VMM DHCPv4 Server Switch Extension in here, which allows VMM to control all the virtual switches.

The tricky part now is to add the VMs to the virtual network. This isn’t tricky because it’s hard; it’s tricky because if VMM and the DC lose sight of each other, or VMM can’t see the host, then we are in trouble, as we can’t easily change these settings in the UI or PowerShell; plus we’ll need to flush DNS as DHCP will kick in and change things as well.  However what we are doing here is moving VMs that are essentially part of the datacentre fabric.  Other VMs would not be affected like this; indeed that’s the point, we should be able to move VMs across hosts and datacentres without affecting their connectivity.

Here is a diagram of what we have created to contrast with what was above.

This has been quite a lot of work to achieve very little, but we now have the foundations in place to quickly add in more networks and more hosts and to isolate and manage our networking without needing to use VLANs.  However if there are already VLANs in existence then all of this will work just fine as well (for more on that check this post from the VMM engineering team).

Now that I have a basic network for VMM and the hosts, I need to do some of this again for the VMs that will serve out applications.  Until next time, have a go at this stuff, read the guidance on TechNet, and create a lab like this to get your head around it.

• Lab Ops part 17 - Getting Started with Virtual Machine Manager

In my last post I talked a lot about the way all of System Center works; now I want to look at one part of it, Virtual Machine Manager (VMM), as VMs in a modern data centre are at the core of the services we provide to our users. In the good old days we managed real servers that had real disks connected to real networks, and of course they still exist, but consumers of data centre resources, whether that’s other parts of the IT department or business units, will only see virtual machines with virtual disks connected to virtual networks.  So administration in VMM is all about translating all the real stuff (or fabric as it’s referred to) into virtual resources. So before we do anything in VMM we need to configure the fabric, specifically our physical hosts, networking and storage.

There are good TechNet labs and MVA courses to learn this stuff, but I think it’s still good to do some of this on your own server, so you can come back to it again whenever you want to, especially if you are serious about getting certified.  So what I am going to do in the next few posts is explain how to use the minimum of kit to try some of this at home.  I generally use laptops which are sort of portable, typically with 500GB plus of SSD and at least 16GB of RAM; half of those resources should be enough.

I am going to assume you have got an initial setup in place:

• Copy the above VHDX to the root of a suitable disk on your host (say e:\) and mount it(so it now shows up as a drive for example X:)
• from an elevated prompt type BCDBoot X:\windows
• Type BCDEdit /set “{default}” hypervisorlaunchtype auto
• Type BCDEdit /set “{default}” description “Hyper-V Rocks!”
• reboot the host and select the top boot option which should say Hyper-V Rocks!
• from PowerShell type Install-WindowsFeature Hyper-V –IncludeManagementTools –Restart
• the machine should restart and you are good to go.
• A domain controller as per my earlier post on this, with the host and the VMM VM belonging to this domain.

The first thing you want to do in VMM is to configure Run As Accounts.  One of the reasons my PowerShell scripts in this series are not production ready is that they have my domain admin password littered all over them, which is not good.  VMM allows us to create accounts used to authenticate against anything we might need to, which could be a domain account, a local account on a VM (be that Windows or Linux), or access to a switch or a SAN.  So let’s start by adding in domain admin. We can do this from Settings | Security | Run As Accounts or with the VMM PowerShell cmdlets, and now all I have to do is open PowerShell from inside VMM or use PowerShell ISE, and either way there’s no more of that mucking about importing modules to do this..

#New Run As Account in VMM
$credential = Get-Credential
$runAsAccount = New-SCRunAsAccount -Credential $credential -Name "Domain Admin" -Description "" -JobGroup "cb839483-39eb-45e0-9bc9-7f482488b2d1"

Note this will pop up the credential screen for me to complete (contoso\administrator is what I put in). The JobGroup at the end puts this activity into a group that ends up in the VMM job history, so even if we use PowerShell in VMM our work is recorded, which is a good thing.  We can get at that job history with Get-SCJob | Out-GridView

We’ll probably want to do the same for a local account, so just Administrator and a password, so that any VMs we create will have the same local admin account. Again this will pop up the credential screen for me to complete.
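A minimal sketch of that second account (“Local Admin” is just the name I chose):

```powershell
# Run As Account for the local Administrator on the VMs we create
$localCredential = Get-Credential   # enter Administrator plus the password
New-SCRunAsAccount -Credential $localCredential -Name "Local Admin"
```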

Now we can consume that domain account to add and manage our host.  In VMM we would do this from Fabric | Servers | All Hosts, and before we add in a server we can create host groups to manage lots of hosts (I have one already called Contoso). To add a host, right click on All Hosts or a host group and select Add Hyper-V Hosts or Clusters, and the equivalent PowerShell is ..

<#Note in the raw PowerShell from VMM there is a load of IDs included, but this will work as long as we don’t use duplicate names in accounts etc.  #>

$runAsAccount = Get-SCRunAsAccount -Name "Domain Admin"

$hostGroup = Get-SCVMHostGroup -Name "Contoso"
Add-SCVMHost -ComputerName "clockwork.contoso.com" -RunAsynchronously -VMHostGroup $hostGroup -Credential $runAsAccount

In the background the VMM agent has been installed on our host and it has been associated with this instance of VMM.  You should now see the host in VMM along with all the VMs that are on it, so not bad for a line of PowerShell! We can also see the properties of our host by right clicking on it, and of special interest are the virtual switches on the host; if you have used my script without modifying it you’ll see a switch called RDS-Switch on the host.  We can also see the local storage attached to our host here.

So now we have a basic VMM environment we can play with: a host, a DC and VMM itself, so what do we need to do next?  If this was vCenter we would probably want to set up our virtual switches and port groups, so let’s look at the slightly scary but powerful world of virtual networks next.

• Virtual Machine Manager 2012R2 Templates cubed

Up until now in my Lab Ops series I have been using bits of PowerShell to create VMs in a known state, however this requires a certain amount of editing and I would have to write a lot more code to automate it and properly log what is going on. Plus I would probably want a database to store variables in and track those logs, and some sort of reporting to see what I have done.  That’s pretty much how Virtual Machine Manager (VMM) works anyway, so rather than recreate it in PowerShell I’ll just use the tool instead.  VMM not only manages VMs, it also manages the network and storage used by those VMs. However before we get to that we need to create some VMs to play with, and before we can do that we need to understand how templates are used.  It’s actually similar to what I have been doing already: using a sysprepped copy of the OS, configuring that for a particular function (file server, web server etc.) and building a VM around it.  It’s possible just to use the Windows Server 2012R2 evaluation ISO and get straight on and build a template, and from there a VM.  However VMM also has the concept of profiles, which are sort of templates used to build the templates themselves. There are profiles for the Application, the Capability (which hypervisor to use), the Guest OS and the Hardware. Only the hardware profile will look familiar if you have been using Hyper-V Manager, as this has all the VM settings in it.  The idea behind profiles is that when you create a VM template you can simply select a profile rather than filling in all the settings on a given VM template tab, and in so doing you are setting up an inheritance to that profile. However the settings in Application and Guest OS profiles are only relevant when creating a Service Template. So what are those, and why all this complexity when I can create a VM in a few minutes with a bit of PowerShell?

For me Service Templates are the key to what VMM is all about. If you are a VMware expert they are a more sophisticated version of Resource Groups, and before you comment below please bear with me.  A service template completely describes a service and each of its tiers as one entity..

a sample two-tier service

If I take something like SharePoint, there are the Web Front Ends (WFE), the service itself (the middle tier) and back end databases which should be on some sort of SQL Server cluster.  The Service Template allows us to define maximum and minimum limits for each tier in this service and to declare upgrade domains, which will enable you to update the service while it is running by only taking parts of it offline at a time.  The upper and lower VM limits on each tier enable you to scale the service up and down based on demand or a service request, by using other parts of System Center.    There might well be a way to do this sort of thing with Resource Groups and PowerCLI in vCenter, but then there are those application and hardware profiles I mentioned earlier.  They mean that I can actually deploy a fully working SharePoint environment from a template, including a working SQL Server guest cluster where the shared storage for that cluster is on a shared VHDX file.

Services created by these templates can then be assigned to clouds, which are nothing more than logical groupings of compute, networking and storage splashed across a given set of hosts, switches and storage providers and assigned to a given group of users who have delegated control of that resource within set limits.

So templates might seem to be piled on top of one another here, but you don’t have to use all of this capability if you don’t want to. However if you do have a datacentre (the internal Microsoft definition of a datacentre has more than 60,000 physical servers) then this power is there if you need it.

If you haven’t a spare server and a VMM lab setup then you can just jump into the relevant TechNet Lab and see how this works.

• System Center: Use the right tool for the right job

This post isn’t really about Lab Ops as it’s more theory than Ops, but before I dive in to the world of what you can do with System Center I wanted to stress one important concept:

Use the right tool for the right job.

That old saying that when you have a hammer everything looks like a nail can harm your perception of what SC is all about.  Perhaps the biggest issue here is simply to have a good handle on the suite, which is not easy, as traditionally many of us will have been an expert in just one or two components (which used to be whole products). So here’s how I think about this..

• Virtual Machine Manager controls the fabric of the modern virtualized data centre and allows us to provision services on top of that
• App Controller is there to allow us to control services in Azure as well as what we have in our datacentre
• Configuration Manager allows us to manage our users, the devices they use and the applications they have. It can also manage our servers but actually in the world of the cloud this is better done in VMM
• Then it’s important to understand what’s going on with our services and that’s Operations Manager.
• Rather than sit there and watch Operations Manager all day, we need to have an automated response when certain things happen and that’s what Orchestrator is for.
• In an ITIL service delivery world we want change to happen in a controlled and audited manner, whether that’s change needed to fix things or change because somebody has asked for something.  That’s what Service Manager is for, and so if something is picked up by Operations Manager that we need to respond to, this would be raised as an incident in Service Manager, which in turn would automatically remediate the problem by calling a process in Orchestrator, which might do something in Virtual Machine Manager for example.

The reason that SC is not fully configured and integrated out of the box is simply down to history and honesty. Historically SC was a bunch of different products which are becoming more and more integrated.  Honesty comes from the realisation that in the real world, many organisations have made significant investments in infrastructure and its management which are not Microsoft based.  For example, if your helpdesk isn’t based on Service Manager then the other parts of SC can still to a large extent integrate with what you do have, and if you aren’t using Windows for virtualization or your guest OS then SC can still do a good job of managing VMs, and letting you know whether the services on those servers are OK or not as the case may be.

Another important principle in SC is that it’s very important not to go behind the back of SC and use tools like Server Manager and raw PowerShell to change your infrastructure (or fabric as it’s referred to in SC).  This is important for two reasons: you are wasting your investment in SC, and you have lost a key aspect of its capabilities such as its audit function.  Notice I used the term “raw PowerShell”; what I mean here is that SC itself has a lot of PowerShell cmdlets of its own, however these make calls to SC itself, and so if I create a new VM with a Virtual Machine Manager (VMM) PowerShell cmdlet then the event will be logged.

There’s another key concept in SC and that is “run as” accounts, so whether I am delegating control to a user by giving them limited access to an SC console or I am using SC’s PowerShell cmdlets, I can reference a run as account to manage or change something without exposing the actual credentials needed to do that to the user or in my script.

Frankly my PowerShell is not production ready; some of that is deliberate in that I don’t clutter my code with too much error trapping, and some is that I am just not that much of an expert in things like remote sessions and logging.  The point is that if you are using SC for any serious automation you should use Orchestrator, for all sorts of reasons:

• Orchestrator is easy, I haven’t and won’t post an update on getting started with Orchestrator because it hasn’t really changed since I did this
• It’s very lightweight  - it doesn’t need a lot of compute resources to run
• You can configure it for HA so that your jobs will run when they are supposed to which is hard with raw PowerShell.
• You can include PowerShell scripts in the processes (run books) that you design for things that Orchestrator can’t do
• There are loads of integration packs to connect to other resources, and these are set up with configurations which hold the credentials for those other services, so they won’t be visible in the run book itself.
• you have already bought it when you bought SC!

Another thing about SC generally is that there is some overlap; I discussed this a bit in my last post with respect to reporting and it crops up in other areas too.  In VMM I can configure a bare metal deployment of a new physical host to run my VMs on, but I can also do server provisioning in Configuration Manager, so which should I use? That depends on the culture of your IT department and whether you have both in production.  On the one hand a datacentre admin should be able to provision new fabric as the demand for virtualization grows; on the other hand all servers, be they physical or virtual, should be in some sort of desired state and Configuration Manager does a great job of that.  It all comes back to responsibility and control: if you are in control you are responsible, so you need to have the right tools for your role.

So use the right tool for the right job, and after all this theory we’ll look at what the job is by using SC in future posts

• Lab Ops– 16 System Center setup

This post isn’t going to tell you how to install System Center screen by screen, as there are some 434 screens to get through for a complete install and configure.  That’s a lot of clicking with a lot of opportunity for mistakes, and while I realise that not everyone needs to tear down and reset everything, surely there must be a better way to try it out?

There is, but it involves some pretty intense PowerShell scripts and accompanying XML configuration files, collectively known as the PowerShell Deployment Toolkit (PDT), which is on the TechNet Gallery. It works from scratch: it will pull down all the installs and prereqs you need, install the components across the servers you define, complete with SQL Server, and do all of the integration work as well.  There is a full set of instructions here on how to edit the XML configuration files (the PowerShell doesn’t need to change at all) so I am not going to repeat those here.

What I do want to do is to discuss the design considerations for deploying System Center 2012R2 (SC2012R2)  in a lab for evaluation, before I go on to showing some cool stuff in following posts.

SC2012R2 Rules of the game:

Most parts of the SC2012R2 suite are pretty heavyweight applications and will benefit from being on separate servers, and all of SC2012R2 is designed to be run virtually, just as today you might be running vCenter in a VM. Note that Virtual Machine Manager (VMM) is quite happy in a VM managing the host the VM is running on.

Operations Manager, Service Manager and the Service Manager Data Warehouse cannot be on the same VM or server and even the Operations Manager agent won’t install onto a server running any part of Service Manager.  I would recommend  keeping VMM away from these components as well from a performance perspective.

The lighter weight parts of the suite are Orchestrator and App Controller both of which could for example be collocated with VMM which is what I do.

All of the SC2012R2 components make use of SQL Server for one or more databases.  In evaluation land we can get SQL Server for 180 days, just as SC2012R2 is good for 180 days, but the question is where to put the databases: alongside the relevant component or centrally.  My American cousins used to put all the databases on the DC in a lab as both of these are needed all the time, however we generally run our labs on self-contained VMs, each with its own local database.

Speaking of domains, I tend to have a domain for my hosts and the System Center infrastructure, and I do on occasion create tenant domains in VMM to show hosting and multi-tenancy.  The stuff that’s managed by System Center doesn’t have to be in the same domain, and may not even be capable of joining a domain (such as Linux VMs, switches, SANs), but we will need various run as accounts to access that infrastructure, with community strings and SSH credentials.

Best practice for production.  The real change for deploying System Center in production is all about high availability.  Given that System Center is based on JBOD (just a bunch of databases), what needs protecting are the databases and the certificates associated with them, so that if a VM running VMM is lost we can simply create a new VM, add in VMM and point it to our VMM database.  The System Center databases are best protected with Availability Groups, and while I realise that is only available in SQL Server Enterprise edition, it doesn’t itself rely on shared storage. Availability Groups replicate the data from server to server in a cluster, and although clustering is used, the databases can be on direct attached storage on each node.  There is some special info on TechNet on how to use this with System Center, which also applies to Service Manager.

That leads me on to my next point about production: there are a lot of databases in System Center and some of those are datamarts/data warehouses, but actually only one of them could arguably be called a data warehouse and that’s the one in Service Manager.  Why? Well, if you are using Service Manager you don’t need the others, as it should be the central reporting aka Configuration Management DB (CMDB).  So if you have another help desk tool and that is properly integrated into System Center, then that’s where you should go for your reporting. If none of the above, then you’ll have to dip in and out of the components and tools you have to join the dots (I feel another post coming on about this).

and finally..

I have the capacity to run an extra VM which runs Windows 8.1 plus the Remote Server Administration Tools (RSAT), SQL Server Management Studio and all of the SC2012R2 management consoles. This means I don’t have to jump from VM to VM to show how things work.  Plus, in the process of installing all of those tools in one place, I have access to all of the PowerShell cmdlets associated with server management, SQL Server and all of System Center.  So now I can write scripts from one place to get stuff done right across my datacentre, or carry on filling in dialog boxes.

• Lab Ops part 15 - Combining MDT with Windows Update Services

If we are going to deploy VDI to our users we are still going to have some of the same challenges as if we still managed their laptops directly.  Perhaps the most important of these is keeping VDI up to date with patches.  What I want to do in this post is show how we can integrate Windows Server Update Services (WSUS) with MDT to achieve this:

• Set up WSUS
• Connect it to MDT
• Approve patches
• recreate the Virtual Desktop Template with the script I created in Part 12 of this series
• Use one line of PowerShell to recreate my pooled VDI collection based on the new VDT.

Some notes before I begin:

• All of this is easier in Configuration Manager, but the same principles apply, plus I could do a better job of automating and monitoring this process with Orchestrator.  I am doing it this way to show the principles.
• I am using my RDS-Ops VM to deploy WSUS on as it’s running Windows Server 2012R2 and I have a separate volume on this VM (E:) with the deduplication feature enabled, which as well as being home to my deployment share can also be the place where WSUS can efficiently keep its updates.  It’s also quite logical: we normally keep our deployments and updates away from production and then have a change control process to approve and apply updates once we have done our testing.
• RDS-Ops is connected to the internet already as I have configured the Routing and Remote Access (RRAS) role for network address translation (NAT)

Installing & Configuring WSUS

WSUS is now a role inside Windows Server 2012 and later, and on my RDS-Ops VM I already have a SQL Server installation so I can use that for WSUS as well.  The WSUS team have not fully embraced PowerShell (I will tell on them!), so although I was able to capture the settings I wanted and save those off to an XML file when I added in the role, I also needed to run something like this after the feature was installed..

(the Scripting Guy blog has more on this here)
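That something is the WSUS post-install step from wsusutil; a sketch, where the SQL instance name and content directory match my lab and will need changing for yours:

```powershell
# Point WSUS at the existing SQL Server instance on this VM and keep
# its content on the deduplicated E: volume (both values are mine)
Set-Location "C:\Program Files\Update Services\Tools"
.\wsusutil.exe postinstall SQL_INSTANCE_NAME="RDS-Ops" CONTENT_DIR="E:\WSUS"
```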

Now I need to configure WSUS for the updates I want, and there isn’t enough out of the box PowerShell for that. I found I could set the synchronization to Microsoft Update with Set-WsusServerSynchronization -SyncFromMU, but there’s no equivalent Get-WsusServerSynchronization command, plus I couldn’t easily see how to set which languages I wanted, only products and classifications (whether the update is a driver, an update, a service pack, etc.), so unless you are also a .net expert with time on your hands (and I am not) you will need to set most everything from the initial wizard and hope for better PowerShell in future. In the meantime, rather than pad this post out with screen grabs, I’ll refer you to the WSUS TechNet documentation on what to configure and explain what I selected..

• Upstream Server Synchronize. Set to  Microsoft Update
• Specify Proxy Server. None
• Languages. English
• Products. I decided that all I wanted for now was to ensure I had updates for just Windows 8.1, Windows Server 2012R2 and SQL Server 2012 (my lab has no old stuff in it).  This would mean I would have the updates I needed to patch my lab setup and my Virtual Desktop Template via MDT.
• Classifications. I selected everything but drivers (I am mainly running VMs so the drivers are synthetic and part of the OS)
• Synch Schedule. Set to daily automatic updates
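The product and classification choices above can at least be partly scripted with the WSUS cmdlets that do exist; a sketch, where the title matching is approximate and may need adjusting for your catalogue:

```powershell
# Synchronize from Microsoft Update
Set-WsusServerSynchronization -SyncFromMU
# Enable only the products I picked (title matching is approximate)
Get-WsusProduct | Where-Object {
    $_.Product.Title -in ("Windows 8.1", "Windows Server 2012 R2", "SQL Server 2012")
} | Set-WsusProduct
# Everything except drivers
Get-WsusClassification | Where-Object {
    $_.Classification.Title -ne "Drivers"
} | Set-WsusClassification
```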

I ran the  initial synchronize process to kick things off and then had a look at  what sort of PowerShell I could use and I got a bit stuck.

I then looked at creating something like an automatic approval rule as you can see here..

only in PowerShell and came up with this ..

Get-WsusUpdate | where classification -in ("Critical Update", "Security Updates") | Approve-WsusUpdate -Action Install -TargetGroupName "All Computers" # chuck in -whatif to test this

which I could run behind my scheduled update. Anyway, I have now set some updates as approved, so I can turn my attention to MDT and see how to get those updates into my deployment once they have actually downloaded onto my RDS-Ops server. BTW I got a message to download the Microsoft Report Viewer 2008 SP1 Redistributable package on the way.

Top Tip: If the MDT stuff below doesn’t work, check that WSUS is working by updating group policy on a VM to point to it.  Open GPEdit.msc, expand Computer Configuration -> Administrative Templates -> Windows Components -> Windows Update and set Specify intranet Microsoft update service location to http://<WSUS server>:8530, in my case http://RDS-Ops:8530

If I now go into the MDT Deployment Workbench on my RDS-Ops VM I can edit my Task Sequence, and as with my last post on installing applications it’s in the State Restore node that my updates get referenced..

Note there are two places where updates can be applied, both pre and post an application install, and both of these are disabled by default. The post application install pass would be good if you had updates in WSUS that applied to applications, not just the OS as I have just set up. The application updates could then be added on top of the base application install.  This is a nice touch, but how does MDT “know” where to get the updates from?  We can’t really set anything in WSUS itself or apply any group policy because the machines aren’t built yet.  The answer is to add one more setting into the rules for the Deployment Share, aka CustomSettings.ini: WSUSServer=http://<WSUS Server>:8530, as I left the default port as is when I set up WSUS ..

[Settings]
Priority=Default
Properties=MyCustomProperty

[Default]
DeploymentType=NEWCOMPUTER
OSInstall=YES
SkipProductKey=YES
SkipComputerBackup=YES
SkipBitLocker=YES
EventService=http://RDS-Ops:9800
SkipBDDWelcome=YES
WSUSServer=http://RDS-Ops:8530

SkipCapture=YES
DoCapture=SYSPREP
FinishAction=SHUTDOWN

SkipComputerName=YES
SkipDomainMembership=YES

SkipLocaleSelection=YES
KeyboardLocale=en-US
UserLocale=en-US
UILanguage=en-US

SkipPackageDisplay=YES
SkipSummary=YES
SkipFinalSummary=NO
SkipTimeZone=YES
TimeZoneName=Central Standard Time

SkipUserData=Yes
SkipApplications=Yes
Applications001 ={ec8fcd8e-ec1e-45d8-a3d5-613be5770b14}

As I said in my last post you might want to disable skipping the final summary screen (SkipFinalSummary=NO) to check it’s all working (also don’t forget to update the Deployment Share each time you do a test), and if I do that and then go into Windows Update on my Reference Computer I can see my updates..

So to sum up, I now have MDT set up to create a new deployment which includes any patches from my update server, plus a sample application (Foxit Reader), so I can keep my VDI collections up to date by doing the following:

1. Approve any updates that have come into WSUS since I last looked at it, or auto-approve those I want by product or classification with PowerShell
2. Add in any new applications I want in the Deployment Workbench in MDT.
3. Automatically build a VM from this deployment with the script in part 13 of this series which will sysprep and shutdown at the end of the task sequence.
4. Either create  a new collection with New-RDVirtualDesktopCollection or update an existing collection with Update-RDVirtualDesktopCollection where the VM I just created is the Virtual Desktop Template.
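Step 4 of that loop can be scripted as well. A sketch only: the collection, template and server names below are from my lab, and I’m assuming the usual Update-RDVirtualDesktopCollection parameters from the RemoteDesktop module:

```powershell
# Refresh an existing collection from the rebuilt, sysprepped template VM
# (collection, template, host and broker names are my lab's - yours will differ)
Update-RDVirtualDesktopCollection -CollectionName "RDS-VDI" `
    -VirtualDesktopTemplateName "RDS-Ref" `
    -VirtualDesktopTemplateHostServer "Orange" `
    -ConnectionBroker "RDS-Broker.contoso.com"
```

Run this only after the template VM has been sysprepped and shut down, as the collection update clones from that template.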

Obviously this would look a little nicer in Configuration Manager 2012R2 and I could use Orchestrator and other parts of System Center to sharpen this up but what this gives us is one approach to maintaining VDI which I hope you’ll have found useful.

• Lab Ops - Part 14 Putting Applications into the VDI Template with MDT

In my last post I created a process to quickly create what MDT refers to as a Reference Computer and directly boot this from the LiteTouchPE_x64.iso in the deployment share.  This was then sysprepped and shut down so that I could use it to deploy VDI collections in Remote Desktop Services.   Now I want to add a simple example of an application deployment along with the OS. Simple because I am only interested in the mechanics of the process in MDT rather than getting into any specifics about what is needed to install a particular application e.g. command line switches, dependencies and licensing.  If you are going to do this for real I would recommend creating a new VM and testing the installation process from the command line, tuning the switches etc. for that application accordingly, before doing any work in MDT.  There is already lots of help out there on the forums for the application you are interested in.

I am going to use Foxit Enterprise Reader for this example, as it’s not a Microsoft application and it’s a traditional application in that it’s not what MDT calls a packaged application designed for the modern interface in Windows 8.1. It’s also free (although you’ll have to register to get the Enterprise Reader as an msi) and I actually use it anyway to read PDF documents. All this is actually pretty easy to do, but I got caught out a few times, and wading through the huge amount of MDT documentation can be time consuming, so I hope you’ll find this useful. My steps will be:

• Import the Foxit application into the Deployment Share in  the MDT Deployment Workbench
• Modify the Task Sequence to deploy Foxit
• Amend the Rules (aka CustomSettings.ini) of the Deployment Share properties to install Foxit without any intervention on my part.

Import the Application.

To import an application in MDT all you need to do is navigate to the Applications folder in the Deployment Share, right click and select Import Application. However I thought it would be good to create folders for each software publisher, so I created a Foxit folder underneath Applications and then did right click -> Import Application from there. This looked OK in the Deployment Workbench but actually no folder is created on the Deployment Share.  This is by design, and if you want to create your own physical folder structure then you should store the applications on a share you control and point MDT to them on that share rather than importing them, which is the “Application without source files or elsewhere on the network” option in the Import Application Wizard.

Next I found that I couldn’t import a file, only a folder, which I guess is typical for many applications, so I stored the Foxit msi in its own folder before importing it.

The next thing that caught me out was the Command details.  It’s pretty easy to install an msi; for Foxit this would be msiexec /i EnterpriseFoxitReader612.1224_enu.msi /quiet. However the Working directory entry confused me, because MDT has the application now, so surely I could just leave this empty? Well no, and this is not a problem with MDT, rather it’s because of the way I am using it. Anyway I set the Working Directory to the UNC path of the Foxit Reader folder (\\RDS-OPS\DeploymentShare$\Applications\Foxit Enterprise Reader in my case) and that worked.

I just used a standard Task Sequence template in my last post which already has a step in it to install an application, but where is it?    The answer turns out to be that it’s inside the State Restore folder ..

Anyway I changed the settings here to reference Foxit and all is well.

Configure the Rules

I didn’t think I needed to make any changes to the rules (in my last post) as my deployment was already fully automated, so I was surprised to be presented with a popup asking me to confirm which application I wanted to install when I first tested this. So I needed to add two more settings: one to skip the screen asking which application to install, and another to actually select it.  However the Rules identify applications by GUID, not by name, so I had to get the GUID from the General tab of the Application properties and enter it like this..

SkipApplications=Yes

Applications001 ={ec8fcd8e-ec1e-45d8-a3d5-613be5770b14}

Your GUID will be different, and if you want more than one application then you would add more afterwards (Applications002 = , Applications003 = etc.).
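So the relevant fragment of the Rules ends up looking like this (the first GUID is my Foxit Reader entry; the second is a made-up placeholder just to show the numbering):

```
SkipApplications=YES
Applications001={ec8fcd8e-ec1e-45d8-a3d5-613be5770b14}
Applications002={00000000-0000-0000-0000-000000000000}
```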

I also set SkipFinalSummary=NO at this point as I wanted to see if everything was working before the VM switched off.

Summary

MDT also has the ability to deploy bundles of applications, and before you ask, you’ll need to do something completely different for Office 365 and Office 2013 Pro Plus; my recommendation for a simple life would be to use Application Virtualization aka App-V.  This is included in MDOP (the Microsoft Desktop Optimization Pack) and is one of the benefits of having Software Assurance.  That’s a topic for another day based on feedback and my doing a bit more research.  Next up: the exciting world of patching VDI.

• Lab Ops Part 13 - MDT 2013 for VDI

The core of any VDI  deployment is the Virtual Desktop Template (VDT) which is the blueprint from which all the virtual desktop VMs are created.  It occurred to me that there must be a way to create and maintain this using the deployment tools used to create real desktops rather than the way I hack the Windows 8.1 Enterprise Evaluation iso currently with this PowerShell  ..

$VMName = "RDS-VDITemplate"
$VMSwitch = "RDS-Switch"
$WorkingDir = "E:\Temp VM Store\"
$VMPath = $WorkingDir + $VMName
$SysPrepVHDX = $WorkingDir + $VMName + "\RDS-VDITemplate.VHDX"

# Create the VHD from the installation iso using the Microsoft Convert-WindowsImage script
md $VMPath
cd ($WorkingDir + "Resources")
.\Convert-WindowsImage.ps1 -SourcePath ($WorkingDir + "Resources\9600.16384.WINBLUE_RTM.130821-1623_X64FRE_ENTERPRISE_EVAL_EN-US-IRM_CENA_X64FREE_EN-US_DV5.ISO") -Size 100GB -VHDFormat VHDX -VHD $SysPrepVHDX -Edition "Enterprise"
#Create the VM itself
New-VM -Name $VMName -VHDPath $SysPrepVHDX -SwitchName $VMSwitch -Path $VMPath -Generation 1 -BootDevice IDE

# Tune these settings as you need to
Set-VM -Name $VMName -MemoryStartupBytes   1024MB
Set-VM -Name $VMName -DynamicMemory
Set-VM -Name $VMName -MemoryMinimumBytes   512MB
Set-VM -Name $VMName -AutomaticStartAction StartIfRunning
Set-VM -Name $VMName -AutomaticStopAction  ShutDown
Set-VM -Name $VMName -ProcessorCount       2

So how does a deployment guy like Simon create Windows 8.1 desktops? He uses the Microsoft Deployment Toolkit 2013 (MDT) and the Windows Assessment and Deployment Kit 8.1 (ADK) that it’s based on.  So I created another VM, RDS-Ops, with these tools on and started to learn how to do deployment.   I know that when I create a collection with the wizard or with PowerShell (e.g. New-RDVirtualDesktopCollection) I can specify an unattend.xml file to use as part of the process. The ADK allows you to do this directly, but I am going to build a better mousetrap in MDT because I want to go on to deploy Group Policy Packs, updates and applications, which I know I can do in MDT as well.

If you have used MDT please look away now, as this isn’t my day job. However there don’t seem to be any posts or articles on creating a VDT from either the ADK, MDT or even System Center Configuration Manager, so I am going to try and fill that gap here.

I wanted to install MDT onto a VM running Windows Server 2012R2 with 2x VHDXs, the second one being for my deployment share so I could deduplicate the iso and wim files that will be stored there. I then installed the ADK, which needs to be done in two steps - the initial ADK download is tiny because it pulls the rest of the installation files as part of setup, so I first ran adksetup /layout <Path> on an internet-connected laptop, then copied the install across to the VM (along with MDT) and ran..

adksetup.exe /quiet /installpath <the path specified in the layout option> /features OptionId.DeploymentTools OptionId.WindowsPreinstallationEnvironment OptionId.UserStateMigrationTool

before installing MDT with:

msiexec /i MicrosoftDeploymentToolkit2013_x64.msi /quiet

Now I am ready to start to learn (or demo) MDT to build my template, based on the Quick Start Guide for Lite Touch Installation included in the MDT documentation, which goes like this:

• On the machine running MDT Create a Deployment Share
• Import an OS - I used the Windows 8.1 Enterprise Eval iso for this by mounting the iso on the VM and importing from that.
• Add in drivers packages and applications - I will do this in a later post
• Create a task sequence to deploy the imported image to a Reference Computer.
• Update the Deployment Share which builds a special image (in both wim and iso formats)
• Deploy all that to a Reference Computer and start it
• The deployment wizard that runs on the Reference Computer when it comes out of sysprep allows you to capture an image of it back into MDT.
• Capture that image from the Reference Computer
• Create a task sequence to deploy that captured image to the Target computers
• Update the Deployment Share again with the captured image in and optionally hook it up to Windows Deployment Services and you are now ready to deploy your custom image to your users’ desktops.

However I deviated from this in two ways:

1. Creating the Reference Computer:

All I needed to do here was to create a VM (RDS-Ref) based on the iso created by the deployment share update process..

$VMName = "RDS-Ref"
$VMSwitch = "RDS-Switch"
$WorkingDir = "E:\Temp VM Store\"
$VMPath = $WorkingDir + $VMName
$VHDXPath = $WorkingDir + $VMName + "\" + $VMName + ".VHDX"

# Housekeeping 1. delete the VM from Hyper-V
$vmlist = Get-VM | where VMName -in $VMName
$vmlist | where State -eq "Saved" | Remove-VM -Verbose -Force
$vmlist | where State -eq "Off" | Remove-VM -Verbose -Force
$vmlist | where State -eq "Running" | Stop-VM -Verbose -Force -Passthru | Remove-VM -Verbose -Force
#Housekeeping 2. get back the storage
If (Test-Path $VMPath) {Remove-Item $VMPath -Recurse}
# Create a new VHD
md $VMPath
New-VHD -Path $VHDXPath -Dynamic -SizeBytes 30GB

#Create the VM itself
New-VM -Name $VMName -VHDPath $VHDXPath -SwitchName $VMSwitch -Path $VMPath -Generation 1

#Attach the iso in the deployment share to build the Reference Computer from the MDT VM (RDS-Ops)
Set-VMDvdDrive -VMName $VMName -Path '\\rds-ops\DeploymentShare$\Boot\LiteTouchPE_x64.iso'
Start-VM -Name $VMName

Once this VM comes out of sysprep it will launch the Deployment Wizard on the Reference Computer.  I designed the script to be run again and again until I get it right, which was good because I kept making mistakes as I refined it.  The documentation is pretty good, but I also referred to the excellent posts by Mitch Tulloch on MDT, especially part 7 on automating Lite Touch by editing the INI files on the Deployment Share properties, described below.

2. Completing the Deployment Wizard on the Reference Computer

In the Lite Touch scenario the Reference Computer is captured back into MDT and used to deploy to target computers, usually by using the Windows Deployment Services role in Windows Server directly or via Configuration Manager. In VDI the target computers are VMs and their deployment is handled by the RDS Broker, either in Server Manager or with the Remote Desktop PowerShell commands like New-RDVirtualDesktopCollection.  Whichever way I create VDI collections, all I need is that virtual desktop template, and in this case that’s just the Reference Computer, but it needs to be turned off and in a sysprepped state.  The good news is that the Deployment Wizard in MDT 2013 has exactly this option, so I can select that and when it’s complete all I need to do is remember to eject the iso with the Lite Touch pre-execution environment on it (or that will be inherited by all the virtual desktops!).

Automation

If you are with me so far you can see we have the makings of something quite useful, even in production.   What I need to do now is automate this so that my Reference Computer will start, install and configure the OS based on my Deployment Share, and then sysprep and shut down without any user intervention. To do that I need to modify the Bootstrap.ini file that launches the deployment wizard (from the Deployment Share properties go to the Rules tab and select Edit Bootstrap.ini)..

SkipBDDWelcome=YES

to tell the wizard where my deployment share is and how to connect to it, and then suppress the welcome screen. Then I need to modify the rules themselves (CustomSettings.ini) so that the wizard uses my task sequence, hides all the settings screens and supplies the answers to those settings directly..
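My Bootstrap.ini isn’t reproduced here beyond the welcome-screen line, but a minimal version doing those three jobs might look like this sketch (the share path and credentials are my lab values - use a low-privilege account for real deployments):

```
[Settings]
Priority=Default

[Default]
DeployRoot=\\RDS-Ops\DeploymentShare$
UserID=Administrator
UserDomain=Contoso
UserPassword=Passw0rd
SkipBDDWelcome=YES
```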

SkipUserData=Yes

Note two of the settings in particular:

• EventService enables monitoring, which is very useful as none of the wizard screens will show up the way I have this set now!
• MDT2012 and later allow you to sysprep and shutdown a machine which is just what I need to create my Virtual Desktop Template.

So what’s really useful here is that when I change my deployment share to add in applications and packages, modify my Task Sequence or the INI settings above, all I need to do to test the result each time is to recreate the Reference Computer like this:

• stop the Reference Computer VM (RDS-Ref in my case) if it’s running, as it will have a lock on the deployment iso
• Update the Deployment Share
• Run the Powershell to re-create and start it.
• Make more coffee

Having got that working I can now turn my attention to deploying applications (both classic and modern) into my VDI collections, and then think about an automated patching process.

• Lab-Ops part 12 – A crude but effective Domain Controller

I realised I need to recreate a Domain Controller in my labs and in so doing I noticed a snag in my earlier scripts that really breaks when I use the same snippet for a new DC.  I have this test to see if a VM is ready to be used..

do {Start-Sleep -Seconds 10}

until ((Get-VMIntegrationService -VMName $VMName | where Name -eq "Heartbeat").PrimaryStatusDescription -eq "OK")

#the code to create the $LocalCred credential is at the end of this post

It does work in that this will return true if the VM is on, but if a VM is coming out of sysprep this do until loop will exit way before I can log in and actually use the VM. So then I tried this command in my until clause ..

Invoke-Command -ComputerName 192.168.10.1 -ScriptBlock {dir c:\} -ErrorAction SilentlyContinue -Credential $LocalCred

a crude but effective test based on whether I could connect to and run a simple command on the VM. That worked for most of my VMs, but this was still no good for my script to build a Domain Controller (DC). The problem here is that after I add in the feature (which doesn’t require a reboot)..

and then create the Domain with..

this will at some point cause a reboot but this doesn’t happen inline as this command is itself calling  PowerShell in a session I can’t control.  The result is that my script will continue to execute while this is going on in the background. So my test for a C Drive could work before the reboot and I would be in a mess because some subsequent commands would fail while my VM reboots. So my hack for this is to trap the time my VM takes to come out of sysprep..
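The two commands referred to above were screenshots in the original post; roughly, they would be something like this sketch (the domain name and password are my lab’s):

```powershell
# Add the AD DS role - this bit does not need a reboot
Install-WindowsFeature -Name AD-Domain-Services -IncludeManagementTools

# Promote the server to the first DC of a new forest. The reboot this causes
# happens in a session the calling script can't see, hence the uptime hack below
Install-ADDSForest -DomainName "contoso.com" `
    -SafeModeAdministratorPassword (ConvertTo-SecureString "Passw0rd" -AsPlainText -Force) `
    -Force
```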

$Uptime = (Get-VM -Name $VMName).Uptime.TotalSeconds

and test until the current uptime is LESS than $Uptime, which can only be true after the VM has rebooted.

do {Start-Sleep -Seconds 10}

until ((Get-VM -Name $VMName).Uptime.TotalSeconds -lt $Uptime)

Then I can test to see if the VM is ready to be configured by checking the Active Directory Web Service is alive on my new DC..

Get-Service –Name ADWS | where status –EQ running

However even after this test returned true I was still getting errors from PowerShell saying that a default domain controller couldn't be found so I specified the DC with a –server switch in each command for example ..

New-ADOrganizationalUnit -Description:"RDS VDI Collection VMs" -Name:"RDS-VDI" -Path:"DC=Contoso,DC=com" -ProtectedFromAccidentalDeletion:$true -Server:"RDS-DC.Contoso.com"

Just to be extra sure I also slapped in a 20 second wait to ensure the service really was there, as I want this to run cleanly again and again.
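Putting those pieces together, my “wait until the new DC is actually usable” hack looks something like this (the 20 second settle time is arbitrary):

```powershell
# Capture uptime before the promotion reboots the VM in the background
$Uptime = (Get-VM -Name $VMName).Uptime.TotalSeconds

# Uptime only drops below the captured value once the VM has rebooted
do { Start-Sleep -Seconds 10 }
until ((Get-VM -Name $VMName).Uptime.TotalSeconds -lt $Uptime)

# Then wait for the Active Directory Web Service to be running on the new DC
do { Start-Sleep -Seconds 10 }
until ((Invoke-Command -ComputerName $VMName -Credential $DomainCred `
        -ScriptBlock { Get-Service -Name ADWS } -ErrorAction SilentlyContinue).Status -eq "Running")

# Belt and braces: give the service a little extra settle time
Start-Sleep -Seconds 20
```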

I won’t bore you with the code for adding the rest of the users, groups etc. to Active Directory as the easiest way to write that is to do something to a Domain controller in the Active Directory Administrative Centre and grab the stuff you need from the PowerShell History at the bottom of the console..

I also showed you how to read and write to text based CSV files in part 5 of this Lab Ops Series so you could amend my script to have a whole list of objects to add in to your DC from a CSV file that you have previously lifted from a production DC.

I also need a DHCP server in my lab and I typically put that as a role on my DC.  Here again you can see how PowerShell has improved for newbies like me..

#Install the DHCP role
Install-WindowsFeature -Name DHCP -IncludeManagementTools
#Authorize this DHCP server in AD
Add-DhcpServerInDC -DnsName RDS-DC.contoso.com -IPAddress 192.168.10.1
#Setup a scope for use with RDS/VDI later on
Add-DhcpServerv4Scope -StartRange 192.168.10.200 -EndRange 192.168.10.254 -SubnetMask 255.255.255.0 -Name RDSDesktops -Description "Pool for RDS desktop virtual machines"
#Set the DNS Server option (6) and DNS domain name option (15) so DHCP clients get them
Set-DhcpServerv4OptionValue -OptionId 6 -Value 192.168.10.1
Set-DhcpServerv4OptionValue -OptionId 15 -Value "contoso.com"

Sadly the trusty old DHCP MMC snapin doesn’t have a history window so I looked at the options set by the wizard and set them as you can see here.  Once all this is working I can go on to create the other VMs in this series. However this DC also sets up and uses a Hyper-V Internal Virtual Switch “RDS-Switch” and ensures that my physical host (Orange – which is my big Dell laptop) can connect to my new DC on that switch..

# Setup the networking we need - we'll use an internal network called RDS-Switch.
# If it's not there already create it and set DNS to point to our new DC (RDS-DC) on 192.168.10.1
# Note the use of the !(some condition) syntax to mean "not true"
If (!(Get-VMSwitch | where Name -EQ $VMSwitch)) {New-VMSwitch -Name $VMSwitch -SwitchType Internal}
# Now configure the switch's adapter on the host with a static IP address and point it at our new VM for DNS
# (the host address here is an example - anything free on the subnet will do)
$NetAdapter = Get-NetAdapter | where Name -Like ("*" + $VMSwitch + "*")
New-NetIPAddress -InterfaceAlias $NetAdapter.Name -IPAddress 192.168.10.100 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias $NetAdapter.Name -ServerAddresses 192.168.10.1

The final piece of the puzzle is to join my physical laptop to this domain, as I am going to need the host for VDI, and for now I am going to run that manually with the Add-Computer command..

$LocalCred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList "orange\administrator", (ConvertTo-SecureString "Passw0rd" -AsPlainText -Force)
$DomainCred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList "Contoso\administrator", (ConvertTo-SecureString "Passw0rd" -AsPlainText -Force)
Add-Computer -ComputerName Orange -DomainName Contoso.com -LocalCredential $LocalCred -DomainCredential $DomainCred -Force

Start-Sleep -Seconds 5

Restart-Computer -ComputerName Orange -Credential $LocalCred

..and of course to test that I need to pull it out of the domain before I test it again, with Remove-Computer. By the way, don’t put the -Restart switch on the end of Add-Computer, as that will bounce your host and hence your DC as well, and while your host appears to be domain joined it doesn’t show up in the domain.

I have posted the whole script on SkyDrive (Create RDS-DC.ps1); it’s called RDS-DC as it’s designed to underpin my Remote Desktop Services demos. Note there are a couple of Write-Host lines in there to echo output to the console, where in reality you would log progress to a file.

As ever any advice and comments on this is welcome and I can repay in swag and by properly crediting your work.

• Lab Ops part 11 – Server Core

My laptop running Windows Server 2012R2 looks like this when it starts:

This is a good thing and a bad thing. It’s a good thing if:

• You have a server you want to connect to from a touch device, maybe with a smaller form factor, and you have big fingers - you can get to the task you want in a hurry, because you can pin web sites, MMC snap-ins etc. to the Start Menu and organise them as I have done.
• If you use your server for Remote Desktop Session Virtualization (Terminal Services as was) your users will see the same interface as their Windows 8.1 desktop, and will get a consistent experience.
• You are an evangelist at Microsoft, who has to do lots of demos and isn’t allowed near production servers!

However if you are managing production servers at any kind of scale this is a bad thing, as you don’t need all the tools and interfaces on every server you deploy.  All those tools expose interfaces like File Explorer and Internet Explorer, so if your servers are in the wild (not in managed data centres) then curious local admins might use those tools to surf the net or reconfigure your servers.  These interfaces also require patching and consume resources.

This is why Server Core was introduced in Windows Server 2008 and in Windows Server 2012R2 Server Core is the default installation option.  All you get if you install Windows Server with this option is:

• Registry Editor
• Command Line
• SConfig a lightweight menu of scripts to do basic configuration tasks.
• and with Windows Server 2012 and later you also get PowerShell

Server Core wasn’t a popular choice in Windows Server 2008 for  a number of reasons:

• It was too limited. For example there was no ability to run the .NET Framework, so it couldn’t run things like ASP.NET websites or SQL Server, and it didn’t include PowerShell by default because PowerShell is also built on .NET.
• 2008 wasn’t set up for remote management by default, patching was problematic, and unless you paid for System Center there wasn’t a tool to manage these servers at any sort of scale.
• It was an install only option, so the only way to get back to a full interface was to do a complete reinstall.

That has all been fixed in Windows Server 2012 and later, so Server Core should be your default option for all your servers except those used as Remote Desktop Session Virtualization hosts.  You could achieve nearly the same result by doing a full install and ripping out all the interfaces once your server is configured the way you want, in which case you would go to Remove Features in Server Manager and uncheck each of these options in the features screen..

where:

• The Server Graphical Shell has IE, File Explorer, the Start button and the Start Screen.
• The Desktop Experience gives you the Windows Store and makes Windows Server behave like Windows 8.1.
• The Graphical Management Tools have Server Manager and the MMC snap-ins.

If you remove all of these from here or use PowerShell to do this:

Remove-WindowsFeature -Name Desktop-Experience, Server-Gui-Mgmt-Infra, Server-Gui-Shell

you essentially get Server Core. If you leave behind the Graphical Management Tools ..

Remove-WindowsFeature –Name Desktop-Experience, Server-Gui-Shell

You’ll get what’s known as “MinShell”: all the management tools, but none of the extra fluff like IE and the Start menu, so this is also a popular choice.

If you do a Server Core install and then later decide to put the management user interface back in, you need to remember that the binaries for it aren’t on the server, so when you add the features you’ll need to specify a -Source switch, and before that you’ll need access to the source for those features by mounting the appropriate .wim file with the Deployment Image Servicing and Management tool (DISM):

MD C:\WS2012install

DISM /Mount-WIM /WIMFile:D:\sources\install.wim /Index:4 /MountDir:C:\WS2012Install /ReadOnly

Install-WindowsFeature Server-Gui-Mgmt-Infra -Source C:\WS2012Install\Windows\WinSxS

shutdown /r /t 0

Notes:

• This turns Server Core in to a MinShell installation
• D: is the install media for Windows Server 2012R2
• Index is the individual installation on the media and for the evaluation edition 4 corresponds to the full install of Datacenter edition (You can run DISM /Get-WIMInfo /WIMFile:D:\sources\install.wim to see what there is)
• shutdown /r restarts the server

This is actually quite a useful template as it doesn’t have the binaries needed to put IE etc. back in, but is still easy to manage. You could then use this as the basis for creating VMs for your labs and evaluations by sysprepping it:

sysprep /generalize /shutdown /oobe

and if you wanted to save space you could then use it  as a parent for the  differencing disks behind the various VMs  in your lab environment.
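For instance, once the sysprepped parent VHDX exists, each lab VM can sit on its own thin differencing disk; the paths, names and switch below are just illustrative examples:

```powershell
# Create a differencing disk whose parent is the sysprepped MinShell image,
# then build a VM on top of it (paths, names and switch are illustrative)
New-VHD -Path "E:\VMs\Lab01.vhdx" `
        -ParentPath "E:\Masters\WS2012R2-MinShell.vhdx" `
        -Differencing
New-VM -Name "Lab01" -VHDPath "E:\VMs\Lab01.vhdx" -SwitchName "RDS-Switch" -Generation 1
```

Only the differences from the parent get written to each child disk, which is what saves the space.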

There is a free version of Windows Server 2012R2 specifically designed for Hyper-V, called Hyper-V Server 2012R2. This is very like Server Core, but you can’t add the UI back in and it only has a few roles, specifically for Hyper-V, clustering and file servers, as that’s all that is included in the license.

To finish up: Server Core is really useful, and much easier to manage remotely than it was, as Windows Server is set up by default for remote management within a domain. Just as importantly, you might have dismissed a feature in Windows when it first showed up in a new version, but it’s worth checking how it’s changed in each new version, be that Server Core, Hyper-V, storage or those big squares!

• Lab Ops Part 10–Scale Out File Servers

In my last post I showed how to build a simple cluster out of two VMs and a shared disk.  The VMs are nodes in this cluster..

and the shared “quorum” disk is used to mediate which node owns the cluster when a connection is lost between them..

However this cluster is not actually doing anything; it isn’t providing any service as yet. In this post I am going to fix that and use this cluster as a scale out file server.  This builds on top of what I did with a single file server earlier in this series. To recap I am essentially going to build a SAN, a bunch of disks managed by two controllers. The disks are the shared VHDX files I created last time and the controllers are the file server nodes in my cluster.

First of all I need to add the Scale-Out File Server role to the cluster..

If I go back to cluster manager I can see that the role is now installed..

This time the pool will be created on the cluster, and while I could expand pools under storage in cluster manager and create a new storage pool via the wizard, it’s more interesting to look at the equivalent PowerShell.  If I look at the storage subsystems on one of my nodes with

Get-StorageSubSystem | Out-GridView

I get this..

I need to use the second option so that my new pool gets created on the cluster rather than on the server I am working on. The actual script I have is..

$PoolDisks = Get-PhysicalDisk | where CanPool -eq $true
$StorageSubSystem = Get-StorageSubSystem | where FriendlyName -Like "Clustered Storage Spaces*"
New-StoragePool -PhysicalDisks $PoolDisks -FriendlyName "ClusterPool" -StorageSubSystemID $StorageSubSystem.UniqueId

Here again the power of PowerShell comes out:

• $StorageSubSystem is actually an instance of a class, as you can see when I reference its unique ID as $StorageSubSystem.UniqueId. In the same way $PoolDisks is an array of disks, showing that we don’t need to declare the type of object stored in a variable - it could be a number, a string, a collection of these, or in this case a bunch of disks that can be put in a pool!

• The use of the pipe to pass objects along a process, and the simple where clause to filter objects by any of their properties. BTW we can easily find the properties of any object with Get-Member, as in

#if you try this on your windows 8 laptop you’ll need to run PowerShell as an administrator

get-disk | get-member

I said earlier that I am building a SAN, and within this my storage pool is just a group of disks that I can manage.  Having done that, my next task on a SAN would be to create logical units (LUNs). Modern storage solutions make more and more use of hybrid setups where the disks involved are a mix of SSDs and hard disks (HDDs), intelligently placing the most used or ‘hot’ data on the SSD.  Windows Server 2012R2 can do this too, which is referred to as tiered storage.  Because I am spoofing my shared storage (as per my last post) the media type of the disks is not set, so tiered storage wouldn't normally be available.  However I can fix that with this PowerShell once I have created the pool..

#Assign media types based on size
# Use (physicalDisk).size to get the sizes

Get-PhysicalDisk | where size -eq 52881784832 | Set-PhysicalDisk -MediaType HDD
Get-PhysicalDisk | where size -eq 9932111872 | Set-PhysicalDisk -MediaType SSD

#Create the necessary storage tiers
New-StorageTier -StoragePoolFriendlyName ClusterPool -FriendlyName "SSDTier" -MediaType SSD
New-StorageTier -StoragePoolFriendlyName ClusterPool -FriendlyName "HDDTier" -MediaType HDD

#Create a virtualDisk to use some of the space available
$SSDTier = Get-StorageTier "SSDTier"
$HDDTier = Get-StorageTier "HDDTier"

In the world of Windows Server storage I create a storage space which is actually a virtual hard disk and if I go into Cluster Manager and highlight my pool I can create a Virtual Disk...

Foolishly I called my disk LabOpsPool, but it’s a storage space, not a pool. Anyway I did check the option to use storage tiers (this only appears because of the spoofing I have already done to mark the disks as SSD/HDD). Next I can select the storage layout,

and then I can decide how much of each tier I want to allocate to my Storage Space/Virtual Disk..

Note the amount of space available is fixed - we have to use thick provisioning with storage tiers; I could thin provision if I wasn’t using them.  BTW my numbers are more limited than you would expect because I have been testing this, and the SSD number will be lower than you would think because the write cache also gets put on the SSDs.  Having done that, the wizard will allow me to initialise the disk, put a simple volume on it and format it.

Now I need to add the disk I created to Cluster Shared Volumes, as I want to put application data (VMs and databases) on this disk.  Then I need to navigate to the Scale-Out File Server role in the left pane and create a share on the disk so it can actually be used..

This fires up the same share wizard as you get when creating shares in Server Manager..

I am going to use this for storing application data,

it’s going to be called VMStorage

my options are greyed out based on the choices I already made, but I can encrypt the data if I want to; I don’t need to do any additional work other than checking this.

I then need to set up file permissions. You’ll need to ensure your Hyper-V hosts, database servers etc. have full control on this share in order to use it. In my case my Hyper-V servers are in a group I created, imaginatively titled Hyper-V Servers.
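That permissions step can be scripted too; here's a minimal sketch assuming the VMStorage share and group name above (the NTFS permissions on the underlying folder would need setting as well, e.g. with icacls):

```powershell
#Give the Hyper-V hosts' group full control on the share
Grant-SmbShareAccess -Name "VMStorage" -AccountName "contoso\Hyper-V Servers" `
    -AccessRight Full -Force

#Check the resulting access list
Get-SmbShareAccess -Name "VMStorage"
```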

The last few steps can of course be done in PowerShell as well, so here’s how I do that. Note that this is my live demo script, so some of the bits are slightly different.

#create two tiered storage spaces - deduplication will be enabled on the first one only
New-VirtualDisk -FriendlyName \$SpaceName1 -StoragePoolFriendlyName \$PoolName  -StorageTiers \$HDDTier,\$SSDTier -StorageTierSizes 30Gb,5Gb -WriteCacheSize 1gb -ResiliencySettingName Mirror
New-VirtualDisk -FriendlyName \$SpaceName2 -StoragePoolFriendlyName \$PoolName  -StorageTiers \$HDDTier,\$SSDTier -StorageTierSizes 30Gb,5Gb -WriteCacheSize 1gb -ResiliencySettingName Mirror

#create the dedup volume and mount it
#First we need to put the disk into maintenance mode
\$ClusterresourceName = "*("+ \$SpaceName1 + ")*"
Get-ClusterResource | where name -like \$ClusterResourceName | Suspend-ClusterResource
\$VHD = Get-VirtualDisk \$SpaceName1
\$Disk = \$VHD | Get-Disk
Set-Disk  \$Disk.Number -IsOffline 0
New-Partition -DiskNumber \$Disk.Number -DriveLetter "X" -UseMaximumSize
Initialize-Volume -DriveLetter "X" -FileSystem NTFS  -NewFileSystemLabel "DedupVol"  -Confirm:\$false
#note -usagetype Hyper-V for use in VDI ONLY!
Enable-DedupVolume -Volume "X:" -UsageType HyperV
#Bring the disk back on line
Get-ClusterResource | where name -like \$ClusterResourceName | Resume-ClusterResource

#create the normal (non-dedup) volume and mount it
#First we need to put the disk into maintenance mode
\$ClusterresourceName = "*("+ \$SpaceName2 + ")*"
Get-ClusterResource | where name -like \$ClusterResourceName | Suspend-ClusterResource
\$VHD = Get-VirtualDisk \$SpaceName2
\$Disk = \$VHD | Get-Disk
Set-Disk  \$Disk.Number -IsOffline 0
New-Partition -DiskNumber \$Disk.Number -DriveLetter "N" -UseMaximumSize
Initialize-Volume -DriveLetter "N" -FileSystem NTFS  -NewFileSystemLabel "NormalVol"  -Confirm:\$false
#Bring the disk back on line
Get-ClusterResource | where name -like \$ClusterResourceName | Resume-ClusterResource

\$StorageSpaces = Get-ClusterResource | where Name -Like "Cluster Virtual Disk*"
ForEach (\$Space in \$StorageSpaces) { Add-ClusterSharedVolume -Cluster \$ClusterName -InputObject \$Space }

#create the standard share directory on each new volume
\$DedupShare  = "C:\ClusterStorage\Volume1\shares"
\$NormalShare = "C:\ClusterStorage\Volume2\shares"
\$VMShare     = "VDI-VMs"
\$UserShare   = "UserDisks"

md \$DedupShare
md \$NormalShare

\$Share = "Dedup"+\$VMShare
\$SharePath = \$DedupShare + "\" + \$VMShare
MD \$SharePath
New-SmbShare -Name \$Share -Path \$SharePath -CachingMode None -FullAccess contoso\Orange\$,contoso\administrator -ScopeName \$HAFileServerName -ContinuouslyAvailable \$True

\$Share = "Dedup"+\$UserShare
\$SharePath = \$DedupShare + "\" + \$UserShare
MD \$SharePath
New-SmbShare -Name \$Share -Path \$SharePath -CachingMode None -FullAccess contoso\Orange\$,contoso\administrator -ScopeName \$HAFileServerName -ContinuouslyAvailable \$True

\$Share = "Normal"+\$VMShare
\$SharePath = \$NormalShare + "\" + \$VMShare
MD \$SharePath
New-SmbShare -Name \$Share -Path \$SharePath -CachingMode None -FullAccess contoso\Orange\$,contoso\administrator -ScopeName \$HAFileServerName -ContinuouslyAvailable \$True

\$Share = "Normal"+\$UserShare
\$SharePath = \$NormalShare + "\" + \$UserShare
MD \$SharePath
New-SmbShare -Name \$Share -Path \$SharePath -CachingMode None -FullAccess contoso\Orange\$,contoso\administrator -ScopeName \$HAFileServerName -ContinuouslyAvailable \$True

I know this can be sharpened up for production with loops and functions, but I hope it’s clearer laid out this way.  Note I have to take the disks into maintenance mode on the cluster while I format them etc., whereas the storage spaces wizard takes care of that for you.  This script is part of my VDI demo setup, so I have enabled deduplication on one storage space and not on the other to compare performance on each of them.
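For what it’s worth, the four repeated share-creation blocks above could be collapsed into a loop like this; a sketch using the same variable names (treat it as illustrative rather than tested against my rig):

```powershell
#Create all four shares in one loop instead of four copy-pasted blocks
$Roots  = @{ "Dedup" = $DedupShare; "Normal" = $NormalShare }
$Shares = $VMShare, $UserShare

foreach ($Prefix in $Roots.Keys) {
    foreach ($Name in $Shares) {
        $SharePath = Join-Path $Roots[$Prefix] $Name
        md $SharePath
        New-SmbShare -Name ($Prefix + $Name) -Path $SharePath -CachingMode None `
            -FullAccess contoso\Orange$,contoso\administrator `
            -ScopeName $HAFileServerName -ContinuouslyAvailable $True
    }
}
```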

Once I have created my spaces and shares  I am ready to use it, and a quick way to test all is well is to do a quick storage migration of a running VM to this new share. Just right click on a VM in Hyper-V Manager and select move to bring up the wizard.

after about 30 seconds my VM arrived safely on my File Server as you can see from its properties..

Hopefully that’s useful - it certainly went down well in London, Birmingham & Glasgow during our recent IT Camps, and if you want the full script I used I have also put it on SkyDrive. Note that this script is designed to be run from FileServer2, having already set up the cluster with the cluster setup script I have already posted.

• Lab Ops part 9 - an Introduction to Failover Clustering

For many of us Failover Clustering in Windows still seems to be a black art, so I thought it might be good to show how to do some of the basics in a lab and show off a few of the new clustering features of Windows server 2012 R2 in the process.

Firstly, what is a cluster? It’s simply a way of getting one or more servers in a group to provide some sort of service that will continue to work in some form if one of the servers in that group fails. For this to work the cluster gets its own entry in Active Directory and in DNS so the service it’s running can be discovered and managed.  Just to be clear, all the individual servers in a cluster (known as cluster nodes) must be in the same domain.

So what sort of services can you run on a cluster? In Windows Server 2012R2 that list looks like this..

Note: Other roles from other products like SQL Server can also be added in. Notice too that virtual machines (VMs) are listed here and running them in a cluster is how you make them highly available.

All but one of these roles can be run on guest clusters, that is, clusters built of VMs rather than physical servers; it is also possible to have physical hosts and VMs combined in the same cluster – it’s all Windows Server after all. The exception is when you want to make VMs highly available - here the cluster must only contain physical hosts, and this is known as a host or physical cluster.

Making a simple cluster is easy; it’s just a question of installing the failover clustering feature on each node and then joining them to a cluster. When you add in the feature you can add in Failover Cluster Manager, but if you have been following this series you’ll have access to this on your desktop, as the point of Lab Ops is to manage remotely where possible and also make use of PowerShell.  So in my example I am going to create 2 x VMs (FileServer2 & 3) from my HA FileServerCluster Setup local.ps1 script, which adds in the clustering feature (from this xml file).

Note: my SSD drive where I store my VMs is E: so you will need to edit my script if you want to follow along.

Having run that I could then simply run a line of PowerShell on one of those servers to create a cluster..

New-Cluster -Name HACluster -Node FileServer2,FileServer3 -NoStorage -StaticAddress 192.168.10.30

Note the -NoStorage switch: this cluster has just got my two nodes in it and that’s it.  For some clustered roles such as SQL Server 2012 AlwaysOn this is OK, but most roles that you put into a cluster will need access to shared storage, and historically this has meant connecting nodes to a SAN via iSCSI or Fibre Channel.  This applies to host and guest clusters, but for the latter the guest VMs will need direct access to this shared storage. That can cause problems in a multi-tenancy setup like a hoster’s, as the physical storage has to be opened up to the VMs for this to work, and even if you don’t work for one of those this will at least cut across the separation of duties in many modern IT departments.
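Incidentally, before (or after) running New-Cluster it’s worth validating the prospective nodes; a quick sketch against the same two nodes:

```powershell
#Run the cluster validation tests (this writes an html report you can review)
Test-Cluster -Node FileServer2,FileServer3

#Confirm the cluster and its nodes afterwards
Get-Cluster HACluster
Get-ClusterNode -Cluster HACluster
```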

There’s another problem with this cluster: it has an even number of objects in it. If one of the nodes fails and restarts, it will think it “owns” the cluster, but the other node already has that ownership, and problems will occur.  So for most situations we want an odd number of objects in our cluster, and the “side” of the cluster that has the majority after a node failure will own the cluster.  This democratic approach to clustering means that no one node is in charge of the cluster, and it enables Windows Server to support bigger clusters than VMware, which uses the older style of clustering that Microsoft abandoned after Windows Server 2003.

So having built a simple cluster, I need to add in more objects and work with some shared storage.   If you look at the script I used to create FileServer2 & 3 there are a couple of things to note:

I have created 7 shared virtual hard disks which are attached to both FileServer2 and FileServer3, and if I run this PowerShell

Get-VMHardDiskDrive –VMName “FileServer2”,”FileServer3” | Out-GridView

you can see that..

Also if I look at the settings of FileServer2 in Hyper-V Manager there’s a switch to confirm these disks are shared..

This is new for Windows Server 2012R2, and for production use the shared disks (only VHDX is supported) must be on some sort of real shared storage. However there is also a spoofing mechanism in R2 to allow this feature to be evaluated and demoed, and this is in line 272 of my script ..

start-process "C:\windows\system32\FLTMC.EXE" -argumentlist "attach svhdxflt e:"
Note you’ll need to rerun this after every reboot of your lab setup.
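For reference, the shared disks themselves are just VHDX files attached to both VMs with persistent reservations enabled. A minimal sketch of that step (the path and size here are illustrative, not the ones from my script):

```powershell
#Create a small VHDX and attach it to both nodes as a shared disk
New-VHD -Path "E:\HACluster\Quorum.vhdx" -SizeBytes 1GB -Dynamic

foreach ($Node in "FileServer2","FileServer3") {
    #Shared VHDX must go on a SCSI controller;
    #-SupportPersistentReservations marks the disk as shared (new in 2012R2)
    Add-VMHardDiskDrive -VMName $Node -Path "E:\HACluster\Quorum.vhdx" `
        -ControllerType SCSI -SupportPersistentReservations
}
```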

Now that my two FileServer VMs are joined in a cluster (HACluster) I can use one of these shared disks as a third object in the cluster. To do that I need to format it, add it in as a cluster resource, and then declare its use as a quorum disk. I’ll be running this script from one of the cluster nodes..

#Setup (initialize, format etc.) the 1Gb quorum disk
$QuorumDisk = Get-Disk | where size -EQ 1Gb
Initialize-Disk $QuorumDisk.Number -PartitionStyle GPT
Set-Disk $QuorumDisk.Number -IsOffline 0
New-Partition -DiskNumber $QuorumDisk.Number -DriveLetter "Q" -UseMaximumSize
Initialize-Volume -DriveLetter "Q" -FileSystem NTFS -NewFileSystemLabel "Quorum" -Confirm:$false
#Add the quorum disk to the cluster, then set the quorum mode on the cluster
Start-Sleep -Seconds 20
Get-ClusterAvailableDisk -Cluster $ClusterName | where size -eq 1073741824 | Add-ClusterDisk -Cluster $ClusterName
$Quorum = Get-ClusterResource | where ResourceType -eq "Physical Disk"
Set-ClusterQuorum -Cluster $ClusterName -NodeAndDiskMajority $Quorum.Name
Using a disk like this as a third object is only one way to achieve quorum (that odd number of voting objects). You could just have three physical nodes, or use a file share as a witness. In my case, if a node fails, the node that has ownership of this disk will form the cluster, and the failed node will then re-join the cluster rather than try to build an identical cluster of its own when it is recovered.
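Having set that, it’s easy to check what the cluster thinks its quorum configuration is; a quick sketch:

```powershell
#Inspect the quorum configuration and the state of each node
Get-ClusterQuorum -Cluster HACluster
Get-ClusterNode -Cluster HACluster | Format-Table Name,State
```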

That’s where I want to stop as there are several ways I can use a basic cluster like this and I’ll be covering those in individual posts.

If you want to build the cluster I have described so far, all you need is an evaluation edition of Windows Server 2012R2; then go through part 1 & part 2 of this series.

• Lab Ops part 8 –Tidying up

If you have been to one of our IT Camps you’ll know it’s all live and unstructured, which can mean things go wrong, typically because we didn’t tidy up properly after the previous event. The problems of repeatedly creating VMs are that:

The VM still exists in Hyper-V

The files making up the VM are still on disk

The VM is registered in Active Directory (we don’t currently recreate the main domain controllers for each camp)

So the sensible thing is to script the tidy-up before I run each demo again; in fact my plan is to have a clean-up section at the start of every demo script so I never forget to run it.  My advantage over some of the PowerShell gurus is that I am starting from scratch and can leverage the power of the later versions of PowerShell in Windows Server 2012 & R2. For example, to delete a DNS record there is now Remove-DnsServerResourceRecord, where before you would have had to use complex WMI calls.

My clean-up code needs to run without errors whatever state my rig is in, so where possible I test for objects before I delete them.  Sadly I found one problem here, and that was with the DNS cmdlets – Get-DnsServerResourceRecord returns an error if the record is not there, which I think is wrong, but at least the command is easy to use.
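One way round that is to silence the error and only delete when the record was actually found; a sketch, assuming the zone and record names used later in this post:

```powershell
#Get-DnsServerResourceRecord errors if the record doesn't exist,
#so suppress the error and test for null before deleting
$Record = Get-DnsServerResourceRecord -Name "HAFileServer" -ZoneName "contoso.com" `
    -RRType A -ComputerName Orange-DC -ErrorAction SilentlyContinue
if ($Record) {
    Remove-DnsServerResourceRecord -Name "HAFileServer" -ZoneName "contoso.com" `
        -RRType A -ComputerName Orange-DC -Force
}
```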

Anyway here’s what I do..

#1. Delete the file server VM. Note that before a VM can be deleted it must be shut down, no matter what state it’s in when the script is run

\$vmlist = get-vm | where vmname -in \$vmname
\$vmlist | where state -eq "saved" | Remove-VM -Verbose -Force
\$vmlist | where state -eq "off" | Remove-VM -Verbose -Force
$vmlist | where state -eq "running" | Stop-VM -Verbose -Force -Passthru | Remove-VM -Verbose -Force

#2. Get back the storage. My scripts to create VMs put all of the metadata and virtual hard disks in the same folder, so once the VM has been deleted in Hyper-V I can just delete its folder.

\$VMPath = \$labfilespath + \$VMName
If (Test-Path \$VMPath) {Remove-Item \$VMPath  -Recurse}

Notes on the above..

• The code to delete the VMs makes extensive use of the pipe “|” operator, perhaps the most important part of PowerShell.  What crosses the pipe is not text but the actual instance of the object being referenced; for example the VM’s properties include the host it is running on, so this doesn’t need to be specified.
• Another good example of how much easier PowerShell 3 is, is the much simpler where clause – we don’t need the {$_.Property} script-block syntax any more.
• I got the code to remove the entry for the VM from Active Directory by grabbing it from the Active Directory Administrative Center, which is not only a one-stop shop for AD but shows you the PowerShell for each command you run in it, for exactly this purpose..

• If you want to delete a cluster in AD you’ll need to turn off the switch that stops it being accidentally deleted ..

• I haven’t got any code here to delete DNS entries, as creating VMs the way I do with static IP addresses doesn’t need it. However later in this series I’ll be creating clusters and scale-out file servers, and that’s when I need to do this..

invoke-command -ComputerName Orange-DC -ScriptBlock {remove-DnsServerResourceRecord "HAFileCluster" -ComputerName orange-dc -zone contoso.com -RRType A -Force}
invoke-command -ComputerName Orange-DC -ScriptBlock {remove-DnsServerResourceRecord "HAFileServer" -ComputerName orange-dc -zone contoso.com -RRType A -Force}

where HAFileCluster is my actual cluster and HAFileServer is the scale-out file server role running on that cluster.  I am invoking this command remotely as I am running it from a cluster node, where I don’t have the DNS Server PowerShell cmdlets because that role is not installed on the nodes.
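The AD clean-up mentioned in the notes above can be handled in much the same test-then-delete fashion; a sketch, assuming the VM’s computer account name matches $VMName:

```powershell
#Remove the VM's computer account from AD if it exists
$Computer = Get-ADComputer -Filter "Name -eq '$VMName'" -Server Orange-DC
if ($Computer) {
    #-Recursive also removes any child objects under the computer account
    $Computer | Remove-ADObject -Recursive -Confirm:$false
}
```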

As ever I hope you find this stuff useful, and please let me know if you have any comments. In the meantime I am at TechDays Online for the rest of the week.

• Lab Ops - Stop Press Windows Server 2012R2 Evaluation edition released

I have just got back to my blog after a few days at various events and I see the Evaluation edition of Windows Server 2012R2 has been released.  I need this for my Lab Ops because I am building and blowing away VMs for er… evaluations, and I don’t want to have to muck about with license keys. For example I have a script to create my FileServer1 VM, but if I use the media from MSDN and don’t add a license key to my answer file, the machine will pause at the license key screen until I intervene.  Now I have the Evaluation Edition I can build VMs that will start automatically and, when they are running, continue to configure them. For example, for the FileServer1 VM I created in earlier posts in this series I can add a line to the end of that script which will run on the VM itself once it is properly alive after its first boot..

invoke-command -ComputerName \$VMName -FilePath 'E:\UK Demo Kit\Powershell\FileServer1 Storage Spaces.ps1'
..and this will go away and set up FileServer1 with my storage spaces.

Note: the script to create FileServer1 (FileServer1 Setup.ps1), the xml it uses to add features into that VM (File server 1 add features.xml), and the FileServer1 Storage Spaces.ps1 script referenced above are all on my SkyDrive for you to enjoy.

One good use case for executing PowerShell scripts remotely like this is when working on a cluster. Although I have put the Remote Server Administration Tools (RSAT) on my host to have access to the Failover Clustering cmdlets, I get a warning about running these against a remote cluster..

WARNING: If you are running Windows PowerShell remotely, note that some failover clustering cmdlets do not work remotely. When possible, run the cmdlet locally and specify a remote computer as the target. To run the cmdlet remotely, try using the Credential Security Service Provider (CredSSP). All additional errors or warnings from this cmdlet might be caused by running it remotely.
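If you do need to run such cmdlets remotely, CredSSP can be enabled as the warning suggests; a hedged sketch (the computer and domain names are my lab’s, substitute your own):

```powershell
#On the management machine: allow delegation of credentials to the lab domain
Enable-WSManCredSSP -Role Client -DelegateComputer "*.contoso.com" -Force

#On each cluster node: accept delegated credentials
Invoke-Command -ComputerName FileServer2 -ScriptBlock {
    Enable-WSManCredSSP -Role Server -Force
}

#Then run the clustering cmdlet in a CredSSP-authenticated session
Invoke-Command -ComputerName FileServer2 -Authentication Credssp `
    -Credential (Get-Credential contoso\administrator) -ScriptBlock {
        Get-ClusterNode
    }
```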

While on the subject of new downloads, the RSAT for managing Windows Server 2012R2 from Windows 8.1 is now available, so you can look after your servers from the comfort of Windows 8.1 with your usual tools like Server Manager, the Active Directory Administrative Center, Hyper-V Manager and so on. On my admin VM I have also put on the Virtual Machine Manager console and SQL Server Management Studio and a few other admin tools..

Before you ask, the RSAT tools you put on each client version of Windows only manage the equivalent version of Windows Server and earlier.  For example you can’t put the RSAT tools for managing Windows Server 2012R2 onto Windows 8 or Windows 7.

So using my Lab Ops guides, or the more manual guides on TechNet, you can now get stuck into playing with Windows Server 2012R2 as a way of getting up to speed on the latest Windows Server, along with the R2 courses on the Microsoft Virtual Academy.