The PowerShell Deployment Toolkit – PDT – performs distributed installations of System Center 2012 SP1, including SQL and all prerequisites. If you are doing a full production, highly available, scale-out deployment, this could span a significant number of servers. Keeping track of the status of all that is one of the interesting challenges that PDT addresses, but what do you do when something goes wrong? The good news is that PDT gracefully handles failures mid-flow across that distributed installation, and also allows for restarts of partially failed deployments.
Here’s what happens in the inner workings of PDT. For each server in a deployment, PDT dynamically determines the set of items that need to be installed and configured and the order in which that needs to happen. Then, for each item it determines whether that item has already been done using one of a number of validation types. If it has, it just skips that step. If it has not, it performs the necessary actions for that item, then re-runs the validation to make sure it worked. If the validation fails, PDT does not continue for that server – so any items after the failed item are not completed. If a server has a dependency on an item on another server in the deployment – for example, a management server needs SQL to be installed on another server before it can be installed – it waits for that server to complete that dependency before it continues. If that server has failed any item prior to the dependency, the server that is dependent at that point also fails.
The result of all this is that, if something fails, you can wait for everything else in the deployment to complete, then fix the condition that caused the failure, then just run Installer.ps1 again. Everything that worked the first time through will not be done again because PDT will validate that those items are already in place – so it effectively picks up where the failures in the previous run occurred.
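The flow described above (validate, act, re-validate, stop the server on failure, wait on cross-server dependencies, and skip completed items on a re-run) can be sketched roughly as follows. This is a single-threaded Python simulation written for illustration only, not PDT's actual PowerShell code; `Item`, `run_deployment`, `make_item`, and the server and item names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Item:
    name: str
    validate: Callable[[], bool]   # True when the item is already in place
    action: Callable[[], None]     # installs or configures the item
    depends_on: Optional[Tuple[str, str]] = None  # (server, item) required first


def run_deployment(plan):
    """Run every server's ordered items; return {server: item it failed at}."""
    position = {server: 0 for server in plan}  # next item index per server
    completed = set()                          # validated (server, item) pairs
    failed = {}
    progress = True
    while progress:                            # repeat until nothing can move
        progress = False
        for server, items in plan.items():
            while server not in failed and position[server] < len(items):
                item = items[position[server]]
                dep = item.depends_on
                if dep and dep not in completed:
                    if dep[0] in failed:       # dependency can never complete
                        failed[server] = item.name
                        progress = True
                    break                      # otherwise wait for that server
                if not item.validate():        # skip work that is already done
                    item.action()
                    if not item.validate():    # re-check after acting
                        failed[server] = item.name
                        progress = True
                        break                  # later items are not attempted
                completed.add((server, item.name))
                position[server] += 1
                progress = True
    return failed


# Hypothetical two-server deployment used to exercise the sketch.
installed, broken, calls = set(), {("SQL01", "SQL")}, []


def make_item(server, name, depends_on=None):
    key = (server, name)

    def action():
        calls.append(key)
        if key not in broken:                  # simulate an install that fails
            installed.add(key)

    return Item(name, lambda: key in installed, action, depends_on)


plan = {
    "MS01": [make_item("MS01", "NetFx"),
             make_item("MS01", "ManagementServer", ("SQL01", "SQL"))],
    "SQL01": [make_item("SQL01", "NetFx"), make_item("SQL01", "SQL")],
}

first = run_deployment(plan)    # SQL fails, so the management server fails too
broken.clear()                  # "fix the condition that caused the failure"
calls.clear()
second = run_deployment(plan)   # only the items that were not in place run
```

In this simulation the second run performs only the two actions that did not succeed the first time, because the validation step reports the earlier items as already in place.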
So, how can you tell what went wrong? There are two sets of log files to help you diagnose a failure – the PDT log files themselves, and the log files for the items being installed. The PDT log files are in the folder C:\Users\<username>\AppData\Local\Installer on the system running Installer.ps1 – there is a log file per server being deployed to, as well as a consolidated log file, Installer.log. All files are in a format that can easily be read by the CMTrace.exe utility from Configuration Manager (remember I am an old ConfigMgr guy at heart). The PDT logs list everything PDT is doing – getting information, setting variables, checking if something needs to be installed, creating a task, waiting for that task to finish, waiting for dependencies, etc. If something fails, the log will tell you what failed. The log files for items being installed are collected by PDT and copied to the installer machine at the end of the deployment – they can be found in C:\Temp\<guid>, where guid is a unique identifier assigned to each run of the deployment. These logs are generated by each individual setup, so each has its own format. PDT collects them so that they are easy for you to find, but also because the way PDT runs tasks against remote machines means that some of the log files get deleted as part of the process, and we need to make sure you have access to them.
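If CMTrace.exe is not at hand, logs in that format can also be scanned with a short script. The sketch below assumes the general CMTrace entry layout (`<![LOG[message]LOG]!>` followed by a tag of attributes, with `type="3"` marking an error). Whether PDT uses exactly these attributes and type values is an assumption, and the sample lines are synthetic, not real PDT output. `parse_entries` and `errors` are hypothetical helper names.

```python
import re

# General CMTrace entry layout: <![LOG[message]LOG]!><time="..." ... type="N" ...>
ENTRY = re.compile(r'<!\[LOG\[(?P<message>.*?)\]LOG\]!><(?P<attrs>[^>]*)>', re.DOTALL)
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_entries(text):
    """Return one dict per log entry: the message plus its attributes."""
    entries = []
    for match in ENTRY.finditer(text):
        attrs = dict(ATTR.findall(match.group("attrs")))
        entries.append({"message": match.group("message"), **attrs})
    return entries


def errors(entries):
    """Keep entries flagged type="3" (assumed here to mean an error)."""
    return [entry for entry in entries if entry.get("type") == "3"]


# Synthetic sample in the CMTrace layout (NOT actual PDT log content).
sample = (
    '<![LOG[Example: checking item]LOG]!><time="10:00:00.000+000" '
    'date="01-01-2013" component="Example" context="" type="1" thread="1" file="">\n'
    '<![LOG[Example: slow response]LOG]!><time="10:00:01.000+000" '
    'date="01-01-2013" component="Example" context="" type="2" thread="1" file="">\n'
    '<![LOG[Example: validation failed]LOG]!><time="10:00:02.000+000" '
    'date="01-01-2013" component="Example" context="" type="3" thread="1" file="">\n'
)

entries = parse_entries(sample)
failures = errors(entries)
```

To use it on a real log, read the file's text (for example with `pathlib.Path(...).read_text()`) and pass it to `parse_entries` in place of `sample`.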
So, that’s how to start troubleshooting failures, plus a little insight into how PDT actually works. More on that in future posts!
PDT is a wonderful tool, thanks again for all the work that's gone into it. I have found an issue with several path declarations in Workflow.xml: you have the $TempPath variable set to $SystemPath\Temp, but in several places you use $SystemPath\Temp instead of $TempPath. The first time I used PDT from a drive other than C:, I had several failures during the integration component installation. Changing all instances of $SystemPath\Temp to $TempPath in Workflow.xml resolved the issue.
Thanks Nathan! We'll look at that and incorporate as appropriate in next release.
Hey Rob, I have gone through the variable.xml file and located the accounts used for the various services. I have created these in my isolated AD, and assigned matching passwords in both AD and in the XML file. However, the installer borks when it gets to the validating service accounts section. Is there a specific property the script is looking for in the account definition? While in PowerShell, I can execute a runas command using the credentials from the XML file and the authentication works fine. At first I thought it was failing because I was running the installer from a non-domain machine, but after successive failures, I moved the code to a machine in the AD. No joy there either.
The only PDT logs located at C:\Users\<username>\Appdata\local\Installer\Log are Controller.0 and Controller.1. Installer\Temp is empty. Thoughts?
wnbowman - can you confirm that a version of the .NET Framework is installed on the machine you are running Installer.ps1 from? Either 3.5 or 4.5. The script definitely needs to be run from a domain-joined machine.
Rob, the machine I used does have both .NET 3.5 and .NET 4.5 installed.
Good Morning Rob.
For the next version it might be a good idea to check passwords for valid characters. A password containing characters such as < > & " $ ' : ; causes bad things to happen, or causes the XML to be considered invalid.
The first 3 will cause the variable XML to be invalid, and there is nothing displayed to say where the error is; it just says error. (If you then try to do the import manually, it tells you which line has the error.)
Some of the other characters will get past the first check, but when it tries to run the commands, it produces invalid batch files for the installer.
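A pre-flight check along these lines could catch such passwords before a run. The sketch below is only an illustration in Python, not part of PDT: `risky_characters` and `is_well_formed` are hypothetical helpers, the character list is taken from the comment above, and the `<Password>` element is a made-up stand-in rather than the real Variable.xml schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Characters named in the comment above as causing trouble.
RISKY = set('<>&"$\':;')


def risky_characters(password):
    """Return the sorted list of flagged characters found in the password."""
    return sorted(set(password) & RISKY)


def is_well_formed(xml_text):
    """True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


password = "Pa&ss<word"

# Hypothetical element; a raw & or < in the value makes the XML unparseable.
raw_xml = "<Password>" + password + "</Password>"

# Escaping makes the XML parse again and preserves the original value.
escaped_xml = "<Password>" + escape(password) + "</Password>"
round_trip = ET.fromstring(escaped_xml).text

found = risky_characters(password)
dollar = risky_characters("Pa$$w0rd")
```

Escaping addresses only the XML side. Since the comment also reports broken batch files, avoiding the flagged characters altogether is probably the safer course.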
@Davey. Doh, yes, my passwords have the $ in them, completely forgot about PS barking about that. I will be changing the passwords and retrying the validation again. Thanks!!
@Davey- validation was successful, thanks for pointing out the issue with $ in the password.
@Rob- Installation fails on installing any/all SQL instances. I manually went into each server and installed .NET Framework 3.5 and attempted to install once again, same results. I have confirmed and saved off logs from each installation attempt.
Thinking about wiping it all and starting from scratch.
@Rob- In reviewing the Installer.log file, I did see where the SQL source files were trying to copy from C:\Temp. As I had my installation and downloaded files under D:\Temp, there was the disconnect. I have moved my files from C to D, and I am seeing more progress now that I have restarted the installer.ps1 again.
(Have not started from scratch yet...)
Something that I have found that makes this very easy is to create a 100-150 GB VHD and mount it as C:\Temp.
You can store the VHD wherever you have disk space for it, and it is then in the correct path. It also makes it really easy to move or clone your build machine to another: you just make a copy or a differencing disk from your build one.
PS glad the password bit helped someone as it drove me nuts for about 2 days. :)