Use of a differencing VHD can cause the Windows Azure VM Role to not start

If the parent VHD size is already at the maximum limit, then using a child differencing disk will cause the VM Role to not start. I don't have any official confirmation, but my assumption is that the use of the differencing disk is causing the "effective" disk size to exceed the upper limit.

Recently, I was helping one of my clients build a VHD image for use as a VM Role on Windows Azure, and we discovered that the VM Role would not start. We had decided to use a base disk plus a differencing disk, following the recommendation in section 4.6 of this TechNet article:

If you will be frequently updating an application running on the VM nodes or changing the configuration of the operating system, you may want to upload a base VHD to Windows Azure on which the application is not installed. Then, you can use Hyper-V Manager and the base VHD to create a differencing VHD that contains the application updates and other configuration changes. When you need to update the Windows Azure VM nodes, you can upload the differencing VHD. For step-by-step procedures, see How to Change a Server Image for a VM Role by Using a Differencing VHD (https://go.microsoft.com/fwlink/p/?LinkId=217131).

Initially, we tried to keep the image as small as possible (to reduce the upload time), because we assumed that we would be able to expand it later if necessary. However, when we ran out of space and ran the "Edit Disk" wizard from the Hyper-V management console, we discovered this warning message:

Do not edit a virtual hard disk when it is used by a virtual machine that has snapshots, or when it is associated with a differencing virtual hard disk. Otherwise, data loss may occur.

So, we had to throw away all of the updates and applications we had installed on the differencing disk, expand the parent disk, create a new differencing disk, and redo all of the application installation and configuration work. When we ran out of space, we had noticed that even with the differencing disk, the C:\ drive on the virtual machine was still the same size as on the parent VHD, so we assumed that a differencing disk does not increase the size of the disk.

We didn't want to run out of space and repeat that process, so we decided to expand the parent disk to 65GB, which is the maximum size specified for Medium, Large, and Extra Large Azure instances. We ran sysprep and used the csupload utility to upload the parent VHD image (65GB) to Azure storage. We then created a differencing disk, installed all the applications, ran sysprep again, and uploaded that image (9GB).
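The disk type and virtual size discussed above can be checked directly from a .vhd file before uploading it. The following is a minimal sketch, not part of our original process: it assumes the standard VHD footer layout (a 512-byte big-endian structure at the end of the file, starting with the "conectix" cookie), and the function name read_vhd_footer is hypothetical.

```python
import struct

FOOTER_SIZE = 512
# Disk type codes as defined in the VHD footer.
DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def read_vhd_footer(path):
    """Return (disk_type, virtual_size_bytes) read from a .vhd footer.

    Hypothetical helper: assumes the footer is the last 512 bytes of the file.
    """
    with open(path, "rb") as f:
        f.seek(-FOOTER_SIZE, 2)  # 2 = seek relative to the end of the file
        footer = f.read(FOOTER_SIZE)
    if footer[:8] != b"conectix":
        raise ValueError("not a VHD file (footer cookie missing)")
    # "Current size" (the virtual disk size) is at byte offset 48.
    (virtual_size,) = struct.unpack(">Q", footer[48:56])
    # "Disk type" is at byte offset 60.
    (type_code,) = struct.unpack(">I", footer[60:64])
    return DISK_TYPES.get(type_code, "unknown"), virtual_size
```

Running this against both the parent and the differencing disk should show the same virtual size for each, which would match what we saw on the C:\ drive.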

We then ran a PowerShell script to provision several VM Roles based on the differencing disk. After about 30 minutes, the status message on the Azure portal said "Failed to Start". There was no indication of what the problem was.

So, we changed the .cscfg file to specify the parent VHD instead of the differencing VHD and ran the provisioning script again. This time, all of the nodes provisioned successfully. Because this worked, we went back to the Hyper-V "Edit Disk" wizard and ran the Merge command to incorporate all of the changes from the differencing disk into the parent. We then uploaded the new merged image and were able to provision the VM Roles successfully.

I have seen VM Roles based on differencing VHDs get provisioned successfully, but in our experience provisioning fails if the parent VHD is already at the maximum size limit.
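That rule of thumb can be written as a small pre-upload check. This is only a sketch of my own observation, not documented Azure behavior: the 65GB figure is the limit quoted above for Medium, Large, and Extra Large instances, treating "GB" as 2**30 bytes is an assumption, and the function name is hypothetical.

```python
GIB = 1024 ** 3
# Limit quoted in this post for Medium, Large, and Extra Large instances.
# Assumption: "GB" is interpreted as 2**30 bytes.
MAX_VHD_BYTES = 65 * GIB


def should_merge_before_upload(is_differencing, parent_size_bytes,
                               limit_bytes=MAX_VHD_BYTES):
    """Return True when a differencing disk sits on a parent that is
    already at (or over) the size limit, the case that failed for us."""
    return bool(is_differencing) and parent_size_bytes >= limit_bytes
```

For our 65GB parent with a differencing child, should_merge_before_upload(True, 65 * GIB) returns True, which is the point at which we would now merge the disks in Hyper-V before uploading.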