How to configure an SMB file share for Azure-based HPC compute nodes

For the last few weeks, I have been working with one of our major ISV applications to enable it to “cloud burst” from an on-premises HPC cluster to Azure compute nodes. One of the key capabilities introduced with Service Pack 1 (SP1) for Windows HPC Server 2008 R2 is the ability to provision compute nodes on Windows Azure as a supplement to an on-premises cluster. For more information, see the Deploying Azure Worker Nodes in Windows HPC Server 2008 R2 SP1 Step-by-Step Guide.

Like many HPC applications, this one utilizes an SMB file share so that all of the compute nodes can access common data and executable code. At first, I simply created a file share on the (on-premises) HPC head node and set up an Azure Connect group to allow all of the compute nodes to access it. However, the performance was not acceptable, so I decided to create a file share on Azure to remove the Azure Connect bottleneck.

It is possible to set up an SMB file share using an Azure worker role as described in this article:

https://blogs.msdn.com/b/windowsazurestorage/archive/2011/04/16/using-smb-to-share-a-windows-azure-drive-among-multiple-role-instances.aspx

However, I found that I could not use these guidelines for HPC-managed Azure compute nodes, because the .csdef and .cscfg files are not accessible for customization. The purpose of this blog entry is to document an approach for configuring one of the HPC Azure compute nodes to function as an SMB file share for all of the others.

NOTE: There is nothing persistent about this file share (hence the name, “scratch”). In fact, it may not even last the duration of a long running job, depending on the whim of the Azure fabric controller. But if you like to live dangerously, this will work.

The Azure-based compute nodes that are provisioned by the on-premises HPC head node already have the File Services Role and the File Server Role Service installed, and port 445 is open by default on the internal endpoint. So, all that is necessary is to run the following commands:

mkdir c:\scratch

Create the directory to be shared.

net user guest /active:yes

Enable the Guest user. I found that this was necessary because the HPC job scheduler creates a local user on the Azure compute nodes that is not part of the AD domain the head node belongs to. So, if the head node is going to access the file share, the Guest user needs to be enabled and needs permissions on that directory.

netsh firewall set service type=fileandprint mode=enable scope=all

I took this command from the article referenced above. It will return a warning message about the command being deprecated, but it still works.
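As a side note, the non-deprecated way to enable the same rule group is through the advfirewall context. I have left the original command in place, but something like this should work as well:

netsh advfirewall firewall set rule group="File and Printer Sharing" new enable=Yes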

icacls c:\scratch /grant guest:F /t

This is the command that gives full permissions to the guest user. This might normally be a security concern, but the internal endpoints on the Azure compute nodes are only accessible from other nodes in that same Hosted Service or through the IPsec connection that Azure Connect provides.

net share scratch=c:\scratch /grant:everyone,FULL

This was taken from the article above as well, except that I am giving full permissions to everyone.
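If you want a quick sanity check at this point (not one of the required steps), you can display the share definition on the node, for example by running this through clusrun:

net share scratch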

If the on-premises head node does not need to access the file share, then this is all that is necessary. If the head node does need to access the file share, as was the case for this application, then a couple of other steps are required. First, you need to make sure that Azure Connect is configured between the head node and the Azure file server node. Since most applications use a UNC path to access the file share (\\AzureCN-0006\scratch), all of the nodes need to be able to resolve that name to an IP address. The AzureCN-xxxx node names are managed by HPC and are maintained in the hosts files on each of the compute nodes. However, since the head node communicates with the Azure compute nodes through a proxy, it does not have an entry in either DNS or its hosts file that references the AzureCN-xxxx name. So, we need to add an entry to the hosts file on the head node that maps the IPv6 address established by Azure Connect to the AzureCN-xxxx name. You can do that manually by running "clusrun /nodes:AzureCN-xxxx ipconfig" to discover the IPv6 address of the PPP connection, and then adding a row for that address and hostname combination to the c:\windows\system32\drivers\etc\hosts file.
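The entry uses the standard hosts file format: the IPv6 address, some whitespace, and the node name. For example (the address below is made up; use the one that ipconfig reports for the PPP adapter on your node):

2a01:111:f102:90a:1234:5678:9abc:def0   AzureCN-0006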

When I worked through this process the first several times, I entered all of the commands manually, but then I realized it would be simple to automate the entire process by submitting a job to the HPC scheduler. So, I wrote this PowerShell script:

$an = Read-Host "Please enter the name of the Azure node to use for the file server"

# Build a job that runs the file share setup commands on the selected Azure node
$job = New-HpcJob -Name "Create a File Share" -RequestedNodes $an
$job | Add-HpcTask -Name "Create Folder" -Command "mkdir c:\scratch"
$job | Add-HpcTask -Name "Enable Guest User" -Command "net user guest /active:yes"
$job | Add-HpcTask -Name "Enable Firewall" -Command "netsh firewall set service type=fileandprint mode=enable scope=all"
$job | Add-HpcTask -Name "Grant Full Permissions to Guest" -Command "icacls c:\scratch /grant guest:F /t"
$job | Add-HpcTask -Name "Share Folder" -Command "net share scratch=c:\scratch /grant:everyone,FULL"
Submit-HpcJob -Job $job

# Get the IPv6 address of the node's PPP (Azure Connect) interface
$IPConfiginfo = clusrun /nodes:$an ipconfig
$Addressline = $IPConfiginfo -match "IPv6"

# The second matching line holds the PPP adapter's address; extract the full IPv6 address from it
$Addressline[1] -match "[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}"
$hostsfileentry = $matches[0] + "   " + $an

# Insert a new row with the AzureCN name into the hosts file, on a new line after the text "true"
# ($& in the replacement keeps the matched text and appends the new entry after it)
$hosts = "c:\windows\System32\Drivers\etc\hosts"
(gc $hosts) -replace "true", "$&`r`n$hostsfileentry" | sc $hosts
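Once the job completes and the hosts file entry is in place, a simple way to confirm that the head node can reach the share is to list it by its UNC path (substitute the actual node name you used):

dir \\AzureCN-0006\scratch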

If you try this and have any problems, please let me know.