SMB Transparent Failover – making file shares continuously available

SMB Transparent Failover is one of the key features in the feature set introduced in Server Message Block (SMB) 3.0. SMB 3.0 is new in Windows Server 2012 and Windows 8. I am the program manager for SMB Transparent Failover and in this blog post I will give an overview of this new feature.

In Windows Server 2012 the file server introduces support for storing server application data, which means that server applications, like Hyper-V and SQL Server, can store their data files, such as virtual machine files or SQL databases, on Windows file shares. These server applications expect their storage to be reliable and always available, and they do not generally handle IO errors or unexpected closures of handles very well. If the server application cannot access its storage, this often leads to databases going offline or virtual machines stopping or crashing because they can no longer write to their disk.

SMB Transparent Failover enables administrators to configure Windows file shares, in Windows Failover Clustering configurations, to be continuously available. Using continuously available file shares enables administrators to perform hardware or software maintenance on any cluster node without interrupting the server applications that are storing their data files on these file shares. Also, in case of a hardware or software failure, the server application nodes will transparently reconnect to another cluster node without interruption of the server applications. In case of an SMB scale-out file share (more on Scale-Out File Server in a following blog post), SMB Transparent Failover allows the administrator to redirect a server application node to a different file server cluster node to facilitate better load balancing.

For more information on storing server application data on SMB file shares and other features to support this scenario, see Windows Server “8” – Taking Server Application Storage to Windows File Shares

Installation and configuration

SMB Transparent Failover has the following requirements:

  • A failover cluster running Windows Server 2012 with at least two nodes. The configuration of servers, storage and networking must pass all the tests performed in the Validate a Configuration wizard.
  • File Server role is installed on all cluster nodes.
  • Clustered file server configured with one or more file shares created with the continuously available property. This is the default setting.
  • SMB client computers running the Windows 8 client or Windows Server 2012.

To realize SMB Transparent Failover, both the SMB client computer and the SMB server computer must support SMB 3.0, which is introduced in Windows 8 and Windows Server 2012. Computers running down-level SMB versions, such as 1.0, 2.0 or 2.1 can connect and access data on a file share that has the continuously available property set, but will not be able to realize the benefits of the SMB Transparent Failover feature.
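The rule above can be sketched as a small check. This is an illustrative Python model only, not an SMB API: the function name, the version tuples and the `share_is_ca` flag are assumptions made for the example.

```python
# Illustrative model only: names and data shapes are hypothetical, not an SMB API.
SMB3 = (3, 0)


def transparent_failover_available(client_dialects, server_dialects, share_is_ca):
    """Return True when client and server can both speak SMB 3.0 or later
    and the share has the continuously available property set."""
    common = set(client_dialects) & set(server_dialects)
    if not common:
        return False  # no shared dialect: the client cannot connect at all
    negotiated = max(common)  # the highest dialect both sides support is used
    return share_is_ca and negotiated >= SMB3


windows_server_2012 = [(1, 0), (2, 0), (2, 1), (3, 0)]
windows_8 = [(1, 0), (2, 0), (2, 1), (3, 0)]
windows_7 = [(1, 0), (2, 0), (2, 1)]

# A down-level client still connects to the share, but without failover benefits.
print(transparent_failover_available(windows_8, windows_server_2012, True))
print(transparent_failover_available(windows_7, windows_server_2012, True))
```

The second call models a down-level client: it shares a dialect with the server, so it can access the data, but the negotiated version is below 3.0.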

Installing and creating a Failover Cluster

For information about how to install the Failover Clustering feature, and how to create and troubleshoot a Windows Server 2012 Failover Cluster, see these blog posts:

Installing the File Server role

Once the Failover Cluster is up and running, we can install the File Server role. Do the following for each node in the Failover Cluster:

Graphical User Interface

  • Start Server Manager
  • Click Add roles and features
  • In the Add Roles and Features Wizard, do the following:
    • In Before you begin, click Next
    • In Select installation type, click Next
    • In Select destination server, choose the server where you want to install the File Server role, and click Next
    • In Select server roles, expand File And Storage Services, expand File and iSCSI Services, and check the check box for File Server and click Next
    • In Select features, click Next
    • In Confirm installation selections, click Install

Figure 1 - Installing File Server role

PowerShell

In an elevated PowerShell shell, do the following:

Add-WindowsFeature -Name File-Services

Create clustered File Server

Once the File Server role is installed on all cluster nodes, we can create a clustered file server. In this example we will create a clustered file server of type “File Server for general use” and name it SMBFS. I will provide more information on “Scale-Out File Server for application data” in a follow-up blog post.

Do the following to create a clustered file server.

Graphical User Interface

  • Start Server Manager
  • Click Tools and select Failover Cluster Manager
  • In the console tree, do the following
    • Select and expand the cluster you are managing
    • Select Roles
    • In the Actions pane, click Configure Role
    • In Before You Begin, click Next
    • In Select Role, select File Server and click Next
    • In File Server Type, select the type of clustered file server you want to use
    • In Client Access Point, enter the name of the clustered file server
    • In Client Access Point, complete the Network Address for static IP addresses as needed and click Next
    • In Select Storage, select the disks that you want to assign to this clustered file server and click Next
    • In Confirmation, review your selections and when ready click Next

Figure 2 – Select File Server Type

Figure 3 - Configure Client Access Point

Figure 4 - Select Storage

PowerShell

In an elevated PowerShell shell, do the following:

Add-ClusterFileServerRole -Name SMBFS -Storage "Cluster Disk 1" -StaticAddress 192.168.9.99/24

Create a file share that is continuously available

Now that we have created the clustered file server, we can create file shares that are continuously available. In this example we will create a file share named “appstorage” on the clustered file server we created previously.

Do the following to create a file share that is continuously available:

Graphical User Interface

  • Start Server Manager
  • Click Tools and select Failover Cluster Manager
  • In the console tree, do the following
    • Select and expand the cluster you are managing
    • Select Roles
    • In the Results pane, select the file server where you want to create the file share and in the Actions pane click Add File Share. This will start the New Share Wizard
  • In the New Share Wizard, do the following
    • In Select Profile, select the appropriate profile (SMB Share – Applications in this example) and click Next
    • In Share Location, select the volume where you want to create the share and click Next
    • In Share Name, enter the share name and click Next
    • In Configure Share Settings, verify Enable continuous availability is set and click Next
    • In Specify permissions and control access, modify the permissions as needed to enable access and click Next
    • In Confirmation, review your selections and when ready click Create
    • Click Close

Figure 5 - Select Profile

Figure 6 - Select server and path

Figure 7 - Share Name

Figure 8 - Configure Share Settings

To verify a share has the continuously available property set, do the following:

  • Start Server Manager
  • Click Tools and select Failover Cluster Manager
  • In the console tree, do the following
    • Select and expand the cluster you are managing
    • Select Roles
    • In the Results pane, select the file server you want to examine
    • In the bottom window, click the Shares tab
    • Locate the share of interest and examine the Continuous Availability property

 

PowerShell

These steps assume the folder for the share is already created. If this is not the case, create the folder before continuing.

In an elevated PowerShell shell on the cluster node where the clustered file server is online, do the following to create a file share with the continuous availability property set:

New-SmbShare -Name AppStorage -Path f:\appstorage -Scope smbfs -FullControl smbtest\administrator

In an elevated PowerShell shell on the cluster node where the clustered file server is online, do the following to verify that a file share has the continuous availability property set:

Get-SmbShare -Name AppStorage | Select *

PresetPathAcl : System.Security.AccessControl.DirectorySecurity
ShareState : Online
AvailabilityType : Clustered
ShareType : FileSystemDirectory
FolderEnumerationMode : Unrestricted
CachingMode : None
CATimeout : 0
ConcurrentUserLimit : 0
ContinuouslyAvailable : True
CurrentUsers : 0
Description :
EncryptData : False
Name : appstorage
Path : F:\Shares\appstorage
Scoped : True
ScopeName : SMBFS
SecurityDescriptor : O:BAG:DUD:(A;OICI;FA;;;WD)
ShadowCopy : False
Special : False
Temporary : False
Volume : \\?\Volume{266f94b0-9640-4e1f-b056-6a3e999e6ecf}\

Note that we didn’t request the continuous availability property to be set. This is because the property is set by default. If you want to create a file share without the property set, do the following:

New-SmbShare -Name AppStorage -Path f:\appstorage -Scope smbfs -FullControl smbtest\administrator -ContinuouslyAvailable:$false

Using a file share that is continuously available

Now that we have created a clustered file server with a file share that is continuously available, let’s go ahead and use it.

The below diagram illustrates the setup that I will be using in this section.

Figure 9 - Clustered File Server

On the file share is a 10GB data file (testfile.dat) that is being accessed by an application on the SMB client computer (FSF-260403-10). The below screenshot shows the SMB Client Shares performance counters for \\smbfs\appstorage share as seen from the SMB Client. As you can see the application is doing 8KB reads and writes.

Figure 10 - Data Access

Zeroing in on data requests/sec in graph form, we see the following:

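[Graph: Data Requests/sec as seen from the SMB client]

In an elevated PowerShell shell on the cluster node where the clustered file server is online, do the following to list the open files and confirm that the file handle is continuously available: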

Get-SmbOpenFile | Select *

ClientComputerName : [2001:4898:e0:32af:890b:6268:df3b:bf8]
ClientUserName : SMBTEST\Administrator
ClusterNodeName :
ContinuouslyAvailable : True
Encrypted : False
FileId : 4415226380557
Locks : 0
Path : F:\Shares\appstorage\testfile.dat
Permissions : 1180059
ScopeName : SMBFS
SessionId : 4415226380341
ShareRelativePath : testfile.dat

Planned move of the cluster group

With assurance that the file handle is indeed continuously available, let’s go ahead and move the cluster group to another cluster node. In an elevated PowerShell shell on one of the cluster nodes, do the following to move the cluster group:

Move-ClusterGroup -Name smbfs -Node FSF-260403-08

Name      OwnerNode       State
----      ---------       -----
smbfs     FSF-260403-08   Online

Looking at Data Requests/sec in Performance Monitor, we see a short brown-out of a few seconds where IO is stalled while the cluster group is moved. IO then continues uninterrupted once the cluster group has completed the move.

The tear down and setup of SMB sessions, connections and active handles between the SMB client and the SMB server on the cluster nodes is handled completely transparently to the application. The application does not see any errors during this transition, only a brief stall in IO.

Figure 11 - Move Cluster Group

Let’s take a look at the operational log for SMB Client in Event Viewer (Applications and Services Log – Microsoft – Windows – SMB Client – Operational) on the SMB Client computer.

In the event log we see a series of warning events around 9:36:01PM. These warning events signal the tear down of SMB connections, sessions and shares. There is also a series of information events around 9:36:07PM. These information events signal the recovery of SMB sessions, connections and shares. These events are very useful for understanding the activities during the recovery and for confirming that the recovery was successful.

Figure 12 - Events for planned move

So how does SMB Transparent Failover actually work? When the SMB client initially connects to the file share, the client determines whether the file share has the continuous availability property set. If it does, this means the file share is a clustered file share and supports SMB transparent failover. When the SMB client subsequently opens a file on the file share on behalf of the application, it requests a persistent file handle. When the SMB server receives a request to open a file with a persistent handle, the SMB server interacts with the Resume Key filter to persist sufficient information about the file handle, along with a unique key (resume key) supplied by the SMB client, to stable storage.

If a planned move or failure occurs on the file server cluster node to which the SMB client is connected, the SMB client attempts to reconnect to another file server cluster node. Once the SMB client successfully reconnects to another node in the cluster, the SMB client starts the resume operation using the resume key. When the SMB server receives the resume key, it interacts with the Resume Key filter to recover the handle state to the same state it was prior to the failure with end-to-end support (SMB client, SMB server and Resume Key filter) for operations that can be replayed, as well as operations that cannot be replayed. Resume Key filter also protects the handle state after failover to ensure namespace consistency and that the client can reconnect. The application running on the SMB client computer does not experience any failures or errors during this operation. From an application perspective, it appears the I/O operations are stalled for a small amount of time.

To protect against data loss from writing data into an unstable cache, persistent file handles are always opened with write through.
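The persistent handle flow described above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not the actual SMB or Resume Key filter implementation: every class and method name here is hypothetical, and "stable storage" is simply a dictionary shared by both nodes.

```python
import uuid


class StableStorage:
    """Hypothetical stand-in for storage that survives a node failure."""

    def __init__(self):
        self.handles = {}  # resume key -> persisted handle state


class FileServerNode:
    """Hypothetical cluster node that persists and resumes handle state."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.online = True

    def open_persistent(self, path, resume_key):
        # Persist the handle state under the key supplied by the client.
        # Persistent handles are always opened with write through.
        state = {"path": path, "write_through": True}
        self.storage.handles[resume_key] = state
        return state

    def resume(self, resume_key):
        # After failover, recover the state recorded before the failure.
        return self.storage.handles[resume_key]


class SmbClient:
    """Hypothetical client that reconnects and resumes on behalf of the app."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.current = None
        self.resume_key = None
        self.handle = None

    def open(self, path):
        self.current = next(n for n in self.nodes if n.online)
        self.resume_key = str(uuid.uuid4())  # unique key chosen by the client
        self.handle = self.current.open_persistent(path, self.resume_key)

    def read(self):
        if not self.current.online:
            # Reconnect to another node and resume the handle there.
            # The application sees a stall here, not an error.
            self.current = next(n for n in self.nodes if n.online)
            self.handle = self.current.resume(self.resume_key)
        return f"data from {self.handle['path']} via {self.current.name}"


storage = StableStorage()
node_a = FileServerNode("node-a", storage)
node_b = FileServerNode("node-b", storage)
client = SmbClient([node_a, node_b])

client.open("testfile.dat")
before = client.read()
node_a.online = False  # planned move or unplanned failure of the active node
after = client.read()
print(before)
print(after)
```

The point of the sketch is that the handle state lives outside any single node, so the second `read` succeeds against a different node without the caller doing anything differently.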

Unplanned failure of the active cluster node

Now, let’s introduce an unplanned failure. The cluster group was moved to FSF-260403-08. Since all these machines are running as virtual machines in a Hyper-V setup, I can use Hyper-V manager to reset FSF-260403-08.

Looking at Data Requests/sec in Performance Monitor, we see that there is a slightly longer brown-out where IO is stalled. In this time period the cluster detects that FSF-260403-08 has failed and starts the cluster group on another node. Once started, SMB can perform transparent recovery.

Figure 13 – Unplanned Failure

And again the SMBClient event log shows events related to the failure:

Figure 14 – Events for unplanned failure

Now you will probably ask yourself: “Wait a minute. SMB is running over TCP and TCP timeout is typically 20 seconds and SMB uses a couple of them before determining the cluster node failed. So how come the recovery is ~10 seconds and not 40 or 60 seconds??”

Enter Witness service.

Witness service was created to enable faster recovery from unplanned failures, allowing the SMB client to not have to wait for TCP timeouts. Witness is a new service that is installed automatically with the failover clustering feature. When the SMB client initially connects to a cluster node, the SMB client notifies the Witness client, which is running on the same computer. The Witness client obtains a list of cluster nodes from the Witness service running on the cluster node it is connected to. The Witness client picks a different cluster node and issues a registration request to the Witness service on that cluster node. The Witness service then listens to cluster events related to the clustered file server the SMB client is connected to.

If an unplanned failure occurs on the file server cluster node the SMB client is connected to, the Witness service on the other cluster node receives a notification from the cluster service. The Witness service notifies the Witness client, which in turn notifies the SMB client that the cluster node has failed. Upon receiving the Witness notification, the SMB client immediately starts reconnecting to a different file server cluster node, which significantly speeds up recovery from unplanned failures.

You can examine the state of the Witness service across the cluster using the Get-SmbWitnessClient command. Notice that Get-SmbWitnessClient can be run on any cluster node and provides a cluster aggregate view of the Witness service, similar to Get-SmbOpenFile and Get-SmbSession. In an elevated PowerShell shell on one of the cluster nodes, do the following to list the registered Witness clients:
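The registration and notification flow can be sketched as a toy Python model. This is an assumption-laden illustration, not the Witness protocol: the class names, method names and node names are hypothetical, and the choice of reconnect target is simplified to "the first node that did not fail".

```python
class WitnessService:
    """Hypothetical Witness service on one cluster node."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.registrations = []  # (client, monitored file server name)

    def register(self, client, network_name):
        self.registrations.append((client, network_name))

    def on_cluster_event(self, network_name, failed_node):
        # The cluster service reports a failure; relay it to registered clients.
        for client, name in self.registrations:
            if name == network_name:
                client.on_witness_notification(failed_node)


class WitnessClient:
    """Hypothetical Witness client on the SMB client computer."""

    def __init__(self, connected_node, cluster_nodes):
        self.connected_node = connected_node  # node serving the SMB connection
        self.cluster_nodes = cluster_nodes
        self.witness_node = None
        self.reconnects = []

    def register(self, services, network_name):
        # Pick a node other than the one the SMB connection goes to.
        self.witness_node = next(
            n for n in self.cluster_nodes if n != self.connected_node
        )
        services[self.witness_node].register(self, network_name)

    def on_witness_notification(self, failed_node):
        if failed_node == self.connected_node:
            # Reconnect right away instead of waiting for TCP timeouts.
            # Simplification: pick the first node that did not fail.
            target = next(n for n in self.cluster_nodes if n != failed_node)
            self.reconnects.append(target)
            self.connected_node = target


nodes = ["node-07", "node-08", "node-09"]
services = {name: WitnessService(name) for name in nodes}

client = WitnessClient(connected_node="node-08", cluster_nodes=nodes)
client.register(services, "SMBFS")

# node-08 fails; the Witness service on node-07 hears about it from the cluster.
services["node-07"].on_cluster_event("SMBFS", failed_node="node-08")
print(client.witness_node, client.connected_node)
```

Because the notification arrives on a connection to a healthy node, the client learns about the failure without depending on the failed node's TCP connection timing out.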

Get-SmbWitnessClient | select *

State : RequestedNotifications
ClientName : FSF-260403-10
FileServerNodeName : FSF-260403-08
IPAddress : 2001:4898:E0:32AF:3256:8C83:59E5:BDB5
NetworkName : SMBFS
NotificationsCancelled : 0
NotificationsSent : 0
QueuedNotifications : 0
ResourcesMonitored : 1
WitnessNodeName : FSF-260403-07

Examining the above output (run before the unplanned failure), we can see the SMB client (FSF-260403-10) is currently connected to cluster node FSF-260403-08 (SMB connection) and has registered for witness notification for SMBFS with Witness service on FSF-260403-07.

Looking at Event Viewer (Applications and Services Log – Microsoft – Windows – SMBWitnessClient – Operational) on the SMB Client computer, we see that the Witness client received notification for SMBFS. Since the cluster group was moved to FSF-260403-07, which is also the Witness node for the Witness client, the following event shows the Witness client unregistering from FSF-260403-07 and registering with FSF-260403-09.

Figure 15 - Witness event log

Tips and Tricks

Protecting file server services

LanmanServer and LanmanWorkstation run in service hosts with other services. In extreme cases other services running in the same service hosts can affect the availability of LanmanServer and LanmanWorkstation. You can configure these services to run in their own service host using the following commands:

sc config lanmanserver type= own

sc config lanmanworkstation type= own

The computer needs to be restarted for this change to take effect.

Loopback configurations

Accessing a file share that has the continuously available property set as a loopback share is not supported.

For example, SQL Server or Hyper-V storing their data files on SMB file shares must run on computers that are not members of the file server cluster for the SMB file shares.

Using legacy tools

When creating file shares, the continuous availability property is set by default in tools introduced in Windows Server 2012, including the new file share creation wizard and the New-SmbShare command. If you have automation built around older tools, such as NET SHARE, Explorer or the NET APIs, the continuous availability property will not be set by default, and these tools do not support setting it. To work around this issue you can set the following registry value, which will cause all shares to be created with the property set, regardless of whether the tool supports it:

Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" -Name EnableCaAlways -Value 1 -Force

Witness service

By default the network traffic between the Witness client and the Witness service requires mutual authentication and is signed. However, the traffic is not encrypted, as it doesn’t contain any user data. It is possible to enable encryption of Witness network traffic.

To configure the Witness client to send traffic encrypted, set the following registry value on each client:

Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" -Name WitnessFlags -Value 1 -Force

To configure the Witness service to not accept unencrypted traffic, set the following registry value on each cluster node:

Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\SMBWitness\Parameters" -Name Flags -Value 1 -Force

Disabling NetBIOS over TCP/IP

I have seen disabling NetBIOS over TCP/IP speed up failover times. To disable NetBIOS over TCP/IP for an interface, do the following in Network Connections:

  • Select the interface you want to modify, right-click and select Properties
  • In interface properties, select Internet Protocol Version 4 (TCP/IPv4) and click Properties
  • In Internet Protocol Version 4 (TCP/IPv4) Properties, click Advanced
  • In Advanced TCP/IP Settings, click the WINS tab
  • On the WINS tab, select the Disable NetBIOS over TCP/IP radio button

If you disable NetBIOS over TCP/IP, disable it on all network interfaces on all cluster nodes.

Figure 16 – Disable NetBIOS over TCP/IP

Disable 8.3 name generation

SMB Transparent Failover does not support cluster disks with 8.3 name generation enabled. In Windows Server 2012 8.3 name generation is disabled by default on any data volumes created. However, if you import volumes created on down-level versions of Windows or by accident create the volume with 8.3 name generation enabled, SMB Transparent Failover will not work. An event will be logged in (Applications and Services Log – Microsoft – Windows – ResumeKeyFilter – Operational) notifying that it failed to attach to the volume because 8.3 name generation is enabled.

You can use fsutil to query and set the state of 8.3 name generation system-wide and on individual volumes. You can also use fsutil to remove previously generated short names from a volume.

Conclusion

I hope you enjoyed this introduction to SMB Transparent Failover and agree that this feature is useful for providing continued access when servers need to be restarted for software or hardware maintenance, or in the unfortunate event that a cluster node fails. Providing continued access to file shares during these events is extremely important, especially for workloads such as Microsoft Hyper-V and Microsoft SQL Server.

I am looking forward to diving into Scale-Out File Server in a future post.

Claus Joergensen

Principal Program Manager

Windows File Server Team

Comments
  • Awesome! I can't wait to set this up at home!! I think this is something that a lot of us have been waiting for for a very long time, and it's really exciting to see it finally here!

  • So reading this, I gathered that this feature is not transparent to the client? The client needs to support this  level of high availability. Is that correct?

  • Will this work over the WAN?  What kind of bandwidth is necessary?

  • Errol, SMB Transparent Failover is designed to be transparent to the application, but the SMB client is involved in the recovery after a planned or unplanned failover. The SMB client must support SMB 3.0, which is first released with Windows 8 and Windows Server 2012.

    Steven, SMB Transparent Failover is primarily targeting server apps such as Hyper-V and SQL Server, which are expected to run in the same datacenter as the Windows File Server. There is nothing in the design that explicitly prevents usage over a WAN, and there are only a few extra packets going over the wire during initial connect for Witness etc.

  • Thank you for your last reply.  Great Blog.

    I now have a question regarding the SMB 3.0 & file share failover cluster.  Can you please tell me, when building a 2012 cluster, is it possible to use the local direct storage on each node?  Or do I have to use our SAN or other shared storage solution?

    I've searched for the answer without success. Thanks again for your help!

    -Steven J.

  • Steve, Windows Server 2012 Failover Clustering requires shared storage. In addition to FC or iSCSI SAN solutions, Windows Server 2012 includes the new Storage Spaces feature which uses shared JBODs.

  • Great post.

    Does SMB 3.0 extend to Linux servers through third parties or over Hyper-V? It would be great to have this functionality in a mixed OS environment.

  • Nice article. useful

  • Thank you. Very nice article.

  • Add link

    Creating Continuously Available File Shares with Windows Server “8” Beta

    www.microsoft.com/.../details.aspx

  • Hi claus,

    Nice article, thanks.

    Can you please explain more about the resume key filter?

    What is it?

  • This is good info to know and thank you for the article, but I have several servers giving me this error..

    SMB client failed to resume CA handle as Resume Key Filter is not acknowledging the handle

    Any idea what could be causing the Resume Key Filter failure?

  • NFS 4 has supported the idea of rotating mount shares, where although not continuously available, the client fails over to the other node when the first node is down. I wonder if Windows Server had/has a similar feature.

    But I agree, smb failover is LOOOONG overdue. share downtimes are one of the most common occurrences in networks and this is a killer feature.

  • Not completely transparent!

    SMB Client has to retry connection with SMB Server and application does not see this!

    /Nirmal

  • There is also an easiest way to test this:

    1. Create a text file on SMB Share. Let's say File.TXT

    2. put your name in it.

    3. Run below command from SMB Client (Win 8 or 2012)

    FOR /L %L IN (1,1,100000) DO Type \\SMB_CAP\SMBShareName\File.TXT

    4. Open another command window and type below command to stop cluster service on the node where SMB share is active.

    SC \\nodename stop clussvc

    At this point, shared folder is live migrated to available node in the cluster.

    5. While the folder is being live migrated, FOR loop halts for a few seconds (it might take longer)

    Question:

    1. How this process is going to benefit the applications which might time out if no response is received within the time out value defined by the application? For example, application should receive response within 5 seconds before it shows error.

    2. Does an application have to implement SMB client specific functions so it is SMB aware?

    Thanks!

    Nirmal
