Troubleshooting ‘Redirected Access’ on a Cluster Shared Volume (CSV)

Cluster Shared Volumes (CSV) is a new feature in Windows Server 2008 R2 that supports new scale-up/scale-out scenarios. CSV provides a scalable, fault-tolerant solution for clustered applications that require NTFS file system access from anywhere in the cluster. In Windows Server 2008 R2, CSV is supported only for use by the Hyper-V role.

The purpose of this post is to provide some basic troubleshooting steps for CSV volumes that show a Redirected Access status in Failover Cluster Manager. It is not intended as an overview of the Cluster Shared Volumes feature itself; for more information on Cluster Shared Volumes, consult TechNet.

Before diving into troubleshooting techniques for Redirected Access issues on Cluster Shared Volumes, let’s list some basic requirements for CSV, since reviewing them may also help resolve issues not specifically related to Redirected Access.

  • Disks that will be used in the CSV namespace must be MBR or GPT with an NTFS partition. 
  • The drive letter for the system disk must be the same on all nodes in the cluster.
  • The NTLM protocol must be enabled on all nodes in the cluster.
  • Only the in-box cluster “Physical Disk” resource type can be added to the CSV namespace. No third-party storage resource types are supported.
  • Pass-through disk configurations cannot be used in the CSV namespace.
  • All networks enabled for cluster communications must have Client for Microsoft Networks and File and Printer Sharing for Microsoft Networks protocols enabled.
  • All nodes in the cluster must share the same IP subnets between them as CSV network traffic cannot be routed.  For multi-site clusters, this means stretched VLANs must be used.
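Two of these requirements, a matching system drive letter on every node and shared, non-routed subnets for CSV traffic, are easy to check mechanically once you have gathered each node's settings. The Python sketch below assumes a hypothetical `inventory` dictionary that you build yourself (for example, from `ipconfig` output on each node); it does not query a cluster, it only flags mismatches, and it is no substitute for cluster validation.

```python
import ipaddress


def check_csv_prereqs(inventory):
    """Flag two CSV prerequisites in a hypothetical node inventory.

    `inventory` maps node name -> {"system_drive": "C:", "interfaces": ["10.0.0.11/24", ...]},
    where "interfaces" lists address/prefix pairs on networks enabled for cluster traffic.
    Returns a list of problem descriptions (empty if nothing was flagged).
    """
    if not inventory:
        return []
    problems = []

    # The system drive letter must be the same on all nodes.
    letters = {node: info["system_drive"].upper() for node, info in inventory.items()}
    if len(set(letters.values())) > 1:
        problems.append(f"system drive letters differ: {letters}")

    # All nodes must share the same IP subnets (CSV traffic cannot be routed).
    subnets = {
        node: {ipaddress.ip_interface(addr).network for addr in info["interfaces"]}
        for node, info in inventory.items()
    }
    shared = set.intersection(*subnets.values())
    for node, nets in subnets.items():
        for net in sorted(nets - shared, key=str):
            problems.append(f"{node}: subnet {net} is not shared by all nodes")
    return problems


if __name__ == "__main__":
    # Illustrative values only: the second subnet differs between the nodes.
    inventory = {
        "R2-NODE1": {"system_drive": "C:", "interfaces": ["10.0.0.11/24", "192.168.10.11/24"]},
        "R2-NODE2": {"system_drive": "C:", "interfaces": ["10.0.0.12/24", "192.168.20.12/24"]},
    }
    for problem in check_csv_prereqs(inventory):
        print(problem)
```

A subnet reported by this check would have to be routed between nodes, which CSV traffic cannot use.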

Let’s start by looking at the CSV namespace in a failover cluster when everything appears to be ‘normal.’ In Figure 1, all CSV volumes show Online in the Failover Cluster Management interface.

Figure 1

Looking at a CSV volume from the perspective of a highly available virtual machine group (Figure 2), the virtual machine is Online on one node of the cluster (R2-NODE1), while the CSV volume hosting the virtual machine files is Online on another node (R2-NODE2). This demonstrates how CSV completely decouples the virtual machine resources (Virtual Machine; Virtual Machine Configuration) from the storage hosting them.

Figure 2

When everything is working normally in a failover cluster with respect to CSV (no backups in progress, etc.), the vast majority of storage I/O is Direct I/O: each node hosting one or more virtual machines writes directly (via Fibre Channel, iSCSI, or SAS connectivity) to the CSV volume that holds those virtual machines' files. A CSV volume showing a Redirected Access status indicates that all I/O to that volume, from the perspective of a particular node in the cluster, is being redirected over the CSV network to another node in the cluster that still has direct access to the storage supporting the CSV volume. This is, for all intents and purposes, a ‘recovery’ mode: rather than losing all connectivity to storage, the node redirects its storage I/O over the CSV network. This is very powerful technology, because it allows virtual machine workloads to keep running and gives the cluster administrator an opportunity to evaluate the situation and live migrate workloads to other nodes in the cluster that are not experiencing connectivity issues. All of this happens behind the scenes, transparently to users. The trade-off may be slower performance (depending on the speed of the network interconnect, for example, 10 Gb vs. 1 Gb), since I/O no longer uses direct, block-level access to storage but instead uses remote file system access over the network via SMB.
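To put the 10 Gb vs. 1 Gb difference into rough numbers: dividing a link's line rate in gigabits per second by 8 gives a theoretical ceiling in gigabytes per second. The Python sketch below is back-of-the-envelope arithmetic only, using a hypothetical 40 GB virtual hard disk as the example; real redirected throughput will be lower because of SMB, TCP/IP, and Ethernet overhead and the load on the node that owns the volume.

```python
def line_rate_mb_per_s(gbits_per_s):
    """Theoretical ceiling in MB/s (decimal units), ignoring all protocol overhead."""
    return gbits_per_s * 1000 / 8


def best_case_seconds(gigabytes, gbits_per_s):
    """Best-case time to move `gigabytes` (decimal GB) across a link of the given speed."""
    return gigabytes * 1000 / line_rate_mb_per_s(gbits_per_s)


if __name__ == "__main__":
    for speed in (1, 10):
        print(f"{speed} Gb/s: at most {line_rate_mb_per_s(speed):.0f} MB/s; "
              f"reading a hypothetical 40 GB VHD takes at least {best_case_seconds(40, speed):.0f} s")
```

That works out to roughly 125 MB/s and 320 seconds on a 1 Gb link versus 1250 MB/s and 32 seconds on a 10 Gb link, which is why the interconnect speed matters so much once a volume is redirected.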

There are four main reasons a CSV volume may be in Redirected Access mode:

  • The user intentionally places the CSV volume in Redirected Access mode.
  • A node has a storage connectivity failure, in which case all of its I/O is redirected over a cluster network designated for CSV traffic to another node.
  • A backup of a CSV volume is in progress or has failed.
  • An incompatible filter driver is installed on the node.
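When all you have to go on is the volume's status text and the event IDs in the node's System log, the four causes above can be told apart fairly mechanically. The sketch below encodes the status strings and event IDs described in this post (5121 for lost storage connectivity, 5125 and 5126 for file system and volume filter drivers, the Backup in progress status for backups); the function name and the ordering of the suggestions are illustrative assumptions, not a Microsoft tool.

```python
def likely_causes(status, event_ids):
    """Suggest which of the four Redirected Access causes to investigate first.

    `status` is the volume status text from Failover Cluster Manager and
    `event_ids` are FailoverClustering event IDs seen in the node's System log.
    The mapping follows this post; the ordering is an illustrative assumption.
    """
    status = status.lower()
    events = set(event_ids)
    causes = []
    if "backup in progress" in status:
        causes.append("backup in progress or not cleared (reason 3)")
    if 5121 in events:
        causes.append("storage connectivity failure on this node (reason 2)")
    if 5125 in events:
        causes.append("incompatible file system filter driver (reason 4)")
    if 5126 in events:
        causes.append("incompatible volume filter driver (reason 4)")
    if not causes and "redirected access" in status:
        # Nothing more specific was found: it may simply have been turned on by hand.
        causes.append("possibly turned on manually; try turning it off (reason 1)")
    return causes
```

For example, `likely_causes("Online (Redirected Access)", [5121])` points at storage connectivity, while a volume showing Redirected Access with none of these events falls through to the manual case.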

Let’s take a look at a CSV volume in Redirected Access mode (Figure 3).

Figure 3

When a CSV volume is placed in Redirected Access mode, a Warning message (Event ID 5136) is logged in the System event log (Figure 4).

Figure 4

For additional information on event messages that pertain specifically to Cluster Shared Volumes please consult TechNet.


Let’s look at each one of the four reasons I mentioned and propose some troubleshooting steps that can help resolve the issue.

1.  User intentionally places a CSV volume in Redirected Access mode:  Users can manually place a CSV volume in Redirected Access mode by right-clicking the CSV volume, selecting More Actions, and then selecting Turn on redirected access for this Cluster shared volume (Figure 5).

Figure 5

Therefore, the first troubleshooting step should be to try turning off Redirected Access mode in the Failover Cluster Management interface.

2.  There is a storage connectivity issue:  When a node loses connectivity to the attached storage supporting a CSV volume, the cluster implements a recovery mode by redirecting that node's storage I/O to another node in the cluster over a network that CSV can use. The status of the cluster Physical Disk resource associated with the CSV volume shows Redirected Access, and all storage I/O for the virtual machine(s) hosted on that volume is redirected over the network to another node in the cluster that has direct access to the CSV volume. This is by far the most common reason CSV volumes are placed in Redirected Access mode. Troubleshoot it as you would any other loss of storage connectivity on a server, and involve the storage vendor as needed. Because this is a cluster, the cluster validation process can also be used as part of troubleshooting to test storage connectivity.

Look for the following event in the System event log:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 10/8/2010 6:16:39 PM
Event ID: 5121
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Node1.cluster.com
Description: Cluster Shared Volume 'DATA-LUN1' ('DATA-LUN1') is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network through the node that owns the volume. This may result in degraded performance. If redirected access is turned on for this volume, please turn it off. If redirected access is turned off, please troubleshoot this node's connectivity to the storage device and I/O will resume to a healthy state once connectivity to the storage device is reestablished.
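If you export events as text, a 5121 event like the one above can be picked apart programmatically, for example to list which CSV volumes a node has lost direct access to. A minimal sketch, assuming the one-field-per-line "Field: value" layout shown above; the regular expression relies on the quoted volume name at the start of the description, which is an assumption about the message wording rather than a documented format.

```python
import re


def parse_event_text(text):
    """Split a 'Field: value' event dump (one field per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 6:16:39 stay in the value.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def csv_volume_from_5121(fields):
    """Return the CSV name quoted at the start of a 5121 description, or None."""
    if fields.get("Event ID") != "5121":
        return None
    match = re.match(r"Cluster Shared Volume '([^']+)'", fields.get("Description", ""))
    return match.group(1) if match else None
```

Running these over every exported 5121 event from a node gives a quick list of the volumes that node is reaching only through redirected I/O.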

3.  A backup of a CSV volume fails:  When a backup is initiated on a CSV volume, the volume is placed in Redirected Access mode. The type of backup being executed determines how long the volume stays in redirected mode. If a software backup is being executed, the CSV volume remains in redirected mode until the backup completes. If hardware snapshots are used as part of the backup process, the volume stays in redirected mode only very briefly. In a backup scenario, the CSV volume status is slightly different: it shows as Backup in progress, Redirected Access (Figure 6), so you can better understand why the volume was placed in Redirected Access mode. When the backup application finishes backing up the volume, it must properly notify the cluster so the volume can be brought out of redirected mode.

Figure 6

A couple of things can happen here. Before proceeding down this road, make sure a backup really is not in progress. The first possibility is that the backup completed, but the application did not properly notify the cluster, so the volume was never brought out of redirected mode. The call the backup application needs to make is ClusterClearBackupStateForSharedVolume, which is documented on MSDN. In that case, you should be able to clear the Backup in progress, Redirected Access status by simulating a failure on the CSV volume with the cluster PowerShell cmdlet Test-ClusterResourceFailure. For the CSV volume shown in Figure 6, the command would be:

Import-Module FailoverClusters
Test-ClusterResourceFailure "35 GB Disk"

If this clears the redirected status, then the backup application vendor needs to be notified so they can fix their application.

The second possibility is a backup that fails without the application properly notifying the cluster, so the cluster still thinks the backup is in progress. If the backup fails before a snapshot of the volume is created, the CSV volume status should reset on its own after a 30-minute delay. If, however, a software snapshot was actually created during the backup (assuming the application creates software snapshots as part of the backup process), a slightly different approach is needed.
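The failed-backup reasoning above can be summarized as a small decision function. This is only a sketch of the logic in this post: the inputs are things you determine yourself (for example, whether `vssadmin list shadows` shows a leftover shadow copy on the volume), the 30-minute figure is the delay quoted above, and the final fallback to Test-ClusterResourceFailure is an assumption that borrows the step from the completed-backup case.

```python
RESET_DELAY_MINUTES = 30  # self-reset delay quoted in this post


def next_step_after_failed_backup(backup_running, snapshot_exists, minutes_since_failure):
    """Suggest the next action for a CSV volume stuck in 'Backup in progress, Redirected Access'."""
    if backup_running:
        return "wait: a backup really is in progress"
    if snapshot_exists:
        # A software snapshot was created, so the status will not clear by itself.
        return "delete the leftover shadow copy, then recheck the volume"
    if minutes_since_failure < RESET_DELAY_MINUTES:
        return "wait: the status should reset on its own"
    # Assumption: reuse the step given for backups that completed without notifying the cluster.
    return "still stuck: try Test-ClusterResourceFailure and contact the backup vendor"
```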

To determine whether any volume shadow copies exist on a CSV volume, use the vssadmin command-line utility and run vssadmin list shadows (Figure 7).

Figure 7

Figure 7 shows a shadow copy on the CSV volume that is in Redirected Access mode. Use the vssadmin utility to delete the shadow copy (Figure 8). Once that completes, the CSV volume should come Online normally. If it does not, change the coordinator node by moving the volume to another node in the cluster, and verify that the volume comes Online.

Figure 8
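With several CSV volumes, reading `vssadmin list shadows` output by eye gets tedious. The sketch below groups that output by `Shadow Copy ID`, keeps each entry's `Original Volume` line, and builds (but does not run) `vssadmin delete shadows /shadow={ID}` commands for the volume you name, so you can review them before running anything. It assumes the English "Field: value" layout of vssadmin output; matching on a volume GUID substring is an illustrative choice, not a vssadmin feature.

```python
def parse_shadows(output):
    """Return one dict per 'Shadow Copy ID' entry in `vssadmin list shadows` output."""
    shadows = []
    for raw in output.splitlines():
        # Relevant lines look like "Key: value"; split on the first colon only.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        if key == "Shadow Copy ID":
            shadows.append({"id": value.strip()})
        elif shadows and key in ("Original Volume", "Originating Machine"):
            shadows[-1][key] = value.strip()
    return shadows


def delete_commands(shadows, volume_match):
    """Build (but do not run) delete commands for shadows on a matching volume."""
    return [
        f"vssadmin delete shadows /shadow={s['id']}"
        for s in shadows
        if volume_match.lower() in s.get("Original Volume", "").lower()
    ]
```

If vssadmin refuses with the "outside of your allowed context" error reported in one of the comments, another commenter suggests diskshadow as an alternative.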

4.  An incompatible filter driver is installed in the cluster:  The last item in the list concerns filter drivers, installed by third-party applications running on a cluster node, that are incompatible with CSV. When the cluster detects such filter drivers, it places the CSV volume in redirected mode to help prevent potential data corruption on the volume. When this occurs, a Warning message with Event ID 5125 is logged in the System event log. Here is a sample message:

17416 06/23/2010 04:18:12 AM   Warning       <node_name>  5125    Microsoft-Windows-FailoverClusterin Cluster Shared Vol NT AUTHORITY\SYSTEM               Cluster Shared Volume 'Volume2' ('Cluster Disk 6') has identified one or more active filter drivers on this device stack that could interfere with CSV operations. I/O access will be redirected to the storage device over the network through another Cluster node. This may result in degraded performance. Please contact the filter driver vendor to verify interoperability with Cluster Shared Volumes.  Active filter drivers found: <filter_driver_1>,<filter_driver_2>,<filter_driver_3>
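To turn a 5125 message into a list of driver names you can look up in the altitude spreadsheet or compare with fltmc output, split the text after "Active filter drivers found:". A minimal sketch, assuming that English phrase is followed by a comma-separated list that ends the message, as in the sample above; the placeholder names are the post's.

```python
def active_filter_drivers(message):
    """Return the driver names listed after 'Active filter drivers found:'."""
    marker = "Active filter drivers found:"
    _, sep, tail = message.partition(marker)
    if not sep:
        return []  # Not a message in the expected format.
    return [name.strip() for name in tail.split(",") if name.strip()]
```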

The cluster log records warning messages similar to these:

7c8:088.06/10[06:26:07.394](000000) WARN  [DCM] filter <filter_name> found at unsafe altitude <altitude_numeric>
7c8:088.06/10[06:26:07.394](000000) WARN  [DCM] filter <filter_name>  found at unsafe altitude <altitude_numeric>
7c8:088.06/10[06:26:07.394](000000) WARN  [DCM] filter <filter_name>   found at unsafe altitude <altitude_numeric>
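After generating the cluster log (for example, with the Get-ClusterLog cmdlet), you can pull every unsafe-altitude warning out of it instead of scrolling. The regular expression below is an assumption modeled on the [DCM] lines shown above, not a documented cluster-log grammar, and it expects filter names without spaces; the filter names in the usage example are hypothetical.

```python
import re

# Modeled on the sample [DCM] warnings above; not an official log format.
UNSAFE_FILTER = re.compile(r"\[DCM\] filter (\S+)\s+found at unsafe altitude (\S+)")


def unsafe_filters(log_lines):
    """Return {filter_name: altitude} for every unsafe-altitude warning found."""
    found = {}
    for line in log_lines:
        match = UNSAFE_FILTER.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    lines = [
        "7c8:088.06/10[06:26:07.394](000000) WARN  [DCM] filter fltA found at unsafe altitude 321000",
        "7c8:088.06/10[06:26:07.394](000000) WARN  [DCM] filter fltB  found at unsafe altitude 328000.5",
    ]
    print(unsafe_filters(lines))
```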

Event ID 5125 is specific to a file system filter driver.  If, instead, an incompatible volume filter driver were detected, an Event ID 5126 would be registered.  For more information on the difference between file and volume filter drivers, consult MSDN.

Note:  Specific filter driver names and altitudes have been intentionally left out.  The information can be decoded by downloading the ‘File System Minifilter Allocated Altitudes’ spreadsheet posted on the Windows Hardware Developer Central public website.

Additionally, the fltmc.exe command line utility can be run to enumerate filter drivers.  An example is shown in Figure 9.

Figure 9
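fltmc prints its filter list as a table, so its output can be parsed as well. This sketch assumes the column layout Filter Name, Num Instances, Altitude, Frame, and assumes that legacy filters show no instance count and `<Legacy>` in the Frame column, as in one of the comments on this post; the sample rows in the test are illustrative, not captured from a real node.

```python
def parse_fltmc(output):
    """Parse `fltmc` filter-list text into (name, instances, altitude, frame) tuples.

    Legacy filters have no instance count, so `instances` is None for them.
    """
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(tokens) < 3 or tokens[:2] == ["Filter", "Name"] or set(tokens[0]) == {"-"}:
            continue
        instances = int(tokens[1]) if len(tokens) == 4 and tokens[1].isdigit() else None
        rows.append((tokens[0], instances, tokens[-2], tokens[-1]))
    return rows
```

The altitudes returned here are the values to look up in the allocated-altitudes spreadsheet mentioned above.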

Once the third-party filter driver has been identified, remove the application and/or contact the vendor to report the problem. Problems involving third-party filter drivers are rare, but they still need to be considered.

UPDATE 4/9: A hotfix has been released that addresses an issue where filter drivers cause the 'redirected access' issue:

FIXED: Cluster Shared Volumes (CSV) in redirected access mode after installing McAfee VSE 8.7 Patch 5 or 8.8 Patch 1

Hopefully, I have provided information here that will get you started down the right path to resolving issues that involve CSV volumes running in a Redirected Access mode.

Thanks!

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support


  • Nice writeup!  I was writing up on this topic on my Blog .... but now I think it may be easier to just link to the pros:

    egotoobigtovirtualize.blogspot.com/.../hyper-v-and-csv-devil-is-in-details-pt1.html

  • Hi  jeffhugh,

    Windows server2008 is implementing new features frequently out of which the latest is Cluster shared Volumes (CSV). Now we can access system anywhere in the cluster because it provides scalable fault tolerant solution for clustered applications that has NTFS file system. This is a great info for us and i must really thankful to jeffhugh for giving such an updation about Windows server 2008.

    gloriatech.com/microsoft-windows-server-2008-setup.aspx

  • ------------ The drive letter for the system disk must be the same on all nodes in the cluster.

    What do you mean by the above statement? In CSV you don't require any drive letter as such, because you are presenting block-level storage to the cluster.

    please update??

  • @NM-BG: For clarification, that means the drive you installed Windows on should have the same letter across all nodes. Don't make your system partition C:\ on one node and D:\ on the other. CSV isn't referenced by a drive letter; it's a redirect point under %SystemDrive%\ClusterStorage\VolumeX

  • when I try executing vssadmin delete shadows /shadow={...} i get the following error

    --

    Error: Snapshots were found, but they were outside of your allowed context.  Try removing them with the backup application which created them.

    --

    Any suggestions?

  • Really nice post!

  • Your article really helped me provide much needed assistance to a Networking Engineer I was assisting on a CritSit. I was able to learn about the issue quickly and troubleshoot immediately. Another great posting by The Man!

  • Hi,

    I have tried everything but nothing works. Hp lefthand is the SAN. Has anyone some more tips for me?

  • alternative to vssadmin for shadow copy management is command diskshadow

    with this command you will be able to delete shadows left by a broken backup application

  • SO DEEP! SO GOOD POST! Another issue I experienced: Redirected Access mode did not switch back to direct I/O when DPM told the Hyper-V host, through the VSS services, that the backup was finished, and all the clustered VMs went down.

    The reason was that my VMs' Integration Components were not up to date after hardware driver patching...

  • Thank you, thank you, thank you.... great helpful article!

  • I read: "Pass-through disk configurations cannot be used in the CSV namespace." Well, 90% of my virtual machines run on pass-through disks, with their configuration files stored on CSVs. That is about 100 servers. Is it necessary to change this configuration? Must I create a LUN for each virtual machine, add that LUN as a disk drive, and store the configuration file there so that Hyper-V will locate the BIN and VSV files there?

  • If you have an Explorer window open, either locally (C:\ClusterStorage\<CSV>) or via the administrative share (\\host\c$\ClusterStorage\<CSV>), this will also bring the CSV into Redirected Access mode.

    Symptoms are: when moving the CSV to a different host or when using Test-ClusterResourceFailure the disk will go into 'Online' mode only to be placed back into 'Online (Redirected Access)' a few seconds later.

  • Nice writeup, sort of "on the perimeter" when it comes to understanding Hyper-V HA Clusters.  Had a problem with Backup Exec 2012 issuing a backup in progress error.  Turns out redirected access was on (we had a failure earlier in the month).  Tried to turn it off, it still indicated backup in progress.  Stopped the remote agents on both hosts, and 10 minutes later the redirected access went off by itself.

    Thanks, helped me a bunch!  :-)

    Kevin

  • If running McAfee in a W2K12 Failover Cluster, ensure McAfee patch levels match.  Also, fltmc.exe @ cmd reports only mfehidk   329998.99 <Legacy>, not the additional mfehidk  321300.00 0