Cluster Disk May Fail to Online and Run ChkDsk when a Backslash ‘\’ or Forwardslash ‘/’ is used in the Resource Name


We wanted to do a quick write-up on an issue we’ve seen in rare circumstances with 2008 R2 clusters where a ‘Physical Disk’ resource may fail to come online with no events logged in the system event log.

Issue

The symptom of this issue is that the disk shows in a ‘Failed’ state in Failover Cluster Manager, but there are no events in the system event log that correlate to the failure.


To see if you are running into this issue, generate a cluster log on the node where the disk fails.

From a command prompt:

‘cluster log /gen’

In the cluster log file, located at \Windows\Cluster\Reports\cluster.log, look for an entry like this:

ERR   [RES] Physical Disk <Cluster Disk X:\>: VerifyFS: Failed to open file \\?\GLOBALROOT\Device\Harddisk52\Partition1\Logfile.ldf Error: 5.
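If the cluster log is large, a short script can pull out these entries. The following is a minimal Python sketch, not a supported tool: the pattern is inferred from the single sample line above, and the function name `find_verifyfs_failures` is hypothetical. It works on log text you have already read into a string.

```python
import re

# Pattern inferred from the one sample line in this post (an assumption);
# real logs may carry extra prefixes such as thread IDs and timestamps,
# which search() tolerates.
VERIFYFS_RE = re.compile(
    r"ERR\s+\[RES\] Physical Disk <(?P<resource>.+?)>: VerifyFS: "
    r"Failed to open file (?P<path>.+?) Error: (?P<code>\d+)"
)


def find_verifyfs_failures(log_text):
    """Return (resource name, file path, error code) for each VerifyFS failure."""
    failures = []
    for line in log_text.splitlines():
        match = VERIFYFS_RE.search(line)
        if match:
            failures.append(
                (match["resource"], match["path"], int(match["code"]))
            )
    return failures
```

Each tuple gives the resource name as the cluster logged it, so a trailing slash in the name is easy to spot next to the error code.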

The problem occurs if and only if both of the following conditions are present:

1. ‘Local System’ cannot open a handle to a file at the root of the drive (whether because the file is in use or because of permissions). In my example, the cluster log snippet shows that the cluster is trying to open Logfile.ldf and is getting access denied (‘Error: 5’).

AND

2. The name of the ‘Physical Disk’ resource contains a backslash or forward slash character. In my example, the disk name was ‘Cluster Disk X:\’.

Generally, we don’t recommend storing files at the root of a disk, because the cluster needs to open handles to files and folders as part of the health detection mechanism it uses to detect possible storage access issues. Since the cluster service runs in the context of the ‘Local System’ account, the health check may fail if that account does not have permission to access files at the root of a drive.

Workaround

The simplest resolution is to remove the invalid character(s) in resource names for ‘Physical Disk’ resource types.

OR

Verify that the ‘Local System’ account has at least ‘Read’ permissions to files at the root of the drive.

In the above example, I renamed my resource from ‘Cluster Disk X:\’ to ‘Cluster Disk X:’. I could also have granted the ‘Local System’ account ‘Read’ permissions on the Logfile.ldf file.
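If you have many disk resources, it can help to check the names in bulk before renaming anything. Here is a small Python sketch, assuming you have already exported the resource names to a list by whatever means you prefer. The helper names `has_slash` and `suggest_name` are hypothetical, and the code only proposes a name; it does not rename any cluster resource.

```python
def has_slash(resource_name):
    """True if the name contains a backslash or a forward slash."""
    return "\\" in resource_name or "/" in resource_name


def suggest_name(resource_name):
    """Return the name with every backslash and forward slash removed."""
    return resource_name.replace("\\", "").replace("/", "")


# Example input: names typed in by hand for illustration only.
names = ["Cluster Disk X:\\", "Cluster Disk 1"]
for name in names:
    if has_slash(name):
        print(f"{name!r} should be renamed, for example to {suggest_name(name)!r}")
```

For the disk in this post, the suggestion matches the rename I did by hand: the trailing backslash is dropped and the rest of the name is left alone.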

Additionally, the following event may be noted in the system event log after you correct the problem and online the disk for the first time.


ID:       1066
Level:    Warning
Source: Microsoft-Windows-FailoverClustering
Message:  Cluster disk resource 'Cluster Disk X:' indicates corruption for volume '\\?\Volume{aaeb0322-6921-11e0-a955-00155d50c903}'. Chkdsk is being run to repair problems. The disk will be unavailable until Chkdsk completes. Chkdsk output will be logged to file 'C:\Windows\Cluster\Reports\ChkDsk_ResCluster Disk X:_Disk2Part1.log'.
Chkdsk may also write information to the Application Event Log.

This does not indicate actual corruption on the disk. The cluster set the dirty bit on the disk, so chkdsk runs to verify that the file system is intact.
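To confirm that chkdsk found nothing, you can review the log file named in the event. As a rough Python sketch, the path can be pulled from the event text like this. The pattern is based only on the sample message above, and `chkdsk_log_path` is a hypothetical helper name.

```python
import re

# Wording taken from the sample event 1066 message in this post (an assumption
# that other systems phrase it the same way).
CHKDSK_LOG_RE = re.compile(r"Chkdsk output will be logged to file '([^']+)'")


def chkdsk_log_path(event_message):
    """Return the Chkdsk log path named in the event text, or None if absent."""
    match = CHKDSK_LOG_RE.search(event_message)
    return match.group(1) if match else None
```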

Hopefully, this information will allow you to resolve the issue if you happen to run into it.

Jeff Hughes
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support
Himanshu Singh
Support Engineer
Microsoft Enterprise Platforms Support

Comments
  • I've seen these a number of times in different clusters in my environment on Windows 2008 and 2008 R2. Manually setting Maintenance Mode in the registry forces them online as well. However, that is ugly. This really should be addressable in a hotfix of some sort, perhaps in an SP, by at least providing a PowerShell/cluster CLI command to override the dirty bit.

  • Hi Jeff. I have faced a similar problem but with a different symptom. Whenever I try to fail over a resource to another node, all the resources come online except the disk, which stays in an offline state until I manually bring it online. Failover of the other resources (Name, clxresource, etc.) works. Any idea?