Windows Server 2008 R2 Failover Clustering - Avoiding Chkdsk on Failover

Overview

In Windows Server 2003, if corruption was encountered on a physical disk, chkdsk was run to attempt a repair. The repair attempt occurred when a disk resource was moved or failed over to another cluster node. The disk was taken offline, moved, and then brought online. During the online stage the disk was checked to see whether the dirty bit had been set, and if so, chkdsk was run.

Cluster admins would see the cluster disk in an Online Pending state until chkdsk completed, at which point the disk would be brought online. In the meantime users could not access the shares on the disk, which resulted in calls to the help desk. If a SQL Server installation used the disk, SQL Server would not be brought online because it has a dependency on the disk. If this was the quorum disk, the whole cluster would be offline until chkdsk completed.

New Parameter in Windows Server 2008 R2

In Windows Server 2008 R2 Failover Clustering there is now a new parameter (DiskRunChkDsk), set using cluster.exe or PowerShell, that determines what action to take if the dirty flag has been set on a cluster disk. The options for the setting are described below.

Value        Description
0 (default)  ShallowCheck open files in root of volume. Check dirty bit
1            FullCheck recursively on all files. Check dirty bit
2            Run chkdsk every time the volume is mounted
3            ShallowCheck. If corrupt run chkdsk. If not corrupt run chkdsk in read-only mode. Online will proceed when chkdsk is running in read-only mode
4            Never run chkdsk
5            FullCheck. If corrupt do not bring online. User intervention required.
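
When auditing several disks it can help to translate the numeric value back into its meaning. The following is a minimal sketch, not part of the FailoverClusters module: the variable $DiskRunChkDskMeaning and the function Get-DiskRunChkDskDescription are hypothetical names introduced here for illustration, and the descriptions are simply transcribed from the table above.

```powershell
# Lookup of DiskRunChkDsk values, transcribed from the table above.
# $DiskRunChkDskMeaning is a hypothetical name used only in this sketch.
$DiskRunChkDskMeaning = @{
    0 = 'ShallowCheck open files in root of volume. Check dirty bit (default)'
    1 = 'FullCheck recursively on all files. Check dirty bit'
    2 = 'Run chkdsk every time the volume is mounted'
    3 = 'ShallowCheck. If corrupt run chkdsk, otherwise run chkdsk read-only'
    4 = 'Never run chkdsk'
    5 = 'FullCheck. If corrupt do not bring online. User intervention required'
}

# Hypothetical helper: returns the description for a given value,
# or a placeholder string for values that are not in the table.
function Get-DiskRunChkDskDescription {
    param([int]$Value)

    if ($DiskRunChkDskMeaning.ContainsKey($Value)) {
        return $DiskRunChkDskMeaning[$Value]
    }
    return "Unknown value: $Value"
}

# Usage: look up the meaning of the value 4
Get-DiskRunChkDskDescription -Value 4
```

The sketch uses a plain hashtable so that it should also run on the PowerShell 2.0 release that ships with Windows Server 2008 R2.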

Check Dirty Flag On Disks

There are several ways to check whether a disk is marked as dirty on Windows Server 2008 R2: a scripting language that queries WMI, cluster.exe, PowerShell, or fsutil. I prefer PowerShell because it lets you check every disk available on the node where you run the command.

Checks all logical disks on a node for the dirty bit using PowerShell

Get-WMIObject Win32_LogicalDisk | ft DeviceID, VolumeDirty

These commands will allow you to check the dirty bit on a specified disk

chkntfs X:

fsutil dirty query X:

For testing you can also manually set the dirty bit of a disk

fsutil dirty set X:
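
If you only want to see the disks that actually need attention, the WMI query can be filtered rather than read off a full table. This is a rough sketch that assumes it is run locally on a cluster node; the variable name $dirtyDisks is illustrative, and the DriveType filter (3 = local fixed disk) is an addition that is not in the commands above.

```powershell
# List only the logical disks on this node whose dirty bit is set.
# DriveType 3 restricts the query to local fixed disks (assumption:
# removable and network drives are not of interest here).
$dirtyDisks = @(Get-WmiObject -Class Win32_LogicalDisk -Filter "DriveType = 3" |
    Where-Object { $_.VolumeDirty })

if ($dirtyDisks.Count -eq 0) {
    Write-Output 'No dirty disks found on this node.'
}
else {
    # Show the drive letter and label of each dirty disk
    $dirtyDisks | Select-Object DeviceID, VolumeName, VolumeDirty
}
```

A check like this could be run on each node as part of routine maintenance so that a dirty disk is noticed before the next planned failover.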

Check & Change the Default Setting

By default the parameter has a setting that will run a shallow check on the disk if it is marked as dirty when the disk is brought online. To avoid running chkdsk altogether, change the parameter to 4. This ensures chkdsk is not run even if the dirty bit has been set. If you do this, you must have procedures in place to check the disks manually and to plan maintenance windows for running chkdsk on any disk whose dirty flag is set.

 

Below are the PowerShell commands to check the parameter and set its value.

 

# Check the parameter on a disk

Get-ClusterResource "<Cluster Disk Name>" | Get-ClusterParameter DiskRunChkDsk

# Example

Get-ClusterResource "Witness Disk" | Get-ClusterParameter DiskRunChkDsk

 

From the screen shot below we can see the current value of the property DiskRunChkDsk

 

 

# Set the parameter on a disk (x is the value we want to set)

Get-ClusterResource "<Cluster Disk Name>" | Set-ClusterParameter DiskRunChkDsk x

# Example

Get-ClusterResource "Witness Disk" | Set-ClusterParameter DiskRunChkDsk 4

 

From the screen shot below we can see the current value of the property DiskRunChkDsk has changed to 4
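
To review the setting across every clustered disk at once instead of one resource at a time, the same cmdlets can be combined in a pipeline. This is a sketch under assumptions: it filters on the resource type name 'Physical Disk' and reads it through the ResourceType.Name property, which you should confirm in your own environment, and the column names Disk, OwnerNode and DiskRunChkDsk are chosen here for illustration.

```powershell
# Load the failover clustering cmdlets (needed on Windows Server 2008 R2)
Import-Module FailoverClusters

# Report the DiskRunChkDsk value for every physical disk resource.
# Assumption: disk resources report a resource type named 'Physical Disk'.
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -eq 'Physical Disk' } |
    ForEach-Object {
        $param = $_ | Get-ClusterParameter -Name DiskRunChkDsk

        # Build one row per disk; New-Object keeps this PowerShell 2.0 friendly
        New-Object PSObject -Property @{
            Disk          = $_.Name
            OwnerNode     = $_.OwnerNode.Name
            DiskRunChkDsk = $param.Value
        }
    } |
    Format-Table Disk, OwnerNode, DiskRunChkDsk -AutoSize
```

Running a report like this after a change makes it easier to spot a disk that was missed or that still has the default value.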

 

 

You can check if the dirty bit has been set on all CSV volumes using the command

Get-WMIObject Win32_Volume | ft Caption, DirtyBitSet -autosize

 

For CSVs you can run the following commands to check and set the value of the parameter DiskRunChkDsk

 

# Check the values of the parameters

Get-ClusterSharedVolume | Get-ClusterParameter

 

# Set the value of the DiskRunChkDsk parameter

Get-ClusterSharedVolume | Set-ClusterParameter DiskRunChkDsk 4

 

Aeval Shah

Premier Field Engineer - Failover Clustering & Hyper-V