Michael Vargo | Senior Support Escalation Engineer
Some customers are experiencing failures on Windows Server 2012 Hyper-V hosts that use Cluster Shared Volumes (CSVs) when backing up virtual machines (VMs) using System Center 2012 Data Protection Manager (DPM 2012) with Service Pack 1 (SP1).
When this occurs, the following error is logged on the Hyper-V hosts:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 5120
Logged: <date/time>
Details: Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.
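When sifting through exported System log text for this event, it can help to pull the volume name and status code out of the message. The following is a minimal Python sketch, assuming the message always follows the wording quoted above; the function name `parse_csv_pause` and the regular expression are illustrative and not part of any Microsoft tool.

```python
import re

# Pattern for the Event ID 5120 message text quoted in the post.
# Assumption: the message wording matches the sample exactly.
CSV_PAUSE_RE = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<resource>[^']+)'\) "
    r"is no longer available on this node because of "
    r"'(?P<status>[A-Z_]+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def parse_csv_pause(message):
    """Return a dict with volume, resource, status and code, or None."""
    match = CSV_PAUSE_RE.search(message)
    if match is None:
        return None
    return match.groupdict()


# Sample message copied from the event details above.
sample = (
    "Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer "
    "available on this node because of "
    "'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily "
    "be queued until a path to the volume is reestablished."
)

print(parse_csv_pause(sample))
```

Messages that do not match the pattern return None, so unrelated events can be skipped when scanning a larger export.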
You will also see the VMs pause temporarily, go into a paused state, or shut down, or see memory spikes on the Hyper-V hosts. Configuring serialized backups does not resolve the problem.
It has been determined that any backup product using shadow copies results in the same errors, so this isn't necessarily a DPM 2012 SP1 issue. The Windows Server 2012 cluster CSV team is investigating this problem.
The Windows and DPM teams have released new fixes that address some general Cluster and CSV backup issues; they are available for download. No other configuration changes are necessary after applying these updates.
DPM UPDATE
KB2802159 - Description of Update Rollup 2 for System Center 2012 Service Pack 1 (http://support.microsoft.com/kb/2802159)
WINDOWS UPDATES
KB2870270 - Update that improves cloud service provider resiliency in Windows Server 2012 (http://support.microsoft.com/kb/2870270)
KB2869923 - Physical Disk resource move during the backup of a Cluster Shared Volume (CSV) may cause resource outage (http://support.microsoft.com/kb/2869923)
Note that this replaces KB2848344 mentioned previously in the post. If you continue to see problems protecting Windows Server 2012 Hyper-V guests after installing this hotfix, please open a support case for further investigation.
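To confirm that a host has the Windows updates listed above, the KB numbers can be compared against a text listing of installed updates. The sketch below is one hedged way to do that in Python. It assumes the two Windows updates belong on the Hyper-V hosts and that an inventory listing is already available as plain text; `missing_updates` and the sample `inventory` are hypothetical. The DPM rollup (KB2802159) is left out because it is not a Windows hotfix.

```python
import re

# Windows KB numbers listed in the post.
# Assumption: these are the updates expected on each Hyper-V host.
REQUIRED_HOST_KBS = ("KB2870270", "KB2869923")


def missing_updates(installed_text, required=REQUIRED_HOST_KBS):
    """Return the required KBs that do not appear in the installed listing."""
    found = re.findall(r"KB\d+", installed_text, re.IGNORECASE)
    installed = {kb.upper() for kb in found}
    return [kb for kb in required if kb not in installed]


# Hypothetical listing, for example pasted from a hotfix inventory report.
inventory = """
KB2848344
KB2870270
"""

print(missing_updates(inventory))
```

An empty list means every required KB number was found in the listing.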
IMPORTANT NOTE: After installing these fixes, you may start to experience intermittent DPM 2012 SP1 Hyper-V Guest backup failures with the following error. This is caused by a slight regression in one of the above Windows fixes. This is currently under investigation and should be addressed in the next rollup.
Type: Recovery point
Status: Failed
Description: Change Tracking has been marked inconsistent due to one of the following reasons
1. Unexpected shutdown of the protected server
2. Unforeseen issue in DPM Bitmap failover during cluster failover of one or more datasources sharing the tracked volume.
(ID 30501 Details: Unknown error (0xe0062009) (0xE0062009))
More information
End time: 7/10/2013 2:03:22 AM
Start time: 7/10/2013 12:00:34 AM
Time elapsed: 02:02:47
Data transferred: 0 MB
Cluster node CLNODE01.CONTOSO.LOCAL
Recovery Point Type Express Full
Source details: \Backup Using Child Partition Snapshot\VMNAMExx
Protection group: Protection Group 1

DPM Auto-consistency check should fix the problem and allow future backups to succeed.
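When reviewing many failed jobs, it may be useful to separate this known failure from unrelated ones. The following Python sketch checks an alert description for the ID and error code shown above. It assumes the description text is already available as a string; the function name `is_known_change_tracking_failure` is hypothetical and not part of DPM.

```python
import re


def is_known_change_tracking_failure(description):
    """True if the text carries DPM ID 30501 together with error 0xE0062009."""
    has_id = re.search(r"\bID\s+30501\b", description) is not None
    has_code = re.search(r"0xe0062009", description, re.IGNORECASE) is not None
    return has_id and has_code


# Excerpt copied from the failure description above.
alert = (
    "Change Tracking has been marked inconsistent due to one of the following "
    "reasons (ID 30501 Details: Unknown error (0xe0062009) (0xE0062009))"
)

print(is_known_change_tracking_failure(alert))
```

A match only identifies the error signature; it does not confirm that the regression described above is the cause.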
For more information on the DPM Auto-CC feature, see: http://blogs.technet.com/b/dpm/archive/2011/06/06/how-to-use-and-troubleshoot-the-auto-heal-features-in-dpm-2010.aspx
Michael Vargo | Senior Support Escalation Engineer | Microsoft GBS Management and Security Division
Any idea when the final hotfix will be released? It's been quite some time now.
C'mon MSFT it's been like three months
Also, using LUN and host serialization on the DPM side will minimize the impact of errors 5120/5142.
Is this fixed? I see there is an update, but the TechNet forums look like folks are still experiencing issues.
I'm glad to see that R2 is coming out, but maybe instead some effort can be put into fixing this once and for all?
sad that there is nothing but SILENCE about this
Some of the confusion may stem from the fact that this is affected by a few different things, so the update in the post may fix things in one scenario but not for another. On a positive note, there is a new update that should make things work much more seamlessly.
KB2848344 - Update improves cluster resiliency in Windows Server 2012 (support.microsoft.com/.../2848344)
Note that this replaces KB2838669 mentioned in the post above. If you continue to see problems protecting Windows Server 2012 Hyper-V guests after installing this hotfix please open a support case for further investigation.
Installed both hotfixes (KB2838669 & KB2848344) but still having the issue. This is ridiculous. How could Microsoft's own products not work together properly?
We still have this issue on a 2012 cluster using DPM 2012 R2, with the rollup hotfix KB2870270 installed.
This has been a problem on 2008 R2 clusters as well; we have never been able to do host-level backups on the Microsoft clusters.