Hello, Shane Brasher here. I’d like to go over some very basic troubleshooting steps for narrowing down common causes of initial replication failures, and replication failures in general, for System State and Bare Metal Recovery (BMR) backups.
Troubleshooting System State or BMR backup failures can be frustrating because many underlying components come into play during the process. This post does not discuss the Data Protection Manager (DPM) architecture; instead it covers basic troubleshooting by approaching the problem in a logical way. The steps below offer a guideline for resolving these issues.
Does a system state backup fail or does the BMR fail? This is an important question because the backups for system state and BMR are handled differently. A system state backup uses the Windows Server Backup feature to take a snapshot of the system state and saves it locally on the server before moving it to the DPM server storage pool. A BMR backup also uses the Windows Server Backup feature, but once the snapshot is taken, it moves the data directly to the DPM server without saving it locally first.
As you may already know, a BMR backup includes the system state. Realizing this, there are two questions:
a.) Is it the system state failing?
b.) Is the system state succeeding while the BMR is failing?
This is easily narrowed down by selecting only the System State in the protection group. If this fails, we can narrow it down further in the next step, discussed later on. If the system state succeeds, then add the BMR to the protection group. If this fails, we can narrow it down even further by looking into the VSS writers, the amount of free space and so on, discussed later.
Can you take a system state or BMR backup locally via wbadmin, or does this also fail? This is another key question to ask. Again, DPM triggers the Windows Server Backup (WSB) feature to take the system state snapshot. If WSB on the local server is not functioning correctly for the snapshot, then DPM never comes into the picture. The WSB feature must be working properly for the backup to succeed. You can test this functionality from a command prompt.
For system state:
wbadmin start systemstatebackup -backuptarget:e:
For BMR:
wbadmin start backup -allcritical -backupTarget:<any existing drive name>:
If the commands above fail, then DPM never comes into the picture, and you should investigate what may be wrong with WSB on that local server. The event logs, the Backup event logs and the WSB backup logs are useful tools from this point.
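The local test can also be scripted. The following is a minimal Python sketch, not part of DPM or WSB: the function names are hypothetical, the drive letter is whatever existing drive you choose, and treating an exit code of 0 as success is an assumption you should confirm against the wbadmin output on your server.

```python
import subprocess


def wbadmin_command(kind, target):
    """Build the wbadmin argument list for a local test backup.

    kind   -- "systemstate" or "bmr" (hypothetical labels for this sketch)
    target -- an existing drive, for example "E:"
    """
    if kind == "systemstate":
        return ["wbadmin", "start", "systemstatebackup", f"-backupTarget:{target}"]
    if kind == "bmr":
        return ["wbadmin", "start", "backup", "-allCritical", f"-backupTarget:{target}"]
    raise ValueError(f"unknown backup kind: {kind!r}")


def run_local_backup(kind, target):
    """Run the backup on the protected server.

    Assumption: a return code of 0 means wbadmin finished without error.
    """
    result = subprocess.run(wbadmin_command(kind, target))
    return result.returncode == 0
```

Run it from an elevated prompt on the protected server, for example run_local_backup("systemstate", "E:"). wbadmin may still ask for confirmation on the console before it starts.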
Backup event logs:
DPM Event Logs:
Error #1
Log Name: DPM Alerts
Source: DPM-EM
Date:
Event ID: 3106
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: HyperVServer.Contoso.com
Description: The replica of System Protection on HyperVServer.Contoso.com is inconsistent with the protected data source. All protection activities for data source will fail until the replica is synchronized with consistency check. (ID: 3106) DPM is out of disk space for the replica. (ID: 58)

Error #2
Log Name: DPM Alerts
Source: DPM-EM
Date:
Event ID: 3100
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: HyperVServer.Contoso.com
Description: The used disk space on the computer running DPM for the replica of System Protection has exceeded the threshold value of 90%, or there is not enough disk space to accommodate the changed data. If you do not allocate more disk space, synchronization jobs may fail. (ID: 3100)
The WSB backup logs are located in %windir%\logs\WindowsServerBackup. You will need to convert the .etl logs before they can be read. This can be done with LogParser.
Example: Note the ERROR line below, which clearly shows that the local server does NOT have enough free space available for the snapshot.
2]15cc.134c 04/08/2011-19:17:48.895 [blbengutils Blbvhdhelper.cpp@2153] EXIT: CBlbVhdHelper::GetVolumeVHDInfo
15cc.134c 04/08/2011-19:17:48.895 [blbengutils BlbFillCatalogTemplateInfo BlbCatalogUtils.cpp@1542] INFO:ullTotalSourceSpace = (1745592774144) ullTotalSourceFreeSpace = (819589386752)ullExcludedFileSize = (34346196992)
15cc.134c 04/08/2011-19:17:48.895 [blbengutils BlbFillCatalogTemplateInfo BlbCatalogUtils.cpp@1554] ERROR:Backup target space is not enough, TotalSourceSpace(1745592774144), TotalSourceFreeSpace(819589386752), TotalExcludedFileSize(34346196992), ReclaimableSize(0), CurrentBackupSize(0), TotalTargetFreeSpace(80434085888)
15cc.134c 04/08/2011-19:17:48.895 [blbengutils BlbSecurityUtils.cpp@1203] ENTER: CBlbImpersonationHelper::Revert
15cc.134c 04/08/2011-19:17:48.895 [service engine.cpp@4046] EXIT: CBlbEngine::CreateTemplate
14EC.14A4::04/08/2011-19:17:48.895 [clinew]CreateTemplate failed: 0x80780048
14ec.14a4 04/08/2011-19:17:48.895 [clinew backup.cpp@2401] ENTER: PublishBackupFailureEvents
14ec.14a4 04/08/2011-19:17:48.895 [clinew CBLBCli::OutputOnConsole blbcli.cpp@842] INFO:CLIOUTPUT:There is not enough free space on the backup storage location to back up the data.
14ec.14a4 04/08/2011-19:17:48.895 [util blbtrace.cpp@853] ENTER: BlbStopTracing
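If you have many converted log lines to go through, a small script can pull the numbers out of the "Backup target space is not enough" entry. This is only a sketch in Python: the function name is hypothetical, and it assumes the name(value) layout seen in the single log line above, which may differ between WSB versions.

```python
import re

# Matches pairs such as TotalSourceSpace(1745592774144)
FIELD = re.compile(r"(\w+)\((\d+)\)")

# The ERROR line from the example log above
SAMPLE = (
    "15cc.134c 04/08/2011-19:17:48.895 [blbengutils BlbFillCatalogTemplateInfo "
    "BlbCatalogUtils.cpp@1554] ERROR:Backup target space is not enough, "
    "TotalSourceSpace(1745592774144), TotalSourceFreeSpace(819589386752), "
    "TotalExcludedFileSize(34346196992), ReclaimableSize(0), "
    "CurrentBackupSize(0), TotalTargetFreeSpace(80434085888)"
)


def parse_space_error(line):
    """Return the name -> byte count pairs from a space error line, or None."""
    if "Backup target space is not enough" not in line:
        return None
    return {name: int(value) for name, value in FIELD.findall(line)}


if __name__ == "__main__":
    fields = parse_space_error(SAMPLE)
    for name, value in fields.items():
        # Print each size in GiB so the numbers are easier to compare
        print(f"{name}: {value / 1024**3:.1f} GiB")
```

Comparing TotalTargetFreeSpace with the source figures makes the shortfall on the backup target obvious without counting digits.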
Are there any VSS-related errors in the event log of either the DPM server or the target server? For all backups, we leverage the Volume Shadow Copy Service (VSS) to take the snapshots. If VSS is not healthy, then a backup to the DPM server may not succeed. If an issue with VSS is severe enough, a volume may actually be thrown into “shadow copy protected” mode, and backups will fail until this is rectified. A good proactive measure is to make sure you have the latest VSS version by way of Windows Update and/or the latest hotfix.
What’s the state of the VSS writers? Is a snapshot being performed? Check the health of the VSS writers on the protected server with two simple commands from a command prompt.
“vssadmin list writers”: do the writers show up in a failed or hung state? A “waiting for completion” state is normal, but a failed state indicates an issue. Make sure the ASR (Automated System Recovery) writer is in a healthy state.
“vssadmin list shadows”: do you see a snapshot being taken? If you never see a snapshot being taken on the protected server before the data is transferred to the DPM server, then the protected server needs a closer look.
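On a server with many writers it can help to filter the output. Below is a minimal Python sketch that reads the text of “vssadmin list writers” and reports every writer that is neither stable nor waiting for completion. The function name is hypothetical and the sample text is a made-up, abbreviated illustration of the output layout, not a capture from a real server.

```python
# Made-up, abbreviated sample of "vssadmin list writers" output
SAMPLE = """\
Writer name: 'ASR Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'System Writer'
   State: [7] Failed
   Last error: Timed out

Writer name: 'Registry Writer'
   State: [5] Waiting for completion
   Last error: No error
"""

# States treated as healthy, following the guidance in the text
HEALTHY = ("stable", "waiting for completion")


def failed_writers(vssadmin_output):
    """Return (writer name, state) for every writer not in a healthy state."""
    problems = []
    name = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
        elif line.startswith("State:") and name is not None:
            state = line.split(":", 1)[1].strip()
            if not any(ok in state.lower() for ok in HEALTHY):
                problems.append((name, state))
    return problems


if __name__ == "__main__":
    for writer, state in failed_writers(SAMPLE):
        print(f"{writer}: {state}")
```

To use it on a real server, save the command output to a file and pass the file contents to failed_writers.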
Are the page files for both the target and DPM server set correctly? Page file allocation is very important on both the DPM server and the target server. This is even more so if the server is under a heavy load. As a general rule of thumb a server needs to have 1.5 x RAM installed for a pagefile. DPM is a little more demanding than that as covered in: http://technet.microsoft.com/en-us/library/ff399244.aspx : “DPM requires a pagefile size that is 0.2 percent the size of all recovery point volumes combined, in addition to the recommended size (generally, 1.5 times the amount of RAM on the computer). For example, if the recovery point volumes on a DPM server total 3 TB, you should increase the pagefile size by 6 GB.”
Note the word “requires”: this is not really a suggestion or an option. DPM needs this as a minimum requirement in order to function optimally.
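The arithmetic from the quoted guidance can be written out as a short Python sketch. The function name and the 16 GB of RAM in the usage note are hypothetical; the 1.5 x RAM and 0.2 percent figures come from the quote above, and decimal units are used (3 TB = 3000 GB) to match its example.

```python
def recommended_pagefile_gb(ram_gb, recovery_point_volumes_gb):
    """Pagefile size in GB for a DPM server, following the quoted guidance."""
    base = 1.5 * ram_gb                              # general rule of thumb
    dpm_extra = 0.002 * recovery_point_volumes_gb    # 0.2 percent of recovery point volumes
    return base + dpm_extra
```

For a hypothetical DPM server with 16 GB of RAM and 3 TB of recovery point volumes, recommended_pagefile_gb(16, 3000) gives 24 GB from the general rule plus the 6 GB from the quoted example.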
Do you have enough free space on both the target server and the DPM server? On the target server, if there is not adequate free space for the snapshot of the system state to be saved, then the backup will fail. If you have more than one disk at your disposal, you can work around this by altering the PSDatasourceConfig.xml file on the protected server to save the system state to another location. This article explains the process: Backup of Protected Computer System State http://technet.microsoft.com/en-us/library/bb809015.aspx As a general rule of thumb, the system state will typically require 15 GB of space on the computer. Of course, this will vary from server to server depending on the bloat of the registry.
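A quick way to check the target server against that rule of thumb is sketched below in Python. The function names are hypothetical, and 15 GB is only the rough estimate from the text (interpreted here as GiB), so treat the result as a hint rather than a guarantee.

```python
import shutil

SYSTEM_STATE_ESTIMATE_GB = 15  # rule of thumb from the text, not a hard limit


def has_room_for_system_state(free_bytes, needed_gb=SYSTEM_STATE_ESTIMATE_GB):
    """True if the free space covers the estimated system state size."""
    return free_bytes >= needed_gb * 1024**3


def check_drive(path):
    """Check the drive that will hold the local system state backup."""
    return has_room_for_system_state(shutil.disk_usage(path).free)
```

On the protected server you would call check_drive with the drive that holds the system state backup, for example check_drive("C:\\").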
If there is not enough free space on the DPM server replica or recovery point volume, then you will need to allocate more space for this datasource in the protection group. This can be done by following the steps outlined in “How to Modify Disk Allocation” http://technet.microsoft.com/en-us/library/ff399705.aspx If you do not have enough space in the storage pool, then you can add another disk to accommodate your growth. Adding Disks to the Storage Pool http://technet.microsoft.com/en-us/library/ff399691.aspx
Let’s assume that you have allocated additional space in the replica volume and recovery point volume for some growth and you’ve selected the option for autogrow. If autogrow is failing, the following article explains how to troubleshoot the cause.
How to use and troubleshoot the Auto-heal features in DPM 2010 http://blogs.technet.com/b/dpm/archive/2011/06/06/how-to-use-and-troubleshoot-the-auto-heal-features-in-dpm-2010.aspx
Is the Windows Server Backup feature installed? As mentioned before, this feature must be installed on the target server in order for either the system state or BMR backup to take place. This is easy to check within Server Manager, where it can be added as a feature.
Common error if the WSB feature is not installed:
“DPM failed to create the system state backup. If you are trying to create the system state of a Windows 2008 Server operating system, verify that the Windows Server Backup (WSB) is installed, and that there is enough free disk space on the protected server to store the system state. (ID 30214 Details: Internal error code: 0x809909FB)”
Be aware of possible conflicting jobs from other applications. Leverage the system and application event logs and the trace logs. The DPM trace logs, found in %ProgramFiles%\Microsoft DPM\DPM\Temp, are named MSDPMCurr.errlog and so on. They can be opened with Notepad and analyzed. Although they are not always intuitive, you can find the relevant entries by searching for the protection group name or the server name.
Trace logs example:
*************
12C0 17C0 03/09 15:22:02.094 01 TaskExecutor.cs(334) D3BA345D-6222-4E34-86FB-37C12608671D WARNING <q1:Parameter Name="protectedgroup" Value="TestServer-System_State-BMR" />
12C0 17C0 03/09 15:22:02.094 01 TaskExecutor.cs(334) D3BA345D-6222-4E34-86FB-37C12608671D WARNING <q1:Parameter Name="datasourcename" Value="System Protection" />
12C0 17C0 03/09 15:22:02.094 01 TaskExecutor.cs(334) D3BA345D-6222-4E34-86FB-37C12608671D WARNING <q1:Parameter Name="servername" Value="TestServer.contoso.com" />
12C0 17C0 03/09 15:22:02.094 01 TaskExecutor.cs(269) D3BA345D-6222-4E34-86FB-37C12608671D NORMAL Task retired abnormally (error=SimilarTaskExistsForDatasource; 0; None)
Note the error=SimilarTaskExistsForDatasource
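Searching by hand works, but the error line itself does not always contain the server name. The Python sketch below first collects the GUID-like field from every line that mentions the search term, then reports the error= values on lines that carry the same field. It assumes that field identifies a single task, which is an inference from the excerpt above rather than documented behavior, and the function name is hypothetical.

```python
import re

GUID = re.compile(r"[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}")
ERROR = re.compile(r"error=(\w+)")

# Two lines taken from the trace log excerpt above
SAMPLE = [
    '12C0 17C0 03/09 15:22:02.094 01 TaskExecutor.cs(334) D3BA345D-6222-4E34-86FB-37C12608671D WARNING <q1:Parameter Name="servername" Value="TestServer.contoso.com" />',
    "12C0 17C0 03/09 15:22:02.094 01 TaskExecutor.cs(269) D3BA345D-6222-4E34-86FB-37C12608671D NORMAL Task retired abnormally (error=SimilarTaskExistsForDatasource; 0; None)",
]


def errors_for(lines, needle):
    """Return error names from tasks whose log lines mention needle."""
    lines = list(lines)
    task_ids = set()
    for line in lines:
        if needle.lower() in line.lower():
            match = GUID.search(line)
            if match:
                task_ids.add(match.group(0))
    found = []
    for line in lines:
        error = ERROR.search(line)
        task = GUID.search(line)
        if error and task and task.group(0) in task_ids:
            found.append(error.group(1))
    return found


if __name__ == "__main__":
    print(errors_for(SAMPLE, "TestServer.contoso.com"))
```

For a real log, pass the lines of MSDPMCurr.errlog instead of SAMPLE and search for your own protection group or server name.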
Question 1) When DPM 2010 backs up the system state to tape, does it back up the system state straight to tape, or does it back up to a folder on the local disk and then back up the systemstatebackup folder to tape?
Answer 1) Windows Server Backup creates a local system state backup, then DPM backs up that folder to tape.
Question 2) Does the systemstatebackup folder get deleted after the backup is complete, or does the folder remain on the local disk?
Answer 2) The folder remains on the local disk and will be overwritten by the next system state backup.
Question 3) During a BMR backup, why don’t I see a WindowsImageBackup folder being created, or its contents changing?
Answer 3) BMR backups are written directly to the storage pool disk.
Question 4) Can you perform a BMR backup directly to tape?
Answer 4) No you cannot. DPM will give you the following warning:
Question 5) Why is a BMR backup on one server much larger than a BMR backup on another?
Answer 5) This depends on what is on the critical volumes and on the registry size. The BMR backup includes all critical volumes.
Example: I once worked on an issue where a BMR backup of one Hyper-V server was taking over 1 TB of space on the DPM storage pool. The cause was that all of the VHDs were stored on C:\. Since a BMR backup includes the critical volume, in this case C:\, it added the VHDs to that backup, which required a huge amount of space.
Hopefully for those of you who have issues with System State and BMR backups, this blog post can offer you some assistance on making some progress when addressing those problems.
System State: http://www.microsoft.com/showcase/en/us/details/bb0b5339-445b-4298-8705-350f13227b93
139822 How to Restore a Backup to Computer with Different Hardware http://support.microsoft.com/default.aspx?scid=kb;EN-US;139822
263532 How to perform a disaster recovery restoration of Active Directory on a computer with a different hardware configuration http://support.microsoft.com/default.aspx?scid=kb;EN-US;263532
249694 How to move a Windows installation to different hardware http://support.microsoft.com/default.aspx?scid=kb;EN-US;249694
Backup of Protected Computer System State http://technet.microsoft.com/en-us/library/bb809015.aspx
How ASR Works http://technet.microsoft.com/en-us/library/cc758365(WS.10).aspx
Automated System Recovery (ASR) in Windows Server 2008 and Vista SP1 http://blogs.technet.com/b/filecab/archive/2008/02/11/automated-system-recovery-asr-in-windows-server-2008-and-vista-sp1.aspx
Deciding between System State Backup and Allcritical Backup in Windows Server 2008 http://blogs.technet.com/b/filecab/archive/2009/05/04/deciding-between-system-state-backup-and-allcritical-backup-in-windows-server-2008.aspx
Shane Brasher | Senior Support Escalation Engineer
"263532 How to perform a disaster recovery restoration of Active Directory on a computer with a different hardware configuration" - This is an article about Windows 2000! Who could possibly be interested in that??