Data Protection Manager Agent Network Troubleshooting - System Center: Data Protection Manager Engineering Team Blog - Site Home - TechNet Blogs

Data Protection Manager Agent Network Troubleshooting

Hi everyone, Shane Brasher here with some tips on how to troubleshoot networking issues related to the DPM Agent.  The goal of this article is not to make you a networking expert or to provide in-depth networking training, but rather to provide you with the basic skills and knowledge of specific tools used to assess communication problems with Data Protection Manager (DPM) traffic.

Among the most common issues we see in product support with the System Center Data Protection Manager agent are connectivity problems and port blockage. This document goes over some of the key troubleshooting methods for addressing these types of issues, such as:

a.) You can’t push out your DPM agent to a server via the DPM management console.
b.) You manually install your agent, but the communication is still not working with the DPM server.

Naturally, if a connectivity or port blockage issue is diagnosed, the DPM administrator may have to work with the Networking Administrator or the Directory Services Administrator in certain circumstances.  For example, if an unsuccessful tracert result leads you to suspect that a router's routing table is missing a route, you will need to work with the Networking Administrator to check the router.  Likewise, if the Windows integrated firewall for the servers on your domain is configured via GPO and lacks the rule exception for the DPMRA ports, you will need to work with the Active Directory Administrator to change the GPO.

The DPM Agent - What is the agent for? What purpose does it serve?

The DPM agent is a component installed on each server that we intend to back up with Data Protection Manager. It tracks the changed blocks of the data selected for backup and is responsible for transferring that data to the Data Protection Manager server.

The startup type for the DPM agent service (DPMRA) is Manual, and the service runs under the Local System account on each protected machine. The DPM agent is started only when the DPM server contacts it because a job is scheduled to run. Once a scheduled job has completed, the DPMRA service remains running for five minutes before it is stopped.

The protection agent software consists of two components: the protection agent itself and an agent coordinator. The agent coordinator is software that is temporarily installed on a protected computer during installation, update, or un-installation of a protection agent.

DPM CONNECTIVITY

Before we look at the troubleshooting tools, we first need to know which ports DPM uses for its operations. The article below lists the ports in use for DPM 2007, DPM 2010 and DPM 2012.

NOTE There is an additional port for DPM 2012 certificate use that will be addressed in another article.

DPM PORTS

http://technet.microsoft.com/en-us/library/ff399341.aspx

Protocol: DCOM
Port: 135/TCP, Dynamic
Details: The DPM control protocol uses DCOM. DPM issues commands to the protection agent by invoking DCOM calls on the agent. The protection agent responds by invoking DCOM calls on the DPM server. TCP port 135 is the DCE endpoint resolution point used by DCOM. By default, DCOM assigns ports dynamically from the TCP port range of 1024 through 65535.

Protocol: TCP
Port: 5718/TCP, 5719/TCP
Details: The DPM data channel is based on TCP. Both DPM and the protected computer initiate connections to enable DPM operations such as synchronization and recovery. DPM communicates with the agent coordinator on port 5718 and with the protection agent on port 5719.

Protocol: DNS
Port: 53/UDP
Details: Used between DPM and the domain controller, and between the protected computer and the domain controller, for host name resolution.

Protocol: Kerberos
Port: 88/UDP, 88/TCP
Details: Used between DPM and the domain controller, and between the protected computer and the domain controller, for authentication of the connection endpoint.

Protocol: LDAP
Port: 389/TCP, 389/UDP
Details: Used between DPM and the domain controller for queries.

Protocol: NetBIOS
Port: 137/UDP, 138/UDP, 139/TCP, 445/TCP
Details: Used between DPM and the protected computer, between DPM and the domain controller, and between the protected computer and the domain controller, for miscellaneous operations. Used for SMB directly hosted on TCP/IP for DPM functions.

Now that we know which ports are needed, how do we determine whether they are being blocked between two points? We have many tools at our disposal, some built in and some available for download.
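Before reaching for the individual tools, a quick scripted check of the fixed TCP ports from the table can narrow things down. The following is only a minimal sketch in Python, not part of DPM: the names DPM_TCP_PORTS, tcp_port_open and check_dpm_ports are made up for illustration, and the check covers TCP only (the UDP ports and the dynamic DCOM range are not tested). Keep in mind that a refused connection on 5718 or 5719 may simply mean the agent service is not running at that moment, since DPMRA is started on demand.

```python
import socket

# Fixed TCP ports taken from the port table above. DCOM's dynamically
# assigned ports are not listed because they are chosen at run time.
DPM_TCP_PORTS = {
    135: "DCOM endpoint mapper",
    139: "NetBIOS session",
    445: "SMB",
    5718: "DPM agent coordinator",
    5719: "DPM protection agent",
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_dpm_ports(host, ports=DPM_TCP_PORTS, timeout=3.0):
    """Map each port number to True (connect succeeded) or False."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, check_dpm_ports("MemberServer") run from the DPM server returns a dictionary such as {135: True, 139: True, ...}; any False entry is a candidate for the firewall checks described later in this article.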

Let’s start off simple with a few quick and easy command line tests. There are other networking variables to consider, such as DNS, the ARP cache and IPsec, but for purposes of this article we are keeping things as simple as possible and covering only some basic tests that anyone can do with a little practice.

Troubleshooting tools for DPM connectivity include:

a.) “Ping” to test name resolution and whether traffic can route properly to the destination.
b.) “Tracert” to test the routing.
c.) “Net view” to test accessibility of the server itself.
d.) “Sc” command line to test RPC connectivity.
e.) “WBEMTEST” to test our DCOM connection.
f.) “Wmic” to test our DCOM connection.
g.) Netstat to list the ports in use.
h.) Tasklist to list the currently running processes.
i.) TCPView GUI for port listing.
j.) Integrated firewall logging.
k.) Netmon.
l.) Toggling Chimney and RSS.

The Ping command - “Can I get there from here?”

Ping is probably one of the most widely used built-in tools for testing overall connectivity. I won’t go into all of the switches that can be used, but will briefly cover basic use and a few switches.

Ping Test #1 Testing overall communication

The simplest ping test is to ping the host name of the server.
From a command prompt type: ping <ServerName>

Example:

C:\Users\administrator>ping MemberServer

Pinging MemberServer.contoso.com [10.10.10.10] with 32 bytes of data:

Reply from 10.10.10.10: bytes=32 time=6ms TTL=128
Reply from 10.10.10.10: bytes=32 time<1ms TTL=128
Reply from 10.10.10.10: bytes=32 time=1ms TTL=128
Reply from 10.10.10.10: bytes=32 time=1ms TTL=128

Ping statistics for 10.10.10.10:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 6ms, Average = 2ms

The successful reply tells us that not only can we resolve the name to an IP address but that we know how to get there, so routing is probably working. We will go over testing out the routing via tracert command later.

Ping Test #2 Testing MTU Size

Sometimes communication can fail en route from the DPM server to the protected server if a packet is larger than what a router or firewall along the path will pass on to a segment that accepts only a smaller packet size. The router may be configured to send an ICMP "destination unreachable" message back to the sending host; if not, the packet is silently discarded. A router that discards packets this way is known as a black hole router and is discussed in:

How to troubleshoot Black Hole Router issues http://support.microsoft.com/kb/314825

Although this scenario is less common than it was in the past, thanks to the new TCP networking features in Windows Server 2008 that enable black hole router detection, it is still worth checking because it does crop up from time to time.

This can be diagnosed by pinging the destination with a specific packet size, using two switches:
-l <size> Send buffer size.
-f Set Don't Fragment flag in packet (IPv4-only).

Success Example:
C:\Users\administrator>ping MemberServer -l 1472 -f

The -l and -f switches described above are used here to test a ping payload of 1472 bytes, which corresponds to an MTU of 1500, between the two servers.
Note: Information on Maximum Transmission Unit (MTU) will be provided at the end of this section.

Pinging MemberServer.contoso.com [10.10.10.10] with 1472 bytes of data:

Reply from 10.10.10.10: bytes=1472 time=1ms TTL=128
Reply from 10.10.10.10: bytes=1472 time=1ms TTL=128
Reply from 10.10.10.10: bytes=1472 time=1ms TTL=128
Reply from 10.10.10.10: bytes=1472 time=1ms TTL=128
Ping statistics for 10.10.10.10:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 1ms, Maximum = 1ms, Average = 1ms

Here the ping succeeds with a payload of 1472 bytes. We specified -l to set the buffer size to 1472 and -f to set the Don't Fragment flag. The value of 1472 comes from the default Ethernet MTU of 1500 minus 28 bytes of headers: 20 for the IP header and 8 for the ICMP header.

Reference: http://technet.microsoft.com/en-us/library/cc958871.aspx “The ICMP Echo Request header is 8 bytes, and the IP header is normally 20 bytes. In the Ethernet case shown here, the link layer MTU contains the maximum-sized Ping buffer plus 28, for a total of 1500 bytes”
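The arithmetic behind the 1472 figure is easy to get backwards, so here it is as a small Python sketch. The function names are illustrative only, and the header sizes assume IPv4 without IP options, as in the reference quoted above.

```python
IP_HEADER_BYTES = 20    # typical IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP Echo Request header


def ping_payload_for_mtu(mtu):
    """Largest ping -l value that fits in one unfragmented packet of size mtu."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def mtu_for_ping_payload(payload):
    """Path MTU implied by a ping -l value that succeeds with -f set."""
    return payload + IP_HEADER_BYTES + ICMP_HEADER_BYTES
```

So ping_payload_for_mtu(1500) gives the 1472 used above, and a successful ping with -l 1272 -f would imply that the path carries packets of at least 1300 bytes.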

Failure Example: Here we see an example of a failure of the MTU size specified.

C:\Users\administrator>ping MemberServer -l 1472 -f

Pinging MemberServer.contoso.com [10.10.10.10] with 1472 bytes of data:

Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Ping statistics for 10.10.10.10:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)

Let’s lower the packet size. Let’s try 1272 in place of 1472:

C:\Users\administrator>ping MemberServer -l 1272 -f

Pinging MemberServer.contoso.com [10.10.10.10] with 1272 bytes of data:

Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Ping statistics for 10.10.10.10:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)

Note two things above.
1.) We get the message telling us that the packet needs to be fragmented.
2.) We experienced a 100% loss.

We can continue to lower the packet size until we find a size that succeeds.  In addition, if the successful packet size is smaller than expected (e.g. you have a gigabit link and can only pass an MTU of 500), you will not get the expected throughput and can experience packet loss, especially during high bandwidth demand.
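The step-by-step lowering of the packet size described above can be automated as a binary search. The sketch below is an illustration under stated assumptions, not a supported tool: largest_unfragmented_payload and windows_ping_probe are made-up names, the search assumes that if a given size gets through then every smaller size does too, and the probe treats a reply line containing "TTL=" as success because the ping exit code alone is not always reliable.

```python
import subprocess


def largest_unfragmented_payload(probe, low=0, high=1472):
    """Return the largest size in [low, high] for which probe(size) is True.

    probe(size) should return True when a ping of that size with the
    Don't Fragment flag set gets a reply. Returns None if even the
    smallest size fails (for example, the host is unreachable).
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always progresses
        if probe(mid):
            low = mid                 # mid works, look for something larger
        else:
            high = mid - 1            # mid fails, the answer is below it
    return low


def windows_ping_probe(host):
    """Build a probe that sends one ping with -f (DF set) and -l <size>."""
    def probe(size):
        result = subprocess.run(
            ["ping", "-n", "1", "-f", "-l", str(size), host],
            capture_output=True, text=True,
        )
        return "TTL=" in result.stdout
    return probe
```

On the DPM server, largest_unfragmented_payload(windows_ping_probe("MemberServer")) would return the largest payload that passes; add 28 to get the path MTU.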

Well now that we know this, now what? How do we fix this?

There are some ways to address this such as:

a.) Reconfiguration of the router or switch to allow a larger packet size.
b.) A firmware update may be needed for the router or switch.
c.) On the server enable black hole detection.

Our purpose for the moment is to diagnose where the failure may be. If you experience this type of behavior with what appears to be a black hole router, a collaborative effort with your Network Administrator is of course suggested. Additional information can be found in the articles below.

Ping : http://technet.microsoft.com/en-us/library/cc952252.aspx

169790 How to Troubleshoot Basic TCP/IP Problems : http://support.microsoft.com/default.aspx?scid=kb;EN-US;169790

314825 How to Troubleshoot Black Hole Router Issues : http://support.microsoft.com/default.aspx?scid=kb;EN-US;314825

The Default MTU sizes for different networking topologies : http://support.microsoft.com/kb/314496

New Networking Features in Windows Server 2008 and Windows Vista : http://technet.microsoft.com/en-us/library/bb726965.aspx

Tracert - “What route (path) do I take from here to there?”

Tracert is an easy to use utility for testing out the routing between two servers using ICMP.  The route taken is displayed as in the example below:

Success Example:

C:\Users\administrator>tracert MemberServer

Tracing route to MemberServer.Contoso.com [10.10.10.10]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 10.10.10.1
2 41 ms 27 ms 29 ms router [10.10.10.2]
Trace complete.

Failure Example:

C:\Users\administrator>tracert 10.10.10.55

Tracing route to 10.10.10.55 over a maximum of 30 hops
1 1 ms <1 ms <1 ms 192.168.1.1
2 * * * Request timed out.
3 * * * Request timed out.
4 * * * Request timed out.

etc……

This result tells us that we don’t know how to get there. From this point you may want to check that your default gateways are valid and that your routing tables on the server and the intermediate devices on the network are correct.

For additional information on tracert, see the links below:

Tracert : http://technet.microsoft.com/en-us/library/ff961507(WS.10).aspx

How to Use TRACERT to Troubleshoot TCP/IP Problems in Windows : http://support.microsoft.com/kb/314868

Routing Tables : http://technet.microsoft.com/en-us/library/cc957845.aspx

Understanding the IP routing table : http://technet.microsoft.com/en-us/library/cc787509(WS.10).aspx

Net View - “What shares are available to me over the network on this server?”

The net view command is used to view the accessible shares on a remote computer. The ADMIN$ share on the Protected Machine must be accessible from the DPM Server using the account that you are planning to install the agent with. The net view command is a quick way to test this.

Success Example:

C:\Users\administrator>net view \\MemberServer /all

Shared resources at \\MemberServer
Share name Type Used as Comment
-------------------------------------------
ADMIN$ Disk Remote Admin
C$ Disk Default share
D$ Disk Default share
IPC$ IPC Remote IPC
print$ Disk Printer Drivers
Users Disk
The command completed successfully.

Failure Example:

C:\Users\Administrator>net view \\BogusServer

System error 53 has occurred.
The network path was not found.

How to Use the NET VIEW Command to View Shared Resources : http://support.microsoft.com/kb/141229

SC test for RPC connectivity - “Can I reach this server with RPC traffic?”

Now we test RPC connectivity to the Service Control Manager (SCM) by using the Service Controller, or rather the SC command. When successful, this displays a list of services on the remote server:

Sc \\<protected server name> query

A successful output tells us that the Service Control Manager on the server is reachable over RPC.

Success Example:

C:\Users\administrator> sc \\MemberServer query

SERVICE_NAME: AeLookupSvc
DISPLAY_NAME: Application Experience
TYPE : 20 WIN32_SHARE_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0

etc….

The list may be quite long, so for purposes of this document there’s no reason to show the full output. The key point is whether the command succeeds or not. A failed output is given below:

Failure Example:

C:\Users\administrator>sc \\Server1 query

[SC] OpenSCManager FAILED 1722:
The RPC server is unavailable.

If we know for a fact that “Server1” is a valid server name and we receive “The RPC server is unavailable.”, then, assuming that ping and tracert succeed and show that we know how to get to the server, we can conclude that a possible cause is RPC port blockage.

For additional information:

SC query
http://technet.microsoft.com/en-us/library/dd228922(WS.10).aspx
http://technet.microsoft.com/en-us/library/bb490995.aspx

WBEMTEST - Is DCOM accessible?

For this test we can use WBEMTEST, yet another built-in tool, to test our DCOM connection. There is an excellent walkthrough already written on how to do this at the following link:

Troubleshooting Agent Deployment in Data Protection Manager 2007 - Networking
http://blogs.technet.com/b/askcore/archive/2008/05/01/troubleshooting-agent-deployment-in-data-protection-manager-2007-networking.aspx

Should the test fail with WBEMTEST then verify the DCOM settings as per:

Troubleshooting Agent Deployment in Data Protection Manager 2007 - DCOM
http://blogs.technet.com/b/askcore/archive/2008/05/09/troubleshooting-agent-deployment-in-data-protection-manager-2007-dcom.aspx

WMIC

This tests WMI/DCOM. When successful, this command lists some basic information about the remote server.

Syntax:
Wmic /node:"<protected server name>" OS list brief

“list”: WMIC verb
“brief”: WMIC adverb
“/node”: specifies the computer name
“OS”: lists information about the operating system

Example of success:

C:\Users\administrator>wmic /node:MemberServer OS list brief

BuildNumber Organization RegisteredUser SerialNumber SystemDirectory Version

7600 Microsoft Admin 12345-678-1234567-12345 C:\Windows\system32 6.1.7600

Example of failure:

C:\Users\administrator>wmic /node:BogusServer OS list brief

Node – BogusServer
ERROR:
Description = The RPC server is unavailable.

While it’s true that misconfigured DCOM settings can cause inaccessibility, we will not cover that in this article, as we are looking at this only from a networking perspective. For more information in this regard you can reference the following:

Troubleshooting Agent Deployment in Data Protection Manager 2007 - DCOM
http://blogs.technet.com/b/askcore/archive/2008/05/09/troubleshooting-agent-deployment-in-data-protection-manager-2007-dcom.aspx

Additional Information

WMIC verbs : http://technet.microsoft.com/en-us/library/cc784966(WS.10).aspx

WMIC switches : http://technet.microsoft.com/en-us/library/cc787035(WS.10).aspx

Running Windows Management Instrumentation Command-line : http://technet.microsoft.com/en-us/library/cc782919(WS.10).aspx

Netstat - What ports are listening or are being used?

Netstat is another handy built-in tool for displaying active TCP connections and listening ports. There are other switches for netstat but we will only focus on a few.

We will use netstat in conjunction with tasklist (listed below) to determine the following:

a.) Can we establish a DPM connection over the DPM port 5718?
b.) Can we be sure that it’s the DPMRA service using that port and not something else?

Syntax: netstat -ano

The “-ano” switch gives you:
-a Displays all connections and listening ports.
-n Displays addresses and port numbers in numerical form.
-o Displays the owning process ID associated with each connection.

Example:

C:\Users\administrator>netstat -ano

Active Connections
Proto Local Address Foreign Address State PID
TCP 10.10.10.10:58891 157.54.62.44:56281 ESTABLISHED 3608
TCP 10.10.10.10:59628 10.251.16.114:443 ESTABLISHED 1904
TCP 10.10.10.10:60075 157.54.41.53:7575 ESTABLISHED 3608
TCP 10.10.10.10:61763 10.37.38.16:5061 ESTABLISHED 3708
TCP 10.10.10.10:64475 65.53.103.24:1745 ESTABLISHED 1112
TCP 10.10.10.10:65292 157.54.41.53:7576 ESTABLISHED 3608
TCP 10.10.10.10:3143 65.53.65.78:5718 ESTABLISHED 5512

Note the port of 5718 and the PID listed there. We know that this is a DPM port, but how can we be sure that another service isn’t using it? We can use “netstat -anob” to list the executables as well, or we can use tasklist as shown below.


Tasklist

Tasklist can be used to list the currently running processes and associated PIDs. I personally prefer Tasklist over using netstat with the “b” switch, as tasklist is also handy for seeing which services are associated with svchost.exe.

Syntax: tasklist /svc

C:\Users\administrator>tasklist /svc

Image Name PID Services
========================= ========
OUTLOOK.EXE 3608 N/A
IncidentManagement.Client 1904 N/A
communicator.exe 3708 N/A
DPMRA.exe 5512 DPMRA
svchost.exe 252 CryptSvc, Dnscache, LanmanWorkstation,
napagent, NlaSvc, TermService

Note that the PID of 5512 belongs to the DPMRA service. So the output from netstat and tasklist together verifies that DPMRA is the service using port 5718 on this server.  Below is an article on how to use the Netstat and Tasklist output to assess whether another process is listening on the DPMRA ports of 5718 or 5719.

947682 The DPM protection agent service cannot start in System Center Data Protection Manager 2007 : http://support.microsoft.com/default.aspx?scid=kb;EN-US;947682
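Matching PIDs between the two outputs by eye gets tedious on a busy server. The following Python sketch shows one way to cross-reference saved “netstat -ano” and “tasklist /svc” text; the function names are hypothetical, and the parsing assumes the whitespace-separated column layout shown in the examples above (image names containing spaces, such as "System Idle Process", are not handled).

```python
def pids_using_port(netstat_output, port):
    """Return the set of PIDs with a TCP row whose local or foreign port matches."""
    suffix = f":{port}"
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five columns: proto, local, foreign, state, PID.
        if len(fields) == 5 and fields[0] == "TCP":
            local, foreign, pid = fields[1], fields[2], fields[4]
            if local.endswith(suffix) or foreign.endswith(suffix):
                pids.add(int(pid))
    return pids


def image_names_by_pid(tasklist_output):
    """Map PID to image name from 'tasklist /svc' output."""
    names = {}
    for line in tasklist_output.splitlines():
        fields = line.split()
        # Data rows start with the image name followed by a numeric PID;
        # header rows and wrapped service lists do not match this shape.
        if len(fields) >= 2 and fields[1].isdigit():
            names[int(fields[1])] = fields[0]
    return names


def processes_on_port(netstat_output, tasklist_output, port):
    """Return {pid: image name} for every process seen on the given port."""
    names = image_names_by_pid(tasklist_output)
    return {pid: names.get(pid, "unknown")
            for pid in pids_using_port(netstat_output, port)}
```

Fed the sample outputs above, processes_on_port(netstat_text, tasklist_text, 5718) yields {5512: "DPMRA.exe"}; any other image name on 5718 or 5719 would point to the conflict described in the article referenced above.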

TCPView

TCPView is a free utility that you can use to accomplish the same thing as netstat and tasklist but with a GUI.

TCPView : http://technet.microsoft.com/en-us/sysinternals/bb897437

TCPView can show you the ports, protocol, IP addresses in use, and processes. The output can be saved into a text file for reference later on.

Integrated Firewall Logging - “Is the Windows integrated firewall causing an issue?”

Port blockage by the integrated firewall is definitely not out of the realm of possibility and is a commonly overlooked cause of DPM agent communication failures. I want to stress this again because it is so often overlooked. When the agent is installed manually, the DPMRA rules “should” be created properly to allow traffic. Even if this is done correctly, someone can still change the firewall rules manually or through a GPO, or create a new, more restrictive rule. Toggling the firewall off is a good way to rule out that variable.

Turning off the integrated firewall can be done in one of three ways:

a.) From the command prompt

b.) From the mmc or computer management

c.) From netsh

The choice to turn off the integrated firewall is often a rather sensitive decision due to company policies and\or change request. As such, it may be to your advantage to enable and analyze the firewall logging first.

Turning the integrated firewall off by Command prompt

To turn off the firewall via the command prompt you can type: “net stop bfe”

You will be prompted for a confirmation. This is the easiest method, but it also stops other services that depend on the Base Filtering Engine (IKE, IPsec, TMG if installed). That being the case, you may not want to go this route if you do not know which other services this server needs at the time; choose one of the other methods below instead.

Turning the integrated firewall off by the Netsh command

To disable the Windows Firewall, open an administrative command prompt and run the following command:

netsh advfirewall set allprofiles state off

To re-enable the Windows Firewall, run:

netsh advfirewall set allprofiles state on

Turning off the integrated firewall by MMC
To turn it off via the MMC, add the “Windows Firewall with Advanced Security” snap-in for the local computer, or access it via Computer Management. Right-click the top of the tree and select Properties.

[Screenshot: selecting Properties at the top of the tree]

You will be presented with the following:

[Screenshot: Windows Firewall with Advanced Security properties]

There are three things to note:

a.) There are three profiles - Private profile, Public profile and Domain profile.
b.) You may, but will not always, see the domain GPO overriding the option to disable the firewall.
c.) You do have logging to reference if needed.

KEY POINTS

If a GPO is overriding the ability to turn off the firewall through the graphical user interface (GUI), then you may have to do so via the command prompt (net stop bfe). This turns it off temporarily for testing; it may, however, be turned back on at the next GPO refresh (by default every 90 minutes, with a random offset of up to 30 minutes). Remember, though, that turning off the firewall with net stop bfe also stops other services, as mentioned previously.
If you are not allowed to turn it off at all, then rely on firewall logging to tell you what’s going on. You will have the option to log dropped packets and successful connections. I suggest you log both.

The most important things to note when referencing the logs are:

a.) Action (dropped or allowed)
b.) Protocol
c.) Source and destination IP.

Firewall logging is done on a per profile basis.

Default path: C:\Windows\System32\LogFiles\Firewall\pfirewall.log

Sample log below shows the dropped packets for ICMP, TCP 445, TCP 135, UDP 137 between the source IP address of 10.10.10.100 and destination IP address 10.10.10.50.

[Screenshot: sample pfirewall.log showing the dropped packets]
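On a busy server the log grows quickly, so it can help to summarize the dropped entries instead of reading them line by line. The Python sketch below is an illustration only: the function names are made up, and it assumes each entry starts with the default field order written in the log's "#Fields:" header (date, time, action, protocol, src-ip, dst-ip, src-port, dst-port). Check the header of your own log before relying on it.

```python
from collections import Counter


def dropped_packets(log_text):
    """Yield (protocol, src_ip, dst_ip, dst_port) for each DROP entry."""
    for line in log_text.splitlines():
        # Skip blank lines and the '#Version', '#Fields' style header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 8:
            continue
        _date, _time, action, proto, src, dst, _src_port, dst_port = fields[:8]
        if action == "DROP":
            yield proto, src, dst, dst_port


def summarize_drops(log_text):
    """Count dropped packets per (protocol, src_ip, dst_ip, dst_port)."""
    return Counter(dropped_packets(log_text))
```

Reading C:\Windows\System32\LogFiles\Firewall\pfirewall.log into a string and passing it to summarize_drops gives a count per protocol, address pair and destination port, which makes repeated drops on ports such as 135, 445, 5718 or 5719 stand out.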

Additional information:

System Center Data Protection Manager 2012 agent installation fails with error 319 : http://support.microsoft.com/kb/2621989

TMG Setup for DPM Communication : http://blogs.technet.com/b/dpm/archive/2010/12/06/new-video-tmg-setup-for-dpm-communication.aspx

DPM Traffic and Chimney and RSS - Is TCP Chimney or RSS causing an issue?

To answer that question, we first need to understand what TCP Chimney and RSS are.  TCP Chimney and RSS are enhancements that increase the throughput and packet processing of network traffic.

RSS technology enables the processing of network packets belonging to the same TCP connections to be distributed across multiple processors in the system if your server has more than one processor.

TCP Chimney Offload is a networking technology that helps transfer the workload from the server CPU to a network adapter during network data transfer. This is of course assuming the adapter can support it.

“If it’s an enhancement, why would I want to turn it off?”

Sometimes, however, “the network adapter is not powerful enough to handle the offload capabilities at high throughput. For example, enabling segmentation offload can reduce the maximum sustainable throughput on some network adapters because of limited hardware resources.” ---Quoted from the performance article found at: http://msdn.microsoft.com/en-us/windows/hardware/gg463394

There is also a list of common symptoms you may see if TCP offload is not working as expected. This is discussed in:

948496 An update to turn off default SNP features is available for Windows Server 2003-based and Small Business Server 2003-based computers http://support.microsoft.com/default.aspx?scid=kb;EN-US;948496

The bottom line is that if TCP Chimney and\or RSS is not operating properly or as expected, undesirable behavior can be seen: DPM traffic can be extremely slow and can even fail in some cases. This includes not just DPM replication issues but also failed DPM agent pushes. There are various SNP and TCP offloading hotfixes that may apply in some instances to improve DPM traffic. Whether to implement such hotfixes is best determined with the networking administrator.

Most often if TCP Chimney is not functioning properly then an update of the NIC driver will correct the problem. This is important to remember as updating the NIC driver is often overlooked as a possible solution. Please remember that a NIC is only as good as the driver that is written for it and an updated driver may correct many underlying issues.

If it is suspected that TCP Chimney or RSS is not operating as expected then try toggling them off for testing. Doing this is simple and turning off TCP Chimney or RSS does NOT require a reboot of the server. When finished with your testing they can easily be turned back on.  To turn off TCP chimney or RSS you can follow the netsh syntax below:

Chimney

To determine the current status of TCP Chimney Offload: netsh int tcp show global
To disable chimney: netsh int tcp set global chimney=disabled
To enable chimney: netsh int tcp set global chimney=enabled

RSS

To determine the current status of RSS: netsh int tcp show global

To disable RSS: netsh int tcp set global rss=disabled
To enable RSS: netsh int tcp set global rss=enabled

Example:
[Screenshot: netsh output showing chimney and RSS being checked and disabled]

Note the following from the output above:

a.) Both chimney and RSS are in use.
b.) We have turned them both off and received a confirmation that our changes took effect.

951037 Information about the TCP Chimney Offload, Receive Side Scaling, and Network Direct Memory Access features in Windows Server 2008:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;951037

Netmon

For a networking administrator, netmon is an extremely useful tool to help determine what’s taking place on the wire. There is a saying, however, that I’ve heard all too often: “Reading netmon traces is more of an art than it is a science.” Covering all the ins and outs of the types of traffic, and the differences between baseline behavior and failed or abnormal traffic, is beyond the scope of this document. The discussion of protocol analysis can go very deep on the basics alone. Although we are not going in-depth on Netmon usage, it still warrants mentioning as a tool to be used by someone who is savvy in protocol analysis.

If you are interested in obtaining or sharpening this skill then below are a few links to send you down that path.

Netmon: http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f
OneClick: this will allow you to take a network capture and in some cases may be easier to use than Netmon : http://www.microsoft.com/download/en/details.aspx?id=6537

Basics of Reading Traces : http://support.microsoft.com/kb/169292

The SETSPN Command – Are my SPNs properly registered?

Here is an example of how to check to make sure the SPN registration is proper. Run setspn and check the Service Principal Name(s) for the server that we cannot push the agent to.

Follow these steps:

1. Download and install the Windows Support tools from:

<http://www.microsoft.com/downloads/details.aspx?FamilyID=6ec50b78-8be1-4e81-b3be-4e7ac4f0912d&DisplayLang=en>

2. Bring up a command prompt, and while you are logged in as a Domain Administrator, run the following command:

setspn -l <Servername>

where <Servername> is the production server where the DPM agent cannot be successfully deployed.

The output should look similar to the following:

Registered ServicePrincipalNames for CN=EXCHANGESERVER,OU=Member Servers,DC=joesgarage,DC=com:
SMTPSVC/exchangeserver.joesgarage.com
NtFrs-88f5d2bd-b646-11d2-a6d3-00c04fc9b232/exchangeserver.joesgarage.com
HOST/exchangeserver.www.joes.com/JOES.COM
HOST/exchangeserver.www.joes.com
exchangeMDB/exchangeserver.www.joes.com
SMTPSVC/exchangeserver.www.joes.com
HOST/exchangeserver.www.joes.com/www.joes.com
exchangeRFR/exchangeserver.www.joes.com
exchangeRFR/exchangeserver.joesgarage.com
exchangeMDB/exchangeserver.joesgarage.com
exchangeRFR/EXCHANGESERVER
exchangeMDB/EXCHANGESERVER
SMTPSVC/EXCHANGESERVER
HOST/EXCHANGESERVER

NOTE All the "HOST/" SPNs above show JOES.COM; this indicates that the primary DNS is set to joes.com, not joesgarage.com. When the agent is being deployed, the DPM server resolves the name EXCHANGESERVER.JOESGARAGE.COM, and this is the name used to build the SPN that is requested at the time of agent deployment. However, because the HOST SPNs registered for exchangeserver are for www.joes.com, the Kerberos connection attempt fails because the SPNs don't match. When this happens, we then try to make an anonymous connection, and this is why event ID 6033 was being logged (in some circumstances) on the Exchange server.

3. Run the following commands to ensure that the HOST SPNs are correctly registered:

setspn -a HOST/exchangeserver.joesgarage.com exchangeserver

setspn -a HOST/exchangeserver exchangeserver

4. Replicate the changes throughout AD, then test the DPM agent deployment again.
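The comparison made in the note above (which HOST SPNs are registered versus which ones the DPM server will ask for) can be expressed as a short Python sketch over saved “setspn -l” output. The function names are hypothetical, and the assumption that the two expected entries are HOST/<fqdn> and HOST/<short name> is taken from the setspn -a commands in step 3.

```python
def host_spns(setspn_output):
    """Return the HOST/... entries from 'setspn -l' output, in order."""
    return [line.strip()
            for line in setspn_output.splitlines()
            if line.strip().upper().startswith("HOST/")]


def missing_host_spns(setspn_output, short_name, fqdn):
    """Return the expected HOST SPNs that are not registered.

    The comparison ignores case, so HOST/EXCHANGESERVER satisfies
    HOST/exchangeserver.
    """
    registered = {spn.lower() for spn in host_spns(setspn_output)}
    expected = [f"HOST/{fqdn}", f"HOST/{short_name}"]
    return [spn for spn in expected if spn.lower() not in registered]
```

With the sample output above, missing_host_spns(output, "exchangeserver", "exchangeserver.joesgarage.com") reports HOST/exchangeserver.joesgarage.com as missing, which is the entry the first setspn -a command in step 3 adds.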

Putting it all together - Well this is a lot of stuff, where do I begin? Which tool do I use first and when?

There is no “absolute” order that must be followed but we will start by keeping it simple.

If you are trying to push out a DPM agent to a server and it fails then I suggest you test the following.

From the DPM server to the protected server
ping <protected server name>
net view \\<protected server name>
Sc \\<protected server name> query
Wmic /node:"<protected server name>" OS list brief
Wbemtest
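If you run this sequence often, the command-line tests in the list above can be chained in a small script that stops at the first failure. The Python sketch below is a rough illustration, not a supported tool: build_checks and run_checks are made-up names, WBEMTEST is left out because it is a GUI tool, and treating exit code 0 as success is an approximation (ping in particular can exit with 0 on some failures), so read the captured output before drawing conclusions.

```python
import subprocess


def build_checks(server):
    """Return the command lines for the tests above, in the suggested order."""
    return [
        ["ping", server],
        ["net", "view", rf"\\{server}"],
        ["sc", rf"\\{server}", "query"],
        ["wmic", f"/node:{server}", "OS", "list", "brief"],
    ]


def run_checks(server, runner=subprocess.run):
    """Run each check in order and stop at the first failure.

    Returns a list of (command string, ok) tuples. The runner parameter
    exists so the function can be exercised without touching the network.
    """
    results = []
    for cmd in build_checks(server):
        completed = runner(cmd, capture_output=True, text=True)
        ok = completed.returncode == 0
        results.append((" ".join(cmd), ok))
        if not ok:
            break
    return results
```

Run from the DPM server as run_checks("MemberServer"); the last entry in the returned list shows which test to investigate using the per-tool guidance that follows.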

Ping
If ping fails, then use tracert to see where the traffic dies.
If ping fails, then check the integrated firewall on the target server.
If ping fails by name, then test by pinging the IP address of the target server.
If ping fails by name but works by IP address, then check the DNS registration.
If ping works, then test with net view.
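
The ping checks above boil down to a small decision table. Here is a minimal Python sketch of it. The function name `ping_next_step` is hypothetical, and the two inputs are simply whether ping by name and ping by IP address succeeded when you ran them by hand.

```python
def ping_next_step(name_ok, ip_ok):
    """Map the two ping results to the next check suggested above.

    name_ok: ping by server name succeeded
    ip_ok:   ping by IP address succeeded
    """
    if name_ok:
        return "test with net view"
    if ip_ok:
        # The name fails but the address works, so name resolution is suspect.
        return "check the DNS registration"
    return "run tracert and check the firewall on the target server"


for name_ok, ip_ok in [(True, True), (False, True), (False, False)]:
    print(name_ok, ip_ok, "->", ping_next_step(name_ok, ip_ok))
```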

Net view
If net view fails with error 53, then make sure the computer name is correct AND that File and Printer Sharing is enabled.
If net view fails with "System error 5 has occurred. Access is denied.", verify that you are logged on using an account that has permission to view the shares on the remote computer.
If net view with the name fails, then test with net view \\ipaddress. If this works, then name resolution may be the issue:
a.) Go to the target server and from a command prompt type “ipconfig /flushdns”, then press Enter. Then type “ipconfig /registerdns”, then press Enter. This flushes the DNS resolver cache on the local server and re-registers the name with DNS.
b.) From the DPM server at a command prompt type “ipconfig /flushdns”, then press Enter. Then type “ipconfig /registerdns”, then press Enter.

Does net view work now?  If net view succeeds then make sure the ADMIN$ is listed.
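
As a rough aid, the two net view errors called out above can be picked out of the command output with a few lines of Python. This is a sketch, assuming the output contains the text "System error N has occurred" as quoted above. The names `NET_VIEW_ADVICE` and `advise` are hypothetical.

```python
import re

# Advice keyed by the system error number that net view prints.
NET_VIEW_ADVICE = {
    53: "Check the computer name and that File and Printer Sharing is enabled.",
    5: "Log on with an account that has permission to view the remote shares.",
}


def advise(net_view_output):
    """Find 'System error N has occurred' in the output and return advice."""
    match = re.search(r"System error (\d+) has occurred", net_view_output)
    if match is None:
        return "No system error found; make sure ADMIN$ is listed."
    code = int(match.group(1))
    return NET_VIEW_ADVICE.get(code, f"System error {code} is not covered here.")


print(advise("System error 5 has occurred.\n\nAccess is denied."))
```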

SC
Does sc \\<ServerName> query fail? If so:
a.) Check the integrated firewall on the target server to see if RPC traffic is locked down and being denied. Either turn off the firewall for testing or rely on the firewall logging as discussed earlier.
b.) If there are any firewalls between the DPM server and the target server, make sure the RPC port range is open. Remember that the ports are assigned dynamically from the TCP port range of 1024 through 65535.

This port range can be restricted, but doing so needs careful consideration and a collaborative effort with the Networking Administrator, as it will affect other types of traffic.

How to configure RPC dynamic port allocation to work with firewalls : http://support.microsoft.com/kb/154596
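
If you want a quick scripted check that a TCP port is reachable through the firewalls, a sketch along these lines can help. It is a generic Python connection test, not a DPM or RPC tool. The function name `tcp_port_open` is hypothetical, and the demo uses a throwaway local listener so it runs anywhere. Against a real server you would pass the protected server name and a port of interest, for example TCP 135, where the RPC endpoint mapper listens.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Demo against a temporary listener on this machine (port chosen by the OS).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print("listener reachable:", tcp_port_open("127.0.0.1", demo_port))
listener.close()
```

A successful connection only shows that the port is open at that moment. It does not prove that the dynamically assigned RPC ports are also allowed, so the firewall logging is still worth checking.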

WMIC and WBEMTEST

If “wmic /node:"<protected server name>" OS list brief” fails or WBEMTEST fails, then follow:

Troubleshooting Agent Deployment in Data Protection Manager 2007 – DCOM : http://blogs.technet.com/b/askcore/archive/2008/05/09/troubleshooting-agent-deployment-in-data-protection-manager-2007-dcom.aspx

Ports and TCP Chimney and RSS
If everything checks out after following the steps above, then check the ports on the target server and try turning off TCP Chimney and RSS for testing. Use Netstat or TCPView and check the firewall logging again for any denied traffic.
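
When reading netstat output, it can help to pull out just the listening TCP ports. The Python sketch below does that for text in the column layout that `netstat -ano` prints on Windows. The sample text is made up for illustration, not captured from a real server. The function name `listening_ports` is hypothetical, and the ports 5718 and 5719 are assumed here to be the DPM agent ports, so adjust them to the ports you need to verify.

```python
def listening_ports(netstat_output):
    """Return the set of TCP ports shown in the LISTENING state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State, PID
        if len(fields) >= 4 and fields[0] == "TCP" and fields[3] == "LISTENING":
            # The port follows the last colon, which also handles [::]:135.
            ports.add(int(fields[1].rsplit(":", 1)[1]))
    return ports


# Illustrative sample only.
sample = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  TCP    0.0.0.0:5718           0.0.0.0:0              LISTENING       2104
  TCP    10.0.0.5:5719          10.0.0.9:51544         ESTABLISHED     2104
  TCP    [::]:445               [::]:0                 LISTENING       4
"""

expected = {5718, 5719}  # assumption: the DPM agent ports
not_listening = expected - listening_ports(sample)
print("not listening:", sorted(not_listening))
```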

Conclusion

Data Protection Manager agent communication problems can be difficult to chase down. There are many more tools available, whether built in, free to download, or purchased, and we all have our favorites. I have only touched on the ones commonly used in Data Protection Manager Support. They are relatively quick and simple, and they help cut to the chase as to where the problem may be. If everything checks out from a networking perspective, then follow the three articles in the “Additional Resources” section below and reference the DPM event logs and the DPM error logs located at:

Client Side Activity-- %ProgramFiles%\Microsoft Data Protection Manager\DPM\Temp
DPM Server Activity-- %ProgramFiles%\Microsoft DPM\DPM\Temp

They can be opened with Notepad to see what may be causing the failure.
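
To find which log to open first, it can be handy to list the most recently modified files in the Temp folder. Here is a small Python sketch of that idea. The function name `newest_files` is hypothetical, and the demo runs against a temporary folder with made-up file names rather than a real DPM installation.

```python
import os
import tempfile
from pathlib import Path


def newest_files(directory, count=5):
    """Return up to `count` files in `directory`, most recently modified first."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


# Demo with three sample files whose modification times are set explicitly.
with tempfile.TemporaryDirectory() as tmp:
    for offset, name in enumerate(["old.log", "newer.log", "newest.log"]):
        path = Path(tmp, name)
        path.write_text("sample")
        stamp = 1_600_000_000 + offset
        os.utime(path, (stamp, stamp))
    print([p.name for p in newest_files(tmp, count=2)])
```

On a real server you would pass one of the Temp folders listed above as `directory`.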

Additional Resources

Troubleshooting Agent Deployment in Data Protection Manager 2007 : http://blogs.technet.com/b/askcore/archive/2008/04/23/troubleshooting-agent-deployment-in-data-protection-manager-2007.aspx

Troubleshooting Agent Deployment in Data Protection Manager 2007 – DCOM : http://blogs.technet.com/b/askcore/archive/2008/05/09/troubleshooting-agent-deployment-in-data-protection-manager-2007-dcom.aspx

Troubleshooting Agent Deployment in Data Protection Manager 2007 – Networking : http://blogs.technet.com/b/askcore/archive/2008/05/01/troubleshooting-agent-deployment-in-data-protection-manager-2007-networking.aspx

Shane Brasher | Senior Support Escalation Engineer




  • It might be worthwhile noting, that the accounts used between the AGENT and the SERVER in a workgroup environment (or in my case across a DMZ) must have their passwords set to NEVER EXPIRE.  Otherwise, if the password expires and "must be changed" then the AGENT and the SERVER will not be able to communicate.

  • try updating the dpm2012 sp1 server to .net 4.5 and updating the windows management framework 3.0 in kb2506143.  after patches, i was able to get remote agent to respond on DC without error.

  • Your port instructions are a bit unclear. Which ports need to be open for the protected servers (a.k.a. clients) versus the ports which need to be opened for the DPM server itself. It looks like you mashed them together, but the information you give makes it unclear which ports should be open where.

  • Found the instructions for anyone who is wondering...

    Installing Protection Agents on Computers Behind a Firewall
    http://technet.microsoft.com/en-us/library/hh757971.aspx