RPC over IT/Pro

Hi folks, Ned here again to talk about one of the most commonly used – and least understood – network protocols in Windows: Remote Procedure Call. Understanding RPC is a foundation for any successful IT Professional. It’s integral to distributed systems like Active Directory, Exchange, SQL, and System Center. The administrator who has never run into RPC configuration issues is either very new or very lucky.

Today I attempt to explain the protocol in practical terms. As always, the best way to troubleshoot is to understand how things are supposed to work, so that when they fail the reasons are obvious. If you have a metered or capped Internet connection, read this off hours – it’s a biggie.

Some context

The RPC concept has roots in ARPANET, but got its first business computing use – like so many others – at Xerox PARC as “Courier”. The Microsoft implementation is an extension of The Open Group’s DCE/RPC, sometimes called MSRPC. We further extended that into the Distributed Component Object Model (DCOM), which is RPC and COM. The Exchange folks heavily invested in RPC over HTTP. Microsoft also retains the legacy "RPC over SMB" system, often referred to as Named Pipes. That ends the brochure.

As I began to learn RPC, the first problem I ran into was the documentation. It seemed to come in two forms:

[Image: Let’s do lunch – you like human?]

If you actually read the docs, the details let you down. They come in two arrangements, both of which completely miss the IT boat:

1. The “it’s all processes and libraries, get to coding” form:

[Image: See, it's just code!]

2. The “Jedi network magic” form:

[Image: These aren't the computers you're looking for… move along]

I find developers are often like Rain Man: specialist geniuses, bewildered by real life. This isn’t bad documentation, but IT pros aren’t the audience. The developers of RPC are providing a framework, and since they live in a perfect world of design where nothing breaks, how it works on the network is not important to them – they just want you to use the right APIs. The problem is I don’t care about the specifics of MIDL, stubs, or marshaling unless I’m debugging; I just want to know how it all works in practical networking terms. Then when it breaks, I have somewhere to start, and when I’m designing a distributed system, I’m not setting my customer up for headaches.

Today I focus on MSRPC, as that’s the main RPC protocol of AD components. I may return someday to discuss the others, if you’re interested. And bribe me.

The MSRPC details

Let's start with an analogy: you meet a nice girl and really hit it off. Like an idiot, you manage to lose her phone number. You know that she works for Microsoft though, so you start by looking up the Charlotte office. You call and get a switchboard, so you ask for her by name. The operator tells you her number and then offers to transfer you – naturally, you say yes. Someone answers and you make sure it’s the nice girl by introducing yourself. You both exchange pleasantries, then make plans for dinner and a movie, with directions to the restaurant and a chat about the Flixster reviews. You hang up and think about what you’re going to say to keep her interested until the appetizers arrive. You called her on your mobile phone so you have the outgoing number saved in case you need to call back.

There, now you understand MSRPC. No really, you do…

  1. A client application knows about a server application and wants to communicate with it.
  2. The client computer uses name resolution to locate the computer where that server application runs.
  3. The client app connects to an endpoint locator and requests access to the server application.
  4. The endpoint locator provides that info and the client connects to the server with an initial conversation.
  5. The client and server apps exchange instructions and data.
  6. The client and server apps disconnect.
  7. The client computer caches the name resolution and connection details, which can save time when reconnecting later.
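To tie the seven steps back to something concrete, here is a toy model of the client side. This is a minimal sketch, not how Windows implements it: the `NAME_SERVICE` and `ENDPOINT_MAPPER` dictionaries stand in for DNS and the real endpoint mapper on TCP port 135, and the class name, method names, and port 49158 are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for DNS and for one server's endpoint mapper table.
NAME_SERVICE = {"win2008r2-02": "10.0.0.102"}
DRSR_UUID = "e3514235-4b06-11d1-ab04-00c04fc2dcd2"
ENDPOINT_MAPPER = {
    # (server IP, interface UUID) -> port the server application registered
    ("10.0.0.102", DRSR_UUID): 49158,  # hypothetical dynamic port
}


@dataclass
class RpcClient:
    # Step 7: caches that save time when reconnecting later
    name_cache: dict = field(default_factory=dict)
    endpoint_cache: dict = field(default_factory=dict)

    def resolve(self, hostname: str) -> str:
        # Step 2: name resolution, cached after the first lookup
        if hostname not in self.name_cache:
            self.name_cache[hostname] = NAME_SERVICE[hostname]
        return self.name_cache[hostname]

    def lookup_endpoint(self, ip: str, interface_uuid: str) -> int:
        # Steps 3-4: ask the endpoint mapper which port serves this UUID
        key = (ip, interface_uuid)
        if key not in self.endpoint_cache:
            self.endpoint_cache[key] = ENDPOINT_MAPPER[key]
        return self.endpoint_cache[key]

    def call(self, hostname: str, interface_uuid: str, opnum: int) -> str:
        # Step 1: the caller already knows which server application (UUID) it wants
        ip = self.resolve(hostname)
        port = self.lookup_endpoint(ip, interface_uuid)
        # Steps 5-6: exchange instructions and data, then disconnect (simulated)
        return f"opnum {opnum} sent to {ip}:{port}"


client = RpcClient()
print(client.call("win2008r2-02", DRSR_UUID, 0))  # opnum 0 sent to 10.0.0.102:49158
```

A second call to the same server skips both lookups, which is the whole point of step 7.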

RPC allows a client application to let other computers work on its behalf, offloading processing to more powerful centralized servers. Instead of sending real functions over the network, the client tells the server which functions to run, and the server sends the results back. This has nothing to do with the OS: some of these applications can be both client and server, as in Active Directory multi-master replication. There, the RPC application is LSASS.EXE, and I’m going to use it as our sample app.
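The "client names the function, server runs it" idea can be sketched as a dispatch table keyed by function number. This is a simplified illustration only: real MSRPC servers use MIDL-generated stubs and NDR marshaling, and both handler functions here are hypothetical.

```python
from typing import Callable, Dict


# Hypothetical server-side handlers: the server does the work, the client only names it.
def get_time(stub_data: bytes) -> bytes:
    return b"12:00"


def echo(stub_data: bytes) -> bytes:
    return stub_data


# The function number in a request selects which handler the server runs.
DISPATCH: Dict[int, Callable[[bytes], bytes]] = {0: get_time, 1: echo}


def handle_request(opnum: int, stub_data: bytes) -> bytes:
    handler = DISPATCH.get(opnum)
    if handler is None:
        # A real server would send back an RPC fault instead of raising in-process.
        raise ValueError(f"unknown opnum {opnum}")
    return handler(stub_data)


print(handle_request(1, b"hello"))  # b'hello'
```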

There are a few important terms to understand:

  • Endpoint mapper – a service listening on the server, which guides client apps to server apps by port and UUID
  • Tower – describes the RPC protocol, to allow the client and server to negotiate a connection
  • Floor – the contents of a tower with specific data like ports, IP addresses, and identifiers
  • UUID – a well-known GUID that identifies the RPC application. The UUID is what you filter on to see a specific kind of RPC application conversation, as there are likely to be many
  • Opnum – the identifier of a function that the client wants the server to execute. It’s just a number, but a good network analyzer will translate it into the function name for you. MSDN can too. If neither knows, your application vendor must tell you
  • Port – the communication endpoints for the client and server applications
  • Stub data – the information given to functions and data exchanged between the client and server. This is the payload; the important part
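To make "tower" and "floor" less abstract, here is one way to model the tower the endpoint mapper hands back for a TCP endpoint. It is a sketch based on my understanding of the DCE/RPC tower layout (interface, transfer syntax, RPC protocol, TCP port, IP address, one per floor); the real thing is binary, and the class names and port number are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Floor:
    protocol: str  # what this floor describes
    value: str     # the floor's data: a UUID and version, a port, an address


@dataclass
class Tower:
    floors: List[Floor]

    def endpoint(self) -> str:
        # The client only needs the IP and TCP floors to reach the server app.
        values = {f.protocol: f.value for f in self.floors}
        return f"{values['IP']}:{values['TCP']}"


drsr_tower = Tower(floors=[
    Floor("Interface UUID", "e3514235-4b06-11d1-ab04-00c04fc2dcd2 v4.0"),
    Floor("Transfer syntax (NDR)", "8a885d04-1ceb-11c9-9fe8-08002b104860 v2"),
    Floor("RPC connection-oriented", "minor version 0"),
    Floor("TCP", "49158"),  # hypothetical dynamic port
    Floor("IP", "10.0.0.102"),
])
print(drsr_tower.endpoint())  # 10.0.0.102:49158
```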

There’s a lot more but we’re getting into developer country. I know it sounds like jabber, so let’s dissect this with a real-world example using our old friend NetMon and the latest open source parsers.

Back to reality

Here I have two DCs in the same AD site, named WIN2008R2-01 and WIN2008R2-02, with IP addresses 10.0.0.101 and 10.0.0.102 respectively. I reboot DC2 and have a network capture running on DC1. I create a brand new test user and let it replicate, then I stop the capture. It’s critical that the capture sees the whole conversation, or it will be a mess to analyze; if possible, captures should always run on both client and server, but in this case that’s not possible due to the reboot.

When you first examine AD replication traffic in NetMon, it looks like Greek. What the heck is a stub parser? DRSR?

Open the Options menu and select Parser Profiles. The reason you see the “Windows stub parser” messages is that by default, NetMon uses a balanced set of parsers designed for limited analysis without packet loss.

When analyzing captures on your desktop, set the active parser to “Windows” and you get the most detail.

While you’re in the Options, I also recommend configuring color filters. Since I am examining AD replication, I want visual cues for DRSR (Directory Replication Service Remote protocol), EPM (RPC Endpoint Mapper), MSRPC, and DNS. This makes skimming a capture easier.

Now I add a simple filter of: msrpc. Better. Let’s start deciphering:

Right away, we see the endpoint mapper request. The tower for Directory Replication is in that request, using the UUID E3514235-4B06-11D1-AB04-00C04FC2DCD2 (that's how NetMon knows to parse it, by the way). It is connecting to TCP port 135. This happens shortly after LSASS.EXE starts, as domain controllers are nearly always talking about replication.
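One thing that trips people up when comparing a raw hex dump to the parsed UUID: in little-endian NDR data, the first three fields of a GUID are byte-swapped on the wire. Python's standard `uuid` module shows both forms. The `KNOWN_INTERFACES` table is a hypothetical two-entry version of what a parser does when it picks a protocol from the UUID; the second entry is the standard endpoint mapper interface UUID.

```python
import uuid

# Two well-known interface UUIDs: DRSR (drsuapi) and the endpoint mapper itself.
KNOWN_INTERFACES = {
    uuid.UUID("e3514235-4b06-11d1-ab04-00c04fc2dcd2"): "DRSR",
    uuid.UUID("e1af8308-5d1f-11c9-91a4-08002b14a0fa"): "EPM",
}


def identify(wire_bytes: bytes) -> str:
    # bytes_le: first three fields little-endian, as in NDR little-endian data
    interface = uuid.UUID(bytes_le=wire_bytes)
    return KNOWN_INTERFACES.get(interface, f"unknown interface {interface}")


drsr = uuid.UUID("e3514235-4b06-11d1-ab04-00c04fc2dcd2")
print(drsr.bytes_le.hex())      # 354251e3064bd111ab0400c04fc2dcd2 (as in a hex dump)
print(identify(drsr.bytes_le))  # DRSR
```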

Naturally, there is a response, and it contains several key ingredients:

You can see the towers (there may be more than one) and the floors in each tower with their ports. Importantly, you also see the status of the attempted connection, and a specific server port is listed. That port may be dynamic or static; it depends on the application’s configuration.
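If you ever need to dig the port out of a raw tower by hand, the layout is simple enough to sketch. This assumes my reading of the DCE/RPC tower encoding: a little-endian floor count, then per floor a little-endian left-hand-side length and data (starting with a protocol identifier byte, 0x07 for TCP and 0x09 for IP) and a right-hand-side length and data, with the port and IPv4 address stored big-endian. It parses the tower octets themselves, after any length prefix. `build_floor` and the sample values are hypothetical scaffolding; check it against a real capture before trusting it.

```python
import ipaddress
import struct

TCP_PROTOCOL_ID = 0x07
IP_PROTOCOL_ID = 0x09


def parse_floors(tower: bytes):
    """Yield (protocol_id, lhs_data, rhs_data) for each floor in a tower."""
    (floor_count,) = struct.unpack_from("<H", tower, 0)
    offset = 2
    for _ in range(floor_count):
        (lhs_len,) = struct.unpack_from("<H", tower, offset)
        lhs = tower[offset + 2 : offset + 2 + lhs_len]
        offset += 2 + lhs_len
        (rhs_len,) = struct.unpack_from("<H", tower, offset)
        rhs = tower[offset + 2 : offset + 2 + rhs_len]
        offset += 2 + rhs_len
        yield lhs[0], lhs[1:], rhs


def tcp_endpoint(tower: bytes) -> str:
    ip = port = None
    for protocol_id, _, rhs in parse_floors(tower):
        if protocol_id == TCP_PROTOCOL_ID:
            (port,) = struct.unpack(">H", rhs)  # port is big-endian
        elif protocol_id == IP_PROTOCOL_ID:
            ip = str(ipaddress.IPv4Address(rhs))  # 4 bytes, network order
    return f"{ip}:{port}"


def build_floor(protocol_id: int, rhs: bytes) -> bytes:
    # Hypothetical test helper: a floor with a 1-byte left-hand side.
    lhs = bytes([protocol_id])
    return struct.pack("<H", len(lhs)) + lhs + struct.pack("<H", len(rhs)) + rhs


sample = (
    struct.pack("<H", 2)
    + build_floor(TCP_PROTOCOL_ID, struct.pack(">H", 49158))
    + build_floor(IP_PROTOCOL_ID, ipaddress.IPv4Address("10.0.0.102").packed)
)
print(tcp_endpoint(sample))  # 10.0.0.102:49158
```

A real tower has five floors; the sample carries only the two the client needs to connect.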

Now the client application opens a local client port (again, maybe dynamic, maybe static) and binds to the new server application port, this time using security; by default, the original EPM connection did not require special permissions (EPM is a switchboard, remember). Because this is MSRPC between domain controllers, Kerberos and packet privacy are required. The bind phase is where that security gets negotiated.
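The security settings chosen here are visible in each secured PDU's security trailer (the auth verifier). The sketch below decodes its 8-byte header: auth type, auth level, pad length, a reserved byte, and a little-endian context ID. The numeric constants match the Windows SDK values as I know them (`RPC_C_AUTHN_GSS_NEGOTIATE` 9, `RPC_C_AUTHN_WINNT` 10, `RPC_C_AUTHN_GSS_KERBEROS` 16, `RPC_C_AUTHN_LEVEL_PKT_PRIVACY` 6); the sample bytes are made up, and which auth type a real DC capture shows depends on the negotiation.

```python
import struct

AUTH_TYPES = {9: "SPNEGO (Negotiate)", 10: "NTLM", 16: "Kerberos"}
AUTH_LEVELS = {1: "none", 2: "connect", 3: "call", 4: "packet",
               5: "packet integrity", 6: "packet privacy"}


def decode_sec_trailer(trailer: bytes) -> dict:
    # auth_type, auth_level, auth_pad_length, auth_reserved (1 byte each),
    # then a little-endian 32-bit auth_context_id
    auth_type, auth_level, pad_len, _reserved, context_id = struct.unpack("<BBBBI", trailer[:8])
    return {
        "auth_type": AUTH_TYPES.get(auth_type, f"other ({auth_type})"),
        "auth_level": AUTH_LEVELS.get(auth_level, f"other ({auth_level})"),
        "pad_length": pad_len,
        "context_id": context_id,
    }


sample_trailer = bytes([9, 6, 0, 0]) + struct.pack("<I", 0)
info = decode_sec_trailer(sample_trailer)
print(info["auth_type"], "/", info["auth_level"])  # SPNEGO (Negotiate) / packet privacy
```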

The server responds with the (hopefully) successful negotiation, providing details about which security protocols were selected for encrypting the rest of the traffic. The NegState field shows that the negotiation is not yet complete, but things are proceeding as planned.
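NegState comes from SPNEGO (RFC 4178), which defines four values: 0 accept-completed, 1 accept-incomplete, 2 reject, 3 request-mic. If you are ever reading raw values instead of NetMon's decoded text, a trivial lookup might look like this; the `next_step` wording is my own summary, not protocol text.

```python
from enum import IntEnum


class NegState(IntEnum):
    # Values from the SPNEGO NegTokenResp definition in RFC 4178
    ACCEPT_COMPLETED = 0
    ACCEPT_INCOMPLETE = 1
    REJECT = 2
    REQUEST_MIC = 3


def next_step(state: NegState) -> str:
    if state == NegState.ACCEPT_INCOMPLETE:
        return "more tokens needed (for RPC, e.g. an ALTER_CONTEXT exchange)"
    if state == NegState.ACCEPT_COMPLETED:
        return "security context established"
    if state == NegState.REJECT:
        return "authentication failed"
    return "server asks for a mechListMIC before completing"


print(next_step(NegState(1)))
```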

The bind was the negotiation. What follows completes the authentication and encapsulation phase, using an ALTER_CONTEXT operation. If all goes well, the authentication is accepted and RPC application communication proceeds, with some nice secure packet payloads.
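Every PDU in this exchange starts with the same 16-byte connection-oriented header, and its packet type field is how an analyzer labels frames as BIND, BIND_ACK, ALTER_CONTEXT, and so on. The sketch below assumes little-endian data representation (the usual Windows case, flagged by 0x10 in the first data-representation byte). The packet type numbers are the standard DCE/RPC values; the sample header, lengths, and call ID are fabricated.

```python
import struct

PTYPES = {0: "REQUEST", 2: "RESPONSE", 3: "FAULT", 11: "BIND", 12: "BIND_ACK",
          13: "BIND_NAK", 14: "ALTER_CONTEXT", 15: "ALTER_CONTEXT_RESP", 16: "AUTH3"}


def parse_common_header(pdu: bytes) -> dict:
    # version, minor version, packet type, flags: one byte each
    vers, vers_minor, ptype, _flags = struct.unpack_from("<BBBB", pdu, 0)
    drep = pdu[4:8]  # data representation
    if not drep[0] & 0x10:
        raise ValueError("big-endian data representation not handled in this sketch")
    # fragment length, auth length, call ID
    frag_length, auth_length, call_id = struct.unpack_from("<HHI", pdu, 8)
    return {"version": f"{vers}.{vers_minor}", "type": PTYPES.get(ptype, str(ptype)),
            "frag_length": frag_length, "auth_length": auth_length, "call_id": call_id}


# Fabricated ALTER_CONTEXT header: v5.0, type 14, first+last fragment flags (0x03)
header = bytes([5, 0, 14, 0x03, 0x10, 0, 0, 0]) + struct.pack("<HHI", 116, 40, 2)
print(parse_common_header(header))
```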

Everything after this point is application… stuff. RPC connected from a client port to a server port and then communicates along that "channel" for the rest of the conversation. The two halves of the application send each other requests and responses, with stub data used by the application's functions.
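Once the channel is up, each REQUEST PDU carries its opnum right after the 16-byte common header: a 4-byte allocation hint, a 2-byte presentation context ID, then the 2-byte opnum, followed by the stub data. A minimal sketch, again assuming little-endian data representation; the sample PDU is fabricated and carries no stub data.

```python
import struct


def request_opnum(pdu: bytes) -> int:
    if pdu[2] != 0:  # packet type 0 is REQUEST
        raise ValueError("not a REQUEST PDU")
    # After the common header: alloc_hint (4 bytes), context ID (2), opnum (2)
    _alloc_hint, _context_id, opnum = struct.unpack_from("<IHH", pdu, 16)
    return opnum


fake_request = (bytes([5, 0, 0, 0x03, 0x10, 0, 0, 0]) + struct.pack("<HHI", 24, 0, 3)
                + struct.pack("<IHH", 0, 0, 3))
print(request_opnum(fake_request))  # 3 (IDL_DRSGetNCChanges in the DRSR interface)
```

Even when the stub data is encrypted, the opnum sits in the clear header, which is why an analyzer can still name the function.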

Every application is different, but once you know an application's rules, it behaves in a (relatively) predictable fashion. Since this is the well-documented Directory Replication Services application, what happens next is that the DC creates a context handle with a DRSBind call. It then does some work. Let's look at one example of that work by switching the NetMon filter to just DRSR and applying it to our scenario.
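The context handle lifecycle (DRSBind hands out a handle, every later call presents it, DRSUnbind destroys it) can be illustrated with a toy server. Everything here is hypothetical: the class and method names are not the real DRSR API, and real context handles are opaque 20-byte values managed by the RPC runtime.

```python
import uuid


class ToyDrsServer:
    """Hypothetical server mimicking the bind / call / unbind lifecycle."""

    def __init__(self):
        self.handles = set()

    def bind(self) -> uuid.UUID:
        handle = uuid.uuid4()  # opaque token returned to the client
        self.handles.add(handle)
        return handle

    def get_nc_changes(self, handle: uuid.UUID) -> str:
        if handle not in self.handles:
            raise PermissionError("invalid context handle")  # no bind, no work
        return "changes"

    def unbind(self, handle: uuid.UUID) -> None:
        self.handles.discard(handle)


server = ToyDrsServer()
h = server.bind()
print(server.get_nc_changes(h))  # changes
server.unbind(h)
```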

NetMon is politely translating all of these RPC functions into semi-intelligible words, like DRSBind, DRSReplicaSync, and DRSGetNCChanges. When it sees an opnum it understands for a given protocol, it knows which RPC function the client is asking the server to run on the client's behalf.

If you examine one of those packets, you see that the data itself is encrypted (good!). Still, knowing the opnum's purpose and that RPC got this far, you have a decent idea what the application is doing, or you can look it up based on the UUID and opnum, even if your network parsers are terrible. In this case:

http://msdn.microsoft.com/en-us/library/cc228532(v=PROT.13).aspx

Function               Opnum   Explanation
IDL_DRSBind            0       Creates a context handle necessary to call any other method in this interface.
IDL_DRSReplicaSync     2       Triggers replication from another DC.
IDL_DRSGetNCChanges    3       Replicates updates from an NC replica on the server.
IDL_DRSCrackNames      12      Looks up each of a set of objects in the directory and returns it to the caller in the requested format.
IDL_DRSUnbind          1       Destroys a context handle previously created by the IDL_DRSBind method.
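When a parser doesn't know an interface, the same lookup can be done by hand from the UUID and opnum. A minimal sketch containing only the five DRSR methods in the table; it is deliberately incomplete, and for other interfaces you would fill it in from that protocol's documentation.

```python
DRSR_UUID = "e3514235-4b06-11d1-ab04-00c04fc2dcd2"

# (interface UUID, opnum) -> method name, from the DRSR table
OPNUMS = {
    (DRSR_UUID, 0): "IDL_DRSBind",
    (DRSR_UUID, 1): "IDL_DRSUnbind",
    (DRSR_UUID, 2): "IDL_DRSReplicaSync",
    (DRSR_UUID, 3): "IDL_DRSGetNCChanges",
    (DRSR_UUID, 12): "IDL_DRSCrackNames",
}


def method_name(interface_uuid: str, opnum: int) -> str:
    # Normalize case, since tools print UUIDs in upper or lower case
    return OPNUMS.get((interface_uuid.lower(), opnum), f"unknown opnum {opnum}")


print(method_name("E3514235-4B06-11D1-AB04-00C04FC2DCD2", 3))  # IDL_DRSGetNCChanges
```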

Importantly, you now know that RPC and the network appear to be functioning correctly, so any remaining problems are likely inside the application itself. If the application has internal logging, you can use these network captures to correlate each opnum request/response with real work, and perhaps see where things are failing internally. If the application doesn’t have good security, you can see exactly what it's doing, but so can anyone else. That is probably something to bring to the third-party vendor's attention, as it won't be a Microsoft application.

A polite application will tear down the connection with noticeable "unbind" traffic, and perhaps even send a network reset, but many simply abandon the conversation and let Windows deal with it later.

A final note: a domain controller has a great many RPC conversations going with multiple partners, so always make sure you are looking at the same conversation by filtering on IP addresses and ports, as well as on your network analysis tool's conversation ID system. NetMon makes this pretty easy.
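The same idea in code: group frames into conversations by normalizing the IP/port 4-tuple so both directions land in one bucket. A minimal sketch; the `Frame` records are made up (10.0.0.101 and 10.0.0.102 are the two DCs from this walkthrough, while the ports and summaries are hypothetical).

```python
from collections import defaultdict
from typing import NamedTuple


class Frame(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    summary: str


def conversation_key(f: Frame) -> tuple:
    # Sort the two endpoints so A->B and B->A share one key
    return tuple(sorted([(f.src_ip, f.src_port), (f.dst_ip, f.dst_port)]))


frames = [
    Frame("10.0.0.102", 50012, "10.0.0.101", 135, "EPM map request"),
    Frame("10.0.0.101", 135, "10.0.0.102", 50012, "EPM map response"),
    Frame("10.0.0.102", 50013, "10.0.0.101", 49158, "DRSR bind"),
    Frame("10.0.0.101", 49158, "10.0.0.102", 50013, "DRSR bind_ack"),
]

conversations = defaultdict(list)
for frame in frames:
    conversations[conversation_key(frame)].append(frame.summary)

for key, summaries in conversations.items():
    print(key, summaries)
```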

And we're done. See? It’s just a phone call with a nice girl from Microsoft. Don’t be intimidated when she knows more about computers than you do, bub.

Until next time.

Ned "really pedantic chatter" Pyle

  • Ned, yet another outstanding article! I can't really think of a better word than outstanding, so that's what I'm going with.

    You very clearly walk through and illustrate a clear example to help understand what is going on. I couldn't agree more with your point on the two forms of documentation. Thankfully it seems more and more stuff is coming out that does focus on things from an IT perspective.

    I probably picked up more details in my first read through than in many years of working with the protocol.  I'll likely read through it a few more times at a much slower pace in the future to pick up more fine details.

    Again, great work as always!

  • Again great post Ned! Thanks for this great explanation!

  • Thanks guys! These nice comments make it worth all the trouble.

  • In reference to your post "Restrictions for Unauthenticated RPC Clients" (Apr 2011), at what step do the incorrectly configured settings break RPC? My guess is step 1: the server refuses the unauthenticated endpoint mapper request.

    The endpoint mapper "process" reminds me of the old and busted portmap from Unix.

    Thanks for the great information. Funny AND informative: a nice change-up for IT documentation.

  • Great question, LA. I thought about putting this in the main article but I am super paranoid about any appearance of "recommending" it. That security change is right at the EPM bind and response. Step 1, as you thought.

  • Great post, thanks Ned!

  • Fantastic post. I have to agree that RPC has got to be the most enigmatic network protocol around. It reminds me of the Richard Feynman quote... "Anyone who says that they fully understand RPC..." wait, that was something else.  Floors? Towers? Did you mean RPC or RPG?

    But seriously, RPC and DCOM are vital for some of the coolest technologies that we use every day.

    In addition, I really need to start working more with MS Network Monitor. I'm proficient enough with Wireshark, but man it's pretty obvious that Netmon's more Windows centric parsers can't be beat for monitoring traffic between Windows machines.

  • Ned,

    Thank you! A deep dive on this topic, all in one place, is hard to come by. Finally!

  • As always, Ned delivers!

    Thanks!

  • Nice post Ned! Thanks for this very good explanation! :)

  • Solid solid article Ned. These are the Endpoint Mappers we are looking for!!!

  • Ned, the first time I read this article it was Greek to me. The second time I understood part of it. Tomorrow I'm going to read it again. THANKS A LOT.

  • It's funny how the posts people like are never the ones I'd expect. I'd have bet money this one was going to be a dud.

    Thanks to you all for the kind words and confidence boost. :)

  • Very Nice. I was indeed looking for a good reading on RPC. Thanks Ned !

  • I was going to post my own explanation of RPC but this one is OK I suppose ;).

    Awesome post Ned. Thanks for taking the time to keep posting every week!