It’s been a while since I wrote this blog on OCS 2007 media traversal.  I’ve since left Microsoft to join a UC consulting company, but media traversal is still near and dear to me. This blog describes some of the improvements in media traversal that have been implemented in OCS 2007 R2.

Author: Alan Shen

Publication date: April 2009

Product version: Office Communications Server 2007 R2

Some things haven’t changed

The overall architecture of media endpoints using ICE and the STUN/TURN capabilities of the A/V Edge server has not changed. Signaling is still protected by TLS encryption, and media is still protected by SRTP encryption. STUN/TURN allocations against the A/V Edge are still protected by a digest authentication mechanism whose password rotates every eight hours, and obtaining this allocation password is still protected within a TLS-encrypted SIP SERVICE message. That said, a lot of improvements have been made in OCS 2007 R2. Let’s take a look at some of them.

Support of Early Media

In OCS 2007, negotiation of a media path (i.e. ICE connectivity checks) started when the called party answered the call. Specifically, ICE candidates were sent by the caller in the INVITE and by the callee in the 200 OK. This resulted in a slight delay between the called party answering and media actually flowing. (The one exception to this was outbound calls to the PSTN. To support PSTN gateways that started sending audio before the 200 OK, the mediation server would actually send ICE candidates in a 180 RINGING in addition to the 200 OK. This enabled a poor man’s version of early media where one-way audio could be transferred from the mediation server to the calling endpoint before the full ICE negotiation occurred, preventing any initial “Hello?” audio from being clipped.)

OCS 2007 R2 endpoints support early media, a feature which enables negotiation of media before the call is accepted by the called party.  This addresses the audio clipping issue and enables a number of other scenarios such as playing custom ring back tones to the caller.  Practically speaking, this means that ICE must be negotiated before the 200 OK.  What you’ll notice is that the called party will send back ICE candidates in a 183 SESSION PROGRESS message.  Under the covers, this triggers a full ICE negotiation, enabling the media path to be ready the instant the called party actually answers the call.  (Note that the called party still sends candidates in the 200 OK message and a final ICE negotiation still happens, though this rarely results in a switch of the media path.)

If a called user has multiple R2 endpoints registered, each will allocate ICE candidates and negotiate an early media ICE path with the caller. However, as soon as the caller receives an audio packet from one of the dialed endpoints, it will stop listening on the other early media paths. In theory, the media path could switch after the final ICE negotiation occurs with the 200 OK. (For example, let’s say an incoming call is set to simultaneously ring a user’s OC endpoint and his cell phone. The cell phone system generates a custom ring back tone, but the user ends up answering on OC.) However, in the vast majority of cases, the endpoint that sent early media audio packets will be the same endpoint that actually answers the call and sends the 200 OK.
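The selection behavior described above can be sketched as a small state holder on the caller side. This is only an illustration of the rule "first audio packet wins, final answer may still switch"; the class name, method names, and endpoint labels are hypothetical and do not come from the OCS client.

```python
class EarlyMediaSelector:
    """Hypothetical caller-side view of forked early media paths."""

    def __init__(self, endpoints):
        # One early media path per dialed endpoint that sent a 183.
        self.listening = set(endpoints)
        self.selected = None

    def on_audio_packet(self, endpoint):
        """Return True if the packet is accepted, False if it is ignored."""
        if endpoint not in self.listening:
            return False
        if self.selected is None:
            # First audio packet: stop listening on every other path.
            self.selected = endpoint
            self.listening = {endpoint}
        return True

    def on_answer(self, endpoint):
        """Handle the 200 OK; return True if the media path had to switch."""
        switched = endpoint != self.selected
        self.selected = endpoint
        self.listening = {endpoint}
        return switched


selector = EarlyMediaSelector(["oc-desktop", "cell-phone"])
selector.on_audio_packet("cell-phone")    # ring back tone arrives first
selector.on_audio_packet("oc-desktop")    # ignored from now on
print(selector.on_answer("oc-desktop"))   # user answers on OC, path switches
```

The last call models the rare case from the text where the endpoint that answers is not the one that sent early media.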

App Sharing Use of ICE/STUN/TURN

OCS 2007 R2 introduces a new modality called App Sharing, built upon the same RDP protocol used in Terminal Services.  Though functionally similar to the desktop sharing feature in Live Meeting, it functions as a totally separate modality outside of a Live Meeting conference.  For app sharing sessions involving two OC endpoints, the app sharing media stream flows point to point.  For conferences that use app sharing or if a CWA endpoint is involved, the media flows through the new app sharing MCU.  In either case, the same ICE/STUN/TURN mechanism used to negotiate an audio and video path is also used to negotiate an app sharing media path…with one key difference.  Unlike audio and video, the RDP protocol is not designed to be run over an unreliable transport protocol like UDP.  Therefore, the app sharing modality uses ICE/STUN/TURN in a TCP-only mode.  One interesting note is that in this TCP-only mode, TCP candidates are actually supported on the endpoint hosts, enabling a point to point TCP media stream.  For voice and video, only a point to point UDP stream is possible.

Support of ICE version 19

In OCS 2007 R2, all endpoints support ICE version 19. In actuality, OCS 2007 R2 endpoints support both ICE version 19 and the legacy ICE version 6 implemented in OCS 2007. Full treatment of the differences between these two versions is beyond the scope of this blog and probably not something you’ll ever need to know, but let’s look at an SDP fragment from an R2 OC client to get a sense of some of the key differences:

------=_NextPart_000_0149_01C9A22E.BDA43360
Content-Type: application/sdp
Content-Transfer-Encoding: 7bit
Content-Disposition: session; handling=optional; ms-proxy-2007fallback

v=0
o=- 0 0 IN IP4 192.168.5.150
s=session
c=IN IP4 192.168.5.150
b=CT:99980
t=0 0
m=audio 50010 RTP/AVP 114 111 112 115 116 4 8 0 97 13 118 101
k=base64:ROFyvlcWFwsPej5xrWlQj+PFsw9Uyy0OSHoFv62mLTPvXdpnn5XvqcxI556k
a=candidate:Y821qEyRKswvPiFeMBgkQBTTL0vJDm//txizLAGyhKQ 1 o4IBYszjQDYWPTb58I7szQ UDP 0.830 192.168.5.150 50010
a=candidate:Y821qEyRKswvPiFeMBgkQBTTL0vJDm//txizLAGyhKQ 2 o4IBYszjQDYWPTb58I7szQ UDP 0.830 192.168.5.150 50008
a=candidate:VS7Zjeu4CJwh6kMO3xTuwAOhW6gGpoC9NpqEv7S8geA 1 9cJV/DeRmf+hwEws92rRNQ TCP 0.190 64.105.253.213 56653
a=candidate:VS7Zjeu4CJwh6kMO3xTuwAOhW6gGpoC9NpqEv7S8geA 2 9cJV/DeRmf+hwEws92rRNQ TCP 0.190 64.105.253.213 56653
a=candidate:cnsB1P6I85tVDpl/UgjTWRl8rFOYSkXOa8nPvnl2RJU 1 +Mkh11586TV6kN8IpnLVMQ UDP 0.490 64.105.253.213 58140
a=candidate:cnsB1P6I85tVDpl/UgjTWRl8rFOYSkXOa8nPvnl2RJU 2 +Mkh11586TV6kN8IpnLVMQ UDP 0.490 64.105.253.213 55208
a=candidate:/YhjMGvsupfnJrUraPnPUwnSUV3IsMpMLHwZIqW4aQI 1 Fvf+CecTZF6sVN/Svuunrg TCP 0.250 10.0.0.2 50014
a=candidate:/YhjMGvsupfnJrUraPnPUwnSUV3IsMpMLHwZIqW4aQI 2 Fvf+CecTZF6sVN/Svuunrg TCP 0.250 10.0.0.2 50014
a=candidate:VCZf8gadJG6G8Pb3xS7bj/4CVK/P+GeIhuew2tHBy9k 1 DIX0ZzFlrnlzdLGqfqWB0w UDP 0.550 10.0.0.2 50005
a=candidate:VCZf8gadJG6G8Pb3xS7bj/4CVK/P+GeIhuew2tHBy9k 2 DIX0ZzFlrnlzdLGqfqWB0w UDP 0.550 10.0.0.2 50017
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:yEiOl3HA+vbDHvqSmvplV9BGpfg19jSxwjFElAPz|2^31|1:1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:HdnKHORdSJgC/rcYZ1y3uMRbKvybFruyFiD+UkoZ|2^31|1:1
a=maxptime:200
a=rtcp:50008
a=rtpmap:114 x-msrta/16000
a=fmtp:114 bitrate=29000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:115 x-msrta/8000
a=fmtp:115 bitrate=11800
a=rtpmap:116 AAL2-G726-32/8000
a=rtpmap:4 G723/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:97 RED/8000
a=rtpmap:13 CN/8000
a=rtpmap:118 CN/16000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=encryption:required

------=_NextPart_000_0149_01C9A22E.BDA43360
Content-Type: application/sdp
Content-Transfer-Encoding: 7bit
Content-Disposition: session; handling=optional

v=0
o=- 0 0 IN IP4 192.168.5.150
s=session
c=IN IP4 192.168.5.150
b=CT:99980
t=0 0
m=audio 50003 RTP/AVP 114 111 112 115 116 4 8 0 97 13 118 101
k=base64:ROFyvlcWFwsPej5xrWlQj+PFsw9Uyy0OSHoFv62mLTPvXdpnn5XvqcxI556k
a=ice-ufrag:VXim
a=ice-pwd:OKEB+HhXDUoNP4lrx8AH+syY
a=candidate:1 1 UDP 2130706431 192.168.5.150 50003 typ host
a=candidate:1 2 UDP 2130705918 192.168.5.150 50006 typ host
a=candidate:2 1 TCP-PASS 6556159 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:2 2 TCP-PASS 6556158 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:3 1 UDP 16648703 64.105.253.213 54183 typ relay raddr 64.105.253.213 rport 54183
a=candidate:3 2 UDP 16648702 64.105.253.213 51646 typ relay raddr 64.105.253.213 rport 51646
a=candidate:4 1 TCP-ACT 7076863 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:4 2 TCP-ACT 7076350 64.105.253.213 53119 typ relay raddr 64.105.253.213 rport 53119
a=candidate:5 1 TCP-ACT 1684797951 10.0.0.2 50001 typ srflx raddr 192.168.5.150 rport 50001
a=candidate:5 2 TCP-ACT 1684797438 10.0.0.2 50001 typ srflx raddr 192.168.5.150 rport 50001
a=candidate:6 1 UDP 1694234623 10.0.0.2 50011 typ srflx raddr 192.168.5.150 rport 50011
a=candidate:6 2 UDP 1694234110 10.0.0.2 50009 typ srflx raddr 192.168.5.150 rport 50009
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:yEiOl3HA+vbDHvqSmvplV9BGpfg19jSxwjFElAPz|2^31|1:1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:HdnKHORdSJgC/rcYZ1y3uMRbKvybFruyFiD+UkoZ|2^31|1:1
a=maxptime:200
a=rtcp:50006
a=rtpmap:114 x-msrta/16000
a=fmtp:114 bitrate=29000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:115 x-msrta/8000
a=fmtp:115 bitrate=11800
a=rtpmap:116 AAL2-G726-32/8000
a=rtpmap:4 G723/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:97 RED/8000
a=rtpmap:13 CN/8000
a=rtpmap:118 CN/16000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=encryption:required

------=_NextPart_000_0149_01C9A22E.BDA43360--

The first thing you notice is that this contains two complete sets of SDP. The first SDP block contains a version 6 ICE candidate list and the second contains one for version 19. You can see that the “ms-proxy-2007fallback” string identifies which one is the legacy block. This is called a multipart SDP and explains how OCS 2007 R2 endpoints are still able to negotiate media with Exchange 2007 UM and other legacy OCS 2007 endpoints. If the caller is R2, both SDPs are offered and the legacy endpoint responds with an ICE version 6 SDP only. This tells the R2 endpoint to go into legacy mode. If the callee is R2 and the caller is legacy, the offer will contain just a legacy ICE SDP, which indicates to the callee that it should only respond with a legacy ICE SDP. Keep in mind that because app sharing is a new feature of OCS 2007 R2, you will never see any app sharing candidate lists or media offer in a legacy SDP block.
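To make the multipart layout concrete, here is a minimal Python sketch that splits such a body on its boundary and flags the legacy part by looking for “ms-proxy-2007fallback” in the Content-Disposition header. The function name and the returned dictionary shape are my own choices, the sample body is heavily abbreviated from the trace above, and a real parser would take the boundary from the SIP Content-Type header rather than having it passed in.

```python
BOUNDARY = "----=_NextPart_000_0149_01C9A22E.BDA43360"

SAMPLE = """\
------=_NextPart_000_0149_01C9A22E.BDA43360
Content-Type: application/sdp
Content-Disposition: session; handling=optional; ms-proxy-2007fallback

v=0
m=audio 50010 RTP/AVP 0

------=_NextPart_000_0149_01C9A22E.BDA43360
Content-Type: application/sdp
Content-Disposition: session; handling=optional

v=0
m=audio 50003 RTP/AVP 0
a=ice-ufrag:VXim

------=_NextPart_000_0149_01C9A22E.BDA43360--
"""


def split_multipart_sdp(body, boundary):
    """Return one dict per SDP part: {'legacy': bool, 'sdp': str}."""
    body = body.replace("\r\n", "\n")
    delimiter = "--" + boundary          # MIME delimiter is "--" + boundary
    parts = []
    for chunk in body.split(delimiter):
        chunk = chunk.strip("\n")
        if not chunk or chunk == "--":   # preamble or closing marker
            continue
        header_text, _, sdp = chunk.partition("\n\n")
        headers = {}
        for line in header_text.splitlines():
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        disposition = headers.get("content-disposition", "")
        parts.append({
            "legacy": "ms-proxy-2007fallback" in disposition,
            "sdp": sdp.strip(),
        })
    return parts


for part in split_multipart_sdp(SAMPLE, BOUNDARY):
    label = "ICE v6 (legacy)" if part["legacy"] else "ICE v19"
    print(label, "->", part["sdp"].splitlines()[1])
```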

You’ll also notice the version 19 candidate list is shorter and more readable. Rather than encoding a unique username/password per candidate, a common one is used for the entire set of candidates. The type of ICE candidate is also encoded, where HOST is a candidate on the endpoint itself, SRFLX (short for Server Reflexive) is a STUN candidate on the NAT, and RELAY is a candidate on the A/V Edge. You’ll also notice that TCP candidates are denoted as ACT (Active) or PASS (Passive), indicating whether the candidate will initiate or receive connectivity check requests. In OCS 2007, TCP A/V Edge candidates behaved as both active and passive, but TCP NAT candidates were passive only. However, this was not apparent from looking at the candidate list in the SDP. Another difference is the priority encoding. ICE version 6 used a three-digit decimal to encode the priority and required floating point math to compute the combined priority of a candidate pair. In ICE version 19, the priority is now an integer, which makes the computation less intensive.
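As a rough illustration of the version 19 format, the Python sketch below parses one candidate line into its fields and combines two integer priorities into a pair priority. The field order is read off the sample above. The pair formula is the one published in RFC 5245, the final form of the ICE drafts; that the R2 implementation uses exactly this formula is an assumption on my part, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Candidate:
    foundation: str
    component: int          # 1 and 2 in the sample (RTP and RTCP)
    transport: str          # UDP, TCP-ACT or TCP-PASS in the sample
    priority: int
    address: str
    port: int
    kind: str               # host, srflx or relay
    related: Optional[Tuple[str, int]] = None   # raddr/rport, when present


def parse_candidate(line: str) -> Candidate:
    """Parse an 'a=candidate:' line in the version 19 layout."""
    prefix = "a=candidate:"
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError("not a candidate line")
    fields = line[len(prefix):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("unexpected candidate layout")
    related = None
    extra = fields[8:]
    if len(extra) >= 4 and extra[0] == "raddr" and extra[2] == "rport":
        related = (extra[1], int(extra[3]))
    return Candidate(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     fields[4], int(fields[5]), fields[7], related)


def pair_priority(controlling: int, controlled: int) -> int:
    """RFC 5245 pair priority: integer-only, no floating point needed."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


host = parse_candidate(
    "a=candidate:1 1 UDP 2130706431 192.168.5.150 50003 typ host")
relay = parse_candidate(
    "a=candidate:3 1 UDP 16648703 64.105.253.213 54183 "
    "typ relay raddr 64.105.253.213 rport 54183")
print(host.kind, relay.kind, pair_priority(host.priority, relay.priority))
```

Because everything is shifts and additions on integers, sorting a check list by pair priority needs no floating point at all, which is the point the paragraph above makes.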

Again, the details of the SDP differences between the two ICE versions are not terribly important. Just remember the multipart nature of the SDP and how an R2 endpoint negotiates with legacy ICE endpoints.

Differences in A/V Edge 50,000 port range requirement

In OCS 2007, the external side of the A/V Edge server role required ports 50,000-59,999 to be open for UDP and TCP in the inbound and outbound direction. Although this was a secure solution (see my original blog post), network administrators perceived this to be a security threat and were very resistant to deploying the A/V Edge role. To mitigate this deployment hurdle, OCS 2007 R2 reduces the requirement to just allowing ports 50,000-59,999 for TCP outbound only. Moreover, the product documentation now states that this outbound TCP port support is only required to support federation with other OCS 2007 R2 environments. To support remote users only, opening ports UDP 3478 and TCP 443 is sufficient. (This remote-only mode worked in OCS 2007, but was not officially supported.) What changed in the A/V Edge? Well, the A/V Edge now supports federation over a “tunneled” link.

Let’s say an R2 OC endpoint within the Contoso company network calls an R2 OC endpoint within the Litware company network. Both endpoints still advertise allocated ports in the 50,000-59,999 range in their candidate lists. Now let’s say connectivity checks are happening and the Contoso R2 A/V Edge receives a UDP STUN connectivity check destined for the Litware A/V Edge. Instead of sending that to the Litware A/V Edge using a source and destination port in the 50,000-59,999 range, the Contoso A/V Edge actually encapsulates this connectivity check in a new TURN tunnel message and sends it to the Litware A/V Edge using a UDP source and destination port of 3478. Keep in mind that the intended source and destination IP/port numbers are passed within this tunnel packet. When the Litware R2 A/V Edge receives this tunnel packet, it unpacks the message, looks at the intended source/destination IP/port info, and treats the packet as if it came to the destination IP/port from the source IP/port.
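The encapsulation idea can be modeled in a few lines of Python. This is a conceptual sketch only: the actual wire format of the TURN tunnel message is not described here, so the `TunnelMessage` and `Datagram` types, the function names, and the example addresses (taken from the documentation-only IP ranges) are all hypothetical. Only UDP port 3478 and the 50,000-59,999 range come from the text.

```python
from dataclasses import dataclass

TUNNEL_PORT = 3478  # UDP port used for the tunneled link, per the text


@dataclass(frozen=True)
class Datagram:
    src: tuple        # (ip, port) the packet is sent from
    dst: tuple        # (ip, port) the packet is sent to
    payload: object   # raw bytes, or a TunnelMessage


@dataclass(frozen=True)
class TunnelMessage:
    intended_src: tuple   # relay address the check logically comes from
    intended_dst: tuple   # relay address the check is logically for
    inner: bytes          # the original STUN connectivity check


def encapsulate(local_edge_ip, remote_edge_ip, intended_src, intended_dst, check):
    """Wrap a connectivity check so it travels from port 3478 to port 3478."""
    message = TunnelMessage(intended_src, intended_dst, check)
    return Datagram((local_edge_ip, TUNNEL_PORT),
                    (remote_edge_ip, TUNNEL_PORT), message)


def unwrap(datagram):
    """Rebuild the packet as if it had arrived on the intended addresses."""
    message = datagram.payload
    if not isinstance(message, TunnelMessage):
        raise ValueError("not a tunnel message")
    return Datagram(message.intended_src, message.intended_dst, message.inner)


contoso_relay = ("192.0.2.10", 54183)      # hypothetical Contoso allocation
litware_relay = ("198.51.100.20", 51646)   # hypothetical Litware allocation

on_the_wire = encapsulate("192.0.2.10", "198.51.100.20",
                          contoso_relay, litware_relay, b"stun-check")
print(on_the_wire.src, "->", on_the_wire.dst)   # both on port 3478
print(unwrap(on_the_wire))                      # ports in the 50,000 range
```

Only the outer datagram crosses the firewalls, which is why the 50,000-59,999 UDP range no longer has to be open between two R2 deployments.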

The idea is that conveying the knowledge of the intended source and destination IP/port for this connectivity check provides security equivalent to actually sending the connectivity check along that route. This explains why UDP ports in the 50,000-59,999 range are no longer needed. Why is TCP needed in the outbound direction only? It turns out TCP also supports the same tunneling mechanism. However, TCP’s connection-oriented nature means problems can arise if the listening port is used as the source port when opening a TCP connection. So in the connectivity check example used above, the Contoso A/V Edge opens a TCP connection to port 443 on Litware’s A/V Edge, choosing an ephemeral source port in the 50,000-59,999 port range.

Supporting federation with legacy A/V Edge servers

The example above works for two R2 OCS deployments. What would happen if Litware were still on OCS 2007? Again, both OC endpoints will advertise A/V Edge candidates in the 50,000-59,999 port range. In order for connectivity to succeed, Contoso’s R2 A/V Edge must be able to send a connectivity check to Litware’s A/V Edge and vice versa. In the first direction, Contoso’s A/V Edge doesn’t know that Litware’s A/V Edge is still on OCS 2007, so it tries to send the tunneled connectivity check packet, but Litware’s legacy A/V Edge drops these packets. Hearing no response, the Contoso A/V Edge will then flip to direct mode, where it will send the packet using a source and destination port in the 50,000-59,999 port range. Similarly, in the other direction, the Litware A/V Edge has no ability to send a tunneled connectivity check, so it sends directly in the 50,000-59,999 port range as well. The same logic applies to TCP connectivity checks. You can now see why opening the 50,000-59,999 port range for UDP and TCP in the inbound and outbound direction is required to support federation with legacy OCS 2007 A/V Edge deployments.
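The fallback behavior can be sketched as "try tunneled first, then flip to direct". The Python below simulates it with a stand-in for the remote A/V Edge. The class, the function, and especially the number of tunneled attempts before falling back are assumptions for illustration; the real retry count and timers are not covered here.

```python
class RemoteEdge:
    """Stand-in for a federated partner's A/V Edge (hypothetical model)."""

    def __init__(self, supports_tunnel):
        self.supports_tunnel = supports_tunnel

    def deliver(self, mode):
        """Return True if the check is answered, False if it is dropped."""
        if mode == "tunnel":
            # A legacy OCS 2007 edge silently drops tunnel packets.
            return self.supports_tunnel
        # Direct mode assumes the 50,000-59,999 range is open both ways.
        return True


def send_connectivity_check(remote, tunnel_attempts=2):
    """Return (mode_that_worked, list_of_attempts)."""
    attempts = []
    for _ in range(tunnel_attempts):        # attempt count is an assumption
        attempts.append("tunnel")
        if remote.deliver("tunnel"):
            return "tunnel", attempts
    attempts.append("direct")               # no response: flip to direct mode
    if remote.deliver("direct"):
        return "direct", attempts
    return None, attempts


print(send_connectivity_check(RemoteEdge(supports_tunnel=True)))
print(send_connectivity_check(RemoteEdge(supports_tunnel=False)))
```

Against an R2 partner the first tunneled attempt succeeds; against a legacy partner the tunneled attempts go unanswered and the check is re-sent directly.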

Port Range Implications

Supporting two versions of ICE in an INVITE does have implications for the number of ports allocated at the start of a call. In the SDP snippet above, you’ll notice the version 6 ICE candidates are totally different from the version 19 ICE candidates, meaning two full candidate sets are allocated instead of the single set allocated in OCS 2007. Early media could also have an impact on the number of allocated ports if a called user has multiple points of presence. Each called endpoint will allocate a set of candidates and perform a full ICE negotiation prior to the call being answered. The fact that application sharing also uses ICE could further increase port usage.

The majority of these ports are short-lived and will be de-allocated within 10 seconds of the call being answered. The only ports that remain for the duration of a call are the ones actually used to send and receive media. Nonetheless, this increased port usage at the start of a call could be an issue for enterprises that have narrowed the allowed port range of their endpoints or reduced the number of ports in the A/V Edge’s 50,000 port range. For these reasons, the OCS team recommends that the media port range for R2 Office Communicator clients contain at least 40 ports, twice the recommendation provided for OCS 2007.

Conclusion

Although the fundamental architecture of media traversal remains the same in OCS 2007 R2, a number of enhancements have been made. Key impacts include faster negotiation of the media stream through early media ICE negotiations, leveraging ICE/STUN/TURN for new modalities such as application sharing, and easing the port range requirements on the A/V Edge server through a tunneled federation mode. This revised implementation of ICE/STUN/TURN will serve as a great foundation for enabling connectivity of new media scenarios in future versions of the Microsoft Unified Communications product line.

This insight into Office Communications Server 2007 R2 was created as part of Alan Shen’s participation in the Microsoft Certified Master program.

The Microsoft Certified Master Program: The Microsoft Certified Master: Microsoft Office Communications Server 2007 program provides the most in-depth and comprehensive training available today for Office Communications Server 2007. This three-week training program is delivered by recognized experts from Microsoft and Microsoft partner organizations.
