This article is the first part of a two-part series. This part provides you with an overview of how to analyze and troubleshoot problems with audio/video sessions when you use Microsoft Office Communications Server 2007 R2, Microsoft Office Communicator 2007 R2, and Microsoft Office Live Meeting 2007.
Part 2 of Analyzing Connectivity for Office Communications Server 2007 R2 Audio Video sessions introduces you to using the Pre Call Diagnostic Tool to analyze problems with audio video communications.
Author: Mike Adkins
Publication date: October 2010
Product versions: Microsoft Office Communications Server 2007 R2, Microsoft Office Communicator 2007 R2, and Microsoft Office Live Meeting 2007
The Office Communications Server 2007 R2 clients, Office Communicator 2007 R2, and the Live Meeting 2007 client all provide their users with the ability to enjoy audio and video conferencing that is hosted by an Office Communications Server 2007 R2 Audio Video (A/V) server. However, sometimes the clients may provide their users with a degraded A/V experience. Poor A/V quality may be caused by network conditions that impair the delivery of the A/V data streams that are shared between the peers. Understanding some of the basics about how audio and video sessions are managed for the peers can lead to a more straightforward resolution for these types of issues.
This article points out the types of network connectivity that are required to establish the inter-unified communications (UC) client A/V session, how RTP is designed to use adaptive measures to help ensure the quality of A/V playback, and how to use the Communications Server 2007 R2 Resource Kit Pre Call Diagnostic Tool to analyze network connectivity issues that may affect the users of the client A/V experience.
The initiation of an A/V conference by a client begins with a series of SIP requests and responses that provide the exchange of security, media port, and supported A/V codec information that is used by the clients and the Communications Server A/V server during an A/V conference. This initial communication is known as SIP signaling, which requires Transmission Control Protocol/Internet Protocol (TCP/IP) connectivity between the clients, the Communications Server A/V server, and the internal edge of the Communications Server A/V Edge Server.
The SIP signaling procedure that is used to initiate an A/V conference provides the parameters that are needed to secure the communication between the peers that are joined during the conference. These SIP communications require that the Communications Server 2007 R2 A/V server (audio/video multi-party conferencing) and the clients (peer-to-peer audio/video communication) can make the necessary Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) connections to the Media Relay Service (RTCMEDIARELAY) and the Media Relay Authentication Service (MRAS), both of which are hosted on the Communications Server 2007 R2 A/V Edge Server. Figure 1 and Figure 2 show what the SIP signaling information should look like.
Note The following information was taken from the Communicator 2007 R2 client (communicator-uccapi.uccplog) that initiated the A/V session with another internal Communicator 2007 R2 peer. For brevity, only the Session Description Protocol (SDP) information from the SIP SERVICE Request packet is shown.
The SIP SERVICE request for MRAS that is routed to the Communications Server 2007 R2 A/V server is shown in Figure 1.
Figure 1. SIP SERVICE Request for MRAS
The SIP SERVICE Response for MRAS that is routed to the Communications Server 2007 R2 A/V server is shown in Figure 2.
Figure 2. SIP SERVICE Response for MRAS
The information in green in Figure 2 (<mediaRelayList></mediaRelayList>) demonstrates the need for specific TCP and UDP port connectivity between the internal Communicator 2007 R2 clients and the Communications Server 2007 R2 Audio Video/Edge Server.
For more information about these port connectivity requirements, see Firewalls for Office Communications Server 2007 R2.
After the security requirement for the A/V session has been fulfilled, it is the responsibility of the peers to provide each other with their media connectivity address information and their list of available audio codec and video codec information. The Interactive Connectivity Establishment (ICE) client prioritizes the media connection information by using UDP as the preferred candidate transport for the client media connection. Because RTP can use TCP as well as UDP, the ICE client adds TCP as an additional transport option for the client's media connection to the preferred candidates list, but with a lower priority than the UDP transport. Providing the options for both TCP and UDP transports helps to ensure a media connection between the Communications Server 2007 R2 audio and video endpoints on networks where TCP is the only routable transport protocol. The ICE client advertises separate preferred candidate pair connection and codec information for audio and video sessions. This is because of the difference in the technologies that are used by audio and video codecs.
Note The following information was taken from the Communicator 2007 R2 client (communicator-uccapi.uccplog) that initiated the A/V session with another internal Communicator 2007 R2 peer. For brevity, only a limited amount of the SDP information from the SIP INVITE Request packet that creates the A/V session is shown.
The SIP INVITE information, if successful, is acknowledged with a 200 OK response from the Communicator 2007 R2 peer for the A/V session. The responding Communicator 2007 R2 peer provides the requesting peer with the media connectivity information that it prefers to use for its media connection. This allows each client to have knowledge of its peer's preferred media connectivity, as shown in Figure 3 and Figure 4.
Figure 3. SIP Invite Request for audio
Figure 4. SIP INVITE request for video
For more information about using Office Communicator logging to help analyze SIP signaling issues with A/V sessions, see Client Logging in Communicator.
Audio and video communications for the clients require the support of RTP. RTP uses a dynamic feature set that complies with audio and video codec definitions to manage a consistent media stream between all endpoints in an A/V conference on a computer network. This section of the article provides some definitions of the RTP header information that is used to define the parameters for coordinating each audio/video session.
As packets are prepared and sent from the sender, they are labeled with a sequence number so that the receiver can identify the packet order and determine if a packet was lost or received out of order. The numbers are assigned sequentially by the sender, but the starting value is always random.
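Because the 16-bit sequence number starts at a random value and wraps around at 65536, a receiver must compare sequence numbers modulo 2^16. The following is a minimal illustrative sketch (not part of the Communicator implementation) of how a receiver can classify each arriving packet by its sequence number:

```python
# Illustrative sketch: classifying RTP packets by sequence number.
# The 16-bit sequence number wraps at 65536, so the distance between
# two sequence numbers must be computed modulo 2**16.

def seq_delta(prev, cur):
    """Signed distance from prev to cur, modulo 2**16."""
    d = (cur - prev) % 65536
    return d - 65536 if d > 32767 else d

def classify(prev, cur):
    """Describe the arriving packet relative to the previous one."""
    d = seq_delta(prev, cur)
    if d == 1:
        return "in order"
    if d > 1:
        return "%d packet(s) lost" % (d - 1)
    return "out of order or duplicate"
```

For example, `classify(65535, 0)` still reports "in order" because the delta calculation accounts for the wraparound.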
When an audio payload is added to the RTP packet, a time stamp is added to the sample so that the delay between packets can be calculated by the receiver. The sender cannot know how long transit latency will be, or the order in which the receiver will receive the packets. The receiving device uses the time stamp value to build a buffer for consistent replay of the audio stream in the order that it was sent.
The starting time stamp value for RTP packets delivering an audio payload is a randomly generated number. It is incremented by the size of the audio payload sample in each of the subsequent RTP packets for a specific audio session. For more information, see the "RTP Audio" section of this article.
The starting timestamp value for RTP packets delivering a video payload is a randomly generated number. However, the timestamp value that is used to manage the playback of the RTP packet's video payload is parameterized to meet the requirements of the RTP video stream. For more detailed information, see the "RTP Video" section of this article.
The Synchronization Source (SSRC) ID is a randomly generated number that is added as a header to each RTP packet for each independent media session that is generated by a client or a server. It's possible to have multiple media sessions associated with the same client; the RTP packet's SSRC value is used to identify these separate media sessions.
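The fields discussed so far all live in the fixed 12-byte RTP header defined by RFC 3550. The following is a minimal sketch (not production code) of how those fields can be pulled out of a raw packet, which is useful when checking captures by hand:

```python
import struct

# Minimal sketch of parsing the fixed 12-byte RTP header (RFC 3550):
# version, marker, payload type, sequence number, timestamp, and SSRC.

def parse_rtp_header(data):
    if len(data) < 12:
        raise ValueError("truncated RTP header")
    # Network byte order: two flag bytes, 16-bit seq, 32-bit ts, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": b0 >> 6,            # should be 2 for RTP
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

Grouping captured packets by the parsed `ssrc` value is how the separate media sessions mentioned above can be told apart in a trace.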
The real-time control protocol (RTCP) provides per-session management for each audio or video session that uses RTP. RTCP provides all the endpoints that are joined in the media session with information that allows them to take adaptive measures, which corrects the flow of RTP packets to endpoints on the network. This flow-control mechanism is managed through the use of RTCP Sender and Receiver reports. These reports are delivered to all the endpoints that are joined in a media session to help ensure the consistent delivery of the media stream.
The RTCP Sender report provides detailed information about the following: the NTP time stamp and the corresponding RTP time stamp at which the report was generated, the sender's packet count, and the sender's octet count for the media session.
Each time a client or server initiates an audio or video session, a Sender report is sent (on a periodic basis) to all the peers who are receiving streaming media from that client or server.
The RTCP Receiver report includes information that is similar to the Sender report. It contains information such as fractional packet loss, jitter, an NTP time stamp of the sender reports, and the SSRC value that is specific to the media session. This information can be used by the sender to make adjustments to the way the sender shapes and sends packets to the receivers on the network. Each client that has been receiving streaming media from its peers sends a periodic Receiver report to those peers.
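As a sketch of how one of those Receiver report fields is produced, the "fraction lost" value is defined by RFC 3550 as the packets lost in the reporting interval divided by the packets expected, expressed as an 8-bit fixed-point number (this is the standard RTCP calculation, not code from Communicator itself):

```python
# Sketch of the RFC 3550 "fraction lost" field of an RTCP Receiver
# report: packets lost since the last report over packets expected,
# scaled to an 8-bit fixed-point fraction (0..255).

def fraction_lost(expected_interval, received_interval):
    if expected_interval == 0:
        return 0
    lost = expected_interval - received_interval
    if lost <= 0:
        # Duplicates can make received > expected; report zero loss.
        return 0
    return (lost << 8) // expected_interval
```

For example, losing 5 of 100 expected packets yields a reported value of 12, which the sender interprets as roughly 5% loss (12/256).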
Figure 5 shows the content of an Ethernet frame that contains the RTP information that defines the video portion of the A/V session. Analysis of this RTP packet provides us with detailed information about how our video session is being managed and how RTP helps provide a consistent delivery of the RTP video packets.
Figure 5. Ethernet frame containing RTP information that defines the video portion of the A/V session
There are a few substantial differences between the encoding and decoding of the audio and video streams, which are defined by the design of the individual codecs that are used. These differences are reflected in the RTP traffic that is used for each stream. The same RTP headers are used by both audio streams and video streams; however, an RTP video stream manages its time-stamp process differently than an RTP audio stream. An RTP time-stamp frequency of 90,000 Hz is typically used with video codecs.
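With a 90,000 Hz video clock, the timestamp increment between video frames is simply the clock rate divided by the frame rate. A small illustrative calculation (the frame rates are examples, not values taken from a specific capture):

```python
# Illustrative sketch: timestamp increment per video frame at the
# typical 90,000 Hz RTP video clock rate.

def video_ts_increment(clock_rate_hz, frames_per_second):
    return clock_rate_hz // frames_per_second

# At 30 frames per second, each frame advances the timestamp by 3000;
# at 25 frames per second, by 3600.
```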
Figure 6 shows the contents of an Ethernet frame that contains the RTP information that defines the audio session that is described in the "SIP Invite Request for Audio" section of this article. The analysis of this RTP packet provides us with more information about how our audio session is being handled. The information should further clarify how RTP provides a consistent delivery of audio packets to the peers that are involved in the A/V session.
Figure 6. Ethernet frame that contains RTP information that defines the audio session
There are a variety of audio codecs on the market that are designed to be used with specific audio applications. As noted in the "SIP Invite Request for Audio" section of this article, we can see that the Windows client operating systems, which host the client, have a list of approximately eight audio codecs. This variety of audio codecs helps ensure that the Windows client operating systems can participate in audio conferences that are hosted by different audio-enabled clients and servers.
Audio codecs use one of two methods for determining the interval for the RTP time stamp-sample or frame. The following example describes the use of the RT Audio codec, which uses the sample method. The audio codec used in this example functions at a sample rate of 16 kHz, and the playback duration (ptime) of one packet is 20 ms. This particular correlation increments the time stamp value by 320.
The following shows a simple way to calculate the time-stamp value and the frames per second for RTP traffic that has a ptime value of 20 ms.
Hz = 16000
R = ptime (20 ms)
Y = packet size
X = packets per second

Y = (Hz * R) or (16000 * .02) = 320 bits or 40 bytes
X = (Hz / Y) or (16000 / 320) = 50 packets per second
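The same arithmetic can be expressed as a small sketch; the function names here are illustrative, not part of any Communications Server API:

```python
# Sketch of the ptime arithmetic above: a 16 kHz audio codec with a
# 20 ms ptime increments the RTP timestamp by 320 units per packet
# and sends 50 packets per second.

def timestamp_increment(sample_rate_hz, ptime_seconds):
    # Y = Hz * R
    return round(sample_rate_hz * ptime_seconds)

def packets_per_second(sample_rate_hz, increment):
    # X = Hz / Y
    return sample_rate_hz // increment
```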
Now let's dig deeper into the RTP information that we have available through our network capture as we saw in Figure 6.
Frame: Number = 1294
Rtp: PayloadType = Audio, Codec: RT Audio, ClockRate: 16000, P-Times: 20,40,60, Channels: 1, SSRC = 1502982158, Seq = 25779, TimeStamp = 2803428209
Locating the next RTP packet by its sequence number allows us to determine the difference between each RTP packet's time-stamp values. This allows us to know the ptime that is currently being used for our audio session.
Frame: Number = 1296
Udp: SrcPort = 13561, DstPort = 17707, Length = 102
Rtp: PayloadType = Audio, Codec: RT Audio, ClockRate: 16000, P-Times: 20,40,60, Channels: 1, SSRC = 1502982158, Seq = 25780, TimeStamp = 2803428529
Taking the difference of the time stamps from the two packets, 2803428529 - 2803428209 = 320, means that we have a ptime of 20 ms.
X = ptime
X = 320 / 16000
X = .02 (20 milliseconds)
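This reverse calculation — deriving the ptime in use from two consecutive captured timestamps — can be sketched as follows, using the values from Frames 1294 and 1296 above (the function name is illustrative):

```python
# Sketch: derive the ptime in use from the timestamps of two
# consecutive RTP packets and the codec clock rate.

def ptime_ms(ts_earlier, ts_later, clock_rate_hz):
    # 32-bit RTP timestamps wrap, so compute the delta modulo 2**32.
    delta = (ts_later - ts_earlier) % (2 ** 32)
    return delta * 1000 // clock_rate_hz
```

Applying it to the captured values, `ptime_ms(2803428209, 2803428529, 16000)` yields 20, confirming the 20 ms ptime.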
The PayloadType field in Frame 1296 shows that we have P-Times 20,40,60 available for dynamic use with the RT Audio codec. The use of multiple ptime values throughout an audio session provides RTP with the flexibility to adjust the payload of the audio RTP packets to help ensure consistency in buffered playback.
Jitter is a variation in packet transit delay. The typical causes of jitter are queuing, contention, and serialization effects on the path through a network. Higher bandwidth networks tend to have less jitter; slower networks tend to have more congestion and therefore more jitter.
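RFC 3550 defines a standard way to track jitter: for each packet, the difference in transit time relative to the previous packet is folded into a running estimate that is smoothed by a factor of 1/16. The following is a sketch of that standard estimator (it is the RFC 3550 formula, not code taken from Communicator):

```python
# Sketch of the RFC 3550 interarrival jitter estimator:
# J = J + (|D| - J) / 16, where D is the difference in relative
# transit time between consecutive packets (in timestamp units).

def update_jitter(jitter, prev_transit, transit):
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0
```

The 1/16 smoothing factor keeps the estimate stable, so a single delayed packet nudges the reported jitter rather than spiking it.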
The advanced time-warping jitter buffer dynamically adjusts the audio play-out speed to optimize both quality and latency under network jitter as a function of the actual jitter conditions. The dynamic capabilities of the time-warping jitter buffer minimize the buffering impact on latency in low jitter conditions, and then smoothly transition to and from high jitter conditions. This is done by varying the buffer length and the playing speed in a manner that is barely noticeable to the listener.
To compensate for delays in RTP packet delivery and to ensure a smooth reconstruction of the audio stream, a legacy jitter buffer is constructed. Its size is calculated from the average packet delivery delay. The difference between the time-stamp values of two consecutive frames represents the size of the jitter buffer, as follows.
Buffer Size = Frame Y timestamp - Frame X timestamp
Our previous packet capture example gave two time stamps that resulted in a ptime value: 2803428529 - 2803428209 = 320 or 20 ms.
Unfortunately, legacy jitter buffers introduce an incremental delay, which can negatively impact the audio playback experience. Legacy jitter buffers typically contain about 20 to 40 ms of voice. Values of jitter in excess of the buffer length result in packets being discarded.
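The discard behavior described above can be sketched as follows. This is an illustrative model of a fixed legacy buffer, not the actual Communicator implementation:

```python
# Illustrative model of a fixed (legacy) jitter buffer: packets whose
# delivery delay exceeds the buffer length cannot be played in time
# and are discarded.

def filter_late_packets(delays_ms, buffer_ms=40):
    """Split packet delays into (played, discarded) for a fixed buffer."""
    played = [d for d in delays_ms if d <= buffer_ms]
    discarded = [d for d in delays_ms if d > buffer_ms]
    return played, discarded
```

For example, with the typical 40 ms buffer, a packet delayed by 60 ms is dropped even though it eventually arrives, which is why sustained jitter above the buffer length degrades audio quality.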
Note The jitter buffer is a separate function that is introduced by the audio codec, which is designed to process an RTP audio stream. The jitter buffer itself is not defined by RTP.
For additional information about analyzing RTP traffic on your network and about tools to help solve issues that cause poor A/V performance, see Troubleshooting Network-Related Voice Quality Issues.
Successfully troubleshooting A/V communications between peers is based on an understanding of the network connectivity requirements, the SIP signaling procedures, and the RTP behavior described in this article.
This article is intended to provide you with an understanding of how to use the tools that Microsoft provides to identify some of the causes of A/V communication failures that take place on an Office Communications Server 2007 R2 network.
To learn more, check out the following information:
Note: To review communicator-uccapi.uccplog on the local Windows client, install the Office Communications Server 2007 R2 Resource Kit by using the instructions in the Snooper Tool article.
We Want to Hear from You
To give us feedback about this article or to propose a topic for an article, e-mail us at NextHop@microsoft.com.
You can also send us a tweet at http://www.twitter.com/DrRez.
The above analysis is done automatically when you use the Pre Call Diagnostic Tool.