Microsoft Lync Server 2010 communications software adds VGA conferencing to a set of video capabilities that includes peer-to-peer video with a maximum resolution of high definition (HD) 720p and conference video supported by the Polycom CX5000 USB conferencing device. This article describes the video platform in Lync Server 2010 for peer-to-peer and VGA video conferencing. (Note that interoperability scenarios with Polycom and other partners are supported in Microsoft Lync 2010 but out of scope for this article. Check the NextHop blog again soon for an article about Lync 2010 video interoperability.)

Author: Andrew Sniderman

Publication date: June 2011

Product version: Lync Server 2010

Video Platform

Each previous server version (Microsoft Office Communications Server 2007 R2, Microsoft Office Communications Server 2007, and Microsoft Office Live Communications Server) has introduced new video capabilities for peer-to-peer calls. Office Live Communications Server supports a single video codec (H.263) and low-resolution Common Intermediate Format (CIF) (352x288) video. Office Communications Server 2007 adds support for a second video codec, RTVideo, and VGA resolution (640x480) for peer-to-peer calls. Office Communications Server 2007 R2 introduces support for HD resolution (720p: 1280x720). And now, Microsoft Lync Server 2010 gives you VGA for conferences. VGA conferencing adds VGA resolution for multiparty conferences that are hosted on the A/V Conferencing Server to the CIF resolution that was available in previous server versions.

To support multiple resolutions, a discovery mechanism negotiates the highest video resolution. This requires each client to publish its video capabilities when a conference begins. You can see this information and negotiation in the client Session Description Protocol (SDP) traffic that is captured in local Unified Communications Client Platform (UCCP) logs (for example, Communicator-uccapi-1.uccapilog). If logging is enabled on your client, you’ll find these logs in the %userprofile%\tracing directory and can view them by using the Snooper tool. Look for the x-caps line that shows the resolution and frame rate a client is capable of.

Each x-caps line shows a media type, a unique identifier, the resolution (represented by width and height), the frame rate, and the bitrate in the following syntax:

a=x-caps: <media> <ID>:<width>:<height>:<framerate>:<bitrate>

For example:

a=x-caps:121
263:1280:720:30.0:1500000:1;
4359:640:480:30.0:600000:1;
8455:352:288:15.0:250000:1;
12551:176:144:15.0:180000:1

Note. I added line breaks to improve readability. This entry is all on one line in the SDP.

The first line in this example shows the video capabilities for RTVideo (numeric media type 121 or x-rtvc1). The second line shows the characteristics of HD video (720p) with a resolution of 1280x720, at 30 frames per second (fps) and a bitrate of 1.5 megabits per second (Mbps) maximum. Subsequent lines show lower resolutions and bitrates (and, for CIF and QCIF, a lower frame rate of 15 fps) for VGA, CIF, and quarter CIF (QCIF), respectively.

For more details about the x-caps syntax, see [MS-SDPEXT]: Session Description Protocol (SDP) Version 2.0 Extensions at the MSDN Library.
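As a rough illustration of the syntax above, the following Python sketch splits an x-caps line into its fields. The function and class names are hypothetical, and the sketch assumes that a single space separates the media type from the capability list and that entries are separated by semicolons, as in the example. The article does not describe the trailing field (the final ":1" in each entry), so the sketch keeps it uninterpreted.

```python
from dataclasses import dataclass, field


@dataclass
class VideoCapability:
    capability_id: int
    width: int
    height: int
    frame_rate: float
    bitrate: int                                # bits per second
    extra: list = field(default_factory=list)   # trailing fields not described in the article


def parse_x_caps(line):
    """Split an 'a=x-caps:' SDP attribute into (media_type, capabilities)."""
    prefix = "a=x-caps:"
    if not line.startswith(prefix):
        raise ValueError("not an x-caps line")
    media, _, rest = line[len(prefix):].strip().partition(" ")
    capabilities = []
    for entry in rest.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        fields = entry.split(":")
        cap_id, width, height, fps, bitrate = fields[:5]
        capabilities.append(VideoCapability(int(cap_id), int(width), int(height),
                                            float(fps), int(bitrate), fields[5:]))
    return int(media), capabilities


# The example from this article, joined back onto one line.
EXAMPLE = ("a=x-caps:121 263:1280:720:30.0:1500000:1;"
           "4359:640:480:30.0:600000:1;"
           "8455:352:288:15.0:250000:1;"
           "12551:176:144:15.0:180000:1")

media_type, capabilities = parse_x_caps(EXAMPLE)
for cap in capabilities:
    print(f"media {media_type}: {cap.width}x{cap.height} "
          f"at {cap.frame_rate} fps, {cap.bitrate} bps")
```

For the authoritative grammar, including the meaning of the trailing field, rely on [MS-SDPEXT] rather than on this sketch.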

The capabilities each client publishes in the SDP are governed primarily by processor power. Video encoding and decoding are processor intensive: two processor cores are required for VGA at the full frame rate and four are required for HD at the full frame rate. Table 1 describes the video capabilities you will see reflected in the x-caps lines based on client version and processor core counts.

Table 1. Video capabilities published in the SDP

Client                  One Core        Two Cores             Four Cores
Lync 2010               CIF15, VGA15    CIF15, VGA30, HD15    CIF15, VGA30, HD30
Communicator 2007 R2    CIF15, VGA13    CIF15, VGA30, HD13    CIF15, VGA30, HD25
Communicator 2007       None            None                  None

Note. As shown in Table 1, Microsoft Office Communicator 2007 doesn’t publish an x-caps line; Lync Server 2010 assumes Office Communicator 2007 supports CIF at 15 fps and VGA at 15 fps.
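To make Table 1 and the note above easier to apply, here is a small Python lookup sketch. The table values are transcribed from this article; the function name is hypothetical, and the rule for core counts that fall between table columns (for example, three cores treated as two) is an assumption, not documented behavior.

```python
# Table 1 transcribed: client -> core count -> capabilities published in x-caps.
PUBLISHED = {
    "Lync 2010": {
        1: ["CIF15", "VGA15"],
        2: ["CIF15", "VGA30", "HD15"],
        4: ["CIF15", "VGA30", "HD30"],
    },
    "Communicator 2007 R2": {
        1: ["CIF15", "VGA13"],
        2: ["CIF15", "VGA30", "HD13"],
        4: ["CIF15", "VGA30", "HD25"],
    },
    "Communicator 2007": {1: [], 2: [], 4: []},  # publishes no x-caps line
}

# What Lync Server 2010 assumes when a client publishes nothing (per the note).
ASSUMED_WITHOUT_X_CAPS = ["CIF15", "VGA15"]


def effective_capabilities(client, cores):
    """Return the capabilities the server would work with for this client."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    columns = PUBLISHED[client]
    # Assumption: use the largest table column that does not exceed the core count.
    column = max(c for c in columns if c <= cores)
    return columns[column] or ASSUMED_WITHOUT_X_CAPS


print(effective_capabilities("Lync 2010", 4))
print(effective_capabilities("Communicator 2007 R2", 1))
print(effective_capabilities("Communicator 2007", 2))
```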

In Microsoft Lync 2010, the frame rate for HD is increased to 30 fps from 25 fps in Microsoft Office Communicator 2007 R2. A single core computer can decode VGA at a reduced frame rate—this is necessary to support single core clients viewing a VGA conference. We’ll cover this in more detail later, in the section “Conference Video.”

There are a number of other factors at play—just because your client is capable of using HD, doesn’t mean your users will see HD video. Factors affecting this negotiation include the following:

  • HD-capable camera: The user’s camera must be HD-capable.
  • Size of the video screen: The video stack won’t waste bits by sending HD video unless the user is viewing video in full-screen mode or has resized the popped-out video window.
  • Type of call: Only peer-to-peer calls support HD; conferences don’t.
  • Policy: The default maximum video resolution is VGA. You can increase this to HD by using the Windows PowerShell cmdlet Set-CsMediaConfiguration.
  • Call admission control (CAC): CAC can restrict how much bandwidth is available for video on a per session or per link basis. Unless your CAC policy permits sufficient bandwidth for HD, Lync Server clients are forced to use VGA resolution or lower.
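The factors listed above can be read as a series of caps on the resolution. The Python sketch below is one way to picture how they combine. It is not the actual Lync negotiation logic: the function name and parameters are hypothetical, the bitrates are the maximums from the x-caps example in this article, and falling back to CIF when the CAC limit is below the CIF bitrate is a simplification.

```python
RESOLUTIONS = ["CIF", "VGA", "HD"]  # ordered from lowest to highest

# Maximum bitrates (bits per second) taken from the x-caps example in this article.
BITRATE = {"CIF": 250_000, "VGA": 600_000, "HD": 1_500_000}


def highest_resolution(client_max, camera_hd, large_window, peer_to_peer,
                       policy_max="VGA", cac_limit_bps=None):
    """Return the highest resolution allowed once every factor is applied."""
    caps = [client_max, policy_max]
    if not (camera_hd and large_window and peer_to_peer):
        caps.append("VGA")  # any one of these factors rules out HD
    if cac_limit_bps is not None:
        affordable = [r for r in RESOLUTIONS if BITRATE[r] <= cac_limit_bps]
        # Simplification: assume CIF even if the limit is below the CIF bitrate.
        caps.append(affordable[-1] if affordable else "CIF")
    return min(caps, key=RESOLUTIONS.index)


# HD-capable client and camera, full screen, peer-to-peer, default policy (VGA).
print(highest_resolution("HD", True, True, True))
# Same call after the policy maximum is raised to HD.
print(highest_resolution("HD", True, True, True, policy_max="HD"))
# Policy allows HD, but CAC permits only 700 kbps for video.
print(highest_resolution("HD", True, True, True, policy_max="HD",
                         cac_limit_bps=700_000))
```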

Video Fundamentals

Streaming and real-time IP video encodes discrete images as frames. Frames are sent in rapid succession (for example, in HD, 30 frames are sent per second) and decoded by the receiving end so that the user sees continuous video. Different types of frames optimize the number of bits required to encode and send subsequent images. Lync Server uses the following common frame types:

  • I-frames: Each frame is a complete rendering of an image and can be decoded individually. I-frames are typically composed of many packets. For example, a VGA I-frame could be 10 packets or more. I-frames are sometimes referred to as key frames.
  • B-frames: Each bidirectional frame references or depends on both the prior and subsequent frames for decoding.
  • P-frames: Each predictive frame references the prior frame.

Microsoft Office Communications Server also uses a special frame type that is an optimization of the VC-1 codec, which RTVideo is based on. This is called a super-P (SP)-frame. SP-frames are sent every second. This allows for better recovery in environments with high packet loss.

Frames are grouped together in a structure called a group of pictures (GOP). Each GOP begins with an I-frame. After that, the GOP is different for Lync Server and Office Communications Server.

Communications Server uses a 10-second GOP. Figure 1 is an example of a Communications Server conference GOP.

Figure 1. Communications Server GOP

 

Figure 1 shows RTVideo CIF at 15 fps, so you see an SP-frame every second, or every 15 frames. B-frames and P-frames follow I-frames and SP-frames in a BPB pattern every second.

In Lync Server, the SP-frame is removed, so RTVideo has minimal differences from the VC-1 standard it’s based on. The GOP has been reduced to four seconds for peer-to-peer calls and three seconds for conferences. Figure 2 is an example of a Lync Server conference GOP.

Note. For details about VC-1, see VC-1 Technical Overview at the Windows Media website.

Figure 2. Lync Server GOP

 

In this example, I’m assuming RTVideo VGA at 30 fps, so you see an I-frame every three seconds or every 90 frames. B-frames and P-frames follow I-frames in a BPBP pattern.

Note. Forward error correction (FEC) packets are included to protect key frames from packet loss.
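The two GOP layouts described in this section can be approximated with a short Python sketch. The function name is hypothetical, and the strict B, P alternation after each I-frame or SP-frame is an assumption based on the "BPB" and "BPBP" patterns mentioned in the text; the exact sequences are in Figures 1 and 2.

```python
def build_gop(fps, gop_seconds, sp_interval_seconds=None):
    """Return the frame types of one GOP as a list of strings."""
    frames = []
    since_anchor = 0  # frames since the last I-frame or SP-frame
    sp_every = fps * sp_interval_seconds if sp_interval_seconds else None
    for n in range(fps * gop_seconds):
        if n == 0:
            frames.append("I")       # every GOP begins with an I-frame
            since_anchor = 0
        elif sp_every and n % sp_every == 0:
            frames.append("SP")      # Communications Server only
            since_anchor = 0
        else:
            since_anchor += 1
            # Assumed alternation: B, P, B, P, ... after each anchor frame.
            frames.append("B" if since_anchor % 2 else "P")
    return frames


# Communications Server conference: CIF at 15 fps, 10-second GOP, SP-frame every second.
ocs_gop = build_gop(fps=15, gop_seconds=10, sp_interval_seconds=1)
# Lync Server conference: VGA at 30 fps, 3-second GOP, no SP-frames.
lync_gop = build_gop(fps=30, gop_seconds=3)

print(len(ocs_gop), ocs_gop.count("SP"), " ".join(ocs_gop[:17]))
print(len(lync_gop), lync_gop.count("I"), " ".join(lync_gop[:8]))
```

With these parameters the Lync Server GOP holds 90 frames with a single I-frame, which matches the "every three seconds or every 90 frames" figure above.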

Conference Video

Now that you understand how Lync Server performs video, let’s take a closer look at how this applies to conferences. For peer-to-peer video, the endpoints negotiate the correct video codec and resolution over the signaling channel. When a user changes the size of the video window, the video resolution is changed in the call signaling (by using the real-time transport control protocol, or RTCP).

Note. For details about RTCP, see [MS-RTP]: Real-time Transport Protocol (RTP) Extensions at the MSDN Library.

Conferences complicate things—we now have a larger group of participants to negotiate conference video resolution and, potentially, a need to accommodate clients that can’t negotiate VGA.

Conferences start at CIF resolution. Requests to increase to VGA are initiated by users switching to full-screen mode or increasing the size of their video window to align with VGA (640x480) resolution. These requests are sent to the A/V Conferencing Server over RTCP. The Lync Server Audio/Video Conferencing service aggregates these requests and adjusts conference video resolution accordingly. A single stream is sent to all conference participants—either CIF or VGA.

If a participant is using a single-core computer that can’t properly decode VGA, the Lync Server Audio/Video Conferencing service uses a technique called rate matching to reduce the frame rate. As shown in Table 1, this will be either VGA at 13 fps or 15 fps, depending on the client version.

Rate matching doesn’t send B-frames to receiving endpoints that require a lower frame rate. Figure 3 shows the same example from Figure 2 with rate matching (without B-frames).

Figure 3. Rate matching

Because I-frames and P-frames are not dependent on B-frames for decoding, rate matching doesn’t introduce artifacts into the video stream.
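A minimal Python sketch of the idea, assuming a 3-second VGA GOP at 30 fps in which B-frames and P-frames strictly alternate after the I-frame (an assumption; the real sequence is in Figure 2). The function name is hypothetical.

```python
def rate_match(frames):
    """Drop B-frames; the remaining I-frames and P-frames still decode."""
    return [frame for frame in frames if frame != "B"]


fps, gop_seconds = 30, 3
# One GOP: an I-frame followed by alternating B-frames and P-frames (assumed pattern).
gop = ["I"] + ["B" if i % 2 == 0 else "P" for i in range(fps * gop_seconds - 1)]

reduced = rate_match(gop)
print(len(gop) / gop_seconds, len(reduced) / gop_seconds)  # prints "30.0 15.0"
```

Under this assumed pattern, dropping the B-frames halves the frame rate to 15 fps, which lines up with the VGA15 entry for single-core Lync 2010 clients in Table 1. The sketch does not account for the 13 fps figure listed for Communicator 2007 R2.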

To let a user know that he or she might be seeing reduced video quality due to rate matching, the Lync Server Audio/Video Conferencing service signals to the client by using an ms-diagnostic code sent over RTCP. This is one of many ms-diagnostic codes Lync Server sends to clients. These ms-diagnostic codes appear on the client as quality notifications in the UI. For more details, see [MS-RTP]: Real-time Transport Protocol (RTP) Extensions at the MSDN Library.

Summary

VGA conferencing leverages high-fidelity video to bring more value to conferences. To create this more seamless user experience, some behind-the-scenes changes were required. I hope this article gives you an understanding of these changes and some information about what you see in a Lync Server network trace and the user experience with peer-to-peer and conference video in Lync.

Additional Information

To learn more, check out the following articles:

●  VC-1 Technical Overview

Note. VC-1 is the standard codec that RTVideo is based on.

●  [MS-RTP]: Real-time Transport Protocol (RTP) Extensions

●  [MS-SDPEXT]: Session Description Protocol (SDP) Version 2.0 Extensions


Keywords: VGA, Lync, video, peer-to-peer video, conference, conferencing, resolution, Snooper, CAC