As an IT Admin, how do you know when the end user experience will start to suffer, and which Performance Monitor counters should you monitor to ensure your users continue to have a quality experience? And how can you predict degradation of the user experience proactively?
Author: Stu Osborn
Publication date: July 2008
Product version: Office Communications Server 2007 R2
My colleague Pauline already has an excellent UC blog on this subject. Great stuff! She concentrates on the Front end server role and its interaction with the pool’s SQL Back end server. But there are hundreds of separate Performance Monitor counters for Office Communications Server 2007, and most deployments include several other server roles besides Front end and Back end. Current guidance on this subject from the product team includes administration guides, deployment guides, planning guides, technical reference guides and the like. So what am I offering here that is new?
Well, this blog has new information about how to determine server health. In addition to listing the Perfmon counters recommended by the product team, I identify thresholds so you can see when health is degrading and when to take action. I also recommend a three-pronged approach to this task: “polling”, “monitoring” and taking “remedial actions”.
Below are the recommended perf counters, with thresholds that should trigger action on the part of an Administrator. The resource utilization, user load and server health counters below are directly applicable to IM/Presence functionality. As an IT Admin, you will first need to run resource utilization and user load baseline tests during medium load to determine what is “normal” for your deployment. Once you have your baseline numbers, you can add the health monitoring counters to your overall monitoring scheme and go from there.
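One way to turn a baseline run into numbers you can compare against later is sketched below in Python. It assumes you have already exported samples of a single counter into a plain list. The function names, the sample values and the "mean plus three standard deviations" rule are illustrative assumptions, not product team guidance.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize counter samples collected during a medium-load baseline run."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_abnormal(value, baseline, k=3.0):
    """Flag a reading more than k standard deviations above the baseline mean.

    The k-sigma rule is an assumption chosen for illustration only.
    """
    return value > baseline["mean"] + k * baseline["stdev"]


if __name__ == "__main__":
    # Made-up % Processor Time samples, for illustration only.
    cpu_samples = [10, 12, 11, 13, 14]
    baseline = build_baseline(cpu_samples)
    print("baseline:", baseline)
    print("13 abnormal?", is_abnormal(13, baseline))
    print("40 abnormal?", is_abnormal(40, baseline))
```

Whatever rule you choose, the point is the same: "normal" comes from your own deployment's numbers rather than from a fixed figure.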
Recommended baseline counters to test and monitor resource utilization:
Processor; % Processor Time (_Total) [should operate at less than 80% during peak load]
Process; % Processor Time (RtcSrv)
Process; % Processor Time (IMMcuSvc)
Memory; Pages/sec ---
Network Interface; Bytes Total/sec ([your NIC]) [should operate at less than 80% capacity of the NIC]
(No baseline rules for individual process or memory utilization)
Pages/sec - indicates total “pressure” on the server’s available memory
Network Interface example: a 100 Mbit/sec NIC should be < 80% x 12.5 Mbytes/sec, or roughly < 10 Mbytes/sec
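The NIC arithmetic above generalizes to any link speed. A minimal Python sketch follows, assuming the 80% ceiling from the list above. The function names are hypothetical, and the link speed is taken in decimal megabits per second.

```python
def nic_threshold_bytes_per_sec(link_speed_mbit, max_utilization=0.80):
    """Return the Bytes Total/sec value that equals max_utilization of the link."""
    link_bytes_per_sec = link_speed_mbit * 1_000_000 / 8  # bits/sec to bytes/sec
    return link_bytes_per_sec * max_utilization


def nic_over_threshold(bytes_total_per_sec, link_speed_mbit, max_utilization=0.80):
    """True when the observed Bytes Total/sec reaches or exceeds the ceiling."""
    return bytes_total_per_sec >= nic_threshold_bytes_per_sec(
        link_speed_mbit, max_utilization
    )


if __name__ == "__main__":
    # 100 Mbit/sec NIC: 12.5 Mbytes/sec raw, so the 80% ceiling is about 10 Mbytes/sec.
    print(nic_threshold_bytes_per_sec(100))
    print(nic_over_threshold(11_000_000, 100))
```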
Recommended baseline counters to test and monitor user load:
LC:SIP – 01 - Peers; SIP - 028 - Incoming Requests/sec (_Total)
LC:SIP - 01 – Peers; SIP – 001 – TLS Connections Active (_Total)
LC:SIP – 01 – Peers; SIP – 000 – Connections Active (_Total) [should be less than 15,000 connections per Front end]
LC:SIP – 02 – Protocol; SIP - 001 - Incoming Messages/sec ----
LC:ImMcu – 00 - IMMcuSvc Conferences; IMMCU – 000 - Active Conferences ----
LC:ImMcu – 00 - IMMcuSvc Conferences; IMMCU – 001 – Connected Users ----
LC:USrv – 00 – DBStore; Usrv – 002 – Queue Latency (msec) [healthy is less than 100 msec] (server health decreases as latency increases to 12 sec when server throttling begins)
LC:USrv – 00 – DBStore; Usrv – 004 – Sproc (Stored Procedure) Latency (msec) [healthy is less than 100 msec] (server health decreases as latency increases to 12 sec when server throttling begins)
Queue Latency = the time a request spent in the queue to the Back end server
Sproc Latency = the time it took the Back end server to process the request
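The user load thresholds above can be expressed as a small check. This Python sketch is only an illustration: the three band names ("healthy", "degrading", "throttling") and the function names are my own labels for the ranges described above, and the readings are assumed to arrive as plain numbers.

```python
HEALTHY_LATENCY_MS = 100               # healthy is less than 100 msec
THROTTLING_LATENCY_MS = 12_000         # server throttling begins at 12 sec
MAX_CONNECTIONS_PER_FRONT_END = 15_000


def classify_dbstore_latency(latency_ms):
    """Classify a Queue Latency or Sproc Latency reading (in msec)."""
    if latency_ms < HEALTHY_LATENCY_MS:
        return "healthy"
    if latency_ms < THROTTLING_LATENCY_MS:
        return "degrading"
    return "throttling"


def connections_within_limit(active_connections):
    """True while Connections Active stays below the per-Front-end limit."""
    return active_connections < MAX_CONNECTIONS_PER_FRONT_END


if __name__ == "__main__":
    for reading in (40, 2_500, 12_000):
        print(reading, "msec ->", classify_dbstore_latency(reading))
    print("14,200 connections ok?", connections_within_limit(14_200))
```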
Recommended counters to monitor for server health:
(These counters will indicate negative trends as well as overall server health)
LC:SIP – 01 - Peers; SIP - 024 – Flow-controlled Connections Dropped (_Total)
LC:SIP – 01 - Peers; SIP - 025 – Average Flow-Control Delay (_Total)
LC:SIP – 07 – Load Management; SIP – 000 – Average Holding Time For Incoming Messages ----
LC:ImMcu – 02 – MCU Health And Performance; IMMCU – 005 – MCU Health State ----
LC:USrv – 20 – Https Transport; USrv – 002 – Number of failed connection attempts ----
LC:USrv – 20 – Https Transport; USrv – 002 – Number of failed connection attempts / Sec ----
OCS 2007 MOM Pack thresholds from the documentation:
IMMCU - 020 - Throttled Sip Connections (Sample) (number of connections at which new SIP requests are refused)
Sample Interval is 15 minutes.
The current health of the MCU. 0 = Normal. 1 = Loaded. 2 = Full. 3 = Unavailable.
Causes: MCU is overloaded, backend server is slow to respond, net problem
Resolutions: This could happen if too many conferences are assigned to this MCU. [should be no more than 500 maximum sessions per MCU]
(Normal = healthy; Loaded = marginal; Unavailable = maximum reached)
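If you script against the MCU health value, it helps to decode the number into its name. Here is a hedged Python sketch: the numeric states come from the list above, while the class and function names are hypothetical. The notes above do not describe the "Full" state, so the sketch says so rather than guessing.

```python
from enum import IntEnum


class McuHealthState(IntEnum):
    """MCU health values as listed above."""
    NORMAL = 0
    LOADED = 1
    FULL = 2
    UNAVAILABLE = 3


# Plain-language meanings given in the notes above; "Full" has none there.
DESCRIPTIONS = {
    McuHealthState.NORMAL: "healthy",
    McuHealthState.LOADED: "marginal",
    McuHealthState.UNAVAILABLE: "maximum reached",
}


def describe_mcu_health(raw_value):
    """Turn a raw counter value into a readable label.

    Raises ValueError for values outside 0..3.
    """
    state = McuHealthState(int(raw_value))
    meaning = DESCRIPTIONS.get(state, "no description given")
    return f"{state.name.capitalize()}: {meaning}"


if __name__ == "__main__":
    for value in range(4):
        print(value, "->", describe_mcu_health(value))
```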
IMMCU - 020 - Throttled Sip Connections (Warning) (Error) (number of throttled Sip connections total)
Sample Interval is 15 minutes.
Numeric Threshold Rule triggered when the sampled value is greater than 10.
Causes: Peer is not processing requests in a timely fashion.
Resolutions: This can happen if the peer machine is overloaded.
(“Peer” = connected servers or adjacent Front end servers or MCUs in the same EE Pool; the same set of counters apply)
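The numeric threshold rule above is easy to reproduce outside the MOM pack. The Python sketch below assumes you already have one sampled value per 15-minute interval as (label, value) pairs. The function name and the sample data are made up for illustration.

```python
THROTTLED_CONNECTIONS_THRESHOLD = 10  # rule fires when the sampled value is greater than 10


def breaching_samples(samples, threshold=THROTTLED_CONNECTIONS_THRESHOLD):
    """Return the (label, value) samples whose value is strictly above the threshold."""
    return [(label, value) for label, value in samples if value > threshold]


if __name__ == "__main__":
    # Hypothetical readings taken at 15-minute intervals.
    readings = [("09:00", 3), ("09:15", 10), ("09:30", 11)]
    for label, value in breaching_samples(readings):
        print(f"{label}: {value} throttled SIP connections exceeds the threshold")
```

Note that a value of exactly 10 does not trigger the rule, since the documented condition is "greater than 10".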
There are three phases of determining overall deployment health and wellness in a strategic monitoring plan: polling, monitoring and taking remedial actions.
For an in-depth resource on Office Communications Server 2007, including detailed troubleshooting tips, refer to the Office Communications Server 2007 Resource Kit, especially Chapter 13: “Monitoring,” available from MS Press at: http://www.microsoft.com/MSPress/books/10482.aspx.
Stu prepared the content for this post prior to transferring to Unify2.