LCSKid

LCS and OCS Product Information

LCS - Hardware Load Balancers and time out values

Customers often ask how to configure timeout values for LCS and their hardware load balancers. Documentation for two partner load balancer solutions is available at the http://office.microsoft.com/livecomm site.

Some customers read the following passage in the Planning Guide and end up with questions.

Must provide a configurable TCP idle-timeout interval with a maximum value greater than or equal to the minimum of the REGISTER refresh / SIP Keep-Alive interval. Attribute msRTCSIP-DefRegistrationTimeout

The customer's question in an internal email discussion was:

The concern I have here is that when a failover takes place, it typically takes 90 seconds. Sometimes it's a little faster, sometimes a little slower. After talking to our Microsoft rep, he said that failover should be fairly seamless to the user. Our load balancer (BigIP) has a TCP (5060) monitor set up to check every 5 seconds. I have noticed during testing that the load balancer does detect that the server is unavailable within that time, but it appears the Messenger client doesn't fail over until much later. The default reg expiry value is set to 600. If I set the load balancer to that, it will only check every 10 minutes to make sure the server (service) is up. Do you have any thoughts on what needs to be changed? Is 90 seconds accurate, or should failover be much more seamless?


 <Product Group member responses>



The Planning Guide recommends adjusting the TCP idle-timeout interval on the load balancer based on the default reg expiry setting. Adjusting the default reg expiry to match the load balancer setting is not recommended.


The LB setting you mentioned is the heartbeat interval between the LB and the front-end server. It is fine for it to be 5 seconds. The corresponding passage from the Planning Guide is:

The Load Balancer must be able to detect Live Communications Server availability by establishing TCP connections to ports 5060, 5061 or both (often called a ‘heartbeat’ or ‘monitor’). The polling interval must be a configurable value, with a minimum value of at least five seconds. The Load Balancer must not select a Live Communications Server that shuts down until a successful TCP connection (heartbeat) can be established again.
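To make the heartbeat requirement concrete, here is a minimal Python sketch of what such a monitor does. It is not how BigIP or any other vendor implements it; the names `tcp_heartbeat` and `ServerMonitor` and the host name in the usage note are made up for illustration, and the 2-second connect timeout is an assumption.

```python
import socket
from typing import Callable, Iterable


def tcp_heartbeat(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


class ServerMonitor:
    """Tracks whether one front-end server may receive new connections."""

    def __init__(self, host: str, ports: Iterable[int] = (5060,),
                 probe: Callable[[str, int], bool] = tcp_heartbeat) -> None:
        self.host = host
        self.ports = tuple(ports)
        self.probe = probe
        # Not selectable until a heartbeat has succeeded.
        self.available = False

    def poll(self) -> bool:
        # Every monitored port must accept a connection; a failed poll
        # keeps the server out of rotation until a later poll succeeds.
        self.available = all(self.probe(self.host, p) for p in self.ports)
        return self.available
```

A load balancer would call something like `ServerMonitor("lcs-fe1.example.com", ports=(5060, 5061)).poll()` once per polling interval (every 5 seconds in the customer's setup) and only hand new clients to servers whose `available` flag is true.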

The other LB setting is the TCP idle-timeout, which must be configured according to the following. It is not related to the heartbeat interval mentioned above. The Load Balancer must provide a configurable TCP idle-timeout interval with a maximum value greater than or equal to the minimum of the REGISTER refresh / SIP Keep-Alive interval.

Yes, failover will be fairly seamless to the user, and it is normal for it to take about 90 seconds. The client has built-in randomization for sign-in retries. This avoids load spikes on the server when thousands of clients are connected, which could otherwise adversely affect the client experience. It is OK for the TCP monitor to be set at 5 seconds, as this helps the LB mark the server down more quickly. New clients will not be load balanced to that server. Existing clients, on the other hand, will use the built-in keep-alive mechanisms and retry randomization to sign back in seamlessly to another server.
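The effect of retry randomization can be illustrated with a short Python sketch. The client's actual retry algorithm is not described here, so this assumes the simplest possible scheme: each client waits a uniformly random delay before signing in again. The 90-second window is taken from the failover time discussed above, and the function names are hypothetical.

```python
import random
from collections import Counter


def retry_delay(max_delay_s: float, rng: random.Random) -> float:
    """Pick a random wait before the next sign-in attempt (assumed scheme)."""
    return rng.uniform(0.0, max_delay_s)


def reconnects_per_second(num_clients: int, max_delay_s: float = 90.0,
                          seed: int = 0) -> Counter:
    """Count how many clients retry in each one-second slot."""
    rng = random.Random(seed)
    return Counter(int(retry_delay(max_delay_s, rng))
                   for _ in range(num_clients))


if __name__ == "__main__":
    slots = reconnects_per_second(9000)
    print("busiest second:", max(slots.values()), "sign-ins")
```

Without randomization, all 9,000 clients in this example would hit the remaining servers in the same instant. With it, the sign-ins are spread over the whole window, which is why an individual user may wait up to roughly 90 seconds.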


For every TCP connection, the load balancer maintains state associating that connection with a particular target server. This state has an associated timer that tracks how long the connection has been idle (i.e., inactive). The limit on that timer is the TCP idle-timeout interval. If this setting is smaller than the REGISTER refresh interval or the SIP Keep-Alive interval (the SIP Keep-Alive interval is fixed at 5 minutes), then the load balancer's TCP idle-timeout will fire and reap the connection, removing its state. A subsequent data packet from the client will then fail, with the load balancer indicating that the connection was closed. This forces the client to retry and establish a new connection, which is expensive for the server and causes intermittent failures for the client during the retry period.
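The relationship between the values can be sketched in Python as follows. The 300-second keep-alive and the 600-second default reg expiry come from the discussion above; treating the reg expiry as the REGISTER refresh interval is an assumption, and `ConnectionTable` is a toy model of the load balancer's per-connection state, not any vendor's implementation.

```python
SIP_KEEPALIVE_S = 300            # fixed at 5 minutes
DEFAULT_REGISTER_REFRESH_S = 600  # default reg expiry (assumed refresh interval)


def min_required_idle_timeout(register_refresh_s: int = DEFAULT_REGISTER_REFRESH_S,
                              keepalive_s: int = SIP_KEEPALIVE_S) -> int:
    """Smallest LB idle-timeout that will not reap healthy connections."""
    return min(register_refresh_s, keepalive_s)


def idle_timeout_is_safe(lb_idle_timeout_s: int) -> bool:
    return lb_idle_timeout_s >= min_required_idle_timeout()


class ConnectionTable:
    """Toy model of the load balancer's connection state and idle timer."""

    def __init__(self, idle_timeout_s: int) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.last_seen = {}  # connection id -> time of last packet (seconds)

    def packet(self, conn_id: str, now_s: int) -> bool:
        """Return True if the packet is forwarded, False if state was reaped."""
        last = self.last_seen.get(conn_id)
        if last is not None and now_s - last > self.idle_timeout_s:
            # Idle too long: state is gone, so the client must reconnect.
            del self.last_seen[conn_id]
            return False
        self.last_seen[conn_id] = now_s
        return True
```

With an idle-timeout of 120 seconds, a client that sends a keep-alive every 300 seconds finds its connection gone on the second packet. With 300 seconds or more, the same traffic keeps the connection alive.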


I hope this provides some helpful background on the timeout values and their relationship. One of our challenges in support is that not all vendors have specific information on the configuration of their solution and they use different/proprietary terms. We will do our best to help as always.

Toml LCSKid
Comments
  • One of the big differences between LCS and OCS is the need for hardware load balancers for the Enterprise
