I recently worked a support case involving an OCS 2007 R2 Enterprise deployment with two Front Ends and a two-server Edge array, load balanced by a Brocade/Foundry load balancer. An example of the working configuration follows.
Basically, the internal-facing Edge VIP cannot have keepalives. The external-facing Edge VIP has to have its keepalives extended beyond the default of 5 seconds with 5 retries. The firewall also had very restrictive limits on failed connection attempts, and it had snooper checks on the load balancer that kept putting the servers into a failed state. We removed the snooper checks and increased the limits.
Actual customer IPs and FQDNs have been changed.
INTERNAL FACING EDGE VIP
server real edge001
port ssl
port ssl no-health-check
port 3478
port 3478 no-health-check
port sips
port sips no-health-check
port 5062
port 5062 no-health-check
server real edge002
server virtual OcsEdge.xxx.com
tcp-age 30
sym-priority 4
predictor round-robin
port ssl sticky
port 3478 sticky
port sips sticky
port 5062 sticky
track-group sips 5062
bind ssl EDGE001-int ssl
bind 3478 EDGE001-int 3478
bind sips EDGE001-int sips
bind 5062 EDGE001-int 5062
bind ssl EDGE002-int ssl
bind 3478 EDGE002-int 3478
bind sips EDGE002-int sips
bind 5062 EDGE002-int 5062
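Because the internal-facing VIP only works with health checks disabled on every port, it can help to verify that no port was missed. The following is a minimal Python sketch, not a Foundry tool: the function name `ports_missing_no_health_check` is hypothetical, and it assumes the real server block is pasted in as plain text in the same `port <name>` / `port <name> no-health-check` layout shown above.

```python
def ports_missing_no_health_check(config_text):
    """Return the ports declared in a real-server block that have no
    matching 'port <name> no-health-check' line, in declaration order."""
    declared = []      # port names in the order they first appear
    disabled = set()   # port names with health checks explicitly disabled
    for raw in config_text.splitlines():
        tokens = raw.split()
        # Only lines of the form 'port <name> [options...]' are relevant.
        if len(tokens) < 2 or tokens[0] != "port":
            continue
        name = tokens[1]
        if name not in declared:
            declared.append(name)
        if "no-health-check" in tokens[2:]:
            disabled.add(name)
    return [p for p in declared if p not in disabled]


# The edge001 block from the internal-facing configuration above.
EDGE001_INTERNAL = """\
server real edge001
port ssl
port ssl no-health-check
port 3478
port 3478 no-health-check
port sips
port sips no-health-check
port 5062
port 5062 no-health-check
"""

print(ports_missing_no_health_check(EDGE001_INTERNAL))  # prints []
```

An empty list means every declared port has its health check disabled. Any port name that is returned would still be health-checked by the load balancer.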
EXTERNAL FACING EDGE VIP
server port 443
session-sync
tcp
tcp keepalive 30 3
server port 5061
session-sync
tcp
tcp keepalive 30 3
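To get a rough feel for how much more tolerant the extended keepalive is, the two settings can be compared with simple arithmetic. This is only a sketch under stated assumptions: that the two numbers in `tcp keepalive 30 3` are the interval in seconds and the retry count, and that a port is marked failed after roughly interval × retries seconds without a response. The function name is hypothetical, and the actual timing on the device may differ.

```python
def approx_seconds_until_marked_failed(interval_s, retries):
    """Rough estimate (an assumption, not documented device behaviour):
    a port is declared failed after 'retries' unanswered keepalives
    sent 'interval_s' seconds apart."""
    return interval_s * retries


default_window = approx_seconds_until_marked_failed(5, 5)    # default cited earlier
extended_window = approx_seconds_until_marked_failed(30, 3)  # tcp keepalive 30 3

print(default_window, extended_window)  # prints 25 90
```

Under these assumptions the extended setting gives the Edge servers about 90 seconds to answer before being taken out of rotation, compared with about 25 seconds for the default.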
Real Server Config on Foundry:
server real EDGE002-ext-01 xxx.xxx.xxx.xxx
port ssl
port ssl keepalive
port sips
port sips keepalive
VIP config on Foundry:
server virtual sip.xxx.com xxx.xxx.xxx.xxx
tcp-age 30
sym-priority 4
predictor least-conn
port ssl sticky
port sips sticky
track-group ssl 5061
bind ssl EDGE002-ext-01 ssl
bind sips EDGE002-ext-01 sips
bind ssl EDGE001-ext-01 ssl
bind sips EDGE001-ext-01 sips
OTHER SETTINGS IN THE DMZ
Firewall settings:
No snooper checks on the ports used by the load balancer. These checks were causing the servers to fail and not come online with the load balancer.
Limits on failed connection attempts increased. We had been reaching the limits for failed connections.