Lync client failover timing question

Question

We are having a rough time with Lync and LTMs for just the Front End setups. We're using 11.1 HF4 (also tried 11.2) with the template and docs. I've built this before without any issue, but our problem is the time it takes the client to reconnect once one of the servers is down, or marked down.

It's taking about 1 min. 45s for the client to finally switch over to the second server. It appears the LTMs know the server is down (within the 16s timeout we have) but the client just churns. We've had a ticket open for a few weeks on this and everyone's stumped. It appears the clients communicate fine with the VIP, then at some point in the conversation the clients talk directly to the server. When the server goes away the client just keeps trying because it's not being poisoned by the LTMs.

We even tried DNS RR, which is about the same thing, but I wanted to know from the group: how long should it really take for the client to "get" that the server is down and reconnect to the other pool member?

Thanks in advance,
Bob James

mikeshimkus_111 · Answer

Hi Bob, I'm not sure why the clients would do that, especially since we are only using TCP monitors for those pool members. One would think clients trying to connect directly to the servers would be immediately reset.

Have you opened a ticket with Microsoft about this issue?

thanks
Mike

robert_james_10 · Answer

Hey Mike,

Well, this one has 7 people stumped, including 3 F5 engineers. I'm tired from taking traces all day.

The last test we ran was really bizarre. With a sniffer on the client PC and with us controlling the servers, we verified the client was talking to the VIP and the VIP only. We pulled the plug on the server NIC and watched the F5s mark it down. The HTTP session for the client timed out right away, but the SIP-TLS session stayed open a lot longer; after about 2 minutes the client reconnected to the server that was still up.

Now the strange part: when the old server was brought back up, the client reconnected to it. The sniffer trace showed the client talking to the server that never went down, then it did a DNS lookup for the newly restored server (not the pool name) and connected directly to that server, bypassing the F5 altogether.

So far we have verified DNS, SRV records, configs, etc. To me it looks like the client is somehow learning the individual server IPs and going direct to them, but we don't know how it is learning this. We have a ticket open with MS, one with F5, and have had many gurus look at this. We are all stumped. Stupid Lync :D

At this point there are only two questions we really want answered: What is the time it takes a client to reconnect through the F5s in a good setup (front end pool)? And should the client ever have an individual host name in its Lync local file (client)?

Cheers,
Bob James
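
For anyone repeating this kind of trace, a small helper can make the "VIP only or going direct?" check less manual. This is only a rough sketch, not anything from F5 or Microsoft: it assumes you have already exported the destination IPs of the client's connections from the sniffer, and the function name and the 192.0.2.x addresses are made up for illustration.

```python
import ipaddress


def classify_destinations(dest_ips, vip, members):
    """Sort destination IPs seen in a client trace into three buckets.

    dest_ips: destination addresses exported from the capture (strings)
    vip:      the virtual server address on the load balancer
    members:  the individual Front End server addresses
    """
    vip_addr = ipaddress.ip_address(vip)
    member_addrs = {ipaddress.ip_address(m) for m in members}

    result = {"vip": [], "direct": [], "other": []}
    for ip in dest_ips:
        addr = ipaddress.ip_address(ip)
        if addr == vip_addr:
            result["vip"].append(ip)        # traffic the load balancer can manage
        elif addr in member_addrs:
            result["direct"].append(ip)     # client bypassed the load balancer
        else:
            result["other"].append(ip)      # DNS, web services, unrelated hosts
    return result
```

Any entry in the `direct` bucket means the client opened a connection to a pool member without going through the VIP, which is the behavior described above.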

mikeshimkus_111 · Answer

It's my understanding that Lync itself provides the name of the Front End server to the client. The client has a preferred pool and servers within that pool, which may explain why the client connected to the downed server when it came back up. I am not sure how often the Lync client re-queries DNS for the SRV record that would direct it to the VIP, but it's something the MS engineers should know.

mikeshimkus_111 · Answer

Robert, can you please PM me with your F5 case number, so I can track this and work with the engineers if necessary?

thanks
Mike

ryan_korock_46 · Answer

Robert... By MSFT design, after a Lync client gets load balanced to a Lync front end, the client receives a list of front end pool members and can (and eventually will) connect to these servers directly. Once the clients connect to the servers directly, BIG-IP loses the ability to manage the connection. BIG-IP has functionality designed to reconnect the client to a valid server upon the original server's failure; however, when the Lync client routes around the BIG-IP, you lose the ability to take advantage of this functionality. If a server goes down at this point, it is up to the Lync client to initiate a reconnection on its own.
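
Since the client's own reconnect timer is outside BIG-IP's control, it may help to measure the outage as seen at the VIP separately from what the Lync client experiences. The following is a minimal sketch under some assumptions: a plain TCP connect is used as the probe (it says nothing about SIP or TLS health), the helper names are hypothetical, and 5061 is assumed to be the SIP over TLS port in your deployment.

```python
import socket
import time


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sample(host, port, duration, interval=1.0):
    """Probe host:port every `interval` seconds for about `duration` seconds.

    Returns a list of (timestamp_seconds, reachable) tuples in time order.
    """
    end = time.monotonic() + duration
    samples = []
    while time.monotonic() < end:
        stamp = time.monotonic()
        samples.append((stamp, tcp_probe(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples):
    """Longest span from the first failed probe to the next successful one.

    An outage that has not recovered by the last sample is not counted.
    """
    longest = 0.0
    down_since = None
    for stamp, ok in samples:
        if not ok and down_since is None:
            down_since = stamp                       # outage begins
        elif ok and down_since is not None:
            longest = max(longest, stamp - down_since)
            down_since = None                        # outage ends
    return longest
```

Running something like `longest_outage(sample("lyncpool.example.com", 5061, duration=180))` while pulling a pool member (the host name is a placeholder) gives a rough figure for the VIP side. If that number stays near the monitor timeout while the Lync client takes far longer, the delay is in the client's direct connection rather than in the load balancer.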