Forum Discussion
Lync client failover timing question
We are having a rough time with Lync and LTMs for just the Front End setups. We're using 11.1 HF4 (also tried 11.2) with the template and docs. I've built this before without any issue, but our problem is the time it takes the client to reconnect once one of the servers is down, or marked down.
It's taking about 1 min 45 s for the client to finally switch over to the second server. It appears the LTMs know the server is down (within the 16 s timeout we have), but the client just churns. We've had a ticket open for a few weeks on this and everyone's stumped. It appears the clients communicate fine with the VIP, then at some point in the conversation the clients talk directly to the server. When the server goes away, the client just keeps trying because it's not being poisoned by the LTMs.
We even tried DNS RR, which is about the same thing, but I wanted to know from the group: how long should it really take for the client to "get" that the server is down and reconnect to the other pool member?
Thanks in advance,
Bob James
- mikeshimkus_111 (Historic F5 Account): Hi Bob, I'm not sure why the clients would do that, especially since we are only using TCP monitors for those pool members. One would think clients trying to connect directly to the servers would be immediately reset.
- Robert_James_10 (Nimbostratus): Hey Mike,
Well this one has 7 people stumped, including 3 F5 engineers.
I'm tired from taking traces all day.
The last test we ran was really bizarre. With a sniffer on the client PC and with us controlling the servers, we verified the client was talking to the VIP and the VIP only. We pulled the plug on the server NIC and watched the F5s mark it down. The HTTP session for the client timed out right away, but the SIP-TLS session stayed open a lot longer; after about 2 minutes the client reconnected to the server that was still up. Now the strange part: when the old server was brought back up, the client reconnected to it. The sniffer trace showed the client talking to the server that never went down, then doing a DNS lookup for the restored server's own name (not the pool name) and connecting directly to it, bypassing the F5 altogether.
So far we have verified DNS, SRV records, configs, etc. To me it looks like the client is somehow learning the individual server IPs and going direct to them, but we don't know how it is learning them. We have a ticket open with MS, one with F5, and have had many gurus look at this. We are all stumped. Stupid Lync :D
At this point there are only two questions we really want answered:
What is the time it takes a client to reconnect through the F5's in a good setup (front end pool)?
And should the client ever have an individual host name in their Lync local file (client)?
Cheers,
Bob James
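One way to put numbers on a test like the one described above is to log when the VIP and each Front End server stop and start accepting TCP connections, then line those timestamps up against the sniffer trace. The following is a minimal sketch, not a supported tool: the function names are made up for illustration, the host names in the usage comment are hypothetical, and port 5061 is only an assumption based on the SIP-TLS session mentioned above. It measures server reachability, not the Lync client's own reconnect logic.

```python
import socket
import time
from typing import Callable, List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def watch_transitions(
    probe: Callable[[], bool],
    samples: int,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, bool]]:
    """Call probe() `samples` times and record (elapsed_seconds, state)
    each time the state differs from the previous sample."""
    start = clock()
    transitions: List[Tuple[float, bool]] = []
    last = None
    for _ in range(samples):
        state = probe()
        if state != last:
            transitions.append((clock() - start, state))
            last = state
        sleep(interval)
    return transitions


# Example usage (hypothetical host name, assumed port):
#   changes = watch_transitions(lambda: tcp_probe("fe01.example.com", 5061),
#                               samples=300, interval=1.0)
#   for elapsed, up in changes:
#       print(f"{elapsed:7.1f}s  {'UP' if up else 'DOWN'}")
```

Running one watcher against the VIP and one against each pool member during a pull-the-plug test gives a timeline to compare with the moment the client actually reconnects.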
- mikeshimkus_111 (Historic F5 Account): It's my understanding that Lync itself provides the name of the Front End server to the client. The client has a preferred pool and servers within that pool, which may explain why the client connected to the downed server when it came back up. I am not sure how often the Lync client re-queries DNS for the SRV record that would direct it to the VIP, but it's something the MS engineers should know.
- mikeshimkus_111 (Historic F5 Account): Robert, can you please PM me with your F5 case number, so I can track this and work with the engineers if necessary?
- Ryan_Korock_46 (Historic F5 Account): Robert... By MSFT design, after a Lync client gets load balanced to a Lync front end, the client receives a list of front end pool members and can (and eventually will) connect to these servers directly. Once the clients connect to the servers directly, BIG-IP loses the ability to manage the connection. BIG-IP has functionality designed to reconnect the client to a valid server upon the original server's failure; however, when the Lync client routes around the BIG-IP, you lose the ability to take advantage of this functionality. If at this point a server goes down, it is up to the Lync client to initiate a reconnection on its own.
- Robert_James_10 (Nimbostratus): Ryan, this is what we found as well. The issue (or at least one of them) is that the web services are going through the VIP while the client is talking directly to the server. When the server is powered off, the clients get a RST from the LTM, but the sessions to the Lync server itself don't seem to get killed, or at least it seems they wait an extremely long time before finally doing a DNS lookup for the other pool members.
P.S.
Is there a workaround to send the client back to the F5? Does Lync present host names to the client, and if so, can we use hosts files on the client to point those host names at the VIP?
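Before or after trying any name override, it can help to confirm from the client PC what each name actually resolves to. The sketch below lists the names that do not resolve solely to the VIP address. The helper names, host names, and addresses are all hypothetical, and the check says nothing about whether overriding the names is safe or supported; it only reports resolution results as the client's resolver sees them.

```python
import socket
from typing import Callable, Iterable, List, Set


def resolve_ipv4(name: str) -> Set[str]:
    """Resolve a host name to its set of IPv4 addresses via the system resolver."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def names_not_on_vip(
    names: Iterable[str],
    vip: str,
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> List[str]:
    """Return the names that resolve to anything other than exactly the VIP.
    Names that fail to resolve are also returned."""
    off_vip = []
    for name in names:
        try:
            addrs = resolver(name)
        except OSError:
            addrs = set()
        if addrs != {vip}:
            off_vip.append(name)
    return off_vip


# Example usage (hypothetical names and address):
#   print(names_not_on_vip(
#       ["lyncpool.example.com", "fe01.example.com", "fe02.example.com"],
#       vip="10.0.0.10"))
```

Any Front End server name in the returned list is one the client could reach directly, around the BIG-IP.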
- Ryan_Korock_46 (Historic F5 Account): Bob, I understand your frustration. Believe me, I feel it too. Unfortunately, Microsoft has not given me any solution that would force the Lync clients to continue to use the VIP. I don't believe there is a way to safely do this.
- Robert_James_10 (Nimbostratus): Funny, we have had many, many "experts" tell us how they think things work, even Microsoft, but none of them actually know. How about putting the Lync FE pool members behind the F5s with no SNAT? If all traffic has to pass through the LTMs, then I would think it may help the LTMs kill the connection. We also thought about two 1-member pools; at least that way there are timeout settings in Lync, but I think it's the 3 polls of a dead server before client switchover that's killing us.
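As a back-of-envelope check on the "3 polls" theory, the numbers from this thread can be combined directly. This is only arithmetic under an assumed model (the client gives up after a fixed number of equally spaced failed polls); the thread does not confirm that Lync works this way, and the function name is made up for illustration.

```python
def implied_poll_interval(observed_failover_s: float, failed_polls: int) -> float:
    """Seconds per poll implied by an observed failover time, assuming the
    client switches over after `failed_polls` equally spaced failed polls."""
    if failed_polls <= 0:
        raise ValueError("failed_polls must be positive")
    return observed_failover_s / failed_polls


observed = 1 * 60 + 45  # the 1 min 45 s reported at the top of the thread
print(implied_poll_interval(observed, 3))  # 35.0 seconds per poll
```

Under that assumption, each poll would take about 35 s, roughly twice the 16 s monitor timeout on the LTM side.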