Zero-downtime deployment with F5 GTM+LTM
We have a GTM+LTM setup for our application, which runs on 12 servers. These servers are split across 4 LTMs with 3 servers each, with monitoring set to a static page and "Action on service down" set to "None". We want a zero-downtime deployment process, and currently we do it like this:
- Mark half of the servers as down (which leaves 2 LTMs with only down servers), but keep them running for as long as we can detect in-flight requests
- Deploy to this half and mark them as up
- Mark the second half as down and deploy
Even after all of this, some of our users complain about dropped requests when we deploy. From the logs I see that requests are dropped immediately after we mark the servers as down, even though the servers are still running and "Action on service down" is set to "None".
So my question is: might this be related to GTM marking the whole LTM pool as down and dropping all the running requests?
GTM does play a role here. Before you mark down the servers on the respective LTMs, mark the corresponding GTM pool member as disabled as well. That way, DNS resolution goes to the other VIPs. Also, how is your GTM configured here: round robin? How many GTM pools are there?
Also, before you start your deploy, always check whether there are any existing connections coming to the VIP, because they have to finish or hit their TCP timeout before you start. Otherwise, when a subsequent request arrives at the pool member, it will get dropped and you'll see client impact.
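To make the ordering concrete, here is a minimal Python sketch of that sequence: disable the GTM pool member, wait for the VIP's connections to drain, and only then mark the LTM members down and deploy. It does not call any F5 API. The function name `drain_then_deploy` and all of its parameters are hypothetical; you would plug in whatever tooling you already use to disable the GTM pool member, read the VIP's connection count, and so on. The timeout and poll interval are placeholder values, not recommendations.

```python
import time
from typing import Callable


def drain_then_deploy(
    disable_in_gtm: Callable[[], None],         # hypothetical hook: disable the GTM pool member
    active_connections: Callable[[], int],      # hypothetical hook: current connections on the VIP
    mark_ltm_members_down: Callable[[], None],  # hypothetical hook: mark the LTM servers down
    deploy: Callable[[], None],                 # hypothetical hook: run the actual deployment
    timeout_s: float = 600.0,                   # placeholder value
    poll_interval_s: float = 5.0,               # placeholder value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the drain sequence; return True only if the deploy was started."""
    # Step 1: stop GTM from resolving clients to this VIP.
    disable_in_gtm()

    # Step 2: wait until the VIP reports no existing connections.
    waited = 0.0
    while active_connections() > 0:
        if waited >= timeout_s:
            # Connections never drained: do not touch the LTM members.
            return False
        sleep(poll_interval_s)
        waited += poll_interval_s

    # Step 3: only now mark the servers down and deploy to them.
    mark_ltm_members_down()
    deploy()
    return True
```

If the function returns False, the GTM pool member is left disabled and nothing else has been changed, so you can either keep waiting or re-enable it. Keep in mind that clients may continue to use a cached DNS answer until its TTL expires, so new connections can still show up for a while after step 1.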
Hope this helps.