Xhuxhu
Jan 12, 2021

Zero downtime deployment with f5 GTM+LTM

We have a GTM+LTM setup for our application, which runs on 12 servers. These servers are split across 4 LTMs with 3 servers each; the monitor checks a static page and "Action on service down" is set to "None". We want a zero-downtime deployment, and currently we do it like this:

 

  • Mark half of the servers as down (which leaves 2 LTMs with only down servers), but keep them running for as long as we can detect in-flight requests
  • Deploy to this half and mark them as up
  • Mark the second half as down and deploy

 

Even after all of this, some of our users complain about dropped requests when we deploy. From the logs I see that requests are dropped immediately after we mark the servers as down, even though the servers are still running and "Action on service down" is set to "None".

 

So my question is: could this be related to GTM marking the whole LTM pool as down and dropping all in-flight requests?

 

  • GTM does play a role here. Before you mark down the servers on the respective LTMs, I'd disable the corresponding GTM pool members too, so that DNS resolution goes to the other VIPs. Also, how is your GTM configured: round robin (RR)? How many GTM pools?

    Before you start your deploy, always check whether there are still existing connections coming to the VIP.

    Those connections have to finish or time out before you start your deploy. Otherwise, when a subsequent request arrives at the pool member, it will get dropped and you'll see client impact.

    Hope this helps.
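
To make the "check for existing connections before deploying" step concrete, here is a minimal Python sketch of a drain wait. It is only an illustration: `count_connections` is a hypothetical callable that you would back with whatever you use to read the connection count for the VIP or pool member, and the timeout and poll interval are assumed values, not recommendations.

```python
import time
from typing import Callable


def wait_for_drain(
    count_connections: Callable[[], int],  # hypothetical: returns active connections
    timeout_s: float = 300.0,              # assumed upper bound on the wait
    poll_interval_s: float = 5.0,          # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once no connections remain, False if the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if count_connections() == 0:
            return True   # fully drained, safe to start the deploy
        if clock() >= deadline:
            return False  # still busy, do not deploy yet
        sleep(poll_interval_s)
```

The point of returning a boolean instead of sleeping for a fixed time is that the deploy only starts when the count really reaches zero, not after a guessed delay.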

    • Xhuxhu

      Thanks a lot for the answer. The setup is one GTM pool with 4 LTMs, and yes, it is RR. After marking the servers as down, we wait 5 minutes before deploying. I believe the issue might be the TCP timeout or the DNS resolver cache. That would explain the dropped connections, since clients would still go to the same LTM even after all the servers inside it are marked as down. I'll try to always keep at least one server up in each LTM pool and see if that fixes it.
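
The idea of always keeping at least one server up per LTM can be sketched as a batch plan. This is a rough illustration, assuming the 4 x 3 layout described in the thread; the LTM and server names are made up, and `plan_batches` is a hypothetical helper, not anything provided by F5.

```python
from itertools import zip_longest
from typing import Dict, List


def plan_batches(ltm_pools: Dict[str, List[str]]) -> List[List[str]]:
    """Build deploy batches that take at most one server per LTM pool."""
    for name, servers in ltm_pools.items():
        if len(servers) < 2:
            # A single-member pool would be fully down during its batch.
            raise ValueError(f"{name} needs at least 2 servers to stay up")
    batches = []
    # Batch i contains the i-th server of every pool (shorter pools are skipped).
    for column in zip_longest(*ltm_pools.values()):
        batches.append([server for server in column if server is not None])
    return batches


# Hypothetical names for the 4 LTMs with 3 servers each.
pools = {f"ltm{i}": [f"ltm{i}-srv{j}" for j in range(1, 4)] for i in range(1, 5)}
for number, batch in enumerate(plan_batches(pools), start=1):
    print(number, batch)
```

With this layout the plan has three batches of four servers instead of two halves, so every LTM keeps two members up at any time and GTM never sees an LTM with an empty pool.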

      • If I were you, I'd disable the VIPs in the GTM pool first.

        Then I'd disable the pool members, or even force them offline, on the respective 2 LTMs, and watch for the connections to time out. Existing connections still go through, so no outage is caused. Once all traffic has drained, I'd let the deployment team start deploying their code.

        Once the code is deployed, enable the members again on the LTM and have the servers tested through this VIP directly to confirm everything is working. Then enable it in the GTM pool again, so it's customer facing.

        How long the traffic drain takes varies from application to application. A heavily used application could take hours to drain.

        Hope this helps too.
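
The order of operations described above can be written down as a small Python sketch. Everything here is an assumption for illustration: `SiteActions` and `deploy_site` are hypothetical names, and each callable stands in for whatever tooling you actually use to change GTM and LTM state, deploy, and test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SiteActions:
    """Hypothetical hooks for one LTM site; wire them to your own tooling."""
    disable_in_gtm: Callable[[], None]        # stop handing out this VIP via DNS
    disable_pool_members: Callable[[], None]  # disable (or force offline) on the LTM
    wait_for_drain: Callable[[], bool]        # True once existing connections are gone
    deploy: Callable[[], None]
    enable_pool_members: Callable[[], None]
    smoke_test: Callable[[], bool]            # test through the VIP directly
    enable_in_gtm: Callable[[], None]         # make the site customer facing again


def deploy_site(actions: SiteActions) -> bool:
    """Run the steps in order; stop early and keep the site out of GTM on failure."""
    actions.disable_in_gtm()
    actions.disable_pool_members()
    if not actions.wait_for_drain():
        return False  # traffic never drained: members stay disabled, nothing deployed
    actions.deploy()
    actions.enable_pool_members()
    if not actions.smoke_test():
        return False  # new code is broken: site stays disabled in GTM
    actions.enable_in_gtm()
    return True
```

The design choice is that the GTM change is both the first and the last step, so clients are only ever sent to a site whose pool members are enabled and tested.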