Forum Discussion

Stephen_Kaiser
Apr 19, 2021

vIP Queue not releasing connections

I've configured a vIP with a default pool and created an iRule that inspects the HTTP payload (HTTP::payload) and routes certain requests to a second pool based on the payload contents. This worked perfectly until I load tested it. I increased the synthetic traffic on the vIP until requests to the second pool began queueing, and eventually 50% of the queued requests timed out before being forwarded to a member. The load test itself went very well: all requests were handled correctly well past the point we normally see under real peak load.

However, after I removed the synthetic load, we noticed that real web traffic routed to the second pool was still timing out in the queue half of the time, even though both member servers were green, healthy, and not under nearly enough load to justify queueing. Requests that weren't queued were handled immediately. Requests that were queued simply waited in the queue until they timed out.

It seems I've stumbled across a bug under the hood. Is this a known issue? Are there hotfixes for something like this? I'm using BIG-IP 13.1.3.4 Build 0.0.5.

Recycling services on all client and member servers did not help. Forcing member nodes offline and disabling them did not clear up the issue. We could not see any evidence of health monitors failing, nor was there any reason why they would.

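when HTTP_REQUEST {
  # Only POST requests are inspected.
  if { [HTTP::method] eq "POST" } {
    # Collect at most 500 bytes; use 500 when Content-Length is missing or larger.
    if { [HTTP::header "Content-Length"] ne "" && [HTTP::header "Content-Length"] <= 500 } {
      set content_length [HTTP::header "Content-Length"]
    } else {
      set content_length 500
    }
    if { $content_length > 0 } {
      HTTP::collect $content_length
    }
  }
}
when HTTP_REQUEST_DATA {
  # Send requests whose collected payload contains "DoLongSearch" to the search pool.
  if { [HTTP::payload] contains "DoLongSearch" } {
    pool website_search
  }
}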

5 Replies

  • Did you check the F5 connection table during the issue to see whether the F5 was failing to time out the connections?

    https://support.f5.com/csp/article/K53851362

    https://support.f5.com/csp/article/K40033505

    I have also seen issues where the F5 and the servers have different TIME_WAIT settings. If you had run something like netstat -an on the servers and seen many connections in the TIME_WAIT state, that could have been the cause.

    https://support.f5.com/csp/article/K14400019

  • I did not. I'll use those in my future testing, thank you. I'm attempting to reproduce the issue on our test environment but so far I've had no luck breaking it again in the same way.

     

    However, I would expect issues of that nature to cause problems with any pools targeting those servers, and also cause the health monitors to fail. But the issue only presented in the website_search pool, not the default_pool, and not in the health monitors. The default pool and the website_search pool dedicated to website searches point at the same server nodes. The only difference between the two pools is the connection limit on each member server and the queue timeouts.

    • default_pool:
      • Queue timeout: 30000ms
      • Queue depth: 3000
      • member_server_1:
        • Connection limit: 80
      • member_server_2:
        • Connection limit: 80
    • website_search:
      • Queue timeout: 15000ms
      • Queue depth: 3000
      • member_server_1:
        • Connection limit: 20
      • member_server_2:
        • Connection limit: 20
    • Nikoolayy1

      With issues like the F5 connection table not timing out, or a TIME_WAIT mismatch, only the data plane is affected, not the control plane (health monitoring), because the two are separate.

      If the connection limit were the problem, you would be seeing issues constantly. Also, to protect the servers from a connection exhaustion attack, "Slow Ramp Time" is a better option.

      https://support.f5.com/csp/article/K14804

      You may also want to check whether the iRule is freeing the collected HTTP data, for example by logging with log local0. (it should do so during the HTTP_REQUEST_DATA event, but just in case). Adding HTTP::release at the end may help make certain that the collected data is released.

      https://clouddocs.f5.com/api/irules/HTTP__release.html

      https://support.f5.com/csp/article/K55131641

      https://clouddocs.f5.com/api/irules/HTTP_REQUEST_DATA.html
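
      The logging check mentioned in this reply could look roughly like the sketch below. It is only an illustration: the pool name and search string come from the original iRule, the log message wording is made up for this example, and it only runs on a BIG-IP. Messages written with log local0. normally end up in /var/log/ltm.

      ```tcl
      when HTTP_REQUEST_DATA {
        # Debug only: confirm the event fires and how many bytes were collected.
        log local0. "HTTP_REQUEST_DATA from [IP::client_addr]: collected [HTTP::payload length] bytes"
        if { [HTTP::payload] contains "DoLongSearch" } {
          pool website_search
          # Debug only: record that this request was sent to the search pool.
          log local0. "Routing [IP::client_addr] to pool website_search"
        }
      }
      ```

      If the first message never appears for a request that later times out in the queue, that would suggest the collected data is not reaching HTTP_REQUEST_DATA at all. Logging every request is costly under load, so this is best kept to a test environment.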

       

       

       

       

      You may also check the bug tracker for bugs matching your symptoms and version, searching for "HTTP_REQUEST_DATA", "HTTP::payload", "HTTP::collect", etc.

      https://support.f5.com/csp/bug-tracker

       

  • Adding HTTP::release to the end of HTTP_REQUEST_DATA seems to have resolved the issue, even though the documentation suggests it shouldn't be needed. Thank you!
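
    For reference, a minimal sketch of the iRule from the first post with this change applied, assuming HTTP::release is simply placed as the last command in HTTP_REQUEST_DATA. Everything else is unchanged from the original. It has not been verified here and only runs on a BIG-IP.

    ```tcl
    when HTTP_REQUEST {
      if { [HTTP::method] eq "POST" } {
        # Collect at most 500 bytes of the request body.
        if { [HTTP::header "Content-Length"] ne "" && [HTTP::header "Content-Length"] <= 500 } {
          set content_length [HTTP::header "Content-Length"]
        } else {
          set content_length 500
        }
        if { $content_length > 0 } {
          HTTP::collect $content_length
        }
      }
    }
    when HTTP_REQUEST_DATA {
      if { [HTTP::payload] contains "DoLongSearch" } {
        pool website_search
      }
      # Explicitly release the collected data so the request continues processing.
      HTTP::release
    }
    ```

    The release is placed outside the if block so that it runs for every collected request, not only the ones routed to website_search.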

    • Nikoolayy1

      It could be a bug in your version where, without HTTP::release, the iRule does not release the traffic. Can you close the discussion, since we managed to catch the issue?