dustyob_116297
May 16, 2013

Dynamic throttling based on target system CPU load

Hi, I've tried searching but haven't found anything conclusive. I'm not an F5 expert, so I'm not sure this is the right forum; please redirect me if not.

I would like to throttle connections to proxied/load-balanced systems based on the load those systems report. Imagine the F5 polling a web service on each load-balanced system for CPU and I/O metrics and, if any of them look bad, starting to rate-limit that machine or even the entire cluster. (I know this isn't the ideal experience for the users who get throttled, but it's better than the systems failing and affecting everyone.)

Does anyone have any pointers where I can look?

Thanks!

Dusty

9 Replies

  • There are a few ways you could do this (an iRule with Sideband connections, or perhaps an external monitor), but the simplest would be to have the servers/hosts make the decision on load and simply report whether it's acceptable or not. You could then use a monitor to probe the page for specific text that's only present when the load is high and mark the pool member down. As long as you leave the Action on Service Down setting at its default of None, existing connections remain but no new ones are sent to that host/service. When its load drops, the page text changes again and the member is marked up. You can use the pool's Slow Ramp Time and the health monitor's Time Until Up settings to ensure the host isn't hammered when it returns to the pool and that it's genuinely healthy again.

    It might not sound like it, but this is far better than using an iRule to control load balancing. If the hosts can't do the necessary work themselves, an external monitor with a script that can is still preferable to an iRule.
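A minimal sketch of the server-side status page described above, assuming a Unix host (it uses `os.getloadavg()`), a hypothetical `/status` path on port 8080, hypothetical marker strings `LOAD_OK`/`LOAD_HIGH`, and an arbitrary threshold of 0.8 runnable processes per CPU. An HTTP monitor whose receive string is `LOAD_OK` would then mark the member down whenever the page reports `LOAD_HIGH` or stops answering.

```python
# Hypothetical status page for a BIG-IP HTTP monitor to probe (sketch only).
# Assumed, not from the thread: path /status, port 8080, the LOAD_OK/LOAD_HIGH
# marker strings and the 0.8-per-CPU threshold. os.getloadavg() is Unix-only.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

LOAD_PER_CPU_LIMIT = 0.8  # hypothetical; tune for the workload


def status_text(load_1min: float, cpus: int, limit: float = LOAD_PER_CPU_LIMIT) -> str:
    """Return the marker string the health monitor matches on."""
    per_cpu = load_1min / max(cpus, 1)
    return "LOAD_HIGH" if per_cpu > limit else "LOAD_OK"


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/status":
            self.send_error(404)
            return
        body = status_text(os.getloadavg()[0], os.cpu_count() or 1).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve the status page until interrupted."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Call `serve()` on each pool member. Because the decision lives in `status_text` on the host, the threshold can be changed there without touching the BIG-IP configuration.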
  • Just wondering whether dynamic ratio load balancing is applicable here.

    Configuring Dynamic Ratio Load Balancing:

    http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm_configuration_guide_10_1/ltm_appendixb_dynamic_ratio_lb.html
  • Hi What Lies Beneath, the case I'm referring to is when an entire cluster is busy: that's when I want to start throttling the whole cluster. Balancing between machines is working reasonably well in my case. Do the solutions you mention address that? They seem more like ways to distribute load across an appropriately sized cluster. What's going on here is that sometimes "bad guys" flood us with traffic we weren't asked to size for, often because of configuration issues on their side.

    Here's the trick:

    - I'd like to change the throttle dynamically over time. The simplest way to explain it: the backend system does more than real-time processing (e.g. batch jobs), and when that work loads up the system (which doesn't happen on a regular schedule), I'd like to start reducing the amount of traffic allowed to hit the service.

    - Ideally I'd let sticky sessions keep using the service, so I'd throttle sessions rather than connections. But if throttling connections is all I can accomplish, that would still be better than what I have now.

    - This is a multi-tenant system, so, even more ideally, I'd love to track each tenant's usage patterns and, when one tenant is being a "bad guy", throttle that tenant more than the others.

    Thanks,

    Dusty
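One way to picture the first two requirements is a token bucket for new sessions whose refill rate follows the backend's reported load, while requests from sessions that were already admitted always pass. This is a generic sketch, not an F5 feature or iRule; the class name, the session ID (e.g. a cookie value), the 0.0 to 1.0 load signal, and the linear scaling are all assumptions.

```python
# Generic sketch: throttle *new* sessions at a rate that shrinks as backend load
# rises, while already-admitted (sticky) sessions always pass. Hypothetical names;
# not an F5 API. Load is assumed to arrive as 0.0 (idle) .. 1.0 (saturated).
import time
from typing import Callable, Optional


class SessionThrottle:
    def __init__(self, base_rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.base_rate = base_rate  # new sessions per second when the backend is idle
        self.rate = base_rate       # current allowance, lowered by set_load()
        self.burst = burst          # maximum tokens (short bursts of new sessions)
        self.tokens = burst
        self.clock = clock
        self.last = clock()
        self.sessions = set()       # IDs of admitted sessions (expire these in practice)

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_load(self, load: float) -> None:
        """Scale the new-session rate down linearly as reported load rises."""
        self._refill()              # credit time elapsed so far at the old rate
        load = min(max(load, 0.0), 1.0)
        self.rate = self.base_rate * (1.0 - load)

    def end_session(self, session_id: str) -> None:
        self.sessions.discard(session_id)

    def allow(self, session_id: Optional[str]) -> bool:
        """Return True if this request may be sent to the backend."""
        if session_id is not None and session_id in self.sessions:
            return True             # existing sticky session: not throttled here
        self._refill()
        if self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        if session_id is not None:
            self.sessions.add(session_id)
        return True
```

Falling back to per-connection throttling is the same code with `session_id=None` for every request, since each such call consumes a token.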

  • Thanks nitass. I read the link you sent, did some more digging, and found this article, which has an interesting formula for how the performance indicator is calculated:

    http://support.f5.com/kb/en-us/solu...l9125.html

    These talk about load balancing within a pool/cluster, but not about throttling back the entire pool/cluster as I described above. Do you think this would help with throttling an entire pool/cluster?

    Thanks,

    Dusty

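  • "Do you think this would help address throttling an entire pool/cluster?"

    As I understand it, ratio applies at the node (server) level, so it changes how traffic is split between members rather than how much reaches the pool.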
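To illustrate the point: if ratio balancing is modelled simply as each member receiving a share of new connections proportional to its ratio (a simplification, not BIG-IP's internal algorithm), then lowering every member's ratio by the same factor leaves the split unchanged, and lowering one member's ratio only moves traffic to the others. The member names and ratio values below are made up.

```python
# Simplified model of ratio load balancing: each member's share of new
# connections is its ratio divided by the sum of all ratios.
from fractions import Fraction


def ratio_shares(ratios: dict) -> dict:
    """Fraction of new connections each member receives under ratio weighting."""
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("at least one member needs a positive ratio")
    return {member: Fraction(r, total) for member, r in ratios.items()}


normal = {"web1": 4, "web2": 4, "web3": 4}
everyone_busy = {"web1": 1, "web2": 1, "web3": 1}  # every member lowered its ratio
one_busy = {"web1": 1, "web2": 4, "web3": 4}

print(ratio_shares(normal) == ratio_shares(everyone_busy))  # True: same split
print(ratio_shares(one_busy)["web1"])  # 1/9: only the busy member loses share
```

The total amount of traffic reaching the pool is the same in all three cases, which is why ratio on its own cannot throttle a whole cluster.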
  • No, what I suggested doesn't cover all your requirements. However, if you'd like to avoid writing a complex iRule, I'd still suggest querying a status page generated elsewhere.

    An iRule using iStats (to count connections from each tenant) and sideband connections (to read in CPU load etc.) is an option, but I'm afraid it's a big job that I don't have time for. Could you perhaps make use of F5 Services, a consultancy, or similar to help you here?
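The iStats idea above is essentially per-tenant bookkeeping. Below is a model of it in Python rather than an iRule: count each tenant's open connections and, once the backend reports high load, refuse new connections from any tenant already holding an equal share of a fixed capacity. The class name, the equal-share rule, and the capacity figure are hypothetical; existing connections are never cut, only new ones refused.

```python
# Python model of per-tenant connection counting (the job iStats would do in an
# iRule). Hypothetical sketch: the class, the equal-share rule and the capacity
# figure are assumptions. Existing connections are never cut; only new ones refused.
from collections import Counter


class TenantLimiter:
    def __init__(self, capacity: int):
        self.capacity = capacity  # total connections the cluster should carry
        self.active = Counter()   # tenant -> currently open connections
        self.load_high = False    # set from a status page / sideband read

    def tenant_cap(self) -> int:
        """Per-tenant allowance: the whole capacity normally, an equal share under high load."""
        if not self.load_high:
            return self.capacity
        return max(self.capacity // max(len(self.active), 1), 1)

    def open(self, tenant: str) -> bool:
        """Admit a new connection for `tenant`, or refuse it."""
        if sum(self.active.values()) >= self.capacity:
            return False
        if self.active[tenant] >= self.tenant_cap():
            return False
        self.active[tenant] += 1
        return True

    def close(self, tenant: str) -> None:
        """Record that one of `tenant`'s connections has closed."""
        if self.active[tenant] <= 1:
            self.active.pop(tenant, None)
        else:
            self.active[tenant] -= 1
```

The "bad guy" tenant that grabbed most of the capacity while load was normal keeps its open connections, but under high load it cannot open more until it drops below its share, while quieter tenants still get in.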
  • Thanks all, here's another pass at this. Suppose I had a page on each server that indicated whether it was "too busy", and I used an external monitor or sideband connection to read that page and flag the node as too busy. Past a certain point (especially with unexpected load), all the nodes will report that they are too busy.

    Once that happens, with all nodes in a cluster reporting too busy:

    a) Would the F5 automatically start throttling? I doubt it.

    b) How/where could I configure the F5 to start throttling?

    BTW, I'm not the one who would be implementing this, but we have people in house who would be. They can reach out to professional services if they can't do it themselves, but I'm hoping to steer them in a good direction.

    THANKS!
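On (a): with plain up/down monitoring, every member reporting too busy means every member is marked down, which shuts out all new traffic rather than throttling it. A sketch of the alternative, assuming the per-member busy flags are collected somewhere (for example from the status pages discussed in this thread): turn the fraction of busy members into the share of new sessions to admit, with a hypothetical floor so the service never closes completely. The function names and the 0.2 floor are illustrative, not F5 settings.

```python
# Sketch: map per-member "too busy" flags to a cluster-wide admission fraction
# for new sessions. Names and the 0.2 floor are hypothetical, not F5 settings.
import random


def admit_fraction(member_busy: dict, floor: float = 0.2) -> float:
    """Share of new sessions to admit: 1.0 when nobody is busy, `floor` when everyone is."""
    if not member_busy:
        raise ValueError("pool has no members")
    busy_share = sum(1 for busy in member_busy.values() if busy) / len(member_busy)
    return max(1.0 - busy_share, floor)


def admit_new_session(fraction: float, rng: random.Random) -> bool:
    """Randomly admit roughly `fraction` of new sessions."""
    return rng.random() < fraction
```

Requests belonging to sessions that were already admitted would normally skip this check, so sticky sessions keep working while only newcomers are turned away.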
  • "Let's suppose I had a page on each server that could indicate if it was 'too busy'. And I used an external monitor or sideband connection to read that page, and indicate if the node is too busy. Now, after a point (especially when we're talking about unexpected load), all the nodes will indicate they are too busy."

    As I understand it, a health monitor normally does not change the load balancing ratio (dynamic ratio load balancing is the exception); it marks the pool member up or down.
  • Hendry_Chandra_
    Historic F5 Account

    Hi All,

    I have similar requirements, but instead of CPU load, I need to adjust the LB ratio based on a different SNMP OID (namely the router's current bandwidth utilization). Does anyone know whether snmp_dca or snmp_dca_base can be tweaked to meet these requirements, i.e. to modify the OID and also the formula for the dynamic ratio value?

    Thanks.
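A sketch of the calculation a customised monitor would need, assuming utilization is derived from two samples of a 64-bit octet counter (such as IF-MIB's ifHCOutOctets) and the interface speed. The mapping from utilization to ratio (linear, from 1 up to a hypothetical `max_ratio` of 10) is an assumption for illustration and is not the snmp_dca/snmp_dca_base formula; the SNMP polling itself is left out.

```python
# Sketch: turn two samples of a 64-bit interface octet counter into a
# utilization figure and then into a dynamic ratio value. The linear mapping
# and max_ratio are hypothetical; this is not the snmp_dca formula.
COUNTER64_WRAP = 2 ** 64


def utilization(prev_octets: int, curr_octets: int,
                interval_s: float, speed_bps: int) -> float:
    """Fraction of link capacity used between two counter samples (0.0 to 1.0)."""
    delta = (curr_octets - prev_octets) % COUNTER64_WRAP  # tolerate one counter wrap
    used = delta * 8 / (interval_s * speed_bps)
    return min(used, 1.0)


def ratio_from_utilization(util: float, max_ratio: int = 10) -> int:
    """Busier links get lower ratios; never return less than 1."""
    return max(1, round(max_ratio * (1.0 - util)))
```

For example, 62,500,000 octets sent in 10 seconds on a 100 Mbit/s link is 50% utilization, which this mapping turns into a ratio of 5.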