Forum Discussion

auyantepui_1695
Sep 09, 2014

Optimal timeout and interval settings for Apache health monitor

We have a farm of Apache 2.2 web servers load balanced by an F5. Currently, the monitor timeout is set to 30 seconds (I believe this is the default value).

With this configuration, if a web server has an outage, it could take up to 30 seconds for the F5 to detect that the server is unavailable and stop sending requests to that pool member. During that window, end users sent to that web server will receive an error.

Are there recommended timeout and interval settings for the health monitor that minimize this downtime in a highly available environment?

Appreciate any feedback.

Regards,

  • BinaryCanary_19
    Historic F5 Account

    I don't think there's a better option than lowering those values at the moment. I think an interval of 5 seconds and a timeout of 16 seconds is the default for most monitors. Assuming your servers respond quickly enough and you don't have too many monitors (many thousands), you can probably get away with interval=2, timeout=7. Keep in mind that if you set the values too low, the F5 may still process the monitors in a timely manner, but you will get unexpected timeouts whenever the server itself is too busy to respond within the timeout.
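
    To put numbers on the settings discussed here, the sketch below uses a simplified model (an assumption, not F5's exact algorithm): a member is marked down once `timeout` seconds pass without a successful probe response, so a server that dies right after a good probe is detected after at most `timeout` seconds. The class name `MonitorTiming` and the 1000-member pool are hypothetical, chosen only to show how a shorter interval raises the probe load.

```python
import math
from dataclasses import dataclass


@dataclass
class MonitorTiming:
    interval: float  # seconds between probes
    timeout: float   # seconds without a successful response before markdown

    def worst_case_detection(self) -> float:
        # Simplified model: the server fails just after a successful probe,
        # so it is marked down `timeout` seconds later.
        return self.timeout

    def failed_probes_before_markdown(self) -> int:
        # Probes sent inside the timeout window that all go unanswered.
        return math.floor(self.timeout / self.interval)

    def probes_per_second(self, members: int) -> float:
        # Monitoring load: one probe per pool member per interval.
        return members / self.interval


# Hypothetical pool of 1000 monitored members.
for name, m in [("default", MonitorTiming(5, 16)), ("aggressive", MonitorTiming(2, 7))]:
    print(name, m.worst_case_detection(), m.failed_probes_before_markdown(),
          m.probes_per_second(1000))
```

    Both settings tolerate three missed probes; the aggressive one detects an outage faster but sends 2.5 times as many probes.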

    Basically, it's a double-edged sword: lowering the values gives you faster failure detection at the risk of more false negatives (healthy servers being marked down).
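
    A rough way to see both edges of the sword, again under a simplified model (an assumption): a real outage is detected within `timeout` seconds, and any temporary stall of a healthy but busy server that lasts longer than `timeout` is wrongly treated as an outage. The stall durations below are made-up sample values, not measurements.

```python
from typing import Iterable, Tuple


def tradeoff(timeout: float, stalls: Iterable[float]) -> Tuple[float, int]:
    """Return (worst-case detection time, false markdowns) for one timeout.

    Simplified model: an outage is detected after at most `timeout` seconds,
    and every stall longer than `timeout` causes a false markdown.
    """
    false_markdowns = sum(1 for s in stalls if s > timeout)
    return timeout, false_markdowns


# Hypothetical stall durations (seconds) of a busy but healthy server.
stalls = [1, 3, 8, 12, 20]
for t in (16, 7):
    detection, wrong = tradeoff(t, stalls)
    print(f"timeout={t}s: outage detected within {detection}s, "
          f"{wrong} false markdown(s)")
```

    With this sample, dropping the timeout from 16 to 7 seconds more than halves detection time but triples the false markdowns.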

  • Thanks for the feedback. We were thinking along similar lines: the F5 should detect a web server being down within 5 seconds. However, our current F5 administrator states that the default values are the recommended configuration.

    Is there a guideline or whitepaper on how to configure health monitors?

    Appreciate any additional information you can provide.

    Regards,

  • BinaryCanary_19
    Historic F5 Account

    Some enterprising people have built solutions based on this iRule event: https://devcentral.f5.com/wiki/iRules.lb_failed.ashx

    I won't be able to write such an iRule for you, but here is the idea: for your critical apps, use that event to detect when a connection sent to a particular pool member fails, and then select a new member instead of propagating the failure back to the client. Use a counter so you don't fall into infinite repetition: if all pool members are down, a naive iRule will keep trying to select a new member until one becomes available. Limit reselection attempts to maybe 3, and then propagate the failure back to the client.
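
    The reselect-with-a-counter logic can be simulated outside the BIG-IP. The sketch below is Python, not an iRule. In a real iRule this would live in the LB_FAILED event, I believe typically using LB::reselect. Here `pick_member`, `connect`, and the member names are hypothetical. Skipping members that have already failed is an extra design choice beyond what the post describes.

```python
import random
from typing import Callable, Optional, Sequence, Set

MAX_RESELECTS = 3  # hypothetical limit, following the "maybe 3" suggestion


def pick_member(
    members: Sequence[str],
    connect: Callable[[str], bool],
    max_reselects: int = MAX_RESELECTS,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the member that accepted the connection, or None on failure.

    One initial selection plus at most `max_reselects` reselections,
    never retrying a member that has already failed.
    """
    if rng is None:
        rng = random.Random()
    tried: Set[str] = set()
    for _ in range(1 + max_reselects):  # initial pick + reselections
        candidates = [m for m in members if m not in tried]
        if not candidates:
            break  # every member has failed; stop instead of looping forever
        member = rng.choice(candidates)
        if connect(member):
            return member
        tried.add(member)
    return None  # give up and propagate the failure to the client


down = {"web2"}  # hypothetical: web2 has just crashed
chosen = pick_member(["web1", "web2", "web3"], lambda m: m not in down,
                     rng=random.Random(0))
print(chosen)
```

    The counter bounds the work per request: even with every member down, the client gets an error after at most four connection attempts.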

    If your requirements justify the effort, enjoy the challenge.

  • Thanks to you both for your feedback. According to the documentation, the recommended settings for HTTP monitors are 5/16 (interval/timeout, in seconds), not 30/91 as initially suggested.
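
    Every interval/timeout pair mentioned in this thread (5/16, 2/7, 30/91) follows the same pattern, timeout = 3 × interval + 1. I believe this matches F5's commonly cited guidance, but treat it as a rule of thumb. The hypothetical helper `interval_for_detection` works backwards from a target detection time, such as the 5-second goal mentioned earlier. Keep in mind the earlier warning that very short timeouts penalize busy servers.

```python
def recommended_timeout(interval: int) -> int:
    # Rule of thumb (assumption): tolerate three missed probes plus one second.
    return 3 * interval + 1


def interval_for_detection(target_seconds: int) -> int:
    # Largest whole-second interval whose 3*interval+1 timeout fits the target.
    interval = (target_seconds - 1) // 3
    if interval < 1:
        raise ValueError("target too small for a 1-second interval")
    return interval


# Every interval/timeout pair mentioned in this thread fits the pattern.
for interval, timeout in [(5, 16), (2, 7), (30, 91)]:
    assert recommended_timeout(interval) == timeout

# The 5-second detection goal: interval 1, timeout 4.
i = interval_for_detection(5)
print(i, recommended_timeout(i))  # 1 4
```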

    Regards,