F5 and a pool of webservers - what settings/algorithm to use
We have a pool of IIS-based webserver nodes behind our F5 LTM (BIG-IP 9.4.5 Build 1049.10 Final).
The F5 terminates SSL, and the VIP uses a custom profile based on the http profile.
We use the least connections algorithm to distribute HTTP requests from users' browsers to the webservers.
We thought this would be enough to protect individual webservers against a high number of requests executing.
During normal hours, each webserver has ~20 requests executing concurrently.
Looking at the F5, there are ~325 active connections for each webserver in the pool. This matches what I see when I run netstat on the webserver: ~325 connections to client browsers (we don't use SNAT), plus ~94 more connections to the F5 itself (all in TIME_WAIT).
Before we saw this, we were under the impression that the number of connections corresponded to the number of requests our webserver is actually processing at a time, but it seems we misunderstood.
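A rough way to see why connections and executing requests differ is Little's law (average number in a system = arrival rate × average time spent in it), applied once to requests and once to TCP connections. The sketch below uses hypothetical rates and durations, chosen only so the results land near the ~20 requests and ~325 connections observed above; the function name `estimate_concurrency` is illustrative. The point is that a keep-alive connection lives for all its requests plus its idle time, so the connection count can be an order of magnitude larger than the number of requests executing.

```python
# Minimal sketch with HYPOTHETICAL numbers: compare requests executing vs.
# connections open on one webserver using Little's law (L = lambda * W).


def estimate_concurrency(arrival_rate_per_s: float, time_in_system_s: float) -> float:
    """Little's law: average number in the system = arrival rate * time in system."""
    return arrival_rate_per_s * time_in_system_s


# Hypothetical workload for one webserver (not measured values):
requests_per_s = 100.0         # requests arriving at the webserver
avg_request_time_s = 0.2       # time IIS spends executing one request
new_connections_per_s = 20.0   # fewer new connections than requests, thanks to keep-alive
avg_connection_life_s = 16.0   # time spent serving requests plus idle keep-alive time

executing = estimate_concurrency(requests_per_s, avg_request_time_s)
open_connections = estimate_concurrency(new_connections_per_s, avg_connection_life_s)

print(f"requests executing: {executing:.0f}")
print(f"connections open:   {open_connections:.0f}")
```

Under these assumptions, the connection count is driven mostly by how long connections sit idle, which is also why shortening an idle timeout would shrink it roughly in proportion.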
So when a webserver goes bad for any reason and the number of concurrent requests executing on it rises to, say, 100, the F5 still balances by least connections (which stay at ~325?) and keeps sending requests to that webserver as usual. Its threads pile up quickly, and eventually we have to take the webserver out of the pool.
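The effect described above can be illustrated with a toy model that picks a pool member by open connections versus by requests executing. This is a simplified sketch, not F5's implementation; the member names, counts, and the `pick_least_executing` alternative are hypothetical. When most connections are idle keep-alives, 100 stuck requests barely change a server's connection count, so it need not stand out and can even look least busy.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    open_connections: int    # what a least-connections balancer counts
    executing_requests: int  # what actually reflects load on the server


def pick_least_connections(pool: list[Member]) -> Member:
    # Fewest open TCP connections, whether busy or idle in keep-alive.
    return min(pool, key=lambda m: m.open_connections)


def pick_least_executing(pool: list[Member]) -> Member:
    # Hypothetical alternative: fewest requests currently executing.
    return min(pool, key=lambda m: m.executing_requests)


# Hypothetical snapshot: web3 is the "bad" server with requests piling up.
pool = [
    Member("web1", open_connections=330, executing_requests=20),
    Member("web2", open_connections=328, executing_requests=22),
    Member("web3", open_connections=322, executing_requests=100),
]

print(pick_least_connections(pool).name)  # web3
print(pick_least_executing(pool).name)    # web1
```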
Ideally, we would like the F5 to stop sending requests to a webserver based on the number of requests executing on it, because the situation would then self-correct: the slowness is caused by some external resource (e.g. cache or DB) that eventually frees up, as long as we don't add too many threads.
We thought of the following options:
1. Use the Observed algorithm, so that the F5 routes by both open connections and speed (which would definitely get slower on the affected webserver).
2. Use a dynamic ratio with a custom WMI monitor for requests executing on webserver.
3. Reduce some TCP timeout setting on the F5 so that the number of connections more closely matches the number of requests executing. I am guessing this will come at a cost.
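The idea behind option 2 can be sketched as weighted selection where each server's weight shrinks as its requests executing grow. The formula here (spare capacity out of a hypothetical limit of 100 concurrent requests, floored at 1) and the names `ratio_weights` and `pick_by_ratio` are assumptions for illustration only; they are not how the F5 dynamic ratio or its WMI monitor computes ratios.

```python
import random


def ratio_weights(executing: dict[str, int], capacity: int = 100) -> dict[str, int]:
    """Hypothetical ratio: spare capacity left on each server.

    Floored at 1 so an overloaded server still gets a trickle of traffic
    instead of dropping out of rotation entirely.
    """
    return {name: max(capacity - n, 1) for name, n in executing.items()}


def pick_by_ratio(weights: dict[str, int], rng: random.Random) -> str:
    # Weighted random choice proportional to each server's ratio.
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical requests-executing readings from each webserver.
executing = {"web1": 20, "web2": 22, "web3": 100}
weights = ratio_weights(executing)
print(weights)  # {'web1': 80, 'web2': 78, 'web3': 1}

rng = random.Random(42)
picks = [pick_by_ratio(weights, rng) for _ in range(10_000)]
print({name: picks.count(name) for name in weights})
```

With these weights, web3 receives roughly 1 in 159 new requests until its backlog drains.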
Option 2 is not something we want to do right now, due to the work involved and the dependency on WMI on the webservers.
Any advice on this would be appreciated.