Forum Discussion

Funkdaddy
Apr 15, 2020

Pool setting: "Action On Service Down" set to "Reject" for loaded Pool - bad?

We have a Pool (HTTP) that hosts thousands of connections at any given moment, and the pool has "Action On Service Down" (link to F5 article below) set to "Reject". We've had a few incidents where most of the pool was impacted after a single server had an issue. I cannot be sure, but I believe part of the problem is that once the first pool member goes down, the RSTs sent back to the clients cause them all to re-send their requests at the same time, which begins to overwhelm the other servers in the pool.
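
As a rough way to size the suspected reconnect burst, here is a minimal Python sketch. It assumes connections are spread evenly across pool members and that every reset client retries immediately; neither is confirmed for this environment, and `reconnect_burst` and the numbers used are purely hypothetical.

```python
def reconnect_burst(members: int, conns_per_member: int) -> float:
    """Extra simultaneous reconnects each surviving member receives
    when every connection on one failed member is reset at once.

    Assumes an even spread of connections and immediate client retries.
    """
    if members < 2:
        raise ValueError("need at least two pool members")
    survivors = members - 1
    return conns_per_member / survivors


# Hypothetical pool: 4 members carrying 3000 connections each.
print(reconnect_burst(4, 3000))  # 1000.0 extra new connections per survivor
```

The fewer members remain, the larger the share of the burst each one absorbs, which is consistent with one failure cascading into several.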

 

This config has been around since it was the default (v4???) and I'm wondering if it's time for a change.

 

I notice the default in recent versions for "Action On Service Down" is "None", and I'm wondering if that might be a better choice for our situation, possibly allowing the servers to recover. Our loads are very "peaky", and perhaps we would be better served by giving the servers more time to recover from a barrage of requests instead of immediately sending those requests to another server.

 

So questions:

Am I characterizing this correctly and do my assumptions seem sound?

What are the drawbacks of using "None" instead of "Reject"?

How do I know how long a request or connection will remain on the DOWN-ed server if it doesn't come back up? Is that a TCP timeout?

Anything else I'm missing?

 

Thanks,

-Funkdaddy

 

https://devcentral.f5.com/s/articles/ltm-action-on-service-down

2 Replies

  • > Am I characterizing this correctly and do my assumptions seem sound?

     

    Pretty much.

     

    > What drawbacks to using "None" over using "Reject"?

     

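    None leaves existing connections in place, so the pool member may still service a connection even though the monitor has marked it down. This may work. It may not. If it does not, None will be slower than Reject, because clients wait for a timeout instead of receiving an immediate RST.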

     

    > How do I know how long a request or connection will remain on the DOWN-ed server if it doesn't come back up? Is that a TCP timeout?

     

    It will be the TCP Retransmission Timeout, not the TCP Idle Timeout.

     

    > Anything else I'm missing?

     

    Not really. Just don't use Reselect unless you are load balancing stateless routers.

  • Thank you, Simon!

     

    Is the TCP Retransmission Timeout the same as "Initial Retransmission Timeout Base Multiplier for SYN Retransmission" on the TCP Profile?
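
To get a feel for how long a connection could linger on a down member under "None", the retransmission timeout mentioned above can be estimated with a small Python sketch. It assumes the common TCP behaviour of doubling the retransmission timeout after each unanswered attempt, up to a cap. The function name, the initial timeout, the retransmission count, and the cap are all hypothetical inputs, not BIG-IP defaults; the real values come from the TCP profile in use.

```python
def time_until_give_up(initial_rto: float,
                       max_retransmits: int,
                       max_rto: float = 60.0) -> float:
    """Seconds from the first send until the sender gives up.

    Assumes exponential backoff: the wait doubles after every
    unanswered attempt and never exceeds max_rto.
    """
    total = 0.0
    rto = initial_rto
    # The original send plus each retransmission waits one RTO.
    for _ in range(max_retransmits + 1):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


# Hypothetical values: 1 s initial RTO, 3 retransmissions.
print(time_until_give_up(1.0, 3))  # 15.0  (1 + 2 + 4 + 8)
# Hypothetical values: 1 s initial RTO, 8 retransmissions, 60 s cap.
print(time_until_give_up(1.0, 8))  # 243.0
```

Under these assumptions a handful of retransmissions already adds up to minutes rather than seconds, which is the trade-off against the immediate RST that "Reject" sends.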