facing an interesting issue
version - BIG-IP 220.127.116.11 Build 0.0.8 Point Release 6
iApp > pool > iRule: if traffic matches the iRule's URI rules, it's redirected to a "subpool"
"subpool" consists of 2 node/member f5 pool with 120 minute persistance, oneconnect multiplex and least connections lb with 2 linux virtual servers behind it - can provide more detail lmk what's required
the issue: whenever one member server in the "subpool" is rebooted (often during slow periods in the maintenance window, usually by an automated process), application processing time on that server goes from a few milliseconds to 2-3 seconds and it hardly gets any connections, while the server that has not been rebooted processes normally and takes most of the connections. The only workaround we have so far is rebooting both servers at once. The issue does not self-remediate with time; we waited about a day (24 hours) to see if it would. I have yet to try modifying the persistence profile/timeout etc. since this is happening in a production environment and I can't change things like that on the fly while the issue is occurring
has anyone seen an issue like this? what's the best way of going about troubleshooting?
we suspect the persistence rule; aside from possibly deleting all the persistence records while the issue is happening, how can I determine whether it's the cause or not?
Hi @jonnyf5, the 120 minute persistence is very likely the culprit. When one of the two servers goes down, the remaining one takes all connections while the other is out of service, and it retains those clients once the other is back up, as long as the clients keep hitting that virtual within the 2hr persist timeout (which is not absolute; it resets on each connection).
I'd need to know more of the "why" of your scenario before making recommendations, but off the top of my head, you can likely use an iRule to track the two servers' LB::status state, and then when there is a transition from down to up, issue a persist none on connections persisted to the member that was active solo.
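Something along these lines, as a rough, untested sketch; the pool name (subpool), member address/port, and table key are all placeholders you'd replace with your own:

```tcl
when HTTP_REQUEST {
    # Query current state of the member that was rebooted
    # (10.0.0.1:80 is a placeholder for your actual member)
    set m1_state [LB::status pool subpool member 10.0.0.1 80]

    # Compare against the last state we saw, stored in the session table,
    # to detect a down -> up transition
    set prev [table lookup subpool_m1_state]
    if { $m1_state eq "up" && $prev eq "down" } {
        # Member just came back: drop this client's persistence record
        # so least-connections can rebalance it
        persist none
    }

    # Remember the state for the next request (no expiry)
    table set subpool_m1_state $m1_state indefinite indefinite
}
```

You'd want the symmetric logic for the second member as well, and note the state check runs per-request, so each returning client gets rebalanced the first time it connects after the transition.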
Another approach you can explore is using the PERSIST_DOWN event to add any clients making a request to the server that's down that had a persistence entry to a table, and then you can on future requests do a table lookup against the client IP and if present, AND the server is back up, send them over.
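A rough sketch of that second approach, again untested and with placeholder names (the "displaced" subtable, the subpool pool name, and the member address/port are all assumptions):

```tcl
when PERSIST_DOWN {
    # The member this client was persisted to is down; remember the
    # client IP for up to 2 hours so we can steer it back later
    table set -subtable displaced [IP::client_addr] 1 7200
}

when HTTP_REQUEST {
    # If this client was displaced while a member was down, and that
    # member is back up, clear the stale persistence record so the
    # client can be re-balanced
    if { [table lookup -subtable displaced [IP::client_addr]] ne "" } {
        if { [LB::status pool subpool member 10.0.0.1 80] eq "up" } {
            persist none
            table delete -subtable displaced [IP::client_addr]
        }
    }
}
```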
sorry for my delayed reply, I didn't see your responses until today
Thanks much for the information and suggestions you provided so far; let me know what details you need to fully understand the issue and make further suggestions and I can provide them.
On a somewhat unrelated issue: I keep trying to grab the statistics for the lb subpool from the LTM web GUI and it just hangs, then displays a "disconnected" message and reconnects, but the stats never display. I was trying to grab the statistics so I can see how the connection values change over time, especially right after either server is rebooted.