
What is the 'correct' way to retry a request on each pool member after a 404?


I am attempting to modify this F5 example so that a request to a virtual server that results in a 404 response is retried on every pool member.


My setup:

A virtual server with the iRule below applied to it. The pool associated with the virtual server has 2 member nodes and a load balancing method of 'Round Robin'. One member node is 'good' (responds with a 200 to the test request) and the other member node is 'bad' (responds with a 404).


Here are the relevant logs from the iRule when I issued a single request from my computer to the virtual server.


696168:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <HTTP_REQUEST>: Saving HTTP request headers: GET /images/cool-image.svg HTTP/1.1 Host: client_ip User-Agent: curl/7.47.0 Accept: */* X-Forwarded-For:
696169:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen ip_bad_server on retry 0
696170:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <HTTP_RESPONSE>: server ip_bad_server returned 404. Retry 0 out of 2
696171:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen ip_good_server on retry 1
696172:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen after reselect on retry 1
696173:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen ip_bad_server on retry 1
696174:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen after reselect on retry 1
696176:Nov 4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <HTTP_RESPONSE>: server returned 404. Retry 1 out of 2


Here is the iRule that generated these logs

# Retry requests to the virtual server's default pool if the server responds with a 404
when CLIENT_ACCEPTED {
   # On each new TCP connection track that we have not retried a request yet
   set retries 0
   # Save the name of the virtual server default pool
   set default_pool [LB::server pool]
}
when HTTP_REQUEST {
   # We only want to retry GET requests to avoid having to collect POST payloads
   # Only save the request headers if this is not a retried request
   if { [HTTP::method] eq "GET" && $retries == 0 } {
      set request_headers [HTTP::request]
      log local0. "Saving HTTP request headers: $request_headers"
   }
}
when LB_SELECTED {
   # Select a new pool member from the VS default pool if we are retrying this request
   log local0. "server chosen [LB::server addr] on retry $retries"
   if { $retries > 0 } {
      LB::reselect pool $default_pool
      log local0. "server chosen [LB::server addr] after reselect on retry $retries"
   }
}
when HTTP_RESPONSE {
   # Check for server errors
   log local0. "server [LB::server addr] returned [HTTP::status]. Retry $retries out of [active_members $default_pool]"
   if { [HTTP::status] == "404" } {
      # Server error, retry the request if we have not already retried more times than there are pool members
      incr retries
      if { $retries < [active_members $default_pool] } {
         # Retry this request
         HTTP::retry $request_headers
         # Exit this event from this iRule so we do not reset retries to 0
         return
      }
   }
   # If we are still in the rule we are not retrying this request
   set retries 0
}


So this is the order of events that appears to be happening:

1) We get a request and save the headers

2) A server is chosen based on LB algorithm (in this case the bad server is chosen)

3) We detect a 404 in HTTP_RESPONSE event and invoke HTTP::retry

4) We enter LB_SELECTED again because HTTP::retry re-triggers HTTP_REQUEST and all subsequent events. The log shows that the good server has been selected.

5) Because the variable retries is greater than 0 we invoke LB::reselect. Notice that after LB::reselect the log no longer shows the selected server's address ([LB::server addr]). I am guessing this is because LB::reselect does not take effect instantly and likely clears properties in the LB object.

6) LB_SELECTED is entered again and this time logging shows we now have the bad server again.

7) The retries variable is still greater than 0, so we again enter the conditional block that invokes LB::reselect. The server address is missing from the log once more, which suggests the command was run.

8) We get another 404, increment the retries variable, and then stop retrying (the conditional no longer passes).


So in my test setup every request ends in a 404 because we keep landing on the bad server. This seems to be caused by HTTP::retry already triggering a new load-balancing selection, followed by the iRule explicitly telling the F5 to reselect again using the same algorithm. If that is the case, then with Round Robin being the default LB algorithm the example seems misleading, as I expected this iRule to "try all members in a pool". In testing I can get that behaviour by simply removing the LB::reselect command, but then I am confused as to why it is in the example iRule for HTTP::retry at all. Relying on the LB algorithm also leaves an edge case: an unrelated request could advance the Round Robin position between my retries, so blindly retrying could still 'miss' the 'good' server.
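To make the suspected double selection concrete, here is a toy model in plain Tcl (runnable with tclsh, not an iRule). It assumes a strict round robin over two members and that LB::reselect simply advances the rotation one more step; both are assumptions about my reading of the logs, not confirmed BIG-IP behaviour. The proc names next_member and request_with_retry are made up for illustration.

```tcl
# Toy round-robin pool. Hypothetical model only; it does not reproduce
# BIG-IP internals, it just advances a cursor on every selection.
set members {ip_bad_server ip_good_server}
set cursor 0

# Return the member under the cursor and advance the cursor by one.
proc next_member {} {
    global members cursor
    set picked [lindex $members $cursor]
    set cursor [expr {($cursor + 1) % [llength $members]}]
    return $picked
}

# Serve one request plus one retry, starting from the bad server.
# selections_per_retry is the number of load-balancing decisions made
# before the retried request is sent: 1 models HTTP::retry alone,
# 2 models HTTP::retry followed by LB::reselect.
proc request_with_retry {selections_per_retry} {
    global cursor
    set cursor 0
    set first [next_member]
    set retry ""
    for {set i 0} {$i < $selections_per_retry} {incr i} {
        set retry [next_member]
    }
    return [list $first $retry]
}

puts "one selection per retry:  [request_with_retry 1]"
puts "two selections per retry: [request_with_retry 2]"
```

Under these assumptions a single selection per retry reaches ip_good_server, while two selections per retry wrap around to ip_bad_server again. That matches what I see with and without LB::reselect when the pool has exactly two members.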


The loss of the server address is also strange, as is the fact that LB::reselect triggers the LB_SELECTED event, which in turn appears to call LB::reselect again, yet this does not result in an infinite loop.


I am looking for some validation of my assumptions and my understanding of the order of events, and for suggestions on the correct way to guarantee that all member nodes are retried. Right now the best solution I have is to blindly issue retries and set the number of retries high enough that I will likely find the right server.
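One alternative I am considering is to stop relying on the LB algorithm for retries and instead remember which members have already been tried, then pin each retry to an untried member. Below is an untested sketch of that idea. It assumes that active_members -list returns "address port" pairs formatted the same way as [LB::server addr] and [LB::server port] (route domain suffixes could break this), and that a pool ... member command issued in HTTP_REQUEST on a retried request overrides the normal load-balancing decision. The variables tried and next_member are names I made up.

```tcl
when CLIENT_ACCEPTED {
   # Save the name of the virtual server default pool
   set default_pool [LB::server pool]
   # Members already tried for the current request (hypothetical variable)
   set tried [list]
}
when HTTP_REQUEST {
   if { [info exists next_member] } {
      # Retried request: pin it to the untried member picked in HTTP_RESPONSE
      pool $default_pool member [lindex $next_member 0] [lindex $next_member 1]
      unset next_member
   } else {
      # Fresh request: forget the members tried for the previous request
      set tried [list]
      if { [HTTP::method] eq "GET" } {
         set request_headers [HTTP::request]
      } elseif { [info exists request_headers] } {
         # Never replay a stale GET in place of a non-GET request
         unset request_headers
      }
   }
}
when LB_SELECTED {
   # Record the member this attempt is being sent to
   lappend tried "[LB::server addr] [LB::server port]"
}
when HTTP_RESPONSE {
   if { [HTTP::status] == 404 && [info exists request_headers] } {
      # Retry on the first active member that has not been tried yet
      foreach member [active_members -list $default_pool] {
         if { [lsearch -exact $tried $member] == -1 } {
            set next_member $member
            HTTP::retry $request_headers
            return
         }
      }
   }
   # No untried member left (or no 404): the response goes to the client
}
```

If those assumptions hold, each member would be tried at most once per request regardless of where the Round Robin position happens to be, and the retries stop once every active member has returned a 404.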