Forum Discussion
mahnsc
Nimbostratus
Jan 13, 2009
Selectively Disabling Keep-Alives
This is probably going to be another odd-ball post but there is a good reason for it.
We have a site with a recently discovered bug that sends our app servers into full garbage collection mode for very, very long periods of time when specific customer conditions are met. While reproducing the problem and investigating further, we've learned that after a period of 5 minutes (and yes, people really do wait that long on this site for a response), the BIG-IP issues a connection reset and the browser then retransmits the POST. Five minutes later there is another reset followed, in some cases, by another retransmit. This appears to be standard browser behavior.
I don't want to disable keep-alives wholesale--I am wondering if there is a way to disable keep-alives on POSTs using an iRule. But I'm more concerned with whether this is something I shouldn't even think about attempting.
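For reference, the kind of iRule I have in mind would look something like the following. This is just an untested sketch (the flag name is purely illustrative): it flags POST requests and marks the matching response with Connection: close so the browser doesn't try to reuse the connection.

when HTTP_REQUEST {
    # Remember whether this request was a POST (illustrative flag name)
    set close_after_post [expr {[HTTP::method] eq "POST"}]
}
when HTTP_RESPONSE {
    if { $close_after_post } {
        # Ask the client not to keep the connection alive after this response
        HTTP::header replace Connection close
    }
}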
11 Replies
- hoolio
Cirrostratus
I thought most browsers would show 'page cannot be displayed' or some other error when they receive a RST. I don't think there is an automatic retry.
What criteria would you want to use for disabling keep-alives? I would guess this wouldn't work as expected, because the client would establish a new TCP connection to make a new request after receiving a RST on a past request.
Can you explain more on what you think the problem is and how you'd like to try to fix it?
Thanks,
Aaron
- mahnsc
Nimbostratus
So far I've only been able to find two links that describe the same symptom; however, I'm seeing it not only in IE6 but also in IE7 and Firefox 3.0.5:
http://www.experts-exchange.com/Software/Internet_Email/Web_Browsers/Q_20915496.html
http://www.coderanch.com/t/68471/BEAWeblogic/Resends-same-request-after-minutes
- hoolio
Cirrostratus
What web/app servers are you using? Are you load balancing the client to web connections as well as the web to app connections using two separate VIPs?
Is the client retrying the request or is the application automatically retrying the request? The second link you provided suggests the app is resending the request (not the browser). They suggest a solution for WebLogic:
By default, WLIOTimeoutSecs for the WebLogic Plugin (which you are using for IPlanet) is configured to 300 seconds. After 300 seconds (5 minutes) the plugin will interpret the request as being "hung" and try to send another. This is what is causing your multiple requests.
Your options are...
1) Up the WLIOTimeoutSecs value.
2) Change the paradigm you are using so that a request doesn't take 5 minutes. This is generally a bad user experience anyways. For example, you could execute the logic asynchronously and return immediately to the user with a message to either check back later for results or email them with a link to the results.
I'd suggest you try to determine exactly what the failure is before trying to solve it with an iRule or changes to the application. You can use a browser plugin like HttpFox for Firefox or Fiddler for IE to see what the client is sending, and you can use an iRule to log the request/response headers. It might also help to enable debug logging on the application and check the logs there as well. If the systems are in production, you might want to enable this logging only during a maintenance window.
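Something along these lines would dump the headers on each request and response (a rough sketch only, and you'd want to scope it to a test virtual server):

when HTTP_REQUEST {
    # Log the request line and each request header
    log local0. "[IP::client_addr]: [HTTP::method] [HTTP::uri]"
    foreach name [HTTP::header names] {
        log local0. "[IP::client_addr]: $name: [HTTP::header value $name]"
    }
}
when HTTP_RESPONSE {
    # Log the status code and each response header
    log local0. "[IP::client_addr]: status [HTTP::status]"
    foreach name [HTTP::header names] {
        log local0. "[IP::client_addr]: $name: [HTTP::header value $name]"
    }
}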
Aaron
- mahnsc
Nimbostratus
BigIP >> Two Apache Servers >> mod_jk >> 5 JBoss Application Servers
We did use HttpFox and HttpWatch under Firefox and Internet Explorer respectively, and the retransmit doesn't show up in either utility. However, the tcpdump between the browser and the BigIP shows a retransmit, and the mod_jk logs show that a second POST to the same resource was performed even though I did not refresh the browser or resubmit the transaction manually. HttpFox shows the initial POST but then nothing until 10 minutes later, when the connection is reset the second time.
An iRule logging request and response headers isn't going to help because there is no response; the app server is hung. The request headers for the first POST are:
POST /xxxx/xx/createDocLink?templateId=6005F8056AA0EB1CE0409E0AE8125ECE&visibility=Everyone HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer:
Cookie: JSESSIONID=
Content-Type: multipart/form-data; boundary=---------------------------41184676334
Content-Length: 282
In my trace, this POST occurs at 16:33:34.322991. At 16:38:34.790165, the BigIP issues the connection reset. At 16:38:34.951729, the browser sends the POST again. At 16:43:35.189957, the BigIP issues another connection reset. **At no time is a response other than a connection reset ever received.**
The Request Headers for the retransmitted POST are:
POST /xxxx/xx/createDocLink?templateId=6005F8056AA0EB1CE0409E0AE8125ECE&visibility=Everyone HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer:
Cookie:
Content-Type: multipart/form-data; boundary=---------------------------41184676334
Content-Length: 282
Thanks for taking the time to think about this!
- mahnsc
Nimbostratus
If this helps at all, here is what I've gleaned from the RFC for HTTP/1.1:
8.2.4 Client Behavior if Server Prematurely Closes Connection
If an HTTP/1.1 client sends a request which includes a request body, but which does not include an Expect request-header field with the "100-continue" expectation, and if the client is not directly connected to an HTTP/1.1 origin server, and if the client sees the connection close before receiving any status from the server, the client SHOULD retry the request. If the client does retry this request, it MAY use the following "binary exponential backoff" algorithm to be assured of obtaining a reliable response:
1. Initiate a new connection to the server
2. Transmit the request-headers
3. Initialize a variable R to the estimated round-trip time to the server (e.g., based on the time it took to establish the connection), or to a constant value of 5 seconds if the round-trip time is not available.
4. Compute T = R * (2**N), where N is the number of previous retries of this request.
5. Wait either for an error response from the server, or for T seconds (whichever comes first)
6. If no error response is received, after T seconds transmit the body of the request.
7. If client sees that the connection is closed prematurely, repeat from step 1 until the request is accepted, an error response is received, or the user becomes impatient and terminates the retry process.
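As a quick illustration of the arithmetic in step 4 (a throwaway Tcl snippet, assuming the client fell back to the constant R = 5 seconds because no round-trip estimate was available):

# Illustrative only: how long the client waits before resending the
# request body on each successive retry, per RFC 2616 section 8.2.4
set R 5
foreach N {0 1 2 3} {
    set T [expr {$R * int(pow(2, $N))}]
    puts "retry $N: wait up to $T seconds before transmitting the body"
}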
What I'm trying to figure out is whether there is a way to prevent the client from retrying the request for POST methods.
- mahnsc
Nimbostratus
So the subject of this post should probably be renamed to "Selectively Disabling Retransmissions".
- hoolio
Cirrostratus
Interesting... I hadn't read that part of the RFC before. I'm surprised that neither HttpFox nor Fiddler would show the retry. Are you running tcpdump on the client or on the LTM? Do you see the retried request on the client VLAN? If you run a capture on the client with a tool like Wireshark, do you actually see the request being sent from the client?
Another thing to check is if you have the pool's 'Action on Service Down' set to Reselect. I'm not sure whether this action would be taken if the pool member was marked down by a monitor or if it's also used when the pool member takes longer than the idle timeout to respond. If the latter was true and you had Reselect enabled, LTM would resend the request to a new pool member.
I still don't remember ever seeing a browser automatically retry a request after receiving a RST. If that is indeed what is happening here, is the problem then that the server takes more than the LTM's idle timeout (default of 5 minutes) to start sending the response, so the client retries the request and eats up server resources? If so, can you test by extending the idle timeout on the client TCP profile to a bit longer than the time you expect the server to take to respond? Tweaking the idle timeout might be more ideal than blocking the client's subsequent retries. If modifying the TCP profile fixes the problem, you could consider using an iRule to dynamically extend the idle timeout for these specific types of requests. You can do this using IP::idle_timeout: http://devcentral.f5.com/wiki/default.aspx/iRules/ip__idle_timeout

when HTTP_REQUEST {
    if {$some_condition == 1}{
        log local0. "original timeout: [IP::idle_timeout]"
        IP::idle_timeout 1801
        log local0. "updated timeout: [IP::idle_timeout]"
    }
}
when SERVER_CONNECTED {
    log local0. "original timeout: [IP::idle_timeout]"
    if {$update_serverside_idle_timeout}{
        IP::idle_timeout 1802
        log local0. "updated timeout: [IP::idle_timeout]"
    }
}
Aaron
- mahnsc
Nimbostratus
tcpdump was run on the LTM. We were sniffing traffic between "everything" and the LTM as well as traffic between the LTM and the web servers. "Everything" is in quotes because we were testing against an identically configured stack rather than our production site. (Same LTM, different web and app servers, although the servers in the lab are a bit beefier than production. We have a swing environment set up as two identical silos of servers: identical numbers of web servers, app servers, database servers, etc., and we use shell scripts on the LTM to detach and attach pools to point traffic at a particular silo, but that's another long story.)
At the same time we ran the trace, the mod_jk log level on the web servers was set to 'debug'. While my colleague was running the tcpdump, I was watching HttpFox on my local machine as well as tailing the logs on the web servers, so I neglected to also run Wireshark locally.
On our test bed, when the problem occurs, the transaction never completes. The nature of this particular bug places so much data into the JVM's memory that the JVM may very well perform full garbage collections forever. In production, given the volume of traffic we have, this bug ultimately results in the web servers hitting MaxClients and no longer being able to service any kind of request. The weird thing about this particular problem, which is outside the scope of this thread but is in scope for a ticket with JBoss, is that even though only 1 of the 6 app servers is in full GC mode, threads on all application servers quickly pile up until we're out of threads on all app servers, the web servers are at MaxClients, and users are down. It's a nasty bug to say the least!
- mahnsc
Nimbostratus
Oh, one other thing. Since we're seeing the browser retransmit these POSTs after a period of time, we're thinking that these retransmissions are hastening the demise of the site. Not the root cause, but if you have several hundred users retransmitting every 5 minutes, we run out of threads faster than we can assemble the right people on a call to deal with it.
- hoolio
Cirrostratus
Do you have any way to differentiate between the initial request and the second request? I suppose you could track POST requests by client IP, and maybe user-agent plus URI, using the session table. You could remove the session table entry for that client if/when the response is received. If the client already has an unanswered request pending and sends another one from the same IP with the same user-agent to the same URI, you could send back a 503 or some other response. You would want to set a timeout when adding the session table entry so that the client would be able to make another request after the timeout expires.
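A very rough sketch of that idea, assuming a version with the table command (older versions would use the session command instead) and treating the key format, the 600-second timeout, and the 503 text as placeholders:

when HTTP_REQUEST {
    if { [HTTP::method] eq "POST" } {
        # Hypothetical key: client IP + user-agent + URI
        set pending_key "[IP::client_addr]|[HTTP::header User-Agent]|[HTTP::uri]"
        if { [table lookup $pending_key] ne "" } {
            # Same client already has this POST outstanding; reject the retry
            HTTP::respond 503 content "Your request is still being processed."
            return
        }
        # Remember the pending POST; expire it so the client isn't locked out forever
        table set $pending_key pending 600
    }
}
when HTTP_RESPONSE {
    if { [info exists pending_key] } {
        # The server finally answered; allow future POSTs from this client
        table delete $pending_key
        unset pending_key
    }
}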
Otherwise, would extending the idle timeout on earlier requests help give the server more time to answer them and prevent the client from retrying over and over again?
Aaron