Virtual sends [RST, ACK]

Hi,

We recently replaced Cisco CSS with F5 BigIP, but we now have a problem in our production environment under load which we did not notice in our acceptance environment.

We have a virtual configured, and on this virtual we have an iRule that does pool selection based on the requested URI.

What I see in the tcpdumps I took is that on an existing TCP session there are several GET and POST requests for which the identical GET or POST message is also sent to a pool member, as you would expect. But sometimes an HTTP GET or POST request arrives on an open client TCP connection and I do not see the request being sent to a pool member. I do see the F5 first responding with an ACK, and then ~5 seconds later a [RST, ACK] is sent back to the client, terminating the TCP session.

The URI requested should match one of the URIs in the iRule, and I don't have any pool member down messages in the ltm log. Some of the pools are used in combination with SNAT automap; the URIs with problems are mostly for pools without SNAT.

I have seen a couple of posts reporting the same behaviour, but haven't seen a solution. Does anybody know what can cause this behaviour and how it can be solved?

The problem looks similar to the one posted at: http://devcentral.f5.com/Community/...ault.aspxv

13 Replies

hoolio
Cirrostratus
Jan 16, 2012
Hi Remco,

That poster resolved their issue by adding more IPs to a SNAT pool. Do you see any errors like inet port exhaustion in the /var/log/ltm file when the failures occur?

Aaron
Remco
Nimbostratus
Jan 16, 2012
Hi Aaron,

We are only using SNAT automap on a couple of pools; the HTTP requests that are answered with the [RST, ACK] are for the pools without SNAT configured. I also have no messages at all in /var/log/ltm.

We are using a ONECONNECT profile on this virtual, otherwise the URI load balancing does not work correctly.

I will post our config tomorrow (I don't have access to the box at the moment).

Remco
nitass
Employee
Jan 17, 2012
Is nothing in this SOL related at all?

sol9812: Overview of BIG-IP TCP RST behavior

http://support.f5.com/kb/en-us/solutions/public/9000/800/sol9812.html
Remco
Nimbostratus
Jan 17, 2012
Hi Nitass,

I have also seen this SOL, but I have not seen anything in the log files that points to any of the reasons explained in it.

This is the relevant Virtual configuration:

virtual vip-p_ {
   destination a.b.c.d:http
   ip protocol tcp
   rules irule-p__urllb
   persist prof_pers-p_-itc
   profiles {
      prof_HTTP_generic {}
      prof_ONECONNECT_generic {}
      prof_TCP_pool-p_ {}
   }
}

profile persist prof_pers-p_ {
   defaults from cookie
   mode cookie
   timeout immediate
}

profile http prof_HTTP_generic {
   defaults from http
}

profile oneconnect prof_ONECONNECT_generic {
   defaults from oneconnect
   source mask 255.255.255.255
}

profile tcp prof_TCP_pool-p_ {
   defaults from tcp
   idle timeout 160
}

rule irule-p__urllb {
   when HTTP_REQUEST {
      switch -glob [HTTP::uri] {
         "/app1*" {
            pool pool-p__app1
            persist cookie insert 111
         }
         "/app2*" {
            pool pool-p__app2
            persist cookie insert 222
         }
         "/app3*" {
            pool pool-p__app3
            persist none
         }
         "/app4*" {
            pool pool-p__app4
            persist none
            snat automap
         }
         default {
            reject
         }
      }
   }
}
nitass
Employee
Jan 17, 2012
Did the URI really match a switch condition in the iRule?
Remco
Nimbostratus
Jan 17, 2012
I am also looking into the possibility that the reset was sent because the request did not match any of the URIs in the switch statement. I have checked the tcpdump and at first glance I only see requests that should match, but I will take a closer look.

Since the problem happened in the production environment, the business decided to roll back to the CSS. We are now going to try to replicate the problem in our acceptance environment by increasing the test load. When this is ready I will add log statements to the default action to see if there are any hits there.
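
The logging mentioned above could look roughly like this. It is only a sketch, not the rule that was actually deployed: the message format and the local0 facility are assumptions, and only the first and last pool cases from the posted config are repeated here (the "/app2*" and "/app3*" cases would stay as they are).

```tcl
when HTTP_REQUEST {
    switch -glob [HTTP::uri] {
        "/app1*" {
            pool pool-p__app1
            persist cookie insert 111
        }
        "/app4*" {
            pool pool-p__app4
            persist none
            snat automap
        }
        default {
            # Assumed log format: record who sent the request and which URI
            # fell through, so hits on the default action show up in /var/log/ltm.
            log local0. "no URI match: client=[IP::client_addr]:[TCP::client_port] uri=[HTTP::uri]"
            reject
        }
    }
}
```

If the resets come from the default action, each one should leave a matching line in /var/log/ltm; if nothing is logged while resets still occur, the cause lies elsewhere.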
Remco
Nimbostratus
Jan 24, 2012
Finally managed to replicate our problem in the acceptance environment.

What I have found is that once 'snat automap' is used for one of the pools in the iRule, all later requests in the same TCP session also appear to have SNAT enabled, even for pools where it is not configured.

Since there is a firewall between the F5 and the pool members, the TCP sessions where SNAT is incorrectly used are dropped by the firewall, because it is configured to allow traffic only from specific clients and from the self-IPs of the F5 (health monitors). After its SYN on the server side goes unanswered, the F5 sends the [RST, ACK] on the client side.

My first idea was to explicitly disable SNAT for all other pools by adding 'snat none', but this did not make a difference.

Does anybody have an idea how to limit SNAT so that it is only used for the pools where it is required?
Hamish
Cirrocumulus
Jan 24, 2012
I think you'd need to do an LB::reselect pool if the request wasn't the first one in the tcp connection... I'm not certain that it's defined what happens otherwise (In fact I'm surprised it doesn't just fail).

Oh.. And a 'snat none' to turn off any SNAT'ing that was performed previously.

H
Remco
Nimbostratus
Jan 24, 2012
Just tried another option and it solved the problem.

What I did was enable 'snat automap' on the virtual, and in the iRule use either 'snat none' or 'snat automap' for the different pools.

The strange thing is that with snat automap enabled on the virtual, I still needed to explicitly activate it for the pools where SNAT is required.
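
For anyone finding this later, the iRule side of that change might look like the sketch below. The exact rule was not posted, so this is a reconstruction from the description: which pools get 'snat none' is inferred from the original config (only "/app4*" used SNAT), and it assumes 'snat automap' has also been enabled on the virtual itself.

```tcl
when HTTP_REQUEST {
    switch -glob [HTTP::uri] {
        "/app1*" {
            # Explicitly turn SNAT off so an earlier request on the same
            # client connection cannot leave it enabled.
            snat none
            pool pool-p__app1
            persist cookie insert 111
        }
        "/app2*" {
            snat none
            pool pool-p__app2
            persist cookie insert 222
        }
        "/app3*" {
            snat none
            pool pool-p__app3
            persist none
        }
        "/app4*" {
            # Only this pool should be reached via the F5 self-IP.
            snat automap
            pool pool-p__app4
            persist none
        }
        default {
            reject
        }
    }
}
```

The point of the design is that every case sets the SNAT state explicitly, so the decision is made per request instead of being inherited from whatever an earlier request on the same connection selected.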
Hamish
Cirrocumulus
Jan 24, 2012
The flag on the pool allows or disallows... Not enable or disable...

H