
Web services become inaccessible after failover

f5mkuDefault
Cirrus

Hi experts,

I would like some help with an issue we have been trying to resolve for several months. Even F5 TAC has not been able to find the root cause so far.

We have an F5 BIG-IP that runs LTM and ASM. Each virtual server is assigned a WAF policy.

Our big problem is that whenever we fail over from the active unit to the standby unit, all websites become inaccessible for up to an hour or more. Some come back up in 30 minutes, most within an hour, and some after nearly 2 hours. The symptom is the same when we fail back to the original active unit.

Can anyone advise what issue we could be facing here?

We are using version 12.1.3.3 Build 0.0.1 Point Release 3.

Thanks a lot in advance,

9 REPLIES

PeteWhite
F5 Employee

There is a lot to unpack here - does the use of ASM make a difference? Do you have MAC masquerading configured? Are the ARP tables being updated after GARP? Do pools go down? Do you see client-side traffic hitting the newly active BIG-IP?

Hi Pete, in some incidents removing the ASM policy resolves the issue. For example, we fail over and web pages do not load; we remove the ASM policy and the pages load immediately. In other instances, however, this does not help for some websites. Yes, the ARP tables are updated to point to the new active self IP. The pools never went down, and we see clients hitting the new active BIG-IP.

During recent troubleshooting, F5 support found that client traffic makes it all the way to the real server. However, the return traffic between the F5 and the firewall keeps bouncing: the F5 keeps sending but gets no reply from the firewall. The firewall team tells us there is no issue with the firewall.

I would first start with tracepath to see whether traffic even reaches the self IP of the box, and to confirm whether it is going to the active box or the standby box.

Then look at the connection table (show sys connection) on both boxes to check where the traffic is landing.

Then check the packets to see what is happening...

Hi Jaikumar, we have already checked these but were not able to find the root cause. We asked for an RMA because we suspect it could be a resource issue, but F5 refused, and after months we still don't have clarity.

PeteWhite
F5 Employee

Have you checked the ARP tables on the firewall? It is common for firewalls to drop GARP packets because of the risk of ARP cache poisoning attacks. That is, the request may be going through the BIG-IP, through the firewall and to the server, but when the response gets back to the firewall, the firewall sends it to the standby BIG-IP. Recovery then depends on the firewall's ARP cache entry timing out (which explains the time variance) before the firewall sends an ARP request and learns the MAC address of the correct BIG-IP. Worth taking a look (or configuring MAC masquerading on the traffic group, which is the best solution).
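
To make the ARP check concrete, here is a minimal Python sketch that compares what a firewall has cached against the MAC address of the unit that is now active. It assumes the firewall's ARP table can be exported in Linux `arp -an` style; your firewall's output format will probably differ, so the regular expression is an assumption, and all IP and MAC addresses below are made-up illustration values.

```python
import re

# Assumed input format: Linux-style `arp -an` lines, e.g.
# "? (192.0.2.10) at 00:01:d7:aa:aa:01 [ether] on eth1"
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9A-Fa-f:]{17})")


def parse_arp_table(text):
    """Return {ip: mac} (MACs lowercased) from `arp -an`-style output."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = match.group("mac").lower()
    return table


def find_stale_entries(arp_text, expected):
    """Return {ip: (cached_mac, expected_mac)} where the cache disagrees.

    `expected` maps each virtual/floating IP to the MAC address of the
    unit that is active after the failover.
    """
    cached = parse_arp_table(arp_text)
    stale = {}
    for ip, want in expected.items():
        have = cached.get(ip)
        if have is not None and have != want.lower():
            stale[ip] = (have, want.lower())
    return stale


# Hypothetical data: 192.0.2.10 still points at the old (now standby) unit.
SAMPLE = """\
? (192.0.2.10) at 00:01:d7:aa:aa:01 [ether] on eth1
? (192.0.2.11) at 00:01:d7:bb:bb:02 [ether] on eth1
"""
EXPECTED = {
    "192.0.2.10": "00:01:d7:bb:bb:02",
    "192.0.2.11": "00:01:d7:bb:bb:02",
}

if __name__ == "__main__":
    for ip, (have, want) in find_stale_entries(SAMPLE, EXPECTED).items():
        print(f"{ip}: firewall has {have}, active unit is {want}")
```

Any address this reports right after a failover is one where the firewall ignored the GARP and is still forwarding return traffic to the standby unit.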

f5mkuDefault
Cirrus

Just an update: F5 is currently attributing this to a firewall problem... no closure yet.

This does sound like a firewall problem. For example, when a failover occurs, existing TCP connections are not recognised by the newly active appliance (unless connection mirroring is enabled for the virtual server). This results in a large number of TCP RSTs being sent to all the servers and clients. I've seen a "nextgen" firewall see the large number of RSTs from the BIG-IP and treat it as a port scan.
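
As a rough illustration of why a burst of RSTs can look like an attack, here is a toy Python model of a rate-based detector. It is not any vendor's actual detection logic; the function name, the one-second window, and the threshold of 100 are all assumptions chosen for the example.

```python
from collections import defaultdict


def flag_rst_bursts(events, window=1.0, threshold=100):
    """Return the set of sources that sent more than `threshold` TCP RSTs
    within any `window` seconds.

    `events` is an iterable of (timestamp_seconds, source_ip) pairs, one
    per observed RST. Window and threshold are hypothetical values.
    """
    by_source = defaultdict(list)
    for ts, src in events:
        by_source[src].append(ts)

    flagged = set()
    for src, stamps in by_source.items():
        stamps.sort()
        start = 0  # left edge of the sliding window
        for end, ts in enumerate(stamps):
            # Shrink the window until it spans at most `window` seconds.
            while ts - stamps[start] > window:
                start += 1
            if end - start + 1 > threshold:
                flagged.add(src)
                break
    return flagged


if __name__ == "__main__":
    # Made-up traffic: a client sending an occasional RST, and a BIG-IP
    # resetting 500 unknown connections within half a second of failover.
    normal = [(i * 2.0, "198.51.100.5") for i in range(5)]
    failover = [(100 + i * 0.001, "192.0.2.10") for i in range(500)]
    print(flag_rst_bursts(normal + failover))
```

In this model the newly active unit is the only source flagged, which matches the behaviour described above: the resets are legitimate, but their rate is what the detector reacts to.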

I agree with eey0re: you could test with F5 connection mirroring and MAC masquerading, and during a failover the firewall team needs to check the security and DDoS logs.

f5mkuDefault
Cirrus

Confirmed, it's the firewall. Thanks, everyone.