What are the steps to take in case of an unforeseen 'disaster' to F5 devices?
What can we do if, for example, the F5 device (F5 i5800 or C2400) and its peer are damaged at the same time?
Is bypassing the F5 device possible? How?
Is there anything that can replace the F5 device just to make sure the network traffic flows smoothly and we can still remotely access the other devices connected to it?
This is difficult to answer, as it can vary a lot depending on your setup. Can you give more context?
So, you have a cluster of 2 appliances.
What are you using them for? Do they host business-critical applications that resolve to F5 IPs? If so, when both units are down the applications will be unavailable, and there's no easy way to bypass the F5 because all traffic would have to be routed some other way.
There are external appliances that can act as a "bypass": they monitor the links to the F5 devices and, when they see those links go down, forward traffic downstream on another link. In that case the destination will still be the F5 IPs, so you need a disaster plan that might involve updating DNS entries.
A workaround might be to configure the F5 virtual servers with the same IPs as the back-end servers, so that with a bypass device in place something still responds on those IPs. But again, with so little context it's difficult to say.
Another option that comes to mind, and the most straightforward one, would be upgrading your cluster to three devices. If application availability is so critical that you need to plan for a double fault, that might well justify the budget for more redundancy.
Hello, and thanks for your feedback.
What external appliances that can act like a "bypass" and monitor links to F5 devices are you referring to? (i.e. switches, routers, firewalls)
Can an F5 L3 switch or router be used to replace both of the damaged F5 devices?
Or could you suggest any other way to solve the issue aside from RMA?
I'm referring to TAP bypass appliances, for example those from Garland.
This works pretty well when there is no NAT and your F5 is proxying something using the same IP address as the service, for example if you run a WAF. In this case, when the F5 is down the TAP forwards traffic downstream and your routing forwards it to the destination without packet loss.
Again, I think we need better context. For example, if you rely on NAT and the virtual address IPs exist only on the F5, a TAP won't really do much, and you still need to plan where to send this traffic when a disaster occurs.
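To make the distinction above concrete, here is a minimal Python sketch that sorts virtual servers into "would still answer behind a TAP bypass" and "needs a separate DR plan". The `VirtualServer` record, its fields, and the addresses are all hypothetical and only illustrate the rule of thumb (no NAT, and the virtual address also lives on a back-end server); this is not an F5 API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VirtualServer:
    """Hypothetical inventory record, not an F5 object."""
    name: str
    vip: str                      # address clients connect to
    backend_ips: Tuple[str, ...]  # addresses of the real servers
    uses_nat: bool                # True if the F5 translates addresses


def survives_bypass(vs: VirtualServer) -> bool:
    # Rule of thumb from the thread: a bypass only helps when nothing is
    # translated and some back-end server owns the same IP as the virtual server.
    return (not vs.uses_nat) and vs.vip in vs.backend_ips


def split_by_bypass(servers):
    """Return (names that keep working, names that need a DR plan)."""
    ok = [vs.name for vs in servers if survives_bypass(vs)]
    needs_plan = [vs.name for vs in servers if not survives_bypass(vs)]
    return ok, needs_plan


# Example inventory using documentation-range addresses (assumed values).
inventory = [
    VirtualServer("waf-transparent", "192.0.2.10", ("192.0.2.10",), False),
    VirtualServer("web-lb", "192.0.2.20", ("198.51.100.1", "198.51.100.2"), True),
]
ok, needs_plan = split_by_bypass(inventory)
```

Anything that ends up in `needs_plan` is traffic for which you would still have to decide on a destination, for example through DNS changes.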
@lttarvina In the unlikely event that both F5 devices in your HA cluster fail at exactly the same time, and they are configured to be in path, the only device that can realistically be swapped in just to let routed traffic flow again would be a router or an L3 switch configured to route between the two or more subnets that the F5 was routing between. If you already have a router in front of and behind the F5s, I would imagine you could add a path that goes around the F5s and use SLA monitors to switch routes to the bypass when the F5 floating IP stops responding to the SLA probe.
This seems like an edge case to solve for, because any HA pair in which both devices fail simultaneously is most likely part of a larger issue in that region, and at least one of the two failed devices would need to be replaced anyway. You might consider keeping a single spare F5 of each device type as a backup, so that you can restore the configuration onto it, put it into production, and wait on the RMA for the failed devices.
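As a rough illustration of the SLA-monitor idea, here is a minimal Python sketch of the decision logic only: count consecutive failed probes against the F5 floating IP and switch the next hop to the bypass path once a threshold is reached. The class name, the threshold of 3, and the addresses are assumptions for the example; on a real router this would be done with the vendor's own tracking or SLA feature, not with a script like this.

```python
from dataclasses import dataclass


@dataclass
class RouteSelector:
    """Hypothetical model of an SLA-tracked route, not a vendor API."""
    primary_next_hop: str    # F5 floating self IP (assumed value below)
    bypass_next_hop: str     # router or L3 switch path around the F5s
    fail_threshold: int = 3  # consecutive failed probes before failing over
    _failures: int = 0

    def observe(self, probe_ok: bool) -> str:
        """Record one probe result and return the next hop to use."""
        if probe_ok:
            self._failures = 0   # any success restores the primary path
        else:
            self._failures += 1
        if self._failures >= self.fail_threshold:
            return self.bypass_next_hop
        return self.primary_next_hop


# Example run with documentation-range addresses (assumed values).
selector = RouteSelector("192.0.2.10", "192.0.2.254")
probe_results = [True, False, False, False, True]
chosen = [selector.observe(result) for result in probe_results]
```

Requiring several consecutive failures avoids flapping to the bypass on a single lost probe. Keep in mind that the bypass path only routes traffic; it does not provide the virtual servers, NAT, or other services the F5s were delivering.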