on 06-Jul-2023 05:00 - edited on 10-Jul-2023 15:13 by Rebecca_Moloney
BIG-IP connection mirroring is not supported in public cloud environments. Cloud Failover Extension (CFE) supports failover between BIG-IP devices, and persistence mirroring will work, but connection mirroring will not. This article discusses a customer case where we explored what happens if we try.
I don't see it often anymore, but some F5 customers use connection mirroring to provide High Availability (HA) for applications with long-lived network connections, such as telnet or FTP. Typically, connection mirroring is not required for short-lived connections like HTTP and UDP. Because modern applications tend to be stateless, and their network connections are often resilient to network-layer failures (e.g., HTTP), I'm rarely asked about connection mirroring in public cloud.
Note: persistence mirroring is different than connection mirroring. Persistence mirroring will work in public cloud. If you're unsure of the difference, I found this explanation fairly helpful.
Despite connection mirroring being unsupported in public cloud, I had a customer ask me if they could test connection mirroring in AWS so they could explain to their management the reasons behind things like cloud architectures, failover planning, and expectations of support.
A failover diagram across two AZs in AWS. Failover between devices is supported, but connection mirroring is not.
Let me be clear: connection mirroring is not supported in public cloud. Do not plan to use it there. This article is intended to satisfy your curiosity and answer "why not", rather than "how to".
If you deploy F5 BIG-IP VE in public cloud and configure connection mirroring following the typical setup guidelines, you might still see mirrored connections on your standby device with the command "show sys connection type mirror". You may wonder to yourself, "what would happen if I enable connection mirroring on a Virtual Server, and attempt a failover?"
Naturally, my customer and I were curious to see if we could even test connection mirroring to watch it fail in public cloud.
When we read this FAQ from the Cloud Failover Extension (CFE) documentation, the support stance made sense. There are many reasons that connection mirroring will not work in a public cloud. In rough order, from what I consider the biggest show-stopper to the smallest, here are some (but not all) reasons:
To summarize, there's much to cloud networking that we as users cannot see or control.
As engineers, sometimes our instinct is to push back: I know it's unsupported and I understand why, but can I test it myself? Naturally, my customer and I wanted to push further, and we had read Jeff Giroux's article about HA in cloud, which made us curious. My customer was using AWS route-based failover with CFE, which typically completes in only a few seconds.
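To make "route-based failover" a little more concrete, here is a rough Python sketch of what the failover action amounts to: repointing the route for the VIP range at a network interface on the newly active device. This is an illustration only, not CFE's actual code. It assumes a boto3 EC2 client is passed in (replace_route is a real boto3 call); the function name and every ID in the usage comment are hypothetical.

```python
def move_route_to_active(ec2_client, route_table_id, vip_cidr, active_eni_id):
    """Point the route for vip_cidr at the ENI of the newly active device.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    params = {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": vip_cidr,
        "NetworkInterfaceId": active_eni_id,
    }
    # Swap the next hop of the existing route to the new active device.
    ec2_client.replace_route(**params)
    return params


# Hypothetical usage (all IDs are made up):
#   import boto3
#   move_route_to_active(boto3.client("ec2"), "rtb-0123", "10.99.0.0/24", "eni-0abc")
```

Notice that an update like this only changes where packets are sent. It carries no connection state, which is why a fast failover by itself says nothing about whether existing connections survive.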
First, we had to enable connection mirroring on the VE image they had used, which came straight from our CloudFormation Templates (CFT), specifically this one. To do this, we had to:
After setting this up and creating a Virtual Server with the "Connection Mirroring" checkbox checked under the "Advanced" configuration (the default is unchecked), our test was not successful. I tested with an SSH session using PuTTY, but my SSH session was dropped at the time of device failover.
My client was an Ubuntu VM in the external VLAN of the BIG-IP. The SSH session was established to a VIP whose IP came from an alien range, and I used AWS route-based failover to move this IP address between the Active and Standby BIG-IPs, which were in different AZs. The backend pool member was another Ubuntu VM to which my SSH connection was proxied.
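If you want to repeat this kind of test with something more measurable than an interactive SSH session, a small probe can hold one TCP connection open and report when it dies. The Python sketch below is an assumption-laden illustration, not what we ran: it assumes the pool member runs a simple TCP echo service, and the function name, parameters, and example address are all hypothetical.

```python
import socket
import time


def watch_connection(host, port, interval=1.0, max_probes=None, timeout=5.0):
    """Hold ONE TCP connection open and report how long it survived.

    Sends one byte every `interval` seconds and expects the peer to echo it.
    Returns (probes_ok, seconds_alive, reason).
    """
    start = time.monotonic()
    probes_ok = 0
    reason = "stopped"  # only kept if max_probes is reached
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while max_probes is None or probes_ok < max_probes:
            try:
                sock.sendall(b".")
                if sock.recv(1) == b"":
                    reason = "closed by peer"
                    break
            except OSError as exc:  # covers timeouts and connection resets
                reason = f"error: {exc}"
                break
            probes_ok += 1
            time.sleep(interval)
    return probes_ok, time.monotonic() - start, reason


# Hypothetical usage: start the probe against the VIP, then trigger a failover.
#   print(watch_connection("10.99.0.10", 7))
```

Because the probe never reconnects, any failure it reports is the original connection breaking, which is exactly what connection mirroring is supposed to prevent.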
So, even without destination NAT'ing, without any 3rd party security devices, and using the bare minimum of failover time, we've shown that connection mirroring doesn't work in the public cloud.
I have heard anecdotal stories of connection mirroring half-working sometimes in public cloud. It has never worked for me, but I did hear from a customer who said that only some of their EPIC EHR connections dropped when they did similar testing. Because some connections still dropped, they did not successfully test connection mirroring in any kind of reliable way.
I'll leave it there for now, but if you have other test scenarios you can think of, let me know in the comments!
Do not plan to use connection mirroring in public cloud. It is unsupported. But I hope this article has helped you think through some of the architectures and implications of planning for HA in public cloud. Thanks for reading!
K84303332: Overview of connection and persistence mirroring (13.x - 16.x)
Thanks for the writeup @MichaelOLeary.
I will add that there are some other things to think about when looking to use connection-mirroring (TL;DR most times it's unnecessary):
# Send up to 10 keep-alive messages, one every 60 seconds
ServerAliveInterval = 60
ServerAliveCountMax = 10
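As a rough worked example of what those two settings buy you: an idle SSH client gives up on an unresponsive server after approximately ServerAliveInterval multiplied by ServerAliveCountMax seconds. The helper below is just that arithmetic in Python; the function name is made up for illustration.

```python
def ssh_keepalive_window(server_alive_interval, server_alive_count_max):
    """Approximate seconds an idle SSH client waits on an unresponsive server
    before disconnecting (interval multiplied by allowed unanswered probes)."""
    return server_alive_interval * server_alive_count_max


print(ssh_keepalive_window(60, 10))  # 600
```

With the values above, that is roughly 10 minutes, far longer than a failover that takes a few seconds.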
Having said all that, if you do have a fastL4 wildcard routing-type Virtual Server, *most* protocols tend to be fine unless it's something in the middle of a transaction (e.g., a database write).
I've helped customers deploy BIG-IP in Carrier Grade NAT scenarios (which are similar in some respects to a cloud-based environment) so they can "seamlessly" fail over or reboot devices, leaving subscribers generally unaware that their Internet is down: https://www.youtube.com/watch?v=hsb0OtqO_AM&list=PL5jC9WagzrjExq85JuWQHSUm9PegO3JmR&index=16
Thanks a lot for the comments and the link to the video, @shsingh. Like you said, it's nice to remember that ensuring your fastL4 servers can pass flows in flight, using the loose-initiation and loose-close settings, is a good practice. I should have written that in the article. Also, like you said, SSH, RDP, and other protocols can have configurable keep-alives.
So if you're reading these comments and want to dive in further on your particular issue, leave a comment and/or reach out to us. Thanks for reading!