05-Jun-2022 18:00 - edited 08-Jun-2022 13:21
Having watched enterprises make the transition to modern apps in the cloud over the past 11+ years here at F5, one thing is abundantly clear: GSLB (Global Server Load Balancing), as a whole, is not well suited for the modern apps paradigm.
I know it's a bold statement, but if you're skeptical, as I was 6 years ago, please ask anyone who has built a cloud. Ask someone who has built a 5G infrastructure. These are truly massive applications by today's standards, but it needs to become clear to all of us that this is the way. Modern apps, like the people who develop them, will not wait. They move forward. They will only get wider.
At the beginning of the cloud journey, F5 saw people use the same tools they've always used, which truly work better than any other tool out there for data-center-centric applications. Our slide decks for this transition plan showed this method in very familiar ways as well.
After years of helping our customers journey out to the cloud, we (technology manufacturers) started to learn some things about GSLB. It has finite scale. Some vendors have vastly more scale than others (OG GTM flex there, sorry!). Having built service provider DNS infrastructures for 12+ years of my career, I have not seen another product that can go as wide as F5 DNS in the GSLB market. I digress..
In addition, developers use the cloud VERY differently than they used the traditional data centers. If you're an application developer, is it best to keep your application locked down to 2 sites? Nothing burns a "classically trained" app-dev like a 2 data center outage, in my experience. Once developers started realizing that they could have 6 or so VPCs (AWS) or VNETs (Azure) for that same application, we saw devs able to defeat their biggest headache with ease. Unfortunately, however, this led to an adjustment in our Transition Plan slide..
Solving one issue for developers ends up creating a thing of nightmares for network engineers. The biggest issue here is health. If you know the guts of gtmd, you understand that there are only so many health monitor objects that can be remembered. Even though F5 DNS can select Prober Pools to optimize these checks, health is synchronized using iQuery. The object number is synchronized. This is not a problem that is unique to F5 DNS. It's an industry-wide issue that has yet to be resolved by any GSLB vendor, to date.. even the "cloudy" ones.
Speaking of "cloudy" ones, doesn't F5 have a resource for this in the Distributed Cloud (XC)? Well.. GSLB is coming to XC, but I can assure you that it will be seen 5 years from now as a transitional approach. XC uses anycast to scale application ingress. If the idea of this scares you, no need. The internet has been using it since 1989, the IETF documented it in RFC 1546 (an Informational RFC) in 1993, and it was put to use for scaling the root nameservers in 2001. It's 2022 as I'm writing this article. Trust me.. it's all good.
Let's look at LB for a minute to understand how XC differs from the rest of the portfolio in terms of how it load balances. To do that, let's first peek at some typifications of product:
Wait.. What does that mean with XC? If you heard that the cloud was just someone else's computer, you heard wrong. A cloud is a new way to design and deliver services. To truly understand it, you must understand serverless app concepts, migration of container workloads, messaging, resource scheduling.. SO many things that are not present on 'someone else's computer.' All of these concepts exist on thousands of other machines that all work in chorus to bring us the applications we need every day.
So how does this affect XC? A Customer Edge Node, or simply "CE," is not a load balancing appliance or even a virtual edition. It is an SDN router. The distinction here is substantial. One of the primary differences is that an SDN router is a software construct that connects software objects. It routes by object name, not just by IP. Your routes can be like, "I want to access 10.3.4.5 via aws-vpc2," instead of "... via 192.168.3.4." Inherently, this resolves overlapping IP issues in the cloud, but this is not the distinction that makes it really special in this use case..
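To make the "route by object name" idea concrete, here is a minimal Python sketch. It is not how XC implements routing; the class names (`Endpoint`, `NamedRouteTable`) and the site names are hypothetical and exist only to show why naming the site removes the ambiguity of overlapping private addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A destination identified by the site object it lives in plus its IP."""
    site: str  # hypothetical site name such as "aws-vpc2"
    ip: str


class NamedRouteTable:
    """Toy route table: destinations are reached via named sites, not next-hop IPs."""

    def __init__(self):
        self._sites = {}  # site name -> set of IPs reachable through that site

    def add_site(self, name, ips):
        self._sites[name] = set(ips)

    def resolve(self, ip, via):
        """Return the endpoint for 'ip via <site>', or raise if it is unknown."""
        if ip not in self._sites.get(via, set()):
            raise LookupError(f"{ip} is not reachable via {via}")
        return Endpoint(site=via, ip=ip)


table = NamedRouteTable()
# The same private address exists in two VPCs: an overlap that an
# IP-only route table could not tell apart.
table.add_site("aws-vpc1", ["10.3.4.5"])
table.add_site("aws-vpc2", ["10.3.4.5", "10.3.4.6"])

first = table.resolve("10.3.4.5", via="aws-vpc1")
second = table.resolve("10.3.4.5", via="aws-vpc2")
```

Here `first` and `second` carry the same IP but are distinct endpoints, because the site name is part of the identity.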
An SDN Router is also a local agent of the greater construct. It can take on whatever task the control plane deems necessary. As it relates to application scale, this means that one SDN router can do health monitoring for the application endpoints in 4 geographically dispersed regions and report the health back to the central "brains" of the SDN. Those same routers do not also have to advertise a VIP, however. Another SDN Router can do that instead!
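The split between "the node that watches health" and "the node that advertises the VIP" can be sketched in a few lines of Python. This is an illustrative model under my own assumptions, not the XC control plane: `Node`, `ControlPlane`, and the node and region names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """An SDN router whose duties are assigned by the control plane."""
    name: str
    monitors_health: bool = False
    advertises_vip: bool = False


class ControlPlane:
    """Central 'brains': collects health reports and shares them with ingress nodes."""

    def __init__(self):
        self._health = {}  # endpoint name -> (reporting node name, healthy?)

    def report(self, node, endpoint, healthy):
        if not node.monitors_health:
            raise ValueError(f"{node.name} is not assigned to monitor health")
        self._health[endpoint] = (node.name, healthy)

    def healthy_endpoints(self):
        return sorted(ep for ep, (_, ok) in self._health.items() if ok)

    def ingress_targets(self, node):
        """Endpoints a VIP-advertising node may send traffic to."""
        if not node.advertises_vip:
            return []
        return self.healthy_endpoints()


monitor = Node("ce-monitor", monitors_health=True)
ingress = Node("ce-ingress", advertises_vip=True)

brain = ControlPlane()
# One monitoring node reports on endpoints in four regions.
for region, ok in [("region-a", True), ("region-b", True),
                   ("region-c", False), ("region-d", True)]:
    brain.report(monitor, f"app-{region}", ok)

# The ingress node never probed anything, yet it knows where to send traffic.
targets = brain.ingress_targets(ingress)
```

The point of the design is that the ingress node learns health from the shared state rather than from its own probes, so the two roles can scale independently.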
Shown here is a screenshot of the XC console, looking at this exact relationship. For context, the boxes not labeled 'public' are the XC Regional Edges (REs), while the circles are CEs. When an object is reporting health, it is Green, Yellow or Red. Application ingress for this app is via the REs, but also from the CE on the right side.. which is Grey! It's only Grey because it is not reporting health of anything. Its role - as a component of the greater construct - is to advertise a VIP.. nothing more. Make no mistake, however, that this whole enchilada is the load balancing relationship for this application. Health in four regions of two different public clouds (Azure and GCP) is gained via four CEs, while ingress is provided by every RE (Anycast, remember?? Same IP, even!) and one solitary CE in a data center for internal traffic. And about that internal traffic...
So, remember some 20 paragraphs back (I was born with the -v flag on) when I said that GSLB will be seen as a transitional technology in a few years? Well, that CE on the right side that was presenting the VIP for our sample XC application has traffic driven to it via F5 DNS. As this customer migrates applications out to the cloud, the transition plan is easy.. put a CE cluster in each data center and have F5 DNS monitor the advertised VIPs on the CEs that represent each application in their respective data centers. GSLB allows us a comfortable way to use the tools we know to drive traffic to our applications - even if they live in new environments that we haven't fully mastered yet. For you old-school GTMers, this is simply identifying the CE advertised VIPs as "Generic Servers."
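As a rough illustration of that transition pattern, the sketch below treats each CE-advertised VIP as a plain "generic server" in a GSLB pool and answers with healthy members in round-robin order. It is a toy model in Python, not F5 DNS configuration; the class names, data center names, and documentation-range IP addresses are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class GenericServer:
    """A VIP the GSLB layer monitors without knowing what sits behind it."""
    data_center: str
    vip: str
    up: bool = True


class GslbPool:
    """Toy GSLB pool: round-robin across members currently marked up."""

    def __init__(self, members):
        self.members = list(members)
        self._counter = count()

    def resolve(self):
        """Return the next healthy VIP, or None if every member is down."""
        healthy = [m for m in self.members if m.up]
        if not healthy:
            return None
        return healthy[next(self._counter) % len(healthy)].vip


# One CE cluster VIP per data center (addresses are documentation ranges).
pool = GslbPool([
    GenericServer("dc-east", "192.0.2.10"),
    GenericServer("dc-west", "198.51.100.10"),
])

answers = [pool.resolve() for _ in range(4)]

# If the VIP in dc-east stops responding, answers shift to dc-west only.
pool.members[0].up = False
after_failure = [pool.resolve() for _ in range(2)]
```

The GSLB layer only needs to know whether each VIP answers; everything behind the VIP is the CE's business.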
"Aubrey, how does that scale, though? What are the limiting factors for the platform?" Ah, yes. If you're not wondering that, I'd be shocked. Well.. again, a cloud is NOT just someone else's computer. A cloud is a massive chorus of compute nodes using complex messaging and daemons with all sorts of crazy acronyms you might not yet have heard. So compute is the limit. Luckily for us, we can add more:
It would be interesting if an Edge Node could function like an internal DNS server, maybe with topology GSLB load balancing, for when you need only internal DNS resolution. Is that possible?
My understanding is that this is on the XC GSLB public roadmap, but it's been a while since I had a date associated with it. If you need that functionality, or if you would use it, I'd encourage you to talk to your Field SE and ask if their XC counterpart can be engaged. My last sales role was as an XC SE, and the XC SEs would have the most up-to-date understanding of that functionality.