Business and Application Owners want, demand, and should have...
Uptime! Resiliency! Scalability! Availability!
This article covers BIG-IP and NGINX high availability (HA) topics to consider when leveraging the public cloud. On-prem and public cloud differ in important ways, such as modified or unique cloud provider L2 networking. These differences lead to challenges in how you address HA, failover time, peer setup, scaling options, and application state. I'll call out key items as well as provide recommendations based on my experience working with customers and testing in my lab.
Note: Each cloud provider handles certain features differently. Where providers differ, I include links and/or notes pointing to that specific cloud provider to boost your knowledge!
High availability can mean many things to different people. Depending on the application and traffic requirements, HA can require dual data paths, redundant storage, and redundant compute. Oh, and don't forget redundant power supplies for on-prem stuff. HA as I’m defining it here means (among other things) that the application survives a failure, maintenance windows are seamless to the user, and the user experience never suffers...ever! I like to think of HA as providing that "always on" experience similar to electricity. When I flip on a light switch, electricity just...simply...works. Similarly, when I visit my favorite web site or play my favorite online game, I expect it to just...simply...work.
So what should HA provide?
I was in a customer meeting recently, and the discussion was focused specifically on session state and sizing. The use case was common...
"gaming app, lots of persistent connections, client needs to hit same backend throughout entire game session"
The requirement of session state is common across many applications and can use methods like HTTP cookies, custom F5 iRule persistence, JSessionID, IP affinity, hash, and more. The type of session state used by the application can help you decide what migration type is right for that app. Is this an app more fitting for a lift-n-shift approach...Rehost? Is this an app that can be re-coded a bit to take advantage of some cloud-native service...Replatform? Can the app be TOTALLY redesigned to take advantage of all native IaaS and PaaS technologies...Refactor? If you are not aware of the 6 R’s I’m referencing here, please check out the reference below.
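To make one of the persistence methods above concrete, here is a minimal Python sketch of IP affinity done with a hash. The backend names and the `pick_backend_by_ip` function are hypothetical and exist only for illustration; this is not how BIG-IP or NGINX implement persistence internally.

```python
import hashlib
from typing import Sequence

# Hypothetical backend pool; the names are illustrative only.
BACKENDS = ["game-server-1", "game-server-2", "game-server-3"]


def pick_backend_by_ip(client_ip: str, backends: Sequence[str]) -> str:
    """Map a client IP to a backend using a stable hash (IP affinity)."""
    # sha256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # The same client IP lands on the same backend on every call.
    print(pick_backend_by_ip("203.0.113.10", BACKENDS))
    print(pick_backend_by_ip("203.0.113.10", BACKENDS))
```

Because the choice is computed from the client IP alone, any proxy instance with the same pool list makes the same decision without sharing state. The trade-off is that adding or removing a backend changes the mapping for many clients, which matters for long-lived game sessions.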
Reference: 6 R's of a Cloud Migration
The latter two items will be discussed in more detail in upcoming sections.
As for traffic sizing, well...you would hope the cloud takes care of most of that. It does a great job with things like auto scale, but there are still cloud provider limits that affect sizing and machine instance types to keep in mind. BIG-IP and NGINX products are considered network virtual appliances (NVA)...or in other words, just another compute instance.
The latter three items can change the required VM count and size, depending on cloud provider limits. Unfortunately, not all limits are documented. Key metrics for L7 proxies are typically SSL stats, throughput, connection type, and connection count. Collecting requirements around application behavior and traffic sizing can help you choose the instance size as well as instance count. We have a list of the F5 supported BIG-IP VE platforms on F5 CloudDocs.
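As a rough illustration of turning traffic requirements into an instance count, here is a small Python sketch. Every number in it is made up for the example and is not a cloud provider or F5 limit. The function names, the 30% headroom, and the single spare instance are also assumptions you should replace with your own measurements and policies.

```python
import math
from typing import Dict


def instances_needed(peak_load: float, per_instance_capacity: float,
                     headroom: float = 0.3, spare: int = 1) -> int:
    """Instances required to carry peak_load, keeping some headroom on each
    instance and adding spare instances to survive a failure."""
    if per_instance_capacity <= 0 or not 0 <= headroom < 1:
        raise ValueError("capacity must be > 0 and headroom in [0, 1)")
    usable = per_instance_capacity * (1.0 - headroom)
    return math.ceil(peak_load / usable) + spare


def size_for_all(requirements: Dict[str, float],
                 per_instance_limits: Dict[str, float]) -> int:
    """The most constrained metric decides the instance count."""
    return max(
        instances_needed(requirements[metric], per_instance_limits[metric])
        for metric in requirements
    )


if __name__ == "__main__":
    # Hypothetical requirements and per-instance limits.
    requirements = {"throughput_gbps": 12, "connections": 2_000_000}
    limits = {"throughput_gbps": 5, "connections": 1_000_000}
    print(size_for_all(requirements, limits))
```

The point of the sketch is the shape of the calculation: check each key metric separately, then size for whichever one runs out first.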
Next, we'll dive into the HA capabilities of the various F5 products and ways to deploy.
This section will cover the BIG-IP and NGINX capabilities for HA.
BIG-IP supports the following HA cluster configurations:
Reference: BIG-IP High Availability Docs
NGINX supports the following HA cluster configurations:
Reference: NGINX High Availability Docs
In the following sections, I will illustrate 5 common deployment configurations for BIG-IP in public cloud.
When failover methods use API calls, the results depend on how the cloud provider processes those requests: how fast, and in what fashion (in bulk or sequentially). We use the F5 Cloud Failover Extension (CFE) for BIG-IP failover with the API method. I suggest you head over to the CFE page and take a look!
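To see why the processing fashion matters, here is a back-of-the-envelope Python sketch. The function name, the object count, and the per-call latency are all hypothetical. The model also assumes calls run one after another and ignores failure detection time, so treat it as intuition rather than a prediction for any specific cloud provider.

```python
import math


def estimated_remap_seconds(object_count: int, seconds_per_call: float,
                            batch_size: int = 1) -> float:
    """Estimate the time to move object_count addresses or routes to the new
    active device when each API call can carry batch_size objects."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    calls = math.ceil(object_count / batch_size)
    return calls * seconds_per_call


if __name__ == "__main__":
    # Hypothetical: 20 addresses to move, 0.5 seconds per API call.
    print(estimated_remap_seconds(20, 0.5))                 # one object per call
    print(estimated_remap_seconds(20, 0.5, batch_size=20))  # one bulk call
```

With the same per-call latency, the only thing that changes between the two results is how many objects the provider accepts per call, which is outside your control.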
In the following sections, I will illustrate 2 common deployment configurations for NGINX in public cloud.
Reference: Active-Passive HA for NGINX Plus on AWS
As a means to make this topic a little more real, here is a common customer scenario that shows you the decisions that go into moving an application to the public cloud. Sometimes it's as easy as a lift-n-shift, other times you might need to do a little more work. In general, public cloud is not on-prem and things might need some tweaking. Hopefully this example will give you some pointers and guidance on your next app migration to the cloud.
Requirements for Successful Cloud Migration:
Recommended Design for Cloud Phase #1:
This example comes up in customer conversations often. Based on customer requirements, in-house skillset, current operational model, and time frames, there is one option that is better than the rest. A second design phase lends itself to more of a Replatform or Refactor migration type. In that case, more options can be leveraged to take advantage of cloud-native features. For example, changing the application persistence type from iRule UIE to cookie would allow BIG-IP to avoid keeping track of state. Why? With cookies, the client keeps track of that session state. The client receives a cookie and passes it to the L7 proxy on successive requests. The proxy checks the cookie value and sends the request to the matching backend pool member. The requirement for the L7 proxies to share session state is now removed.
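The cookie flow described above can be sketched in a few lines of Python. This is only an illustration of the idea. The pool members, the cookie name `pool_member`, and the `route` function are hypothetical, and the cookie format is not the one BIG-IP or NGINX Plus actually use.

```python
from http.cookies import SimpleCookie
from itertools import cycle
from typing import Optional, Tuple

# Hypothetical pool: member id -> backend address.
POOL = {"m1": "10.0.0.11:8080", "m2": "10.0.0.12:8080"}
COOKIE_NAME = "pool_member"  # hypothetical cookie name

# Simple round robin, used only for clients that have no valid cookie yet.
_next_member = cycle(POOL)


def route(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (backend address, Set-Cookie value or None)."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in POOL:
        # Returning client: the cookie alone identifies the pool member.
        return POOL[morsel.value], None
    # New client, or unknown member: pick one and hand the choice to the client.
    member = next(_next_member)
    return POOL[member], f"{COOKIE_NAME}={member}; Path=/; HttpOnly"


if __name__ == "__main__":
    backend, set_cookie = route(None)          # first request, no cookie
    print(backend, set_cookie)
    print(route(set_cookie.split(";")[0]))     # follow-up request with the cookie
```

Nothing about the session is stored on the proxy side. Any proxy instance with the same pool configuration can serve the follow-up request, which is why the state-sharing requirement goes away.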
Here is another customer scenario. This time the application is a full suite of multimedia content. In contrast to the first scenario, this one will illustrate the benefits of rearchitecting various components allowing greater flexibility when leveraging the cloud. You still must factor in-house skill set, project time frames, and other important business (and application) requirements when deciding on the best migration type.
Requirements for Successful Cloud Migration:
Recommended Design for Cloud Phase #1:
This is a great example of a Repurchase in which application characteristics allow the various teams to explore alternative cloud migration approaches. The scenario describes a phase one migration that converts BIG-IP devices to NGINX Plus devices. It assumes the BIG-IP configurations can be converted to NGINX Plus fairly easily, and it also assumes there is available skillset and project time allocated to properly rearchitect the application where needed.
OK! Brains are expanding...hopefully? We learned about high availability and what that means for applications and user experience. We touched on the importance of application behavior and traffic sizing. Then we explored the various F5 products, how they handle HA, HA designs, and my favorite...my own personal recommendations. These are of course my own recommendations and not F5 official recommendations. These recommendations are based on my own lab testing and interactions with customers. Every scenario will carry its own requirements, and all options should be carefully considered when leveraging the public cloud. Finally, we looked at two customer scenarios, discussed their requirements, and walked through the design proposals. Fun!
Read the following articles for more guidance specific to the various cloud providers. The information provided earlier is meant to be more general across all clouds.
AWS and BIG-IP: Advanced Topologies and More on Highly Available Services
Azure and BIG-IP: Lightboard Lessons - BIG-IP Deployments in Azure
Google and BIG-IP: Failing Faster in the Cloud
F5 CloudDocs: BIG-IP VE on Public Cloud
Google and NGINX Plus: High-Availability Load Balancing with NGINX Plus on Google Cloud Platform
AWS and NGINX Plus: Using AWS Quick Starts to Deploy NGINX Plus
Azure and NGINX Plus: NGINX on Azure