Multi-cluster Kubernetes/Openshift with GSLB-TOOL

Overview

This is article 1 of 2.
GSLB-TOOL is an OSS project built around the BIG-IP DNS (GTM) and F5 Cloud Services' DNS LB GSLB products to provide GSLB functionality to Openshift/Kubernetes. GSLB-TOOL is a multi-cluster enabler.

 

Doing multi-cluster with GSLB has the following advantages:

  • Cross-cloud. Services are published in a coordinated manner while being hosted in any public or private cloud.
  • High degree of control. Publishing is done based on service name instead of IP address. Traffic is directed to a specific data center based on operational decisions such as service load, while also allowing canary, blue/green, and A/B deployments across data centers.
  • Stickiness. Regardless of topology changes in the network, clients will be consistently directed to the same data center.
  • IP Intelligence. Clients can be redirected to the desired data center based on their location, and stats can be gathered for analytics.

 

The use cases covered by GSLB-TOOL are:

  • Multi-cluster deployments
  • Data center load distribution
  • Enhanced customer experience
  • Advanced Blue/Green, A/B and Canary deployment options
  • Disaster Recovery
  • Cluster Migrations
  • Kubernetes <-> Openshift migrations
  • Container platform version migrations. For example, OCP 3.x to 4.x or OCP 4.x to 4.y.

 

GSLB-TOOL is implemented as a set of Ansible scripts and roles that can be used from the CLI or from a Continuous Delivery tool such as Spinnaker or Argo CD. The tool operates as glue between the Kubernetes/Openshift API and the GSLB API. GSLB-TOOL uses GIT as its source of truth to store its state, hence the GSLB state does not live in any specific cluster. The next figure shows a schema of it.
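
As an illustrative CLI session (the repository URL and local path are hypothetical; project-retrieve and gslb-commit are the tool's commands described in the Usage section below):

$ git clone https://git.example.com/acme/gslb-state.git   # hypothetical source of truth repository
$ cd gslb-state
$ project-retrieve shop onprem    # read the L7 routes of namespace shop from the onprem cluster
$ gslb-commit                     # push the resulting state to the GSLB backend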

 

It is important to emphasize that GSLB-TOOL is cross-vendor as well, since it can use any Ingress Controller or Router implementation. In other words, it is not necessary to use BIG-IP or NGINX for this. Moreover, a given cluster can have several Router/Ingress Controller instances from different vendors. This is thanks to using only the Openshift/Kubernetes APIs when inquiring about the container routes deployed.
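
For instance, on a vanilla Kubernetes cluster the tool only needs the read-only route information that any operator can inspect with kubectl; the namespace and output below are illustrative:

$ kubectl get ingress -n shop
NAME   CLASS   HOSTS             ADDRESS        PORTS   AGE
shop   nginx   www.example.com   203.0.113.10   80      3d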

 

Usage

To better understand how GSLB-TOOL operates, it is important to note the following characteristics:

  • GSLB-TOOL operates with project/namespace granularity, on a per-cluster basis. When operating on a cluster's project/namespace, it handles all the L7 routes of that project/namespace at once.

For example, the following command:

$ project-retrieve shop onprem

will retrieve all the L7 routes of the namespace shop from the onprem cluster. This cluster/namespace granularity simplifies management and mimics the behavior of Red Hat's Cluster Application Migration tool.
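
These are the same routes an operator would see with the oc client; the output below is illustrative, and project-retrieve reads the equivalent information through the API:

$ oc get route -n shop
NAME       HOST/PORT         PATH        SERVICES   PORT   TERMINATION   WILDCARD
shop       www.example.com   /shop       shop       http                 None
checkout   www.example.com   /checkout   checkout   http                 None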

In the next figure we can see the overall operation of GSLB-TOOL. At the top we can see in bold the names of the clusters (onprem and aws). In the figure these are both Openshift (aka OCP) clusters, but they could be any other Kubernetes as well. We can also see two sample projects/namespaces (Project A and Project B). Different clusters can have different namespaces as well. There are two types of commands/actions:

  • The project-* commands operate on the Kubernetes/Openshift API and on the source of truth/GIT repository.

These commands operate with project/namespace granularity. GSLB-TOOL doesn't modify your Openshift/K8s cluster; it only performs read-only operations.

  • The gslb-* commands operate on the source of truth/GIT repository and on the GSLB API of choice, either BIG-IP or F5 Cloud Services.

These commands operate on all the projects/namespaces of all clusters at once, either submitting or rolling back the changes in the GSLB backends. When GSLB-TOOL pushes the GSLB configuration, it either performs all changes or none. Thanks to the use of GIT, the gslb-rollback command easily reverts the configuration if desired. In fact, creating the Backup of the previous step is only useful when using GSLB-TOOL without GIT, which is possible too.
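
In practice this pair of commands looks as follows (the comments are mine):

$ gslb-commit     # push all pending changes of all projects/namespaces to the GSLB backend
$ gslb-rollback   # revert to the previous state kept in GIT if the new configuration misbehaves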

 

 

GSLB-TOOL flexibility

GSLB-TOOL has been designed with flexibility in mind. This is reflected in many of its features:

  • It is agnostic of the Router/Ingress Controller implementation.
  • In the same GSLB domain, it supports concurrently vanilla Kubernetes and Openshift clusters.
  • It is possible to have multiple Routers/Ingress Controllers in the same Kubernetes/Openshift cluster.
  • It is possible to define multiple Availability Zones for a given Router/Ingress Controller.
  • It can be easily modified given that it is written in Ansible. Furthermore, the Ansible playbooks make use of template files that can be modified if desired.
  • Multiple GSLB backends. At present GSLB-TOOL can use either F5 Cloud Services' DNS LB (a SaaS offering) or F5 BIG-IP DNS (aka GTM) by simply changing the value of the backend configuration option to either f5aas or bigip, as sketched after this list. All operations, configuration files, etc. remain the same. At present F5 BIG-IP DNS is recommended because it currently offers better monitoring options.
  • Ease of PoC. F5 Cloud Services' DNS LB can be used to test the tool, later switching to F5 BIG-IP DNS by simply changing the backend configuration option.
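
As a sketch, switching backends is a one-line change in the tool's configuration (the file name and YAML layout shown here are assumptions; the backend option and its f5aas/bigip values are the ones described above):

# config.yaml (hypothetical file name and layout)
backend: bigip   # use F5 BIG-IP DNS; set to f5aas for F5 Cloud Services' DNS LB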

 

GSLB-TOOL L7 route inquire/config flexibility

It is especially important to have flexibility when configuring the L7 routes in our source of truth. We might be interested in the following scenarios for a given namespace:

  • Homogeneous L7 routes across clusters - At times we expect all clusters to have the same L7 routes for a given namespace. This happens, for example, when all applications are the same in all clusters.
  • Heterogeneous L7 routes across clusters - At times each cluster might have different L7 routes for a given namespace because there are different versions of the applications (or different applications). This happens, for example, when we are testing new versions of the applications in one cluster while the other clusters run the previous version.

To handle these scenarios, we have two strategies when populating the routes:

  • project-retrieve – We use the information from the cluster's own Route/Ingress API to populate GSLB.
  • project-populate – We use the information from another cluster's Route/Ingress API to populate GSLB. The cluster from which we take the L7 routes is referred to as the reference cluster.
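
In command form, the two strategies have the following shapes (syntax inferred from the examples below; the angle-bracket placeholders are mine):

$ project-retrieve <namespace> <cluster>                      # use the cluster's own routes
$ project-populate <namespace> <reference-cluster> <cluster>  # use the reference cluster's routes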

We exemplify these strategies in the following figure, where we use a configuration of two clusters (onprem and aws) and a single project/namespace. The L7 routes (either Ingress or Route resources) in these differ: the cluster onprem has two additional L7 routes (/shop and /checkout).

 

We are going to populate our GSLB backend in three different ways:

  • In Example 1, we perform the following actions in sequence:

 

  1. With project-retrieve web onprem we retrieve the L7 routes of the project web from the cluster onprem, and these are stored in the Git repository or source of truth.
  2. Analogously, with project-retrieve web aws we retrieve the L7 routes from the cluster aws (only one in this case), and these are stored in the Git repository or source of truth.
  3. We submit this configuration to the GSLB backend with gslb-commit. The GSLB backend expects the onprem cluster to have 3 routes and the aws cluster 1 route. If the services are available, the health check results for both clusters will be Green. Therefore the FQDN will return the IP addresses of both clusters' Routers/Ingress Controllers.
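
In command form, Example 1 is the following sequence (the comments summarize the expected outcome):

$ project-retrieve web onprem   # 3 routes stored in the source of truth
$ project-retrieve web aws      # 1 route stored in the source of truth
$ gslb-commit                   # both clusters Green; the FQDN returns both clusters' IP addresses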

 

  • In Example 2, we use the project-populate strategy:
  1. We perform the same first action as in Example 1.
  2. With project-populate web onprem aws we indicate that we expect the L7 routes defined in onprem to also be available in the aws cluster, which is not the case. In other words, the onprem cluster is used as the reference cluster for aws.
  3. After we submit the configuration to GSLB with gslb-commit, the health checks in the onprem cluster will succeed but will fail on aws because /shop and /checkout don't exist there (an HTTP 404 is returned). Therefore the FQDN www.f5bddemos.io will return only the IP address of onprem. This will become Green automatically once we update the L7 routes and applications in aws.
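
In command form, Example 2 is:

$ project-retrieve web onprem       # onprem routes stored in the source of truth
$ project-populate web onprem aws   # onprem acts as the reference cluster for aws
$ gslb-commit                       # aws health checks fail (HTTP 404) until /shop and /checkout exist there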

 

  • In Example 3, we again use the project-populate strategy, but this time with aws as the reference cluster:
  1. Unlike in the previous examples, with project-retrieve web aws we retrieve the routes from the cluster aws.
  2. With project-populate web aws onprem we do the reverse of step 2 of Example 2: we use aws as the reference for onprem instead.
  3. After submitting the configuration with gslb-commit, the health checks will succeed, given that onprem has the L7 route that aws has.
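
In command form, Example 3 is:

$ project-retrieve web aws          # aws routes stored in the source of truth
$ project-populate web aws onprem   # aws acts as the reference cluster for onprem
$ gslb-commit                       # health checks succeed in both clusters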

 

For the sake of simplicity, the examples above show projects/namespaces with only a single FQDN for all their L7 routes, but a given namespace can have an arbitrary number of L7 routes and FQDNs. There is no limitation on this.

 

Additional information

If you want to see GSLB-TOOL in practice, please check this video. For more information on this tool, please visit the GSLB-TOOL Home Page and its Wiki Page for additional documentation.
