Multicloud Networking
68 TopicsKubernetes architecture options with F5 Distributed Cloud Services
Summary F5 Distributed Cloud Services (F5 XC) can both integrate with your existing Kubernetes (K8s) clusters and/or host a K8s workload itself. Within these distinctions, we have multiple architecture options. This article explores four major architectures in ascending order of sophistication and advantages. Architecture #1: External Load Balancer (Secure K8s Gateway) Architecture #2: CE as a pod (K8s site) Architecture #3: Managed Namespace (vK8s) Architecture #4: Managed K8s (mK8s) Kubernetes Architecture Options As K8s continues to grow, options for how we run K8s and integrate with existing K8s platforms continue to grow. F5 XC can both integrate with your existing K8s clusters and/or run a managed K8s platform itself. Multiple architectures exist within these offerings too, so I was thoroughly confused when I first heard about these possibilities. A colleague recently laid it out for me in a conversation: "Michael, listen up: XC can either integrate with your K8s platform, run inside your K8s platform, host virtual K8s (Namespace-aaS), or run a K8s platform in your environment." I replied, "That's great. Now I have a mental model for differentiating between architecture options." This article will overview these architectures and provide 101-level context: when, how, and why would you implement these options? Side note 1: F5 XC concepts and terms F5 XC is a global platform that can provide networking and app delivery services, as well as compute (K8s workloads). We call each of our global PoP's a Regional Edge (RE). RE's are highly meshed to form the backbone of the global platform. They connect your sites, they can expose your services to the Internet, and they can run workloads. This platform is extensible into your data center by running one or more XC Nodes in your network, also called a Customer Edge (CE). A CE is a compute node in your network that registers to our global control plane and is then managed by a customer as SaaS. The registration of one or more CE's creates a customer site in F5 XC. A CE can run on a hypervisor (VMWare/KVM/Etc), a Hyperscaler (AWS, Azure, GCP, etc), baremetal, or even as a k8s pod, and can be deployed in HA clusters. XC Mesh functionality provides connectivity between sites, security services, and observability. Optionally, in addition, XC App Stack functionality allows a large and arbitrary number of managed clusters to be logically grouped into a virtual site with a single K8s mgmt interface. So where Mesh services provide the networking, App Stack services provide the Kubernetes compute mgmt. Our first 2 architectures require Mesh services only, and our last two require App Stack. Side note 2: Service-to-service communication I'm often asked how to allow services between clusters to communicate with each other. This is possible and easy with XC. Each site can publish services to every other site, including K8s sites. This means that any K8s service can be reachable from other sites you choose. And this can be true in any of the architectures below, although more granular controls are possible with the more sophisticated architectures. I'll explore this common question more in a separate article. Architecture 1: External Load Balancer (Secure K8s Gateway) In a Secure Kubernetes Gateway architecture, you have integration with your existing K8s platform, using the XC node as the external load balancer for your K8s cluster. In this scenario, you create a ServiceAccount and kubeconfig file to configure XC. The XC node then performs service discovery against your K8s API server. I've covered this process in a previous article, but the advantage is that you can integrate with existing K8s platforms. This allows exposing both NodePort and ClusterIP services via the XC node. XC is not hosting any workloads in this architecture, but it is exposing your services to your local network, or remote sites, or the Internet. In the diagram above, I show a web application being accesssed from a remote site (and/or the Internet) where the origin pool is a NodePort service discovered in a K8s cluster. Architecture 2: Run a site within a K8s cluster (K8s site type) Creating a K8s site is easy - just deploy a single manifest found here. This file deploys multiple resources in your cluster, and together these resources work to provide the services of a CE, and create a customer site. I've heard this referred to as "running a CE inside of K8s" or "running your CE as a pod". However, when I say "CE node" I'm usually referring to a discreet compute node like a VM or piece of hardware; this architecture is actually a group of pods and related resources that run within K8s to create a XC customer site. With XC running inside your existing cluster, you can expose services within the cluster by DNS name because the site will resolve these from within the cluster. Your service can then be exposed anywhere by the F5 XC platform. This is similar to Architecture 1 above, but with this model, your site is simply a group of pods within K8s. An advantage here is the ability to expose services of other types (e.g. ClusterIP). A site deployed into a K8s cluster will only support Mesh functionality and does not support AppStack functionality (i.e., you cannot run a cluster within your cluster). In this architecture, XC acts as a K8s ingress controller with built-in application security. It also enables Mesh features, such as publishing of other sites' services on this site, and publishing of this site's discovered services on other sites. Architecture 3: vK8s (Namespace-as-a-Service) If the services you use include AppStack capabilities, then architectures #3 and #4 are possible for you. In these scenarios, our XC node actually runs your K8s on your workloads. We are no longer integrating XC with your existing K8s platform. XC is the platform. A simple way to run K8s workloads is to use a virtual k8s (vK8s) architecture. This could be referred to as a "managed Namespace" because by creating a vK8s object in XC you get a single namespace in a virtual cluster. Your Namespace can be fully hosted (deployed to RE's) or run on your VM's (CE's), or both. Your kubeconfig file will allow access to your Namespace via the hosted API server. Via your regular kubectl CLI (or via the web console) you can create/delete/manage K8s resources (Deployments, Services, Secrets, ServiceAccounts, etc) and view application resource metrics. This is great if you have workloads that you want to deploy to remote regions where you do not have infrastructure and would prefer to run in F5's RE's, or if you have disparate clusters across multiple sites and you'd like to manage multiple K8s clusters via a single centralized, virtual cluster. Best practice guard rails for vK8s With a vK8s architecture, you don't have your own cluster, but rather a managed Namespace. So there are some restrictions (for example, you cannot run a container as root, bind to a privileged port, or to the Host network). You cannot create CRD's, ClusterRoles, PodSecurityPolicies, or Namespaces, so K8s operators are not supported. In short, you don't have a managed cluster, but a managed Namespace on a virtual cluster. Architecture 4: mK8s (Managed K8s) In managed k8s (mk8s, also known as physical K8s or pk8s) deployment, we have an enterprise-level K8s distribution that is run at your site. This means you can use XC to deploy/manage/upgrade K8s infrastructure, but you manage the Kubernetes resources. The benefits include what is typical for 3rd-party K8s mgmt solutions, but also some key differentiators: multi-cloud, with automation for Azure, AWS, and GCP environments consumed by you as SaaS enterprise-level traffic control natively allows a large and arbitrary number of managed clusters to be logically managed with a single K8s mgmt interface You can enable kubectl access against your local cluster and disable the hosted API server, so your kubeconfig file can point to a global URL or a local endpoint on-prem. Another benefit of mK8s is that you are running a full K8s cluster at your site, not just a Namespace in a virtual cluster. The restrictions that apply to vK8s (see above) do not apply to mK8s, so you could run privileged pods if required, use Operators that make use of ClusterRoles and CRDs, and perform other tasks that require cluster-wide access. Traffic management controls with mK8s Because your workloads run in a cluster managed by XC, we can apply more sophisticated and native policies to K8s traffic than non-managed clusters in earlier architectures: Service isolation can be enforced within the cluster, so that pods in a given namespace cannot communicate with services outside of that namespace, by default. More service-to-service controls exist so that you can decide which services can reach with other services with more granularity. Egress control can be natively enforced for outbound traffic from the cluster, by namespace, labels, IP ranges, or other methods. E.g.: Svc A can reach myapi.example.com but no other Internet service. WAF policies, bot defense, L3/4 policies, etc—all of these policies that you have typically applied with network firewalls, WAF's, etc—can be applied natively within the platform. This architecture took me a long time to understand, and longer to fully appreciate. But once you have run your workloads natively on a managed K8s platform that is connected to a global backbone and capable of performing network and application delivery within the platform, the security and traffic mgmt benefits become very compelling. Conclusion: As K8s continues to expand, management solutions of your clusters make it possible to secure your K8s services, whether they are managed by XC or exist in disparate clusters. With F5 XC as a global platform consumed as a service—not a discreet installation managed by you—the available architectures here are unique and therefore can accommodate the diverse (and changing!) ways we see K8s run today. Related Articles Securely connecting Kubernetes Microservices with F5 Distributed Cloud Multi-cluster Multi-cloud Networking for K8s with F5 Distributed Cloud - Architecture Pattern Multiple Kubernetes Clusters and Path-Based Routing with F5 Distributed Cloud11KViews29likes5CommentsF5 Distributed Cloud - Customer Edge Site - Deployment & Routing Options
F5 Distributed Cloud Customer Edge (CE) software deployment models for scale and routing for enterprises deploying multi-cloud infrastructure. Today's service delivery environments are comprised of multiple clouds in a hybrid cloud environment. How your multi-cloud solution attaches to your existing on-prem and cloud networks can be the difference between a successful overlay fabric, and one that leave you wanting more out of your solution. Learn your options with F5 Distributed Cloud Customer Edge software.14KViews19likes3CommentsF5 Distributed Cloud - Regional Decryption with Virtual Sites
In this article we discuss how the F5 Distributed Cloud can be configured to support regulatory demands for TLS termination of traffic to specific regions around the world. The article provides insight into the F5 Distributed Cloud global backbone and application delivery network (ADN). The article goes on to inspect how the F5 Distriubted Cloud is able to achieve these custom topologies in a multi-tenant architecture while adhearing to the "rules of the internet" for route summarization. Read on to learn about the flexibility of F5's SaaS platform providing application delivery and security solutions for your applications.7KViews17likes2CommentsF5 Distributed Cloud - Listener Logic
In a proxy, there is a client-side and server-side connection. In this article, we'll focus on how the proxy "picks-up" or "listens" for traffic on the client-side. There are many options and creative ideas that adapt to enterprises business needs. First, we need to know the mechanics and what is possible, and this article covers those basics.2.9KViews16likes1CommentCommunity Learning Path: Multi-Cloud Networking
This Learning Path article will serve as your guide to content that will build your skills in Multi-Cloud Networking. The content is organized starting with Foundational Topics to get you familiar with concepts. This is followed by content that will help you with Basic Configuration. After that, there is content listed for specific Use Case Configurations. This Learning Path is a living document and will be updated as new content is developed. Foundational Topics What is Multi-Cloud Networking? What is Multi-Cloud Networking - Brightboard Lesson Basic Configuration F5 Distributed Cloud Multi Cloud App Demo - Video Experience F5 Distributed Cloud with Multi-Cloud Sites and Distributed Apps Demo Guide & Video Series for F5 Distributed Cloud Network Connect (Multi-Cloud Networking) Build It Live! - Multi-Cloud Networking Live Streams Building an F5 Distributed Cloud Customer Edge, from Hawaii! - Video Multi-Cloud Networking Demo Guide - Github Repo Use Case Configurations Using F5 Distributed Cloud private connectivity orchestration for secure multi-cloud infrastructure Using F5 Distributed Cloud Network Connect to transit, route, & secure private cloud environments When using F5 Distributed Cloud Platform, never deal with Site to Site IP conflicts again! Using F5 Distributed Cloud private connectivity orchestration for secure multi-cloud infrastructure Governance and Automation - Distributed Apps for Hybrid Cloud Architecture Protect an application spread across several locations with F5 XC WAAP and Multi-Cloud Networking2.9KViews13likes1CommentF5 Distributed Cloud – CE High Availability Options: A Comparative Exploration
This article explores an alternative approach to achieve HA across single CE nodes, catering for use cases requiring higher performance and granular control over redundancy and failover management. Introduction F5 Distributed Cloud offers different techniques to achieve High Availability (HA) for Customer Edge (CE) nodes in an active-active configuration to provide redundancy, scaling on-demand and simplify management. By default, F5 Distributed Cloud uses a method for clustering CE nodes, in which CEs keep track of peers by sending heartbeats and facilitating traffic exchange among themselves. This method also handles the automatic transfer of traffic, virtual IPs, and services between CE peers —excellent for simplified deployment and running App Stack sites hosting Kubernetes workloads. However, if CE nodes are deployed mainly to manage L3/L7 traffic and application security, this default model might lack the flexibility needed for certain scenarios. Many of our customers tell us that achieving high availability is not so straightforward with the current clustering model. These customers often have a lot of experience in managing redundancy and high availability across traditional network devices. They like to manage everything themselves—from scheduling when to switch over to a redundant pair (planned failover), to choosing how many network paths (tunnels) to use between CEs to REs (Regional Edges) or other CEs. They also want to handle any issues device by device, decide the number of CE nodes in a redundancy group, and be able to direct traffic to different CEs when one is being updated. Their feedback inspired us to write this article, where we explore a different approach to achieve high availability across CEs. The default clustering model is explained in this document: https://docs.cloud.f5.com/docs/ves-concepts/site#cluster-of-nodes Throughout this article, we will dive into several key areas: An overview of the default CE clustering model, highlighting its inherent challenges and advantages. Introduction to an alternative clustering strategy: Single Node Clustering, including: An analysis of its challenges and benefits. Identification of scenarios where this approach is most applicable. A guide to the configuration steps necessary to implement this model. An exploration of failover behavior within this framework. A comparison table showing how this new method differs from the default clustering method. By the end of this article, readers will gain an understanding of both clustering approaches, enabling informed decisions on the optimal strategy for their specific needs. Default CE Clustering Overview In a standard CE clustering setup, a cluster must have at least three Master nodes, with subsequent additions acting as Worker nodes. A CE cluster is configured as a "Site," centralizing operations like pool configuration and software upgrades to simplify management. In this clustering method, frequent communication is required between control plane components of the nodes on a low latency network. When a failover happens, the VIPs and services - including customer’s compute workloads - will transition to the other active nodes. As shown in the picture above, a CE cluster is treated as a single site, regardless of the number of nodes it contains. In a Mesh Group scenario, each mesh link is associated with one single tunnel connected to the cluster. These tunnels are distributed among the master nodes in the cluster, optimizing the total number of tunnels required for a large-scale Mesh Group. It also means that the site will be connected to REs only via 2 tunnels – one to each RE. Design Considerations for Default CE Clustering model: Best suited for: 1- App Stack Sites: Running Kubernetes workloads necessitates the default clustering method for container orchestration across nodes. 2- Large-scale Site-Mesh Groups (SMG) 3- Cluster-wide upgrade preference: Customers who favour managing nodes collectively will find cluster-wide upgrades more convenient, however without control over the upgrade sequence of individual nodes. Challenges: o Network Bottleneck for Ingress Traffic: A cluster connected to two Regional Edge (RE) sites via only 2 tunnels can lead to only two nodes processing external (ingress) traffic, limiting the use of additional nodes to process internal traffic only. o Three-master node requirement: Some customers are accustomed to dual-node HA models and may find the requirement for three master nodes resource-intensive. o Hitless upgrades: Controlled, phased upgrades are preferred by some customers for testing before widespread deployment, which is challenging with cluster-wide upgrades. o Cross-site deployments: High network latency between remote data centers can impact cluster performance due to the latency sensitivity of etcd daemon, the backbone of cluster state management. If the network connection across the nodes gets disconnected, all nodes will most likely stop the operation due to the quorum requirements of etcd. Therefore, F5 recommends deploying separate clusters for different physical sites. o Service Fault Sprawl and limited Node fault tolerance: Default clusters can sometimes experience a cascading effect where a fault in a node spreads throughout the cluster. Additionally, a standard 3-node cluster can generally only tolerate the failure of two nodes. If the cluster was originally configured with three nodes, functionality may be lost if reduced to a single active node. These limitations stem from the underlying clustering design and its dependency on etcd for maintaining cluster state. The Alternative Solution: HA Between Multiple Single Nodes The good news is that we can achieve the key objectives of the clustering – which are streamlined management and high availability - without the dependency on the control plane clustering mechanisms. Streamlined management using “Virtual Site”: F5 Distributed Cloud provides a mechanism called “Virtual Site” to perform operations on a group of sites (site = node or cluster of nodes), reducing the need to repeat the same set of operations for each site. The “Virtual Site” acts as an abstraction layer, grouping nodes tagged with a unique label and allows collectively addressing these nodes as a single entity. Configuration of origin pools and load balancers can reference Virtual Sites instead of individual sites/nodes, to facilitate cluster-like management for two or more nodes and enabling controlled day 2 operations. When a node is disassociated from Virtual Site by removing the label, it's no longer eligible for new connections, and its listeners are simultaneously deactivated. Upgrading nodes is streamlined: simply remove the node's label to exclude it from the Virtual Site, perform the upgrade, and then reapply the label once the node is operational again. This procedure offers you a controlled failover process, ensuring minimal disruption and enhanced manageability by minimizing the blast radius and limiting the cope of downtime. As traffic is rerouted to other CEs, if something goes wrong with an upgrade of a CE node, the services will not be impacted. HA/Redundancy across multiple nodes: Each single node in a Virtual Site connect to dual REs through IPSec or SSL/TLS tunnels, ensuring even load distribution and true active-active redundancy. External (Ingress) Traffic: In the Virtual Site model, the Regional Edges (REs) distribute external traffic evenly across all nodes. This contrasts with the default clustering approach where only two CE nodes are actively connected to the REs. The main Virtual Site advantage lies in its true active/active configuration for CEs, increasing the total ingress traffic capacity. If a node becomes unavailable, the REs will automatically reroute the new connections to another operational node within the Virtual Site, and the services (connection to origin pools) remain uninterrupted. Internal (East-West) Traffic: For managing internal traffic within a single CE node in a Virtual Site (for example, when LB objects are configured to be advertised within the local site), all network techniques applicable to the default clustering model can be employed in this model as well, except for the Layer 2 attachment (VRRP) method. Preferred load distribution method for internal traffic across CEs: Our preferred methods for load balancing across CE nodes are either DNS based load balancing or Equal-Cost Multi-Path (ECMP) routing utilizing BGP for redundancy. DNS Load Balancer Behavior: If a node is detached from a Virtual Site, its associated listeners and Virtual IPs (VIPs) are automatically withdrawn. Consequently, the DNS load balancer's health checks will mark those VIPs as down and prevent them from receiving internal network traffic. Current limitation for custom VIP and BGP: When using BGP, please note a current limitation that prevents configuring a custom VIP address on the Virtual Site. As a workaround, custom VIPs should be advertised on individual sites instead. The F5 product team is actively working to address this gap. For a detailed exploration of traffic routing options to CEs, please refer to the following article here: https://community.f5.com/kb/technicalarticles/f5-distributed-cloud---customer-edge-site---deployment--routing-options/319435 Design Considerations for Single Node HA Model: Best suited for: 1- Customers with high throughput requirement: This clustering model ensures that all Customer Edge (CE) nodes are engaged in managing ingress traffic from Regional Edges (REs), which allows for scalable expansion by adding additional CEs as required. In contrast, the default clustering model limits ingress traffic processing to only two CE nodes per cluster, and more precisely, to a single node from each RE, regardless of the number of worker nodes in the cluster. Consequently, this model is more advantageous for customers who have high throughput demands. 2- Customers who prefer to use controlled failover and software upgrades This clustering model enables a sequential upgrade process, where nodes are updated individually to ensure each node upgrades successfully before moving on to the other nodes. The process involves detaching the node from the cluster by removing its site label, which causes redirecting traffic to the remaining nodes during the upgrade. Once upgraded, the label is reapplied, and this process is repeated for each node in turn. This is a model that customers have known for 20+ years for upgrade procedures, with a little wrinkle with the label. 3- Customers who prefer to distribute the load across remote sites Nodes are deployed independently and do not require inter-node heartbeat communication, unlike the default clustering method. This independence allows for their deployment across various data centers and availability zones while being managed as a single entity. They are compatible with both Layer 2 (L2) spanned and Layer 3 (L3) spanned data centers, where nodes in different L3 networks utilize distinct gateways. As long as the nodes can access the origin pools, they can be integrated into the same "Virtual Site". This flexibility caters to customers' traditional preferences, such as deploying two CE nodes per location, which is fully supported by this clustering model. Challenges: Lack of VRRP Support: The primary limitation of this clustering method is the absence of VRRP support for internal VIPs. However, there are some alternative methods to distribute internal traffic across CE nodes. These include DNS based routing, BGP with Equal-Cost Multi-Path (ECMP) routing, or the implementation of CEs behind another Layer 4 (L4) load balancer capable of traffic distribution without source address alteration, such as F5 BIG-IPs or the standard load balancers provided by Azure or AWS. Limitation on Custom VIP IP Support: Currently, the F5 Distributed Cloud Console has a restriction preventing the configuration of custom virtual IPs for load balancer advertisements on Virtual Sites. We anticipate this limitation will be addressed in future updates to the F5 Distributed Cloud platform. As a temporary solution, you can advertise the LB across multiple individual sites within the Virtual Site. This approach enables the configuration of custom VIPs on those sites. Requires extra steps for upgrading nodes Unlike the Default clustering model where upgrades can be performed collectively on a group of nodes, this clustering model requires upgrading nodes on an individual basis. This may introduce more steps, especially in larger clusters, but it remains significantly simpler than traditional network device upgrades. Large-Scale Mesh Group: In F5 Distributed Cloud, the "Mesh Group" feature allows for direct connections between sites (whether individual CE sites or clusters of CEs) and other selected sites through IPSec tunnels. For CE clusters, tunnels are established on a per-cluster basis. However, for single-node sites, each node creates its own tunnels to connect with remote CEs. This setup can lead to an increased number of tunnels needed to establish the mesh. For example, in a network of 10 sites configured with dual-CE Virtual Sites, each CE is required to establish 18 IPSec tunnels to connect with other sites, or 19 for a full mesh configuration. Comparatively, a 10-site network using the default clustering method—with a minimum of 3 CEs per site—would only need up to 9 tunnels from each CE for full connectivity. Opting for Virtual Sites with dual CEs, a common choice, effectively doubles the number of required tunnels from each CE when compared to the default clustering setup. However, despite this increase in tunnels, opting for a Mesh configuration with single-node clusters can offer advantages in terms of performance and load distribution. Note: Use DC Groups as an alternative solution to Secure Mesh Group for CE connectivity: For customers with existing private connectivity between their CE nodes, running Site Mesh Group (SMG) with numerous IPsec tunnels can be less optimal. As a more scalable alternative for these customers, we recommend using DC Cluster Group (DCG). This method utilizes IP-in-IP tunnels over the existing private network, eliminating the need for individual encrypted IPsec tunnels between each node and streamlining communication between CE nodes via IP-n-IP encapsulations. Configuration Steps The configuration for creating single node clusters involves the following steps: Creating a Label Creating a Virtual Site Applying the label to the CE nodes (sites) Review and validate the configuration The detailed configuration guide for the above steps can be found here: https://docs.cloud.f5.com/docs/how-to/fleets-vsites/create-virtual-site Example Configuration: In this example, you can create a label called "my-vsite" to group CE nodes that belong to the same Virtual Site. Within this label, you can then define different values to represent different environments or clusters, such as specific Azure region or an on-premise data center. Then a Virtual Site of “CE” type can be created to represent the CE cluster in “Azure-AustraliaEast-vSite" and tied to any CE that is tagged with the label “my-vsite=Azure-AustraliaEast-vSite”: Now, any CE node that should join the cluster (Virtual Site), should get this label: Verification: To confirm the Virtual Site configuration is functioning as intended, we joined two CEs (k1-azure-ce2 and k1-azure-ce03) into the Virtual Site and evaluated the routing and load balancing behavior. Test 1: Public Load Balancer (Virtual Site referenced in the pool) The diagram shows a public "Load Balancer" advertised on the RE referencing a pool that uses the newly created Virtual Site to access the private application: As shown below, the pool member was configured to be accessed through the Virtual Site: Analysis of the request logs in the Performance dashboard confirmed that all requests to the public website were evenly distributed across both CEs. Test 2: Internal Load Balancer (LB advertised on the Virtual Site) We deployed an internal Load Balancer and advertised it on the newly created Virtual Site, utilizing the pool that also references the same Virtual Site (k1-azure-ce2 and k1-azure-ce03). As shown below, the Load Balancer was configured to be advertised on the Virtual Site. Note: Here we couldn't use a "shared" custom VIP across the Virtual Site due to a current platform constraint. If a custom VIP is required, we should use "site" as opposed to "Virtual Site" and advertise the Load Balancer on all sites, like below picture: Request logs revealed that when traffic reached either CE node within the Virtual Site, the request was processed and forwarded locally to the pool member. In the example below: src_site: Indicates the CE (k1-azure-ce2) that processed the request. src_ip: Represents the client's source IP address (192.168.1.68). dst_site: Indicates the CE (k1-azure-ce2) from which the pool member is accessed. dst_ip: Represents the IP address of the pool member (192.168.1.6). Resilience Testing: To assess the Virtual Site's resilience, we intentionally blocked network access from k1-azure-ce2 CE to the pool member (192.168.1.6). The CE automatically rerouted traffic to the pool member via the other CE (k1-azure-ce03) in the Virtual Site. Note: By default, CEs can communicate with each other via the F5 Global Network. This can be customized to use direct connectivity through tunnels if the CEs are members of the same DC Cluster Group (IP-n-IP tunneling) or Secure Mesh Groups (IPSec tunneling). The following picture shows the traffic flow via F5 Global Network. The following picture shows the traffic flow via the IP-n-IP tunnel when a DC Clustering Group (DCG) is configured across the CE nodes. Failover Behaviour When a CE node is tied to a Virtual Site, all internal Load Balancers (VIPs) advertised on that Virtual Site will be deployed in the CE. Additionally, the Regional Edge (RE) begins to use this node as one of the potential next hops for connections to the origin pool. Should the CE become unavailable, or if it lacks the necessary network access to the origin server, the RE will almost seamlessly reroute connections through the other operational CEs in the Virtual Site. Uncontrolled Failover: During instances of uncontrolled failover, such as when a node is unexpectedly shut down from the hypervisor, we have observed a handful of new connections experiencing timeouts. However, these issues were resolved by implementing health checks within the origin pool, which prevented any subsequent connection drops. Note: Irrespective of the clustering model in use, it's always recommended to configure health checks for the origin pool. This practice enhances failover responsiveness and mitigates any additional latency incurred during traffic rerouting. Controlled Failover: The moment a CE node is disassociated from the Virtual Site — by the removal of its label— the CE node will not be used by RE to connect to origin pools anymore. At the same time, all Load Balancer listeners associated with that Virtual Site are withdrawn from the node. This effectively halts traffic processing for those applications, preventing the node from receiving related traffic. During controlled failover scenarios, we have observed seamless service continuity on externally advertised services (to REs). On-Demand Scaling: F5 Distributed Cloud provides a flexible solution that enables customers to scale the number of active CE nodes according to demand. This allows you to easily add more powerful CE nodes during peak periods (such as promotional events) and then remove them when demand subsides. With the Virtual Sites method, you can even mix and match node sizes within your cluster (Virtual Site), providing granular control over resources. It's advisable to monitor CE node performance and implement node related alerts. These alerts notify you when nodes are operating at high capacity, allowing for timely addition of extra nodes as needed. Moreover, you can monitor node’s health in the dashboard. CPU, Memory and Disk utilizations of nodes can be a good factor in determining if more nodes are needed or not. Furthermore, the use of Virtual Sites makes managing this process even easier, thanks to labels. Node Based Alerts: Node-based alerts are essential for maintaining efficient CE operations. Accessing the alerts in the Console: To view alerts, go to Multi-Cloud Network Connect > Notifications > Alerts. Here, you can see both "Active Alerts" and "All Alerts." Alerts related to node health fall under the "infrastructure" alert group. The following screenshot shows alerts indicating high loads on the nodes. Configuring Alert Policies: Alert policies determine the notification process for raised alerts. To set up an alert policy, navigate to Multi-Cloud Network Connect > Alerts Management > Alert Policies. An alert policy consists of two main elements: the alert receiver configuration and the policy rules. Configuring Alert Receiver: The configuration allows for integration with platforms like Slack and PagerDuty, among others, facilitating notifications through commonly used channels. Configuring Alert Rules: For alert selection, we recommend configuring notifications for alerts with severity of “Major” or “Critical” at a minimum. Alternatively, the “infrastructure” group which includes node-based alerts can be selected. Comparison Table Criteria Default Cluster Single Node HA Minimum number of nodes in HA 3 2 Upgrade operations Per cluster Per Node Network redundancy and client side routing for east-west traffic VRRP, BGP, DNS, L4/7 LB DNS, L4/7 LB, BGP* Tunnels to RE 2 tunnels per cluster 2 tunnels per node Tunnels to other CEs (SMG or DCG) 1 tunnel from each cluster 1 tunnel from each node External traffic processing Limited to 2 nodes All nodes will be active Internal traffic processing All nodes can be active All nodes can be active Scale management in Public Cloud Sites Straightforward, by configuring ingress interfaces in Azure/AWS/GCP sites Straightforward, by adding or removing the labels Scale management in Secure Mesh Sites Requires reconfiguring the cluster (secure mesh site) - may cause interruption Straightforward, by adding or removing the labels Custom VIP IP Available Not Available (Planned to be available in future releases), workaround available. Node sizes All nodes should be same size. Upgrading node size in a cluster is a disruptive operation. Any node sizes or clusters can join the Virtual Site * When using BGP, please note a current limitation that prevents configuring custom VIP address on the Virtual Site. Conclusion: F5 Distributed Cloud offers a flexible approach to High Availability (HA) across CE nodes, allowing customers to select the redundancy model that best fits their specific use cases and requirements. While we continue to advocate for default clustering approach due to their operational simplicity and shared VRRP VIP or, unified network configuration benefits, especially for routine tasks like upgrades, the Virtual Site and single node HA model presents some great use cases. It not only addresses the limitations and challenges of the default clustering model, but also introduces a solution that is both scalable and adaptable. While Virtual Sites offer their own benefits, we recognize they also present trade-offs. The overall benefits, particularly for scenarios demanding high ingress (RE to CE) throughput and controlled failover capabilities cater to specific customer demands. The F5 product and development team remains committed to addressing the limitations of both default clustering and Virtual Sites discussed throughout this article. Their focus is on continuous improvement and finding the solutions that best serve our customers' needs. References and Additional Links: Default Clustering model: https://docs.cloud.f5.com/docs/ves-concepts/site#cluster-of-nodes Configuration guide for Virtual Sites: https://docs.cloud.f5.com/docs/how-to/fleets-vsites/create-virtual-site Routing Options for CEs: https://community.f5.com/kb/technicalarticles/f5-distributed-cloud---customer-edge-site---deployment--routing-options/319435 Configuration guide for DC Clustering Group: https://docs.cloud.f5.com/docs/how-to/advanced-networking/configure-dc-cluster-group4KViews8likes2CommentsWhat is Multi-Cloud Networking?
What is Multi-Cloud Networking? Multi-cloud networking (MCN), as a technology, aims to provide easy network connectivity between cloud environments. For the purpose of our definition, we need to imagine our datacenter as a cloud. You can loosely define a cloud environment as 'anywhere you run workloads.' The concept is nebulous… literally. Clouds come in all shapes and sizes, from 'running on Pi' to AWS / GCP / Azure. MCN is to clouds, as Internet is to network. AWS’s DirectConnect, Azure ExpressRoute and GCP’s Direct Link were early forms of MCN, aimed at joining portions of their own clouds together with customer datacenters. Insertion of transport virtual appliances in clouds has become another mechanism for MCN through time. Its strength is its flexibility and agility. One other notable MCN concept is the transport provider. Some circuit providers offer 'short-hop' transport to various cloud providers by routing. This option offers significant throughput versus the SDN router but lacks the agility. This is a popular option for hybrid cloud enterprises. With all of these options, you can make individual connections to each cloud, potentially in a hub and spoke fashion or full mesh. Challenges With Multi-Cloud Networking The top-most concern should be scalability, in every way. You need to be concerned about scale in routing, licensing, metered cloud costs, not to mention the knowledge to understand all of the nuance features of each cloud provider and so many more things. All of this is operational overhead, which can be significant. Another serious challenge is IP addressing. The sheer volume of it is one thing. Anyone who works with modern applications today can tell you that it's hard to even find a workload sometimes, with how massive things can get. DNS is one possible option to assist, but you've got to account for all of the native cloud workloads, too.. with their different DNS interfaces. Another common challenge is IP overlap. If you're curious what I mean, lets say your employer acquires a piece of software that lives in GCP, but you're already in AWS. You start going down the path of routing when you suddenly notice that both cloud environments are 10.1.x.x/16. This means localized routing all over the place and we know how much router people love one-offs, am I right? The next challenge is one I've already hinted at: How many indepth nerd knobs do you want to know by how many security vendors? You've got to strategize to minimize this sort of potential sprawl and standardize on the vendors that can do the most for you. Advantages of Multi-Cloud Networking The greatest advantage is really multi-cloud transit. Understanding so many different and new technologies is a daunting task. With multi-cloud transit, data centers route through the same SDN routers as your cloud application flows, allowing you to see each cloud provider as a metered resource for app consumption. No need to worry about addressing, DNS, or routing for each environment. Another substantial benefit is the enablement of a shared security model. When you can route between these environments, you can also easily aggregate logs, integrate with SIEMs and manage automated security policies with ease. Network fluidity is another substantial benefit. When your COO comes to you and says that you need to integrate a newly acquired network segment, you have no problems. One of the very cool benefits of SDN is the ability to route by software object. When we think of routing in traditional networks we want our packet to get to 10.10.10.4 by way of 192.168.3.1, but an SDN router sends our packet to 10.10.10.4 by way of f5xc_gcp_router4. This also means that your app developer can stamp out their app in AWS to send another packet to 10.10.10.4 by way of f5xc_aws_router16 or such. Overlap no longer matters when you route through an SDN core. Conclusion Giving your modern application networks the flexibility to grow on demand, to assimilate new application network segments in minutes instead of months... Ultimately, I really believe that MCN - when done right - like Chuck Mangione said (well, with a flugel horn), 'Feels So Good.' The designs you can build with it are SO much more scalable and translate everything from physical data centers to clouds in a clean, easy to manage fashion.4.1KViews8likes0CommentsUsing F5 Distributed Cloud private connectivity orchestration for secure multi-cloud infrastructure
Introduction Enterprise businesses use modern apps that access services in many locations. Users running productivity apps, like Office365, must connect to services in the cloud from on-prem locations. To keep this running well, enterprises must provide connectivity that’s fast, reliable, and private. Traditionally, it has taken many steps to create private connections to a public cloud subscription and route application specific traffic to it. F5 Distributed Cloud Platform orchestrates ExpressRoutes in Azure and Direct Connect services in AWS, eliminating many of the steps needed for routing end-to-end. Distributed Cloud private connectivity orchestration makes it easier than ever to connect and configure routing over existing private and dedicated circuits from on-prem locations to cloud services running in AWS and in Azure. The illustration below outlines the basic components to an ExpressRoute service in Azure but there’s a lot more you’ll need to know about just under the cover. Without orchestration, many steps are needed to enable routing between on-prem sites and Azure. This requires expert knowledge of Azure Networking, numerous dependent resources to be built, and advanced routing protocols knowledge -- specifically the Border Gateway Protocol (BGP). Extend on-prem network to a colo provider Create and provision the ExpressRoute Circuit Create a Virtual Network Gateway Create a connection between ExpressRoute Circuit & Virtual Network Gateway (VNG) Configure a Route Server to propagate routes between VNG and on-prem Configure user-defined routes on each subnet on each VNet in Azure Using Distributed Cloud to orchestrate ExpressRoutes in Azure and Direct Connect in AWS, the total number of steps is effectively reduced to just an essential few. Additional benefits include no longer needing to be an expert in Azure Networking or in BGP routing, and you get the ability to control connectivity with intent-based policies natively built into the Distributed Cloud Platform. An example of an intent-based policy is to configure VNet tagging in Azure to use with a firewall policy that just allow access to specific apps or by select users. Additional policies that support tagging include Distributed Cloud WAAP and Distributed Cloud App Infrastructure Protection. The following details cover the key components needed to support direct connectivity and show how to create the services and deploy a privately routed app in Distributed Cloud. Building ExpressRoute to Azure Extend on-prem network to a colo provider Create and provision the ExpressRoute Circuit Enable the ExpressRoute orchestration feature on an Azure VNet Site configured in Distributed Cloud To create an ExpressRoute orchestrated configuration in Distributed Cloud, navigate to Multi-Cloud Network Connect > Site Management > Azure VNET Sites > Add Azure VNET Site or Manage Configuration for an existing Site. Enter the required parameters, and when you reach the “Ingress Gateway or Ingress/Egress Gateway”, select “Ingress/Egress Gateway (Two Interface) …””. Here you have the option to deploy on a Recommended Region or an Alternate Region. This selection depends entirely on your business’ cloud deployment model. After choosing the model that best fits your environment, configure the number of Availability Zones for the Gateway and subnets (new/existing) that it will join and Apply the settings. Now scroll down to Advanced Options (enabling Advanced Fields) and Select VNet type: Hub VNet. Click “View Configuration”, and any existing VNet’s from your Azure Subscription that should inherit orchestrated routing. Next, change the “Express Route Configuration” to Enabled to expand the dropdown to access the ExpressRoute Circuit and Virtual Network Gateway settings. Under “* Connections”, add the ExpressRoute Circuit configuration for your Azure subscription(s). The required fields are the Name and the Express Route Circuit, this is the Resource ID for the circuit in Azure. Note: When configuring more than one circuit, you may want to also configure the Routing Weight for circuit preference. When configuring an express route circuit from another subscription (not shown below), you’ll also need an Authorization Key. For ease of deployment, it’s recommended to use the default values for the remaining fields, including for the Gateway SKU, Subnet for Azure VNet Gateway, and Subnet for Azure Route Server, including ASN Configuration for BGP between Site and Azure Route Servers. After the configuration is fully saved and deployed, with site status Applied on the Cloud Sites page, all resources in Azure will now be set to use ExpressRoute Circuit(s) for all designated L3 routed traffic. Next, we’ll configured the orchestration of Direct Connect in an AWS VPC connected site. AWS TGW connected sites are also support. Building Direct Connect for AWS To create a Direct Connect orchestrated configuration in Distributed Cloud, navigate to Multi-Cloud Network Connect > Site Management > AWS VPC Sites > Add AWS VPC Site or Manage Configuration for an existing Site. Enter the required parameters, and when you reach the “Ingress Gateway or Ingress/Egress Gateway”, choose the form factor that meets your deployment requirements. Scroll down to Advanced Configuration, enable Advanced Fields, and then Enable Direct Connect. Configuring the Direct Connect connection feature, choose either Hosted VIF or Standard VIF mode. Use Hosted VIF when you’re already using the Direct Connect connection for other purposes in AWS or when the VIF is in another AWS subscription. Otherwise, choosing Standard VIF allows Distributed Cloud to automatically create the VIF, and dependent services in AWS mentioned below to access the Direct Connect connection. Standard VIF mode creates the following additional resources in AWS: Virtual Gateway (VGW): associating it to the VPC and enabling route propagation to inside route tables Direct Connect Gateway (DCG): associating it to the VGW Note: In Standard VIF mode, at the end of the deployment admins may copy the direct connect gateway ID and use it to create other VIF’s. Admins may also copy the ASN. This is the AWS side of the ASN that’s needed by network ops teams to configure BGP peering. Note: In Hosted VIF mode, independent site deployment is responsible for: Creating VGW, and associating it to the VPC and enabling route propagation to inside route tables Creating DCG and associating it to the VGW Accepting the Hosted VIF and linking to the DCGW VIF Optionally, you may configure the Custom ASN if needed to work with an existing BGP configuration or choose Auto to let Distributed Cloud figure it out. Apply the config, save changes, and exit to the general Sites page. After the configuration is fully saved and deployed and having site status Applied on the Cloud Sites page, all resources in AWS will now be set to use the Direct Connect Gateway for all designated L3 routed traffic. Adding Private Connectivity On-Prem The final part to this deployment is routing both the ExpressRoute and Direct Connect circuits to an on-prem site. Both circuits must terminate at a colo space, and then standard IT/NetOps teams handle the routing outside the realm of Distributed Cloud to the destination. Building a Distributed App w/ Private Connectivity With Distributed Cloud having orchestrated the routing to each site’s workload, and IT/NetOps configured routing on-prem, including propagating the on-prem routes on BGP, an app with components that work independently can now be accessed as one unified interface. An example of a distributed app that run perfectly in this environment is the demo app, Arcadia Finance. This app has four components: Main – Frontend Web interface API – An App module accessed by Main to support money transfers Refer-A-Friend (Not used) – An App module interface accessed by Main to invite friends Backend – A DB server that stores money transfer accounts used by the API module, stock portfolio positions used by the Main module, and email addresses saved by the Refer-A-Friend module. Functionally, the connection flow is as follows: Users access a VIP advertised by an F5 Global Network Regional Edge to the Internet User traffic is connected to the Main (frontend) app running in AWS via the F5 Global Network Main App connects to API in Azure to load the money-transfer side frame, and then to the Backend DB on-prem to load the stocks portfolio balances. These connections transit the private connectivity links created in this article. API App in Azure connects to the Backend DB on-prem to retrieve money transfer accounts. This connection transits the private connectivity links created in this article. To support this topology and configuration, the apps are divided and run as follows: AWS Frontend (nginx) Main (Web) Refer-A-Friend Azure API (App) On-Prem Backend (DB) To make the app reachable to users, use the Distributed Cloud console Sites Distributed Apps feature to create one HTTP Load Balancer with the VIP advertised to the Internet, and with the origin pool of the Frontend (nginx) app. Note: This step assumes that you have previously created a fully connected AWS CE Site with connectivity to your VPC’s and a Direct Connect circuit in the section above. Navigate to Multi-Cloud App Connect > Manage > Load Balancers > Origin Pools, create a new origin pool. In the pool creation menu, at the top, select “JSON” and change the format to YAML, then paste the following example, changing the specific values, such as the namespace, to match your environment: metadata: name: mcn-aws-workload-pool namespace: mcn-privatelinks labels: ves.io/app_type: arcadia annotations: {} disable: false spec: origin_servers: - private_ip: ip: 10.100.2.238 site_locator: site: tenant: acmecorp-tnxbsial namespace: system name: soln-eng-aws-dc kind: site inside_network: {} labels: {} no_tls: {} port: 8000 same_as_endpoint_port: {} loadbalancer_algorithm: LB_OVERRIDE endpoint_selection: LOCAL_PREFERRED With the origin pool created, navigate to Distributed Apps > Manage > Load Balancers > HTTP Load Balancers, and add a new one with the following YAML provided as an example: metadata: name: mcn-arcadia-frontend namespace: mcn-privatelinks labels: ves.io/app_type: arcadia annotations: {} disable: false spec: domains: - mcn-arcadia-frontend.demo.internal http: dns_volterra_managed: false port: 80 downstream_tls_certificate_expiration_timestamps: [] advertise_on_public_default_vip: {} default_route_pools: - pool: tenant: acmecorp-tnxbsial namespace: mcn-privatelinks name: mcn-aws-workload-pool kind: origin_pool weight: 1 priority: 1 endpoint_subsets: {} Internally verify end-to-end connectivity Opening a command line shell to the Frontend Web App, a variation of traceroute with the tool hping3 and using curl, reveals each hop identified as privately connected along with connectivity established directly to the destination working without an intermediary. The following IP addresses are used to support a TCP connection from the Frontend (Web) app running in AWS to the API app running in Azure: 10.100.2.238 (Source): Frontend (Web) 172.18.0.1: Container host node 192.168.1.6: AWS Direct Connect Gateway 192.168.1.5: On-Prem router 192.168.1.22: Azure ExpressRoute Circuit endpoint 10.101.1.5 (Destination): App (API) In the CLI output, note each hop and the value in the HTTP Response header “Server”: root@1e40062cb314:/etc/nginx# hping3 -ST -p 8080 api HPING api (eth2 10.101.1.5): S set, 40 headers + 0 data bytes hop=1 TTL 0 during transit from ip=172.18.0.1 name=UNKNOWN hop=1 hoprtt=7.7 ms hop=2 TTL 0 during transit from ip=192.168.1.6 name=UNKNOWN hop=2 hoprtt=31.4 ms hop=3 TTL 0 during transit from ip=192.168.1.5 name=UNKNOWN hop=3 hoprtt=67.3 ms hop=4 TTL 0 during transit from ip=192.168.1.22 name=UNKNOWN hop=4 hoprtt=67.2 ms ^C --- api hping statistic --- 8 packets transmitted, 4 packets received, 50% packet loss round-trip min/avg/max = 7.7/43.4/67.3 ms root@1e40062cb314:/etc/nginx# curl -v api:8080 * Rebuilt URL to: api:8080/ * Hostname was NOT found in DNS cache * Trying 10.101.1.5... * Connected to api (10.101.1.5) port 8080 (#0) > GET / HTTP/1.1 > User-Agent: curl/7.35.0 > Host: api:8080 > Accept: */* > < HTTP/1.1 200 OK * Server nginx/1.18.0 (Ubuntu) is not blacklisted < Server: nginx/1.18.0 (Ubuntu) < Date: Thu, 12 Jan 2023 18:59:32 GMT < Content-Type: text/html < Content-Length: 612 < Last-Modified: Fri, 11 Nov 2022 03:24:47 GMT < Connection: keep-alive < ETag: "636dc07f-264" < Accept-Ranges: bytes Conclusion As more services continue to be deployed to and run in the cloud, dedicated, reliable, and secure private connectivity is increasingly required by Enterprises. Establishing connectivity is not a rudimentary task and requires the assistance of many hands in different departments. Distributed Cloud private connectivity orchestration helps streamline this process by eliminating many of the steps required in each cloud provider, including no longer requiring dedicated cloud and routing protocol experts just to configure these services manually. To see all of this in action and to see how all the parts come together, watch the following video, a companion to this article. Visit the following resources for more information about this feature and other Distributed Cloud services: Multi-Cloud Network Connect Product Information Direct Connect orchestration for AWS TGW Sites Direct Connect orchestration for AWS VPC Sites ExpressRoutes orchestration for Azure VNet Sites YouTube Video3KViews8likes0Comments