Site-to-Site Connectivity in F5 Distributed Cloud Network Connect – Reference Architecture
Purpose
This guide describes the reference architecture for deploying F5 Distributed Cloud’s (XC) Multicloud Network Connect service to interconnect workloads over private connectivity or the internet. It enumerates the options available to an F5 Distributed Cloud user for configuring site-to-site connectivity using F5 XC Customer Edges (CEs) and explains each option in detail, helping the user make an informed choice of the correct topology for their use case.
Audience
This guide is for technical readers, including network admins and architects, who want to understand how the Multicloud Network Connect service works and which network topology to use to interconnect their workloads across data centers, branches, and public clouds.
This guide assumes the reader is familiar with networking concepts like routing, default gateway configuration, IPSec and SSL encryption, and private connectivity solutions provided by public clouds, such as AWS Direct Connect and Azure ExpressRoute.
Introduction
Ensuring workload reachability across data centers, branches, and public clouds can be challenging and operationally complex when done the traditional way. Network teams must design, configure, and maintain multiple networking and security appliances, and they need expertise across many vendor solutions providing functionality like NAT, SD-WAN, VPN, firewalls, and access control lists (ACLs). This gets even more nuanced when connecting two networks that have overlapping IP address CIDRs, which is often the case in hybrid cloud deployments and during mergers and acquisitions.
F5 XC Multicloud Network Connect provides a simple way to configure these interconnections and manage access and security policies across multiple heterogeneous environments, from a single console. It abstracts the complexities by taking the user intent and automating the underlying networking and security while providing the flexibility to choose to connect over a private network or the public internet.
Customer Edge as a Gateway
To provide site-to-site reachability and ensure security policies are enforced, traffic must flow through the CE site. For this, the CE’s Site Local Inside (SLI) IP address must be used as the default gateway, or as the next hop, to reach the networks on other sites.
Figure: Using CE as the gateway
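For hosts or routers that should reach remote sites through the CE, this amounts to a static route pointing at the SLI address. Below is a minimal Python sketch that builds such a route command for a Linux host; the CIDR and SLI address are placeholder values, not addresses from a real deployment.

```python
import ipaddress

def next_hop_route(remote_cidr: str, ce_sli_ip: str) -> str:
    """Build a Linux 'ip route' command that sends traffic destined for a
    remote site's network to the local CE's SLI address as the next hop."""
    network = ipaddress.ip_network(remote_cidr)  # validates the CIDR
    gateway = ipaddress.ip_address(ce_sli_ip)    # validates the SLI IP
    return f"ip route add {network} via {gateway}"

# Placeholders: 10.20.0.0/16 is the remote site's network,
# 10.1.1.10 is the local CE's SLI interface address.
print(next_hop_route("10.20.0.0/16", "10.1.1.10"))
# -> ip route add 10.20.0.0/16 via 10.1.1.10
```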
Physical vs. Logical Connectivity
Two or more CE sites can be physically connected in multiple ways. However, physical connectivity alone does not make the networks on different sites L3-routable to each other; for that, the user must associate the networks with segments.
The physical connection dictates the path packets take from one site to another, while the logical connection links the VLANs (on-prem) and the VPCs/VNETs (in the public cloud) using a network overlay and provides segmentation.
Note: Multicloud Network Connect provides Layer 3 connectivity between networks. Configuring L3 connectivity is not required for app-to-app connectivity across sites; that can instead be achieved using the distributed load balancer feature under App Connect.
Physical Transit Options
Over F5 Global Network Backbone
A CE site is always connected to the two nearest Regional Edges (REs) for redundancy, using IPSec or SSL tunnels. The REs in different regions are interconnected by F5’s global, private backbone network, which provides high-speed private transit between the regions where REs are located. Users can use the CE-RE tunnels over the internet to connect securely to this backbone locally and leverage its private connectivity to reach sites in other regions.
Figure: Default CE-CE connectivity over REs and F5 network backbone
Pros:
- No need to manage underlay networking if using CE-RE tunnels over the internet.
- High-speed private transit between geographically distant regions, at no extra cost.
- End-to-end encryption of traffic between sites.
- Option to have end-to-end private connectivity.
Cons:
- Throughput is limited to the bandwidth of the two tunnels per site.
When To Use:
- The easiest way to connect when private connectivity is not available between data centers, or to cloud VPCs in hybrid cloud scenarios.
- IPSec/SSL tunnels over the internet are acceptable, but you do not want to manage multiple VPN tunnels or SD-WAN devices.
- You are connecting geographically distant sites and need better end-to-end latency and reliability than the public internet provides.
Direct Site-to-Site Over Internet
If security regulations prevent the use of F5’s private backbone network, users can connect the CE sites directly to each other using IPSec tunnels (SSL encryption is not supported in this case). This is done using the Site Mesh Group (SMG) feature.
Note: Even when the CEs are part of a Site Mesh Group, they still connect to the REs using encrypted tunnels, as this is required for control plane connectivity.
When the sites are in an SMG, data plane traffic prefers the direct CE-CE tunnels. If this link fails, traffic falls back to the REs and traverses the F5 network backbone to reach the other site.
The number of tunnels on each link between two CE sites depends on the number of control nodes each site has. Two single-node sites are connected using a single tunnel. If either site has three control nodes, three tunnels are formed to the other site.
Figure: Number of tunnels between sites in an SMG
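The tunnel-count rule can be summarized in a small helper. Treating the count as the maximum of the two sites’ control node counts is an assumption on our part; it matches both cases stated above but is not confirmed by the source for other node mixes.

```python
def tunnel_count(control_nodes_a: int, control_nodes_b: int) -> int:
    """Tunnels on the link between two CE sites in a Site Mesh Group.

    Encodes the rule stated above: two single-node sites share one tunnel,
    and a site with three control nodes forms three tunnels to its peer.
    Generalizing to max() for other combinations is an assumption.
    """
    return max(control_nodes_a, control_nodes_b)

assert tunnel_count(1, 1) == 1  # two single-node sites
assert tunnel_count(1, 3) == 3  # one site with three control nodes
assert tunnel_count(3, 3) == 3  # assumption: still three tunnels
```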
Pros:
- Sites are directly connected, so the data path does not depend on the REs.
- Easy connectivity over the internet.
- Traffic is always encrypted in transit.
- Eliminates the need to manually configure cross-site VPNs or SD-WAN.
Cons:
- Encryption and decryption consume CPU resources on the CE nodes, so CPU demand grows as performance requirements increase.
Note: L3 Mode Enhanced Performance can be enabled on the sites to get more performance from the available CPU and memory resources, but it should be enabled only when the site is used purely for L3 connectivity, as it reduces the resources available for L7 features.
Direct Site-to-Site Over Customer’s Network Backbone
The CE-CE tunnels can also be established over a private network if end-to-end private connectivity is required. Customers can leverage their existing private connectivity between data centers, provisioned through private NaaS providers like Equinix.
The sites can be connected either using an SMG, where the connections are encrypted, or using the DC Cluster Group (DCG) feature, which connects the sites using IP-in-IP tunnels (no encryption).
As with an SMG, the number of tunnels on each link between two CE sites depends on the number of control nodes each site has; the same rule described above applies.
A DCG gives better performance, while an SMG is more secure. Unlike an SMG, a DCG does not fall back to routing traffic over the RE-CE tunnels if private connectivity fails.
Pros:
- Data path confined within the customer’s private perimeter.
- Sites are directly connected, so the data path does not depend on the REs.
- Option to choose encrypted or unencrypted transit.
- Simplifies the ACLs on the physical network and allows users to manage segmentation using the F5 XC console.
Cons:
- Customers must manage the private connectivity across data centers or from the data center to the public cloud.
Direct Site-to-Site Connectivity Topologies
For direct site-to-site connectivity, sites can be grouped into a Site Mesh Group (SMG) or a DC Cluster Group (DCG). These groups allow sites to connect in either Full Mesh or Hub-Spoke topologies, as described below:
Full Mesh Site Mesh Group
All sites that are part of a full-mesh SMG are connected to every other site using IPSec tunnels, forming a full-mesh topology.
Figure: Sites in full mesh Site Mesh Group
When To Use:
- In hybrid cloud use cases.
- When all sites have equal roles (e.g., connecting workloads across data centers).
- When high fault tolerance is required for site-to-site connectivity (no dependence on any single site for transit).
Hub-Spoke Site Mesh Group
This mode groups the sites into a hub SMG and a spoke SMG. The sites within the hub SMG are connected in a full mesh topology. The sites in the spoke SMG connect only to the sites in the hub SMG, not to other sites in the spoke SMG.
Figure: Sites in Hub-Spoke Site Mesh Group
Some characteristics of Hub-Spoke SMG:
- The hub can have multiple sites for redundancy, though most customer deployments use a single hub site.
- A hub site can be a spoke site for a different Hub-Spoke SMG.
- A CE site can be a spoke for multiple hubs.
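To compare the two topologies at scale, the number of site-to-site links follows directly from the descriptions above. This sketch assumes each pair of connected sites counts as one link (each link may carry one or more tunnels, per the tunnel-count rule earlier):

```python
def full_mesh_links(sites: int) -> int:
    """Full mesh SMG or DCG: every site peers with every other site."""
    return sites * (sites - 1) // 2

def hub_spoke_links(hubs: int, spokes: int) -> int:
    """Hub-Spoke SMG: hub sites form a full mesh among themselves;
    each spoke connects to every hub site but not to other spokes."""
    return full_mesh_links(hubs) + hubs * spokes

print(full_mesh_links(6))     # 15 links for 6 sites in a full mesh
print(hub_spoke_links(1, 5))  # 5 links for 1 hub site and 5 spokes
```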
When To Use:
- For data center/cloud to edge/branch connectivity use cases.
Full Mesh DC Cluster Group
DC Cluster Group supports only a full mesh topology. Every site in a DCG is connected to every other site using IP-in-IP tunnels. Traffic is not encrypted in transit, but a DCG is supported only when the sites can be connected over a private network.
Figure: Sites in DC Cluster Group
When To Use:
- Connecting VLANs in one data center to VLANs in other data centers, or to public cloud VPCs/VNETs, when there is private connectivity between them.
- When the security regulations allow unencrypted traffic over the private transit.
Offline Survivability
CE sites require control plane connectivity to the REs and the Global Controller (GC) to exchange routes, renew certificates, and decrypt blindfolded secrets. To enable business continuity during an upstream outage, the Offline Survivability feature can be enabled on all sites in a Full Mesh SMG or a DCG; the feature is not supported for Hub-Spoke SMGs. With this feature enabled, the sites can continue normal operations for 7 days without connecting to the REs and the GC.
With offline survivability enabled on a CE site, the local control plane becomes the certificate authority during a connectivity loss, and the decrypted secrets and certificates are cached locally on the CE. For this reason, the feature is not turned on by default: the user can decide whether enabling it aligns with the company’s security regulations.
Logical Connectivity
Once the physical transit is configured and the connection topology is chosen, the workloads on networks across the sites can be connected using segments, or the applications on one site can be delivered to any other site by configuring a distributed load balancer.
Connect Networks - Segmentation
Users can create segments and add data center VLANs or public cloud VPCs/VNETs to them. All networks added to a segment become part of a common routing domain, and all workloads on these networks can reach each other using the CE sites as gateways. Users must ensure the networks added to a segment do not have overlapping CIDRs.
Segments are isolated layer 3 domains. So, workloads on one segment cannot access workloads from other segments by default. However, users can configure Segment Connectors to allow traffic from one segment to another.
Figure: Segmentation
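Because all networks in a segment share one routing domain, overlapping CIDRs must be caught before a network is added. A minimal pre-check using Python’s standard ipaddress module might look like the sketch below; the CIDRs shown are placeholders.

```python
import ipaddress

def validate_segment(cidrs: list[str]) -> None:
    """Raise if any two networks proposed for one segment overlap,
    since all networks in a segment share a common routing domain."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    for i, a in enumerate(networks):
        for b in networks[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}: cannot share a segment")

# A data center VLAN, an AWS VPC, and an Azure VNET with disjoint CIDRs:
validate_segment(["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16"])  # passes
# validate_segment(["10.1.0.0/16", "10.1.128.0/17"])  # raises ValueError
```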
Connect Applications - Distributed Load Balancing
Instead of allowing the workloads to route directly to one another, the user can configure a distributed load balancer to publish a service from one site to other sites. This is done by adding the service endpoints to an origin pool of a load balancer object and advertising it using a custom VIP to one or more other sites. Clients can then connect to the service as if it were a local resource.
Using distributed load balancing, an LB admin can configure policies that expose only the required APIs of an application, and only to the sites that need them. This reduces the attack surface and increases app security.
Figure: Distributed Load Balancing
For example, in the figure above, the on-prem database is advertised to client apps on AWS and Azure, which can access the DB using their local VIPs, while the on-prem application is advertised only to the client on Azure.
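As a rough illustration of the objects involved, the intent for the database service in the figure could be modeled as below. The field names are hypothetical and do not represent the actual F5 XC API schema; they only mirror the origin pool and VIP advertisement described above, with placeholder site names and addresses.

```python
# Hypothetical model of a distributed load balancer intent; field names
# and values are illustrative, not the real F5 XC configuration schema.
db_service = {
    "load_balancer": "on-prem-database",
    "origin_pool": {
        # Service endpoints on the on-prem site backing the DB service.
        "site": "dc-site-1",
        "endpoints": ["10.1.5.20:5432"],
    },
    "advertise": [
        # Custom VIPs advertised on the consuming sites, so clients on
        # AWS and Azure reach the DB as if it were a local resource.
        {"site": "aws-site-1", "vip": "10.2.9.1"},
        {"site": "azure-site-1", "vip": "10.3.9.1"},
    ],
}
```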
Decision Flow to Choose Physical Connectivity Options
Once you understand the various physical and logical connectivity options, the chart below can help you make an informed decision based on your connectivity requirements, available infrastructure/platform, and security restrictions.
Once the connectivity is decided, you can choose to connect the networks, or to publish apps only to the sites where they are needed, based on the application requirements.
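As a rough programmatic rendering of that decision flow, the sketch below reduces the chart to three questions. This is a simplification we have assumed from the sections above, not a reproduction of the actual chart.

```python
def choose_transit(private_connectivity: bool,
                   allow_f5_backbone: bool,
                   allow_unencrypted: bool) -> str:
    """Pick a physical transit option from the three questions the
    decision chart asks (an assumed simplification of the full flow)."""
    if not private_connectivity:
        # No private links available: connect over the internet.
        if allow_f5_backbone:
            return "CE-RE tunnels over the F5 global network backbone"
        return "Direct site-to-site IPSec tunnels (Site Mesh Group)"
    # Private connectivity is available between the sites.
    if allow_unencrypted:
        return "DC Cluster Group (IP-in-IP tunnels over private network)"
    return "Site Mesh Group over the private network (encrypted)"

print(choose_transit(private_connectivity=False,
                     allow_f5_backbone=True,
                     allow_unencrypted=False))
# -> CE-RE tunnels over the F5 global network backbone
```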