reference architecture

Intelligent DNS Animated Whiteboard
DNS will become even more important as additional sensors, monitors, actuators, and other 'things' connect to the internet. It helps devices like refrigerators and automobiles get their updates, and it helps people find those things in our digital world. Here is a short whiteboard video explaining how F5 can help solve DNS challenges. Also check out our Intelligent DNS Scale Reference Architecture, which delivers the peace of mind that comes with knowing your web applications will respond to all DNS queries, keeping your content and applications available to your users wherever and whenever they want to access them.

ps

Related:
CloudExpo 2014: The DNS of Things
GartnerDC 2013: Intelligent DNS Scale Reference Architecture
The DNS of Things
DNS Does the Job
A Living Architecture

Technorati Tags: dns,f5,iot,things,reference architecture,availability,silva

Customer Edge Site High Availability for Application Delivery - Reference Architecture
Purpose

This guide describes the reference architecture for deploying a highly available F5 Distributed Cloud (F5XC) Customer Edge (CE) site. It explains the networking options available to deploy a highly available multi-node CE site in an on-premises data center, a branch location, or a public cloud when deployed manually.

Audience

This guide is for technical readers, including NetOps and Solution Architect teams, who want to better understand the various options for deploying a highly available F5 Distributed Cloud Customer Edge (CE) site. The guide assumes the reader is familiar with basic networking concepts like routing protocols, DNS, and data center network architecture. The reader should also be aware of F5XC concepts such as Load Balancing, BGP configuration, Sites and Virtual Sites, and Site Local Inside (SLI) and Site Local Outside (SLO) interfaces.

Introduction

To create a resilient network architecture, all components on the network must be deployed in a redundant topology to handle device and connectivity failures. A CE acts as an L7 gateway and sits in the path of the network traffic, so it needs a redundant architecture. For a production setup, it is recommended to deploy the site as a three-node cluster. These three nodes are the control nodes. Additional worker nodes can be added for higher L7 and security performance.

Clustering on a CE Site

A CE can be deployed as a multi-node site for redundancy and performance scaling. The CE runs Kubernetes on its nodes and inherits the Kubernetes HA architecture of either one or three control nodes plus optional worker nodes. Production deployments are recommended to have three control nodes for redundancy and additional worker nodes to meet the performance requirements of the site. A multi-node site can tolerate one control node failure, as it needs at least two nodes to form the quorum for HA (a brief quorum arithmetic sketch appears at the end of this overview). It is important to ensure that multiple control nodes do not fail simultaneously on a site. Worker node failures do not cause the whole site to fail; they only reduce the total throughput the site can handle.

Note: The control nodes may also be referred to as master nodes in legacy documentation. Although they are called control nodes, they run both control plane and data plane functions.

Figure: CE Clustering

In a multi-node setup, two CE control nodes form tunnels to the two closest REs. If one of the control nodes with a tunnel fails, the tunnel is reassigned to the remaining control node. In a single-node site, the same node forms tunnels to two different REs. Worker nodes are not supported for sites with a single control node.

Figure: RE – CE connectivity

CE Site HA Options

In a regular deployment, a multi-node CE site is used to achieve redundancy. By default, a load balancer configured on the CE site uses the IP addresses of the SLI, SLO, or both interfaces as the VIP. This means the load balancer domain/hostname needs to resolve to multiple IP addresses across the nodes of the CE. To simplify this, F5XC also allows users to specify a custom IPv4 address as the VIP for each load balancer.

An alternative topology is to use multiple single-node sites deployed across different availability zones in the data center or public cloud. In this case, the sites can be grouped into a Virtual Site, and a load balancer can be configured with a custom VIP advertised to this Virtual Site.

Both of these options are explained in detail in the sections below.
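As a quick check of the clustering statement above, the control-plane quorum follows standard majority arithmetic. The following is a minimal, generic sketch of that arithmetic, not F5XC code:

```python
def quorum_size(control_nodes: int) -> int:
    # Majority quorum: more than half of the control nodes must be healthy.
    return control_nodes // 2 + 1

def tolerated_failures(control_nodes: int) -> int:
    # Failures the control plane can absorb while still holding quorum.
    return control_nodes - quorum_size(control_nodes)

for n in (1, 3):
    print(f"{n} control node(s): quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With three control nodes the quorum is two, so the site survives a single control node failure; a single-node site cannot tolerate any control node failure.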
High Availability Options for a Single CE Site

This section describes the deployment options available to direct traffic across the CE nodes and lists the pros and cons of each option. The feasibility of these options may vary with the environment (on-premises or public cloud) and the networking tools available. These nuances are explained for each option.

For L4 and L7 load balancer VIPs on the CE site, all nodes (control and worker nodes) can actively receive traffic. The site bandwidth scales linearly with the number of nodes, so multiple worker nodes can be deployed based on the performance requirements of the site. For public load balancers, the VIP is on the RE, and the bandwidth is limited to the bandwidth of the tunnels connecting the two CE control nodes to the REs.

Layer 3 Redundancy Using Static Routing With ECMP

This is the simplest way to configure redundancy for a load balancer VIP on the CE cluster. The application admin configures the LB with a user-specified VIP, and the network admin configures equal-cost static routes for this VIP IP address, with the SLI/SLO IP addresses of the CE nodes as the next hops. The router uses Equal Cost Multi-Path (ECMP) to spread the traffic across the CE nodes. It is recommended to use a consistent hashing ECMP configuration on the router to ensure an active session to a CE node is not rehashed when another node fails (a short illustration of this follows the BGP option below).

Figure: Static Routing

Pros:
- The VIP IP can be from any valid subnet. It is not restricted to the SLO or SLI subnet where it is advertised.
- Simple L3 routing configuration.
- Can scale with worker nodes with minimal route configuration change.
- All active nodes can receive traffic.

Cons:
- Needs routing configuration changes external to F5XC every time an LB VIP is created or deleted.
- Traffic gets blackholed when a CE node fails, until the node's route is removed from the route configuration or the node is restored.

When To Use:
- The NetOps team does not have access to routing devices with dynamic routing protocol capabilities like BGP.
- The number of load balancers on the site is small and does not change often, so the operational overhead of configuring and managing the routes is low.

Layer 3 Redundancy Using BGP Routing With ECMP

BGP peering can be configured between the F5XC CE and the router. This configuration requires LBs to be created with user-specified VIPs. The CE advertises equal-cost /32 routes to the VIP with the SLO/SLI addresses as the next hops. The router uses Equal Cost Multi-Path (ECMP) to spread the traffic across the CE nodes. It is recommended to use a sticky/persistent ECMP configuration on the router to ensure an active session to a CE node is not rehashed to a different node in case of a node failure.

Note: Separate BGP peers must be configured for VIPs on the SLO and SLI. Users can select the peer interface on the CE while configuring the peers. For more information, see BGP.

Figure: BGP Routing

Pros:
- The VIP IP can be from any valid subnet. It is not restricted to the SLO or SLI subnet where it is advertised.
- Can automatically scale with worker nodes.
- Automatically revokes the route for a failed CE node.
- Faster failover than any other method.
- All active nodes can receive traffic.

Cons:
- Needs advanced network configuration on the router.
- The router must support BGP.

When To Use:
- The site has a large number of load balancers configured.
- Load balancers are frequently created and deleted.
- The application requires fast failover and minimal disruption in case of node failure.
- With network overlay technologies like Cisco ACI.
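Both routing options above recommend consistent (sticky) hashing for ECMP on the upstream router. The following is a minimal, generic Python sketch, not router or F5XC configuration, that illustrates why: with rendezvous (consistent) hashing, only the flows that were pinned to the failed next hop are remapped, while plain modulo hashing remaps most flows. The node addresses and flow identifiers are hypothetical.

```python
import hashlib

def h(key: str) -> int:
    # Stable hash so results are reproducible across runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def modulo_pick(flow: str, next_hops: list) -> str:
    # Plain modulo hashing: any change in the hop list can remap most flows.
    return next_hops[h(flow) % len(next_hops)]

def consistent_pick(flow: str, next_hops: list) -> str:
    # Rendezvous (highest-random-weight) hashing: only flows that targeted
    # a removed hop are remapped; all other flows keep their next hop.
    return max(next_hops, key=lambda hop: h(flow + hop))

hops = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical CE node SLO IPs
flows = [f"192.0.2.{i}:443" for i in range(1, 101)]

for pick in (modulo_pick, consistent_pick):
    before = {f: pick(f, hops) for f in flows}
    after = {f: pick(f, hops[:-1]) for f in flows}   # one CE node fails
    moved = sum(1 for f in flows if before[f] != after[f])
    print(f"{pick.__name__}: {moved} of {len(flows)} flows remapped")
```

On real routers this behavior is typically enabled with a resilient or consistent ECMP hashing option; the exact knob depends on the platform.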
Layer 2 Redundancy Using VRRP/GARP

A user can enable VRRP on the CE site. This configuration requires LBs to be created with user-specified VIPs. Only control nodes participate in the VRRP redundancy group, and one of them is elected as the leader for a VIP. VIPs are placed on different nodes at random. Only the leader node for a VIP sends out Gratuitous ARP (GARP) broadcasts for that VIP. If the leader node fails, a new leader is elected and the VIP is placed on it.

Figure: VRRP/GARP

Pros:
- No network configuration is required external to F5XC.
- Automatic failover of the VIP when the VRRP leader node fails.

Cons:
- Only control nodes can receive traffic.
- Only one node actively receives traffic at a given time.
- VIPs are placed on control nodes randomly; equal distribution of VIPs across control nodes is not guaranteed.
- Failover can be slow depending on the ARP resolution time on the network.

When To Use:
- The application team does not have access to routers, DNS servers, or load balancers on the network (see the other deployment options for details).
- The application does not require high throughput.
- Some traffic loss can be tolerated in case of node failure (e.g., for non-critical applications).

Note: This option does not work in public cloud deployments, as cloud networking blocks GARP requests.

External Proxy Load Balancing

A network admin can configure an external load balancer (LB), with the CE SLO/SLI IP addresses in its origin pool, to spread the traffic across the CE nodes. This can be a TCP or HTTP load balancer. For an external TCP LB, the client IP is lost because the LB SNATs the request before forwarding it to the CE nodes. F5XC does not support proxy protocol on the client side, so it cannot be used to convey the client IP to the load balancer on the CE. For an external HTTP LB, the traffic still gets SNAT-ed, but the client IP can be preserved to the CE nodes if the external LB adds the X-Forwarded-For header to the request (a minimal sketch follows at the end of this option). If the LB on the CE site is an HTTPS LB, or a TCP LB with TLS enabled, the external LB has to host the TLS certificate because it terminates the client TLS sessions. A wildcard certificate can simplify this, but it may not always be a viable option for the applications.

In public cloud deployments, where L2 ARP and routing protocols may not work, users can also use a Network Load Balancer on AWS and Google Cloud, a Standard Load Balancer on Azure, or a similar offering on other public clouds in addition to TCP or HTTP load balancers. These do not SNAT the traffic; they simply forward it to the CE nodes, much like a router running ECMP.

Note: For multi-node public cloud sites created using the F5XC console, the required cloud-native LB is created automatically. Manual CE deployment in the cloud is also supported, in which case the user has to create the LB.

Figure: External LB

Pros:
- All active nodes can receive traffic.
- Health probes can be configured to track F5XC LB health and avoid traffic blackholing.
- Can scale with worker nodes.
- Works for public cloud deployments.

Cons:
- Managing certificates on the external LB can be operationally challenging for TLS traffic.
- No source IP retention in the case of a TCP LB.
- Adds an additional proxy hop.
- The external LB can become a performance bottleneck even if the CE is scaled out using worker nodes.

When To Use:
- There is an existing load balancer (usually in the DMZ) in the traffic path, but the CE is used for additional services like WAAP, DDoS protection, etc.
- In public clouds, where cloud LBs can be used to load balance to the nodes.
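To illustrate the X-Forwarded-For point made for the external HTTP LB above, here is a minimal, illustrative forwarding sketch in Python. It is not F5XC or BIG-IP code; the CE node addresses, port, and hostname are hypothetical, and a production external LB would provide this natively along with TLS termination and health probes.

```python
import http.client
import itertools
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical CE node SLO addresses serving the HTTP load balancer VIP.
CE_NODES = itertools.cycle(["10.0.0.11", "10.0.0.12", "10.0.0.13"])

class XffProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        node = next(CE_NODES)                       # naive round robin across CE nodes
        client_ip = self.client_address[0]

        conn = http.client.HTTPConnection(node, 80, timeout=5)
        headers = {k: v for k, v in self.headers.items() if k.lower() != "host"}
        headers["Host"] = self.headers.get("Host", "app.example.com")  # hypothetical app hostname
        headers["X-Forwarded-For"] = client_ip      # preserve the original client IP
        conn.request("GET", self.path, headers=headers)
        upstream = conn.getresponse()

        self.send_response(upstream.status)
        for k, v in upstream.getheaders():
            if k.lower() not in ("transfer-encoding", "connection"):
                self.send_header(k, v)
        self.end_headers()
        self.wfile.write(upstream.read())
        conn.close()

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), XffProxy).serve_forever()
```

The load balancer on the CE can then use the X-Forwarded-For value for logging and policy decisions even though the connection itself arrives SNAT-ed from the external LB.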
DNS Load Balancing

Network admins can use DNS to resolve application hostnames to the SLI/SLO IP addresses of the CE nodes. The DNS server can be configured to respond with one IP at a time in a round-robin manner. Alternatively, a private DNS LB or Global Server Load Balancer (GSLB) can be used, which can make load-based intelligent decisions to distribute the traffic more evenly. User-specified VIPs must not be used in this case, as the hostname must resolve to the individual node's SLI/SLO IP address for the traffic to be routed to that node.

Figure: DNS LB

Pros:
- Can be configured using existing DNS servers.
- Can scale with worker nodes.
- Works for public cloud deployments (not recommended, as better options are available).
- Does not add an L4/L7 hop to the traffic path.

Cons:
- Needs DNS configuration changes external to F5XC every time an LB VIP is created or deleted, or a new worker node is added.
- Traffic gets blackholed when a CE node fails, until the node's IP is removed from the DNS configuration or the node is restored.
- Intelligent distribution of traffic requires a GSLB, which can be expensive.
- Subject to DNS caching and TTL, which can cause clients to resolve to a CE node that is down (a short simulation of this appears just before the conclusion).

When To Use:
- The application team only has access to a GSLB or DNS server and does not want to limit the traffic to only one node at a time, as in the VRRP/GARP option.
- High performance is not a requirement, as multiple clients may resolve to the same node even if the site has multiple nodes.
- In the public cloud, if the user does not want to create an external LB.

High Availability Using Multiple Single-Node Sites Across Availability Zones

Instead of deploying a single multi-node site, customers can opt to deploy two (or more) single-node sites and use them together (as individual sites or grouped into a Virtual Site) to advertise a VIP. This can be useful when the data center has two AZs, where it is more logical to deploy a CE in each AZ than to deploy a three-node CE with one node in one AZ and two nodes in the other. By upgrading one site at a time, it is guaranteed that at least one site will always be online to serve the traffic, providing resiliency against upgrade failures. This is very useful for critical applications demanding zero downtime.

All the deployment options above, other than the VRRP/GARP method, can be used in this case. It is recommended to use a consistent hash configuration for ECMP on the router to ensure all packets in a TCP session from a client are always routed to the same site.

In this deployment, each CE has two tunnels to the nearest REs. Hence, this method is also beneficial when you want to publish an app to the internet using F5XC Regional Edges (REs), because you can scale throughput by adding CEs and thereby more tunnels.

Note: This is a big advantage of this topology over a multi-node site, as the latter is limited to only two tunnels.

For public load balancers, the VIP is on the Regional Edges (REs) in the F5XC global network. The load balancing happens on the REs, and the CEs provide secure connectivity with auto SNAT between the REs and private origins. So, to get the most out of the available compute, the CEs in this case can be configured for the Enhanced L3 performance mode, as all the L7 processing happens on the RE.

Figure: RE-CE tunnels for multiple single-node site deployment
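The DNS caching caveat noted under DNS Load Balancing above can be illustrated with a minimal, generic simulation. This is not F5XC or GSLB code; the node addresses and TTL are hypothetical.

```python
import itertools
import time

# Hypothetical CE node SLI/SLO addresses behind one application hostname.
NODE_IPS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
TTL_SECONDS = 30

_rr = itertools.cycle(NODE_IPS)

def authoritative_answer():
    # Round-robin answer a simple DNS server might hand out, with its TTL expiry.
    return next(_rr), time.time() + TTL_SECONDS

class CachingClient:
    # Client-side stub resolver: keeps using a cached answer until the TTL expires.
    def __init__(self):
        self.cached_ip = None
        self.expires_at = 0.0

    def resolve(self):
        if self.cached_ip is None or time.time() >= self.expires_at:
            self.cached_ip, self.expires_at = authoritative_answer()
        return self.cached_ip

client = CachingClient()
first = client.resolve()
print("client pinned to", first)
# Even if this node fails now, the client keeps resolving to it until the TTL
# expires, which is the traffic-blackholing window called out in the cons above.
assert client.resolve() == first
```

Lowering the TTL shortens the blackholing window after a node failure at the cost of more DNS queries, which is one reason intelligent GSLB health checks are preferred for critical applications.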
Conclusion

This guide should help the reader understand the various HA options available in F5XC and make an informed decision on which method to choose based on their requirements and the networking tools available. For a more detailed explanation of the above options, with configuration examples, also see: F5 Distributed Cloud – CE High Availability Options: A Comparative Exploration

Related Articles
- F5XC Load Balancing and Distributed Proxy Concepts
- F5XC Virtual Network Concepts
- F5XC Site
- BGP Configurations on F5XC
- F5 Distributed Cloud - Customer Edge Site - Deployment & Routing Options
- F5 Distributed Cloud - Listener Logic

F5 Synthesis: The Reference Architectures
The next-generation app-focused, solution-driven model for supporting all of your business applications.

Your business uses countless applications in a given day. At F5, we built a reputation as an industry leader by helping organizations deliver the most secure, fast, and reliable applications to anyone, anywhere, at any time. Our pioneering focus on application services gives us a unique advantage in designing the solutions that drive business forward.

F5 Synthesis isn't built on new products and features. It's built on comprehensive solutions. We took a big-picture look at the trends affecting businesses today, from security to mobility to performance and beyond, and designed architectures that pull together specific device, network, and application scenarios to help you better identify and understand which solutions meet your network needs.

The F5 Synthesis™ architectural vision helps customers improve service velocity and accelerate time to market through automated provisioning and intelligent service orchestration of application services. The F5 Synthesis elastic, high-performance services fabric reduces the cost and complexity of deploying Software Defined Application Services™ (SDAS™) across all types of systems and environments, including software-defined networks (SDN), virtual infrastructures, and cloud. F5's prescriptive reference architectures, optimized licensing models, and deployment options give organizations the tools to align services with user and business expectations for applications, overcoming persistent IT challenges around availability, optimization, security, and mobility.

Here are the architectural overview videos of the F5 Synthesis Reference Architectures.

ps

Related:
F5 Synthesis
F5 Introduces Synthesis Architecture
F5 Synthesis: Software Defined Application Services
F5 Synthesis: The Time is Right
F5's Partner Ecosystem Supports Synthesis
F5 and Cisco: Application-Centric from Top to Bottom and End to End
When Applications Drive the Network
F5 Synthesis Aims To Fill SDN Gap
F5 Introduces Synthesis App Delivery Architecture for Cloud, Data Center

Technorati Tags: synthesis,sdas,big-ip,f5,video,reference architecture

A Living Architecture
You often hear people say, 'oh, this is a living document,' to indicate that the information is continually updated or edited to reflect changes that may occur during the life of the document. Your infrastructure is also living and dynamic. You make changes, updates, or upgrades to address the ever-changing requirements of your employees, web visitors, customers, partners, networks, applications, and anything else tied to your systems.

This is also true for F5's Reference Architectures. They too are living architectures. F5's Reference Architectures are the proof points, or customer scenarios, that drive Synthesis to your data center and beyond. When we initially built out these RAs, we knew that they'd be continuously updated, not only to reflect new BIG-IP functionality but also to show new solutions to the changing challenges IT faces daily.

We've recently updated the Intelligent DNS Scale Reference Architecture to include more security (DNSSEC) and to address the highly hybrid nature of enterprise infrastructures with Distributed DNS. F5's end-to-end Intelligent DNS Scale reference architecture enables organizations to build a strong DNS foundation that maximizes the use of resources and increases service management, while remaining agile enough to support both existing and future network architectures, devices, and applications. It also provides a more intelligent way to respond and scale to DNS queries, taking into account a variety of network conditions and situations to distribute user application requests and application services based on business policies, data center conditions, network conditions, and application performance. It ensures that your customers, and your employees, can access your critical web, application, and database services whenever they need them.

In this latest DNS RA revision, DNSSEC can protect your DNS infrastructure, including cloud deployments, from cache poisoning attacks and domain hijacks. With DNSSEC support, you can digitally sign your DNS query responses. This enables the resolver to determine the authenticity of the response, preventing DNS hijacking and cache poisoning.

Also included is Distributed DNS, meaning all the DNS solution goodness also applies to cloud deployments or infrastructures where DNS is distributed. Organizations can replicate their high-performance DNS infrastructure in almost any environment. Organizations may have Cloud DNS for disaster recovery/business continuity or even a Cloud DNS service with signed DNSSEC zones. F5 DNS Services' enhanced AXFR support offers zone transfers from BIG-IP to any DNS service, allowing organizations to replicate DNS in physical, virtual, and cloud environments. The DNS replication service can be sent to other BIG-IPs or other general DNS servers in the data centers and clouds that are closest to the users.

In addition, organizations can send users to a site that will give them the best experience. F5 DNS Services uses a range of load balancing methods and intelligent monitoring for each specific app and user. Traffic is routed according to your business policies and current network and user conditions. F5 DNS Services includes an accurate, granular geolocation database, giving you control of traffic distribution based on user location.

DNS helps make the internet work, and we often do not think of it until we cannot connect to some resource. With the Internet of Nouns (or Things if you like) hot on our heels, I think Port 53 will continue to be a critically important piece of the internet puzzle.
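As a small illustration of the resolver-side validation described above, the following sketch queries a validating resolver with the DNSSEC OK bit set and checks the AD (Authenticated Data) flag. It assumes the third-party dnspython package is installed; the resolver address and domain name are examples, not part of the reference architecture.

```python
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

RESOLVER = "8.8.8.8"      # any DNSSEC-validating recursive resolver (example)
NAME = "example.com."     # a signed zone to test against (example)

# Ask for A records with the DNSSEC OK bit set so the resolver reports validation.
query = dns.message.make_query(NAME, dns.rdatatype.A, want_dnssec=True)
response = dns.query.udp(query, RESOLVER, timeout=5)

# The AD flag is set only when the resolver validated the DNSSEC signature chain.
validated = bool(response.flags & dns.flags.AD)
print(f"{NAME} A records validated by resolver: {validated}")
```

A set AD flag means the resolver authenticated the answer via the DNSSEC signature chain; it does not mean the transport was encrypted.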
ps

Related:
Intelligent DNS Scale Resources
F5 Synthesis
DNS Reimagined keeps your Business Online
DNS Does the Job
The DNS of Things
DNS Doldrums
The Internet of Things and DNS

Technorati Tags: f5,big-ip,dns,reference architecture,dnssec,iot,things,name_resolution,silva,security,cloud,synthesis