The Power of &: F5 Hybrid DNS solution
While some organizations prioritize the advantages of a SaaS solution like scalability, others value the benefits of an on-premises solution, such as data control and migration flexibility. This is why having the option to deploy a hybrid model can be beneficial, not just for redundancy, but also for allowing organizations to blend the best of both worlds. Understanding theArchitecture’scomponents F5 BIG-IP DNS - (formerly BIG-IP GTM) is a well-known on-premise solution for delivering high-performance DNS services such as DNSExpress and DNS Caching. It is also recognized for offering intelligent DNS responses that are based on various factors such as LDNS’ Geolocation (GSLB) and health status of applications. F5 Distributed Cloud DNS (F5 XC DNS) – It is F5’s SaaS-based DNS solution which is built on a global data plane, ensuring automatic scalability to meet high-volume demand. Additionally, it also provides GSLB and security such as DNS DoS protection. On the diagram above, BIG-IP DNS will be the hidden primary DNS, acting as the source of truth for DNS records. This setup ensures centralized control and adds an extra layer of security by reducing exposure to potential attacks. F5XC DNS will function as the secondary DNS server, receiving DNS records from BIG-IP via Zone Transfer. It will be responsible for handling public DNS queries and providing domain name resolution services to clients. In the first part of this article, we will show you how to set up and configure BIG-IP DNS as the hidden primary and F5XC DNS as the authoritative secondary DNS server. For some, this setup is sufficient for their requirements, but for others, there may be additional requirements to consider in this hybrid design. In the later part of this article, we will demonstrate how we can address these challenges by leveraging F5's platform features and capabilities! Steps on Implementing F5 Hybrid DNS Solution Step 1: Configure BIG-IP DNS First, we need to configure BIG-IP DNS to be able to perform a zone transfer to F5XC DNS. For more details on the configuration, you can check this link: https://community.f5.com/kb/technicalarticles/configuring-big-ip-for-zone-transfer-and-dnssec/330359 Step 2: Configure F5XC DNS Now after you've configured BIG-IP DNS, we need to configure F5 XC DNS to be a secondary DNS server. For more details, check the steps below: Log into XC Console, selectDNS Managementoption, clickAdd Zone. In Domain Name field, enter the domain/subdomain. In our example, it will bef5sg.com Zone Type:Secondary DNS Configuration Under the Secondary DNS Configuration field, clickConfigure On the DNS primary server IP field, enter the public IP address of the Primary DNS. In our example it will be the Public IP of BIG-IP DNS. On TSIG key, enter the name we used to generate TSIG earlier in BIG-IP. In our example, usedexample. On the TSIG Key algorithm field and select an algorithm from the drop-down. Selecthmac-sha256. Click Configure in the TSIG key value in base 64 format section, On the Secret Type field, selectClear Secretand paste the secret in the Secret field. Use the same secret we generated earlier in BIG-IP DNS. ClickApply. You should see the DNS records transferred from BIG-IP DNS to F5XC DNS Step 3: Configure Domain Registrar In this example, the domain registrar I'm using is Namecheap. I'll configure it so that the authoritative name server for the domain f5sg.com is set to F5XC (ns1.f5clouddns.com and ns2.f5clouddns.com). The steps will vary depending on which domain registrar you are using. Refer to the documentation of your registrar. See the screenshots below for how I configured it in Namecheap. F5 XC DNS should now be able to answer DNS queries since it is set to be the authoritative DNS. Now, let's do some testing! On my local machine, I will perform a dig on the f5sg.com domain. See below: You can see that on the dig result, the NS for f5sg.com is set to ns1.f5clouddns.com & ns2.f5clouddns.com! I can also resolve sales.f5sg.com! We have successfully implemented BIG-IP as Hidden Primary and F5XC as Authoritative Secondary DNS! Challenges and Considerations Now let's discuss the additional requirements or challenges that we might encounter with this hybrid setup solution: Security: We need to comply with security compliance. Nowadays, there are laws requiring the implementation of DNSSEC (DNS Security Extensions). We need to consider this in the design and implement it without adding complexity. Resiliency: Although F5XC DNS Infrastructure is built to be resilient, we still want a backup plan to failover to the BIG-IP Primary DNS in case of unforeseen events. This process will be manual, as we need to change the NS records at the registrar to promote the hidden BIG-IP Primary DNS as the authoritative NS for the domain once F5XC is unavailable. Synchronization: BIG-IP will not be able to synchronize the GSLB functionality with F5XC because Wide-IP records are non-standard and cannot be transferred as part of zone transfers. Solution to Challenges Now comes the fun part: tackling the challenges we’ve laid out! Fortunately, F5 Distributed Cloud is an API-first platform that enables us to automate configuration. At the same time, we have the power of the BIG-IP platform, where you can run custom scripts that will enable us to integrate it with F5XC through API. Solution to Challenge #1: This is easy. DNSSEC records like RRSIG, DNSKEY, DS, NSEC, and NSEC3 are standardized and can be synchronized as part of a zone transfer. Since BIG-IP DNS is our primary DNS and supports DNSSEC, we can enable it. The records will synchronize to F5XC DNS and still respond with signed records, maintaining the integrity and security of our DNS infrastructure. How do you enable it? Check the last part of the technical article below: https://community.f5.com/kb/technicalarticles/configuring-big-ip-for-zone-transfer-and-dnssec/330359 Solution to Challenge #2: We need to automate failover! But when automating tasks, you need two things: a trigger and an action. In our scenario, our trigger should be the availability of F5XC DNS to resolve DNS queries, and the action should be to change our nameserver to BIG-IP on the domain registrar. If you can create and run a script in BIG-IP, it means you can continuously monitor the health of F5XC DNS, allowing us to determine the trigger. But what about the action to change the domain name server records in the registrar? It's easy—check if it can be configured via API, then the problem is solved! Let's explore using Namecheap as our registrar for this example. We will use the BIG-IP EAV (external) monitor to run the script. If you're unfamiliar with the BIG-IP external monitor and its capabilities, check this out →https://my.f5.com/manage/s/article/K71282813 A dummy pool configured with an external monitor will run at intervals. The attached script is designed to monitor F5XC and check if it can resolve DNS queries. If it cannot, the script will trigger an API call to Namecheap (our domain registrar) to change the nameservers back to BIG-IP DNS. Simultaneously, the script will update the domain's NS records from F5XC to BIG-IP. Step 1: Create an external monitor using the custom script. Refer to article K71282813 how to create the external monitor. See the codeshare link for the sample custom script I used: Namecheap and BIG-IP Integration via API | DevCentral Step 2: Create a dummy pool and attach the custom external monitor Let's do some tests! See the results in the later part of this article. Solution to Challenge #3: We can't use Zone Transfer to synchronize GSLB configurations? No problem! Instead, we'll harness the power of APIs. We can run a custom script in BIG-IP to convert Wide-IP configurations into F5XC DNSLB records via API. Let's see below how we can do this. On BIG-IP DNS, configure the zone records for the domainf5sg.comto delegate the subdomains needed for GSLB. For example, we need to perform GSLB forwww.f5sg.com,we will configure the zone like below: www.f5sg.comCNAMEwww.gslb.f5sg.com gslb.f5sg.com NS ns1.f5clouddns.com On BIG-IP we will create Wide-IP configuration forwww.gslb.f5sg.comwhich should hold the A records. These Wide-IP configurations can be converted by a script to F5XC DNSLB configurations. Check the sample script on this codeshare link: BIG-IP Wide-IP to F5XC DNSLB converter | DevCentral Testing and Result Challenge #2:Failover Testing To simulate the scenario in which F5XC is unable to respond to DNS queries, we designed the script to execute a dig command to F5XC for a TXT record. If F5XC responds with "RESPONSE-OK," no further action is needed. However, if it fails to respond correctly or does not respond at all, the script will trigger a failover action. Scenario 1: When F5XC responds to DNS queries(TXT record value is RESPONSE-OK) Namecheap dashboard shows F5XC nameservers BIG-IP DNS zone records shows F5XC nameservers F5XC zone records shows F5XC nameservers Result when performing dig to resolve sales.f5sg.com -> it shows that F5XC nameservers are Authoritative Scenario: When F5XC doesn't respond to DNS queries (TXT record value is RESPONSE-NOT-OK) We changed the TXT record value to 'RESPONSE-NOT-OK,' which should mark the monitor as down. The dummy pool went down, which means the script inside the monitor detected that the dig result was not what it expected. You can see from the zone records below that the NS records have now changed to GTM (gtm1.f5sg.com and gtm2.f5sg.com) When we check our domain registrar, Namecheap, we can see that the nameservers are now automatically set to BIG-IP GTMs. When I issue a dig command from my workstation, I can see that the nameserver responding to my query isgtm1.f5sg.com Online DNS tools (like MXToolbox) also report that gtm1.f5sg.com is the authoritative NS that responds to the DNS queries for sales.f5sg.com, which resolves to 2.2.2.2 We have now solved one of the challenges by implementing a backup failover plan using custom monitors and automations, made possible by the power of BIG-IP and APIs! Challenge #3:Synchronization Testing Using this script, we can convert and synchronize the BIG-IP Wide-IP configuration to its F5XC equivalent configuration Note: The sample script is limited to handling a Wide-IP with a single GTM pool. Inside the pool is where you will define the IP addresses that you want to load balance. The pool load balancing method is also limited to Round Robin, Ratio, Static Persist, and Global Availability. The script is designed to run at intervals. There are several ways to execute it: you can use external monitors (as we did earlier) or utilize a cronjob, etc. For testing and simplicity, I will use a cronjob set to run every 10 minutes. Let's begin creating our GSLB configuration. If you've configured BIG-IP GTM/DNS before, one of the first objects you need to create is a GTM server. I've configured two Generic Servers representing the application in two different Data Centers. Next is we create aGTM Poolwhich we will associate theVirtual Serverinside the GTM server we created earlier. (i.e. I'm assigning 1.1.1.1 and 2.2.2.2 as the members of the pool) Lastly, we will create theWide-IP recordand attach the GTM Pool we created earlier After this, the script should get triggered and convert this BIG-IP DNS Wide-IP configuration into F5XC DNS configuration. We should see that a new Primary Zone will be created in F5XC (gslb.f5sg.com) When you view the resource records, we should see a DNSLB record which has the record name equivalent to the subdomain of the wide-IP record. (BIG-IP DNS Wide-IP record is www.glsb.f5sg.com, In F5XC DNS zonegslb.f5sg.com, the record name iswwwand pointing toDNSLB object) The load balancing rules should have the DNSLB pool(pool-www) which is the equivalent of theGTM Pool(pool_www) configured in BIG-IP DNS TheDNSLB pool memberswill include the same IP addresses we defined asGTM Pool members in BIG-IP DNS. There are four load balancing methods available in F5XC, and there is an equivalent BIG-IP DNS load balancing method. The script was created to match this methods but if you configure the BIG-IP DNS pool load balancing method to something other than these four, it will default to Round Robin. BIG-IP DNS F5XC DNS Round Robin Round-Robin Ratio Ratio-Member Static Persist Static-Persist Global Availability Priority Based on the results above, we have successfully converted and synchronize BIG-IP DNS Wide-IP configuration into F5XC DNSLB records! Conclusion We have resolved DNS challenges using the power and integration of F5 solutions! By utilizing both BIG-IP and F5XC platforms, which can sign and serve DNSSEC records, we can seamlessly implement DNSSEC in a hybrid setup without complexity. Furthermore, our scalable F5XC Cloud DNS will shield you from myriad DNS DoS attacks, which are continually evolving, especially with the rise of AI. In terms of DNS resiliency, with the power of our API-first platforms and automation, we can create a DNS hybrid solution capable of automatically failing over from Cloud DNS to on-prem DNS. Lastly, we can synchronize the configurations of both platforms using standards like Zone Transfer and APIs. This capability allows us to convert and synchronize GSLB configurations between our on-prem DNS and Cloud DNS, making administration easier, and establishing a single source of truth.733Views2likes0CommentsSite-to-Site Connectivity in F5 Distributed Cloud Network Connect – Reference Architecture
Purpose This guide describes the reference architecture for deploying F5 Distributed Cloud’s (XC) Multicloud Network Connect service to interconnect their workload across private connectivity or the internet. It enumerates the options available to an F5 Distributed Cloud user to configure site-to-site connectivity using F5 XC Customer Edges (CEs) and explains them in detail to help the user make informed decisions to choose the correct topology for their use case. Audience This guide is for technical readers, including network admins and architects who want to understand how the Multicloud Network Connect service works and what network topology they must use to interconnect their workloads across data centers, branches, and public clouds. This guide assumes the reader is familiar with networking concepts like routing, default gateway configurations, IPSec and SSL encryption, and private connectivity solutions provided by public clouds like AWS Direct Connect and Azure Express Route. Introduction Ensuring workload reachability across data centers, branches, and/or public cloud can be challenging and operationally complex if done in the traditional way. The network teams must design, configure, and maintain multiple networking and security equipment and need expertise across many vendor solutions that provide functionality like NAT, SD-WAN, VPN, firewalls, access control lists (ACLs), etc. This gets even more nuanced while connecting two networks that have overlapping IP address CIDRs which is often in the case of hybrid cloud deployments and during mergers and acquisitions. F5 XC Multicloud Network Connect provides a simple way to configure these interconnections and manage access and security policies across multiple heterogeneous environments, from a single console. It abstracts the complexities by taking the user intent and automating the underlying networking and security while providing the flexibility to choose to connect over a private network or the public internet. Customer Edge as a Gateway To provide site-to-site reachability and ensure enforcement of the security policies, traffic must flow through the CE site. For this, the CE’s Site Local Inside (SLI) IP address must be used as the default gateway or as the next hop to reach the networks on other sites. Figure:Using CE as the gateway Physical vs. Logical Connectivity Two or more CE sites can be physically connected in multiple ways. But this does not automatically allow networks on different sites to be L3 routable to each other by default. For this, the user must associate networks with segments. The physical connection dictates the path the packets will take while going from one site to the other, and the logical connection connects the VLANs (on-prem) and the VPCs/VNETs (in the public cloud) using network overlay and provides segmentation. Note: Multicloud Network Connect provides Layer 3 connectivity between networks. Configuring L3 connectivity is not required to have app-to-app connectivity across the sites. This is done using the distributed load balancer feature under App Connect. Physical Transit Options Over F5 Global Network Backbone A CE site is always connected to the two nearest Regional Edges (REs) for redundancy, using IPSec or SSL tunnels. The REs across different regions are connected via a global, private F5 backbone network. F5’s global network backbone provides high-speed, private transit across regions where Regional Edges are located. Users can use the CE-RE tunnels over the internet to securely connect to this backbone locally and leverage the private connectivity to connect across regions. Figure: Default CE-CE connectivity over REs and F5 network backbone Pros: No need to manage underlay networking if using CE-RE tunnels over the internet. High-speed private transit between geographically distant regions, at no extra cost. End-to-end encryption of traffic between sites. Option to have end-to-end private connectivity Cons: Throughput is limited to the bandwidth of the two tunnels per site. When To Use: The easiest way to connect when private connectivity is not available between data centers or to the cloud VPCs in the case of hybrid cloud. IPSec/SSL tunnels over the internet are acceptable, but you do not want to manage multiple VPN tunnels or SD-WAN devices. Connecting geographically distant sites and you need better end-to-end latency and reliability than going over the internet. Direct Site-to-Site Over Internet If security regulations prevent the use of F5’s private backbone network, users can connect the CE sites directly to each other using IPSec tunnels (SSL encryption is not supported in this case). This is done using the Site Mesh Group (SMG) feature. Note: Even when the CEs are a part of Site Mesh Group, they will still connect to the REs using encrypted tunnels as this is required for control plane connectivity. When the sites are in an SMG, the data path traffic flows through the CE-CE tunnels as the preferred path. If this link fails, as a backup, the traffic gets routed to the REs and over the F5 network backbone to the other site. The number of tunnels on each link between two CE sites depends on the number of control nodes they have. Two single-node sites are connected using only one tunnel. If any one of these sites has three control nodes, it forms three tunnels to the other site. Figure:Number of tunnels between sites in a SMG Pros: Sites are directly connected, so the data path is not dependent on RE. Easy connectivity over the internet Traffic is always encrypted in transit Eliminates the need to manually configure cross-site VPNs or SD-WAN. Cons: Encryption and decryption require more CPU resources on CE nodes as performance requirements increase. Note: L3 Mode Enhanced Performance can be enabled on the sites to get more performance from available CPU and memory resources, but this must be enabled when the site is used only for L3 connectivity as it reduces the resources available for L7 features. Direct Site-to-Site Over Customer’s Network Backbone The CE-CE tunnels can also be configured to be connected over a private network if end-to-end private connectivity is required. Customer can leverage their existing private connectivity between data centers provisioned using private NaaS providers like Equinix. The sites can either be connected directly using SMG where the connections are encrypted or using the DC Cluster Group (DCG) feature, which connects the sites using IP-in-IP tunnels (no encryption). The number of tunnels on each link between two CE sites depends on the number of control nodes they have. Two single-node sites are connected using only one tunnel. If any one of these sites has three control nodes, it forms three tunnels to the other site. A DCG will give better performance, while an SMG is more secure. Unlike SMG, DCG does not fall back to sending RE-CE tunnels if private connectivity fails. Pros: Data path confined within the customer’s private perimeter. Sites are directly connected, so the data path is not dependent on RE. Option to choose encrypted or unencrypted transit. Simplifies the ACLs on the physical network and allows users to manage segmentation using the F5 XC console. Cons: Customer needs to manage the private connectivity across data centers or from the data center to the public cloud. Direct Site-to-Site Connectivity Topologies For direct site-to-site connectivity, sites can be grouped into Site Mesh Group (SMG) or DC Cluster Group (DCG). These allow sites to connect either in Full Mesh or Hub-Spoke topologies as described below: Full Mesh Site Mesh Group All sites that are part of a full-mesh SMG are connected to every other site using IPSec tunnels, forming a full-mesh topology. Figure:Sites in full mesh Site Mesh Group When To Use: In Hybrid cloud use cases. When all sites have equal functionality (e.g. connecting workloads across data centers). High fault tolerance is required for site-to-site connectivity (not dependent on any one site for transit). Hub-Spoke Site Mesh Group This mode allows the sites to be grouped into a hub SMG and a spoke SMG. The sites within the hub SMG are connected using full mesh topology. The sites within the spoke SMG are connected to the sites in hub SMG only and not to other sites in the spoke SMG. Figure: Sites in Hub-Spoke Site Mesh Group Some characteristics of Hub-Spoke SMG: The hub can have multiple sites for redundancy, but it usually has one site in most customer use cases. A hub site can be a spoke site for a different Hub-Spoke SMG. A CE site can be a spoke for multiple hubs. When To Use: For data center/cloud to edge/branch connectivity use cases. Full Mesh DC Cluster Group DC Cluster Group only supports full mesh topology. Every site in a DCG is connected to every other site using IP-in-IP tunnels. Traffic is not encrypted in transit, but DCG is only supported when sites can be connected over a private network. Figure: Sites in DC Cluster Group When To Use: Connecting VLANs on a data center to VLANs on other data centers or to public cloud VPCs/VNETs, when there is private connectivity between them. When the security regulations allow unencrypted traffic over the private transit. Offline Survivability CE Sites require control plane connectivity to the REs and Global Controller (GC) to exchange routes, renew certificates, and decrypt blindfolded secrets. To enable business continuity during an upstream outage, the Offline Survivability feature can be enabled on all sites in a Full Mesh SMG or DCG. The feature is not supported for Hub-Spoke SMG. With this feature enabled the sites can continue normal operations for 7 days without connecting to the REs and the GC. With the offline survivability feature enabled on a CE site, the local control plane becomes the certificate authority in case of connectivity loss. The decrypted secrets and certificates are cached locally on the CE. So, this feature is not turned on by default to allow the user to decide if enabling it aligns with the security regulations of the company. Logical Connectivity Once the physical transit is configured and the connection topology is chosen the workloads on the networks across the sites can be connected using segments or the applications on one site can be delivered to any other site by configuring a distributed load balancer. Connect Networks - Segmentation Users can create segments and add data center VLANs or public cloud VPCs/VNETs to them. All such networks added to a segment become part of a common routing domain and all workloads on these networks can reach each other using the CE sites as gateways. Users must ensure the networks added to a segment do not overlap. Segments are isolated layer 3 domains. So, workloads on one segment cannot access workloads from other segments by default. However, users can configure Segment Connectors to allow traffic from one segment to another. Figure: Segmentation Connect Applications - Distributed Load Balancing Instead of allowing the workloads to route directly to one another, the user can configure a distributed load balancer to publish a service from one site to other sites. This is done by adding the service endpoints to an origin pool of a load balancer object and advertising it using a custom VIP to another site or multiple sites. This allows the client to connect to the service as a local resource. Using Distributed Load Balancing, an LB admin can configure policies to expose the required API of the application only to the required sites. This reduces the attack surface and increases app security. Figure: Distributed Load Balancing E.g. in the figure above, the on-prem database is advertised to client apps on AWS and Azure which can access the DB using their local VIP, and the on-prem application is advertised to the client on Azure only. Decision Flow to Choose Physical Connectivity Options Once you understand the various physical and logical connectivity options, the below chart can help you to make an informed decision based on the connectivity requirements, infrastructure/platform available, and security restrictions. Once the connectivity is decided, you can choose to connect the network or only publish apps to sites where required based on the application requirements. Related Articles F5XC Site Site Mesh Group DC Cluster Group Segmentation Distributed Load Balancer224Views1like0Comments