Using Distributed Cloud DNS Load Balancer with Geo-Proximity and failover scenarios
Introduction

To have both high performance and responsive apps available on the Internet, you need a cloud DNS that is scalable and that operates at a global level to effectively connect users to the nearest point of presence. The F5 Distributed Cloud DNS Load Balancer positions the best features used with GSLB DNS to enable the delivery of hybrid and multi-cloud applications with compute positioned right at the edge, closest to users.

With Global Server Load Balancing (GSLB) features available in a cloud-based SaaS format, the Distributed Cloud DNS Load Balancer has a number of distinct advantages:

Speed and simplicity: Integrate with DevOps pipelines, with an automation focus and a rich and intuitive user interface
Flexibility and scale: Global auto-scale keeps up with demand as the number of apps increases and traffic patterns change
Security: Built-in DDoS protection, automatic failover, and DNSSEC features help ensure your apps are effectively protected
Disaster recovery: With automatic detection of site failures, apps dynamically fail over to individual recovery-designated locations without intervention

Adding user-location proximity policies to DNS load balancing rules allows the steering of users to specific instances of an app. This not only improves the overall experience but also safeguards data by siloing user data so that it stays region-specific. In the case of disaster recovery, catch-all rules can be created to send users to alternate destinations where restrictions to data don’t apply.

Integrated Solution

This solution uses a cloud-based Distributed Cloud DNS to load balance traffic to VIPs that connect to region-specific pools of servers. When data privacy isn’t a requirement, catch-all rules can further distribute traffic should a preferred pool of origin servers become unhealthy or unreachable.
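The steering and catch-all behaviour described here, together with the rule scores configured later in this article, can be pictured with a small model. The following is a minimal Python sketch and not F5 XC code: the Rule class, its field names, the placeholder 192.0.2.x addresses, and the score of 100 for the EU rule are all assumptions made for illustration. It only encodes what this article states, namely that the matching rule with the highest score is returned, that matching rules with equal scores all contribute answers, and that an unhealthy or disabled regional rule gives way to the catch-all rule.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str          # illustrative label, not an F5 XC field name
    continents: set    # continents this rule matches; empty set means "everyone"
    score: int         # rule score; the highest matching score wins
    pool: list         # IP addresses of the pool members
    healthy: bool = True
    disabled: bool = False


def answers(rules, client_continent):
    """Return the pool IPs of the matching, usable rule(s) with the highest score."""
    candidates = [
        r for r in rules
        if not r.disabled and r.healthy
        and (not r.continents or client_continent in r.continents)
    ]
    if not candidates:
        return []
    top = max(r.score for r in candidates)
    ips = []
    for r in candidates:
        if r.score == top:      # equal scores: every such rule contributes answers
            ips.extend(r.pool)
    return ips


# Hypothetical configuration mirroring the EU rule and the catch-all rule.
rules = [
    Rule("eu-rule", {"EU"}, 100, ["192.0.2.10", "192.0.2.11"]),
    Rule("global-rule", set(), 90, ["54.208.44.177"]),
]

print(answers(rules, "EU"))    # EU users get only the EU pool while it is healthy
rules[0].healthy = False
print(answers(rules, "EU"))    # with the EU pool down, the catch-all rule answers
```

In this model, disabling a rule and a rule becoming unhealthy have the same effect, which is why the manual disaster recovery scenario and the automatic failover scenario end up at the same catch-all pool.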
This solution covers the following DNS LB scenarios:

Geo-IP Proximity
Active/Standby failover within a region
Disaster Recovery for manually activated failovers
Autonomous System Number (ASN) Lists
Fallback pool for automated failovers

The configuration for this solution assumes the following:

The app is in multiple regions
Users are from different regions
Distributed Cloud hosts/manages/is delegated the DNS domain or subdomain
(Optional) Failover to another region is allowed

Prerequisite Steps

Distributed Cloud must be providing primary DNS for the domain. Your domain must be registered with a public domain name registrar with the nameservers ns1.f5clouddns.com and ns2.f5clouddns.com. F5 XC automatically validates the domain registration when configured to be the primary nameserver.

Navigate to DNS Management > domain > Manage Configuration > Edit Configuration > DNS Zone Configuration: Primary DNS Configuration > Edit Configuration. Select “Add Item” with Record Set type “DNS Load Balancer”. Enter the Record Name and then select Add Item to create a new load balancer record. This opens the submenu to create DNS Load Balancer rules.

DNS LB for Geo-Proximity

Name the rule “app-dns-rule”, then go to Load Balancing Rules > Configure. Select “Add Item”, then under the Load Balancing Rule, within the default Geo Location Selection, expand the “Selector Expression” and select “geoip.ves.io/continent”. Select Operator “In” and then the value “EU”. Click Apply. Under the Action “Use DNS Load Balancer pool”, click “Add Item”. Name the pool “eu-pool”, and under Pool Type (A) > Pool Members, click “Add Item”. Enter a “Public IP”, then click “Apply”. Repeat this process to have a second IP Endpoint in the pool. Scroll down to Load Balancing Method and select “Static-Persist”. Now click Continue, then Apply to the Load Balancing Rule, and then “Add Item” to add a second rule.
In the new rule, choose Geo Location Selection value “Geo Location Set selector”, and use the default “system/global-users”. Click “Add Item”. Name this new pool “global-pool”, then select “Add Item” with the following pool member: 54.208.44.177. Change the Load Balancing Mode to “Static-Persist”, then click Continue. Click “Continue”. Now set the Load Balancing Rule Score to 90. This allows the first load balancing rule, specific to EU users, to be returned as the only answer for users of that region unless the regional servers are unhealthy.

Note: The rule with the highest score is returned. When two or more rules match and have the same score, answers for each rule are returned. Although there are legitimate reasons for doing this, matching more than one rule with the same score may produce an unanticipated outcome.

Now click "Apply", “Apply”, and “Continue”. Click the final “Apply” to create the new DNS Zone Resource Record Set. Now click “Apply” on the DNS Zone configuration to commit the new Resource Record. Click “Save and Exit” to finalize everything and complete the DNS Zone configuration!

To view the status of the services that were just created, navigate to DNS Management > Overview > DNS Load Balancers > app-dns-rule. Clicking on the rule “eu-pool”, you can find the status for each individual IP endpoint, showing the overall health of each pool’s service that has been configured.

With the DNS Load Balancing rule configured to connect two separate regions, when one of the primary sites in the eu-pool goes down, users will instead be directed to the global-pool. This provides reliability in the context of site failover that spans regions. If data privacy is also a requirement, additional rules can be configured to support more sites in the same region.

DNS LB for Active-Passive Sites

In the previous scenario, two members are configured to be equally active for a single location.
We can change the weight of the pool members so that, of the two, one is used only when the other is unhealthy or disabled. This creates a backup/passive scenario within a region.

Navigate to DNS Load Balancer Management > DNS Load Balancers. Go to the service name "app-dns-rule", then under Actions, select Manage Configuration. Click Edit Configuration for the DNS rule. Go to the Load Balancing Rules section, and Edit Configuration. On the Load Balancing Rules order menu, go to Actions > Edit for the eu-pool Rule Action. In the Load Balancing Rule menu for eu-pool, under the section Action, click Edit Configuration. In the rule for eu-pool, under Pool Type (A) > Pool Members, click the Edit action. In the IP Endpoint section, change the Load Balancing Priority to 1, then click Apply. Change the Load Balancing Mode to Priority, then exit and save all changes by clicking Continue, Apply, Apply, and then Save and Exit.

DNS LB for Disaster Recovery

Unlike backup/standby, where failover can happen automatically depending on the status of a service's health, disaster recovery (DR) can either happen automatically or be configured to require manual intervention. In the following two scenarios, I'll show how to configure manual DR failover within a region, and also how to manually fail over outside the region.

To support east/west manual DR failover within the EU region, use the steps above to create a new Load Balancing Rule with the same label selector as the EU rule (eu-pool) above, then create a new DNS LB pool (name it something like eu-dr-pool) and add new designated DR IP pool endpoints. Change the DR Load Balancing Rule Score to 80, and then click Apply. On the Load Balancing Rules page, change the order of the rules and confirm that the scores align with the following image, then click Apply, and then Save and Exit.

In the previous active/standby scenario the Global rule functions as a backup for EU users when all sites in EU are down.
To force a non-regional failover, you can change F5 XC DNS to send all EU users to the Global DNS rule by disabling each of the two EU DNS rules above.

To disable the EU DNS rules, navigate to DNS Load Balancer Management > DNS Load Balancers, and then under Actions, select Manage Configuration. Click Edit Configuration for the DNS rule. Go to the Load Balancing Rules section, and Edit Configuration. On the Load Balancing Rules order menu, go to Actions > Edit for the eu-pool Rule Action. In the Load Balancing Rule menu for eu-pool, under the section Action, click Edit Configuration. In the top section labeled Metadata, check the box to Disable the rule. Then click Continue, Apply, Apply, and then Save and Exit.

With the EU DNS LB rules disabled, all requests in the EU region are served by the Global Pool. When it's time to restore regional services, all that's needed is to re-enter the rule configuration and uncheck the Disable box for each rule.

DNS LB with ASN Lists

ASN stands for Autonomous System Number. It is a unique identifier assigned to networks on the internet that operate under a single administration or entity. By mapping IP addresses to their corresponding ASN, DNS LB administrators can manage some traffic more effectively.

To configure Distributed Cloud DNS LB to use ASN lists, navigate to DNS Management > DNS Load Balancer Management, then "Manage Configuration" for a DNS LB service. Choose "Add Item", and on the next page, select "ASN List", enter one or more ASNs that apply to this rule, select a DNS LB pool, and optionally configure the score (weight). When the same ASN exists in multiple DNS LB rules, the rule having the highest score is used.

Note: F5 XC uses ASPlain (4-byte) formatted AS numbers. Multiple numbers are configured one per item line.

DNS LB with IP Prefix Lists and IP Prefix Sets

Intermediate DNS servers are almost always involved in server name resolution.
By default, DNS LB doesn't see the originating IP address or subnet prefix of the client making the DNS request. To improve the effectiveness of DNS-based services like DNS LB by making more informed decisions about which server will be the closest to the client, RFC 7871 proposes a solution using the EDNS0 field to allow intermediate DNS servers to add the client subnet to the DNS request (EDNS Client Subnet or ECS). The IP Prefix List and IP Prefix Set in F5 XC DNS are used when DNS requests contain the client subnet and the prefix is within one of the prefixes defined in one or more DNS LB rule sets.

To configure an IP Prefix rule, navigate to DNS Management > DNS Load Balancer Management, then "Manage Configuration" of your DNS LB service. Now click "Edit Configuration" at the top left corner, then "Edit Configuration" in the section dedicated to Load Balancing Rules. Inside the section for Load Balancing Rules, click "Add Item" and in the Client Selection box choose either "IP Prefix List" or "IP Prefix Sets" from the menu. For IP Prefix List, enter the IPv4 CIDR prefix, one prefix per line. For IP Prefix Sets, you can either use a pre-existing set created in the Shared Configuration space in your tenant or use Add Item to create a completely new set.

Note: IP Prefix Sets are intended to be part of much larger groups of IP CIDR block prefixes and are used for additional features in F5 XC, such as in L7 WAF and L3 Network Firewall access lists. IP Prefix Sets support the use of both IPv4 and IPv6 CIDR blocks.

In the following example, a request with client subnet 192.168.1.0/24, which matches the configured IP Prefix rule, gets an answer from our eu-dr-pool (1.1.1.1). Meanwhile, a request whose client subnet is not in the defined prefix and is also outside the EU region gets an answer from the pool global-poolx (54.208.44.177).
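As background for the dig queries in this example: dig's +subnet option attaches an EDNS Client Subnet option to the query, which is what an intermediate resolver would otherwise add on the client's behalf. The following is a minimal Python sketch of how that option is laid out on the wire according to RFC 7871. The function name encode_ecs_option is my own and is not part of any DNS library; the sketch builds only the option itself, not a full DNS message.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS Client Subnet option code assigned in RFC 7871


def encode_ecs_option(subnet: str) -> bytes:
    """Encode a client subnet such as '192.168.1.0/24' as an EDNS0 ECS option."""
    # strict=True rejects subnets with host bits set, so the truncated
    # address below always has zero padding bits, as the RFC requires.
    net = ipaddress.ip_network(subnet, strict=True)
    family = 1 if net.version == 4 else 2        # 1 = IPv4, 2 = IPv6
    source_prefix_len = net.prefixlen
    scope_prefix_len = 0                         # always 0 in queries
    # Only as many address bytes as the prefix needs are sent.
    addr_bytes = net.network_address.packed[: (source_prefix_len + 7) // 8]
    payload = struct.pack("!HBB", family, source_prefix_len, scope_prefix_len) + addr_bytes
    # Option header: 2-byte option code followed by 2-byte payload length.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(encode_ecs_option("192.168.1.0/24").hex())  # prints 0008000700011800c0a801
```

Note that only the first three bytes of the address are sent for a /24, which is why the nameserver sees a subnet rather than the client's full IP address.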
Command line:

; <<>> DiG 9.10.6 <<>> @ns1.f5clouddns.com www.f5-cloud-demo.com +subnet=1.2.3.0/24 in a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44218
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; CLIENT-SUBNET: 1.2.3.0/24/0
;; QUESTION SECTION:
;www.f5-cloud-demo.com. IN A

;; ANSWER SECTION:
www.f5-cloud-demo.com. 30 IN A 54.208.44.177

;; Query time: 73 msec
;; SERVER: 107.162.234.197#53(107.162.234.197)
;; WHEN: Wed Jun 05 21:46:04 PDT 2024
;; MSG SIZE rcvd: 77

; <<>> DiG 9.10.6 <<>> @ns1.f5clouddns.com www.f5-cloud-demo.com +subnet=192.168.1.0/24 in a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48622
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; CLIENT-SUBNET: 192.168.1.0/24/0
;; QUESTION SECTION:
;www.f5-cloud-demo.com. IN A

;; ANSWER SECTION:
www.f5-cloud-demo.com. 30 IN A 1.1.1.1

;; Query time: 79 msec
;; SERVER: 107.162.234.197#53(107.162.234.197)
;; WHEN: Wed Jun 05 21:46:50 PDT 2024
;; MSG SIZE rcvd: 77

Here we're able to see that a different answer is given based on the client subnet provided in the DNS request. Additional use cases apply. The ability to make a DNS LB decision with a client subnet improves the ability of the F5 XC nameservers to deliver an optimal response.

DNS LB Fallback Pool (Failsafe)

The scenarios above illustrate how to designate alternate pools, both regional and global, when an individual pool fails. However, in the event of a catastrophic failure that brings all service pools down, F5 XC provides one final mechanism, the fallback pool.
Ideally, when implemented, the fallback pool should be independent from all existing pool-related infrastructure and services to deliver a failsafe service.

To configure the Fallback Pool, navigate to DNS Management > DNS Load Balancer Management, then "Manage Configuration" of your DNS LB service. Click "Edit Configuration", navigate to the "Fallback Pool" box and choose an existing pool. If no qualified pool exists, the option is available to add a new pool. In my case, I've designated "global-poolx" as my failsafe fallback pool, which already functions as a regional backup. Best practice is for the fallback pool to be a pool that is not referenced elsewhere in the DNS LB configuration and that exists on completely independent resources that are not regionally bound.

DNS LB Health Checks and Observability

For the sake of simplicity, the above scenarios do not have DNS LB health checks configured, and it's assumed that each pool's IP members are always reachable and healthy. My next article shows how to configure health checks to enable automatic failovers and ensure that users always reach a working server.

Conclusion

Using the Distributed Cloud DNS Load Balancer enables better performance of your apps while also providing greater uptime. With scaling and security automatically built into the service, responding to large volumes of queries without manual intervention is seamless. Layers of security deliver protection and automatic failover. Built-in DDoS protection, DNSSEC, and more make the Distributed Cloud DNS Load Balancer an ideal do-it-all GSLB distributor for multi-cloud and hybrid apps.

To see a walkthrough where I configure the first scenario above for Geo-IP proximity, watch the following accompanying video.
Additional Resources

Next article: Using Distributed Cloud DNS Load Balancer health checks and DNS observability
More information about Distributed Cloud DNS Load Balancer available at: https://www.f5.com/cloud/products/dns-load-balancer
Product Documentation:
DNS LB Product Documentation
DNS Zone Management

F5 Distributed Cloud Site Lab on Proxmox VE with Terraform
Overview

F5 Distributed Cloud (XC) Sites can be deployed in public and private clouds such as VMware and KVM using images available for download at https://docs.cloud.f5.com/docs/images . Proxmox Virtual Environment is a complete, open-source server management platform for enterprise virtualization based on KVM and Linux Containers. This article shows how a redundant Secure Mesh site protecting a multi-node App Stack site can be deployed using Terraform automation on Proxmox VE.

Logical Topology

-----------------                       vmbr0 (with Internet Access)
  |      |      |
+----+ +----+ +----+
| m0 | | m1 | | m2 |                    3 node Secure Mesh Site
+----+ +----+ +----+
  |      |      |
-------------------------------------   vmbr1 (or vmbr0 vlan 100)
  |      |      |      |      |      |
+----+ +----+ +----+ +----+ +----+ +----+
| m0 | | m1 | | m2 | | w0 | | w1 | | w2 |   6 node App Stack Site
+----+ +----+ +----+ +----+ +----+ +----+

The redundant 3 node Secure Mesh Site connects via its Site Local Outside (SLO) interfaces to the virtual network (vmbr0) for Internet access. The Secure Mesh Site also provides DHCP services and connectivity via its Site Local Inside (SLI) interfaces to the App Stack nodes, which are connected to another virtual network (vmbr1 or VLAN-tagged vmbr0). Each node of the App Stack site gets DHCP services and Internet connectivity via the Secure Mesh Site's SLI interfaces.
Requirements

Proxmox VE server or cluster (this example leverages a 3 node cluster) with a total capacity of at least
  24 CPU (sufficient for 9 nodes, 4 vCPU each)
  144 GB of RAM (for 9 nodes, 16GB each)
ISO file storage for cloud-init disks (local or cephfs)
Block storage for VMs and template (local-lvm or cephpool)
F5 XC CE Template (same for Secure Mesh and App Stack site, see below on how to create one)
F5 Distributed Cloud access and API credentials
Terraform CLI
Terraform example configuration files from https://github.com/mwiget/f5xc-proxmox-site

The setup used to write this article consists of 3 Intel/ASUS i3-1315U NUCs with 64GB RAM and a 2nd disk each for Ceph storage. Each NUC has a single 2.5G Ethernet port; the ports are interconnected via a physical Ethernet switch and internally connected to Linux Bridge vmbr0. The required 2nd virtual network for the Secure Mesh Site is created using the same Linux Bridge vmbr0 but with a VLAN tag (e.g. 100). There are other options to create a second virtual network in Proxmox, e.g. via Software Defined Networking (SDN) or dedicated physical Ethernet ports.

Installation

Clone the example repo

$ git clone https://github.com/mwiget/f5xc-proxmox-site
$ cd f5xc-proxmox-site

Create F5 XC CE Template

The repo contains helper scripts to download the image and create the template; they can be executed directly from the Proxmox server shell. Modify the environment variables in the scripts according to your setup, mainly the VM template id (pick one that isn't used yet) and the block storage location available on your Proxmox cluster (e.g. local-lvm or cephpool):

$ cat download_ce_image.sh
#!/bin/bash
# Look up the current qcow2 image link on the F5 XC image download page
image=$(curl -s https://docs.cloud.f5.com/docs/images/node-cert-hw-kvm-images | grep qcow2\" | cut -d\" -f2)
if test -z "$image"; then
  echo "can't find qcow2 image from download url. Check https://docs.cloud.f5.com/docs/images/node-cert-hw-kvm-images"
  exit 1
fi
# Download only if the image file is not already present in the current directory
if ! test -f "$(basename "$image")"; then
  echo "downloading $image ..."
  wget "$image"
fi

The following script must be executed on the Proxmox server (don't forget to adjust qcow2, id and storage):

$ cat create_f5xc_ce_template.sh
#!/bin/bash
# adjust full path to downloaded qcow2 file, target template id and storage ...
qcow2=/root/rhel-9.2024.11-20240523024833.qcow2
id=9000
storage=cephpool

echo "resizing image to 50G ..."
qemu-img resize $qcow2 50G
echo "destroying existing VM $id (if present) ..."
qm destroy $id
echo "creating vm template $id from $qcow2 .."
qm create $id --memory 16384 --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci
qm set $id --name f5xc-ce-template
qm set $id --scsi0 $storage:0,import-from=$qcow2
qm set $id --boot order=scsi0
qm set $id --serial0 socket --vga serial0
qm template $id

Create terraform.tfvars

Copy the example terraform.tfvars.example to terraform.tfvars and adjust the variables based on your setup:

$ cp terraform.tfvars.example terraform.tfvars

project_prefix        Site and node names will use this prefix, e.g. your initials
ssh_public_key        Your ssh public key provided to each node
pm_api_url            Proxmox API URL
pm_api_token_id       Proxmox API Token Id, e.g. "root@pam!prox"
pm_api_token_secret   Proxmox API Token Secret
pm_target_nodes       List of proxmox servers to use
iso_storage_pool      Proxmox storage pool for the cloud init ISO disks
pm_storage_pool       Proxmox storage pool for the VM disks
pm_clone              Name of the created F5 XC CE Template
pm_pool               Resource pool to which the VM will be added (optional)
f5xc_api_url          https://<tenant>.console.ves.volterra.io/api
f5xc_api_token        F5 XC API Token
f5xc_tenant           F5 XC Tenant Id
f5xc_api_p12_file     Path to the encrypted F5 XC API P12 file. The password is expected to be provided via environment variable VES_P12_PASSWORD

Set count=1 for module "firewall"

The examples defined in the various toplevel *.tf files use the Terraform count meta-argument to enable or disable the various site types to build. Setting `count=0` disables a site type and `count=1` enables it.
It is even possible to create multiple sites of the same type by setting count to a higher number. Each site adds the count index as a suffix to the site name. To re-create the setup documented here, edit the file lab-firewall.tf and set `count=1` in the firewall and appstack modules.

Deploy sites

Use the terraform CLI to deploy:

$ terraform init
$ terraform plan
$ terraform apply

Terraform output will show periodic progress, including site status until ONLINE. Once deployed, you can check status via the F5 XC UI, including the DHCP leases assigned to the App Stack nodes.

A kubeconfig file has been automatically created; it can be sourced via env.sh and used to query the cluster:

$ source env.sh
$ kubectl get nodes -o wide
NAME                  STATUS   ROLES        AGE    VERSION       INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                     KERNEL-VERSION                 CONTAINER-RUNTIME
mw-fw-appstack-0-m0   Ready    ves-master   159m   v1.29.2-ves   192.168.100.114   <none>        Red Hat Enterprise Linux 9.2024.11.4 (Plow)   5.14.0-427.16.1.el9_4.x86_64   cri-o://1.26.5-5.ves1.el9
mw-fw-appstack-0-m1   Ready    ves-master   159m   v1.29.2-ves   192.168.100.96    <none>        Red Hat Enterprise Linux 9.2024.11.4 (Plow)   5.14.0-427.16.1.el9_4.x86_64   cri-o://1.26.5-5.ves1.el9
mw-fw-appstack-0-m2   Ready    ves-master   159m   v1.29.2-ves   192.168.100.49    <none>        Red Hat Enterprise Linux 9.2024.11.4 (Plow)   5.14.0-427.16.1.el9_4.x86_64   cri-o://1.26.5-5.ves1.el9
mw-fw-appstack-0-w0   Ready    <none>       120m   v1.29.2-ves   192.168.100.121   <none>        Red Hat Enterprise Linux 9.2024.11.4 (Plow)   5.14.0-427.16.1.el9_4.x86_64   cri-o://1.26.5-5.ves1.el9
mw-fw-appstack-0-w1   Ready    <none>       120m   v1.29.2-ves   192.168.100.165   <none>        Red Hat Enterprise Linux 9.2024.11.4 (Plow)   5.14.0-427.16.1.el9_4.x86_64   cri-o://1.26.5-5.ves1.el9
mw-fw-appstack-0-w2   Ready    <none>       120m   v1.29.2-ves   192.168.100.101   <none>        Red Hat Enterprise Linux 9.2024.11.4 (Plow)   5.14.0-427.16.1.el9_4.x86_64   cri-o://1.26.5-5.ves1.el9

Next steps

Now it's time to explore the Secure Mesh site and the App Stack cluster: deploy a service on the new cluster, then create a Load Balancer and Origin pool to expose the service. Need more or fewer worker nodes? Simply change the worker node count in the lab-firewall.tf file and re-apply via `terraform apply`.

Destroy deployment

Use terraform again to destroy the site objects in F5 XC and the Virtual Machines and disks on Proxmox with

$ terraform destroy

Summary

This article documented how to deploy a dual nic Secure Mesh site and a multi-node App Stack Cluster via Terraform on Proxmox VE. There are additional examples in secure-mesh-single-nic.tf, secure-mesh-dual-nic.tf and appstack.tf. You can explore and modify the provided modules based on your particular needs.

Resources

https://github.com/mwiget/f5xc-proxmox-site
F5 Distributed Cloud Documentation
Terraform Proxmox Provider by Telmate
Proxmox Virtual Environment

Need help to understand operation between RE and CE?
Hi all,

We have installed a CE site in our network and this site has established IPsec tunnels with RE nodes. The on-prem DC site has workloads (e.g. the actual web application servers that are serving the client requests). I have a Citrix NetScaler background. Citrix NetScaler ADCs are configured with VIPs, which are the frontend for client requests coming from outside (internet). When a request lands on a VIP, it goes through both source NAT and destination NAT: its source address is changed to a private address according to the service where the actual application servers are configured, and it is then sent to the actual application server after the destination is changed to the IP address of the server.

In XC, the request will land in the cloud first, because the public IP that is assigned to us will lead the request to the RE. I have a few questions regarding the events that happen from here on:

Will there be any SNAT on the request, or will it be sent as it is to the site? If there is SNAT, what IP address will it be, and will it be done by the RE or by the on-prem CE?
There has to be destination NAT. Will this destination NAT be performed by the XC cloud, or will the request be sent to the site and the site will do the destination NAT?
When the request lands on the CE it will land in VN local outside, so this means that we have to configure a network connector between the VN local outside and the VN in which the actual workloads are configured. What type would that VN be?
When the application server in the local on-prem site responds, the response has to go out to the XC cloud first and will be routed via the IPsec tunnel, so this means that we have to install a network connector between the virtual network where the workloads are present and site local outside. Do we have to install a default route in the application VN?
Is there any document, post or article that would actually help me to understand the procedure? (Frankly, I read a lot of F5 documents but couldn't find the answers.)

Converting a BIG-IP Maintenance Page iRule to Distributed Cloud using App Stack
If you are familiar with BIG-IP, you are probably also familiar with its flexible and robust iRule functionality. In fact, I would argue that iRules make BIG-IP the Swiss Army knife that it is. If there is ever a need for advanced traffic manipulation, you can usually come up with an iRule to solve the problem.

F5 Distributed Cloud (XC) has its own suite of tools to help in this regard. If you need to do some sort of traffic manipulation/routing, you can usually handle that with Service Policies or simply by using Routes. Even with these features, however, there are going to be some cases where iRule functionality from the BIG-IP cannot be reproduced directly in XC. When this happens, we switch to using App Stack, which is XC’s version of a Swiss Army knife.

In this article, I want to walk through an example of how you can leverage XC's App Stack for a specific iRule conversion use case: displaying a custom maintenance page when all pool members are down. For reference, here is the iRule:

when LB_FAILED {
    # Respond with the maintenance page only when no pool member is active
    if { [active_members [LB::server pool]] == 0 } {
        if { [string tolower [HTTP::host]] contains "example.com" } {
            if { [HTTP::uri] ends_with "SystemMaintenance.jpg" } {
                # Serve the image referenced by the page from an iFile
                HTTP::respond 200 content [ifile get "SystemMaintenance.jpg"] "Content-Type" "image/jpg"
            } else {
                # Braces are used so the double quotes inside the HTML stay literal
                HTTP::respond 200 content {<!DOCTYPE html>
<html lang="en">
<head>
<title>System Maintenance</title>
<style type="text/css">
.base { font-family: 'Tahoma'; font-size: large; }
</style>
</head>
<body>
<br>
<center><img alt="sad" height="200" src="SystemMaintenance.jpg" width="200" /></center><br>
<center><span class="base">This application is currently under system maintenance.</span></center>
<br>
<center><span class="base">All services will be back online in a few minutes.</span></center>
</body>
</html>}
            }
        }
    }
}

When dissecting this iRule, you can see we have to solve for the following:

Trigger the maintenance page when all pool members are down
Serve local files (images, css, etc.)
Display the static HTML page

So, how do we do this? Well, App Stack allows us to deploy and host a container in Distributed Cloud. So we can easily create a simple container (using NGINX for bonus points!) that contains all these images, stylesheets, HTML files, etc. and manipulate our pools so that they use this container when required! Let’s deep dive into the step-by-step process…

Step by Step Walk-through:

Container Creation

First, we have to create our container. I'm not going to go too deep into how to create a container in this article, but I will highlight the main steps I took. To start, I simply extracted the HTML from the iRule above and saved all the required files (images, stylesheets, etc.) in one directory. Since I am adding NGINX to the container, I must also create and include an nginx.conf file in this directory. Below is my configuration:

worker_processes 1;
error_log /var/log/nginx/error.log warn;
pid /tmp/nginx.pid;

events {
    worker_connections 1024;
}

http {
    client_body_temp_path /tmp/client_temp;
    proxy_temp_path /tmp/proxy_temp_path;
    fastcgi_temp_path /tmp/fastcgi_temp;
    uwsgi_temp_path /tmp/uwsgi_temp;
    scgi_temp_path /tmp/scgi_temp;

    include /etc/nginx/mime.types;

    server {
        listen 8080;

        location / {
            root /usr/share/nginx/html/;
            index index.html;
        }

        location ~* \.(js|jpg|png|css)$ {
            root /usr/share/nginx/html/;
        }
    }

    sendfile on;
    keepalive_timeout 65;
}

There really isn’t much to the NGINX configuration for this example, but keep in mind that you can expand on this and make it much more robust for other use cases. (One note about the configuration above is that you will see /tmp paths mentioned. These are required since our container will run as a non-root user. For more information, see the NGINX documentation here: https://hub.docker.com/_/nginx)

Finally, I included a Dockerfile with my requirements for NGINX and exposing port 8080. Once that was all set, I built my container and pushed it to Docker Hub as a private repository.
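Before pushing the image, it can be worth confirming that the container really serves both the HTML page and the image it references. The following is a minimal Python sketch of such a smoke test, under the assumption that the container has been started locally with port 8080 published (for example with docker run -p 8080:8080 and your image name). The helper names fetch and check_maintenance_page are my own and the check is deliberately simple; it is not part of the original workflow.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=5.0):
    """Return (status code, Content-Type header, body bytes) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.headers.get("Content-Type", ""), resp.read()


def check_maintenance_page(base_url):
    """Return True if the page and the image it references are both served."""
    try:
        status, ctype, body = fetch(base_url + "/")
        if status != 200 or "text/html" not in ctype:
            return False
        # The page from the iRule references this image by relative path.
        if b"SystemMaintenance.jpg" not in body:
            return False
        status, ctype, _ = fetch(base_url + "/SystemMaintenance.jpg")
        return status == 200 and ctype.startswith("image/")
    except urllib.error.URLError:
        # Covers connection failures as well as HTTP errors such as 404.
        return False


if __name__ == "__main__":
    # Assumed local address; adjust to wherever the container is reachable.
    print(check_maintenance_page("http://localhost:8080"))
```

The same check can later be pointed at the application URL while the main pool is forced down, to confirm that the maintenance page is what end users would actually receive.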
App Stack Deployment

Now that we have the container created and uploaded to Docker Hub, we are ready to bring it to XC. Start by opening up the F5 XC Console and navigate to the Distributed Apps tile. Navigate to Applications -> Container Registries, then click Add Container Registry. Here we just have to add a name for the Container Registry, our Docker Hub Username, “docker.io” for the Server FQDN, and then blindfold our password for Docker Hub.

After saving, we are now ready to configure our workload. To do so, we have to navigate over to Applications -> Virtual K8s. I already had a Virtual Site and Virtual K8s created, but you'll need to create those if you don't already have them. For your reference, here are some links to a walk-through on each of these:

Virtual Site Creation: https://docs.cloud.f5.com/docs/how-to/fleets-vsites/create-virtual-site
Virtual K8s Creation: https://docs.cloud.f5.com/docs/how-to/app-management/create-vk8s-obj

Select your Virtual K8s cluster. After selecting your cluster, navigate to the Workloads tab. Under Workloads, click on Add VK8s Workload. Give your workload a name and then change the Type of Workload to Service instead of Simple Service.

You'll notice we now have to configure the Service. Click Configure. The first step is to tell XC which container we want to deploy for this service. Under Containers, select Add Item. Give the container a name, and then input your Image Name. The format for the image name is "registry/image:tagname". If you leave the tag name blank, it defaults to “latest”. Under the Select Container Registry drop-down, select Private Registry. This will bring up another drop-down where we will select the container registry we created earlier.

For this simple use case, we can skip the Configuration Parameters and move to our Deploy Options.
Here, we have some flexibility on where we want to deploy our workload. You can choose All Regional Edges (F5 PoPs), specific REs, or even custom CEs and Virtual Sites. In my basic example, I chose Regional Edge Sites and picked the ny8-nyc RE for now.

Next, we have to configure where we want to advertise this workload. We have the option to keep it internal and only advertise in the vK8s Cluster, or we could advertise this workload directly on the Internet. Since we only want this maintenance page to be seen when the pool members are all down, we are going to keep this to Advertise In Cluster.

After selecting the advertisement, we have to configure our Port Information. Click Configure. Under the advertisement configuration, you’ll see we are simply choosing our ports. If you toggle “Show Advanced fields” you can see we have some flexibility on the port we want to advertise and the actual target port for the container. In my case, I am going to use 8080 for both, but you may want to have a different combination (e.g. 80:8080). Click Apply once finished.

Now that we have the ports defined, we can simply hit Apply on the Service configuration and Save and Exit the workload to kick off the deployment. We should now see our new maintenance-page workload in the list. You’ll notice that after refreshing a couple of times, the Running/Completed Pods and Total Pods fields will be populated with the number of REs/CEs you chose to deploy the workload to. After a few minutes, you should have a matching number of Running/Completed Pods to your Total Pods. This gives us an indication that the workload is ready to be used for our application. (Note: you can click on the pod numbers in this list to see a more detailed status of the pods. This helps when troubleshooting.)

Pool Creation

With our workload live and advertised in the cluster, it is time to create our pool.
In the top left of the platform, we’ll need to click Select Service and change to Multi-Cloud App Connect: Under Multi-Cloud App Connect, navigate to Manage -> Load Balancers -> Origin Pools and select Add Origin Pool. Here, we’ll give our origin pool a name and then go directly to Origin Servers. Under Origin Servers, click Add Item. Change the Type of the Origin Server to K8s Service Name of Origin Server on given Sites. Under Service Name, we have to use the format "servicename.namespace:cluster-id" to point to our workload. In my case, it was "maintenance-page.bohanson:bohanson-test" since I had the following: Service Name: maintenance-page Namespace: bo-hanson VK8s Cluster: bohanson-test Under Site or Virtual Site, I chose the Virtual Site I had already created. The last step is to change the network to vK8s Networks on Site and click Apply. The result should look like the below: We now need to change our Origin Server port to the port we defined in the workload advertisement configuration. In my case, I chose port 8080. The rest of the origin server configuration is up to you, but I chose to include a simple HTTP health check to monitor the service. Once the configuration is finished, click Save and Exit. The final pool configuration should look like this: Application Deployment: With our maintenance container up and running and our pool all set, it is time to finally deploy our solution. In this case, we can select any existing Load Balancer configuration where we want to add the maintenance page. You could also create a new Load Balancer from scratch, of course, but for this example I am deploying to an existing configuration. Under Manage -> Load Balancers, find the load balancer of your choosing and then select Manage Configuration. Once in the Load Balancer view, select Edit Configuration in the top right. To deploy the solution, we just need to navigate to our Origins section and add our new maintenance pool. Select Add Item.
At this point, you may be thinking, “Well that is great, but how am I going to get the pool to only show when all other pool members are down?” That is the beauty of the F5 Distributed Cloud pool configuration. We have two options that we can set when adding a pool: Weight and Priority. Both of those options are pretty self-explanatory if you have used a load balancer before, but what is interesting here is when you give these options a value of zero. Giving a pool a weight of zero would disable the pool. For a maintenance pool use case, that could be helpful since we can manually go into the Load Balancer configuration during a maintenance window, disable the main pool, and then bring up the maintenance pool until our change window is closed when we could then reverse the weights and bring the main pool back online. That ALMOST solves our iRule use case, but it would be manual. Alternatively, we can give a pool a Priority of zero. Doing so would mean that all other pools take priority and will be used unless they go down. In the event of the main pool going down, it would default to the lowest priority pool (zero). Now that is more like it! This means we can set our maintenance pool to a Priority of zero and it will automatically be used when the health of all our other pool members go down – which completely fulfills the original iRule requirement. So in our configuration, let's add our new maintenance pool and set: Weight: 1 Priority: 0 After clicking save, the final pool configuration should look something like this: Testing To test, we can simply switch our health check on the main pool to something that would fail. In my case, I just changed the expected status code on the health check to something arbitrary that I knew would fail, but this could be different in your case. After changing the health check, we can navigate to our application in a browser, and see our maintenance page dynamically appear! 
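To make the Weight and Priority behaviour described above concrete, here is a minimal Python sketch of the selection logic. It is a simplified model written for illustration only, not F5 Distributed Cloud's actual implementation: the `Pool` class and `eligible_pools` function are hypothetical names, and the rule that the smallest non-zero priority wins among the remaining pools is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    weight: int
    priority: int
    healthy: bool


def eligible_pools(pools):
    """Return the pools that would receive traffic under this simplified model."""
    # A weight of zero disables a pool; unhealthy pools are never candidates.
    candidates = [p for p in pools if p.weight > 0 and p.healthy]
    # Priority zero is the lowest priority, so prefer any non-zero priority.
    preferred = [p for p in candidates if p.priority > 0]
    if preferred:
        # Assumption: among non-zero priorities, the smallest number wins.
        best = min(p.priority for p in preferred)
        return [p for p in preferred if p.priority == best]
    # Only priority-zero (fallback) pools remain, if any.
    return candidates


main = Pool("main", weight=1, priority=1, healthy=True)
maintenance = Pool("maintenance", weight=1, priority=0, healthy=True)
print([p.name for p in eligible_pools([main, maintenance])])
main.healthy = False  # simulate the failing health check from the test
print([p.name for p in eligible_pools([main, maintenance])])
```

In this model the maintenance pool is only returned once every pool with a non-zero priority is unhealthy or disabled, which mirrors the failover behaviour the test in this section demonstrates.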
Changing the health check on the main pool back to a working one should dynamically turn off the maintenance page as well: Summary This is just one example of how you can use App Stack to convert some more advanced/dynamic iRules over to F5 Distributed Cloud. I only used a basic NGINX configuration in this example, but you can start to see how leveraging NGINX in App Stack can give us even more flexibility. Hopefully this helps!
Configure Generic Webhook Alert Receiver using F5 Distributed Cloud Platform
The Generic Webhook Alerts feature in F5 Distributed Cloud (F5 XC) makes it easy to configure and send alert notifications related to application infrastructure (IaaS) to a specified receiver URL. The F5 XC SaaS console sends alert messages to the receiving web server as soon as events are triggered.
Getting started with F5 Distributed Cloud (XC) Telemetry
Introduction: This is an introductory article on the F5 Distributed Cloud (XC) telemetry series covering the basics. Going forward, there will be more articles focusing on exporting and visualizing logs and metrics from XC platform to telemetry tools like ELK Stack, Loki, Prometheus, Grafana etc. What is Telemetry? Telemetry refers to the process of collection and transmission of various kinds of data from remote systems to some central receiving entity for monitoring, analyzing and improving the performance, reliability, and security of remote systems. Telemetry Data involves: Metrics: Quantitative data like request rates, error rates, request/response throughputs etc. collected at regular intervals over a period of time. Logs: Textual time and event-based records generated by applications like request logs, security logs, etc. Traces: Information regarding journey/flow of requests across multiple services in a distributed system. Alerts: Alerts use telemetry data to set limits and send real-time notifications allowing organizations to act quickly if their systems don’t behave as expected. This makes alerts a critical pillar of observability. Overview: The F5 Distributed Cloud platform is designed to meet the needs of today’s modern and distributed applications. It allows for delivery, security, and observability across multiple clouds, hybrid clouds, and edge environments. This will create telemetry data that can be seen in XC’s own dashboards. But there may be times when customers want to collect their application’s telemetry data from different platforms to their own SIEM systems. To fulfill this kind of requirement, XC has come up with the Global Log Receiver (GLR) which will send XC logs to customer’s log collection systems. Along with this XC also exposes API that contains metrics data which can be fetched by exporter scripts and can be parsed and processed in such a way that telemetry tools can understand. 
As shown in the above diagram, there are a few steps involved before raw telemetry data can be presented in dashboards, which include data collection, storage, and processing from remote systems. Only then is the telemetry data sent to the visualization tools for real-time monitoring and observability. To achieve this, there are several telemetry tools available like Prometheus (which is used for collecting, storing, and analyzing metrics), ELK stack, Grafana etc. We have covered a brief description of a few such tools below. F5 XC Global Log Receiver: F5 XC Global Log Receiver facilitates sending XC logs (Request, Audit, Security event and DNS request logs) to an external log collection system. The sent logs include all system and application logs of the F5 XC tenant. Global Log Receiver supports sending logs to the following log collection systems: AWS CloudWatch, AWS S3, HTTP Receiver, Azure Blob Storage, Azure Event Hubs, Datadog, GCP Bucket, Generic HTTP or HTTPS server, IBM QRadar, Kafka, NewRelic, Splunk, SumoLogic. More information on how to set up or configure XC GLR can be found in this document. Observability/Monitoring Tools: Note: Below is a brief description of a few monitoring tools commonly used by organizations. Prometheus: Prometheus is an open-source monitoring and alerting tool designed for collecting, storing, and analyzing time-series data (metrics) from modern, cloud-native, and distributed systems. It scrapes metrics from targets via HTTP endpoints, stores them in its optimized time-series database, and allows querying using the powerful PromQL language. Prometheus integrates seamlessly with tools like Grafana for visualization and includes Alertmanager for real-time alerting. It can also be integrated with Kubernetes and can help in continuously discovering and monitoring services from remote systems. Loki: Loki is a lightweight, open-source log aggregation tool designed for storing and querying logs from remote systems.
Unlike traditional log management systems, Loki is designed to work alongside metrics and is often paired with Prometheus. It does not index the log content; instead, it indexes only a set of labels for each log stream, which keeps ingestion and storage efficient. Logs can be queried using LogQL, a PromQL-like language. It is best suited for debugging and monitoring logs in cloud-native or containerized environments like Kubernetes. Grafana: Grafana is an open-source visualization and analytics platform for creating real-time dashboards from diverse data sets. It integrates with tools like Prometheus, Loki, Elasticsearch, and more. Grafana enables users to visualize trends, monitor performance, and set up alerts using a highly customizable interface. ELK Stack: The ELK Stack (Elasticsearch, Logstash, Kibana) is a powerful open-source solution for log management, search, and analytics. Elasticsearch handles storing, indexing, and querying data. Logstash ingests, parses, and transforms logs from various sources. Kibana provides an interactive interface for visualizing data and building dashboards. Conclusion: Telemetry turns system data into actionable insights enabling real-time visibility, early detection of issues, and performance tuning, thereby ensuring system reliability, security, stability, and efficiency. In this article, we’ve explored some of the foundational building blocks and essential tools that will set the stage for the topics we’ll cover in the upcoming articles of this series! Related Articles: F5 Distributed Cloud Telemetry (Logs) - ELK Stack, F5 Distributed Cloud Telemetry (Metrics) - ELK Stack, F5 Distributed Cloud Telemetry (Logs) - Loki, F5 Distributed Cloud Telemetry (Metrics) - Prometheus. References: XC Global Log Receiver, Prometheus, ELK Stack, Loki.
Accelerate Your Initiatives: Secure & Scale Hybrid Cloud Apps on F5 BIG-IP & Distributed Cloud DNS
It's rare now to find an application that runs exclusively in one homogeneous environment. Users are now global, and enterprises must support applications that are always-on and available. These applications must also scale to meet demand while continuing to run efficiently, continuously delivering a positive user experience with minimal cost. Introduction According to F5’s 2024 State of Application Strategy Report, hybrid and multicloud deployments are pervasive. With the need for flexibility and resilience, most businesses will deploy applications that span multiple clouds and use complex hybrid environments. In the following solution, we walk through how an organization can expand and scale an application that has matured and now needs to be highly available to internal users while also being accessible to external partners and customers at scale. Enterprises using different form factors such as F5 BIG-IP TMOS and F5 Distributed Cloud can quickly right-size and scale legacy and modern applications that were originally only available in an on-prem datacenter. Secure & Scale Applications Let’s consider the following example. Bookinfo is an enterprise application running in an on-prem datacenter that only internal employees use. This application provides product information and details that the business’ users access from an on-site call center in another building on the campus. To secure the application and make it highly available, the enterprise has deployed an F5 BIG-IP TMOS in front of each endpoint. An endpoint is the combination of an IP, port, and service URL. In this scenario, our app has endpoints for the frontend product page and backend resources that only the product page pulls from. Internal on-prem users access the app with internal DNS on BIG-IP TMOS. GSLB on the device sends another class of internal users, who aren’t on campus and access by VPN, to the public cloud frontend in AWS.
The frontend that runs in AWS can scale with demand, allowing it to expand as needed to serve an influx of external users. Both internal users who are off-campus and external users will now always connect to the frontend in AWS through the F5 Global Network and Regional Edges with Distributed Cloud DNS and App Connect. With the frontend for the app enabled in AWS, it now needs to pull data from backend services that still run on-prem. Expanding the frontend requires additional connectivity, and to do that we first deploy an F5 Distributed Cloud Customer Edge (CE) to the on-prem datacenter. The CE connects to the F5 Global Network and extends Distributed Cloud Services, such as DNS and Service Discovery, WAF, API Security, DDoS, and Bot protection, to apps running on BIG-IP. These protections not only secure the app but also help reduce unnecessary traffic to the on-prem datacenter. With Distributed Cloud connecting the public cloud and on-prem datacenter, Service Discovery is configured on the CE on-prem. This makes a catalog of apps (virtual servers) on the BIG-IP available to Distributed Cloud App Connect. Using App Connect with managed DNS, Distributed Cloud automatically creates the fully qualified domain name (FQDN) for external users to access the app publicly, and it uses Service Discovery to make the backend services running on the BIG-IP available to the frontend in AWS. Here are the virtual servers running on BIG-IP. Two of the virtual servers, “details” and “reviews,” need to be made available to the frontend in AWS while continuing to work for the frontend that’s on-prem. To make the virtual servers on BIG-IP available as upstream servers in App Connect, all that’s needed is to click “Add HTTP Load Balancer” directly from the Discovered Services menu. To make the details and reviews services that are on-prem available to the frontend product page in AWS, we advertise each of their virtual servers on BIG-IP to only the CE running in AWS.
The menu below makes this possible with only a few clicks as service discovery eliminates the need to find the virtual IP and port for each virtual server. Because the CE in AWS runs within Kubernetes, the name of the new service being advertised is recognized by the frontend product page and is automatically handled by the CE. This creates a split-DNS situation where an internal client can resolve and access both the internal on-prem and external AWS versions of the app. The subdomain “external.f5-cloud-demo.com” is now resolved by Distributed Cloud DNS, and “on-prem.f5-cloud-demo.com” is resolved by the BIG-IP. When combined with GSLB, internal users who aren’t on campus and use a VPN will be redirected to the external version of the app. Demo The following video explains this solution in greater detail, showing how to configure connectivity to each service the app uses, as well as how the app looks to internal and external users. (Note: it looks and works identically! Just the way it should be and with minimal time needed to configure it). Key Takeaways BIG-IP TMOS has long delivered best-in-class service with high-availability and scale to enterprise and complex applications. When integrated with Distributed Cloud, freely expand and migrate application services regardless of the deployment model (on-prem, cloud, and edge). This combination leverages cloud environments for extreme scale and global availability while freeing up resources on-prem that would be needed to scrub and sanitize traffic. Conclusion Using the BIG-IP platform with Distributed Cloud services addresses key challenges that enterprises face today: whether it's making internal apps available globally to workforces in multiple regions or scaling services without purchasing more fixed-cost on-prem resources. F5 has the products to unlock your enterprise’s growth potential while keeping resources nimble. 
Check out the select resources below to explore more about the products and services featured in this solution. Additional Resources Solution Overview: Distributed Cloud DNS Solution Overview: One DNS – Four Expressions Interactive Demo: Distributed Cloud DNS at F5 DevCentral: The Power of &: F5 Hybrid DNS solution F5 Hybrid Security Architectures: One WAF Engine, Total Flexibility
F5 XC CE Debug commands through GUI cloud console and API
Why is this feature important and helpful? With this capability, as long as the IPsec/SSL tunnels are up from the Customer Edge (CE) to the Regional Edge (RE), there is no need to log in to the CE when troubleshooting is needed. This is possible for Secure Mesh (SM) and Secure Mesh V2 (SMv2) CE deployments. Since XC CEs are essentially SDN-based ADC/proxy devices, the option to execute commands from the SDN controller, which is the XC cloud, seems a logical next step. Using the XC GUI to send SiteCLI debug commands. The first example is sending the "netstat" command to "master-3" of a 3-node CE cluster. This is done under Home > Multi-Cloud Network Connect > Overview > Infrastructure > Sites by finding the site where you want to trigger the commands. In the VPM logs it is possible to see the command that was sent in API format by searching for it or for logs starting with "debug", which helps when automating this task. If you capture and review the full log, you will see not only the API URL endpoint but also the POST body data that needs to be added. The VPM logs, which can also be seen from the web console and API, are the best place to start investigating issues. XC Commands reference: Node Serviceability Commands Reference | F5 Distributed Cloud Technical Knowledge Troubleshooting Guidelines for Customer Edge Site | F5 Distributed Cloud Technical Knowledge Troubleshooting Guide for Secure Mesh Site v2 Deployment | F5 Distributed Cloud Technical Knowledge Using the XC API to send SiteCLI debug commands. The same commands can be sent using the XC API, and the commands can first be tested and reviewed using the API documentation and developer portals. The API documentation even has examples of how to run these commands with curl or with vesctl, the XC shell client that can be installed on any computer. Postman can also be used instead of curl, but the best option to test commands through the API is the developer portal.
Postman can also be used by the "old school" people 😉 Link reference: F5 Distributed Cloud Services API for ves.io.schema.operate.debug | F5 Distributed Cloud Technical Knowledge F5 Distributed Cloud Dev Portal ves-io-schema-operate-debug-CustomPublicAPI-Exec | F5 Distributed Cloud Technical Knowledge Summary: The option to trigger commands through the XC GUI or the API is really useful if, for example, there is a need to periodically monitor CPU or memory spikes with commands like "execcli check-mem" or "execcli top", or to automate tcpdump with "execcli vifdump xxxx". The use cases for this functionality really are endless.
F5 Distributed Cloud Telemetry (Logs) - Loki
Scope This article walks through the process of integrating log data from F5 Distributed Cloud’s (F5 XC) Global Log Receiver (GLR) with Grafana Loki. By the end, you'll have a working log pipeline where logs sent from F5 XC can be visualized and explored through Grafana. Introduction Observability is a critical part of managing modern applications and infrastructure. F5 XC offers the GLR as a centralized system to stream logs from across distributed services. Grafana Loki, part of the Grafana observability stack, is a powerful and efficient tool for aggregating and querying logs. To improve observability, you can forward logs from F5 XC into Loki for centralized log analysis and visualization. This article shows you how to implement a lightweight Python webhook that bridges F5 XC GLR with Grafana Loki. The webhook acts as a log ingestion and transformation service, enabling logs to flow seamlessly into Loki for real-time exploration via Grafana. Prerequisites Access to F5 Distributed Cloud (XC) SaaS tenant with GLR setup VM with Python3 installed Running Loki instance (If not, check "Configuring Loki and Grafana" section below) Running Grafana instance (If not, check "Configuring Loki and Grafana" section below) Note – In this demo, an AWS VM is used with Python3 installed and running webhook (port - 5000), Loki (port - 3100) and Grafana (port - 3000) running as docker instance, all in the same VM. Architecture Overview F5 XC GLR → Python Webhook → Loki → Grafana F5 XC GLR Configuration Follow the steps mentioned below to set up and configure Global Log Receiver (GLR). F5 XC GLR Building the Python Webhook To send the log data from F5 Distributed Cloud Global Log Receiver (GLR) to Grafana Loki, we used a lightweight Python webhook implemented using the Flask framework. This webhook acts as a simple transformation and relay service. 
It receives raw log entries from F5 XC, repackages them in the structure Loki expects, and pushes them to a Loki instance running on the same virtual machine. Key Functions of the Webhook Listens for Log Data: The webhook exposes an endpoint (/glr-webhook) on port 5000 that accepts HTTP POST requests from the GLR. Each request can contain one or more newline-separated log entries. Parses and Structures the Logs: Incoming logs are expected to be JSON-formatted. The webhook parses each line individually and assigns a consistent timestamp (in nanoseconds, as required by Loki). Formats the Payload for Loki: The logs are then wrapped in a structure that conforms to Loki’s push API format. This includes organizing them into a stream, which can be labeled (e.g., with a job name like f5-glr) to make logs easier to query and group in Grafana. Pushes Logs to Loki: Once formatted, the webhook sends the payload to the Loki HTTP API using a standard POST request. If the request is successful, Loki returns a 204 No Content status. Handles Errors Gracefully: The webhook includes basic error handling for malformed JSON, network issues, or unexpected failures, returning appropriate HTTP responses. Running the Webhook python3 webhook.py > python.log 2>&1 & This command runs webhook.py using Python3 in the background and redirects all standard output and error messages to python.log for easier debugging. Configuring Loki and Grafana docker run -d --name=loki -p 3100:3100 grafana/loki:latest docker run -d --name=grafana -p 3000:3000 grafana/grafana:latest Loki and Grafana are running as docker instance in the same VM, private IP of the Loki docker instance along with port is used as data source in Grafana configuration. 
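The parsing and formatting steps listed under "Key Functions of the Webhook" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's actual webhook.py: the function name `build_loki_payload` and its parameters are hypothetical, and the Flask route and the HTTP POST to Loki are deliberately left out so the transformation can be read and tested on its own.

```python
import json
import time


def build_loki_payload(raw_body, job="f5-glr", now_ns=None):
    """Turn newline-separated JSON log entries into a Loki push payload.

    Returns (payload, skipped), where skipped counts malformed lines.
    """
    # Loki expects each timestamp as a string of nanoseconds since the epoch.
    ts = str(now_ns if now_ns is not None else time.time_ns())
    values = []
    skipped = 0
    for line in raw_body.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # malformed entry: count it and carry on
            continue
        # Re-serialize so every stored log line is compact, valid JSON.
        values.append([ts, json.dumps(entry, separators=(",", ":"))])
    # One stream, labelled with the job name used for querying in Grafana.
    payload = {"streams": [{"stream": {"job": job}, "values": values}]}
    return payload, skipped


body = '{"req_path": "/", "rsp_code": "200"}\nnot-json\n'
payload, skipped = build_loki_payload(body)
print(json.dumps(payload, indent=2), skipped)
```

In a complete webhook, the returned payload would be sent as JSON to Loki's push endpoint (`/loki/api/v1/push` on port 3100 in this setup). The sample log fields in the usage lines are made up for illustration.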
Once Loki is configured under Grafana Data sources, follow the steps below: Navigate to the Explore menu. Select “Loki” in the data source picker. Choose the appropriate label and value, in this case label=job and value=f5-glr. Select the desired time range and click “Run query”. Logs are then displayed based on the “Log Type” selected in the F5 XC GLR configuration. Note: Some requests need to be generated for logs to be visible in Grafana, based on the Log Type selected. Conclusion F5 Distributed Cloud's (F5 XC) Global Log Receiver (GLR) unlocks real-time observability by integrating with open-source tools like Grafana Loki. This reflects F5 XC's commitment to open source, enabling seamless log management with minimal overhead. A customizable Python webhook ensures adaptability to evolving needs. Logs centralized in Loki and visualized in Grafana empower teams with actionable insights, accelerating troubleshooting and optimization. F5 XC GLR's flexibility future-proofs observability strategies. This integration showcases F5’s dedication to interoperability and empowering customers with community-driven solutions.
F5 Distributed Cloud Telemetry (Metrics) - Prometheus
Scope This article walks through the process of collecting metrics from F5 Distributed Cloud’s (XC) Service Graph API and exposing them in a format that Prometheus can scrape. Prometheus then scrapes these metrics, which can be visualized in Grafana. Introduction Metrics are essential for gaining real-time insight into service performance and behaviour. F5 Distributed Cloud (XC) provides a Service Graph API that captures service-to-service communication data across your infrastructure. Prometheus, a leading open-source monitoring system, can scrape and store time-series metrics — and when paired with Grafana, offers powerful visualization capabilities. This article shows how to integrate a custom Python-based exporter that transforms Service Graph API data into Prometheus-compatible metrics. These metrics are then scraped by Prometheus and visualized in Grafana, all running in Docker for easy deployment. Prerequisites Access to F5 Distributed Cloud (XC) SaaS tenant VM with Python3 installed Running Prometheus instance (If not check "Configuring Prometheus" section below) Running Grafana instance (If not check "Configuring Grafana" section below) Note – In this demo, an AWS VM is used with Python installed and running exporter (port - 8888), Prometheus (host port - 9090) and Grafana (port - 3000) running as docker instance, all in same VM. Architecture Overview F5 XC API → Python Exporter → Prometheus → Grafana Building the Python Exporter To collect metrics from the F5 Distributed Cloud (XC) Service Graph API and expose them in a format Prometheus understands, we created a lightweight Python exporter using Flask. This exporter acts as a transformation layer — it fetches service graph data, parses it, and exposes it through a /metrics endpoint that Prometheus can scrape. 
Code Link -> exporter.py Key Functions of the Exporter Uses XC-Provided .p12 File for Authentication: To authenticate API requests to F5 Distributed Cloud (XC), the exporter uses a client certificate packaged in a .p12 file. This file must be manually downloaded from the F5 XC console (steps) and stored on the VM where the Python script runs. The script expects the full path to the .p12 file and its associated password to be specified in the configuration section. Fetches Service Graph Metrics: The script pulls service-level metrics such as request rates, error rates, throughput, and latency from the XC API. It supports both aggregated and individual load balancer views. Processes and Structures the Data: The exporter parses the raw API response to extract the latest metric values and converts them into Prometheus exposition format. Each metric is labelled (e.g., by vhost and direction) for flexibility in Grafana queries. Exposes a /metrics Endpoint: A Flask web server runs on port 8888, serving the /metrics endpoint. Prometheus periodically scrapes this endpoint to ingest the latest metrics. Handles Multiple Metric Types: Traffic metrics and health scores are handled and formatted individually. Each metric includes a descriptive name, type declaration, and optional labels for fine-grained monitoring and visualization. Running the Exporter python3 exporter.py > python.log 2>&1 & This command runs exporter.py using Python3 in the background and redirects all standard output and error messages to python.log for easier debugging. Configuring Prometheus docker run -d --name=prometheus --network=host -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus:latest Prometheus runs as a Docker instance in host network mode (port 9090) with the below configuration (prometheus.yml), scraping the /metrics endpoint exposed by the Python Flask exporter on port 8888 every 60 seconds.
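To illustrate the "Processes and Structures the Data" step of the exporter, the sketch below renders one metric family in the Prometheus text exposition format, with a HELP line, a TYPE line, and labelled samples. It is a minimal example under stated assumptions rather than the article's exporter.py: `format_metric` is a hypothetical helper, the sample values are made up, and treating the metric as a gauge is an assumption.

```python
def format_metric(name, help_text, samples, metric_type="gauge"):
    """Render one metric family in Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            parts = []
            for key in sorted(labels):
                # Escape backslash, double quote, and newline in label values.
                val = str(labels[key])
                val = val.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
                parts.append(f'{key}="{val}"')
            label_str = "{" + ",".join(parts) + "}"
        else:
            label_str = ""
        lines.append(f"{name}{label_str} {value}")
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


text = format_metric(
    "f5xc_downstream_http_request_rate",
    "HTTP request rate reported by the XC Service Graph API",
    [({"vhost": "example-lb", "direction": "downstream"}, 12.5)],
)
print(text)
```

A Flask handler for /metrics would return the concatenation of such blocks with a plain-text content type, which is what Prometheus scrapes on each interval.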
Configuring Grafana docker run -d --name=grafana -p 3000:3000 grafana/grafana:latest The private IP of the Prometheus Docker instance, along with its port (9090), is used as the data source in the Grafana configuration. Once Prometheus is configured under Grafana Data sources, follow the steps below: Navigate to the Explore menu. Select “Prometheus” in the data source picker. Choose the appropriate metric, in this case “f5xc_downstream_http_request_rate”. Select the desired time range and click “Run query”. The metrics graph is then displayed. Note: Some requests need to be generated for metrics to be visible in Grafana. A broader, high-level view of all metrics can be accessed by navigating to “Drilldown” and selecting “Metrics”, providing a comprehensive snapshot across services. Conclusion F5 Distributed Cloud’s (F5 XC) Service Graph API provides deep visibility into service-to-service communication, and when paired with Prometheus and Grafana, it enables powerful, real-time monitoring without vendor lock-in. This integration highlights F5 XC’s alignment with open-source ecosystems, allowing users to build flexible and scalable observability pipelines. The custom Python exporter bridges the gap between the XC API and Prometheus, offering a lightweight and adaptable solution for transforming and exposing metrics. With Grafana dashboards on top, teams can gain instant insight into service health and performance. This open approach empowers operations teams to respond faster, optimize more effectively, and evolve their observability practices with confidence and control.