devops
1599 Topics

F5 BIG-IP deployment with OpenShift - platform and networking options
Introduction
This article is an architectural overview of how F5 BIG-IP can be used with Red Hat OpenShift. Several topics are covered, including:
- 1-tier or 2-tier arrangements, where the BIG-IP load-balances workload PODs directly or load-balances ingress controllers (such as NGINX+ or OpenShift's built-in router), respectively.
- Multi-cluster arrangements, where the BIG-IP can load-balance or do route sharding across two or more clusters.
- Multi-tenancy and IP address management options.
While this article has a NetOps/infrastructure focus, the follow-up article BIG-IP deployment with OpenShift - application publishing focuses on DevOps/applications.

Overall architecture
When using BIG-IP with Red Hat OpenShift, the Container Ingress Services (CIS from now on) container is used to connect the BIG-IP APIs with the Kubernetes APIs. The source of truth is OpenShift. When a user configuration is applied or when a change occurs in the OpenShift cluster, CIS automatically updates the configuration in the BIG-IP.
Under the hood, CIS updates the BIG-IP configuration using the AS3 declarative API. It is not necessary to know AS3, as all the configuration can be applied using Kubernetes resource types.
IP Address Management (IPAM from now on) is important when the DevOps teams are expected to operate independently from the infrastructure administrators. CIS supports IPAM by making use of the F5 IPAM Controller (FIC from now on), which is deployed as a container as well. The next picture shows how these components fit together. CIS and FIC are PODs deployed in the OpenShift cluster and AS3 is deployed in the BIG-IP.
In the next sections, we cover the different deployment options and considerations to be taken into account. The full documentation can be found in F5 clouddocs.
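To make the declarative model more concrete, the following is a minimal sketch of the kind of AS3 declaration a controller could generate from a list of discovered POD IPs. It is not CIS source code: the function name buildAs3Declaration, the tenant, application and pool names, and all addresses are hypothetical placeholders. In practice CIS builds and posts the declaration on your behalf from Kubernetes resources.

```javascript
// Illustrative sketch only; this is not CIS source code.
// Tenant, application and pool names and all addresses are hypothetical.
function buildAs3Declaration(tenant, app, virtualAddress, podIps, servicePort) {
  return {
    class: 'AS3',
    action: 'deploy',
    declaration: {
      class: 'ADC',
      schemaVersion: '3.0.0',
      [tenant]: {
        class: 'Tenant',
        [app]: {
          class: 'Application',
          template: 'http',
          serviceMain: {
            class: 'Service_HTTP',
            virtualAddresses: [virtualAddress],
            pool: 'web_pool',
          },
          web_pool: {
            class: 'Pool',
            monitors: ['http'],
            // One member entry listing every POD IP on the same port.
            members: [{ servicePort, serverAddresses: [...podIps] }],
          },
        },
      },
    },
  };
}

// The whole desired state is regenerated whenever the POD list changes.
const example = buildAs3Declaration(
  'ocp_tenant', 'web_app', '10.0.0.10', ['10.128.0.5', '10.129.0.7'], 8080
);
console.log(JSON.stringify(example, null, 2));
```

Because the declaration describes the complete desired state, a change in the cluster (for example a POD being added) is handled by generating a new declaration rather than by issuing incremental commands.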
F5 BIG-IP container integrations are Open Source Software (OSS) and can be found in this GitHub repository, where you will find additional technical details and examples.

Networking - CNI options
Kubernetes' networking is provided by Container Networking Interface plugins (CNI from now on) and F5 BIG-IP supports all of OpenShift's native CNIs:
- OVNKubernetes - This is the preferred option. GA since OpenShift 4.6, it makes use of Geneve encapsulation, but BIG-IP interacts with this CNI in a routed mode in which the packets from/to the BIG-IP don't use encapsulation. Additionally, PODs' cluster IPs are discovered dynamically by CIS when OpenShift nodes are added or removed. This also makes this method the easiest from a BIG-IP management point of view. Check CIS configuration for OVNKubernetes for details.
- OpenShiftSDN - Supported since OpenShift 3.x, it is being phased out in favour of OVNKubernetes. It makes use of VXLAN encapsulation between the nodes and between the nodes and the BIG-IPs. This requires manual configuration of VXLAN tunnels in the BIG-IPs when OpenShift nodes are added or removed. Check CIS configuration for OpenShiftSDN for details.
Feature-wise, these CNIs can be compared using the next table from the OpenShift documentation.
Besides the above features, performance should also be taken into consideration. The NICs used in the OpenShift cluster should do encapsulation off-loading to reduce the CPU load in the nodes. Increasing the MTU is recommended, especially for encapsulating CNIs; this is suggested in OpenShift's documentation as well, and needs to be set at installation time in the install-config.yaml file. See this OpenShift.com link for details.

Networking - the importance of supporting clusters' CNI
There are basically two modes to interact with a Kubernetes workload from outside the cluster, via kube-proxy or directly to the PODs. They map to the following Service types:
- Using NodePort Service type. In this case, external hosts access the PODs using any of the cluster's node IPs.
When a request reaches a node, Kubernetes' kube-proxy is responsible for forwarding the request to a POD in the local or a remote node. Sending to a remote node adds noticeable overhead. In 2-tier deployments, externalTrafficPolicy: Local could be used with appropriate monitoring to avoid this additional hop. NodePort is popular for other external load balancers because it is an easy method to access the PODs without having to support the CNI: as the name indicates, it uses the Kubernetes nodes' IP addresses. This has the drawback of an additional indirection. This drawback is especially relevant for 1-tier deployments because application PODs cannot be accessed directly, eliminating the advantages of this deployment type. On the other hand, BIG-IP supports OpenShift's CNIs, both OpenShiftSDN and OVNKubernetes.
- Using LoadBalancer Service type. The packet path in this mode is equivalent to NodePort, in which the external load balancers need an intermediate kube-proxy hop before reaching the POD. An alternative for bypassing kube-proxy is the use of hostNetwork access, but this is discouraged in general because of its security implications.
- Using ClusterIP Service type. This is the preferred mode because a request is sent directly to the destination POD. This requires supporting OpenShift's CNIs, which is the case of BIG-IP. It is worth noting that BIG-IP also supports other CNIs such as Calico or Cilium. This arrangement can be seen next.
Please note in the above figure the traffic path from the BIG-IP, where the arrow reaches the inside of the CNI area. This indicates that the BIG-IP can address the ingress controllers' or the workload PODs' IPs within the cluster network.
Using the ClusterIP Service type is also more flexible because it allows CIS to use 1-tier and 2-tier arrangements simultaneously.

Networking - Load Balancer arrangement options
There are basically two arrangement options: 1-tier and 2-tier.
In a nutshell:
- A 2-tier arrangement is the typical way in which Kubernetes clusters are deployed. In this arrangement, the BIG-IP has only the role of External Load Balancer (first tier only) and sends the client requests to the Ingress Controller instances (second tier). The Ingress Controllers ultimately forward the requests to the workload PODs.
- In a 1-tier arrangement, the BIG-IP sends the requests to the workload PODs directly. This is a much simplified arrangement, in which the BIG-IP performs the role of both External Load Balancer and Ingress Controller.
Next, we will see the advantages of each arrangement. Please note that when using ClusterIP, this selection can be done on a per-Service basis. From the BIG-IP's point of view, it is irrelevant what the endpoints are.

Load Balancer arrangement option - 2-tier arrangement
Unlike most External Load Balancers, the BIG-IP can expose services with either Layer 4 functionalities or Layer 7 functionalities. In Layer 7 mode, SSL/TLS off-loading, HSM, Advanced WAF, and other advanced services can be used.
A 2-tier arrangement provides greater scalability compared to 1-tier arrangements in terms of the number of L7 routes exposed or the number of Kubernetes PODs, because the control plane workload (the related Kubernetes events that are generated for these PODs and Routes) is split between BIG-IP/CIS and the in-cluster Ingress Controller.
This arrangement also has strong isolation between the two tiers, ideal when each tier is managed by a different team (e.g., platform and developer teams).
A BIG-IP 2-tier arrangement is shown next:

Load Balancer arrangement option - 1-tier arrangement
In this arrangement, the BIG-IP typically operates in L7 mode and sends the traffic directly to the final workload POD. This is done by sending traffic to Services in ClusterIP mode. In this arrangement, persistence is handled easily and the workload PODs can be directly monitored by the BIG-IP, providing an accurate view of the application's health.
A BIG-IP 1-tier arrangement is shown next:
This arrangement is simpler to troubleshoot, has less latency and potentially higher per-session performance. Isolation between platform and developer teams can be achieved with CIS and FIC, yet it is not as strong as in 2-tier arrangements. This is described in BIG-IP deployment with OpenShift - application publishing options.

BIG-IP platform flexibility: deployment, scalability, and multi-tenancy options
Using BIG-IP, the deployment options are independent of the BIG-IP being an appliance, a scale-out chassis, or a Virtual Edition. The configuration is always the same down to the L2 (VLAN/tunnel) config level. Only the L1 (physical interface) configuration changes.
This platform flexibility also opens the possibility of using different options for scalability, multi-tenancy, hardware accelerators, or Hardware Security Modules (HSMs). The latter are especially important to keep the SSL/TLS private keys in a FIPS-compliant manner. The HSMs can be onboard, on-prem Network HSMs, or cloud SaaS HSMs.

Multi-tenancy Options
In this section, multi-tenancy refers to the case in which different projects from one or more OpenShift clusters are serviced by a single BIG-IP. The different CIS deployment options are outlined next:
- A CIS instance can manage all namespaces on a given OpenShift cluster or a subset of these. Namespaces can be specified with a list or a label selector (e.g., environment=test or environment=production).
- Multiple CIS instances, handling different namespaces, can share a single BIG-IP or use different BIG-IPs. Each CIS instance will own a dedicated partition in a BIG-IP. For example, it is feasible to set up an OpenShift cluster with development, pre-production, and production labeled namespaces and have these serviced by different CIS instances in the same or different BIG-IPs for each environment.
- Multiple CIS instances in a single BIG-IP can also handle different OpenShift clusters.
This is thanks to the soft isolation provided by BIG-IP partitions. Network isolation between these partitions can be achieved with route domains. Some of these deployment options are shown next:

IP address management (IPAM)
CIS has the capability of dynamically allocating IP addresses using the F5 IPAM Controller (FIC) companion. At the time of writing, it is possible to retrieve IP addresses from the following providers:
- Infoblox
- F5 local DB provider, which makes use of a PVC for persistence.
For the DevOps team, it is transparent which provider is used; it is only required to specify an ipamLabel attribute in the exposed L7 or L4 service. The DevOps team can also indicate when it wants to share IP addresses between different L7 or L4 services by means of the hostGroup attribute. This is described in the follow-up article.

BIG-IP data plane scalability options
A single BIG-IP cluster can scale horizontally with up to 8 BIG-IP instances and have the different projects distributed among these. This is referred to as Scale-N in the BIG-IP documentation. This mode is often not used because it requires additional orchestration or manual operation for optimal load distribution. In this mode, projects would have soft isolation from each other by means of BIG-IP partitions.
When ultimate scalability or hard isolation is required, TMOS vCMP technology or, in newer versions, F5OS tenant facilities can be used in larger appliances and scale-out chassis. These multi-tenant facilities allow running independent BIG-IP instances, isolated at hardware level, even using different versions of BIG-IP. The tenant BIG-IP instances can be allocated different amounts of hardware resources. In the next picture, the different tenants are shown in different colored bars using several blades (grey bars).
Using chassis-based platforms allows scaling data plane performance and increasing redundancy by adding blades to the systems, without the need for any reconfiguration on the CIS/OpenShift side.

BIG-IP control plane scalability options
Very large OpenShift clusters with either a large number of services exposed or a large number of PODs, combined with a high number of changes, will trigger many events in the Kubernetes API. These events are processed by CIS and ultimately by the BIG-IP's control plane. In these cases, the following strategies can be used to improve the BIG-IP's control plane scalability:
- Disaggregate the different projects in different BIG-IPs. These might be multiple BIG-IP VEs or instances in F5 vCMP or F5OS tenants when using hardware platforms.
- Use a 2-tier architecture, which reduces the number of Kubernetes objects and events that the BIG-IP is exposed to.
- In the upcoming months, CIS will be available in BIG-IP Next. This is a re-architecture of BIG-IP and incorporates major scalability improvements in the control plane.

Multi-cluster OpenShift
Since CIS version 2.14, the BIG-IP can also load-balance between 2 or more clusters in Active-Active, Active-Standby, or Ratio modes. 1-tier or 2-tier arrangements are possible. The next figure shows a single BIG-IP exposing workloads from 2 OpenShift clusters. Please note that the OpenShift clusters are not required to run the same version, so this arrangement is also interesting for performing OpenShift upgrades.
When using CIS in multi-cluster mode, an additional CIS instance in a secondary cluster is needed for redundancy. If there are more than 2 OpenShift clusters, no additional CIS instances are needed. Therefore, a typical BIG-IP cluster of 2 units load balancing 2 or more OpenShift clusters will always require 4 CIS instances. For each BIG-IP, one of the CIS instances has the (P)rimary role and is in charge of making changes in the BIG-IP by default.
The (S)econdary CIS will be on standby. Both CIS instances access all OpenShift clusters. A more comprehensive view of this can be seen in the next diagram, which considers having more than 2 OpenShift clusters. OpenShift clusters that don't host a CIS instance are referred to as remotely managed.

Conclusion
F5 BIG-IP provides unmatched deployment options and features with OpenShift; these include:
- The support of OpenShift's CNIs, which allows sending the traffic directly instead of using hostNetwork (which implies a security risk) or using the common NodePort, which incurs the additional kube-proxy indirection.
- Both 1-tier and 2-tier arrangements (or both types simultaneously) are possible.
- F5's Container Ingress Services provides the ability to handle multiple OpenShift clusters, exposing their services in a single VIP. This is a unique feature in the industry.
- To complete the circle, this integration also provides IP address management (IPAM), which provides great flexibility to DevOps teams.
- All these are available regardless of whether the BIG-IP is a Virtual Edition, an appliance, or a chassis platform, allowing great scalability and multi-tenancy options.
The follow-up article BIG-IP deployment with OpenShift - application publishing focuses on DevOps and applications. It describes how CIS can also unleash all traffic management and security features in a Kubernetes native way.
We are driven by your requirements. If you have any, please provide feedback through this post's comments section, your sales engineer, or via our GitHub repository.

5.3K Views, 5 likes, 17 Comments

OpenShift Service Mesh 2.x/3.x with F5 BIG-IP
Overview
OpenShift Service Mesh (OSSM) is Red Hat's packaged version of Istio Service Mesh. Istio has the Ingress Gateway component to handle incoming traffic from outside of the cluster. Like other ingress controllers, it requires an external load balancer to get the traffic into the ingress PODs. This follows the canonical Kubernetes 2-tier arrangement for getting the traffic inside the cluster. This is depicted in the next figure:
This article covers how to configure OpenShift Service Mesh 2.x/3.x, expose it to the BIG-IP, and properly monitor its health, either using BIG-IP's Container Ingress Services (CIS) or without using it.

Exposing OSSM in BIG-IP - VIP configuration
It is a customer choice how to publish OSSM in the BIG-IP:
- A Layer 4 (L4) Virtual Server is simpler and certificate management is done in OpenShift. The advantages of using this mode are the potentially higher performance and scalability, including connection mirroring, yet mirroring is not usually used for HTTP traffic due to the typical retry mechanism of HTTP applications. Connection persistence is limited to the source IP. When using CIS, this is done with a TransportServer CR, which creates a fastL4 type virtual server in the BIG-IP.
- A Layer 7 (L7) Virtual Server requires additional configuration because TLS termination is required. In this mode, OpenShift can take advantage of BIG-IP's TLS off-loading capabilities and Hardware/Network/SaaS/Cloud HSM integrations, which store private keys securely, including FIPS level support. Working at L7 also allows per-application traffic management, including header and payload rewrites, cookie persistence, etc. It also allows per-application multi-cluster. The above features are provided by the LTM (load balancing) module in BIG-IP. The possibilities are further expanded when using modules such as ASM (Advanced WAF) and Access (authentication).
When using CIS, this is done with a VirtualServer CR, which creates a standard-type virtual server in the BIG-IP.

Exposing OSSM to BIG-IP - pool configuration
There are two options to expose Istio Ingress Gateways to BIG-IP:
- Using ClusterIP addresses. These are POD IPs, which are dynamic. This requires the use of CIS for discovering the IP addresses of the Ingress Gateway PODs.
- Using NodePort addresses. These are reachable from the outside network. When using these, it is not strictly necessary to use CIS, but it is recommended.

Exposing OpenShift Service Mesh using ClusterIP
This requires the use of CIS with the following parameters:
--orchestration-cni=ovn
--static-routing-mode=true
These make CIS create IP routes in the BIG-IP for reaching the POD IPs inside the OpenShift cluster. Please note that this only works if all the OpenShift nodes are directly connected in the same subnet as the BIG-IP.
Additionally, the following parameter is required. It is the one that actually makes CIS populate pool members with Cluster (POD) IPs:
--pool-member-type=cluster
It is not necessary to change any configuration in OSSM because ClusterIP mode is the default mode in Istio Ingress Gateways.

Exposing OpenShift Service Mesh using NodePort
Using NodePort provides known IP addresses for the Ingress Gateways, reachable from outside the cluster. Note that when using NodePort, only one Ingress Gateway replica will run per node.
The behavior of NodePort varies with the externalTrafficPolicy field:
- Using the Cluster value, any OpenShift node will accept traffic and will redirect the traffic to any node that has an Ingress Gateway POD, in a load balancing fashion. This is the easiest to set up, but because each request might go to a different node, health checking is not reliable (it is not known which POD goes down).
- Using the Local value, only the OpenShift nodes that have Ingress Gateway PODs will accept traffic.
The traffic will be delivered to the local Ingress Gateway PODs, without further indirection. This is the recommended way when using NodePort because of its deterministic behaviour and therefore reliable health checking.
Next, it is described how to set up a NodePort using the Local externalTrafficPolicy. There are two options for configuring OSSM:
- Using the ServiceMeshControlPlane CR method: this is the default method in OSSM 2.x for backwards compatibility, but it doesn’t allow fine-tuning the configuration of the proxy. See this OSSM 2.x link for further details. This method is deprecated and not available in OSSM 3.x.
- Using the Gateway injection method: this is the only method possible in OSSM 3.x and the current recommendation from Red Hat for OSSM 2.x. This method allows you to tune the proxy settings. This article shows why this tuning is of special interest: at present, the Ingress Gateway doesn’t have good default values for reliable health checking. These will be discussed in the Health Checking section.
When using the ServiceMeshControlPlane CR method, the above will be configured as follows:

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
[...]
spec:
  gateways:
    ingress:
      enabled: false
      runtime:
        deployment:
          replicas: 2
      service:
        externalTrafficPolicy: Local
        ports:
        - name: status-port
          nodePort: 30021
          port: 15021
          targetPort: 15021
        - name: http2
          nodePort: 30080
          port: 80
          targetPort: 8080
        - name: https
          nodePort: 30443
          port: 443
          targetPort: 8443
        type: NodePort

When using the Gateway injection method (recommended), the Service definition is manually created analogously to the ServiceMeshControlPlane CR:

apiVersion: v1
kind: Service
[...]
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - name: status-port
    nodePort: 30021
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    nodePort: 30080
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    nodePort: 30443
    port: 443
    protocol: TCP
    targetPort: 8443

The ports section is optional but recommended in order to have deterministic ports, and required when not using CIS (because static ports are needed). The nodePort values can be customised.
When not using CIS, the pool members have to be configured manually in the BIG-IP. It is typical in OpenShift to have the Ingress components (OpenShift Router or Istio) in dedicated infra nodes. See this Red Hat solution for details.
When using the ServiceMeshControlPlane method, the configuration is as follows:

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
[...]
spec:
  runtime:
    defaults:
      pod:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          value: reserved
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          value: reserved

When using the Gateway injection method, the configuration is added to the Deployment file directly:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  template:
    metadata:
    spec:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved

The configuration above is also a good practice when using CIS. Additionally, CIS by default adds all node IPs to the Service pool regardless of whether the externalTrafficPolicy is set to the Cluster or Local value. The health check will discard nodes where there are no Ingress Gateways.
The set of nodes discovered by CIS can be limited with the following parameter:
--node-label-selector

Health Checking and retries for the Ingress Gateway

Ingress Gateway Readiness
The Ingress Gateway has the following readinessProbe for Kubernetes' own health checking:

readinessProbe:
  failureThreshold: 30
  httpGet:
    path: /healthz/ready
    port: 15021
    scheme: HTTP
  initialDelaySeconds: 1
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 3

The failureThreshold value of 30 is considered far too large: it only marks the Ingress Gateway as not Ready after 90 seconds (tested to be failureThreshold * timeoutSeconds). This article recommends marking down an Ingress Gateway within 16 seconds.
When using CIS, Kubernetes informs it whenever a POD is not Ready and CIS automatically removes the associated pool member from the pool. In order to achieve the desired behaviour of marking down the Ingress Gateway before 16 seconds, it is required to change the default failureThreshold value in the Deployment file by adding the following snippet:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  template:
    metadata:
    spec:
      containers:
      - name: istio-proxy
        image: auto
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz/ready
            port: 15021
            scheme: HTTP
          initialDelaySeconds: 1
          periodSeconds: 2
          successThreshold: 1
          timeoutSeconds: 3

This keeps all other values equal and sets failureThreshold to 5, therefore marking down the Ingress Gateway after 15 seconds.
When not using CIS, an HTTP health check has to be configured manually in the BIG-IP. An example health check monitor is shown next:

Connection draining
When an Ingress Gateway POD is deleted (because of an upgrade, scale-down event, etc.), it immediately returns HTTP 503 in the /healthz/ready endpoint and keeps serving connections until it is effectively deleted. This is called the drain period and by default it is extremely short (3 seconds) for any external load balancer.
This value has to be increased so the Ingress Gateway PODs being deleted continue serving connections until the Ingress Gateway POD is removed from the external load balancer (the BIG-IP) and the outstanding connections are finalised. This setting can only be tuned using the Gateway injection method and it is applied by adding the following snippet in the Deployment file:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          terminationDrainDuration: 45s

The example above uses the default drain period of the OpenShift Router (45 seconds). The value can be customised, keeping in mind that:
- When using CIS, it should allow CIS to update the configuration in the BIG-IP and drain the connections.
- When not using CIS, it should allow the health check to detect the condition of the POD and drain the connections.

Additional recommendations
The next recommendations apply to any ingress controller or API manager and have been previously suggested when using OpenShift Router.

Handle non-graceful errors with the pool’s reselect tries
To deal better with non-graceful shutdowns or transient errors, this mechanism will reselect a new Ingress Gateway POD when a request fails. The recommendation is to set the number of tries to the number of Ingress Gateway PODs minus 1. When using CIS, this can be set in the VirtualServer or TransportServer CRs with the reselectTries parameter.

Set an additional TCP monitor for Ingress Gateway's application traffic sockets
This complementary TCP monitor (for both HTTP and HTTPS listeners) validates that Ready instances can actually receive traffic in the application’s traffic sockets. Although this is handled with the reselect tries mechanism, this monitor will provide visibility that such types of errors are happening.

Conclusion and closing remarks
We hope this article highlights the most important aspects of integrating OpenShift Service Mesh with BIG-IP.
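As a recap, the tuning values discussed in this article reduce to simple arithmetic, which the following sketch works through. It assumes the relation measured in this article (time until not Ready is failureThreshold * timeoutSeconds); the function names are hypothetical and the replica count of 2 is only an example.

```javascript
// Time (seconds) before Kubernetes marks the Ingress Gateway not Ready,
// using the relation reported in the article: failureThreshold * timeoutSeconds.
function secondsUntilNotReady(probe) {
  return probe.failureThreshold * probe.timeoutSeconds;
}

// Recommended pool reselect tries: number of Ingress Gateway PODs minus 1.
function reselectTries(replicas) {
  return Math.max(replicas - 1, 0);
}

const defaultProbe = { failureThreshold: 30, timeoutSeconds: 3 };
const tunedProbe = { failureThreshold: 5, timeoutSeconds: 3 };

console.log(secondsUntilNotReady(defaultProbe)); // 90
console.log(secondsUntilNotReady(tunedProbe));   // 15
console.log(reselectTries(2));                   // 1
```

With the tuned probe, the 15 second result stays under the 16 second target, and a 2-replica Ingress Gateway deployment would use a single reselect try.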
A key aspect for having a reliable Ingress Gateway integration is to modify OpenShift Service Mesh’s terminationDrainDuration and readinessProbe.failureThreshold defaults. F5 has submitted RFE 04270713 to Red Hat to improve these. This article will be updated accordingly.
Whether CIS integration is used or not, BIG-IP allows you to expose OpenShift Service Mesh reliably with extensive L4-L7 security and traffic management capabilities. It also allows fine-grained access control, scalable SNAT or keeping the original source IP, among others. Overall, BIG-IP is able to fulfill any requirement. We look forward to hearing your experience and feedback on this article.

314 Views, 2 likes, 5 Comments

Double Trouble: Multiple Controllers Handling the Same Kubernetes LoadBalancer Service
Kubernetes doesn’t prevent multiple controllers from handling the same Service. In fact, from Kubernetes’ perspective, it’s just a chunk of YAML describing an abstract networking resource. If two controllers are watching services and both think they should provision a VIP, they’ll each do so.

395 Views, 6 likes, 2 Comments

Getting Started With n8n For AI Automation
First, what is n8n?
If you're not familiar with n8n yet, it's a workflow automation utility that allows us to use nodes to connect services quite easily. It's been the subject of quite a bit of Artificial Intelligence hype because it helps you construct AI Agents. I'm going to be diving more into n8n and what it can do with AI. My hope is that you can use this in your own labs to work out some of these AI networking and security challenges in your environment. Here's an example of how someone could use Ollama to control multiple Twitter accounts, for instance:

How do you install it?
Well... It’s all Node, so the best way to install it in any environment is to ensure you have Node version 22 (on Mac, brew install node@22) installed on your machine, as well as nvm (again, for Mac, brew install nvm), and then do an npm install -g n8n. Done! Really... That simple.

How much does it cost?
While there is support and expanded functionality for paid subscribers, there is also a community edition that I have used here and it's free.

How to license:
470 Views, 5 likes, 0 Comments

Simplifying Application Health Monitoring with F5 BIG-IP
A simple agreement between BIG-IP administrators and application owners can foster smooth collaboration between teams. Application owners define their own simple or complex health monitors and agree to expose a conventional /health endpoint. When a /health endpoint responds with an HTTP 200 response, BIG-IP assumes the application is healthy based on the application owners' own criteria.

The Challenge of Health Monitoring in Modern Environments
F5 BIG-IP administrators in Network Operations (NetOps) teams often work with application teams because the BIG-IP acts as a full proxy, providing services like:
- TLS termination
- Load balancing
- Health monitoring
Health checks are crucial for effective load balancing. The BIG-IP uses them to determine where to send traffic among back-end application servers. However, health monitoring frequently causes friction between teams.

Problems with the Traditional Approach
Traditionally, BIG-IP administrators create and maintain health monitors ranging from simple ICMP pings to complex monitors that:
- Simulate user transactions
- Verify HTTP response codes
- Validate payload contents
- Track application dependencies
This leads to several issues:
- Knowledge Gap: NetOps may not fully grasp each application's intricacies.
- Change Management Overhead: Application updates require retesting monitors, causing delays.
- Production Risk: Monitors can break after application changes, incorrectly marking services as up/down.
- Team Friction: Troubleshooting failed health checks involves tedious back-and-forth between teams.

A Cloud-Native Solution
The cloud-native and microservices communities have patterns that elegantly solve these problems. One widely used pattern is the health endpoint, which adapts well to BIG-IP environments.

The /health Endpoint Convention
Cloud-native applications commonly expose dedicated health endpoints like /health, /healthy, or /ready. These return standard status codes reflecting the application's state.
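As a rough illustration of this convention from the consumer's side, the sketch below probes a health endpoint with Node's built-in http module and reduces the answer to up or down based only on the status code. The function names classifyHealth and probeHealth and the example URL are hypothetical; a load balancer monitor performs the equivalent check natively.

```javascript
const http = require('http');

// Only the status code matters to the caller: 200 means up, anything else is down.
function classifyHealth(statusCode) {
  return statusCode === 200 ? 'up' : 'down';
}

// Probes a health URL and resolves to 'up' or 'down'.
// Network errors and timeouts are treated as down.
function probeHealth(url, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the response body, only the status code is used
      resolve(classifyHealth(res.statusCode));
    });
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', () => resolve('down'));
  });
}

// Example usage (hypothetical URL):
// probeHealth('http://localhost:3000/health').then((state) => console.log(state));
```

Keeping the decision to a single status-code comparison is what lets the application team change what "healthy" means without anyone touching the probing side.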
The /health endpoint provides a clear contract between NetOps and application teams for BIG-IP integration.

Implementing the Contract
This approach establishes a simple agreement:
Application Team Responsibilities:
- Implement /health to return HTTP 200 when the application is ready for traffic
- Define "healthy" based on application needs (database connectivity, dependencies, etc.)
- Maintain the health check logic as the application changes
BIG-IP Team Responsibilities:
- Configure an HTTP monitor targeting the /health endpoint
- Treat 200 as "healthy", anything else as "unhealthy"

Benefits of This Approach
- Aligned Expertise: Application teams define health based on their knowledge.
- Less Friction: BIG-IP configuration stays stable as applications evolve.
- Better Reliability: Health checks reflect true application health, including dependencies.
- Easier Troubleshooting: The /health endpoint can return detailed diagnostic info, but this is ignored by the BIG-IP and used strictly for troubleshooting.

Implementation Examples

F5 BIG-IP Health Monitor Configuration

ltm monitor http /Common/app-health-monitor {
    defaults-from /Common/http
    destination *:*
    interval 5
    recv "HTTP/1.1 200"
    recv-disable none
    send "GET /health HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
    time-until-up 0
    timeout 16
}

Node.js Health Endpoint Implementation

const express = require('express');
const app = express();
const port = 3000;

app.get('/', (req, res) => {
  res.send('Application is running');
});

// Health endpoint: 200 when all dependencies are fine, 503 otherwise.
app.get('/health', async (req, res) => {
  try {
    const dbStatus = await checkDatabaseConnection();
    const serviceStatus = await checkDependentServices();
    if (dbStatus && serviceStatus) {
      return res.status(200).json({
        status: 'healthy',
        database: 'connected',
        services: 'available',
        timestamp: new Date().toISOString()
      });
    }
    res.status(503).json({
      status: 'unhealthy',
      database: dbStatus ? 'connected' : 'disconnected',
      services: serviceStatus ? 'available' : 'unavailable',
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(500).json({
      status: 'error',
      message: error.message,
      timestamp: new Date().toISOString()
    });
  }
});

async function checkDatabaseConnection() {
  // Check real database connection
  return true;
}

async function checkDependentServices() {
  // Check required service connections
  return true;
}

app.listen(port, () => {
  console.log(`Application listening at http://localhost:${port}`);
});

Adopting this health check pattern can greatly reduce friction between NetOps and application teams while improving reliability. The simple contract of HTTP 200 for healthy provides the needed integration while letting each team focus on their expertise.
For apps that can't implement a custom /health endpoint, BIG-IP admins can still use traditional ICMP or TCP port monitoring. However, these basic checks can't accurately reflect an app's true health and complex dependencies.
This approach fosters collaboration and leverages the specialized knowledge of both network and application teams. The result is more reliable services and smoother operations.

446 Views, 1 like, 0 Comments

VIPTest: Rapid Application Testing for F5 Environments
VIPTest is a Python-based tool for efficiently testing multiple URLs in F5 environments, allowing quick assessment of application behavior before and after configuration changes. It supports concurrent processing, handles various URL formats, and provides detailed reports on HTTP responses, TLS versions, and connectivity status, making it useful for migrations and routine maintenance.

Passing Arguments to iCall Scripts
iCall is a control-plane automation tool for the BIG-IP platform. There are several articles on its overview and implementation details, but what gets lost among them is a clear explanation of how to pass arguments to iCall scripts. A post in the technical forum on disabling multiple other interfaces if one should fail highlighted the fact that the configuration can get pretty bloated if one does not pass data to the script. Here are two ways to do it:

Setting variables in user_alert.conf events

The alert definition supports variable names and values. Here are a few examples:

    alert local-http-10-2-80-1-80-DOWN "Pool /Common/my_pool member /Common/10.2.80.1:80 monitor status down" {
        exec command="tmsh generate sys icall event tcpdump context { { name ip value 10.2.80.1 } { name port value 80 } { name vlan value internal } { name count value 20 } }"
    }

    alert interface_1_1_down "Link: 1.1 is DOWN" {
        exec command="tmsh generate sys icall event interface_manager context { { name action value disabled } { name interface value 1.1 } }"
    }

The key/value pair arguments are set in the context of the exec command like so:

    { { name k1 value v1 } { name k2 value v2 } { name k3 value v3 } }

Setting variables in iCall handlers

A second method of setting variables is to do so in the handler definition. Note, however, that this is only supported on the periodic handler.

    sys icall handler periodic myPeriodicTestHandlerWithArguments {
        arguments {
            { name k1 value v1 }
            { name k2 value v2 }
            { name k3 value v3 }
        }
        interval 30
        script myTestScriptWithArguments
    }

Reading the variables in the iCall script

This is where the magic happens!
In the iCall script, there's a little snippet to gain access to all that goodness you set in the alerts and/or handlers:

    sys icall script myTestScriptWithArguments {
        app-service none
        definition {
            foreach var { k1 k2 k3 } {
                set $var $EVENT::context($var)
            }
            tmsh::log "k1: ${k1}"
            tmsh::log "k2: ${k2}"
            tmsh::log "k3: ${k3}"
        }
        description none
        events none
    }

The foreach loop iterates through the names you established in the alert/handler (which must also be listed in the script) and sets each variable to the value you provided in the context. In this dummy example, I'm just logging it. But let's look at a real example to close the loop.

Use Case: Disable other interfaces when one fails

In the forum thread, the ask was to validate the alerts, handlers, and scripts they had assembled to accomplish disabling multiple interfaces when one fails. Totally possible without passing arguments, but think about how many objects you need to accomplish this. It's a lot! The only count that doesn't change is the number of alert definitions you need in user_alert.conf. But the size of each definition shrinks considerably. Let's start with the user_alert.conf file, and I'm limiting to two interfaces (one failure triggering the other to be disabled) for brevity.

    alert interface_1_1_down "Link: 1.1 is DOWN" {
        exec command="tmsh generate sys icall event interface_manager context { { name action value disabled } { name interface value 1.1 } }"
    }

    alert interface_1_3_down "Link: 1.3 is DOWN" {
        exec command="tmsh generate sys icall event interface_manager context { { name action value disabled } { name interface value 1.3 } }"
    }

    alert interface_1_1_up "Link: 1.1 is UP" {
        exec command="tmsh generate sys icall event interface_manager context { { name action value enabled } { name interface value 1.1 } }"
    }

    alert interface_1_3_up "Link: 1.3 is UP" {
        exec command="tmsh generate sys icall event interface_manager context { { name action value enabled } { name interface value 1.3 } }"
    }

Pretty simple here.
Notice there is only one exec command per alert, and I only need to pass the action desired (enabled, disabled) and the failing interface. Now let's look at the handler. This is the easier piece of the puzzle. We only need one triggered handler to call the script.

    sys icall handler triggered interface_manager {
        script interface_manager
        subscriptions {
            interface_manager {
                event-name interface_manager
            }
        }
    }

So here in the handler, the event-name matches the event specified in the alert, and for consistency, I've named the script that as well. And now the script.

    sys icall script interface_manager {
        app-service none
        definition {
            foreach var { action interface } {
                set $var $EVENT::context($var)
            }
            switch ${interface} {
                "1.1" { tmsh::modify /net interface 1.3 ${action} }
                "1.3" { tmsh::modify /net interface 1.1 ${action} }
            }
        }
        description none
        events none
    }

At the top of the script definition is our little snippet to extract and set the variables for use. After that, a switch statement modifies the interfaces I want disabled or enabled based on the source interface failure. By passing the action along with the interface, I don't have to have two handlers and two scripts, one for each interface state. You could further optimize by passing with each source interface failure a list of the interfaces that should be disabled, then execute a tmsh::modify in a foreach loop, but the script is easier to modify programmatically than the user_alert.conf file.

Testing the solution

My first attempt to test the script failed because I disabled the interface administratively rather than having it fail. I had to look up how to "fail" an interface in my BIG-IP VE running on VMware Fusion. Turns out I just have to deactivate the appropriate NIC in the VM settings. Here's a quick one-minute video showing the results of the alert, handler, and script above.
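Since the four alert definitions in this use case differ only in the interface and the action, they lend themselves to being generated rather than typed by hand. Here is a minimal Node.js sketch of that idea, meant to run on an admin workstation rather than on the BIG-IP. The function name buildInterfaceAlerts and its parameters are hypothetical helpers, not an F5 API; the stanza layout and the "Link: X is DOWN/UP" match strings are copied from the alerts shown above.

```javascript
// Build user_alert.conf stanzas for interface up/down events.
// Hypothetical helper: the layout mirrors the alerts in this article,
// and the output still has to be placed in user_alert.conf by other means.
function buildInterfaceAlerts(interfaces, eventName = 'interface_manager') {
  // Each link state maps to the action passed to the iCall script.
  const states = [
    { link: 'DOWN', action: 'disabled' },
    { link: 'UP', action: 'enabled' },
  ];
  const stanzas = [];
  for (const { link, action } of states) {
    for (const iface of interfaces) {
      // "1.1" becomes "interface_1_1_down" / "interface_1_1_up".
      const alertName = `interface_${iface.replace(/\./g, '_')}_${link.toLowerCase()}`;
      const context = `{ { name action value ${action} } { name interface value ${iface} } }`;
      stanzas.push(
        `alert ${alertName} "Link: ${iface} is ${link}" {\n` +
        `   exec command="tmsh generate sys icall event ${eventName} context ${context}"\n` +
        `}`
      );
    }
  }
  return stanzas.join('\n\n');
}

// Print the four stanzas used in the two-interface example.
console.log(buildInterfaceAlerts(['1.1', '1.3']));
```

Adding a third interface is then a one-item change to the input list, although the switch statement in the iCall script still has to be extended to say which interfaces each failure should affect.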
Lightboard Lessons: iCall
In this episode of Lightboard Lessons, I give an introduction to iCall, the built-in event-based BIG-IP control-plane scripting engine.

Resources

- iCall release article
- iCall Codeshare
- iCall Triggers Example with iStats to Invalidate Cache
- iCall Periodic Example Pool Check to Disable Interface
- Passing Arguments to iCall Scripts