VMware VKS integration with F5 BIG-IP and CIS
Introduction

vSphere Kubernetes Service (VKS) is the Kubernetes runtime built directly into VMware Cloud Foundation (VCF). With CNCF-certified Kubernetes, VKS enables platform engineers to deploy and manage Kubernetes clusters while leveraging a comprehensive set of cloud services in VCF. Cloud admins benefit from support for N-2 Kubernetes versions, enterprise-grade security, and simplified lifecycle management for modern application adoption.

As with other Kubernetes platforms, the integration with BIG-IP is done through the Container Ingress Services (CIS) component, which is hosted in the Kubernetes platform and allows the BIG-IP to be configured using the Kubernetes API. Under the hood, it uses the F5 AS3 declarative API. Note from the picture that BIG-IP integration with VKS is not limited to BIG-IP's load balancing capabilities; most BIG-IP features can be configured through this integration. These features include:

- Advanced TLS encryption, including safe key storage with Hardware Security Module (HSM) or Network & Cloud HSM support.
- Advanced WAF, L7 bot, and API protection.
- L3-L4 high-performance firewall with IPS for protocol conformance.
- Behavioral DDoS protection with cloud scrubbing support.
- Visibility into TLS traffic for inspection with third-party solutions.
- Identity-aware ingress with federated SSO and integration with leading MFAs.
- AI inference and agentic support thanks to JSON and MCP protocol support.

Planning the deployment of CIS for VMware VKS

The installation of CIS in VMware VKS is performed through the standard Helm charts facility. The platform owner needs to determine beforehand:

- Whether the deployment is hosted on a vSphere (VDS) network or an NSX network. Note that on an NSX network, VKS doesn't currently allow placing the load balancers in the same segment as the VKS cluster. No special considerations apply when hosting BIG-IP in a vSphere (VDS) network.
- Whether this is a single-cluster or a multi-cluster deployment. When using the multi-cluster option with clusterIP mode (only possible with Calico in VKS), note that the POD networks of the clusters cannot have overlapping prefixes.
- Which Kubernetes networking (CNI) is to be used. CIS supports both VKS-supported CNIs: Antrea (the default) and Calico. From the CIS point of view, the CNI is only relevant when sending traffic directly to the PODs; see the next point.
- What integration with the CNI is desired between the BIG-IP and VKS:
  - NodePort mode. Applications are made discoverable using Services of type NodePort. From the BIG-IP, traffic is sent to the Nodes' IPs, where it is redistributed to the PODs depending on the traffic policies of the Service. This is CNI agnostic; any CNI can be used.
  - Direct-to-POD mode. Applications are made discoverable using Services of type ClusterIP. Note that the CIS integration with Antrea uses Antrea's nodePortLocal mechanism, which requires an additional annotation in the Service declaration; see the CIS VKS page in F5 CloudDocs for details. The nodePortLocal mechanism sends traffic directly to the POD without actually using the POD IP address. This is especially relevant for NSX because it allows the PODs to be reached without redistributing the POD IPs across the NSX network, which is not allowed. When using vSphere (VDS) networking, either Antrea's nodePortLocal or clusterIP with Calico can be used.
  - Another, infrequently used, option is hostNetwork POD networking; it is rarely chosen because it requires privileges for the application PODs or ingress controllers. Network-wise, this behaves similarly to nodePortLocal, but without the automatic allocation of ports.
- Whether the deployment is a single-tier or a two-tier deployment. In a single-tier deployment, the BIG-IP sends the traffic directly to the application PODs.
This has a simpler traffic flow and easier persistence and end-to-end monitoring. A two-tier deployment sends the traffic to an ingress controller POD instead of the application PODs. This ingress controller could be Contour, NGINX Gateway Fabric, Istio, or an API gateway. This type of deployment offers the ultimate scalability and provides additional segregation between the BIG-IPs (typically owned by NetOps) and the Kubernetes cluster (typically owned by DevOps).

Once CIS is deployed, applications can be published using either the standard Kubernetes Ingress resource or F5's Custom Resources. The latter is the recommended way because it exposes most of the BIG-IP's capabilities. Details on the Ingress resource and F5 custom annotations can be found here. Details on the F5 CRDs can be found here. Please note that at the time of this writing, Antrea nodePortLocal doesn't support the TransportServer CRD; please consult your F5 representative for its availability. Detailed instructions on how to deploy CIS for VKS can be found on the CIS VKS page in F5 CloudDocs.

Application-aware MultiCluster support

MultiCluster support exposes applications that are hosted in multiple VKS clusters and publishes them behind a single VIP. BIG-IP and CIS are in charge of:

- Discovering where the PODs of the applications are hosted. Note that a given application doesn't need to be available in all clusters.
- Upon receiving a request for a given application, deciding to which cluster and Node/POD the request should be sent. This decision is based on the weight of each cluster, the application availability, and the load balancing algorithm being applied.

Single-tier and two-tier architectures are possible, as are NodePort and clusterIP modes. Note that at the time of this writing, Antrea in clusterIP mode (nodePortLocal) is not supported; please consult your F5 representative for the availability of this feature.
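To make the Custom Resource route concrete, the sketch below writes a minimal CIS VirtualServer manifest. This is only an illustration: the host, VIP address, namespace, and backing Service name ("stock-app") are hypothetical placeholders, and the exact fields supported should be checked against the CIS CRD documentation for your release.

```shell
# Minimal, hypothetical VirtualServer custom resource for CIS. The host,
# VIP address, namespace, and Service name below are placeholders only.
cat > vs-stock-app.yaml <<'EOF'
apiVersion: cis.f5.com/v1
kind: VirtualServer
metadata:
  name: stock-app-vs
  namespace: default
spec:
  host: stock-app.example.com
  virtualServerAddress: 192.0.2.10
  pools:
    - service: stock-app
      servicePort: 8080
      monitor:
        type: http
        interval: 5
        timeout: 16
EOF

# Apply once CIS is deployed and watching this namespace:
# kubectl apply -f vs-stock-app.yaml
```

Once applied, CIS translates the resource into an AS3 declaration and pushes the resulting virtual server and pool configuration to the BIG-IP.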
Considerations for NSX

Load balancers cannot be placed in the same VPC segment as the VMware VKS cluster. They can instead be placed in a separate VPC segment on the same VPC gateway, as shown in the next diagram. In this arrangement the BIG-IP can be configured either in 1NIC mode or as a regular deployment, in which case the management interface is typically configured on an infrastructure VLAN instead of an NSX segment.

The data segment only needs enough prefixes to host the self IPs of the BIG-IP units. The prefixes of the VIPs need not belong to the data segment's subnet; these additional prefixes have to be configured as static routes in the VPC gateway, and route redistribution for them must be enabled. Given that the load balancers are not in line with the traffic flow towards the VKS cluster, SNAT is required. When using SNAT pools, their prefixes can optionally be configured as additional prefixes of the data segment, like the VIPs.

Specifically for Calico, clusterIP mode cannot be used in NSX because it would require the BIG-IP to be in the same VPC segment as VMware VKS. BGP multi-hop is not feasible either, because it would require the POD cluster network prefixes to be redistributed through NSX, which is not possible.

Conclusion and final remarks

F5 BIG-IP provides unmatched deployment options and features for VMware VKS; these include:

- Support for all VKS CNIs, which allows sending the traffic directly to the PODs instead of using hostNetwork (which implies a security risk) or the common NodePort mode, which can incur an additional kube-proxy indirection.
- Both single-tier and two-tier arrangements (or both types simultaneously).
- The ability, through F5's Container Ingress Services, to handle multiple VMware VKS clusters with application-aware VIPs. This is a unique feature in the industry.
- Securing applications with the wide range of L3 to L7 security features provided by BIG-IP, including Advanced WAF and Application Access.

To complete the circle, this integration also provides IP address management (IPAM), which gives DevOps teams great flexibility. All of this is available regardless of the form factor of the BIG-IP: Virtual Edition, appliance, or chassis, allowing great scalability and multi-tenancy options. In NSX deployments, the recommended form factor is Virtual Edition, in order to connect to the NSX segments. We look forward to hearing your experience and feedback on this article.

[ASM]: "Request length exceeds defined buffer size" - How to increase the limit?
Hi Experts,

WAF is rejecting a request because it exceeds the maximum allowed request size (10 MB):

- Requested URL: [HTTPS] /stock.option
- Host: trade-it.ifund.com
- Detected Request Length: 12005346 bytes (12 MB)
- Expected Request Length: 10000000 bytes (10 MB)

How can I increase the limit for this URL/URI only?

Need step-by-step guidance for migrating BIG-IP i2800 WAF to rSeries (UCS restore vs clean build)
Hello DevCentral Community,

We are planning a hardware refresh migration from a legacy BIG-IP i2800 running WAF/ASM to a new rSeries platform and would like to follow F5 recommended best practices. Could you please advise on the step-by-step process for this migration, specifically around:

- Whether UCS restore is recommended versus building the config fresh
- BIG-IP version compatibility considerations during the migration
- Interface/VLAN mapping differences between iSeries and rSeries hardware
- The best approach to migrate WAF/ASM policies and tuning after migration
- Common issues or lessons learned during real-world cutovers

Current environment:

- BIG-IP model: i2800
- BIG-IP version: 17.1.3
- WAF module: ASM / Advanced WAF
- Deployment: Active/Active

Thank you.

High Availability for F5 NGINX Instance Manager in AWS
Introduction

F5 NGINX Instance Manager gives you a centralized way to manage NGINX Open Source and NGINX Plus instances across your environment. It's ideal for disconnected or air-gapped deployments, with no need for internet access or external cloud services. NGINX Instance Manager continues to gain capabilities for managing configurations, such as NGINX config versioning and templating, F5 WAF for NGINX policy and signature management, monitoring of NGINX metrics and security events, and a rich API to support external automation. As the role of NGINX Instance Manager becomes increasingly important in the management of disconnected NGINX fleets, the need for high availability increases. This article explores how we can use Linux clustering to provide high availability for NGINX Instance Manager across two Availability Zones in AWS.

Core Technologies

Core technologies used in this HA architecture design include:

- Amazon Elastic Compute Cloud (EC2) instances - virtual machines rented inside AWS that can be used to host applications, like NGINX Instance Manager.
- Pacemaker - an open-source high-availability resource manager used in Linux clusters since 2004. Pacemaker is generally deployed with the Corosync Cluster Engine, which provides cluster node communication, membership tracking, and cluster quorum.
- Amazon Elastic File System (EFS) - a serverless, fully managed, elastic Network File System (NFS) that allows servers to share file data simultaneously between systems.
- Amazon Network Load Balancer (NLB) - a layer 4 TCP/UDP load balancer that forwards traffic to targets like EC2 instances, containers, or IP addresses. NLB can send periodic health checks to registered targets to ensure that traffic is only forwarded to healthy targets.

Architecture Overview

In this highly available architecture, we will install NGINX Instance Manager (NIM) on two EC2 instances in different AWS Availability Zones (AZs).
Four EFS file systems will be created to share key stateful information between the two NIM instances, and Pacemaker/Corosync will be used to orchestrate the cluster: only one NIM instance is active at any time, and Pacemaker facilitates this by starting and stopping the NIM systemd services. Finally, an Amazon NLB will be used to provide network failover between the two NIM instances, using an HTTP health check to determine the active cluster node.

Deployment Steps

1. Create AWS EFS file systems

First, we are going to create four EFS volumes to hold important NIM configuration and state information that will be shared between nodes. These file systems will be mounted at /etc/nms, /var/lib/clickhouse, /var/lib/nms, and /usr/share/nms inside each NIM node. Take note of the File System IDs of the newly created file systems. Edit the properties of each EFS file system and create a mount target in each AZ you intend to deploy a NIM node in, then restrict network access to only the NIM nodes by setting up an AWS Security Group. You may also consider more advanced authentication methods, but these aren't covered in this article.

2. Deploy two EC2 instances for NGINX Instance Manager

Deploy two EC2 instances with suitable specifications to support the number of data plane instances that you plan to manage (you can find the sizing specifications here) and connect one to each of the AZs/subnets that you configured EFS mount targets in above. In this example, I will deploy two t2.medium instances running Ubuntu 24.04, connect one to us-east-1a and the other to us-east-1c, and create a security group allowing only traffic from its local assigned subnet.

3. Mount the EFS file systems on NGINX Instance Manager Node 1

Now that we have the EC2 instances deployed, we can log on to Node 1 and mount the EFS volumes by executing the following steps:

1. SSH onto Node 1.
2. Install the efs-utils package if it is not installed already.
3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory.
4. Execute mount -a to mount the file systems.
5. Execute df to ensure that the paths are mounted correctly.

4. Install NGINX Instance Manager on Node 1

With the EFS file systems now mounted, it's time to run through the NGINX Instance Manager installation on Node 1:

1. Navigate to the "Install the latest NGINX Instance Manager with a script" page in the NGINX documentation and download install-nim-bundle.sh.
2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/.
3. Run bash install-nim-bundle.sh -d ubuntu22.04.
4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of the NIM services on this node:

systemctl stop nms; systemctl disable nms
systemctl stop nginx; systemctl disable nginx
systemctl stop clickhouse-server; systemctl disable clickhouse-server

5. Install NGINX Instance Manager on Node 2

This time we are going to install NGINX Instance Manager on Node 2, but without attaching the EFS file systems. On Node 2:

1. Navigate to the "Install the latest NGINX Instance Manager with a script" page in the NGINX documentation and download install-nim-bundle.sh.
2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/.
3. Run bash install-nim-bundle.sh -d ubuntu22.04.
4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of the NIM services on this node:

systemctl stop nms; systemctl disable nms
systemctl stop nginx; systemctl disable nginx
systemctl stop clickhouse-server; systemctl disable clickhouse-server

6. Mount the EFS file systems on NGINX Instance Manager Node 2

Now that we have the NGINX Instance Manager binaries installed on each node, let's mount the EFS file systems on Node 2:

1. SSH onto Node 2.
2. Install the efs-utils package if it is not installed already.
3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory.
4. Execute mount -a to mount the file systems.
5. Execute df to ensure that the paths are mounted correctly.

7. Install and configure Pacemaker/Corosync

With NGINX Instance Manager now installed on both nodes, it's time to get Pacemaker and Corosync installed:

1. Install Pacemaker, Corosync, and other important agents:

sudo apt update
sudo apt install pacemaker pcs corosync fence-agents-aws resource-agents-base

2. To allow Pacemaker to communicate between nodes, add TCP communication between the nodes to the Security Group for the NIM nodes.
3. Once the connectivity is in place, set a common password for the hacluster user by running the following command on both nodes (the example password used here is IloveF5 - don't use this!):

sudo passwd hacluster

4. Now start the Pacemaker services by running the following commands on both nodes:

systemctl start pcsd.service
systemctl enable pcsd.service
systemctl status pcsd.service
systemctl start pacemaker
systemctl enable pacemaker

5. Finally, authenticate the nodes with each other (using the hacluster username, password, and node hostnames), set up the cluster, and check the cluster status:

pcs host auth ip-172-17-1-89 ip-172-17-2-160
pcs cluster setup nimcluster --force ip-172-17-1-89 ip-172-17-2-160
pcs status

8. Configure Cluster Fencing

Fencing is the ability to make a node unable to run resources, even when that node is unresponsive to cluster commands; you can think of fencing as cutting the power to the node. Fencing protects against corruption of data due to concurrent access to shared resources, commonly known as a "split brain" scenario. In this architecture, we use the fence_aws agent, which uses the boto3 library to connect to AWS and stop the EC2 instances of failing nodes. Let's install and configure the fence_aws agent:

1. Create an AWS Access Key and Secret Access Key for fence_aws to use.
2. Install the AWS CLI on both NIM nodes.
3. Take note of the Instance IDs of the NIM instances.
4. Configure the fence_aws agent as a Pacemaker STONITH device. Run the pcs stonith command, inserting your access key, secret key, region, and mappings of Instance ID to Linux hostname:

pcs stonith create hacluster-stonith fence_aws access_key=(your access key) secret_key=(your secret key) region=us-east-1 pcmk_host_map="ip-172-31-34-95:i-0a46181368524dab6;ip-172-31-27-134:i-032d0b400b5689f68" power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4

5. Run pcs status and make sure that the STONITH device is started.

9. Configure Pacemaker resources, colocations and constraints

OK, we are almost there! It's time to configure the Pacemaker resources, colocations, and constraints. We want to make sure that the clickhouse-server, nms, and nginx systemd services all come up together on the same node, and in that order. We can do that using Pacemaker colocations and constraints.

1. Configure a Pacemaker resource for each systemd service:

pcs resource create clickhouse systemd:clickhouse-server
pcs resource create nms systemd:nms.service
pcs resource create nginx systemd:nginx.service

🔥HOT TIP🔥 Check out the pcs resource command options (op monitor interval, etc.) to optimize failover time.

2. Create two colocations to make sure they all start on the same node:

pcs constraint colocation add clickhouse with nms
pcs constraint colocation add nms with nginx

3. Create two ordering constraints to define the startup order (ClickHouse -> NMS -> NGINX):

pcs constraint order start clickhouse then nms
pcs constraint order start nms then nginx

4. Enable and start the pcs cluster:

pcs cluster enable --all
pcs cluster start --all

10. Provision the AWS NLB Load Balancer

Finally, we are going to set up the AWS Network Load Balancer (NLB) to facilitate the failover.
1. Create a Security Group entry to allow HTTPS traffic to reach the EC2 instances from the local subnet.
2. Create a Load Balancer target group, targeting instances, with protocol TCP on port 443.

⚠️NOTE⚠️ If you are load balancing with the TCP protocol and terminating the TLS connection on the NIM node (EC2 instance), you must create a security group entry to allow TCP 443 from the connecting clients directly to the EC2 instance IP address. If you have trusted SSL/TLS server certificates, you may want to investigate a load balancer listener for the TLS protocol.

3. Ensure that an HTTPS health check is in place to facilitate the failover.

🔥HOT TIP🔥 You can speed up failure detection and failover using the advanced health check settings.

4. Include our two NIM instances as pending and save the target group.
5. Now let's create the Network Load Balancer (NLB), listening on TCP port 443 and forwarding to the target group created above.
6. Once the load balancer is created, check the target group and you will find that one of the targets is healthy - that's the active node in the Pacemaker cluster!
7. With the load balancing now in place, you can access the NIM console using the FQDN of your load balancer and log in with the password set during the install on Node 1.
8. Once you have logged in, we need to install a license before we proceed any further: click on Settings, click on Licenses, click Get Started, click Browse, upload your license, and click Add.
9. With the license now installed, we have access to the full console.

11. Test failover

The easiest way to test failover is to just shut down the active node in the cluster. Pacemaker will detect that the node is no longer available and start the services on the remaining node.

1. Stop the active node/instance of the NIM cluster.
2. Monitor the target group and watch it fail over - depending on the settings you have configured, this may take a few minutes.

12. How to upgrade NGINX Instance Manager on the cluster

To upgrade NGINX Instance Manager in a Pacemaker cluster, perform the following tasks:

1. Stop the Pacemaker cluster services on Node 2, forcing Node 1 to take over:

pcs cluster stop ip-172-17-2-160

2. Disconnect the NFS mounts on Node 2:

umount /usr/share/nms
umount /etc/nms
umount /var/lib/nms
umount /var/lib/clickhouse

3. Upgrade NGINX Instance Manager on Node 1. Download the update from the MyF5 Customer Portal, then:

sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx

4. Upgrade NGINX Instance Manager on Node 2 (with the NFS mounts disconnected). Download the update from the MyF5 Customer Portal, then:

sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx

5. Re-mount all the NFS mounts on Node 2:

mount -a

6. Start the Pacemaker cluster services on Node 2, adding it back into the cluster:

pcs cluster start ip-172-17-2-160

13. Reference Documents

Some good references on Pacemaker/Corosync clustering can be found here:

- Configuring a Red Hat High Availability cluster on AWS
- Implement a High-Availability Cluster with Pacemaker and Corosync
- ClusterLabs Pacemaker website
- Corosync Cluster Engine website

Help with SSH Virtual Server
Hello, we have two virtual servers for SSH (Delinea Secret Server), type Performance (Layer 4), SNAT: Automap, an appropriate L4 TCP profile, and so on. If I try the connection with ssh -vvv admin@service.com, the connection gets established, but I don't get the host key fingerprint challenge and no password prompt. A tcpdump looks fine: no resets or anything else. I can SSH to the pool members from a Linux client and from the F5 CLI without problems, so I think the F5 is dropping the key exchange/fingerprint somewhere. Any idea? Thank you, Karl

BIG IP LTM BEST PRACTICES
I want to do an F5 deployment to balance traffic to multiple web servers for an application that will be accessed by 500k users, and I have several questions.

As the architecture, I have a single-site VXLAN fabric where the F5 (HA active-passive) and the firewall (HA active-passive) are attached to the border/service leafs (eBGP peering between firewall and border leaf, static routing between F5 and border leaf). The interface to the ISP is connected to the firewall (I think it would have been recommended to attach it to the border leafs), where the first VIP is configured, translating the public IP to an IP in the first ARM VLAN (client-side transit to border), specifically where I created the VIP on the F5.

1) I want to know if the design up to this point is correct. I would also like to know whether the subnet where the VIPs reside on the F5 can be different from the subnet used for the client-side transit, and whether it is recommended for it to be different.

2) I also want to know if it is recommended for the second ARM VLAN (server side) to be the same as the web server VLAN, or if it is better for the web servers to be in a different subnet (another VLAN), with routing between the two networks.

3) I would also like to know whether it is recommended for the source NAT pool to be in the same subnet as the second ARM VLAN (server side) or if it should be different.

In any of the approaches, I would still need to perform source NAT, and I also need to implement SSL offloading and WAF (Web Application Firewall). I am very familiar with the routing aspects of any deployment model. What I would like to know is what the best architectural approach would be, or how you would design such a deployment. Thank you very much - any advice would be greatly appreciated.

Explore TCP and TLS Profiles for Optimal S3 with MinIO Clusters
A lab-based investigation was conducted to observe measurable performance differences when using different profiles, including TCP and TLS settings, in both local and simulated wide-area implementations. Testing at a high-traffic scale was not carried out; that may be something to look at in the future. Rather, simply observing the nuances of TCP and TLS in support of modern S3 flows, think AI data delivery for model training exercises, led to interesting strategic findings. Top of mind throughout the exercise were the available configurations of both TCP and TLS BIG-IP profiles, with the express interest of improving the speed and resilience of S3 data flows. Suggested tweaks to the rich set of TCP and TLS parameters will be touched upon.

Lab Setup

A fully software-based configuration was created using virtualization and a modern hypervisor, where S3 clients were routed to tunable BIG-IP virtual servers, which proxied S3 traffic to a virtualized AIStor single-node, single-drive object solution. The routing of client traffic was chosen so as to simulate, first, a high-speed local area network (LAN) experience, followed, second, by a wide area network (WAN) examination. The router offered various network impairments: simulating cross-North American latency, shaping traffic to a reasonable but constrained maximum bandwidth, and introducing packet loss throughout the S3 activities. The investigation was primarily about changes in performance for a representative S3 transaction, such as a 10-megabyte object retrieval by a client, as different profiles in BIG-IP were exercised.

LAN Baseline for S3 Traffic and BIG-IP

To establish what our lab setup is capable of delivering, some standard S3 transactions were carried out. This started without a simulated WAN. S3 was put through its paces by both downloading and uploading objects using both Microsoft Edge and the S3Browser clients.
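For repeatable baselines, the browser-based tests can be complemented with the MinIO client (mc). The sketch below generates a 10 MiB test object for the timing runs; the alias name, endpoint, credentials, and bucket are placeholders, and the mc steps are left commented since they require a reachable MinIO endpoint.

```shell
# Create a 10 MiB test object (close to the article's 10-megabyte sample)
# for upload/download timing runs.
dd if=/dev/urandom of=sample-10mb.bin bs=1M count=10 2>/dev/null

# Hypothetical mc workflow against a BIG-IP VIP fronting MinIO on :9000
# (alias, credentials, and bucket names are placeholders):
# mc alias set lab https://10.150.92.202:9000 ACCESS_KEY SECRET_KEY
# mc cp sample-10mb.bin lab/test-bucket/
# time mc cp lab/test-bucket/sample-10mb.bin ./sample-10mb.down
```

Timing the same object repeatedly this way removes browser caching and rendering effects from the measurement.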
The LAN setup, instantiated on an ESXi host, is designed to be highly responsive. As seen in quick measurements of ICMP latency between a client and a BIG-IP VE virtual server, the round-trip time is sub-1 millisecond. The first ping slightly exceeds 1 millisecond due to a client-side ARP request/response between client and router; subsequent responses settle to approximately 400 microseconds. The measurements were taken with Wireshark on the S3 client running on Windows. The command prompt results do not have the fidelity of Wireshark and simply report response times of sub-1 millisecond for the bulk of the samples. With performance protected by limiting the Wireshark capture to the first 128 bytes of each packet, we use the highlighted Wireshark delta times to confirm the low round-trip time (RTT) more accurately.

To support local on-premises S3 clients, the virtual server at 10.150.92.202 was configured to use one of BIG-IP's available LAN-oriented TCP profiles, "tcp-lan-optimized", available with all fresh BIG-IP installs. As pointed out in the diagram, the S3 traffic will also benefit from SSL/TLS security between the client and the BIG-IP. The "tcp-lan-optimized" profile has settings that are beneficial for low-latency TCP sessions traversing only LANs, where intuitively low packet loss and high throughput are both very likely. Some of its characteristics include aggressive ("high speed") congestion control, smaller initial congestion windows (suited for lower RTT), and disabling of Nagle's algorithm so that data is sent immediately, even if it does not fill out a TCP maximum segment size (MSS). There is a breadth of LAN profiles available, each with its own tweaks, such as "f5-tcp-lan"; the selected profile "tcp-lan-optimized" was simply chosen as a starting point.
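In tmsh terms, inspecting and attaching such a profile looks roughly like the fragment below. This is a configuration sketch only: the virtual server name (vs_s3_minio) is a placeholder from this lab, not a default, and object names should be adapted to your own configuration.

```shell
# Inspect the built-in LAN profile settings (congestion control,
# initial congestion window, Nagle, etc.)
tmsh list ltm profile tcp tcp-lan-optimized

# Attach the profile to a (hypothetical) virtual server on the client side
tmsh modify ltm virtual vs_s3_minio profiles add { tcp-lan-optimized { context clientside } }

# Optionally derive a custom profile for tuning, rather than editing the built-in one
tmsh create ltm profile tcp tcp-lan-custom defaults-from tcp-lan-optimized
```

Deriving a child profile with defaults-from keeps the built-in profile pristine while letting individual parameters be overridden per application.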
With MinIO AIStor configured with buckets allowing public access, meaning no S3 access key is necessary, simple browsing of buckets with something like Microsoft Edge and its Developer Tools can give an estimate of S3 retrieval times for a sample 10-megabyte object.

Determining Throughput for LAN-Delivered S3 Objects

We noted that Edge Developer Tools, with caching disabled, suggested our 10-megabyte object required between 155 and 259 milliseconds to download. Some variance in time can be expected, as an early test might require a full TCP three-way handshake on port 9000 (the MinIO S3 port used) and a full TLS negotiation, while later downloads can benefit from features like TLS resumption. To drill deeper, and to estimate the actual traffic rate provided in the LAN-only setup in terms of bits per second, one can again turn to Wireshark. It is important to restrict the capture to only the bytes necessary to decode TCP, as it is the study of progressive TCP sequence numbers, and issues like TCP retransmits, that leads to an estimate of the S3 transfer rate; trying to capture all the payload on a consumer Windows virtual machine will not see all the packets stored. It is also good practice to filter out the capture of extraneous traffic, such as Remote Desktop Protocol (RDP), which often operates on TCP and UDP port 3389.

Three successive downloads from the BIG-IP, serving the MinIO AIStor solution in the backend, appear as follows when using the TCP Stream Graphs, specifically the Sequence Numbers (Stevens) plot, where one clearly sees the rapid rise in sequence numbers as the three downloads complete at high speed. Interestingly, zooming in on any one of the downloads, one notes there is room for improvement, which is to say we could have driven S3 faster with client adjustments: the Windows client periodically sends TCP "zero window" advertisements to the BIG-IP, essentially halting S3 delivery for some number of milliseconds while buffers on the client are serviced.
A quick filter on the client's address and zero-window events can show the activity "on the wire" during these moments (double-click to enlarge). We see that the Windows S3 client periodically shuts down the delivery of TCP segments by reducing its TCP receive (Rx) window size to zero, after which it can be seen to take 15.5 milliseconds to re-open the window. This is not critical, but it means our measured bit rate for a simple set of basic transactions, even in a virtualized environment, could easily be increased with more performant clients in use. The objective is not to do benchmarking but rather a comparison of TCP profiles (double-click for high resolution). As seen above, the typical S3 10-megabyte object, even with a client throttling delivery periodically, was still delivered in the 85 Mbps range, with the tcp-lan-optimized profile in use.

The last baseline measurement undertaken was to push S3 data (technically an HTTP PUT command) from the client to the MinIO object storage, via the BIG-IP virtual server. The value here was to measure the TCP response times observed by the client as it moved TCP segments to the BIG-IP. To this end, S3Browser, available here, was used, as it fully supports S3 user access keys and their corresponding secrets and gives the user a graphical "File Explorer" experience for manipulating S3 objects. Uploading a 20-kilobyte object from our local client, we see the following response times in Wireshark (double-click for detailed image). Note that response times with something like ICMP Echo Request/Reply are simple: each transaction provides an estimate of round-trip time. With TCP and data in flight, most TCP stacks use a delayed ACK approach, waiting for two times the MSS (often meaning two times 1,460 bytes), or for a short timeout in case no additional data arrives, before acknowledging the data.
With large bursts of traffic, such as the upload of a sizable object with S3, the response time is not likely to deviate much, as delayed ACKs add negligible time. We see in the chart that the data moved from client to BIG-IP, over the LAN, with typical TCP acknowledgment times in the 400 to 800 microsecond range.

Baseline S3 Performance Across WAN Using Different BIG-IP TCP Profiles

The router used in the lab, OPNsense, has the ability to emulate network impairments, including those consistent with the realities of traffic traversing long distances. Objectives for the testing include:

- Base round-trip delays of, in our case, 70 milliseconds, in line with best-case optical delays for round trips between New York City and Los Angeles.
- Shaping of traffic to emulate an achievable bandwidth of 10 Mbps, simulating normal limiting factors such as IP hops, queuing delays, and head-of-line blocking on oversubscribed intermediate network gear. The expectation is that such a bandwidth cap will add variance to the overall round-trip delays of packets, along with occasional drops, forcing TCP into phases such as slow start and congestion avoidance.
- Experimenting with packet loss, as anything below 1 percent packet loss is generally considered not altogether unexpected. A value of 0.5 percent packet loss will be set for both directions.

As with LAN TCP profiles, BIG-IP has a number of WAN options. The one selected to be contrasted with the existing LAN profile was tcp-wan-optimized. The objectives of this profile include maximizing throughput and recovering efficiently from packet loss over less performant WAN links, where the expectation is a higher-latency, lower-bandwidth end-to-end network experience. Note that on the inside of BIG-IP, where MinIO AIStor continues to be co-located at LAN speeds, the server side will continue to use the tcp-lan-optimized profile.
A rule of thumb is that packet loss in a WAN environment, even as little as 0.5 percent, will impede network quality of service more than latency will. As such, we started with no packet loss: simply 70 milliseconds of round-trip latency and policing of bandwidth to a 10 Mbps maximum. Continuing to use the tcp-lan-optimized profile on BIG-IP still produced decent results. As expected, the bandwidth now falls under 10 Mbps, with the measured value, over three downloads, appearing to hover just under 7 Mbps. Edge's developer tools show that, even with a profile optimized for LAN traffic, total download times are fairly consistent, averaging just under 13 seconds.

The BIG-IP virtual server was then updated to use the tcp-wan-optimized profile, just for traffic on the external, client side. The TCP profile for the internal side, facing the co-located MinIO server, was left with the LAN profile. Using S3 from the client to retrieve the very same 10-megabyte objects, the results were, to a degree, better. The object download times were of the same order of magnitude as with the LAN profile; however, drilling into the actual bit rates achieved, one can see a marginal increase in overall S3 performance.

The next step was to introduce another real-world component: packet loss, applied equally in both directions. Since both directions were subjected to loss, beyond the need to retransmit TCP segments in the direction of the client, TCP acknowledgments from the client to the BIG-IP can also be dropped. The resulting behavior exercises the TCP congestion control mechanisms. With even low packet loss in our simulated WAN environment, the outcome with the tcp-wan-optimized profile on BIG-IP was markedly better. As seen in the tables above, this is far from a scientific, in-depth analysis, as three 10-megabyte S3 retrievals is not a rigorous baseline.
However, simply using these numbers to guide us, we come to these findings:

- Average S3 download with WAN profile: 33.7 seconds
- Average S3 download with LAN profile: 40.2 seconds
- Percentage reduction in S3 transaction time using the WAN profile: 16 percent

Comparing one sample S3 transaction under each of the two profiles visually, we see modest differences that help to explain the increased quality of service of the WAN profile. For the purpose of a quick investigation, the WAN profile has been seen to offer benefits in a lossy, higher-latency environment such as the lab emulation. Two specific TCP features worth calling out, and worth enabling in such environments, are:

Selective ACKs (SACK option)

Without SACK, the normal way for a client to signal back to a sender that a TCP segment appears to have been lost in flight is to send simple duplicate ACKs. This means that for every newly received segment after the missing data, an ACK is sent acknowledging only the last contiguously received data; there is no positive indication of the subsequent data received. Simply receiving duplicate ACKs lets the sender infer there is a gap in the delivered data stream. With SACK there is no inferring: should both ends (e.g., the client and BIG-IP in our case) support this TCP option, agreed upon during the three-way TCP handshake, the edges of the discontinuity in the data are clearly reported by the client. Most TCP profiles will have SACK enabled, but it is worth confirming it has not been disabled and is active with your key S3 clients, as seen in the following screenshot.

TCP Receive Window Exponential Scaling

Original implementations of TCP only allowed for approximately 65,535 bytes to be in flight without the sender having received acknowledgments of reception. This number can be a bottleneck when highly performant devices exchange TCP over higher-latency WAN pipes, networks with a so-called high bandwidth-delay product (BDP).
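One way to see the bottleneck: a fixed receive window permits at most one window of data per round trip. A quick sketch (the helper name is ours):

```python
def window_limited_rate_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Maximum throughput a fixed receive window allows: one full window per RTT."""
    return window_bytes * 8 / rtt_seconds / 1_000_000

# A classic 65,535-byte window over the emulated 70 ms WAN path:
print(round(window_limited_rate_mbps(65_535, 0.070), 2))  # → 7.49
```

That ceiling is strikingly close to the just-under-7-Mbps downloads measured earlier, though the agreement is suggestive rather than conclusive.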
The workaround is for each end to advertise an exponent, for instance 2, in which case the peer device will understand that advertised receive windows are to be interpreted as four times (2^2) the encoded value. An exponent of 4 would indicate a sixteen-times multiplier (2^4), and so on. To enable this on a BIG-IP stack, per this KB article, we simply adjust the buffer size to the desired value in the profile setup. Without this adjustment, no window scaling will be used.

Introducing the New S3 TCP Profile for BIG-IP

The rise in S3 traffic has implications specific to networking and traffic in flight. Some of the relevant characteristics include:

- There can be vast differences in network load, transaction by transaction. Consider an IoT device generating a 40-kilobyte sensor reading, followed immediately by a 500-megabyte high-definition medical ultrasound. Both are valid, representative payloads in S3 delivery today.
- S3, being transported over an HTTPS conduit, employs extensive parallelism for many types of larger transactions, for instance multi-part uploads, or HTTP range requests for large object downloads. Essentially, a large transaction such as a 60-megabyte upload becomes, as an example, 12 smaller 5-megabyte writes. This parallelism is particularly advantageous with clustered HDD nodes; spinning media is still estimated to provide 60 percent of all storage, yet the input/output rate (IOPS) of HDD technology frequently peaks at only 100 to 150. As such, there is value in using many smaller, parallel transactions.
- Solutions supporting S3 storage are focused upon strong read-after-write consistency, meaning the correct version of an object must be served immediately after writing, frequently from any number of storage sites. As such, the immediate replication of S3 objects across asynchronously connected sites is something that must happen very quickly for an effective solution.
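The multi-part arithmetic above can be sketched as follows; the helper is illustrative only, not an actual S3 SDK call:

```python
def multipart_ranges(object_bytes: int, part_bytes: int):
    """Inclusive byte ranges for a multi-part style transfer of an object."""
    return [(start, min(start + part_bytes, object_bytes) - 1)
            for start in range(0, object_bytes, part_bytes)]

# A 60-megabyte upload split into 5-megabyte parts, as in the example above:
parts = multipart_ranges(60 * 1024 * 1024, 5 * 1024 * 1024)
print(len(parts))  # → 12
```

Each range can travel as its own HTTP request, which is where the parallelism against HDD-backed clusters comes from.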
With version 21 and later of TMOS, the BIG-IP LTM module provides a starting point for an S3 TCP profile, a profile with logical settings aligned with S3 network delivery. The s3-tcp profile is based upon the parent tcp profile of BIG-IP, with a number of tweaks now to be described.

With the s3-tcp profile, management of the receive window and send buffer is turned over to the system, which monitors network behavior to auto-adjust. One striking note is that the maximum values for items like the largest possible receive window to advertise are much bigger, moving from 65,535 bytes into the millions of bytes. Similarly, send buffers are much larger, although the client side will ultimately have control over throttling the amount of sent traffic "in flight". The other major difference is that two different congestion control algorithms are in use: CUBIC for the s3-tcp profile and High Speed for the standard tcp profile.

Congestion control is the art of operating a TCP connection with the highest possible congestion window (CWND) over time, while minimizing segment loss due to saturated networking or peer-perceived loss due to fluctuations in delivery latency. TCP is designed to back off how many segments may be in flight, meaning the CWND, when loss is detected. The two mechanisms that dictate the ensuing behavior are slow start and congestion avoidance.

TCP slow start aggressively opens the congestion window when starting a TCP connection, or mid-connection when recovering from a segment loss. Don't be fooled by the term slow; it's actually aggressive and fast off the mark! The "slow" is historical convention, so named because it is slow only in comparison to the original, 1980s TCP behavior of immediately transmitting a full window of data at wire speed.
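To make CUBIC's "cubic function" concrete, its growth curve per RFC 8312 can be sketched as below. The constants C = 0.4 and beta = 0.7 are the RFC defaults; this is a simplified model, not BIG-IP's implementation:

```python
def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """CUBIC congestion window (in MSS units) t seconds after a loss event:
    W(t) = C*(t - K)^3 + W_max, where K is the time needed to regrow to W_max."""
    k = (w_max * (1 - beta) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max

# After a loss at W_max = 100 MSS, the window restarts near beta * W_max,
# then regrows, flattening as it approaches W_max before probing beyond it.
print(round(cubic_window(0.0, 100)))  # → 70
```

The flat region near W_max is CUBIC's signature: cautious right where the last loss occurred, aggressive when far from it.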
Meanwhile, congestion avoidance will normally try to slowly, continually, and usually linearly further open the congestion window once arriving at a fraction of the last fleetingly reached CWND, perhaps half of the last CWND before loss was detected. From there, the congestion avoidance algorithm will inch upwards at a controlled rate, trying to achieve the optimal end-to-end bandwidth before segment loss sets in once again. Think of it as a constant fine-tuning exercise that tries to maximize throughput while being wary of the network's saturation point.

CUBIC stems from Linux implementations, where high bandwidth coupled with substantive network latency is prevalent. CUBIC, as the name suggests, uses a cubic function to calculate congestion window growth based on the time elapsed since the last packet loss.

A bonus of BIG-IP is that adjustments to TCP profiles are quite easy to achieve. Simply create a new profile, often based upon an existing "parent" profile, and make "before" and "after" observations. A good candidate for experimentation is the Westwood+ option. Westwood+ congestion control is a sender-side-only implementation that studies the return of TCP ACKs to infer optimal transmission rates. A more basic TCP approach is to halve the congestion window, a full 50 percent reduction of unacknowledged bytes in flight, when three duplicate ACKs are received, the presumption being that three duplicate ACKs are supporting evidence that a TCP segment has been lost over the network. Westwood+ takes a more advanced approach to studying the ACKs to arrive at a less coarse value for a new congestion window.

New S3 TLS Profile for BIG-IP

Similar to the new s3-tcp profile, there is now an S3 TLS profile. The normal practice is to make a copy of this profile, to allow the setting of specific certificates, keys, and certificate chains for the application in question. One of the most important aspects of the s3 profile is that it supports TLS 1.3 by default.
There are a number of aspects of TLS 1.3 that make it a good option. For one, it removes some of the antiquated ciphering and hashing algorithms of older TLS revisions. For another, TLS 1.3 is a mandatory baseline for supporting post-quantum cryptography (PQC) shared-key establishment between peers. BIG-IP, using TLS 1.3 as a building block, supports NIST FIPS 203 ML-KEM key encapsulation per the standard found here.

An immediate S3 win with TLS 1.3 is the ability for entirely new TLS sessions, not resumptions of previously negotiated sessions, to start delivering application data in one round-trip time (1-RTT). This is due to a simplification in the protocol exchange between peers in TLS 1.3. Among other things, in non-mutual TLS sessions only the server is required to provide a certificate, and that certificate is already encrypted when delivered. In other words, no later TLS 1.2-style validation exchange is required; all necessary tasks required of the server, including confirming that the client can successfully decipher the handshake, are achieved in one logical step.

Take the following example of an S3 object retrieval through an encrypted HTTP GET. As seen in the diagram, there are two round-trip times, and the latency, run across the wide-area network emulator used earlier, required 300 milliseconds from Client Hello to the first application data.

It was noted above that the time from the TLS Client Hello to the first transmission of encrypted data carrying an S3 transaction was 300 milliseconds. Contrast this with the same lab setup, now using a copy of the BIG-IP S3 TLS profile, which advertises TLS 1.3 out of the box, something other TLS profiles typically required a custom adjustment to achieve in prior BIG-IP releases. As with this overall exploration, the results are suggestive of a notable performance increase, but much higher-load testing should be considered for more decisive evidence.
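The round-trip arithmetic behind that difference can be modeled very roughly. This sketch ignores server processing time and the TCP handshake, so measured savings on the wire will differ from the model:

```python
def handshake_to_first_data_ms(rtt_ms: float, handshake_rtts: int) -> float:
    """Time from Client Hello to first application data, counting only full
    network round trips (server processing and TCP setup ignored)."""
    return rtt_ms * handshake_rtts

rtt = 70.0  # the emulated WAN round-trip time, in milliseconds
saving = handshake_to_first_data_ms(rtt, 2) - handshake_to_first_data_ms(rtt, 1)
print(saving)  # → 70.0 ms saved per brand-new session at this latency
```

The model's per-session saving scales directly with path latency, which is why 1-RTT handshakes matter most on exactly the kind of WAN path emulated here.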
What one can say, in this case of single S3 encrypted transactions, is that a 50-millisecond, or 17 percent, savings in latency was observed with respect to getting a transaction on the wire. Extrapolated over thousands of TLS sessions, the savings, let alone the security-posture improvement of TLS 1.3, make it appear a logical configuration choice.

Summary

The amount of attention now being paid to S3 data delivery, including delivery for AI initiatives like model training, is heightened across the industry. With network considerations top of mind, like the effect of both WAN latency and non-zero packet loss ratios, a simple lab-oriented exploration of S3 performance between clients and MinIO AIStor was carried out. New configurations of both TCP and TLS profiles, with the express interest of improving the speed and resilience of S3 data flows, were investigated.

The impact of using existing LAN or WAN TCP profiles on BIG-IP was measured with small-scale, sampled lab tests. The outcomes suggested that although a LAN profile performed well with significant latency applied, the WAN-oriented profile was demonstrably better in environments with just 0.5 percent packet loss. TLS 1.3 configurations for S3 traffic were also tested, with a noticeably quicker transition to the data plane when the 1-RTT handshake of TLS 1.3 was in use. Extrapolating to enterprise loads, the win should be significant, beyond just preparing the foundations for PQC-infused TLS in the future.

The recommendation from this exercise is to investigate, using tools with network visibility, which options are already actively in use by both TCP and TLS in support of your S3 applications today. Awareness of, and the benefits from, TCP window scaling, selective acknowledgments, and the use of TLS 1.3 wherever possible are all likely to add up to the most robust S3 performance possible.

Grpc Keepalive and F5 full proxy
Hi - my F5 is running v16.1 and is a full gRPC proxy. The problem I am having is that the client sends gRPC pings to the F5 to keep the session open, but the F5 cannot keep the session alive, as no traffic goes to the server - because the F5, being a proxy, responds to the ping itself. How can I keep the server-side connection open, other than increasing the timeout? Thanks