F5 High Availability - Public Cloud Guidance
This article covers BIG-IP and NGINX high availability (HA) topics to consider when leveraging the public cloud. There are differences between on-prem and public cloud environments, such as cloud provider L2 networking, and those differences create challenges in how you address HA, failover time, peer setup, scaling options, and application state.

Topics Covered:
- Discuss and define HA
- Importance of application behavior and traffic sizing
- HA capabilities of BIG-IP and NGINX
- Various HA deployment options (Active/Active, Active/Standby, auto scale)
- Example customer scenarios

What is High Availability?
High availability can mean many things to different people. Depending on the application and traffic requirements, HA requires dual data paths, redundant storage, redundant power, and redundant compute. It means the ability to survive a failure, maintenance windows should be seamless to the user, and the user experience should never suffer...ever!
Reference: https://en.wikipedia.org/wiki/High_availability

So what should HA provide?
- Synchronization of configuration data to peers (ex. config objects)
- Synchronization of application session state (ex. persistence records)
- The ability for traffic to fail over to a peer
- Locally, clusters of devices that act and appear as one unit
- Globally, traffic distributed via DNS and routing

Importance of Application Behavior and Traffic Sizing
Let's look at a common use case... "gaming app, lots of persistent connections, client needs to hit the same backend throughout the entire game session."

Session State
The requirement for session state is common across applications, using methods like HTTP cookies, F5 iRule persistence, JSESSIONID, IP affinity, or hash. The session type used by the application can help you decide which migration path is right for you. Is this an app more fitting for a lift-n-shift approach...Rehost? Can the app be redesigned to take advantage of all native IaaS and PaaS technologies...Refactor?
Reference: 6 R's of a Cloud Migration
- Application session state allows users to have a consistent and reliable experience
- Auto scaling L7 proxies (BIG-IP or NGINX) keep track of session state
- BIG-IP can only mirror session state to the next device in the cluster
- NGINX can mirror state to all devices in the cluster (via zone sync)

Traffic Sizing
The cloud provider does a great job with things like scaling, but there are still cloud provider limits that affect sizing and machine instance types to keep in mind. BIG-IP and NGINX are considered network virtual appliances (NVAs), and they carry quota limits like other cloud objects:
- Google GCP VPC Resource Limits
- Azure VM Flow Limits
- AWS Instance Types
Unfortunately, not all limits are documented. Key metrics for L7 proxies are typically SSL stats, throughput, connection type, and connection count. Collecting these application and traffic metrics can help identify the correct instance type. We have a list of the F5 supported BIG-IP VE platforms on F5 CloudDocs.

F5 Products and HA Capabilities

BIG-IP HA Capabilities
BIG-IP supports the following HA cluster configurations:
- Active/Active - all devices processing traffic
- Active/Standby - one device processes traffic, others wait in standby
- Configuration sync to all devices in the cluster
- L3/L4 connection sharing to the next device in the cluster (ex. avoids re-login)
- L5-L7 state sharing to the next device in the cluster (ex. IP persistence, SSL persistence, iRule UIE persistence)
Reference: BIG-IP High Availability Docs

NGINX HA Capabilities
NGINX supports the following HA cluster configurations:
- Active/Active - all devices processing traffic
- Active/Standby - one device processes traffic, others wait in standby
- Configuration sync to all devices in the cluster
- Mirroring connections at L3/L4 is not available
- Mirroring session state to ALL devices in the cluster using the Zone Synchronization Module (NGINX Plus R15); see the sketch below
Reference: NGINX High Availability Docs
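To make the zone sync capability concrete, here is a minimal sketch of an NGINX Plus (R15+) zone synchronization setup; the hostnames, ports, cookie name, and zone sizes are illustrative, not prescriptive:

```
# stream context: each instance listens for and pushes zone updates to its peers
stream {
    server {
        listen 9000;
        zone_sync;
        # one zone_sync_server entry per cluster member (illustrative hostnames)
        zone_sync_server nginx-node1.example.internal:9000;
        zone_sync_server nginx-node2.example.internal:9000;
    }
}

# http context: the "sync" flag on sticky learn shares session state cluster-wide
http {
    upstream backend {
        zone backend 64k;
        server 10.0.0.11:80;
        server 10.0.0.12:80;
        sticky learn create=$upstream_cookie_session
                     lookup=$cookie_session
                     zone=client_sessions:1m sync;
    }
}
```

With the sync flag set, a session learned on one NGINX Plus instance is usable by every peer, which is exactly the "mirror to ALL devices" behavior noted above.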
HA Methods for BIG-IP
In the following sections, I will illustrate three common deployment configurations for BIG-IP in public cloud:
- HA for BIG-IP Design #1 - Active/Standby via API
- HA for BIG-IP Design #2 - A/A or A/S via LB
- HA for BIG-IP Design #3 - Regional Failover (multi region)

HA for BIG-IP Design #1 - Active/Standby via API (multi AZ)
This failover method uses API calls to communicate with the cloud provider and move objects (IP addresses, routes, etc.) during failover events. The F5 Cloud Failover Extension (CFE) for BIG-IP is used to declaratively configure the HA settings.
- Cloud provider load balancer is NOT required
- Failover time can be SLOW!
- Only one device actively used (the other device sits idle)
- Failover uses API calls to move cloud objects, and times vary (see CFE Performance and Sizing)
Key findings:
- Google API failover times depend on the number of forwarding rules
- Azure API is slow to disassociate/associate IPs to NICs (remapping)
- Azure API is fast when updating routes (UDR, user defined routes)
- AWS is reliable with API calls for IP moves and routes
Recommendations:
- This design with multi AZ is preferred over single AZ
- Recommend when a "traditional" HA cluster is required or for Lift-n-Shift...Rehost
- For Azure (based on my testing)...
  - Recommend using Azure UDR versus IP failover when possible
  - Look at the "Failover via LB" example instead for Azure
  - If the API method is required, look at DNS solutions to provide further redundancy

HA for BIG-IP Design #2 - A/A or A/S via LB (multi AZ)
- Cloud LB health checks the BIG-IP for up/down status
- Faster failover times (depends on cloud LB health timers)
- Cloud LB allows A/A or A/S
- Key difference: increased network/compute redundancy
- Cloud load balancer required
Recommendations:
- Use "failover via LB" if you require faster failover times
- For Google (based on my testing)...
  - Recommend against "via LB" for IPSEC traffic (not supported by Google LB)
  - If load balancing IPSEC, then use the "via API" or "via DNS" failover methods

HA for BIG-IP Design #3 - Regional Failover via DNS (multi AZ, multi region)
- BIG-IP VE active/active in multiple regions
- Traffic distributed to VEs by DNS/GSLB
- DNS/GSLB performs intelligent health checks against the VEs
- Key difference: cloud LB is not required
- DNS logic required by clients
- Orchestration required to manage configs across each BIG-IP
- BIG-IP standalone devices (no DSC cluster limitations)
Recommendations:
- Good for apps that handle DNS resolution well upon failover events
- Recommend when the cloud LB cannot handle a particular protocol
- Recommend when the customer is already using DNS to direct traffic
- Recommend for applications that have been refactored to handle session state outside of BIG-IP
- Recommend for customers with the in-house skillset to orchestrate (Ansible, Terraform, etc.)

HA Methods for NGINX
In the following sections, I will illustrate two common deployment configurations for NGINX in public cloud:
- HA for NGINX Design #1 - Active/Standby via API
- HA for NGINX Design #2 - Auto Scale Active/Active via LB

HA for NGINX Design #1 - Active/Standby via API (multi AZ)
- NGINX Plus required
- Cloud provider load balancer is NOT required
- Only one device actively used (the other device sits idle)
- Only available in AWS currently
Recommendations:
- Recommend when a "traditional" HA cluster is required or for Lift-n-Shift...Rehost
Reference: Active-Passive HA for NGINX Plus on AWS

HA for NGINX Design #2 - Auto Scale Active/Active via LB (multi AZ)
- NGINX Plus required
- Cloud LB health checks the NGINX instances
- Faster failover times
- Key difference: increased network/compute redundancy
- Cloud load balancer required
Recommendations:
- Recommended for apps fitting a migration type of Replatform or Refactor
Reference: Active-Active HA for NGINX Plus on AWS, Active-Active HA for NGINX Plus on Google

Pros & Cons: Public Cloud Scaling Options
Review this handy table to understand the high-level pros and cons of each deployment method.

Example Customer Scenario #1
To make this topic a little more real, here is a common customer scenario that shows the decisions that go into moving an application to the public cloud. Sometimes it's as easy as a lift-n-shift; other times you might need to do a little more work. In general, public cloud is not on-prem, and things might need some tweaking. Hopefully this example will give you some pointers and guidance for your next app migration to the cloud.
Current Setup:
- Gaming applications
- F5 hardware BIG-IP VIPRIONs on-prem
- Two data centers for HA redundancy
- iRule-heavy configuration (TLS encryption/decryption, payload inspections)
- Session persistence = iRule Universal Persistence (UIE), and other methods
- Biggest app: 15K SSL TPS, 15Gbps throughput, 2 million concurrent connections, 300K HTTP req/sec (L7 with TLS)
Requirements for a Successful Cloud Migration:
- Support current traffic numbers
- Support future target traffic growth
- Must run in multiple geographic regions
- Maintain session state
- Must retain all iRules in use
Recommended Design for Cloud Phase #1:
- Migration type: hybrid model, on-prem + cloud, and some Rehost
- Platform: BIG-IP (retaining iRules means BIG-IP is required)
- Licensing: High Performance BIG-IP (unlocks additional CPU cores past 8, up to 24, for extra traffic and SSL processing)
- Instance type: check the F5 supported BIG-IP VE platforms for accelerated networking (10Gb+)
- HA method: Active/Standby and multi-region with DNS
  - iRule Universal persistence only mirrors to the next device, so keep the cluster size to 2
  - Scale horizontally via additional HA clusters and DNS
  - Clients pinned to a region via DNS (on-prem or public cloud)
  - Inside a region, the local proxy cluster shares state
This example comes up in customer conversations often. Based on customer requirements, in-house skillset, current operational model, and time frames, there is usually one option that is better than the rest. A second design phase lends itself to more of a Replatform or Refactor migration type. In that case, more options can be leveraged to take advantage of cloud-native features. For example, changing the application persistence type from iRule UIE to cookie would allow BIG-IP to avoid keeping track of state. Why? With cookies, the client keeps track of that session state. The client receives a cookie, passes the cookie to the L7 proxy on successive requests, the proxy checks the cookie value, and the request is sent to the right backend pool member. The requirement for the L7 proxy to share session state is now removed (see the sketch below).
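Here is a minimal sketch of what that cookie-based persistence change could look like on a BIG-IP via TMSH; the profile, cookie, and virtual server names are my own illustrative values:

```
# Create a cookie-insert persistence profile: BIG-IP inserts the cookie,
# the client returns it, and no persistence state needs mirroring to peers
tmsh create ltm persistence cookie app_cookie_persist method insert cookie-name MyAppSession

# Apply it to an existing virtual server (name is illustrative)
tmsh modify ltm virtual vs_game_app persist replace-all-with { app_cookie_persist { default yes } }

tmsh save sys config
```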
Example Customer Scenario #2
Here is another customer scenario. This time the application is a full suite of multimedia content. In contrast to the first scenario, this one illustrates the benefits of rearchitecting various components, allowing greater flexibility when leveraging the cloud. You still must factor in the in-house skill set, project time frames, and other important business (and application) requirements when deciding on the best migration type.
Current Setup:
- Multimedia (gaming, movie, TV, music) platform
- BIG-IP VIPRIONs using vCMP on-prem
- Two data centers for HA redundancy
- iRule heavy (security, traffic manipulation, performance)
- Biggest app: OAuth + Cassandra for token storage (entitlements)
Requirements for a Successful Cloud Migration:
- Support current traffic numbers
- Elastic auto scale for seasonal growth (ex. holidays)
- VPC peering with partners (must also bypass the Web Application Firewall)
- Must support current or similar traffic manipulation in the data plane
- Compatibility with existing tooling used by the business
Recommended Design for Cloud Phase #1:
- Migration type: Repurchase, migrating BIG-IP to NGINX Plus
- Platform: NGINX (iRules converted to JavaScript or Lua)
- Licensing: NGINX Plus
- Modules: GeoIP, Lua, JavaScript
- HA method: N+1 auto scaling via native LB with active health checks
This is a great example of a Repurchase, in which application characteristics allow the various teams to explore alternative cloud migration approaches. This scenario describes a phase-one migration converting BIG-IP devices to NGINX Plus devices. The example assumes the BIG-IP configurations can be somewhat easily converted to NGINX Plus, and it also assumes there is available skillset and project time allocated to properly rearchitect the application where needed.

Summary
OK! Brains are expanding...hopefully? We learned about high availability and what that means for applications and user experience. We touched on the importance of application behavior and traffic sizing. Then we explored the various F5 products, how they handle HA, and HA designs. These recommendations are based on my own lab testing and interactions with customers. Every scenario carries its own requirements, and all options should be carefully considered when leveraging the public cloud. Finally, we looked at customer scenarios, discussed requirements, and reviewed design proposals. Fun!

Resources
Read the following articles for more guidance specific to the various cloud providers.
- Advanced Topologies and More on Highly Available Services
- Lightboard Lessons - BIG-IP Deployments in Azure
- Google and BIG-IP
- Failing Faster in the Cloud
- BIG-IP VE on Public Cloud
- High-Availability Load Balancing with NGINX Plus on Google Cloud Platform
- Using AWS Quick Starts to Deploy NGINX Plus
- NGINX on Azure
Using Cloud Templates to Change BIG-IP Versions - Google GCP

Introduction
This article makes use of F5 cloud templates on GitHub to modify the BIG-IP version for your public cloud deployments in Google. This is part of an article series, so please review the "Concepts" article as well as the other articles within the series.

Modifying BIG-IP Templates for Google Cloud
This section shows you how to modify the BIG-IP version in Google deployments. The template deployment service in Google is called Google Deployment Manager (GDM).
- Google GDM
- BIG-IP Cloud Templates for Google on GitHub
There are a few methods I tested, and I'll do a "How To" for each. Check the Appendix for additional examples.
1. Use Latest Template Release (no edits required)
2. Use Previous Template Release (no edits required)
3. Edit Latest Template to Change BIG-IP Versions
4. Edit Latest Template to Use Custom Uploaded Image
Note: At the time of this article, the "Latest" template release version for F5 cloud templates in Google is 3.5.0, found under Tag 3.5.0 on GitHub. See the Tag 3.5.0 Release Notes.

Option #1: Use Latest Template Release (no edits required)
This option lets you use templates without modifying code. Each release corresponds to a certain BIG-IP version (see the Google GDM Template Matrix), and the template is hard coded with one default BIG-IP version. The latest template deploys BIG-IP version 15.0.1.0 by default, but you can change the BIG-IP version by supplying a different Google image in the imageName parameter. Here is an example to deploy BIG-IP version 14.1.2.3.

Search for Google images via the gcloud CLI:
1. Open your favorite terminal
2. Enter a search filter. This filter queries the f5 vendor project: gcloud compute images list --project=f5-7626-networks-public | grep f5
3. Find your desired image (my example = BIG-IP 14.1.2.3 PAYG BEST)
   - You can filter further with basic CLI by adding "grep 14-1-2-3"
   - If you want BYOL, then "grep" for "byol"
4. Copy the image ID and save it for later (my example: f5-bigip-14-1-2-3-0-0-5-payg-best-1gbps-191218142340)

```
#Example image search and results
gcloud compute images list --project=f5-7626-networks-public | grep f5

#Output similar to this...
--snippet--
f5-bigip-13-1-3-2-0-0-4-payg-best-1gbps-20191105210022
f5-bigip-13-1-3-2-0-0-4-payg-best-200mbps-20191105210022
f5-bigip-13-1-3-2-0-0-4-byol-all-modules-2slot-20191105200157
...and some more
f5-bigip-14-1-2-3-0-0-5-byol-ltm-1boot-loc-191218142225
f5-bigip-14-1-2-3-0-0-5-payg-best-1gbps-191218142340
...and more...
```

Deploy BIG-IP with a custom image ID:
1. Find your favorite BIG-IP template for Google. I'll use BIG-IP, standalone, 3nic, PAYG licensing (Tag 3.5.0)
2. Review the entire README for installation instructions
3. Download the template files: py, schema, yaml
4. Edit the yaml file: imageName = f5-bigip-14-1-2-3-0-0-5-payg-best-1gbps-191218142340
5. Populate all remaining parameters
   - Make sure all values in the yaml are populated or commented out (#)
   - If values are commented out, make sure the schema contains defaults
   - More details in my other article, Service Discovery in Google Cloud with BIG-IP
6. Save the file and deploy with your favorite method
Google will validate the template and launch a BIG-IP running 14.1.2.3.

```
#Example deploy using Google gcloud
gcloud deployment-manager deployments create my-f5-14-1-2 --config f5-existing-stack-payg-3nic-bigip.yaml
```

Easy, right? Try another image search and launch the template again to get a v13.x, v14.x, or v15.x BIG-IP.
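If you want to script the lookup instead of eyeballing grep output, here is a small sketch that finds the newest matching image name and injects it into the yaml before deploying. This is my own glue (the version string, deployment name, and sed edit are illustrative assumptions), not part of the F5 templates:

```bash
#!/usr/bin/env bash
set -euo pipefail

VERSION="14-1-2-3"   # BIG-IP version string as it appears in image names

# Grab the newest PAYG "best 1gbps" image for that version from the F5 project
IMAGE=$(gcloud compute images list --project=f5-7626-networks-public \
  --filter="name~f5-bigip-${VERSION}-.*payg-best-1gbps" \
  --format="value(name)" | sort | tail -n 1)
echo "Using image: ${IMAGE}"

# Point the template's imageName parameter at the discovered image
sed -i "s|^\( *imageName:\).*|\1 ${IMAGE}|" f5-existing-stack-payg-3nic-bigip.yaml

gcloud deployment-manager deployments create my-f5-deploy \
  --config f5-existing-stack-payg-3nic-bigip.yaml
```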
Option #2: Use Previous Template Release (no edits required)
If you don't mind a previous template release (fewer fixes/features), AND you still don't want to tweak template code, AND you still need a different BIG-IP version, AND that BIG-IP version is listed in the matrix...then keep reading! Here is an example to deploy BIG-IP version 13.1.1.0.

Find a previous template release that deploys the BIG-IP version you desire:
1. Decide what BIG-IP version you need (my example: 13.1.1.0)
2. Check the Google GDM Template Matrix for BIG-IP
3. Scroll down the list and you'll see that template release v2.2.0 allows "13.1.1"
4. Click the link to review the v2.2.0 template release notes

Deploy BIG-IP with the previous template release:
1. Find your favorite BIG-IP template for Google. I'll use BIG-IP, standalone, 3nic, PAYG licensing (Tag 2.2.0)
2. Review the entire README for installation instructions
3. Download the template files: py, schema, yaml
4. Edit the yaml file: imageName = f5-hourly-bigip-13-1-1-0-0-4-best-1gbps (see the yaml file for available images)
5. Populate all remaining parameters
   - Make sure all values in the yaml are populated or commented out (#)
   - If values are commented out, make sure the schema contains defaults
   - More details in my other article, Service Discovery in Google Cloud with BIG-IP
6. Save the file and deploy with your favorite method
Google will validate the template and launch a BIG-IP running 13.1.1.0.

```
#Example deploy using Google gcloud
gcloud deployment-manager deployments create my-f5-13-1-1 --config f5-existing-stack-payg-3nic-bigip.yaml
```

OK...we made it this far, but you still don't see the BIG-IP version you need. Keep reading! In the next section, we'll tweak some templates!

Option #3: Edit Latest Template to Change BIG-IP Versions (TBD)
This section is reserved for future use in situations where templates need to be modified in order to select a preferred BIG-IP version from the public marketplace. In my testing, I have not yet found a scenario where a template edit is required. Therefore, this section is a placeholder for potential future template hacks!
Note: Review the knowledge article "F5 support for GitHub software" for any questions pertaining to support of templates and modified templates.

Option #4: Edit Latest Template to Use Custom Uploaded Image
The final Google option allows you to upload or create your own BIG-IP images and reference those images in F5 cloud template deployments. There is an existing how-to doc in the F5 Image Generator GitHub repository explaining how to create a BIG-IP image for your Google environment. I'll walk through the high-level steps of that article below, and then we'll review the deploy steps.
Note: Custom images only allow BYOL licensing.
Note: Review the knowledge article "F5 support for GitHub software" for any questions pertaining to support of templates and modified templates.

Upload/Create a Custom Image:
1. Obtain an image file for the BIG-IP version you desire (my example = 13.1.3.3); download the image file from https://downloads.f5.com
2. Use the F5 Image Generator to make your own custom image
   - Review the entire "Prerequisites" section
   - Create a virtual disk image locally
   - Upload the virtual disk image to a GCE bucket
   - Create the virtual machine image
3. Save the image name for later (my example: f5-bigip-13-1-3-3-0-0-6-byol-ltm-1slot-a83bji8j2)

```
#Example image creation
./build-image -i BIGIP-13.1.3.3-0.0.6.iso -c config.yml -p gce -m ltm -b 1

#Output similar to this...
...
qemu-system installing RTM Image -- start time: 01:58:44
qemu-system installing RTM Image -- elapsed time: 0:08:51
qemu-system performing selinux relabeling -- start time: 02:07:35
qemu-system performing selinux relabeling -- elapsed time: 0:02:10
...
------======[ Finished disk generation for 'gce' 'ltm' '1' boot-locations. ]======------
Starting prepare cloud image 'f5-bigip-13-1-3-3-0-0-6-byol-ltm-1slot-a83bji8j2'.
...
Finished prepare cloud image 'f5-bigip-13-1-3-3-0-0-6-byol-ltm-1slot-a83bji8j2'
```

Deploy the custom BIG-IP image with the latest template release:
1. Find your favorite BIG-IP template for Google. I'll use BIG-IP, standalone, 3nic, BYOL licensing (Tag 3.5.0)
2. Review the entire README for installation instructions
3. Download the template files: py, schema, yaml
4. Edit the python file (refer to the EXAMPLE EDITS code snippet below): replace the project ID in sourceImage with your project ID
5. Edit the yaml file: imageName = f5-bigip-13-1-3-3-0-0-6-byol-ltm-1slot-a83bji8j2
6. Populate all remaining parameters
   - Make sure all values in the yaml are populated or commented out (#)
   - If values are commented out, make sure the schema contains defaults
   - More details in my other article, Service Discovery in Google Cloud with BIG-IP
7. Save the file and deploy with your favorite method
Google will validate the template and launch a BIG-IP running 13.1.3.3.

```
#Example Edits for Option #4: Edit Latest Template to Use Custom Uploaded Image
#original
'sourceImage': ''.join([COMPUTE_URL_BASE,
                        'projects/f5-7626-networks-public',
                        '/global/images/',
                        context.properties['imageName'],
                        ])

#after edits
'sourceImage': ''.join([COMPUTE_URL_BASE,
                        'projects/myproject123',
                        '/global/images/',
                        context.properties['imageName'],
                        ])

#Example deploy using Google gcloud
gcloud deployment-manager deployments create my-f5-13-1-3 --config f5-existing-stack-byol-3nic-bigip.yaml
```
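As a sanity check, you can confirm the custom image is actually registered in your project before (or after) kicking off the deployment. A hedged sketch, where the project and image names are illustrative:

```bash
# List any custom BIG-IP 13.1.3.3 images registered in your own project
gcloud compute images list --project=myproject123 \
  --filter="name~f5-bigip-13-1-3-3"

# Show full details (size, status, creation timestamp) for one image
gcloud compute images describe f5-bigip-13-1-3-3-0-0-6-byol-ltm-1slot-a83bji8j2 \
  --project=myproject123
```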
Summary
That is a wrap! There's lots of info in this post, and I hope it makes your job easier in deciding which template to choose when deploying various versions of BIG-IP devices in the Google public cloud.

Using Cloud Templates to Change BIG-IP Versions - Concepts

Introduction
This article series details how you can deploy F5 into a public cloud leveraging cloud templates. This article covers the basic concepts, and the rest of the series goes into deployment guidance specific to the most common public clouds. In this series, we will be using F5 cloud templates (available on GitHub) to modify the BIG-IP versions for your public cloud deployments. I will also share a few ways to use cloud templates to deploy the latest or previous stable releases of BIG-IP.
Topics covered:
- F5 Cloud Templates Overview
- Image Repository
- Template Versions and BIG-IP Versions
- F5 BIG-IP Custom Images
- Next Steps...Go Read the How-To Articles!

F5 Cloud Templates Overview
Given the range of F5 products and the numerous public cloud providers, the F5 product teams have created quite the collection of cloud templates. These are located on GitHub, with various BIG-IP templates for each cloud provider. Each template enables you to deploy BIG-IP devices in your cloud environment using different designs, licensing models, and features like load balancing, web application security, or all the BIG-IP modules - all across a range of BIG-IP versions!

GitHub
For those of you who are unfamiliar with our F5 cloud templates, I encourage you to head over to GitHub and review the available templates. Each template has a corresponding README with relevant install info. These templates take care of various dependencies like network setup, access list updates, public IP creation, BIG-IP creation, HA clustering, auto scaling, and more.
- BIG-IP Cloud Templates for AWS
- BIG-IP Cloud Templates for Azure
- BIG-IP Cloud Templates for Google

F5 CloudDocs
Also...head over to F5 CloudDocs for BIG-IP Virtual Edition (VE) for quickstart installation guides and docs. F5 CloudDocs is a key resource that details important features, capabilities, and usage guidance across the range of F5's cloud offerings.

Support
If you find yourself running into issues and require support, don't hesitate to create a GitHub "issue" on the relevant repo. You also have the ability to open a ticket with F5 Support at https://support.f5.com. Review the F5 support policies regarding F5 GitHub repositories to learn what is and is not covered. When in doubt, reach out to your friendly F5 account team!

Image Repository
Each cloud provider has a public marketplace where vendors like F5 can upload images of their products, like BIG-IP. The cloud provider marketplace can be used as an image repository...with some important caveats, of course. The marketplace for each cloud provider has limits on the number of images for each vendor. As new versions of BIG-IP are released, F5 removes previous versions from the marketplace to make room for the new ones. Customers relying on previous BIG-IP releases are encouraged to archive those specific BIG-IP versions in their own private image repository for each cloud provider. You do this by uploading a BIG-IP cloud image to the cloud provider; it then becomes an available image ID for deployment. Grab these images from https://downloads.f5.com and look for your specific cloud provider.
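Here is a hedged sketch of what that archiving step could look like in Google Cloud, assuming you have downloaded a GCE image bundle from downloads.f5.com; the bucket, file, and image names are illustrative assumptions:

```bash
# Stage the downloaded image bundle in your own Cloud Storage bucket
gsutil mb gs://my-bigip-image-archive
gsutil cp BIGIP-14.1.2.3-0.0.5.ALL-gce.tar.gz gs://my-bigip-image-archive/

# Register it as a private image in your project. Note: GCE expects a
# tar.gz that wraps a disk.raw file; the F5 Image Generator (covered
# below) can produce a suitable bundle if your download is another format.
gcloud compute images create f5-bigip-14-1-2-3-archive \
  --source-uri=gs://my-bigip-image-archive/BIGIP-14.1.2.3-0.0.5.ALL-gce.tar.gz
```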
Template Versions and BIG-IP Versions
There are two types of "versions" to discuss: one, the F5 cloud template versions; two, the BIG-IP versions.

F5 Cloud Template Versions
The F5 cloud templates are updated regularly and tagged with a release number. The templates are tested against the latest (aka newest) BIG-IP version available in the marketplace at the time of the template release, and they allow the user to select other BIG-IP versions too. There is a template-to-BIG-IP version matrix for each cloud provider that tells you which template versions and BIG-IP versions are supported for a given template release. It is recommended to always use the latest template release because it includes the most recent template fixes and improvements. You can follow the matrix links below and then click on a tag/release version to see the release notes.
- AWS CFT Template Matrix
- Azure ARM Template Matrix
- Google GDM Template Matrix

BIG-IP Versions
As for BIG-IP versions and what is supported in each cloud provider, please review the BIG-IP VE Supported Platforms. This lists the recommended instance types/sizes and licensing for each cloud provider.
Note: Best practice is to always use the latest templates, as they are the most up to date with the latest features and fixes for templated deployment of BIG-IP devices. This article series has workarounds that I have tested in my lab to help you deploy various versions of BIG-IP. If you decide to use a previous template version, check the release notes to determine whether your deployment is affected by any of the items noted (features, fixes, workarounds).

F5 BIG-IP Custom Images
All that talk about vendor images, public marketplace, private images...oh my! What happens if you need to customize your own image, patch it, or do something magical to it? What happens if the image you need on the public marketplace is no longer there? Enter the F5 Image Generator! Use the F5 BIG-IP Image Generator Tool to create custom images from the .ISO file for F5 BIG-IP VE releases or hotfixes. You can then upload that image to the cloud provider and reference it in your template deployments. The F5 Virtual Edition (VE) team developed the F5 BIG-IP Image Generator internally to do the following:
- Create custom images from the .ISO file for F5 BIG-IP VE releases or hotfixes that are not available on the various public cloud marketplaces
- Provide pre-deployment file customization of BIG-IP (for example, SSH keys, trusted certificates, custom packages, and so forth)
- Automatically publish images to public cloud providers
- Simplify deployment workflows, such as encrypting custom images in AWS (prevents having to launch a marketplace instance first)

Next Steps
In the rest of the articles in this series, I discuss HOW TO modify cloud templates to change BIG-IP versions for the AWS, Azure, and Google cloud offerings.
Service Discovery in Google Cloud with F5 BIG-IP

Service discovery allows cool things to happen, like dynamic node discovery for your applications. The BIG-IP device can utilize service discovery to automate the scale in/out of pool members. What does this mean? Your BIG-IP configs get updated without user intervention. Google Cloud uses "Labels" that are assigned to virtual machines (VMs). The BIG-IP uses these labels to automate the dynamic nature of pool members coming and going: it periodically scans the cloud provider for VMs matching those labels. Benefit? Yes indeed! No tickets to IT asking for pool member modifications...no waiting on emails...no approvals. I set up a BIG-IP (3nic standalone) with service discovery in Google Cloud using our F5 cloud templates on GitHub, and I am here to share the how-to and results. After reading the GitHub repo as well as visiting our CloudDocs site, I went to work.
This article has the following sections:
- Prerequisites
- Download, Customize, and Deploy Template
- Attach Service Account to BIG-IP VM
- Login to the BIG-IP via SSH and Set Password
- Configure Service Discovery with iApp
- Configure Service Discovery with AS3
- Summary
- Appendix Sections

Prerequisites - Google Cloud SDK
You must have the Google Cloud SDK...easy. Go here: https://cloud.google.com/sdk/docs/quickstarts.

```
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init
```

The Google Cloud SDK lets you use "?" and "tab" helpers. Meaning, type gcloud and then hit tab a few times to see all the options. When you run "gcloud init", it authenticates your device to the Google network, resulting in your laptop having Google Cloud API access. If you are running "gcloud init" for the first time, the SSH shell pops open a browser window in which you authenticate to Google with your credentials. You'll be given an option to select the project name, and you can also configure default zones and regions. Play around with gcloud on the CLI and hit "tab" to see all the options. You can also read the Google Cloud docs here: https://cloud.google.com/sdk/docs/initializing.

Prerequisites - VPC Networks, Subnets, Firewalls, and Routes
A VM in Google Cloud can only have one NIC per VPC network. Therefore, a BIG-IP 3nic deployment requires three VPC networks with one subnet each. Before deploying the GDM template, you'll need to create the required networks and subnets, and then make sure any necessary ports are open via firewall rules. A CLI sketch of these steps follows this section.

VPC Networks and Subnets
Here are some screenshots of my setup. I have a management, external, and internal network. Here are the network and subnet properties for the management network as an example.

Firewalls
Modify firewall rules to allow any necessary management ports. These are not set up by the template. Therefore, common management ports like tcp:22 and tcp:443 should be allowed. Here is my management firewall ruleset as an example. In my example, my BIG-IP has additional interfaces (NICs) and therefore additional networks and firewall rules. Make sure to allow appropriate application access (ex. 80/443) to the NICs on the BIG-IP that are processing application traffic. Here is an example for my external NIC.

Routes
I didn't touch these. However, it is important to review the VPC network routes and make sure you have a default gateway if required. If the network is meant to be internal/private, then it's best to remove the default route pointing to the internet gateway. Here is an example of my management network routes in use.
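If you prefer the CLI over the console for these network prerequisites, here is a hedged sketch using the same network names that appear in the yaml later in this article; the region, CIDR ranges, and source ranges are my own illustrative values:

```bash
# One VPC network + subnet per BIG-IP NIC (mgmt shown; repeat for ext/int)
gcloud compute networks create jgiroux-net-mgmt --subnet-mode=custom
gcloud compute networks subnets create jgiroux-subnet-mgmt \
  --network=jgiroux-net-mgmt --region=us-west1 --range=10.0.1.0/24

# Open management ports (lock source-ranges down to your admin network)
gcloud compute firewall-rules create jgiroux-allow-mgmt \
  --network=jgiroux-net-mgmt --direction=INGRESS \
  --allow=tcp:22,tcp:443 --source-ranges=203.0.113.0/24

# Open application ports on the external network
gcloud compute firewall-rules create jgiroux-allow-app \
  --network=jgiroux-net-ext --direction=INGRESS \
  --allow=tcp:80,tcp:443 --source-ranges=0.0.0.0/0
```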
Prerequisites - Service Account
To do auto-discovery of pool members (aka service discovery), the BIG-IP device requires a role assigned to its VM. When a VM instance has an assigned role in the cloud, it inherits the permissions granted by that role to do certain tasks, like list compute instances, access cloud storage, re-map elastic IPs, and more. This avoids the need to hard-code credentials in application code. For service discovery, the service account needs a minimum of "Compute Viewer" or "Compute Engine - Read Only" permissions. Other deployment examples may require more permissions, such as storage permissions or pub/sub permissions. Create a new service account in the IAM section, then find "Service accounts". Here's an example of a new service account and role assigned. If done correctly, it will be visible as an IAM user.

Prerequisites - Pool Members Tagged Correctly
Deploy the VM instances that will run your app...the pool members (e.g. an HTTP server running on port 80)...and add a "Label" name with a value to each VM instance. For example, my label name = app and my label value = demo. Any VM instance in my project with label name = app and value = demo will be discovered by the BIG-IP and pulled in as a new pool member. A CLI sketch of both prerequisites follows.
Pre-reqs are done!
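Here is a hedged CLI sketch of those two prerequisites; the project ID, VM name, and zone are illustrative (the service account name matches the one used later in this article):

```bash
# 1) Service account with read-only compute access for service discovery
gcloud iam service-accounts create svc-jgiroux \
  --display-name="BIG-IP service discovery"
gcloud projects add-iam-policy-binding my-project-123 \
  --member="serviceAccount:svc-jgiroux@my-project-123.iam.gserviceaccount.com" \
  --role="roles/compute.viewer"

# 2) Label an app VM so the BIG-IP discovers it (label app=demo)
gcloud compute instances add-labels app-vm-1 \
  --zone=us-west1-b --labels=app=demo
```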
Download and Customize the YAML Template File
The Google Cloud templates make use of a YAML file, a PY file, and a schema file (requirements and defaults). The GitHub README contains helpful guidance. Visit the GitHub site to download the necessary files. As a reminder, this demo uses the following 3nic standalone template: https://github.com/F5Networks/f5-google-gdm-templates/tree/master/supported/standalone/3nic/existing-stack/payg
Scroll down to the "Deploying the template" section to review the requirements. Download the YAML file to your desktop, and ALSO make sure to download the python (PY) and schema files to your desktop. Here's an example of my modified YAML file. Notice some fields are optional and not required by the template, as noted in the GitHub README parameters table. Therefore, 'mgmtSubnetAddress' is commented out and ignored, but I left it in the example for visualization purposes.

```
# Copyright 2019 F5 Networks All rights reserved.
#
# Version 3.2.0

imports:
- path: f5-existing-stack-payg-3nic-bigip.py

resources:
- name: f5-existing-stack-payg-3nic-bigip
  type: f5-existing-stack-payg-3nic-bigip.py
  properties:
    region: us-west1
    availabilityZone1: us-west1-b
    mgmtNetwork: jgiroux-net-mgmt
    mgmtSubnet: jgiroux-subnet-mgmt
    #mgmtSubnetAddress: <DYNAMIC or address>
    restrictedSrcAddress: 0.0.0.0/0
    restrictedSrcAddressApp: 0.0.0.0/0
    network1: jgiroux-net-ext
    subnet1: jgiroux-subnet-ext
    #subnet1Address: <DYNAMIC or address>
    network2: jgiroux-net-int
    subnet2: jgiroux-subnet-int
    #subnet2Address: <DYNAMIC or address>
    provisionPublicIP: 'yes'
    imageName: f5-bigip-15-0-1-0-0-11-payg-best-25mbps-190803012348
    instanceType: n1-standard-8
    #mgmtGuiPort: <port>
    #applicationPort: <port port>
    #ntpServer: <server server>
    #timezone: <timezone>
    bigIpModules: ltm:nominal
    allowUsageAnalytics: 'yes'
    #logLevel: <level>
    declarationUrl: default
```

Deploy the Template
Make sure you point to the correct file location, and you're ready to go! Again, reference the GitHub README for more info. Example syntax...

```
gcloud deployment-manager deployments create <your-deployment-name> --config <your-file-name.yaml>
```

Attach Service Account to BIG-IP VM
Once deployed, you will need to attach the service account to the newly created BIG-IP VM instance. You do this by shutting down the BIG-IP VM instance, binding a service account to the VM, and then starting the VM again. It's worth noting that the template can easily be modified to include service account binding during VM instance creation. You can also do this via orchestration tools like Ansible or Terraform. Example...

```
gcloud compute instances stop bigip1-jg-f5-sd
gcloud compute instances set-service-account bigip1-jg-f5-sd --service-account svc-jgiroux@xxxxx.iam.gserviceaccount.com
gcloud compute instances start bigip1-jg-f5-sd
```

Login to the BIG-IP and Set Password
You should have a running BIG-IP at this point with an attached service account. In order to access the web UI, you'll need to first access SSH via SSH key authentication and then set the admin password. There are orchestrated ways to do this, but let's do it the old-fashioned manual way. First, go to the Google Cloud Console and view the properties of the BIG-IP VM instance. Look for the mgmt NIC public IP.
Note: In Google Cloud, the BIG-IP mgmt interface is swapped with NIC1.
Open your favorite SSH client and access the BIG-IP. Make sure your SSH key already exists in your Google Console. Instructions for uploading SSH keys are found here. Example syntax...

```
ssh admin@x.x.x.x -i /key/location
```

Update the admin password and save the config while on the TMOS CLI prompt. Here's an example:

```
admin@(bigip1-jg-f5-sd)(tmos)# modify auth user admin password myNewPassword123!
admin@(bigip1-jg-f5-sd)(tmos)# save sys config
```

Now access the web UI at https://x.x.x.x using the mgmt public IP. Login with admin and the newly modified password.

Configure Service Discovery with iApp
The BIG-IP device is very programmable, and you can apply configurations using various methods like the web UI or CLI, iApps, imperative APIs, and declarative APIs. For demo purposes, I will illustrate the iApp method in this section. The F5 cloud templates automatically include the Service Discovery iApp as part of the onboard and build process, but you'll still need to configure an application service. First, the CLI method: a quick TMSH one-liner configures the app service using the Service Discovery iApp. It does the following:
- creates a new app service called "serviceDiscovery"
- uses "gce" (Google) as the provider
- chooses region "default" (causes the script to look in the same region as the BIG-IP VM)
- chooses intervals and health checks
- creates a new pool, looking for the pool tag:value (app=demo) on port 80

```
tmsh create /sys application service serviceDiscovery template f5.service_discovery variables add { basic__advanced { value no } basic__display_help { value hide } cloud__cloud_provider { value gce } cloud__gce_region { value \"/#default#\" } monitor__frequency { value 30 } monitor__http_method { value GET } monitor__http_verison { value http11 } monitor__monitor { value \"/#create_new#\"} monitor__response { value \"\" } monitor__uri { value / } pool__interval { value 60 } pool__member_conn_limit { value 0 } pool__member_port { value 80 } pool__pool_to_use { value \"/#create_new#\" } pool__public_private {value private} pool__tag_key { value 'app'} pool__tag_value { value 'demo'} }
```

If you still love the web UI, then go to the BIG-IP web UI > iApps > Application Services.
If you executed the TMSH command above, then you should have an app service called "serviceDiscovery". Select it, then hit "Reconfigure" to review the settings. If no app service exists yet, then create a new app service and set it to match your environmental requirements. Here is my example.

Validate Results of Service Discovery with iApp
Review the /var/log/ltm file. It shows pool up/down messages for the service discovery pool, and it also indicates whether the service discovery script is failing to run.

```
tail -f /var/log/ltm
```

Example showing a successful member add to the pool...

Another place to look is the /var/log/cloud/service_discovery/get_nodes.log file. You'll see messages showing the script query and status.

```
tail -f /var/log/cloud/service_discovery/get_nodes.log
```

Example showing the getNodes.js call and parameters with a successful "finished" message...

Last but not least, you can check the UI within the LTM > Pools section.
Note: Service Discovery with iApp complete
Attach this new pool to a BIG-IP virtual server, and now your app can dynamically scale. I'll leave the virtual server creation up to you! In other words...challenge time! (If you want a head start, a minimal sketch follows.) For additional methods to configure service discovery on a BIG-IP, continue reading.
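Here is a minimal, hedged TMSH sketch of that virtual server challenge; the virtual server name, VIP address, and the iApp-generated pool path are illustrative (check LTM > Pools for the actual pool name your iApp created):

```bash
# Attach the service discovery pool to a new HTTP virtual server
tmsh create ltm virtual vs_demo_app \
  destination 10.1.10.34:80 ip-protocol tcp \
  profiles add { http } \
  pool serviceDiscovery.app/serviceDiscovery_pool \
  source-address-translation { type automap }

tmsh save sys config
```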
Configure Service Discovery with AS3 (declarative option)
As mentioned earlier, the BIG-IP device is very programmable. We used the iApp method to automate BIG-IP configs for pool member changes in the previous section. Now let's look at a declarative API approach using AS3 from the F5 Automation Toolchain. You can read more about AS3 here. At a high level, AS3 enables L4-L7 application services to be managed in a declarative model. This enables teams to place BIG-IP security and traffic management services in orchestration pipelines and greatly eases the configuration of L4-L7 services. It also has the benefit of using consistent patterns to deploy and migrate applications. Similar to the Service Discovery iApp, the AS3 RPM comes bundled with the handy F5 cloud templates. If you deployed via alternative methods, if you do not have the AS3 RPM pre-loaded, if you want to upgrade, or if you simply want a place to start learning, review the Quick Start AS3 Docs. Review the Additional Declarations for examples of how to use AS3 with iRules, WAF policies, and more.
Note: AS3 is a declarative API, and therefore no web UI exists to configure L4-L7 services. Postman will be used in my example to POST the JSON declaration.
Open Postman, authenticate to the BIG-IP, and then post the app declaration (a curl alternative is sketched after the declaration). Learn how by reviewing the Quick Start AS3 Docs. Here's my example declaration. It does the following:
- creates a new application (aka VIP) with a virtual address of 10.1.10.34 (the BIG-IP's private address on nic0)
- uses "gce" (Google) as the cloud provider
- defines a tenant called "Sample_sd_01"
- chooses region "us-west1" in which to query for VMs
- creates a new pool 'web_pool' with members matching tag=app, value=demo on port 80

```
{
    "class": "ADC",
    "schemaVersion": "3.0.0",
    "id": "urn:uuid:33045210-3ab8-4636-9b2a-c98d22ab425d",
    "controls": {
        "class": "Controls",
        "trace": true,
        "logLevel": "debug"
    },
    "label": "GCP Service Discovery",
    "remark": "Simple HTTP application with a pool using GCP service discovery",
    "Sample_sd_01": {
        "class": "Tenant",
        "verifiers": {},
        "A1": {
            "class": "Application",
            "template": "http",
            "serviceMain": {
                "class": "Service_HTTP",
                "virtualAddresses": ["10.1.10.34"],
                "pool": "web_pool"
            },
            "web_pool": {
                "class": "Pool",
                "monitors": ["http"],
                "members": [{
                    "servicePort": 80,
                    "addressDiscovery": "gce",
                    "updateInterval": 1,
                    "tagKey": "app",
                    "tagValue": "demo",
                    "addressRealm": "private",
                    "region": "us-west1"
                }]
            }
        }
    }
}
```
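If you'd rather use curl than Postman, here is a hedged sketch of posting that declaration to the standard AS3 endpoint; the credentials and file name are illustrative:

```bash
# Save the declaration above as as3-declaration.json, then POST it to AS3
curl -sku admin:myNewPassword123! \
  -H "Content-Type: application/json" \
  -X POST https://<bigip-mgmt-ip>/mgmt/shared/appsvcs/declare \
  -d @as3-declaration.json
```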
Validate Results of Service Discovery with AS3
Similar to the iApp method, review the /var/log/ltm file to validate AS3 pool member discovery. You'll see basic pool member up/down messages.

```
tail -f /var/log/ltm
```

Example showing a successful member add to the pool...

Another place to look is the /var/log/restnoded/restnoded.log file. You'll see messages showing status.

```
tail -f /var/log/restnoded/restnoded.log
```

Example showing restnoded.log sample logs...

Last but not least, you can check the UI within the LTM > Pools section. AS3 is multi-tenant and therefore uses partitions (tenants). Make sure to change the partition in the upper-right corner of the web UI if you don't see your config objects. Change the partition, then view the pool and pool member.

As a bonus, you can test the application from a web browser. AS3 deployed full L4-L7 services in my example. Therefore, it also deployed a virtual server listening on the value in the declaration parameter 'virtualAddresses', which is IP 10.1.10.34. This IP of 10.1.10.34 maps to a public IP associated with the BIG-IP VM in Google Cloud of 34.82.79.120 on nic0 (see the example NIC layout). Open a web browser and test the app on http://34.82.79.120.
Note: Service Discovery with AS3 complete
Great job! You're done! Review the Appendix sections for more information.

Summary
I hope you enjoyed this writeup and gained some new knowledge along the way. I demonstrated a basic Google Cloud network, deployed an F5 BIG-IP instance using F5 cloud templates, and then configured service discovery to dynamically populate pool members. As for other general guidance around BIG-IP and high availability designs in the cloud, I'll leave those details for another article.

Appendix: Google Networking and BIG-IP Listeners
In my example, I have an external IP mapping to the BIG-IP VM private IP on nic0...no forwarding rules. Therefore, Google NATs the incoming traffic from 34.82.79.120 to the VM private IP 10.1.10.34, and the BIG-IP virtual server listener will be 10.1.10.34. On the other hand, Google Cloud forwarding rules map public IPs to VM instances and do not NAT. Therefore, a forwarding rule of 35.85.85.125 mapping to my BIG-IP VM will result in a virtual server listener of 35.85.85.125. Remember...
- External public IP > VM private IP mapping = NAT to the VM
- Forwarding rule public IP > VM instance = no NAT to the VM
Learn more about Google Cloud forwarding rules in the following links:
- Forwarding Rule Concepts - how rules interact with Google LBs
- Using Protocol Forwarding - allowing multiple public IPs to one VM with forwarding rules

Appendix: Example Errors
This is an example of incorrect permissions. You will find this in /var/log/ltm. The iCall script runs and calls the cloud provider API to get a list of pool members.

```
Jan 24 23:40:00 jg-f5-sd err scriptd[26229]: 014f0013:3: Script (/Common/serviceDiscovery.app/serviceDiscovery_service_discovery_icall_script) generated this Tcl error: (script did not successfully complete: (jq: error: unxpected response from node worker while executing "exec /bin/bash -c "NODE_JSON=\$(curl -sku admin: https://localhost:$mgmt_port/mgmt/shared/cloud/nodes?mgmtPort=$mgmt_port\\&cloud=gce\\&memberTag=app=..." invoked from within "set members [exec /bin/bash -c "NODE_JSON=\$(curl -sku admin: https://localhost:$mgmt_port/mgmt/shared/cloud/nodes?mgmtPort=$mgmt_port\\&cloud=gce\\&m..." line:4))
```

Review the iCall script to see the curl command running against the cloud provider. Run it manually via the CLI for troubleshooting.

```
curl -sku admin: "https://localhost/mgmt/shared/cloud/nodes?mgmtPort=443&cloud=gce&memberTag=app=demo&memberAddressType=private&memberPort=80&providerOptions=region%3d"

{"error":{"code":500,"message":"Required 'compute.instances.list' permission for 'projects/xxxxx'","innererror":{"referer":"restnoded","originalRequestBody":"","errorStack":[]}}}
```

From the output, I had the wrong permissions, as indicated in the logs (error 500, permissions). Correct the permissions for the service account that is in use by the BIG-IP VM.
Security Sidebar: Google Leads The Way On Sunsetting SHA-1

SHA-1 (Secure Hash Algorithm 1) is a cryptographic algorithm that was developed by the National Security Agency in the 1990s and is widely used in popular cryptographic protocols like Secure Sockets Layer (SSL) and Transport Layer Security (TLS). These protocols are designed to provide secure communications over the Internet. The SHA-1 algorithm is commonly used by Certificate Authorities (CAs) as part of the overall Public Key Infrastructure (PKI). While the intent of this article is not to fully explain PKI, it is important to note that many CAs utilize the SHA-1 algorithm to digitally sign certificates for secure websites. These CA-issued certificates are critical for users who want to maintain a level of trust and security when accessing those secure websites. If a user visits a secure website (https) and the digital certificate is not valid, it could mean that a bad guy is attempting to steal your information. Your Internet browser (Internet Explorer, Firefox, Chrome, Safari, etc.) will notice that the certificate is bad, and it will alert you to the fact that you are about to engage in some non-secure communications. Each browser presents this information in a slightly different way, but they all give you an alert nonetheless. The screenshot below shows an example of Google Chrome attempting to access a secure website that has an invalid certificate.

On September 5, 2014, Google announced that their popular Chrome browser will sunset the SHA-1 algorithm. They claim that SHA-1 has been a weak digital signature algorithm for at least 9 years. One of the primary reasons for this weakness is the relative ease of collision attacks against SHA-1, prompting Google to declare it no longer safe for public consumption. Google is not alone in their current fear and loathing of SHA-1...most other browsers have stated their intention to deprecate SHA-1 as well. While everyone agrees that SHA-1 needs to be replaced, not everyone agrees on the process or timeline to do so. Starting in November 2014 (as in, like, next month), Google will methodically sunset the SHA-1 algorithm, starting with Chrome version 39. Websites using HTTPS whose certificate chains use SHA-1 and are valid past January 1, 2017 will no longer appear to be fully trusted in Chrome.

Google Chrome has several different icon indicators in the address bar that display the overall trust factor of a given certificate chain for the website you are accessing. The first is a lock with a yellow triangle over it, which indicates a certificate chain that is "secure, but with minor errors." The next is a blank page icon, which indicates a certificate chain that is "neutral, lacking security." The last is a lock with a red X and a red strike-through text treatment in the URL scheme, which indicates a certificate chain that is "affirmatively insecure." Check out the above screenshot again...notice that it falls in the category of "affirmatively insecure" based on the red X and the red strike-through text in the URL. So, how does all this fit together? Well, Google has announced that it will start displaying these various icons on websites that use the SHA-1 algorithm. The following table shows the details of the SHA-1 certificate expiration dates and the related Chrome icon display in the address bar. Today, SHA-1 is used in over 98% of certificates issued worldwide. Likewise, Google Chrome accounts for 38% of all Internet browsers used today (as of August 2014...see the chart below).
When you combine the fact that Google Chrome accounts for almost 40% of all Internet browsers with the fact that SHA-1 is used in over 98% of all certificates worldwide, you can see why so many CAs are scrambling right now to re-issue new certificates in very short order. As you update and validate the certificates in your organization, you will need to verify that legacy applications support the new algorithm. Also, if you have externally hosted applications, you may need to issue new certificates so that users don't get those crazy browser warnings. (A quick way to check which signature algorithm a given site's certificate uses is sketched below.)
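Here is a hedged one-liner for auditing a site from your workstation, assuming OpenSSL is installed; the hostname is illustrative:

```bash
# Print the signature algorithm of the certificate presented by a site;
# "sha1WithRSAEncryption" means it is affected by the SHA-1 sunset
echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -text | grep "Signature Algorithm"
```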
This SHA-1 situation is just another reminder of the ever-changing technical world we live in. It's important to know what's out there, and it's important to stay as much ahead of it as possible.

The HTTP 2.0 War has Just Begun

#stirling Microsoft takes on Google as the war to win the standard for an overdue overhaul of HTTP starts to pick up steam.

RFC 1945 – "Hypertext Transfer Protocol -- HTTP/1.0" – was published in May 1996. In June of 1999, RFC 2616 – "Hypertext Transfer Protocol -- HTTP/1.1" – was published. In the ensuing 13 years there have been no substantial changes to the HTTP standard. None. Nada. Zilch. Even as the size and number of objects has ballooned over that time, and the overall composition of web pages has grown increasingly complex, there have been no substantial efforts to improve upon the now entrenched HTTP standard. Even as sites struggled to maintain availability and performance in the face of exploding usage growth – fueled by mobile device proliferation and increasingly affordable access enabling everything from plants to cows to users to "get online" – HTTP 1.1 remained the standard for web-everything, despite the growing evidence that it simply wasn't the most optimal means of connecting users with the resources they expect and, increasingly, demand. AJAX and Web 2.0 gave us better interactive models that alleviated some of the pain associated with performance problems, but as that model took hold and video became the medium du jour, even its advantages became unable to produce acceptable results. And then Google introduced SPDY: the first shot in the HTTP 2.0 war. Now Microsoft has fired back with "Speed+Mobility", and the battle appears about to be fully engaged.

Although SPDY has been out and about for some time, it only recently achieved "Internet-Draft" status in the RFC system, being officially published in February 2012. Along comes March 2012, and Microsoft has (sort of) countered with Speed+Mobility. What will be interesting as the battle progresses is to see which other organizations and vendors side with which version (if not both). Invariably, other organizations will want to be able to claim to have been co-authors of whichever proposal becomes, well, the standard, but choosing sides so early in a war is hardly appropriate, especially when the technical details are still (as of this writing) missing from Microsoft's proposal.

RIP-REPLACE versus UPGRADE
It's also not clear how Speed+Mobility will "retain as much compatibility as possible with the existing Web infrastructure" – a noble and laudable sentiment, to be sure – while still adopting most of the core concepts included in SPDY. From the HTTP Speed+Mobility RFC:
- It [the session layer] would maintain the integrity of the layered architecture.
- It would use an upgrade mechanism similar to that of WebSockets. This would enable compatibility with existing proxies and connection models, without creating a mandatory dependency on TLS. (A sketch of this handshake follows below.)
- [Same as SPDY] The protocol would define two types of frames: data and control.
- [Same as SPDY] The session layer would enable negotiation of multiple simultaneous streams for HTTP requests with minimal overhead.
- [Same as SPDY] The session layer would allow for prioritizing delivery of content to ensure highest value traffic is delivered first.
There's not much in the Speed+Mobility RFC on which to base a technical impact assessment on infrastructure (existing proxies and other HTTP mediating devices like load balancers), but what Microsoft appears to be saying is that it wants to leverage the concepts introduced by Google with SPDY (acknowledging their performance and, ultimately, scaling benefits) without leaving the familiar world of HTTP.
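For readers unfamiliar with the WebSockets-style upgrade being referenced, here is an illustrative sketch of how such an in-band negotiation looks on the wire; the protocol token shown is a hypothetical placeholder, not taken from either proposal's spec:

```
GET /index.html HTTP/1.1
Host: www.example.com
Connection: Upgrade
Upgrade: speed-mobility/1.0

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: speed-mobility/1.0
```

The point of this mechanism is that the client and server start out speaking plain HTTP/1.1, so any intermediary that doesn't understand the new protocol simply never switches — which is exactly the backward compatibility Microsoft is touting.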
That's actually important, assuming it can be done, because SPDY requires significant changes to existing infrastructure – network and server – in order to operate, and it is not inherently interoperable with HTTP. Despite this, SPDY interest and inquiries are becoming more frequent, which means it's getting the attention it deserves. Being the only kid on the block to really address the performance issues inherent in HTTP (especially with respect to mobile devices), that's no surprise, as the investment in new solutions to support SPDY would ostensibly see a return in the form of scalability on the server side, requiring fewer server resources to support as many if not more users. But SPDY isn't so far along (see previous note) as to be a clear front runner. Despite the interest, it's still too new to have garnered widespread support or mindshare, and despite Google's ubiquitous status as a household term for search, it isn't necessarily synonymous with web standards. Chrome may be gaining on IE, but in the minds of most users, IE is still synonymous with web browsing. Microsoft also has a serious advantage over Google in its relationship with the enterprise and IT, and in its more intimate understanding of data center infrastructure, as is evident from its blog post introducing the proposal:

"We think that rapid adoption of HTTP 2.0 is important. To make that happen, HTTP 2.0 needs to retain as much compatibility as possible with the existing Web infrastructure. Awareness of HTTP is built into nearly every switch, router, proxy, Load balancer, and security system in use today. If the new protocol is "HTTP" in name only, upgrading all of this infrastructure would take too long. By building on existing web standards, the community can set HTTP 2.0 up for rapid adoption throughout the web." -- Speed and Mobility: An Approach for HTTP 2.0 to Make Mobile Apps and the Web Faster

Google, while not necessarily openly hostile to the enterprise or the infrastructure vendors who'd need to support SPDY, certainly appears indifferent to the impact of a rip-and-replace protocol model. That's not to say Google's approach isn't feasible or desirable. Indeed, in some cases a rip-and-replace strategy is the only way to clean out the cobwebs that otherwise seem to hang onto technology for years after they've been superseded and superseded again. Think COBOL, which in some industries is still under active development, augmented by a hundred other technologies designed to work around the reality that it's an aged, outdated technology that, for various reasons, we are unable to simply walk away from.

TAKE a SIDE ALREADY, WILL YOU?!
Nope. Not gonna take a side yet – if ever. Personal preferences aside (and it's hard to have any at this point without more technical details from Microsoft), the decision whether an organization eventually goes with SPDY or Speed+Mobility will not negatively impact mediating devices. In fact, the existence of both would not negatively impact such devices, because of their strategic location in the network. The existence of all three – SPDY, S+M, and HTTP – would likewise not negatively impact these devices as long as they were able to support all three, which seems more likely than vendors simply choosing a side. There will be a need to support both – and likely all three (do I hear a fourth?) – protocols moving forward. Regardless of who wins this particular war and comes out crowned HTTP 2.0 champion, there will still be a need to implement support across infrastructure vendors.
There will be a transitory period during which browsers and servers and infrastructure all must "get up to speed" (ha!) and will do so at different rates, making the need for intermediating devices critical. Just as is the case with the migration from IPv4 to IPv6, intermediating application delivery solutions provide the means by which organizations with substantial infrastructure investments can maintain the value of those investments while moving forward to support emerging standards. Being able to translate, for example, between SPDY and HTTP today would be a significant boon for organizations, as it requires no changes to what is likely an extensive application and server infrastructure. Similarly, assuming a pilot of Speed+Mobility, if the application delivery tier can support it, it can mediate – translate – and provide an opportunity to support users via either standard without radically disrupting the application server infrastructure. A full-proxy based application delivery infrastructure is full of advantages, after all.

I like SPDY. I like its approach, and I actually admire Google's chutzpah in diverging from HTTP as a solution, recognizing perhaps the inherent tendency to be more concerned with backwards compatibility than with improving upon the model. But I like what Microsoft is saying from an enterprise perspective because, honestly, replacing an entire infrastructure architecture to support one protocol out of many is not an appealing option, no matter the benefits. Both approaches have merit, and the bigger story is that an overhaul of HTTP is necessary – and long overdue.

- Web App Performance: Think 1990s.
- Network versus Application Layer Prioritization
- Oops! HTML5 Does It Again
- Fire and Ice, Silk and Chrome, SPDY and HTTP
- Grokking the Goodness of MapReduce and SPDY
- Google SPDY Protocol Would Require Mass Change in Infrastructure
- What Does Mobile Mean, Anyway?
- Moore's (Traffic) Law
Why Layer 7 Load Balancing Doesn't Suck

Alternative title: Didn't We Resolve This One 10 Years Ago?

There's always been a bit of a disconnect between traditional network-focused ops and more modern, application-focused ops when it comes to infrastructure architecture. The arguments for and against layer 7 (application) load balancing first appeared nearly a decade ago when the concept was introduced. These arguments were being tossed around at the same time we were all arguing for or against various QoS (Quality of Service) technologies, namely the now infamous "rate shaping" versus "queuing" debate. If you missed that one, well, imagine an argument similar to that today of "public" versus "private" clouds. Same goals, different focus on how to get there. Seeing this one crop up again is not really any more surprising than seeing some of the other old debates bubbling to the surface. Cloud computing and virtualization have brought network infrastructure and its capabilities, advantages and disadvantages – as well as its role in architecting a dynamic data center – to the fore again. But seriously? You'd think the arguments would have evolved in the past ten years.

While load balancing hardware marketing execs get very excited about the fact that their product can magically scale your application by using amazing Layer 7 technology in the load balancer such as cookie inserts and tracking/re-writing. What they fail to mention is that any application that requires the load balancer to keep track of session related information within the communications stream can never ever be scalable or reliable.
-- Why Layer 7 load balancing sucks…

First, I myself have never shied away from mentioning the session management capabilities of a full-proxy application delivery controller (ADC) and, in fact, I have spent much time advocating on behalf of its benefits to architectural flexibility, scalability, and security. Second, argument by selective observation is a poor basis upon which to make any argument, particularly this one. While persistence-based load balancing is indeed one of the scenarios in which an advanced application delivery controller is often used (and in some environments, such as scaling out VDI deployments, is a requirement), this is a fairly rudimentary task assigned to such devices. The use of layer 7 routing, aka page routing in some circles (such as Facebook), is a highly desirable capability to have at your disposal. It enables more flexible, scalable architectures by creating scalability domains based on application-specific functions. There is a plethora of architectural patterns that leverage the use of an intelligent, application-aware intermediary (i.e. a layer 7 load balancing capable ADC). Here are a few I've written myself, but rest assured there are many more examples of infrastructure scalability patterns out there on the Internets:

- Infrastructure Architecture: Avoiding a Technical Ambush
- Infrastructure Scalability Pattern: Partition by Function or Type
- Applying Scalability Patterns to Infrastructure Architecture
- Infrastructure Scalability Pattern: Sharding Streams

Interestingly, ten years ago this argument may have held some water. This was before full-proxy application delivery controllers were natively able to extract the necessary HTTP headers (where cookies are exchanged). That means in the past, such devices had to laboriously find and extract the cookie and its value from the textual representation of the headers, which obviously took time and processing cycles (i.e. added latency). But that's not been the case since oh, 2004-2005, when such capabilities were recognized as being a requirement for most organizations and were moved into native processing, which reduced the impact of such extraction and evaluation to a negligible, i.e. trivial, amount of delay and effectively removed this argument as an objection.
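For the record, neither capability in question is mysterious. Here is a minimal sketch – with hypothetical pool names and cookie format, not any particular product's implementation – of the two layer 7 decisions at issue: honoring a persistence cookie, and page routing by URI into function-specific scalability domains:

```python
import re

# Hypothetical scalability domains, partitioned by function.
POOLS = {
    "app":    ["10.0.1.10", "10.0.1.11"],
    "images": ["10.0.2.10", "10.0.2.11"],
}

def pick_server(path, headers):
    # Persistence: if the client already carries a session cookie naming a
    # server, honor it so the session stays on the same back end.
    match = re.search(r"SERVERID=([\d.]+)", headers.get("Cookie", ""))
    if match:
        return match.group(1)
    # Page routing: partition by function or type based on the URI.
    pool = POOLS["images"] if path.startswith("/images") else POOLS["app"]
    return pool[hash(path) % len(pool)]      # simple spread for new sessions

# e.g. pick_server("/images/logo.png", {}) lands in the images domain, while a
# request carrying "SERVERID=10.0.1.11" sticks to that server.
```

With header extraction done natively, that entire decision costs a trivial amount of delay, which is exactly the point above.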
THE SECURITY FACTOR

Both this technique and trivial tasks like tracking and re-writing do, as pointed out, require session tracking – TCP session tracking, to be exact. Interestingly, it turns out that this is a Very Good Idea™ from a security perspective, as it provides a layer of protection against DDoS attacks at lower levels of the stack and enables an application-aware ADC to more effectively thwart application-layer DDoS attacks, such as SlowLoris and HTTP GET floods. A simple layer 4 load balancing solution (one that ignores session tracking, as is implied by the author) can neither recognize nor defend against such attacks because it maintains no sense of state. Ultimately this means the application and/or web server is at high risk of being overwhelmed by even modest attacks, because these servers are incapable of scaling sessions at the magnitude required to sustain availability under such conditions. This is true in general of high volumes of traffic or under overwhelming loads due to processor-intensive workloads. A full-proxy device mitigates many of the issues associated with concurrency and throughput simply by virtue of its dual-stacked nature.

SCALABILITY

As far as scalability goes, really? This is such an old and inaccurate argument (and put forth with no data, to boot) that I'm not sure it's worth presenting a counter argument. Many thousands of customers use application delivery controllers to perform layer 7 load balancing specifically for availability assurance. Terminating sessions does require session management (see above), but to claim that this fact is a detriment to availability and business continuity shows a decided lack of understanding of failover mechanisms that provide near stateful-failover (true stateful-failover is impossible at any layer). The claim that such mechanisms require network bandwidth indicates either a purposeful ignorance with respect to modern failover mechanisms or simply a failure to investigate. We've come a long way, baby, since then. In fact, it is nigh unto impossible for a simple layer 4 load balancing mechanism to provide the level of availability and consistency claimed, because such an architecture is incapable of monitoring the entire application (which consists of all instances of the application residing on app/web servers, regardless of form-factor). See axiom #3 (Context-Aware) in "The Three Axioms of Application Delivery".

- The Case (For & Against) Network-Driven Scalability in Cloud Computing Environments
- The Case (For & Against) Management-Driven Scalability in Cloud Computing Environments
- The Case (For & Against) Application-Driven Scalability in Cloud Computing Environments
- Resolution to the Case (For & Against) X-Driven Scalability in Cloud Computing Environments

The author's claim seems to rest almost entirely on the argument "Google does layer 4 not layer 7", to which I say, "So?" If you're going to tell part of the story, tell the entire story.
Google also does not use a traditional three-tiered approach to application architecture, nor do its most demanding services (search) require state, nor does it lack the capacity necessary to thwart attacks (which cannot be said for most organizations on the planet). There is a big difference between modern, stateless applications (with their many benefits) and traditional three-tiered application architectures. Now it may in fact be the case that, regardless of architecture, an application and its supporting infrastructure do not require layer 7 load balancing services. In that case, absolutely – go with layer 4. But to claim layer 7 load balancing is not scalable, or resilient, or high-performance enough when there is plenty of industry-wide evidence to prove otherwise is simply a case of not wanting to recognize it.
DevCentral Top5 02/15/2012

Welcome to a special "yes I know it's Wednesday but I won't be here Friday" edition of the Top5. There has already been some great content in the last week or so, which makes it easy to do an edition mid-week, but that's not unusual. Given the amount of awesome content that can generally be found roaming the wilds of DevCentral, it isn't uncommon to have enough to fill up the Top5 by Wednesday. This week I am taking advantage of that fact. Though I have no doubt there will be still more goodness to come this week, you'll have to manage for yourselves...so dig deep and see what's out there! In the meantime, here are a few great pieces with which to get started:

iRules Concepts: Tcl, The How and Why
http://bit.ly/z9j18P
One of the questions that we get asked from time to time is, "Why Tcl?" Those people are referring to the interpreter we chose as the underlying infrastructure for iRules, of course. I've answered this question several times, and frankly it touches on many solid, deep-dive-style concepts about iRules: how they work at their core, TMM interaction, byte code compilation and more, all worth discussing. So...that's what I did. This article looks to shed some light on iRules history, anatomy, our choices in regards to their underpinnings, and why we do what we do how we do it. What it lacks in code samples and graphs, it makes up for in sheer word count (if, you know, that's your thing), but hopefully others find it useful content...I certainly did.

Google reCaptcha Verification With Sideband Connections
http://bit.ly/A4PAma
One of the many awesome Tech Tips that George has written recently...this one eluded the Top5 in previous weeks as there was just too much good stuff to share. Having read through it again this week, though, I decided it needs to make the hit list. This shows off one of the key features in iRules for v11, sideband connections, and how to do something very handy with them. Real world applications of bleeding edge iRules features in a consumable, organized, easy to follow format ... yep, that's kind of my thing. So here it is, better late than never. Take a read and see what else George has been up to, it's definitely worth the time.

F5 ARX WAN Optimization with WOM
http://bit.ly/w9bhsg
Pushing out an example from the field is a treat for me, and this week is no exception. Michael Fabiano, one of the FSEs here at F5, put together a very solid article on ARX WAN Opt with WOM. If you've been curious about possible solutions for multi-data center storage, this is the article for you. There are many things from an F5 perspective that can be done to streamline and optimize the general multi-location storage deployment, and those benefits are broken out here in an easy to follow (and implement) format. Whether it's ARX, WOM or both that you're looking to deploy or investigate, this picture is a good one, especially given the ways they work together. Michael does a good job of making this approachable and interesting, so take a look and learn something.

F5 Friday: What's Inside an F5?
http://bit.ly/yAwUi0
Lori came through last week with a solid answer to a question that seems to take many forms with this look at what actually goes on inside F5 devices. We have come a long, long way from the old 4.x and before days. As she points out, things have changed all the way up and down the stack from hardware to software, with many massive leaps forward conceptually as well, allowing us to deliver a whole new level of power.
Many people don't fully understand what it is that these products we talk about all the time offer at a somewhat base level. Lori does a good job here of giving some insight into that without going so deep that she loses the passengers on the trip. If you've ever wondered about TMOS, vCMP, or any of the other magic that happens internally...take a look.

New iOS Edge Client
http://bit.ly/wE68Lv
Last but not least, Pete delivered a friendly reminder today that there is a new iOS Edge Client available for download in the App Store. If you, like me, are one of the many folks making use of the Edge Client from an iOS device, this new version adds some worthwhile features. I love seeing the effort being put into making our products easier to use and more accessible not just for the administrators, but for the end users as well. This new release won't change the lives of the people running the systems, but it makes it just that much easier for those of us using the products as an end user (yes, I'm an end user too), and that is valuable. I just updated my device, and figured I'd pass on the heads up as a nice way to round out the Top5 for this week.

That's it for this week. As always, feel free to drop me some feedback or suggestions. #Colin
DevCentral Top 5 01/06/2012

The holidays have passed, the new year is upon us and there is much geeky goodness to be thankful for. I am thankful for the forums and the wikis, the tech tips and blogs. I am thankful for the outstanding community that drives it all, and the supporting cast of hundreds within F5 that helps support this DevCentral thing we get to do. I am so thankful, in fact, that I am here to share five of my favorite recent DevCentral additions with you. Hurried over the holidays? Nagged after the new year? Fall behind on your feeds? Never fear, I'm more than happy to give you my Top5 picks from past weeks to give you a place to start. Keep in mind there will always be more goodness on DevCentral than anyone could pack into a single missive, even someone as wordy as me, so be sure to get out there and check it out for yourselves. For now, though, here is my first DevCentral Top 5 of 2012:

$DevCentral += 1;
http://bit.ly/yNw8zs
We've grown! The team has gained a new face, a new name, and some wicked security chops. Josh joined the team before the holiday season and has been cranking away largely in secret since. His focus has been and will continue to be security. He'll be answering forums, checking in from conferences, keeping you abreast of the most twisted, brutal and/or interesting vulnerabilities out in the wild, hopefully with a means to fix them, and more. Part of said "more" will be contributing to the ever growing content engine that is DevCentral. He has already started, in fact! Check out this latest blog of his wherein he discusses the new(ish) slowread vulnerability along with a helpful fix from F5. He assures me there is more to discuss regarding this vuln, and having gotten to know him a bit I have no doubt this is just one of many helpful, timely, security-centric posts to come. Add him to your feeds, drop a note and say hi, and check back often to see what security science Josh is dropping next.

Two-Factor Auth with Google Authenticator and LDAP
http://bit.ly/yO8G6a
Speaking of science, I feel it is a crime that George was not gifted a lab coat and appropriately mad-scientist-esque safety goggles over the holidays. He has upped the iRules Tech Tip game to a level that Jason and I agree is both awesome and inspiring. In this article George documents how to turn your LTM and the inherent beauty within known as iRules into a two-factor auth system, integrated with LDAP, by way of Google Authenticator. In simpler terms: you can scan a QR code, enter a time-based secure token, and authenticate into your systems...all via an iRule. That is the very definition of iRules science, kids. I've been raving about this one for weeks, and likely will for weeks to come. So before you hear me tell you again later, go check it out. Not only is the concept outstanding, but the write-up is second to none, so don't be dissuaded by the double black diamond sounding description. George turns this one into a bunny slope compared to what it could be if you tried to tackle it alone.
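If you're curious what that time-based token actually is under the covers, the math fits in a dozen lines. Here's a minimal sketch of the TOTP algorithm (RFC 6238) that Google Authenticator implements – illustrative only; George's iRule does the real work in Tcl on the LTM:

```python
import base64, hashlib, hmac, struct, time

def totp(secret_b32, interval=30, digits=6):
    """Derive the current time-based one-time password from a shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)   # secret from the QR code
    counter = int(time.time()) // interval              # 30-second time step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                          # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# The server and the phone compute the same code because they share the secret
# and (roughly) the same clock; no connection between them is required.
```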
External File Access from iRules via iFiles
http://bit.ly/ArfZS6
There is a new tool in iRules town, and it's known as iFiles. Jason does an excellent job writing up this powerful, exciting new feature that was released for iRules in version 11.1. iFiles allow you to, as you might imagine, access files on the file system from within your iRules. This has been a popular request for years now, but there are inherent security and performance issues with giving out file system access to the LTM, something the PD crew here at F5 is understandably hesitant about. They have, however, cracked that proverbial nut and provided us with a solution that is both fast and secure. If you want the details on how it works you'll have to go read Jason's article, which you should do anyway because it rocks. Between v11 and v11.1, the iRules landscape continues to grow and become more hawesome by the version. I'm eager to see what comes next. Until then, though...go learn about iFiles. They rock.

The Three Axioms of Application Delivery
http://bit.ly/Av5oPf
Lori took on what I have often thought an unenviable task: defining Application Delivery. This is more slippery than it sounds, as the landscape is constantly changing with new technologies, application concerns and demands, security liabilities and more. Trying to specifically define exactly what one means when using the term "Application Delivery" has proved foolhardy before, and as such Lori's approach is one that appeals to me greatly. Namely, she decided not to define it directly, but instead laid out three axioms that describe the bedrock upon which the term lives and breathes, changing as it is apt to do, based on the needs and solutions of the times. Application Centric, Operational Risk Mitigating, Contextual. Those are effectively the concepts that are conveyed as the root of all things Application Delivery. Of course, many more juicy details and descriptions are a click away. Go see what you think. Do you agree? How would you define it, if you could?

iRules Concepts: Connection States and Command Suspension
http://bit.ly/wxY5f3
The iRules Concepts series is something I started a couple of months back in order to address some of the more esoteric functionality within iRules. Not everything fits so squarely into a command namespace or man page. Things such as command suspension and connection states within TMM warrant a bit more conversation and explanation. Given that I have seen this question come up multiple times in recent months, it seemed time to delve more deeply into the inner workings and shed some light on just what we're talking about when we use these terms. If you're an iRules geek like me, or frankly if you're curious about how F5 gear does what it does, I believe this is an interesting look at a tiny slice of that picture. If you have questions about how other things within iRules work, this would be a great place to ask. I'm really enjoying discussing the nuts and bolts of how this awesome technology does its thing, and am always keen on taking requests for future articles.

There you have it, five ways to spend some time learning about what has been happening on DevCentral. For more frequent updates, make sure you're registered and signed up for some of our many groups and feeds.
Fire and Ice, Silk and Chrome, SPDY and HTTP

The impact of SPDY on infrastructure architecture.

The Internets were abuzz with the revelation that the custom browser Silk, distributed on Amazon's latest endeavor Fire, leverages competitor Google's own technological innovation, SPDY, against it.

SPDY, short for "speedy", was developed by Google as a way of augmenting the regular HTTP protocol. It uses compression and several methods of optimizing and even predicting requests so resources are sent faster from the server to the browser. Amazon Silk uses SPDY for its connection to the EC2 cloud. Google also uses it in Chrome for all connections to Google sites. SPDY is an open protocol, so anyone is free to use it and Google is encouraging websites to adopt it.

This isn't the first time I've pondered SPDY; the last time was a dive into how SPDY combined with Map/Reduce might make for a very interesting and scalable web architecture. We could focus on the advantages of owning the "hardware" with SPDY, Fire, Chrome, and the interaction with Google and Amazon's own cloud services, but I think it more valuable to examine SPDY from the potential impact on infrastructure, architecture, and the broader market. With both web behemoths taking advantage of SPDY, the increasing rate of consumerization of IT, and consumers' eager adoption of mobile devices, it's becoming more obvious that the fledgling protocol might indeed be something the market should take a much, much closer look at.

A QUICK REVIEW of SPDY

There are three main points to SPDY that are (most) relevant to modern and emerging web architectures:

1. Only one, asynchronous connection is allowed between client and server
2. SPDY sits above TCP and encapsulates HTTP
3. Requests can be prioritized

The main premise of SPDY is the use of a single, asynchronous connection (1) between client and server to reduce latency inherent in network transfer times. Clients then fire off a series of requests, with or without priority (3), over that connection. Those requests are encapsulated into SPDY (2) and sent to a SPDY-capable web server infrastructure where they are translated and processed before being returned. This process is, as Google points out, much faster than traditional acceleration techniques involving parallelization of requests, because a browser simply cannot open a number of connections to the web server commensurate with the number of objects (requests) it must retrieve. Connection limitations and the synchronous nature of the HTTP protocol impose a performance penalty that is nigh unto impossible to eliminate. Hence, SPDY, which eliminates the penalty by changing the rules of the game.

Google has some fairly impressive performance results from using SPDY (no doubt improved further by the proximity of Google services to the Internet backbone, as is also the case with Amazon) that I will not even attempt to refute. As I've long verified similar improvements in the use of TCP and HTTP multiplexing on the server side of this equation, Google's numbers are no doubt an accurate depiction of the improvements gained using similar techniques on the client side. Now, the question becomes, "How could a typical enterprise take advantage of this?" After all, we all want faster web sites and user experiences. But the reliance of SPDY on support in both the client and the server infrastructure seems a potentially insurmountable challenge.
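The scale of the gain is worth a back-of-the-envelope model. Ignoring bandwidth, TCP slow start, and server think time, latency for a synchronous protocol accumulates per round trip, while a single multiplexed connection pays roughly one:

```python
import math

def http_round_trips(objects, parallel_conns, rtt_ms):
    # Synchronous HTTP: each connection retrieves one object per round trip,
    # and browsers cap how many connections they will open per host.
    return math.ceil(objects / parallel_conns) * rtt_ms

def multiplexed_round_trips(objects, rtt_ms):
    # SPDY's model: every request is fired asynchronously over the one
    # connection, so the page pays roughly a single round trip of latency.
    return rtt_ms

# A 60-object page over 6 connections at 50 ms RTT:
#   http_round_trips(60, 6, 50)      -> 500.0 ms of accumulated round trips
#   multiplexed_round_trips(60, 50)  ->  50.0 ms
```

The real numbers will differ, of course, but the shape of the curve is why the results are as impressive as they are.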
While we may be able to take advantage of Silk or Chrome's native SPDY support, we would still need to ensure the web/application server infrastructure on which web applications are deployed (in the cloud and in the data center) supports it as well. This is because the traditional HTTP transaction is encapsulated inside the SPDY frame: it must be extracted before it can be processed by traditional web/application infrastructure.

IMPACT on INFRASTRUCTURE INTERMEDIARIES

The most obvious impact to any infrastructure between a SPDY-enabled client and server is that it drives intermediate processing back to layer 4, to TCP. Because SPDY is its own protocol and encapsulates HTTP inside its frame, infrastructure focused on layer 7 (application, HTTP) would effectively be blinded by SPDY. While traditional layer 4 functionality – network firewall, QoS, load balancing – would remain unaffected by being in the data path, traditional layer 7 functionality – web application firewall, web access management, application switching (a.k.a. "page routing" in Facebook speak), etc. – would be rendered ineffectual. Much as with IPv4 encapsulation in IPv6-enabled architectures, this would render inoperable any infrastructure architecture dependent on such processing. After all, if web application firewall or access management services rely on a URI to determine which policy should be applied, and that URI is no longer available as part of the request, the service cannot function.

Additionally, the use of encryption as an integral component of the protocol would prove problematic for many infrastructure intermediaries unable to decrypt the data and perform inspection of the contents.

"TLS encrypts the contents of all transmission (except the handshake itself), making it difficult for attackers to control the data which could be used in a cross-protocol attack." – SPDY Protocol

That said, any intermediary capable of decrypting and subsequently extracting the HTTP data – such as network-side scripting capable infrastructure – would be capable of serving in the same capacity as it has in the past, albeit while incurring some amount of latency in doing so.
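Conceptually, what such an intermediary does is simple enough to sketch. Assuming the SPDY side has already been terminated and decrypted, mediation reduces to pulling the encapsulated HTTP out of the stream, speaking plain HTTP to the unmodified servers, and handing the response back for re-encapsulation. A minimal sketch, with a hypothetical back-end address and a read loop that assumes Connection: close semantics for brevity:

```python
import socket
from dataclasses import dataclass

@dataclass
class Stream:
    stream_id: int
    priority: int
    http_payload: bytes        # the HTTP request encapsulated in the SPDY frames

def mediate(stream, backend=("10.0.3.10", 80)):
    """Terminate SPDY in front; speak ordinary HTTP to the server tier behind."""
    with socket.create_connection(backend) as conn:
        conn.sendall(stream.http_payload)      # the server never sees SPDY
        chunks = []
        while True:
            chunk = conn.recv(4096)            # read until the server closes
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)                    # re-encapsulate on the SPDY side
```

The server tier stays exactly as it is today; only the intermediary learns the new protocol, which is the whole argument for putting that function in the application delivery tier.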
WHAT SPDY DOES NOT ADDRESS

There are several operationally-related issues that SPDY introduces, as well as those it does not address. For example, SPDY does not address capacity constraints, nor does it intelligently apply compression. Furthermore, it assumes security will be applied at the server. Architectural best practices concerning both performance and capacity dictate that SSL/TLS be offloaded upstream and that TCP multiplexing between intermediate load balancers and servers be used to improve utilization of server resources.

The behavior of SPDY assumes a persistent connection between the client and the server. Again, best practices for architecting highly available applications include the use of load balancing services to provide a mechanism for failover as well as scalability, introducing potential issues with persistence. Rudimentary, albeit effective, layer 4 load balancing services would alleviate this potential pitfall (layer 4 load balancing maps clients and servers at the TCP layer upon initial connection and thereafter acts as little more than a layer 3 forwarding mechanism, a.k.a. a switch) but introduce others, such as the effective server capacity management necessary to dynamically scale in and out, i.e. elasticity.

Other questions remain, as well, around basic security of SPDY-enabled infrastructure. The premise of the protocol is a rapid firing of requests with client-specified priority at a web site or application. Differentiating between legitimate clients and potential attackers may be an interesting exercise, and one that is not directly addressed by SPDY, with the exception of a mention regarding the ability to throttle clients.

ONWARD

The conclusion at this point may be that traditional architecture and infrastructure is inherently incompatible with SPDY. This is far from an accurate conclusion. For application delivery components, at least, the introduction of SPDY should not be viewed as problematic nor a show-stopper. In fact, it could be viewed as the opposite. An advanced application delivery controller serves as a SPDY-enabling technology. An advanced application delivery controller with network-side scripting ability could easily act as a translator for SPDY-enabled clients interacting with non-SPDY-enabled sites and applications. Through the use of network-side scripting, the HTTP data could be extracted, inserted into a traditional HTTP exchange with servers, and the responses re-injected into a SPDY-compliant data exchange with the client. Such an architecture could easily serve as a migratory step toward a fully SPDY-enabled web and application architecture, or as a means to support (the currently limited set of) SPDY-enabled browsers.

Even in a SPDY-enabled architecture, an advanced application delivery controller remains a key component for security, access and capacity management. If able to extract and interpret SPDY, the intermediary retains its ability to perform processing on the data without sacrificing the inherent performance gains achieved by the nascent protocol. This allows existing web and application architectures to remain in place but supports the use of the protocol as a means of accelerating the end-user experience. This approach is likely preferred by the infrastructure market itself, as it remains to be seen whether SPDY will become more widely used in mobile devices (and other browsers), and the investment to natively support what is a new protocol in what is a rather large and varied market would need to be justified by widespread demand. If intermediaries are able to "speak" SPDY natively, they could act in much the same way IPv6 gateways today serve as a transitional step between IPv4 and IPv6. In this respect, a component based on a full-proxy architecture is perfectly suited to inserting itself (and thus its other capabilities such as security and access control) into a SPDY conversation to ensure both sides of the equation are equally efficient and performant.

Because of the limited deployment and support for SPDY, this is likely to remain a non-issue for most organizations for quite some time. However, with growing use on mobile devices like Silk and other Android-based devices, and the push to integrate Amazon and Google cloud services with enterprise architectures, it may be time to put SPDY on the list of technologies to keep a closer eye on.

- Grokking the Goodness of MapReduce and SPDY
- MapReduce and PageRank Notes from Remzi Arpaci-Dusseau's Fall 2008 class
- Map/Reduce Tutorial from Hadoop
- Google SPDY Protocol Would Require Mass Change in Infrastructure
- Message-Based Load Balancing: Scaling Diameter, RADIUS, and Message-Oriented Protocols
- Draft of the SPDY Protocol
- Amazon's Silk Browser Uses Google's Own Technologies Against It
- Understanding network-side scripting