Series: Adopting SRE Practices with F5
Accelerating Digital Transformation in Banking and Financial Services
Introduction

A recent survey from Forrester's Business Technographics shows that 33% of BFSI tech leaders are currently undertaking a digital transformation within their organizations. That's 13 points ahead of the average across industries. Still, many enterprises worry that they aren't moving fast enough. For banking and financial services organizations, there is intense pressure to transform their enterprises to remain competitive in an age of disruption. Evolving regulatory requirements, rapidly advancing technology, increasing customer demands, COVID-19, and competition from fintechs are all forcing financial services firms to rethink the way they operate.

Digital Transformation Challenges

This digital transformation imperative requires banking and financial services organizations to improve their technical capabilities. But true transformation demands more than just new technologies. It requires strategic vision and commitment from the top of the organization to rethink and retool its culture, its processes, and its technology. Admittedly, the financial industry has a long history of not collaborating, lack of transparency, and resistance to adaptability, favoring instead confidentiality, siloed organizational structures, and risk aversion. For many years, that heritage enabled financial services firms to succeed. Existing cultural, behavioral, and organizational hurdles can be hard to overcome because they are so entrenched.

New processes and technology are also necessary for digital transformation. Traditional development practices are common in the industry and are built on segmented and monolithic team structures that lack the agility required to achieve transformation. Additionally, very few firms possess the infrastructure and application architectures required to rapidly innovate.

The Benefits of an Open Approach

Digital transformation is not merely about adopting new technologies but also about establishing new cultural practices and 'ways of working' within the IT organization. By taking an open approach to architecture, process, and culture, you can transform the way your entire organization operates.

Modular architecture

To create a more modular environment, banking and financial services institutions will require integration across the entire legacy network, as well as integration with partner systems, networks, and other external services such as Software-as-a-Service (SaaS) solutions. An open and composable architecture gives customers access to a growing range of 'best of breed' technologies from industry leaders, consumable with a frictionless "single-stack" feel.

Agile process

In the open organization model, collaboration is key. Modern, agile practices establish common goals and empower teams to move forward together. According to the Harvard Business Review article "Reassessing Digital Transformation: The Culture and Process Change Imperative", financial services firms were more apt to say that DevOps was important than other industries, and were also more likely to have implemented agile development, project management processes, CI/CD, and DevOps. These new processes are necessary as financial services firms seek faster time to value and leverage microservices to effect this change.
Open culture

Open organizations are more transparent, inclusive, adaptive, collaborative, and community focused. When you view digital transformation as a continuous process—and emphasize the importance of culture in parallel to, not at the expense of, technology and process—you're positioning your organization for a successful transformation.

Technologies that Enable Digital Transformation

The pandemic has accelerated the need for digital transformation in the BFSI segment. Not only have workforces become remote, but person-to-person contact has become less frequent. Financial organizations have not only had to scale up infrastructure and security to support a remote workforce, but have also had to simultaneously scale to support a fully remote customer base. Inherent in this approach is a hybrid cloud strategy that allows resources to be scaled up or down to meet application needs. Architectural design and practices must also align with these new cloud infrastructures. There is a need to balance the requirements for speed with the absolute necessity for security and availability. There are a few key best practices that BFSI organizations have used to balance these competing demands:

· Establish a foundation of resilience by adopting site reliability engineering (SRE) concepts.
· Rapidly deploy new services based on market demand.
· Consolidated, consistent, and controlled security and access, including identity management, intrusion protection, anti-virus, and predictive threat capabilities.
· Application performance (response time and latency), on-demand scalability, and disaster recovery and backup.
· Automation for efficiency and to speed delivery, with consistency in operations and tools, continuous integration and continuous delivery (CI/CD).
· System-wide business monitoring, reporting, and alerting.

An Open Architecture with F5 and Red Hat

Now that we have established the open approach for implementing a financial services platform and the capabilities needed for a successful digital transformation, we can examine the architecture needed to support it. It starts on the path toward site reliability engineering (SRE). In the SRE model, the operations team and the business give developers free rein to deploy new code—but only until the error budget is exceeded. At that point, development stops, and all efforts are redirected to reducing technical debt. As shown in Figure 1, it boils down to five areas an SRE team should focus on to achieve this balance.

Figure 1. Enabling SRE Best Practices

Together, F5, Red Hat, Elasticsearch, and other ecosystem partners can deliver a suite of technologies to fulfill the extension and transformation of an existing architecture into an agile financial services platform.

Figure 2. SRE Microservice Architecture with F5, Red Hat, and Elasticsearch

The following describes the most fundamental components of Figure 2 in more detail, to enable the SRE best practices:

1. Red Hat OpenShift Container Platform (container PaaS) provides a modular, scalable, cloud-ready, enterprise open-source platform. It includes a rich set of features to build and deploy containerized solutions and a comprehensive PaaS management portal that together extend the underlying Kubernetes platform.

2. Combining BIG-IP and NGINX, this architecture allows SRE to optimize the balance between agility and stability by implementing blue-green and targeted canary deployments.
This is a good way to release beta features to users, gather their feedback, and test your ideas in a production environment with reduced risk.

3. BIG-IP combined with NGINX Plus also gives SRE the flexibility to adapt to the changing conditions of the application environments and address the needs of NetOps, DevOps, DevSecOps, and app developers.

4. ELK is utilized to analyze and visualize application performance through a centralized dashboard. A dashboard enables end-users to easily correlate North-South traffic with East-West traffic for end-to-end performance visibility.

5. F5's WAF offerings, including F5 Advanced WAF and NGINX App Protect, deployed across hybrid clouds, protect OpenShift clusters against exploits of web application vulnerabilities as well as malware attempting to move laterally.

6. Equally important is integration with Red Hat Ansible, which enables the automated configuration of security policy enforcement for immediate remediation.

7. Everything is built into a CI/CD pipeline so that any future changes to the application are built and deployed automatically.

Conclusion

Digital transformation has been accelerated by the dual challenges of COVID-19 and the emergence of fintech. Traditional BFSI organizations have had to respond to these enormous challenges by accelerating their deployment timelines and adopting agile processes without compromising security and availability. These practices also dovetail with the greater adoption of microservices architectures that allow for scale-up and scale-out of application services. F5 and NGINX aid this transformation by providing world-class performance and security combined with a flexible microservices ADC (NGINX Plus). This hybrid architecture allows Kubernetes deployments to become 'production grade'.

Adopting SRE practices with F5: Observability and beyond with ELK Stack
This article is a joint collaboration between Eric Ji and JC Kwon.

Getting started

In the previous article, we explained SRE (Site Reliability Engineering) and how F5 helps SRE deploy and secure modern applications. We already discussed that observability is essential for SRE to implement SLOs. Meanwhile, we have a wide range of monitoring tools and analytic applications, each assigned to specific devices or running only for certain applications. In this article, we will explore one of the most commonly utilized logging tools: the ELK stack.

The ELK stack is a collection of three open-source projects, namely Elasticsearch, Logstash, and Kibana. It provides IT project stakeholders the capabilities of multi-system and multi-application log aggregation and analysis. In addition, the ELK stack provides data visualization at stakeholders' fingertips, which is essential for security analytics, system monitoring, and troubleshooting.

A brief description of the three projects:

· Elasticsearch is an open-source, full-text analysis and search engine.
· Logstash is a log aggregator that executes transformations on data derived from various input sources, before transferring it to output destinations.
· Kibana provides data analysis and visualization capabilities for end-users, complementary to Elasticsearch.

In this article, the ELK stack is utilized to analyze and visualize application performance through a centralized dashboard. A dashboard enables end-users to easily correlate North-South traffic with East-West traffic, for end-to-end performance visibility.

Overview

This use case is built on top of the targeted canary deployment. As shown in the diagram below, we take advantage of an iRule on BIG-IP to generate a UUID and insert it into the HTTP header of every HTTP request packet arriving at BIG-IP. All traffic access logs will contain the UUIDs when they are sent to the ELK server, allowing validation of information like user location, response time by user location, and the response time of BIG-IP and NGINX Plus.

Setup and Configuration

1. Create the HSL pool and iRule on BIG-IP

First, we created a High-Speed Logging (HSL) pool on BIG-IP, to be used by the ELK stack. The HSL pool is assigned to the sample application. This pool member will be used by the iRule to send access logs from BIG-IP to the ELK server. The ELK server is listening for incoming log analysis requests.

Below is the iRule that we created.
when CLIENT_ACCEPTED {
    set timestamp [clock format [clock seconds] -format "%d/%h/%y:%T %Z" ]
}

when HTTP_REQUEST {
    # UUID injection
    if { [HTTP::cookie x-request-id] == "" } {
        append s [clock seconds] [IP::local_addr] [IP::client_addr] [expr { int(100000000 * rand()) }] [clock clicks]
        set s [md5 $s]
        binary scan $s c* s
        lset s 8 [expr {([lindex $s 8] & 0x7F) | 0x40}]
        lset s 6 [expr {([lindex $s 6] & 0x0F) | 0x40}]
        set s [binary format c* $s]
        binary scan $s H* s
        set myuuid $s
        unset s
        set inject_uuid_cookie 1
    } else {
        set myuuid [HTTP::cookie x-request-id]
        set inject_uuid_cookie 0
    }

    set xff_ip "[expr int(rand()*100)].[expr int(rand()*100)].[expr int(rand()*100)].[expr int(rand()*100)]"
    set hsl [HSL::open -proto UDP -pool pool_elk]
    set http_request "\"[HTTP::method] [HTTP::uri] HTTP/[HTTP::version]\""
    set http_request_time [clock clicks -milliseconds]
    set http_user_agent "\"[HTTP::header User-Agent]\""
    set http_host [HTTP::host]
    set http_username [HTTP::username]
    set client_ip [IP::remote_addr]
    set client_port [TCP::remote_port]
    set http_request_uri [HTTP::uri]
    set http_method [HTTP::method]
    set referer "\"[HTTP::header value referer]\""

    if { [HTTP::uri] contains "test" } {
        HTTP::header insert "x-request-id" "test-$myuuid"
    } else {
        HTTP::header insert "x-request-id" $myuuid
    }
    HTTP::header insert "X-Forwarded-For" $xff_ip
}

when HTTP_RESPONSE {
    set syslogtime [clock format [clock seconds] -format "%h %e %H:%M:%S"]
    set response_time [expr {double([clock clicks -milliseconds] - $http_request_time)/1000}]
    set virtual [virtual]
    set content_length 0
    if { [HTTP::header exists "Content-Length"] } {
        set content_length \"[HTTP::header "Content-Length"]\"
    } else {
        set content_length \"-\"
    }
    set lb_server "[LB::server addr]:[LB::server port]"
    if { [string compare "$lb_server" ""] == 0 } {
        set lb_server ""
    }
    set status_code [HTTP::status]
    set content_type \"[HTTP::header "Content-type"]\"

    # construct log for elk, local6.info <182>
    set log_msg "<182>$syslogtime f5adc tmos: "
    #set log_msg ""
    append log_msg "time=\[$timestamp\] "
    append log_msg "client_ip=$client_ip "
    append log_msg "virtual=$virtual "
    append log_msg "client_port=$client_port "
    append log_msg "xff_ip=$xff_ip "
    append log_msg "lb_server=$lb_server "
    append log_msg "http_host=$http_host "
    append log_msg "http_method=$http_method "
    append log_msg "http_request_uri=$http_request_uri "
    append log_msg "status_code=$status_code "
    append log_msg "content_type=$content_type "
    append log_msg "content_length=$content_length "
    append log_msg "response_time=$response_time "
    append log_msg "referer=$referer "
    append log_msg "http_user_agent=$http_user_agent "
    append log_msg "x-request-id=$myuuid "

    if { $inject_uuid_cookie == 1} {
        HTTP::cookie insert name x-request-id value $myuuid path "/"
        set inject_uuid_cookie 0
    }

    # log local2. sending log to elk via log publisher
    #log local2. $log_msg
    HSL::send $hsl $log_msg
}

Next, we added a new VIP for the HSL pool that was created earlier, and applied the iRule to this VIP. Then all access logs containing the respective UUID for the HTTP datagram will be sent to the ELK server. Now, the ELK server is ready for the analysis of the BIG-IP access logs.

2. Configure NGINX Plus logging

We configure logging for each NGINX Plus instance deployed inside the OpenShift cluster through the respective ConfigMap objects. Here is one example:
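The ConfigMap below is a minimal illustrative sketch rather than the exact configuration used in this setup: the resource name, log format fields, backend service name, and syslog target are assumptions. A fuller working variant appears in the targeted canary article later in this series.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-logging-conf
data:
  app.conf: |-
    # Carry the x-request-id header injected by the BIG-IP iRule in every access
    # log entry, so N-S and E-W records can be correlated on the ELK dashboard.
    log_format elk_format 'time=[$time_local] client_ip=$remote_addr '
                          'http_host=$host http_method=$request_method '
                          'http_request_uri=$request_uri status_code=$status '
                          'response_time=$request_time '
                          'x-request-id=$http_x_request_id ';

    server {
        listen 8080;
        # Ship access logs to the ELK server over syslog (placeholder address and port).
        access_log syslog:server=your_elk_ip_here:8516 elk_format;

        location / {
            # Placeholder upstream service name for the application pod.
            proxy_pass http://sample-app:9080;
            proxy_set_header Host $host;
        }
    }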
3. Customize the Kibana Dashboard

With all configurations in place, log information will be processed by the ELK server. We will be able to customize a dashboard containing useful, visualized data, like user location and response time by location. When an end-user accesses the service, the VIP responds and the iRule is applied. Next, the user's HTTP header information is checked by the iRule, and logs are forwarded to the ELK server for analysis. As the user accesses the app services, the app server's logs are also forwarded to the ELK server based on the NGINX Plus ConfigMap setting. The list of key indicators available on the Kibana dashboard page is rather long, so we won't describe all of them here. You can check the details here.

4. ELK Dashboard Samples

We can easily customize the data for visualization in the centralized display, and the following is just a list of dashboard samples.

We can look at user location counts and response time by user location.

We can check the average response time and max response time for each endpoint.

We can see the correlation between BIG-IP (N-S traffic) and NGINX Plus endpoints (E-W traffic).

We can also check the response time for N-S traffic.

Summary

In this article, we showed how the ELK stack joined forces with F5 BIG-IP and NGINX Plus to provide an observability solution for visualizing application performance with a centralized dashboard. With combined performance metrics, it opens up a lot of possibilities for SREs to implement SLOs practically. F5 aims to provide the best solution to support your business success and will continue with more use cases. If you want to learn more about this and other SRE use cases, please visit the F5 DevCentral GitHub link here.

Adopting SRE practices with F5: Layered Security Policy for North-South Traffic
In an organization with enough maturity in cybersecurity and modern application architectures, there are typically two different cybersecurity teams that operate the more advanced security policies for the company: NetSecOps and DevSecOps. These teams usually have different security requirements.

NetSecOps requires a 'standardized application security policy'. They aim to block common attacks to the production network with a high level of confidence, resulting in a low false-positive rate, at the network level. The OWASP Top 10 threats are a good example here. Moreover, the responsibility of NetSecOps is not limited to stopping basic attack types like the OWASP Top 10; it also covers more advanced and complicated application-based attacks such as bot attacks, fraud attacks, and DDoS attacks. However, when it comes to the modern-app environment, it is not easy for the NetSecOps team to understand the details of the application traffic flow inside a Kubernetes or OpenShift cluster. For this reason, as far as modern applications are concerned, the security policies of NetSecOps often focus more on compliance and audit purposes.

DevSecOps, on the other hand, wants application-specific security policies for the different types of applications operating inside their Kubernetes or OpenShift clusters. This is possible since DevSecOps understands how their applications work, and they want to apply more optimized security policies to their backend applications.

It can therefore be difficult to achieve both security teams' goals with a single security solution, which is why an enterprise may need to deploy two different WAFs to meet the different requirements of NetSecOps and DevSecOps. This article covers how the two security teams can achieve their goals with two separate WAF (Web Application Firewall) deployments in the network: F5 Advanced WAF for NetSecOps and NGINX App Protect for DevSecOps.

Solution Overview

The solution includes two F5 components: F5 Advanced WAF and NGINX App Protect. From a technological point of view, NGINX App Protect utilizes a subset of F5 Advanced WAF functionality, meaning that their underlying technologies are the same. Each of those WAF components can run different security policies in order to achieve different goals. In F5 Advanced WAF, NetSecOps can apply the WAF policy for the 'coarse-grained' model of security, while DevSecOps adopts the 'fine-grained' model with NGINX App Protect. In other words, F5 Advanced WAF can be configured with a 'negative policy,' and NGINX App Protect can be configured with a 'positive policy.' In our use case, we assumed that NetSecOps wants to block the OWASP Top 10 threats, while DevSecOps has a different file-access policy for each backend application. The brief architecture is depicted below.

Combining F5 Advanced WAF and NGINX App Protect enables layered application security policies to prevent the most complicated and advanced application-based attacks efficiently. This architecture utilizes the following workflow:

1. F5 Advanced WAF blocks the most commonly used attack types, including command injection, SQL injection, cross-site scripting, and server-side request forgery attacks.

2. When an attacker tries to access different files in each application, NGINX App Protect enforces which file types are allowed (or disallowed) in traffic, based on the security policies configured by the DevSecOps team.
3. All alert details from F5 Advanced WAF and NGINX App Protect are sent to Elasticsearch for central monitoring purposes.

Each of the above workflows will be discussed in the following sections.

· This blog doesn't include all the required steps to reproduce the use case in your environment. Please refer to this link for all the required configuration steps.

NGINX App Protect provides 'application-specific' policies

NGINX App Protect can provide security protection and controls at the microservice level inside the Kubernetes or OpenShift cluster. NGINX App Protect can be deployed in the OpenShift cluster as a container image. The NGINX App Protect policy configuration uses a declarative format built on a pre-defined base template. The policy uses the JSON format to represent the policy details. This file can be edited to apply a unique security policy to the NGINX App Protect instance. Once the policy is created, it can be attached in the 'nginx.conf' file by referencing the policy file. In this example, we used the 'nginx_sre.conf' file as the main configuration file for NGINX, and the 'NginxSRELabPolicy.json' file represents the NGINX App Protect policy.

NginxSRELabPolicy.json: |
  {
    "policy": {
      "name": "SRE_DVWA01_POLICY",
      "template": { "name": "POLICY_TEMPLATE_NGINX_BASE" },
      "applicationLanguage": "utf-8",
      "enforcementMode": "blocking",
      "response-pages": [
        {
          "responseContent": "<html><head><title>SRE DevSecOps - DVWA01 - Blocking Page</title></head><body><font color=green size=10>NGINX App Protect Blocking Page - DVWA01 Server</font><br><br>Please consult with your administrator.<br><br>Your support ID is: <%TS.request.ID()%><br><br><a href='javascript:history.back();'>[Go Back]</a></body></html>",
          "responseHeader": "HTTP/1.1 302 OK\\r\\nCache-Control: no-cache\\r\\nPragma: no-cache\\r\\nConnection: close",
          "responseActionType": "custom",
          "responsePageType": "default"
        }
      ],
      "blocking-settings": {
        "violations": [
          {
            "name": "VIOL_FILETYPE",
            "alarm": true,
            "block": true
          }
        ]
      },
      "filetypes": [
        {
          "name": "*",
          "type": "wildcard",
          "allowed": true,
          "checkPostDataLength": false,
          "postDataLength": 4096,
          "checkRequestLength": false,
          "requestLength": 8192,
          "checkUrlLength": true,
          "urlLength": 2048,
          "checkQueryStringLength": true,
          "queryStringLength": 2048,
          "responseCheck": false
        },
        {
          "name": "pdf",
          "allowed": false
        }
      ]
    }
  }
---

The above configuration file shows the NGINX App Protect policy of application #01, where the DevSecOps team wants to disallow access to the 'PDF' file type. For application #02, the policy is configured to reject access to the 'JPG' file type. A 'remote logging' configuration also needs to be applied on NGINX to export the NGINX App Protect alert details. The configuration below shows how we exported the NGINX App Protect logging details to an external device, Elasticsearch.

server {
    listen 8080;
    server_name dvwa02-http;
    proxy_http_version 1.1;

    real_ip_header X-Forwarded-For;
    set_real_ip_from 0.0.0.0/0;

    app_protect_enable on;
    app_protect_security_log_enable on;
    app_protect_policy_file "/etc/nginx/NginxSRELabPolicy.json";
    app_protect_security_log "/etc/app_protect/conf/log_default.json" syslog:server=your_elk_ip_here;

    location / {
        client_max_body_size 0;
        default_type text/html;
        proxy_pass http://dvwa02;
        proxy_set_header Host $host;
    }
}

Preventing OWASP Top 10 threats in F5 Advanced WAF

F5 Advanced WAF is a next-generation WAF solution designed to prevent advanced application-based attacks.
It supports 1,000+ proven application-level signatures, custom signatures, machine-learning-based DDoS prevention, intelligence-based attack mitigation, and behavior-based WAF functions. In this use case, however, we focused on the prevention of the OWASP Top 10 attacks, which is only a small part of the overall attack coverage of F5 Advanced WAF. The important point here is how we can configure F5 Advanced WAF to apply an efficient 'negative security' model. In order to configure the F5 Advanced WAF policy, follow the procedure below:

1. Go to 'Security' -> 'Application Security' -> 'Security Policies' -> 'Create'.

2. Click the security policy that was just created (SRE_DEVSEC_01), then click 'View Learning and Blocking Settings' under the 'Enforcement Mode' menu.

3. Expand 'Attack Signatures' and click the 'Change' menu.

4. Apply the check box, then click 'Close' -> 'Save' -> 'Apply Policy', and apply the policy to the virtual server. (Please make sure that you're on the OCP partition.)

5. Go to 'Local Traffic' -> 'Virtual Servers' -> 'devsecops_http_vs' -> Security -> Policies.

Please note that the 'virtual server' configuration is required on the BIG-IP before proceeding to this step.

Configuring a custom blocking page for F5 Advanced WAF

1. Click the security policy that was created (SRE_DEVSEC_01).

2. Go to 'Response and Blocking page' -> 'Blocking page default' -> 'Custom response' -> 'Response Body' and enter:

<html><head><title>SRE DevSecOps Blocking Page</title></head><body><font color=red size=12>F5 Advanced WAF Blocking Page</font><br><br>Please consult with your administrator.<br><br>Your support ID is: <%TS.request.ID()%><br><br><a href='javascript:history.back();'>[Go Back]</a></body></html>

Simulating the Attack

The following steps show how to simulate the application-based attacks and see how F5 Advanced WAF and NGINX App Protect can protect the applications efficiently.

Preventing OWASP Top 10 attacks - NetSecOps

First, log in to the application through the GUI and go to the 'Command Injection' menu. Type the command '8.8.8.8 | cat /etc/passwd' and click the 'Submit' button. If F5 Advanced WAF works correctly, you should see the 'blocking page' below.

· You can find instructions on how to simulate the other attack types – SQL Injection, SSRF, and XSS – in the GitHub link here.

Restricting file access based on application type - DevSecOps

1. Access application 01 in the browser with the URL "http://your_app_domain.com/hackable/uploads/".

2. When the 'PDF' file is clicked in this directory, the following blocking screen should be shown.
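As a quick command-line check of the same DevSecOps policy (instead of clicking the file in a browser), a request like the one below can be used. The hostname and file name are placeholders; the expected result is the custom 'DVWA01' blocking page defined in NginxSRELabPolicy.json rather than the PDF content, because that policy sets "allowed": false for the pdf file type.

# Placeholder host and file name; adjust to your environment.
# Expect the "NGINX App Protect Blocking Page - DVWA01 Server" response body.
curl -i "http://your_app_domain.com/hackable/uploads/report.pdf"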
Summary

In modern application architectures, security concerns are becoming more serious. The WAF is the major security solution available to enterprise applications. The security policy of the WAF has to protect backend applications correctly, but at the same time it must also ensure legitimate user traffic can access backend resources without creating issues. This sounds straightforward, but it is not easy to configure the right security policies to achieve both goals simultaneously, and in modern application architectures it is even more difficult. Since traditional security teams lack understanding of the application flow inside a Kubernetes or OpenShift environment, it is challenging to apply the required security policies in the WAF to protect the microservices. Due to the nature of microservices, different applications spin up and down frequently, and security requirements also change on a regular basis. The cybersecurity team needs a solution that can fit these unique requirements.

NetSecOps requires a solution with enterprise-level protection features and operational efficiency for their SOC team. F5 Advanced WAF is designed to efficiently prevent known and unknown types of advanced application-based attacks, while NGINX App Protect easily provides 'application-specific' security policies for each application inside the microservice environment. Enterprises can acquire the proper protection for their modern app environment through the combination of F5 Advanced WAF and NGINX App Protect.

Please visit the DevCentral GitHub repo and follow the guidelines to try this use case in your environment.

Protecting Critical Apps against East-West Attacks
In the previous article, we explained how NetSecOps and DevSecOps can manage their application security policies to prevent advanced attacks from external networks. But in advanced persistent hacking, attackers sometimes exploit application vulnerabilities and use advanced malware with phishing emails against the operators. This is an old technique but still valid and utilized by many APT (Advanced Persistent Threat) hacking groups. If advanced hackers obtain a DevOps operator's ID and password using the malware, they can access a Kubernetes or OpenShift cluster through the normal login process and easily bypass advanced WAF (Web Application Firewall) solutions deployed in front of the cluster. Once the attacker has a user ID and password for the Kubernetes or OpenShift cluster, the attacker can also access each application that is running inside the cluster. Since most SecOps teams install only very basic security functions inside the Kubernetes or OpenShift cluster, a hacker who has logged in to the cluster can attack other applications in the same cluster without any security barrier.

F5 Container Ingress Services is not designed to stop this sort of attack within the cluster. To overcome this challenge, we have another tool, NGINX App Protect. NGINX App Protect delivers Layer 7 visibility and granular control for applications while enabling advanced application security policies. With an NGINX App Protect deployment, DevSecOps can ensure only legitimate traffic is allowed while all other unwanted traffic is blocked. NGINX App Protect can monitor the traffic traversing namespace boundaries between pods and provide advanced application protection at Layer 7 for East-West traffic.

Solution Overview

This article covers how NGINX App Protect can protect critical applications in an OpenShift environment against an attack originating within the same cluster. Detecting advanced application attacks inside the cluster is beneficial for the DevSecOps team, but it can increase the complexity of security operations. To provide a certain level of protection for the critical application, the NGINX App Protect instance should be installed as a 'pod proxy' or a 'service proxy' for the application. This means the customer may need multiple NGINX App Protect instances to achieve the required level of protection for their applications. On the face of it, this might seem like a dramatic increase in the complexity of security-related operations.

Security automation is the recommended way to overcome the increased complexity of this security operations challenge. In this use case, we use Red Hat Ansible as our security automation tool. With Red Hat Ansible, the user can automate their incident response process with their existing security solutions. This can dramatically reduce the security team's response time from hours to minutes. We use Ansible and Elasticsearch to provide all the required 'security automation' processes in this demo.

With all these combined technologies, the solution provides WAF protection for the critical applications deployed in the OpenShift cluster. Once it detects an application-based attack from the same cluster subnet, it immediately blocks the attack and deletes the compromised pod with a pre-defined security automation playbook. The workflow is organized as shown below:

1. The malware in a phishing email infects the developer's laptop.
2. The attacker steals the developer's ID and password using the malware.
In this demo, the stolen ID is 'dev_user.'

3. The attacker logs in to the 'Test App' in the 'dev-test01' namespace, owned by 'dev_user'.
4. The attacker starts a network-scanning process on the internal subnet of the OpenShift cluster and finds the 'critical-app' application pod.
5. The attacker starts a web-based attack against 'critical-app'.
6. NGINX App Protect protects 'critical-app'; thus, the attack traffic is blocked immediately.
7. NGINX exports the alert details to the external Elasticsearch.
8. If this specific alert meets a pre-defined condition, Elasticsearch triggers the pre-defined Ansible playbook.
9. The Ansible playbook accesses OpenShift and deletes the compromised 'Test App' pod automatically.

* Since this demo focuses on an attack inside the OpenShift cluster, it does not include steps 1 and 2 (the phishing email).

Understanding the 'security automation' process

Security automation is the key part of this demo, because organizations don't want to respond to each WAF alert manually, one by one. Manual incident-response processes are time-consuming and inefficient, especially in a modern-app environment with hundreds of container-based applications. In this demo, Red Hat Ansible and Elasticsearch handle the security automation. Below is a brief workflow of the security automation in this use case.

· F5 Advanced WAF is deployed in front of the OpenShift cluster and inserts the X-Forwarded-For header value into each session. Since F5 Advanced WAF inserts the X-Forwarded-For header into packets that come from outside, a packet that doesn't include the X-Forwarded-For header is likely coming from the internal network.

· NGINX App Protect is installed as a 'pod proxy' for the critical application we want to protect. Because NGINX App Protect runs as a pod proxy, all traffic must pass through it to reach the 'critical application.' If NGINX App Protect detects any malicious activity, it sends the alert details to the external Elasticsearch system.

· When any new alert comes from NGINX App Protect, Elasticsearch analyzes the details of the alert. If the alert meets the conditions below, Elasticsearch triggers a notification to Logstash:
  · the source IP address of the alert is part of the OpenShift cluster subnet, and
  · the WAF alert severity is Critical.

· Once the Logstash system receives the notification from Elasticsearch, it creates the ip.txt file, which includes the source IP address of the attack, and executes the pre-defined Ansible playbook.

· The Ansible playbook reads the ip.txt file and extracts the IP address from it. Ansible then accesses OpenShift, finds the compromised pod using that source IP address, and deletes the compromised pod and the ip.txt file automatically.

Creating the Ansible Playbook

Red Hat Ansible is an automation tool that enables network and security automation for users with enterprise-ready functions. F5 and Red Hat have a strategic partnership and deliver joint use cases for our customer base. With Ansible integration with F5 solutions, organizations can have single-pane-of-glass management for network and security automation. In this use case, we implement an automated security response process with an Ansible playbook when F5 NGINX App Protect detects malicious activities in the OpenShift cluster. Below is the Ansible playbook that executes the incident response process against the attacker's compromised pod.
ansible_ocp.yaml

---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Login to OCP cluster
      k8s_auth:
        host: https://yourocpdomain:6443
        username: kubeadmin
        password: your_ocp_password
        validate_certs: no
      register: k8s_auth_result

    - name: Extract IP Address
      command: cat /yourpath/ip.txt
      register: badpod_ip

    - name: Extract App Label from OpenShift
      shell: |
        sudo oc get pods -A -o json --field-selector status.podIP={{ badpod_ip.stdout }} | grep "\"app\":" | awk '{print $2}' | sed 's/,//'
      register: app_label

    - name: Delete Malicious Deployments
      shell: |
        sudo oc delete all --selector app={{ app_label.stdout }} -A
      register: delete_pod

    - name: Delete IP and Info File
      command: rm -rf /yourpath/ip.txt

    - name: OCP Service Deletion Completed
      debug:
        msg: "{{ delete_pod.stdout }}"

Configuring the Elasticsearch Watcher and Logstash

To trigger the Ansible playbook for the security automation, SOC analysts need to validate the alert from NGINX App Protect first. Depending on the alert details, the SOC analyst might want to execute a different playbook. For example, if the alert is related to a credential stuffing attack, the SOC analysts may want to block the user's application access. But if the alert is related to a known IP blacklist, the analyst might want to block that IP address in the firewall. To support these requirements, the security team needs a tool that can monitor the security alerts and trigger the required actions based on them. Elasticsearch Watcher is a feature of the commercial version of Elasticsearch that users can use to create actions based on conditions, which are periodically evaluated using queries on the data.

1. Configuring the Watcher in Kibana

* You need an Elastic Platinum license or an Eval license to use this feature in Kibana.
* Go to the Kibana UI.
* Management -> Watcher -> Create -> Create advanced watcher
* Copy and paste the JSON code below.

watcher_ocp.json

{
  "trigger": {
    "schedule": { "interval": "1m" }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [ "nginx-*" ],
        "rest_total_hits_as_int": true,
        "body": {
          "query": {
            "bool": {
              "must": [
                { "match": { "outcome_reason": "SECURITY_WAF_VIOLATION" } },
                { "match": { "x_forwarded_for_header_value": "N/A" } },
                { "range": { "@timestamp": { "gte": "now-1h", "lte": "now" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "logstash_logging": {
      "webhook": {
        "scheme": "http",
        "host": "localhost",
        "port": 1234,
        "method": "post",
        "path": "/{{watch_id}}",
        "params": {},
        "headers": {},
        "body": "{{ctx.payload.hits.hits.0._source.ip_client}}"
      }
    },
    "logstash_exec": {
      "webhook": {
        "scheme": "http",
        "host": "localhost",
        "port": 9001,
        "method": "post",
        "path": "/{{watch_id}}",
        "params": {},
        "headers": {},
        "body": "{{ctx.payload.hits.hits[0].total}}"
      }
    }
  }
}

2. Configuring the 'logstash.conf' file

Below is the final version of the 'logstash.conf' file.
Please note that you have to start Logstash with 'sudo' privilege.

logstash.conf

input {
  syslog {
    port => 5003
    type => nginx
  }
  http {
    port => 1234
    type => watcher1
  }
  http {
    port => 9001
    type => ansible1
  }
}

filter {
  if [type] == "nginx" {
    grok {
      match => {
        "message" => [
          ",attack_type=\"%{DATA:attack_type}\"",
          ",blocking_exception_reason=\"%{DATA:blocking_exception_reason}\"",
          ",date_time=\"%{DATA:date_time}\"",
          ",dest_port=\"%{DATA:dest_port}\"",
          ",ip_client=\"%{DATA:ip_client}\"",
          ",is_truncated=\"%{DATA:is_truncated}\"",
          ",method=\"%{DATA:method}\"",
          ",policy_name=\"%{DATA:policy_name}\"",
          ",protocol=\"%{DATA:protocol}\"",
          ",request_status=\"%{DATA:request_status}\"",
          ",response_code=\"%{DATA:response_code}\"",
          ",severity=\"%{DATA:severity}\"",
          ",sig_cves=\"%{DATA:sig_cves}\"",
          ",sig_ids=\"%{DATA:sig_ids}\"",
          ",sig_names=\"%{DATA:sig_names}\"",
          ",sig_set_names=\"%{DATA:sig_set_names}\"",
          ",src_port=\"%{DATA:src_port}\"",
          ",sub_violations=\"%{DATA:sub_violations}\"",
          ",support_id=\"%{DATA:support_id}\"",
          ",unit_hostname=\"%{DATA:unit_hostname}\"",
          ",uri=\"%{DATA:uri}\"",
          ",violation_rating=\"%{DATA:violation_rating}\"",
          ",vs_name=\"%{DATA:vs_name}\"",
          ",x_forwarded_for_header_value=\"%{DATA:x_forwarded_for_header_value}\"",
          ",outcome=\"%{DATA:outcome}\"",
          ",outcome_reason=\"%{DATA:outcome_reason}\"",
          ",violations=\"%{DATA:violations}\"",
          ",violation_details=\"%{DATA:violation_details}\"",
          ",request=\"%{DATA:request}\""
        ]
      }
      break_on_match => false
    }
    mutate {
      split => { "attack_type" => "," }
      split => { "sig_ids" => "," }
      split => { "sig_names" => "," }
      split => { "sig_cves" => "," }
      split => { "sig_set_names" => "," }
      split => { "threat_campaign_names" => "," }
      split => { "violations" => "," }
      split => { "sub_violations" => "," }
      remove_field => [ "date_time", "message" ]
    }
    if [x_forwarded_for_header_value] != "N/A" {
      mutate { add_field => { "source_host" => "%{x_forwarded_for_header_value}" } }
    } else {
      mutate { add_field => { "source_host" => "%{ip_client}" } }
    }
    geoip {
      source => "source_host"
      database => "/etc/logstash/GeoLite2-City.mmdb"
    }
  }
}

output {
  if [type] == 'nginx' {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
  }
  if [type] == 'watcher1' {
    file {
      path => "/yourpath/ip.txt"
      codec => line { format => "%{message}" }
    }
  }
  if [type] == 'ansible1' {
    exec {
      command => "ansible-playbook /yourpath/ansible_ocp.yaml"
    }
  }
}

Simulate the demo

You should start the Kibana Watcher and Logstash services first before proceeding with this step.

Kubeadmin Console

Please make sure you're logged in to the OCP cluster using a cluster-admin account, and confirm the 'critical-app' is running correctly.

j.lee$ oc whoami
kube:admin
j.lee$
j.lee$ oc get projects
NAME                                  DISPLAY NAME   STATUS
critical-app                                         Active
default                                              Active
dev-test02                                           Active
kube-node-lease                                      Active
kube-public                                          Active
kube-system                                          Active
openshift                                            Active
openshift-apiserver                                  Active
openshift-apiserver-operator                         Active
openshift-authentication                             Active
openshift-authentication-operator                    Active
openshift-cloud-credential-operator                  Active

j.lee$ oc get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
critical-app-v1-5c6546765f-wjhl9   2/2     Running   1          85m   10.129.2.71   ip-10-0-180-68.ap-southeast-1.compute.internal   <none>           <none>
j.lee$

1. dev_user Console

Please make sure you're logged in to the OCP cluster using the 'dev_user' account on the compromised pod, and confirm the 'dev-test-app' is running correctly.
PS C:\Users\ljwca\Documents\ocp> oc whoami
dev_user
PS C:\Users\ljwca\Documents\ocp>
PS C:\Users\ljwca\Documents\ocp> oc get projects
NAME         DISPLAY NAME   STATUS
dev-test02                  Active
PS C:\Users\ljwca\Documents\ocp>
PS C:\Users\ljwca\Documents\ocp> oc get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE                                              NOMINATED NODE   READINESS GATES
dev-test-v1-674f467644-t94dc   1/1     Running   0          6s    10.128.2.38   ip-10-0-155-159.ap-southeast-1.compute.internal   <none>           <none>

2. Login to the 'dev-test' container using the remote shell command of the OCP

PS C:\Users\ljwca\Documents\ocp> oc rsh dev-test-v1-674f467644-t94dc
$
$ uname -a
Linux dev-test-v1-674f467644-t94dc 4.18.0-193.14.3.el8_2.x86_64 #1 SMP Mon Jul 20 15:02:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

3. Network scanning

This step takes 1~2 hours to complete all scanning.

$ nmap -sP 10.128.0.0/14
Starting Nmap 7.80 ( https://nmap.org ) at 2020-09-29 17:20 UTC
Nmap scan report for ip-10-128-0-1.ap-southeast-1.compute.internal (10.128.0.1)
Host is up (0.0025s latency).
Nmap scan report for ip-10-128-0-2.ap-southeast-1.compute.internal (10.128.0.2)
Host is up (0.0024s latency).
Nmap scan report for 10-128-0-3.metrics.openshift-authentication-operator.svc.cluster.local (10.128.0.3)
Host is up (0.0023s latency).
Nmap scan report for 10-128-0-4.metrics.openshift-kube-scheduler-operator.svc.cluster.local (10.128.0.4)
Host is up (0.0027s latency).
.
.
.

After completion of the scanning, you will be able to find the 'critical-app' on the list.

4. Application Scanning for the target

You can find the open service ports on the target using nmap.

$ nmap 10.129.2.71
Starting Nmap 7.80 ( https://nmap.org ) at 2020-09-29 17:23 UTC
Nmap scan report for 10-129-2-71.critical-app.critical-app.svc.cluster.local (10.129.2.71)
Host is up (0.0012s latency).
Not shown: 998 closed ports
PORT     STATE SERVICE
80/tcp   open  http
8888/tcp open  sun-answerbook

Nmap done: 1 IP address (1 host up) scanned in 0.12 seconds
$

But you will see the 403 error when you try to access the server using port 80. This happens because the default Apache access control only allows the traffic from the NGINX App Protect.

$ curl http://10.129.2.71/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access this resource.</p>
<hr>
<address>Apache/2.4.46 (Debian) Server at 10.129.2.71 Port 80</address>
</body></html>
$

Now, you can see the response through port 8888.

$ curl http://10.129.2.71:8888/
<html>
<head>
<title> Network Operation Utility - NSLOOKUP </title>
</head>
<body>
<font color=blue size=12>NSLOOKUP TOOL</font><br><br>
<h2>Please type the domain name into the below box.</h2>
<h1>
<form action="/index.php" method="POST">
<p>
<label for="target">DNS lookup:</label>
<input type="text" id="target" name="target" value="www.f5.com">
<button type="submit" name="form" value="submit">Lookup</button>
</p>
</form>
</h1>
<font color=red>This site is vulnerable to Web Exploit. Please use this site as a test purpose only.</font>
</body>
</html>
$

5. Performing the Command Injection attack

$ curl -d "target=www.f5.com|cat /etc/passwd&form=submit" -X POST http://10.129.2.71:8888/index.php
<html><head><title>SRE DevSecOps - East-West Attack Blocking</title></head><body><font color=green size=10>NGINX App Protect Blocking Page</font><br><br>Please consult with your administrator.<br><br>Your support ID is: 878077205548544462<br><br><a href='javascript:history.back();'>[Go Back]</a></body></html>$
$
6. Verify the logs in the Kibana dashboard

You should be able to see the NGINX App Protect alerts in your Elasticsearch/Kibana dashboard.

7. Verify that Ansible terminates the compromised pod

Ansible deletes the compromised pod.

Summary

Today's cyber-based threats are getting more and more sophisticated. Attackers keep attempting to find the weakest link in a company's infrastructure and then move from there to the data in the company using that link. In most cases, the weakest link in the organization is the human, and the company stores its critical data in applications. This is why attackers use phishing emails to compromise a user's laptop and leverage it to access applications. While F5 is working very closely with key alliance partners such as Cisco and FireEye to stop advanced malware at the first stage, our NGINX App Protect can work as another layer of defence for the application to protect the organization's data.

F5, Red Hat, and Elastic have developed this new protection mechanism as an automated process. This use case allows the DevSecOps team to easily deploy an advanced security layer in their OpenShift cluster. If you want to learn more about this use case, please visit the F5 Business Development official GitHub link here.

Adopting SRE practices with F5: Targeted Canary deployment
In the last article, we covered blue-green deployment in depth. Another approach to promote availability for SRE SLOs is the canary deployment. In some cases, swapping out the entire deployment via a blue-green environment may not be desired. In a canary deployment, you upgrade an application on a subset of the infrastructure and allow a limited set of users to access the new version. This approach allows you to test the new software under a production load for a limited set of user connections, evaluate how well it meets users' needs, and assess whether new features are functioning as designed.

This article is focused on how we can use F5 technologies (BIG-IP and NGINX Plus) to implement the canary deployment in an OpenShift environment.

Solution Overview

The solution combines F5 Container Ingress Services (CIS) with NGINX Plus for a microservice environment. The BIG-IP provides comprehensive L4-7 security services for N-S traffic into, out of, and between OpenShift clusters, while leveraging NGINX Plus as a micro-gateway to manage and secure (E-W) traffic inside the cluster. This architecture is depicted below.

Stitching the technologies together, this architecture enables the targeted canary use case. In the "targeted" model, we take canary deployment one step further by routing users to different application versions based on user identification, or their respective risk tolerance levels. It utilizes the following workflow:

1. The BIG-IP Access Policy Manager (APM) authenticates each user before their traffic enters the OpenShift cluster.
2. BIG-IP identifies users belonging to the ring 1, 2, or 3 user groups, and injects a group-based identifier into the HTTP header via a URI value.
3. The above user identification is passed on to the NGINX Plus micro-gateway, which directs users to the correct microservice versions.

Each of the above components will be discussed with implementation details in the following sections.

APM provides user authentication

BIG-IP APM is in the N-S traffic flow to authenticate and identify users before their network traffic enters the cluster. To achieve this, we need to:

· Create an APM policy as shown below.
· Attach the above policy to the HTTPS virtual server (manually, or using AS3 override).

Note that in our demonstration, we simplified the deployment with two user groups: 1) a user group "Test1" for ring 1, representing early adopters who voluntarily preview releases; and 2) a user group "User1" for ring 2, representing users who consume the applications after they have passed through the early adopters. We could follow the above steps to configure three rings as needed.

We use the AS3 override function of CIS to attach the APM policy, so that CIS remains the source of truth. The AS3 override functionality allows us to alter the existing BIG-IP configuration using AS3 with a user-defined configmap, without affecting the existing Kubernetes resources. In order to do so, we need to add a new argument to the CIS deployment file.
Add the following argument to the CIS deployment to enable the AS3 override functionality:

--override-as3-declaration=<namespace>/<user_defined_configmap_name>

An example of a user-defined configmap to attach the APM policy to the HTTPS virtual server (created by an OpenShift route) is shown below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: f5-override-as3-declaration
  namespace: default
data:
  template: |
    {
      "declaration": {
        "openshift_AS3": {
          "Shared": {
            "bookinfo_https_dc1": {
              "policyIAM": {
                "bigip": "/Common/bookinfo"
              }
            }
          }
        }
      }
    }

Next, we run the following command to create the configmap:

oc create -f f5-override-as3-declaration.yaml

Note: Restart the CIS deployment after deploying the configmap.

When a user tries to access the Bookinfo application, it will now first be authenticated with BIG-IP APM.

BIG-IP injects user identification into the HTTP header

After the user is authenticated, BIG-IP creates a user identification and passes it on to the NGINX Plus micro-gateway in order to direct users to the correct microservice version. It does so by mapping the user to a group and injecting the HTTP header with a URI value (http_x_request_id).

Steps to configure the BIG-IP:

· Create a policy with the header-injection rule (an illustrative sketch is included at the end of this section).
· Attach the policy to the HTTPS virtual server (manually, or using AS3 override).

NGINX Plus steers traffic to different versions

NGINX Plus running inside the OpenShift cluster will extract the user information from the HTTP header http_x_request_id and steer traffic to the different versions of the Bookinfo review page accordingly. In the example below, we used a configmap to configure the NGINX Plus pod that acts as the reverse proxy for the review services.

##################################################################################################
# Configmap Review Services
##################################################################################################
apiVersion: v1
kind: ConfigMap
metadata:
  name: bookinfo-review-conf
data:
  review.conf: |-
    log_format elk_format_review 'time=[$time_local] client_ip=$remote_addr virtual=$server_name client_port=$remote_port xff_ip=$remote_addr lb_server=$upstream_addr http_host=$host http_method=$request_method http_request_uri=$request_uri status_code=$status content_type="$sent_http_content_type" content_length="$sent_http_content_length" response_time=$request_time referer="$http_referer" http_user_agent="$http_user_agent" x-request-id=$myid ';

    upstream reviewApp {
        server reviews-v1:9080;
    }

    upstream reviewApp_test {
        server reviews-v1:9080;
        server reviews-v2:9080;
        server reviews-v3:9080;
    }

    # map to different upstream backends based on header
    map $http_x_request_id $pool {
        ~*test.*   "reviewApp_test";
        default    "reviewApp";
    }

    server {
        listen 5000;
        server_name review;
        #error_log /var/log/nginx/internalApp.error.log info;
        access_log syslog:server=10.69.33.1:8516 elk_format_review;
        #access_log /var/tmp/nginx-access.log elk_format_review;

        set $myid $http_x_request_id;
        if ($http_x_request_id ~* "(\w+)-(\w+)" ) {
            set $myid $2;
        }

        location / {
            proxy_pass http://$pool;
        }
    }

NGINX Plus will direct the user traffic to the right version of the services:

· If it is "User1" or a normal user, it will be forwarded to "Ring 1" for the old version of the application.
· If it is "Test1" or an early adopter, it will be forwarded to "Ring 2" for the newer version of the same application.
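For reference, the header-injection rule mentioned in the 'BIG-IP injects user identification into the HTTP header' section above can also be expressed as an iRule-style sketch. This is purely an illustration: the APM session variable, the user name 'test1', and the header value format are assumptions, not the exact policy rule used in this setup.

when HTTP_REQUEST {
    # Look up the authenticated user from the APM session established earlier
    # (assumes the APM access policy is attached to this virtual server).
    set user [string tolower [ACCESS::session data get "session.logon.last.username"]]

    # Early adopters (ring 2 in this demo) get a "test-" prefixed identifier, which
    # the NGINX Plus "map $http_x_request_id $pool" block matches with ~*test.* and
    # routes to the reviewApp_test upstream; everyone else stays on reviewApp.
    if { $user equals "test1" } {
        HTTP::header insert "x-request-id" "test-[clock clicks]"
    } else {
        HTTP::header insert "x-request-id" "user-[clock clicks]"
    }
}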
Summary

Today's enterprises increasingly rely on different expertise and skillsets, like DevOps, DevSecOps, and app developers, to work with NetOps teams to manage the sprawling application services that are driving their accelerated digital transformation. Combining BIG-IP and NGINX Plus, this architecture uniquely gives SRE the flexibility to adapt to the changing conditions of their application environments. It means we can deliver services that meet the needs of a broader set of application stakeholders. We may use BIG-IP to define global service control and security for NetOps or SecOps, while using NGINX Plus to extend controls for more granular and application-specific security to DevOps or app developers.

So, go ahead: go to the DevCentral GitHub repo, download the source code behind our technologies, and follow the guide to try it out in your environment.

Adopting SRE practices with F5: Multi-cluster Blue-green deployment
In the last article, we covered blue-green deployment at a high level as the most straightforward SRE deployment model. Here we are diving deeper into the details to see how F5 technologies enable this use case. Let's start off by looking at some of the key components.

F5 DNS Load Balancer Cloud Service (GSLB)

The first component of the solution is F5 Cloud Services. The DNS Load Balancer provides GSLB as a cloud-hosted SaaS service with built-in DDoS protection and an API-first approach. A blue-green deployment aims to minimize downtime due to app deployment, and there are some basic routing mechanisms out of the box with OpenShift that assist in this area. However, if we are looking for a swift routing switch with more flexibility and reliability across different OpenShift clusters, different clouds, or geo locations, this is when F5 DNS Load Balancer Cloud Service comes into the picture.

Setting up DNS for F5 Cloud Services

This solution requires that your corporate DNS server delegates a DNS zone (aka subdomain) to the F5 DNS Load Balancer Cloud Service. An OpenShift cluster typically has its own domain created for the applications, for example: *.apps.<cluster name>.example.com. The end user, however, doesn't really use such a long name and instead queries for www.example.com. A CNAME record is often used to map one domain name (an alias) to another (the true domain name). All set up, this is the DNS scenario:

In case the customer has more than one cluster, this requires one CNAME record per cluster, with requests load balanced among clusters. The drawbacks of this type of solution include:

· No comprehensive health checking and monitoring
· Unable to switch workloads across clusters at speed
· Lack of automation and integration with the OpenShift cluster

F5 Cloud Services provides these features in a multi-cluster and multi-cloud infrastructure around the globe with the ease of a SaaS solution, without the need for infrastructure modifications. You will set up your corporate DNS to use F5 DNS Load Balancer Cloud Service as follows. Here is a sample configuration for a cloud/corporate DNS (an illustrative sketch is included at the end of this section):

You can register an F5 Cloud Services account, and then subscribe to the DNS Load Balancer service here: F5 Cloud Services

F5 GSLB tool for Ansible Automation

The blue-green deployment represents a sequence of steps to roll out your new application. The GSLB tool was developed to provide a common automation plane for both OpenShift and F5 Cloud Services. Leveraging the declarative APIs from F5 DNS Load Balancer Cloud Service and OpenShift, we used Ansible to automate the process. It enables you to standardize and automate release and deployment by orchestrating the build, test, provisioning, configuration management, and deployment tools in your Continuous Delivery pipeline.

More specifically, the GSLB tool automates your interaction with:

· the OpenShift/K8s deployments
  · Retrieve Layer 7 routes from a given project/namespace and OpenShift cluster
  · Copy Layer 7 routes of a given project/namespace from one OpenShift cluster to another
· F5 DNS Load Balancer Service
  · Create GSLB Load Balanced Records (LBRs) along with the needed pieces (monitors, IP endpoints, pools, etc.)
  · Set the GSLB ratio for each deployment for a given project/namespace

The benefits of using the GSLB tool to automate the entire process:

· Improve speed and scale, especially with hundreds of OpenShift routes
· Eliminate room for human error
· Achieve deterministic and repeatable outcomes

I want to give credit to my colleague, Ulises Alonso Camaro, who developed the GSLB tool. Please refer to the GitHub repo for details of the GSLB tool, and the wiki for how to set up and operate the tool.
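Before moving on to the deployment walk-through, here is a rough, BIND-style illustration of the corporate DNS setup described in 'Setting up DNS for F5 Cloud Services' above. The name server hostnames are placeholders (the actual values are provided when you subscribe to the DNS Load Balancer service), and the zone names simply follow the example.com naming used in this article.

; Delegate a subdomain of the corporate zone to the F5 DNS Load Balancer Cloud Service
; (placeholder name servers shown; use the ones assigned to your subscription).
lb.example.com.    IN  NS     ns1.example-f5cloudservices.net.
lb.example.com.    IN  NS     ns2.example-f5cloudservices.net.

; Point the user-facing name at the load-balanced record in the delegated zone,
; instead of CNAME-ing it directly to a single cluster router such as
; myapp.apps.cluster1.example.com.
www.example.com.   IN  CNAME  www.lb.example.com.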
Build and Run the Blue-green Deployment

Now we can look at how to use the F5 DNS Load Balancer Service and the GSLB tool to canary test the new version and manipulate traffic routing for a blue-green deployment. In a blue-green deployment, we run two versions of the application simultaneously in two identical production environments called Blue (OpenShift Cluster 1) and Green (the new OpenShift Cluster 2).

Step 1. Retrieve routes from the Blue cluster and push them to F5 DNS Load Balancer Cloud Service

Once you have installed the GSLB tool and configured the deployment settings for your infrastructure, the first set of commands to run is:

./project-retrieve default aws1 && ./gslb-commit "publish routes from Blue cluster to F5 DNS load balancer"

These commands retrieve the OpenShift route(s) from your Blue cluster aws1, and then publish the retrieved routes into F5 DNS Load Balancer Cloud Service.

Step 2. Retrieve routes from the Green cluster and push them to F5 DNS Load Balancer Cloud Service

Next, run:

./project-retrieve default aws2 && ./gslb-commit "publish routes from Green cluster to F5 DNS load balancer"

These commands retrieve the OpenShift route(s) from your Green cluster aws2 and push them into the F5 DNS Load Balancer Cloud Service configuration.

Step 3. Canary test the Green deployment

Next, run:

./project-ratios default '{"aws1": "90", "aws2": "10"}' && ./gslb-commit "canary testing blue and green clusters"

These commands set the traffic ratio for the Blue (90%) and Green (10%) deployments and publish the configuration. F5 DNS Load Balancer Cloud Service then sets the traffic ratio for each endpoint accordingly.

Step 4. Switch traffic to Green

After the testing succeeds, it is time to switch production traffic to the Green cluster:

./project-evacuate default aws1 && ./gslb-commit "switch all traffic to green cluster"

These commands switch the traffic completely from the Blue to the Green deployment.
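At any point in this walkthrough you can sanity-check what the DNS Load Balancer is actually answering. The rough sketch below resolves the published name repeatedly and counts the distinct answers, which should approximate the 90/10 split configured in Step 3. The hostname is the example used earlier; note that resolver caching and record TTLs can skew the observed distribution, so treat this as a quick check rather than a measurement.

    # Resolve the GSLB-managed name 100 times and count which endpoint answers
    for i in $(seq 1 100); do
      dig www.example.com A +short | tail -n 1
    done | sort | uniq -c | sort -rn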
More Architectural Patterns

There are many related patterns for blue-green deployment, each of which offers a different focus for an automated production deployment. Some example variants include:

·Infrastructure as Code (IaC): In this variant of the pattern, the release deployment target environment does not exist until it is created by the DevOps pipeline. Post deployment, the original 'blue' environment is scheduled for destruction once the 'green' environment is considered stable in production.
·Container-based deployment: In this variant of the pattern, the release deployment target is represented as a collection of one or more containers. Post release, once the 'green' environment is considered stable in production, the containers in the 'blue' container group are scheduled for destruction.

Our solution can address all blue-green deployment variants: the resources used in the blue and green environments can be created or destroyed as needed, or they can be geographically distributed.

While Continuous Deployment (CD) is a natural fit for blue-green deployment, F5 DNS Load Balancer Cloud Service combined with the GSLB tool enables many possibilities and supports a collection of architecture patterns, including:

·Migrate an application from a source cluster (OCP 3.x) to a destination cluster (OCP 4.x); refer here for details
·Migrate a workload from a Kubernetes cluster to an OpenShift cluster
·Modernize your application deployment with lift and shift
·Repackage your application running as a set of VMs into containers, and deploy them into an OpenShift or Kubernetes cluster
·Build the process into a CI/CD pipeline so that any future changes to the application are built and deployed automatically (a minimal pipeline-stage sketch follows this list)
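The sketch below illustrates what such a pipeline stage might look like when it simply wraps the GSLB tool commands shown earlier. It is a minimal example, not part of the GSLB tool itself: the script name, the placement of smoke tests, and the error handling are assumptions, while the project name (default) and cluster names (aws1, aws2) follow the walkthrough above.

    #!/usr/bin/env bash
    # deploy-stage.sh - illustrative CI/CD stage wrapping the GSLB tool commands
    set -euo pipefail

    PROJECT="default"
    BLUE="aws1"
    GREEN="aws2"

    # Publish routes from both clusters to the F5 DNS Load Balancer Cloud Service
    ./project-retrieve "$PROJECT" "$BLUE"  && ./gslb-commit "pipeline: publish routes from $BLUE"
    ./project-retrieve "$PROJECT" "$GREEN" && ./gslb-commit "pipeline: publish routes from $GREEN"

    # Start with a conservative canary split
    ./project-ratios "$PROJECT" "{\"$BLUE\": \"90\", \"$GREEN\": \"10\"}" && ./gslb-commit "pipeline: canary 90/10"

    # ... run automated smoke tests against the green environment here ...

    # Shift all traffic to the green cluster once the tests pass
    ./project-evacuate "$PROJECT" "$BLUE" && ./gslb-commit "pipeline: switch all traffic to $GREEN"

Rolling back amounts to shifting the traffic ratio back toward the blue cluster before the evacuate step is committed.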
We are continuously working on more usage patterns and will explore them in more detail in future blog posts.

What's next?

So, go ahead to the DevCentral GitHub repo, download the source code behind our technologies, and follow the guide to test it out in your own environment.

Adopting Site Reliability Engineering with F5

Foreword

The role of the Site Reliability Engineer (SRE) is common in cloud-first enterprises and is becoming more widespread in traditional IT teams. Here, we would like to kick off this article series by looking at the concepts that give SRE shape, outlining the primary tools and best practices that make it possible, and exploring some common use cases around Continuous Deployment (CD) strategy, visibility, and security.

While SRE and DevOps share many areas of commonality, there are significant differences between them. DevOps is a loose set of practices, guidelines, and culture designed to break down silos in Development, IT Operations, Network, and Security teams. DevOps does not tell you how to run operations at a detailed level. On the other hand, SRE, a term pioneered by Google, brings an opinionated framework to the problem of how to run operations effectively. If you think of DevOps as a philosophy, you can argue that SRE implements some of the philosophy that DevOps describes. In a way, SRE implements DevOps practices. After all, SRE only works at all if we have tools and technologies to enable it.

Balancing Release Velocity and Reliability

SRE aims to find the balance between feature velocity and reliability, which are often treated as opposing goals. Despite the risk of making changes to software, these changes are necessary for the business to succeed. Instead of advocating against change, SRE uses the concept of Service Level Objectives (SLOs) and error budgets to measure the impact of releases on reliability. The goal is to ship software as quickly as possible while meeting the reliability targets the users expect. While there is a wide range of ways an SRE-focused IT team might optimize the balance between agility and stability, two deployment models stand out for their widespread applicability and general ease of execution.

Blue-green deployment

For SRE, availability is currently the most common SLO. If delivering new software to your users with uninterrupted access is truly required, there needs to be engineering work to implement load balancing or fractional release measures, like blue-green or canary deployments, to minimize any downtime. Recovery is a factor too.

The idea behind blue-green deployment is that your blue environment is your existing production environment carrying live traffic. In parallel, you provision a green environment, which is identical to the blue environment other than the new version of your code. As you prepare a new version of your software, deployment and the final stage of testing take place in the environment that is not live: in this example, Green (the new OpenShift cluster). When it's time to deploy, you route production traffic from the blue environment to the green environment. This technique can eliminate downtime due to app deployment. In addition, blue-green deployment reduces risk: if something unexpected happens with your new version on Green, you can immediately roll back by reverting traffic to the original blue environment.

When you need to manipulate traffic with more flexibility and reliability across different clusters, different clouds, or geo locations, this is where F5 DNS Load Balancer Cloud Service comes into the picture. F5 Cloud Services GSLB is a SaaS offering. It can provide automatic failover, load balancing across multiple locations, increased reliability by avoiding a single point of failure, and increased performance by directing traffic to the optimal site. This allows SRE to move fast while still maintaining enterprise-grade reliability.
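Before routing production traffic to the green environment, it is worth exercising the new cluster directly through its own route hostname rather than the GSLB-managed name. A minimal sketch is shown below; the hostname follows the *.apps.<cluster name>.example.com pattern mentioned in the previous article, and the /healthz path is a placeholder for whatever health or version endpoint your application exposes.

    # Placeholder hostname: smoke-test the green cluster directly, bypassing the GSLB-managed name
    GREEN_ROUTE="https://www.apps.cluster2.example.com"

    # Expect an HTTP 200 from the new version before shifting any user traffic
    curl -sk -o /dev/null -w "%{http_code}\n" "$GREEN_ROUTE/healthz"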
Targeted Canary deployment

Another approach to promoting availability for the SRE SLO is canary deployment. In some cases, swapping out the entire deployment via a blue-green environment may not be desired. In a canary deployment, you upgrade an application on a subset of the infrastructure and allow a limited set of users to access the new version. This approach allows you to test the new software under a production-like load, evaluate how well it meets users' needs, and assess whether new features are profitable.

One approach often used by Azure DevOps is the ring deployment model. Users fall into three general buckets based on their respective risk profiles:

·Ring 1 - Canaries who voluntarily test bleeding-edge features as soon as they are available.
·Ring 2 - Early adopters who voluntarily preview releases, considered more refined than the canary bits.
·Ring 3 - Users who consume the products, after they have passed through the canaries and early adopters.

Developers can promote and target new versions of the same application (versions 1.2, 1.1, and 1.0) to targeted users (rings 1, 2, and 3) respectively, without involving or waiting for the infrastructure operations team (NoOps). To identify the user for the right version, you may choose to simply use the IP address, authenticate directly at the backend, or add an authentication layer in front of the backend. F5 technologies can help enable this targeted canary use case:

·BIG-IP APM at the North-South entry point authenticates and identifies users as ring 1, 2, or 3, and injects the user identification into an HTTP header.
·This identification is passed on to the NGINX Plus micro-gateway, which directs users to the correct microservice versions.

Combining BIG-IP and NGINX, this architecture uniquely gives SRE the flexibility to adapt, with the ability to define baseline service control and security (for NetOps or SecOps) while extending more granular and enhanced security controls to the developer team (for DevOps).

The need for observability

For SRE, monitoring is at the heart of implementing SLOs practically. You can't understand what you can't see. A classic and common approach to monitoring is to watch for a specific value or condition, and then trigger an alert when that value is exceeded or that condition occurs. One valid monitoring output is logging, which is recorded for diagnostic or forensic purposes. The ELK stack, a collection of three open-source projects, namely Elasticsearch, Logstash, and Kibana, provides IT project stakeholders with multi-system and multi-application log aggregation and analysis. ELK can be utilized for the analysis and visualization of application metrics through a centralized dashboard.

With general visibility in place, tracking can be enabled to add a level of specificity to what is being observed. Taking advantage of an iRule on BIG-IP, NetOps can generate a UUID and insert it into the HTTP header of every HTTP request arriving at BIG-IP. All traffic access logs containing UUIDs, from both BIG-IP and NGINX, are sent to the ELK server for validation of information such as user location, response time by user location, and overall response time. Through the dashboard, end users can easily correlate North-South traffic (processed by BIG-IP) with East-West traffic (processed by NGINX Plus inside the cluster), for end-to-end performance visibility. In turn, tracking performance metrics opens up the possibility of defining service level objectives (SLOs).
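As an illustration of that correlation, the sketch below queries Elasticsearch for all access-log entries carrying a given request UUID, so that the BIG-IP (North-South) and NGINX (East-West) records for the same request can be viewed side by side. The index name, field names, and UUID value are assumptions for illustration; your log pipeline will define its own mapping.

    # Index pattern, field name, and UUID below are illustrative placeholders
    # Look up every access-log record that carries the same request UUID
    curl -s "http://localhost:9200/access-logs-*/_search?pretty" \
      -H 'Content-Type: application/json' \
      -d '{
            "query": { "term": { "x_request_uuid.keyword": "3f2a9c1e-0000-0000-0000-000000000000" } },
            "sort":  [ { "@timestamp": "asc" } ]
          }'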
With observability, security is possible

Security incidents will always occur, so it's essential to integrate security into observability. What's most important is giving reliability engineers the tools they need to identify a security problem, work around it, and fix it as quickly as possible. Using the right set of tools, you can build custom, auto-generated dashboards and tooling that expose the collected information to engineers in a way that makes it much easier to sort through everything and determine the root cause of a security problem. These include tools like the Kibana dashboard, which allows engineers to investigate an incident, apply filters, and quickly pinpoint suspicious traffic and its source.

In concert with F5 Advanced WAF and NGINX App Protect, SRE can protect applications against software vulnerabilities and common attacks from both inside and outside microservice clusters. When BIG-IP Advanced WAF or NGINX App Protect detects suspicious traffic, it sends an alert with details to the ELK stack, which indexes and processes the data and then executes a pre-defined Ansible playbook to enforce security policy in Kubernetes or NGINX App Protect for immediate remediation. SRE does not only identify the anomalies but also rectifies them by enacting security policy enforcement along the data path: detect once, protect everywhere.

What's next?

This serves as the introduction and first article of this SRE article series. In the coming articles, we will dive deep into each of the use cases to showcase the technical details of how we are leveraging F5 technologies and capabilities to help SRE bring together DevOps, NetOps, and SecOps to develop the safeguards and implement the best practices. To learn more about developing a business case for SRE in your organization, please reach out to an F5 Business Development representative. For technical details and additional information, see this DevCentral GitHub repo.