Adopting SRE practices with F5: Observability and beyond with ELK Stack
This article is a joint collaboration between Eric Ji and JC Kwon. Getting started In the previous article, we explained SRE (Site Reliability Engineering) and how F5 helps SRE deploy and secure modern applications. We already talked observability is essential for SRE to implement SLOs. Meanwhile, we havea wide range of monitoring tools and analytic applications, each assigned to special devices or runningonly for certain applications. In this article, we will explore one of the most commonly utilized logging tools, or the ELK stack. The ELK stack is a collection of three open-source projects, namely Elasticsearch, Logstash, and Kibana. It provides IT project stakeholders the capabilities of multi-system and multi-application log aggregation and analysis.Besides, the ELK stack provides data visualization at stakeholders' fingertips, which is essential for security analytics, system monitoring, and troubleshooting. A brief description of the three projects: Elasticsearch is an open-source, full-text analysis, and search engine. Logstash is a log aggregator that executes transformations on data derived from various input sources, before transferring it to output destinations. Kibana provides data analysis and visualization capabilities for end-users, complementary to Elasticsearch. In this article, the ELK is utilized to analyze and visualize application performance through a centralized dashboard. A dashboard enables end-users to easily correlate North-South traffic with East-West traffic, for end-to-end performance visibility. Overview This use case is built on top of targeted canary deployment. As shown in the diagram below, we are taking advantage of the iRule on BIG-IP, generated a UUID is and inserted it into the HTTP header for every HTTP request packet arriving at BIG-IP. All traffic access logs will contain the UUIDs when they are sent to the ELK server, for validation of information, like user location, the response time by user location, response time of BIG-IP and NGINX plus, etc. Setup and Configuration 1.Create HSL pool, iRule on BIG-IP First, we created a High-Speed Logging (HSL) pool on BIG-IP, to be used by the ELK Stack. The HSL pool is assigned to the sampleapplication. This pool member will be used by iRule to send access logs from BIG-IP to the ELK server. The ELK server is listening for incoming log analysis requests Below is the iRule that we created. when CLIENT_ACCEPTED { set timestamp [clock format [clock seconds] -format "%d/%h/%y:%T %Z" ] } when HTTP_REQUEST { # UUID injection if { [HTTP::cookie x-request-id] == "" } { append s [clock seconds] [IP::local_addr] [IP::client_addr] [expr { int(100000000 * rand()) }] [clock clicks] set s [md5 $s] binary scan $s c* s lset s 8 [expr {([lindex $s 8] & 0x7F) | 0x40}] lset s 6 [expr {([lindex $s 6] & 0x0F) | 0x40}] set s [binary format c* $s] binary scan $s H* s set myuuid $s unset s set inject_uuid_cookie 1 } else { set myuuid [HTTP::cookie x-request-id] set inject_uuid_cookie 0 } set xff_ip "[expr int(rand()*100)].[expr int(rand()*100)].[expr int(rand()*100)].[expr int(rand()*100)]" set hsl [HSL::open -proto UDP -pool pool_elk] set http_request "\"[HTTP::method] [HTTP::uri] HTTP/[HTTP::version]\"" set http_request_time [clock clicks -milliseconds] set http_user_agent "\"[HTTP::header User-Agent]]\"" set http_host [HTTP::host] set http_username [HTTP::username] set client_ip [IP::remote_addr] set client_port [TCP::remote_port] set http_request_uri [HTTP::uri] set http_method [HTTP::method] set referer "\"[HTTP::header value referer]\"" if { [HTTP::uri] contains "test" } { HTTP::header insert "x-request-id" "test-$myuuid" } else { HTTP::header insert "x-request-id" $myuuid } HTTP::header insert "X-Forwarded-For" $xff_ip } when HTTP_RESPONSE { set syslogtime [clock format [clock seconds] -format "%h %e %H:%M:%S"] set response_time [expr {double([clock clicks -milliseconds] - $http_request_time)/1000}] set virtual [virtual] set content_length 0 if { [HTTP::header exists "Content-Length"] } { set content_length \"[HTTP::header "Content-Length"]\" } else { set content_length \"-\" } set lb_server "[LB::server addr]:[LB::server port]" if { [string compare "$lb_server" ""] == 0 } { set lb_server "" } set status_code [HTTP::status] set content_type \"[HTTP::header "Content-type"]\" # construct log for elk, local6.info <182> set log_msg "<182>$syslogtime f5adc tmos: " #set log_msg "" append log_msg "time=\[$timestamp\] " append log_msg "client_ip=$client_ip " append log_msg "virtual=$virtual " append log_msg "client_port=$client_port " append log_msg "xff_ip=$xff_ip " append log_msg "lb_server=$lb_server " append log_msg "http_host=$http_host " append log_msg "http_method=$http_method " append log_msg "http_request_uri=$http_request_uri " append log_msg "status_code=$status_code " append log_msg "content_type=$content_type " append log_msg "content_length=$content_length " append log_msg "response_time=$response_time " append log_msg "referer=$referer " append log_msg "http_user_agent=$http_user_agent " append log_msg "x-request-id=$myuuid " if { $inject_uuid_cookie == 1} { HTTP::cookie insert name x-request-id value $myuuid path "/" set inject_uuid_cookie 0 } # log local2. sending log to elk via log publisher #log local2. $log_msg HSL::send $hsl $log_msg } Next, we added a new VIP for theHSL pool which was created earlier, and applied iRule for this VIP. Then all access logs containing the respective UUID for the HTTP datagram will be sent to the ELK server. Now, the ELK server is ready for the analysis of the BIG-IP access logs. 2.Configure NGINX plus Logging We configure logging for each NGINXplus deployed inside the OpenShift cluster through the respective configmap objects. Here is one example: 3.Customize Kibana Dashboard With all configurations in place, log information will be processed by the ELK server. We will be able to customize a dashboard containing useful, visualized data, like user location, response time by location, etc. When an end-user accesses the service, the VIP will be responded and iRule will apply. Next, the user’s HTTP header information will be checked by iRule, and logs are forwarded to the ELK server for analysis. As the user is accessing the app services, the app server’s logs are also forwarded to the ELK server based on the NGINX plus configmap setting. The list of key indicators available on the Kibana dashboard page is rather long, so we won't describe all of them here. You can check detail here 4.ELK Dashboard Samples We can easily customize the data for visualization in the centralized display, and the following is just a list of dashboard samples. We can look at user location counts and response time by user location: We can check theaverage response time and max response time for each endpoint: We can seethe correlation between BIG-IP (N-S traffic) and NGINX plus endpoint(E-W traffic): We can also checkthe response time for N-S traffic: Summary In this article, we showed how the ELK stack joined forces with F5 BIG-IP and NGINX plus to provide an observability solution for visualizing application performance with a centralized dashboard. Withcombined performance metrics, it opens up a lot of possibilities for SRE's to implement SLOs practically. F5 aims to provide the best solution to support your business success and continue with more use cases. If you want to learn more about this and other SRE use cases, please visit the F5 DevCentral GitHub link here.1KViews1like0CommentsAdopting SRE practices with F5: Layered Security Policy for North-South Traffic
In an organization with enough maturity in cybersecurity and modern application architectures, there are two different cybersecurity teams that operate the more advanced security policies for the company. NetSecOps and DevSecOps are the two cybersecurity teams in an organization, and they typically have different security requirements. NetSecOps requires a ‘Standardized Application Security Policy'. They aim to block common attacks to the production network with a high level of confidence, resulting in a ‘low-false positive rate,’ at the network level. The OWASP Top 10 threats is a good example here. Moreover, the responsibility of NetSecOps is not limited to stopping basic attack types like the OWASP Top 10, but it also covers more advanced and complicated application-based attacks such as ‘Bot Attacks,’ ‘Fraud Attacks,’ and ‘DDoS Attacks.’ However, when it comes to the ‘Modern-App environment,’ it is not easy for the NetSecOps team to understand the details of the application traffic flow inside the Kubernetes or OpenShift cluster. For this reason, as far as modern applications are concerned, the security policies of NetSecOps often focus more on compliance and audit purposes. However, DevSecOps wants the application-specific security policies for different types of applications to be operating inside their Kubernetes or OpenShift clusters. This is possible since DevSecOps understands how their applications work and they want to apply more optimized security policies for their backend applications. This is why it is sometimes difficult to achieve both security team’s goals with a single security solution. This is why the enterprise needs to deploy two different WAFs to meet the different requirements from both NetSecOps and DevSecOps. This article will cover how two different security teams can achieve their goals with two separate WAF (Web Application Firewall) deployments in the network - F5 Advanced WAF for NetSecOps and NGINX App Protect for DevSecOps. Solution Overview The solution includes two F5 components – F5 Advanced WAF and NGINX App Protect. From a technological point of view, NGINX App Protectutilizes s a subset of F5 Advanced WAF functionality, meaning that their underlying technologies are the same. Each of those WAF components can run with different security policies in order to achieve different goals. In F5 Advanced WAF, NetSecOps can apply the WAF policy for the ‘coarse-grained model’ of security, while DevSecOps adopts the ‘fine-grained model’ with the NAP. In other words, this means that F5 Advanced WAF can be configured with a ‘Negative Policy,’ and NGINX App Protect can be configured with a ‘Positive Policy.’ In our use-case, we assumed that NetSecOps wants to block the OWASP Top 10 threats while DevSecOps has a different 'file accessing' policy for each backend application. The brief architecture is depicted below. Combining F5 Advanced WAF and NGINX App Protect enables layered application security policies to prevent the most complicated and advanced application-based attacks efficiently. This architecture utilizes the following workflow: 1.The F5 Advanced WAF blocks the most commonly used attack types including ‘Command Injection,’ ‘SQL Injection,’ ‘Cross-Site Scripting,’ and ‘Server Side Request Forgery’ attacks. 2.When the attacker tries to access the different files in each application, NGINX App Protect manually specifies the file types that are allowed (or disallowed) in traffic based on the security policies configured by the DevSecOps team. 3.All alert details from F5 Advanced WAF and NGINX App Protect are sent to the ‘Elasticsearch’ for central monitoring purposes. Each of the above workflows will be discussed in the following sections. ·This blog doesn’t include all the required steps to reproduce the use-case in the environment. Please refer to this link for all the required configuration steps. NGINX App Protect provides ‘Application-Specific’ policies NGINX App Protect can provide security protection and controls at the microservice level inside the Kubernetes or OpenShift cluster. The NGINX App Protect can be deployed in the OpenShift cluster as a container image. The NGINX App Protect policy configuration uses the declarative format built on a pre-defined base template. The policy uses the JSON format to represent the policy details. This file can be edited to apply a unique security policy to the NGINX App Protect instance. Once the policy is created, the policy can be attached to the 'nginx.conf' file by referencing the policy file. In this example, we used the ‘nginx_sre.conf’ file as the main configuration file for NGINX and the ‘NginxSRELabPolicy.json’ file represents the NGINX App Protect policy. NginxSRELabPolicy.json: | { "policy": { "name": "SRE_DVWA01_POLICY", "template": { "name": "POLICY_TEMPLATE_NGINX_BASE" }, "applicationLanguage": "utf-8", "enforcementMode": "blocking", "response-pages": [ { "responseContent": "<html><head><title>SRE DevSecOps - DVWA01 - Blocking Page</title></head><body><font color=green size=10>NGINX App Protect Blocking Page - DVWA01 Server</font><br><br>Please consult with your administrator.<br><br>Your support ID is: <%TS.request.ID()%><br><br><a href='javascript:history.back();'>[Go Back]</a></body></html>", "responseHeader": "HTTP/1.1 302 OK\\r\\nCache-Control: no-cache\\r\\nPragma: no-cache\\r\\nConnection: close", "responseActionType": "custom", "responsePageType": "default" } ], "blocking-settings": { "violations": [ { "name": "VIOL_FILETYPE", "alarm": true, "block": true } ] }, "filetypes": [ { "name": "*", "type": "wildcard", "allowed": true, "checkPostDataLength": false, "postDataLength": 4096, "checkRequestLength": false, "requestLength": 8192, "checkUrlLength": true, "urlLength": 2048, "checkQueryStringLength": true, "queryStringLength": 2048, "responseCheck": false }, { "name": "pdf", "allowed": false } ] } } --- The above configuration file shows the NAP policy of application #01, where the DevSecOps team wants to disallow file access to the ‘PDF’ file format. For application #02, the NAP policy is configured to reject the access to the ‘JPG’ file. And the ‘remote logging’ configuration needs to be applied on the NGINX to export the NGINX App Protect's alert details. The below configuration shows how we exported the NGINX App Protect logging details to an external device, Elasticsearch. server { listen 8080; server_name dvwa02-http; proxy_http_version 1.1; real_ip_header X-Forwarded-For; set_real_ip_from 0.0.0.0/0; app_protect_enable on; app_protect_security_log_enable on; app_protect_policy_file "/etc/nginx/NginxSRELabPolicy.json"; app_protect_security_log "/etc/app_protect/conf/log_default.json" syslog:server=your_elk_ip_here; location / { client_max_body_size 0; default_type text/html; proxy_pass http://dvwa02; proxy_set_header Host $host; } Preventing OWASP Top 10 threats in F5 Advanced WAF F5 Advanced WAF is the next-generation WAF solution designed to prevent advanced application-based attacks. It supports 1000+ proven application-level signatures, custom signatures, Machine-Learning based DDoS prevention, Intelligence-based attack mitigation, and Behavioural-based WAF functions. But in this use-case, we focused on the prevention of the OWASP Top 10 attacks, which is only a small part of the F% Advanced WAF attack overall coverage. The important point here is how we can configure the F5 Advanced WAF to apply the WAF's efficient ‘Negative Security’ model. In order to configure the correct F5 Advanced WAF policy, one should follow the procedures below: 1. Go to 'Security' -> 'Application Security' -> 'Security Policies' -> 'Create' 2. Click the security policy that was just created (SRE_DEVSEC_01) ·Click the 'View Learning and Blocking Settings' under the 'Enforcement Mode' menu 3. Expand 'Attack Signatures' and Click 'Change' menu 4. Apply the check box. ·Click 'Close' ->click 'Save' -> click 'Apply Policy' ·Apply the policy to the virtual server. (Please make sure that we're on OCP partition.) 5. 'Local Traffic' -> 'Virtual Servers' -> 'devsecops_http_vs' -> Security -> Policies Please note that the ‘virtual server’ configuration is required in the BIG-IP before proceeding to this step. Configuring custom blocking page for F5 Advanced WAF 1.Click the security policy that was created (SRE_DEVSEC_01) 2.Go to 'Response and Blocking page' -> 'Blocking page default' -> 'Custom response' -> 'Response Body' <html><head><title>SRE DevSecOps Blocking Page</title></head><body><font color=red size=12>F5 Advanced WAF Blocking Page</font><br><br>Please consult with your administrator.<br><br>Your support ID is: <%TS.request.ID()%><br><br><a href='javascript:history.back();'>[Go Back]</a></body></html> Simulating the Attack The following steps show how to simulate the application-based attacks and to see how F5 Advanced WAF and NGINX App Protect can protect the applications efficiently. Preventing OWASP Top 10 Attacks - NetSecOps First, log in to the application through the GUI and go to the ‘Command Injection’ menu. And type the command ‘8.8.8.8 | cat /etc/passwd’ and click the ‘Submit’ button. If F5 Advanced WAF works correctly, you should be able to see the below ‘blocking page’. ·You can find the instructions from the Github link here how to simulate other attack types – SQL Injection, SSRF and XSS. Restrict file accessing based on the application types - DevSecOps 1.Access to application 01 on the browser with URL -> "http://your_app_domain.com/hackable/uploads/" 2.When the ‘PDF’ file is clicked on in this directory, the following blocking screen should be shown. Summary In modern application architectures, security concerns are becoming more serious. WAF is the major security solution available to enterprise applications. The security policy of the WAF has to protect backend applications correctly, but at the same time, it must also ensure legitimate user traffic access to the backend resources without creating issues. This sounds straightforward, but it is not easy to configure the right security policies to achieve both goals simultaneously. When it comes to modern application architectures, it is even more difficult to achieve this goal. Since traditional security teams lack understanding about the application flow inside a Kubernetes or OpenShift environment, it is challenging to apply the required security policies in the WAF to protect the microservices. Due to the nature of their microservices, different applications spin up and down frequently, and security requirements are also changed on a regular basis. The cybersecurity team needs to have a solution that can fit these unique requirements. For NetSecOps, they would require a solution that can have enterprise-level protection features and operational-efficiency for their SOC team. F5 Advanced WAF is designed to efficiently prevent known and unknown types of advanced application-based attacks, while NGINX App Protect easily provides ‘application-specific’ security policies for each application inside the microservice environment. The enterprises can acquire the proper protection for their modern app environment through the combination of F5 Advanced WAF and NGINX App Protect. Please visit the DevCentral GitHub repo and follow the guidelines to try this use-case in your environment.1.3KViews1like1CommentAdopting SRE practices with F5: Targeted Canary deployment
In the last article, we covered a blue-green deployment in depth. Another approach to promote availability for SRE SLO is the canary deployment. In some cases, swapping out the entire deployment via a blue-green environment may not be desired. In a canary deployment, you upgrade an application on a subset of the infrastructure and allow a limited set of users to access the new version.This approach allows you to test the new software under a production load for a limited set of user connections, evaluate how well it meets users’ needs, and assess whether new features are functioning as designed. This article is focused on howwe can use F5 technologies (BIG-IP and NGNIX Plus) to implement the Canary deployment in an OpenShift environment. Solution Overview The solution combines the F5 Container Ingress Services (CIS) with the NGINX Plus for a microservice environment. The BIG-IP provides comprehensive L4-7 security services for N-S traffic into, out of, and between OpenShift clusters, while leveraging NGINX Plus as a micro-gateway to manage and secure (E-W) traffic inside cluster. This architecture is depicted below. Stitching the technologies together, this architecture enables the targeted canary use case. In “targeted” model, it takes canary deployment one step further by routing the users to different application versions based on the user identification, or their respective risk tolerance levels. It utilizes the following workflow: 1.The BIG-IP Access Policy Manager (APM) authenticates each user before it enters the OpenShift cluster 2.BIG-IP identifies users belonging to the ring 1, 2, or 3 user groups, and injects a group-based identifier into the HTTP header via a URI value 3.The above user identification is passed on to NGINX Plus micro-gateway, which will direct users to the correct microservice versions Each of above components will be discussed with implementation details in the following sections. APM provides user authentication BIG-IP APM is in the N-S traffic flow to authenticate and identify the users before their network traffic enters the cluster. To achieve this, we would need to: Create an APM policy as shown below Attach the above policy to HTTPS virtual server (manually, or using AS3 override) Note that in our demonstration, we simplified the deployment with 2 user groups: 1) a user group “Test1” for ring 1, to represent early adopters who voluntarily preview releases; 2) a user group “User1” for ring 2, to who consume the applications, after passing through the early adopters. We could follow above steps to configure three rings as needed. We use the AS3 override function of CIS to attach the APM policy, so that CIS remains as the source of truth. The AS3 override functionality allows us to alter the existing BIG-IP configuration using AS3 with a user-defined configmap without affecting the existing Kubernetes resources. In order to do so, we would need to add a new argument to the CIS deployment file. Run the following command to enable AS3 override functionality: --override-as3-declaration=<namespace>/<user_defined_configmap_name> An example of user-defined configmap to attach APM policy to the HTTPS virtual server is shown below (created by an OpenShift route): apiVersion: v1 kind: ConfigMap metadata: name: f5-override-as3-declaration namespace: default data: template: | { "declaration": { "openshift_AS3": { "Shared": { "bookinfo_https_dc1": { "policyIAM": { "bigip": "/Common/bookinfo" } } } } } } Next, we run the following command to create the configmap: oc create f5-override-as3-declaration.yaml Note: Restart the CIS deployment after deploying the configmap. When a user is trying to access the Bookinfo application, now it will first be authenticated with BIG-IP APM: BIG-IP injects user identification into HTTP header After the user is authenticated, BIG-IP creates a user identification and passes it on to NGINX Plus micro-gateway in order to direct users to the correct microservice version. It does so by mapping the user to a group and injecting the HTTP header with a URI value (http_x_request_id). Steps to configure the BIG-IP: Create a policy with the rule shown below: Attach the policy to the HTTPS virtual server (manually, or using AS3 override) NGINX Plus steers traffic to different versions NGINX Plus running inside OpenShift cluster will extract the user information from the HTTP header http_x_request_id, and steer traffic to the different versions of the Bookinfo review page accordingly. In the below example, we used configmap to configure the NGINX Plus POD that acts as the reverse proxy for the review services. ################################################################################################## # Configmap Review Services ################################################################################################## apiVersion: v1 kind: ConfigMap metadata: name: bookinfo-review-conf data: review.conf: |- log_format elk_format_review 'time=[$time_local] client_ip=$remote_addr virtual=$server_name client_port=$remote_port xff_ip=$remote_addr lb_server=$upstream_addr http_host=$host http_method=$request_method http_request_uri=$request_uri status_code=$status content_type="$sent_http_content_type" content_length="$sent_http_content_length" response_time=$request_time referer="$http_referer" http_user_agent="$http_user_agent" x-request-id=$myid '; upstream reviewApp { server reviews-v1:9080; } upstream reviewApp_test { server reviews-v1:9080; server reviews-v2:9080; server reviews-v3:9080; } # map to different upstream backends based on header map $http_x_request_id $pool { ~*test.* "reviewApp_test"; default "reviewApp"; } server { listen 5000; server_name review; #error_log /var/log/nginx/internalApp.error.log info; access_log syslog:server=10.69.33.1:8516 elk_format_review; #access_log /var/tmp/nginx-access.log elk_format_review; set$myid $http_x_request_id; if ($http_x_request_id ~* "(\w+)-(\w+)" ) { set$myid$2; } location / { proxy_pass http://$pool; } } NIGINX Plus will direct the user traffic to the right version of services: ·If it is “User1” or a normal user, it will be forwarded to “Ring 1” for the old version of the application ·If it is “Test1” or an early adopter, it will be forwarded to “Ring 2” for the newer version of the same application Summary Today’s enterprises increasingly rely on different expertise and skillsets, like DevOps, DevSecOps, and app developers, to work with NetOps teams to manage the sprawling application services that are driving their accelerated digital transformation.Combining BIG-IP and NGINX Plus, this architecture uniquely gives SRE the flexibility to adapt to the changing conditions of the application environments. It means we can deliver services that meet the needs of a broader set of application stakeholders. We may use BIG-IP to define the global service control and security for NetOps or SecOps, while using NGINX Plus to extend controls for more granular and application specific security to DevOps or app developers. So, go ahead: go to the DevCentral GitHub repo, download the source code behind our technologies, and follow the guide to try it out in your environment.1.1KViews0likes1Comment