Implementing BIG-IP WAF logging and visibility with ELK
Scope

This technical article is useful for BIG-IP users familiar with web application security and with the implementation and use of the Elastic Stack. This includes application security professionals, infrastructure management operators, and SecDevOps/DevSecOps practitioners.

The focus is exclusively on WAF logs. Firewall, Bot, or DoS mitigation logging into the Elastic Stack is the subject of a future article.

Introduction

This article focuses on the configuration required to send Web Application Firewall (WAF) logs from the BIG-IP Advanced WAF (or BIG-IP ASM) module to an Elastic Stack (a.k.a. Elasticsearch-Logstash-Kibana or ELK).

First, this article goes over the configuration of BIG-IP. It is configured with a security policy and a logging profile attached to the virtual server that is being protected. This can be configured via the BIG-IP user interface (TMUI) or through the BIG-IP declarative interface (AS3).

The configuration of the Elastic Stack is discussed next, in particular the configuration of filters adapted to processing BIG-IP WAF logs.

Finally, the article provides some initial guidance on the metrics that can be taken into consideration for visibility. It discusses the use of dashboards and provides some recommendations regarding potentially useful visualizations.

Pre-requisites and Initial Premise

To follow the steps outlined below, the user will need at least one BIG-IP Adv. WAF running TMOS version 15.1 or above (note that this may work with previous versions but has not been tested).

The target BIG-IP is already configured with:
- A virtual server
- A WAF policy

An operational Elastic Stack is also required.

The administrator will need configuration and administrative privileges on both the BIG-IP and the Elastic Stack infrastructure. They will also need to be familiar with the network topology linking the BIG-IP with the Elasticsearch cluster/infrastructure.
It is assumed that you want to use your Elasticsearch (ELK) logging infrastructure to gain visibility into BIG-IP WAF events.

Logging Profile Configuration

An essential part of getting WAF logs to the proper destination(s) is the Logging Profile. The following goes over the configuration of the Logging Profile that sends data to the Elastic Stack.

Overview of the steps:
1. Create the Logging Profile
2. Associate the Logging Profile with the Virtual Server

After following the procedure below, the log lines sent on the wire from the BIG-IP are comma-separated key-value pairs that look something like the sample below:

Aug 25 03:07:19 localhost.localdomain ASM:unit_hostname="bigip1",management_ip_address="192.168.41.200",management_ip_address_2="N/A",http_class_name="/Common/log_to_elk_policy",web_application_name="/Common/log_to_elk_policy",policy_name="/Common/log_to_elk_policy",policy_apply_date="2020-08-10 06:50:39",violations="HTTP protocol compliance failed",support_id="5666478231990524056",request_status="blocked",response_code="0",ip_client="10.43.0.86",route_domain="0",method="GET",protocol="HTTP",query_string="name='",x_forwarded_for_header_value="N/A",sig_ids="N/A",sig_names="N/A",date_time="2020-08-25 03:07:19",severity="Error",attack_type="Non-browser Client,HTTP Parser Attack",geo_location="N/A",ip_address_intelligence="N/A",username="N/A",session_id="0",src_port="39348",dest_port="80",dest_ip="10.43.0.201",sub_violations="HTTP protocol compliance failed:Bad HTTP version",virus_name="N/A",violation_rating="5",websocket_direction="N/A",websocket_message_type="N/A",device_id="N/A",staged_sig_ids="",staged_sig_names="",threat_campaign_names="N/A",staged_threat_campaign_names="N/A",blocking_exception_reason="N/A",captcha_result="not_received",microservice="N/A",tap_event_id="N/A",tap_vid="N/A",vs_name="/Common/adv_waf_vs",sig_cves="N/A",staged_sig_cves="N/A",uri="/random",fragment="",request="GET /random?name=' or 1 = 1' HTTP/1.1\r\n",response="Response logging disabled"

Please choose one of the methods below. The configuration can be done through the web-based user interface (TMUI), the command line interface (TMSH), directly with a declarative AS3 REST API call, or with the BIG-IP native REST API. This last option is not discussed herein.

TMUI

Steps:

Create Profile
1. Connect to the BIG-IP web UI and log in with administrative rights
2. Navigate to Security >> Event Logs >> Logging Profiles
3. Select “Create”
4. Fill out the configuration fields as follows:
   - Profile Name (mandatory)
   - Enable Application Security
   - Set Storage Destination to Remote Storage
   - Set Logging Format to Key-Value Pairs (Splunk)
   - In the Server Addresses field, enter an IP Address and Port, then click on Add
5. Click on Create

Add Logging Profile to virtual server with the policy
1. Select the target virtual server and click on the Security tab (Local Traffic >> Virtual Servers : Virtual Server List >> [target virtual server])
2. Highlight the Log Profile from the Available column and put it in the Selected column (in the example, the log profile is “log_all_to_elk”)
3. Click on Update

At this point the BIG-IP will forward logs to the Elastic Stack.

TMSH

Steps:

Create profile
1. SSH into the BIG-IP command line interface (CLI)
2. From the tmsh prompt enter the following:

    create security log profile [name_of_profile] application add { [name_of_profile] { logger-type remote remote-storage splunk servers add { [IP_address_for_ELK]:[TCP_Port_for_ELK] { } } } }

   For example:

    create security log profile dc_show_creation_elk application add { dc_show_creation_elk { logger-type remote remote-storage splunk servers add { 10.45.0.79:5244 { } } } }

3. Ensure that the changes are saved:

    save sys config partitions all

Add Logging Profile to virtual server with the policy
1. From the tmsh prompt (assuming you are still logged in) enter the following:

    modify ltm virtual [VS_name] security-log-profiles add { [name_of_profile] }

   For example:

    modify ltm virtual adv_waf_vs security-log-profiles add { dc_show_creation_elk }

2. Ensure that the changes are saved:

    save sys config partitions all

At this point the BIG-IP sends logs to the Elastic Stack.

AS3

Application Services 3 (AS3) is a BIG-IP configuration API endpoint that allows the user to create an application from the ground up. For more information on F5’s AS3, refer to link.

In order to attach a security policy to a virtual server, the AS3 declaration can either refer to a policy present on the BIG-IP or refer to a policy stored in XML format and available via HTTP to the BIG-IP (ref. link).

The logging profile can be created and associated with the virtual server directly as part of the AS3 declaration. For more information on the creation of a WAF logging profile, refer to the documentation found here.

The following is an example of a part of an AS3 declaration that will create a security log profile that can be used to log to the Elastic Stack:

    "secLogRemote": {
        "class": "Security_Log_Profile",
        "application": {
            "localStorage": false,
            "maxEntryLength": "10k",
            "protocol": "tcp",
            "remoteStorage": "splunk",
            "reportAnomaliesEnabled": true,
            "servers": [
                {
                    "address": "10.45.0.79",
                    "port": "5244"
                }
            ]
        }
    }

In the sample above, the ELK stack IP address is 10.45.0.79 and it listens on port 5244 for BIG-IP WAF logs. Note that the log format used in this instance is “Splunk”. There are no declared filters and thus only the illegal requests will get logged to the Elastic Stack. A sample AS3 declaration can be found here.
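When the same logging profile has to be generated for several environments, it can help to build the AS3 fragment programmatically. The following is a minimal Python sketch that reproduces only the "secLogRemote" fragment from this section; the helper name security_log_profile is hypothetical, the values are the ones used in the article, and the sketch neither builds a full AS3 declaration nor talks to a BIG-IP.

```python
import json


def security_log_profile(address: str, port: str) -> dict:
    """Build the Security_Log_Profile fragment from this section.

    Hypothetical helper: it only mirrors the keys shown in the article's
    sample and does not validate them against the AS3 schema.
    """
    return {
        "class": "Security_Log_Profile",
        "application": {
            "localStorage": False,
            "maxEntryLength": "10k",
            "protocol": "tcp",
            "remoteStorage": "splunk",  # key-value pairs, as used by the Logstash filter
            "reportAnomaliesEnabled": True,
            "servers": [{"address": address, "port": port}],
        },
    }


# Same Logstash listener as in the article's sample.
fragment = {"secLogRemote": security_log_profile("10.45.0.79", "5244")}
print(json.dumps(fragment, indent=2))
```

The printed JSON is meant to be pasted into the application section of a larger declaration; the port is kept as a string because that is how the article's sample writes it.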
ELK Configuration

The Elastic Stack configuration consists of creating a new input on Logstash. This is achieved by adding an input/filter/output configuration to the Logstash configuration file. Optionally, the Logstash administrator might want to create a separate pipeline; for more information, refer to this link.

The following is a Logstash configuration known to work with WAF logs coming from BIG-IP:

    input {
      syslog {
        port => 5244
      }
    }
    filter {
      grok {
        match => {
          "message" => [
            "attack_type=\"%{DATA:attack_type}\"",
            ",blocking_exception_reason=\"%{DATA:blocking_exception_reason}\"",
            ",date_time=\"%{DATA:date_time}\"",
            ",dest_port=\"%{DATA:dest_port}\"",
            ",ip_client=\"%{DATA:ip_client}\"",
            ",is_truncated=\"%{DATA:is_truncated}\"",
            ",method=\"%{DATA:method}\"",
            ",policy_name=\"%{DATA:policy_name}\"",
            ",protocol=\"%{DATA:protocol}\"",
            ",request_status=\"%{DATA:request_status}\"",
            ",response_code=\"%{DATA:response_code}\"",
            ",severity=\"%{DATA:severity}\"",
            ",sig_cves=\"%{DATA:sig_cves}\"",
            ",sig_ids=\"%{DATA:sig_ids}\"",
            ",sig_names=\"%{DATA:sig_names}\"",
            ",sig_set_names=\"%{DATA:sig_set_names}\"",
            ",src_port=\"%{DATA:src_port}\"",
            ",sub_violations=\"%{DATA:sub_violations}\"",
            ",support_id=\"%{DATA:support_id}\"",
            "unit_hostname=\"%{DATA:unit_hostname}\"",
            ",uri=\"%{DATA:uri}\"",
            ",violation_rating=\"%{DATA:violation_rating}\"",
            ",vs_name=\"%{DATA:vs_name}\"",
            ",x_forwarded_for_header_value=\"%{DATA:x_forwarded_for_header_value}\"",
            ",outcome=\"%{DATA:outcome}\"",
            ",outcome_reason=\"%{DATA:outcome_reason}\"",
            ",violations=\"%{DATA:violations}\"",
            ",violation_details=\"%{DATA:violation_details}\"",
            ",request=\"%{DATA:request}\""
          ]
        }
        break_on_match => false
      }
      mutate {
        split => { "attack_type" => "," }
        split => { "sig_ids" => "," }
        split => { "sig_names" => "," }
        split => { "sig_cves" => "," }
        split => { "staged_sig_ids" => "," }
        split => { "staged_sig_names" => "," }
        split => { "staged_sig_cves" => "," }
        split => { "sig_set_names" => "," }
        split => { "threat_campaign_names" => "," }
        split => { "staged_threat_campaign_names" => "," }
        split => { "violations" => "," }
        split => { "sub_violations" => "," }
      }
      if [x_forwarded_for_header_value] != "N/A" {
        mutate { add_field => { "source_host" => "%{x_forwarded_for_header_value}"}}
      } else {
        mutate { add_field => { "source_host" => "%{ip_client}"}}
      }
      geoip {
        source => "source_host"
      }
    }
    output {
      elasticsearch {
        hosts => ['localhost:9200']
        index => "big_ip-waf-logs-%{+YYYY.MM.dd}"
      }
    }

After adding the configuration above to the Logstash parameters, you will need to restart the Logstash instance so that it takes the new configuration into account. The sample above is also available here.

The Elastic Stack is now ready to process the incoming logs. You can start sending traffic to your policy and see logs populating the Elastic Stack.

If you are looking for a test tool to generate traffic to your Virtual Server, F5 provides a simple WAF tester tool that can be found here.

At this point, you can start creating dashboards on the Elastic Stack that will satisfy your operational needs with the following overall steps:
- Ensure that the log index is being created (Stack Management >> Index Management)
- Create a Kibana Index Pattern (Stack Management >> Index patterns)
- You can now peruse the logs from the Kibana discover menu (Discover)
- Start creating visualizations that will be included in your Dashboards (Dashboards >> Editing Simple WAF Dashboard)

A complete Elastic Stack configuration can be found here. Note that it can be used with both BIG-IP WAF and NGINX App Protect.
Conclusion

You can now leverage the widely available Elastic Stack to log and visualize BIG-IP WAF logs. From a dashboard perspective, it may be useful to track the following metrics:
- Request rate
- Response codes
- The distribution of requests in terms of clean, blocked, or alerted status
- The top talkers making requests
- The top URLs being accessed
- The top violator source IPs

An example of the dashboard might look like the following:

Performance Logging iRule (Rule_http_log)
Problem this snippet solves:

Here's a logging iRule. You'll need an HSL syslog pool to log to. Various bits gathered from other posts on DevCentral. Sharing in case there is interest.

Make sure your rsyslogd is set up to use the newer syslog format like RFC-5424, including milliseconds and timezone info. Includes Country (co) and logs individual request times for each request on an HTTP/1.1 connection.

To configure F5 logging to use milliseconds and timezone, disable logging in the GUI and use tmsh edit sys syslog and something like:

    include "
    # short hostnames
    options { use_fqdn(no); };
    # Remote syslog in RFC5424 - Tim Riker <Tim@Rikers.org>
    destination remotesyslog {
        syslog(\"10.1.2.3\" transport(\"udp\") port(51443) ts_format(iso));
    };
    log {
        source(s_syslog_pipe);
        destination(remotesyslog);
    };
    "

Uses upvar and proc. Tested on 11.6 - 15.1

This tracks connection info in a table and then copies that down to the per-request log() to handle reporting on http2. This version works around a BIG-IP bug where HTTP::version does not report 2 or higher for http2 and later requests.

With http2 profiles, subsequent requests using the same connection can generate this error in the logs if HTTP::respond, HTTP::redirect, or HTTP::retry is called from an earlier iRule. Reorder your iRules to avoid this.

    <HTTP_REQUEST> - No HTTP header is cached - ERR_NOT_SUPPORTED (line 1) invoked from within "HTTP::method"

How to use this snippet:

Add this iRule to whatever virtual hosts you desire. I always add it as the first rule. If you have a rule that sets headers you want to track, you may want this after the rule that sets headers.

Interesting Splunk queries can be created like:

    index=* perflog | timechart avg(cpu_5sec) by host limit=10

to show load across multiple F5s.

    index=* perflog | timechart max(upstream_time) by http_host limit=10

to show long request times by http_host.

Any other iRule may add things to the log() array and those will get added to the single hsl output.
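For readers who want to check what the single HSL line looks like on the receiving side, here is a small Python sketch that parses one line in the layout the hsllog proc of this iRule builds: an RFC 5424 style header followed by space-separated key=value pairs, with values quoted when they contain a space, quote, semicolon, comma, or colon. The sample line and the function name parse_httplog are made up for illustration; only the layout is taken from the iRule.

```python
import re

# key=value or key="quoted value"; the iRule maps any double quote inside a
# value to "|", so a quoted value never contains an embedded double quote.
PAIR = re.compile(r'(?:^|\s)([^\s=]+)=(?:"([^"]*)"|(\S*))')


def parse_httplog(line: str) -> dict:
    """Split one HSL line into its header parts and a dict of log() fields."""
    parts = line.split(" ", 7)
    if len(parts) < 8:
        raise ValueError("not an httplog line")
    _pri_version, timestamp, host, app, procid, _msgid, _sdata, body = parts
    fields = {}
    for match in PAIR.finditer(body):
        key, quoted, bare = match.groups()
        fields[key] = quoted if quoted is not None else bare
    return {"timestamp": timestamp, "host": host, "app": app,
            "procid": procid, "fields": fields}


# Hypothetical sample, keys sorted as the iRule's lsort would emit them.
SAMPLE = ('<134>1 2021-06-17T22:58:12.123+00:00 bigip1 httplog 0.1 - - '
          'client_addr=10.1.1.5 http_host=www.example.com http_status=200 '
          'req-user-agent="curl/7.68.0 (x86_64)" upstream_time=12')

parsed = parse_httplog(SAMPLE)
print(parsed["host"], parsed["fields"]["upstream_time"])
```

Fields come back as strings; numeric ones such as upstream_time would need converting before charting.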
If you create a dg_http_log datagroup, that will be used to filter what gets logged.

Tested on version: 13.0 - 15.1

    # Rule_http_log
    # http logging - Tim Riker <Tim@Rikers.org>
    # bits taken from this post:
    # https://devcentral.f5.com/questions/irule-for-getting-total-response-time-server-response-time-and-server-connection-time
    # iRule performance tracking
    # https://devcentral.f5.com/questions/Timing-iRules
    timing on
    # timing is on by default in 11.5.0+ to see stats:
    # tmsh show ltm rule Rule_http_log
    #
    # if the dg_http_log datagroup exists then vips or hosts/paths in dg_http_log that start with
    # "NONE" no logging (really anything other than empty)
    # "INFO" normal logging
    # "FINE" full request and response headers and CLIENT_CLOSED
    #
    # upstream_time := 15000 in the datagroup to log all requests over 15 seconds
    #
    # example:
    # "/Common/vs_www.example.com_HTTPS" := "FINE" - logged including CLIENT_CLOSED
    # "www.example.com/" := "INFO" - logged
    # "www.example.com/somepath" := "FINE" - full headers
    # "www.example.com/otherpath" := "NONE" - not logged

    when RULE_INIT {
        # hostname up to first dot
        set static::hostname [getfield [info hostname] "." 1]
    }

    # not calling /Common/proc:hsllog as this logs when the request occurred
    # instead of the time it calls hsllog at the end of the request
    proc hsllog {time mylog} {
        upvar 1 $mylog log
        # https://tools.ietf.org/html/rfc5424 <local0.info>version rfc-3339time host procid msgid structured_data log
        # should be able to use a "Z" here instead of "+00:00" but our splunk logs don't handle that
        # 134 = local0.info
        set output "<134>1 [clock format [string range $time 0 end-3] -gmt 1 -format %Y-%m-%dT%H:%M:%S.[string range $time end-2 end]+00:00] ${static::hostname} httplog [TMM::cmp_group].[TMM::cmp_unit] - -"
        foreach key [lsort [array names log]] {
            if { ($log($key) matches_regex {[\" ;,:]}) } {
                append output " $key=\"[string map {\" "|"} $log($key)]\""
            } else {
                append output " $key=$log($key)"
            }
        }
        # avoid marking virtual server up when hsl pool is up
        # https://support.f5.com/csp/article/K14505
        set hsl pool_syslog
        HSL::send [HSL::open -proto UDP -pool $hsl] $output
    }

    when CLIENT_ACCEPTED {
        # calculate and track milliseconds
        # is this / 1000 guaranteed to be clock seconds? TCL docs say no, but it looks like on f5 it is.
        set tcp_start_time [clock clicks -milliseconds]
        set log(loglevel) 0
        if { [class exists dg_http_log] } {
            # virtual name entries need to be full path, ie: /Common/vs_www.example.com_HTTP
            switch -- [string range [class match -value -- [virtual name] equals dg_http_log] 0 3] {
                "FINE" { set log(loglevel) 2 }
                "INFO" { set log(loglevel) 1 }
                default { set log(loglevel) 0 }
            }
        }
        table set -subtable [IP::client_addr]:[TCP::client_port] loglevel $log(loglevel)
        table set -subtable [IP::client_addr]:[TCP::client_port] tmm "[TMM::cmp_group].[TMM::cmp_unit]"
        table set -subtable [IP::client_addr]:[TCP::client_port] client_addr [IP::client_addr]
        table set -subtable [IP::client_addr]:[TCP::client_port] client_port [TCP::client_port]
        table set -subtable [IP::client_addr]:[TCP::client_port] cpu_5sec [cpu usage 5secs]
        table set -subtable [IP::client_addr]:[TCP::client_port] virtual_name [virtual name]
        set co [whereis [IP::client_addr] country]
        if { $co eq "" } {
            set co unknown
        }
        table set -subtable [IP::client_addr]:[TCP::client_port] co $co
    }

    when HTTP_REQUEST {
        set http_request_time [clock clicks -milliseconds]
        set keys [table keys -subtable [IP::client_addr]:[TCP::client_port]]
        foreach key $keys {
            set log($key) "[table lookup -subtable "[IP::client_addr]:[TCP::client_port]" "$key"]"
        }
        if {[HTTP::has_responded]} {
            # The rule should come BEFORE any rules that do things like redirects
            set log(http_has_responded) [HTTP::has_responded]
            set log(loglevel) 1
            set log(event) HTTP_REQUEST
            call hsllog $http_request_time log
            return
        }
        if { [class exists dg_http_log] } {
            set logsetting [class match -value -- [HTTP::host][HTTP::uri] starts_with dg_http_log]
            if { $logsetting ne "" } {
                # override log(loglevel) if we found something
                switch -- [string range $logsetting 0 3] {
                    "FINE" { set log(loglevel) 2 }
                    "INFO" { set log(loglevel) 1 }
                    default { set log(loglevel) 0 }
                }
            }
        }
        set log(http_host) [HTTP::host]
        set log(http_uri) [HTTP::uri]
        set log(http_method) [HTTP::method]
        # request_num might not be accurate for HTTP2
        set log(request_num) [HTTP::request_num]
        set log(request_size) [string length [HTTP::request]]
        # BUG http2 reported as http1 in pre 16.x
        # https://cdn.f5.com/product/bugtracker/ID842053.html
        set log(http_version) [HTTP::version]
        if { [catch \[HTTP2::version\] result] == 1 } {
            if { $result contains "Operation not supported" } {
                #log local0. "HTTP version is: [HTTP::version]"
            } else {
                set h2ver [eval "\HTTP2::version"]
                # we might have http2 support, but not be http2
                if { $h2ver != 0 } {
                    set log(http_version) $h2ver
                }
            }
        }
        #log local0. "http_version = $log(http_version)"
        if { $log(loglevel) > 1 } {
            foreach {header} [HTTP::header names] {
                set log(req-$header) [HTTP::header $header]
            }
        } else {
            foreach {header} {"connection" "content-length" "keep-alive" "last-modified" "policy-cn" "referer" "transfer-encoding" "user-agent" "x-forwarded-for" "x-forwarded-proto" "x-forwarded-scheme"} {
                if { [HTTP::header exists $header] } {
                    set log(req-$header) [HTTP::header $header]
                }
            }
        }
    }

    when LB_SELECTED {
        set lb_selected_time [clock clicks -milliseconds]
        set log(server_addr) [LB::server addr]
        set log(server_port) [LB::server port]
        set log(pool) [LB::server pool]
    }

    when SERVER_CONNECTED {
        set log(connection_time) [expr {[clock clicks -milliseconds] - $lb_selected_time}]
        set log(snat_addr) [IP::local_addr]
        set log(snat_port) [TCP::local_port]
    }

    when LB_FAILED {
        set log(event_info) [event info]
    }

    when HTTP_REJECT {
        set log(http_reject) [HTTP::reject_reason]
    }

    when HTTP_REQUEST_SEND {
        set http_request_send_time [clock clicks -milliseconds]
    }

    when HTTP_RESPONSE {
        set log(upstream_time) [expr {[clock clicks -milliseconds] - $http_request_send_time}]
        set log(http_status) [HTTP::status]
        if { $log(loglevel) > 1 } {
            foreach {header} [HTTP::header names] {
                set log(res-$header) [HTTP::header $header]
            }
        } else {
            foreach {header} {"cache-control" "connection" "content-encoding" "content-length" "content-type" "content-security-policy" "keep-alive" "last-modified" "location" "server" "www-authenticate"} {
                if { [HTTP::header exists $header] } {
                    set log(res-$header) [HTTP::header $header]
                }
            }
        }
        # if logging is off, but upstream_time is over threshold in datagroup, log anyway
        if { ($log(loglevel) < 1) && [class exists dg_http_log] } {
            set log_upstream_time [class match -value -- upstream_time equals dg_http_log]
            if {$log_upstream_time ne "" && $log(upstream_time) >= $log_upstream_time} {
                set log(over_upstream_time) $log_upstream_time
                set log(loglevel) 1
            }
        }
    }

    when HTTP_RESPONSE_RELEASE {
        if { [info exists http_request_time] } {
            set log(http_time) "[expr {[clock clicks -milliseconds] - $http_request_time}]"
            # push http_time into table so CLIENT_CLOSED can see it in HTTP/2
            table set -subtable [IP::client_addr]:[TCP::client_port] http_time $log(http_time)
        } else {
            set http_request_time [clock clicks -milliseconds]
        }
        set log(event) HTTP_RESPONSE_RELEASE
        if { $log(loglevel) > 0 } {
            call hsllog $http_request_time log
        }
    }

    when HTTP_DISABLED {
        set log(http_passthrough_reason) [HTTP::passthrough_reason]
    }

    when CLIENT_CLOSED {
        # grab log() values from table
        set keys [table keys -subtable [IP::client_addr]:[TCP::client_port]]
        foreach key $keys {
            set log($key) "[table lookup -subtable "[IP::client_addr]:[TCP::client_port]" "$key"]"
        }
        set log(tcp_time) "[expr {[clock clicks -milliseconds] - $tcp_start_time}]"
        set log(event) CLIENT_CLOSED
        # http_time didn't get set, log here (HTTP_RESPONSE_RELEASE never called, catch redirects, aborted connections)
        if { not ([info exists log(http_time)]) } {
            if { [info exists http_request_time] } {
                # called HTTP_REQUEST but not HTTP_RESPONSE_RELEASE using HTTP 1.0 or 1.1
                set log(http_time) "[expr {[clock clicks -milliseconds] - $http_request_time}]"
            }
            call hsllog $tcp_start_time log
        } elseif { $log(loglevel) > 1 } {
            call hsllog $tcp_start_time log
        }
        # clean out table when client disconnects
        table delete -subtable [IP::client_addr]:[TCP::client_port] -all
    }

L3/4/DNS DDoS Reporting with Elastic Search and Kibana
Dear Reader,

In this article, I would like to show you, in collaboration with my colleague Mohamed Shaath, how to use the DDoS reporting and visibility dashboards that we have created based on an ELK (Elastic Search, Logstash, and Kibana) stack. The goal is to give you templates based on open-source software to address the typical questions DDoS operators have and need to answer when an incident happens.

Another component we added is the visualization of incoming packets, dropped packets, and the detection and mitigation thresholds per attack vector. The idea here is to give you insights into auto-calculated thresholds compared to incoming rates. It also lets you see anomalies in traffic behavior. Hopefully, the visualization will also help you with fine-tuning the DoS vector configuration (a typical example of this is the floor value of a vector).

This article gives you an introduction to some of the graphs we provide together with the templates. Feel free to arrange or modify them in the way you need when you use the solution. We are also very happy to get your feedback, so we can optimize the dashboards and graphs in a way that is most useful for DDoS operators.

Fundamental understanding of log events

All DDoS configurations basically rely on two thresholds, regardless of the chosen threshold mode (manual, fully automatic, multiplier, …): Detection and Mitigation.

Figure 1: Detection and Mitigation rate

“Detection” means informing the DDoS operator that the incoming rate is above the configured rate (or the rate auto-calculated based on the history). Traffic is not blocked; only specific log information is sent out. The “detection” value is usually set or calculated to a rate that is just within the expected “normal” rate. That also means that everything above that value is not “normal” and therefore suspicious, but not necessarily an attack. But the DDoS operator should be aware of that event.
This is exactly what happens when a packet rate crosses the detection rate: BIG-IP sends out log messages to the log server (when configured). Within the ELK solution we are introducing, we use the “Splunk” logging format, which sends the information in key/value format. That makes the fields much easier to understand.

Here is an example of a log message, which is sent out when the packet rate has crossed the detection threshold.

Jun 17 23:08:46 172.30.107.11 action="Allow",hostname="lon-i5800-1.pme.itc.f5net.com",bigip_mgmt_ip="172.30.107.11",context_name="/Common/www_10_103_2_80_80",date_time="Jun 17 2021 22:58:12",dest_ip="10.103.2.80",dest_port="80",device_product="DDoS Hybrid Defender",device_vendor="F5",device_version="15.1.2.1.0.317.10",dos_attack_event="Attack Sampled",dos_attack_id="550542726",dos_attack_name="TCP Push Flood",dos_packets_dropped="0",dos_packets_received="117",errdefs_msgno="23003138",errdefs_msg_name="Network DoS Event",flow_id="0000000000000000",severity="4",dos_mode="Enforced",dos_src="Volumetric, Per-SrcIP, VS-specific attack, metric:PPS",partition_name="Common",route_domain="0",source_ip="10.103.6.10",source_port="39219",vlan="/Common/vlan3006_client"

Explanation of the message content:

action="Allow" indicates that BIG-IP is not dropping packets (from the DoS point of view). It is just informing the operator that, within the last second, the protected context (here: /Common/www_10_103_2_80_80) has received 117 (dos_packets_received) push packets (dos_attack_name) from source IP 10.103.6.10 (source_ip).

By the way, because this is a “Volumetric, Per-SrcIP, VS-specific attack” (dos_src) log message, it also tells you that the source IP has been identified as a bad actor (also see my article: Increasing accuracy using Bad Actor and Attacked Destination). Therefore, this event was triggered by the Bad Actor configuration of the TCP Push flood vector.
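The field-by-field reading above can also be done mechanically. The following Python sketch is only an illustration: it parses a shortened copy of the sample message into a dictionary and pulls out the fields discussed in the explanation. The function names parse_dos_event and describe are hypothetical, and the regular expression assumes that values never contain an embedded double quote, which holds for the sample but is not guaranteed in general.

```python
import re

# key="value" pairs as sent in the "Splunk" key/value logging format.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_dos_event(line: str) -> dict:
    """Return all key="value" pairs of one DoS log message as a dict."""
    return dict(PAIR.findall(line))


def describe(event: dict) -> dict:
    """Extract the fields used in the explanation of the message content."""
    return {
        "action": event["action"],                # "Allow" = detection only
        "vector": event["dos_attack_name"],
        "context": event["context_name"],
        "received": int(event["dos_packets_received"]),
        "dropped": int(event["dos_packets_dropped"]),
        "per_source": "Per-SrcIP" in event["dos_src"],  # bad actor identified
        "source_ip": event["source_ip"],
    }


# Shortened copy of the sample message above.
SAMPLE = ('Jun 17 23:08:46 172.30.107.11 action="Allow",'
          'context_name="/Common/www_10_103_2_80_80",'
          'dos_attack_name="TCP Push Flood",dos_packets_dropped="0",'
          'dos_packets_received="117",'
          'dos_src="Volumetric, Per-SrcIP, VS-specific attack, metric:PPS",'
          'source_ip="10.103.6.10"')

summary = describe(parse_dos_event(SAMPLE))
print(summary)
```

In the ELK solution itself this parsing is done by Logstash; the sketch is only meant as a quick way to inspect a captured message by hand.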
Mitigation threshold

Once the incoming packet rate has crossed the mitigation threshold of a DoS vector or an attack signature, BIG-IP starts to drop (rate-limit) traffic above that value. This is when we declare being under a DDoS attack, because the protected context (server, service, network, BIG-IP, etc.) will be negatively affected by this high number of packets per second. Now the BIG-IP DoS device (AFM/DHD) needs to lower the number of packets hitting the affected context, and that is why it starts to drop packets on the identified vector. Again, this mitigation threshold can be set manually or auto-calculated based on history or a multiplication of the detection threshold. (Explanation of the F5 DDoS threshold modes)

Here is an example of a drop log message:

Jun 17 23:05:03 172.30.107.11 action="Drop",hostname="lon-i5800-1.pme.itc.f5net.com",bigip_mgmt_ip="172.30.107.11",context_name="Device",date_time="Jun 17 2021 22:54:29",dest_ip="10.103.2.80",dest_port="0",device_product="DDoS Hybrid Defender",device_vendor="F5",device_version="15.1.2.1.0.317.10",dos_attack_event="Attack Sampled",dos_attack_id="3221546531",dos_attack_name="Bad TCP flags (all cleared)",dos_packets_dropped="152224",dos_packets_received="152224",errdefs_msgno="23003138",errdefs_msg_name="Network DoS Event",flow_id="0000000000000000",severity="4",dos_mode="Enforced",dos_src="Volumetric, Aggregated across all SrcIP's, Device-Wide attack, metric:PPS",partition_name="Common",route_domain="0",source_ip="10.103.6.10",source_port="12826",vlan="/Common/vlan3006_client"

In this example, the message is an aggregation across all source IPs (dos_src="Volumetric, Aggregated across all SrcIP's, Device-Wide attack") of the packets dropped (dos_packets_dropped="152224") during the last second. Therefore, the source IP (source_ip="10.103.6.10") is just a representative of all source IPs with dropped packets within the last second. This is because there was no “bad actor” identified.
This is usually the case when the bad actor functionality is not configured, or when every packet has a different source IP.

Structure of the dashboards

These two main logging events (allow and drop) are what we have adapted the visualization of the DDoS dashboard to. The DDoS operator needs to know when there is an anomaly in the network and which vectors are triggered by the anomaly. The operator also needs to know which destinations are involved and which sources cause the anomaly. It is equally important to know when the network is under attack, what mitigation has taken place on which destinations and sources, how many packets have been dropped, and so on.

When you open the “DDoS Dashboard” and choose the “Overview Dashboard”, you will notice that the dashboard is divided into two halves. On the left side, you get the information about when a DDoS device has dropped packets. On the right side, you get the information about “suspicious” packets, meaning traffic that was above a detection threshold without being dropped (action “Allow”).

Figure 2: Structure of the dashboard

Within this dashboard, you will also find graphs or tables which do not split the dashboard into two sides. Here you find combined information from both events/areas (mitigation and suspicious).

Explanation of some dashboards

In the menu section Home/Analytics/Dashboard you will find all the dashboards we created.

Figure 3: Dashboard menu

Let’s briefly explain what the main dashboards are for.

Figure 4: Dashboard overview

DDoS_Dashboard is the board where you can see all events during a chosen timeframe, which you can select in the upper right corner of that dashboard.

Figure 5: Period of time selection

At the top of the page, you find the Dashboard Explorer. From here you can easily navigate between all the relevant dashboards without going through the Analytics section of the main menu.
Figure 6: Dashboard Explorer

DDOS STATS Dashboard: shows details of the rates and thresholds (packet rate, detection and mitigation threshold, drop rate) for all vectors, including bad actor and attacked destination thresholds. Here you need to select the relevant vector and context to see the details.

DDOS Network Vectors: shows details of the incoming rate and drop rate per network vector on a one-pager.

DDOS DNS Vectors: shows details of the incoming rate and drop rate per DNS vector on a one-pager.

DDOS Bad Header Vectors: shows details of the incoming rate and drop rate per bad header vector on a one-pager.

DDOS SIP vector: shows details of the incoming rate and drop rate per SIP vector on a one-pager.

Please note that all “stats” dashboards are based on the “dos_stats” table, which you need to send to your server. This is not done via the DoS logs. On the GitHub page, you will find instructions on how to do it.

Next, you see the Stats Control Panel.

Figure 7: Stats Control Panel

By default, it shows the events (drop/allow) for all vectors in all contexts (VS/PO, Device) on all DoS devices. By using the drop-down menus you can filter on specific data. All filters you set can also easily be saved and used again. Kibana gives a lot of flexibility.

Next, you get to the Top Attacks Timeline, which shows you the top 10 attack vectors that have dropped packets.

Figure 8: Attack Timeline

When you mouse over the graph, you get the number of dropped packets for that vector. To the right of this graph, you see the Attack Event Details.

Figure 9: Attack Event Details

This simply shows you how many logs you have received per log event. Remember that every mechanism (for example per source event, per destination, aggregated, …) has its own logs.

The next row shows on the left side how many packets were dropped during the chosen time frame.

Figure 10: Dropped vs. suspicious packets

On the right side you see how many packets were identified as suspicious because the rate was above the detection threshold, but not above the mitigation threshold. This event message has the action “Allow”. In the middle graph, you see the relation of suspicious packets vs. dropped packets vs. incoming packets (incoming packets are the sum of dropped and suspicious packets).

The next graph also gives you an overview of received packets vs. dropped packets.

Figure 11: Incoming vs. dropped packets

But here the data comes from the dos_stats table, so again it is only visible when you send that information. Keep in mind that this is not done via the log messages. This is the part where you send the output of the “tmctl -c dos_stat” command to your log device. If you are not doing that, you can remove this graph from the dashboard.

The main difference from the middle graph above is that you will see data even when there is no event (allow/drop), because you get the data (snapshot) at whatever frequency you have configured for sending the “dos_stat” table. Graphs based on log events can of course only appear when there is an event and logs are sent.

This graph shows all incoming packets counted by all enabled vectors, regardless of whether they are counted on bad actors, attacked destinations, or the global stats per vector. The same applies to the dropped packets. It gives an overall overview of incoming packets vs. dropped packets. To get more details on which vector or mechanism (BA, AD) did the mitigation, you need to go to the DDOS STATS Dashboard.

An important piece of information for a DDoS operator is which services (IPs) are under attack and which contexts or protected objects have been involved.

Figure 12: Target information

The same goes for which vectors are used by the attacker. This is what is shown in the next row. In the two graphs on the left, you get this information for dropped packets.
The two graphs on the right show it for packets above the detection threshold but below the mitigation threshold.

Attacked IP and Destination Port: shows the attacked IPs, including the destination ports.
Attacked Protected Objects: shows the context (VS/PO, Device, Global) in relation to the attack vectors.

Context “Global” is used for IPI (IP-Intelligence). In this example, packets were dropped because the source IPs were configured within the IPI policy “my_IPI” and the category “denial of service”. The mitigation was executed at the global level. IPI activities are shown as attack vectors.

Figure 13: IP-Intelligence information

Mouse over an entry to see the full line. More details on the attack vectors and IPI activity appear lower on the page.

Attacked destination details

In the next row, you find a table with information on the IP addresses that have been identified as being attacked by “Attacked Destination Detection” configured on a vector.

Figure 14: Attacked Destination Details
Figure 15: Vector configuration

What are the sources of an attack?

The next graph gives information on the identified attackers. “Top AttackerIPs” shows the top 10 attacker IPs based on aggregated logs. When you have configured “Bad Actor Detection”, you also get the top 10 “bad actor” IPs. Identified “bad actor” IPs are important information to keep an eye on.

Figure 16: Source address information

Bad Actor Details

To get more information on “bad actors”, use the “Bad Actor Details” table. Here is an example:

Figure 17: Bad Actor details

You can see that the UDP flood vector identified a flood for the bad actor IP “4.4.4.4” at 11:02 on the Device level. Most of the packets were dropped (PPS vs. Dropped Packets). Over the next several 30-second intervals, you again get details for that bad actor IP.
But at 11:06 you can see that the IP address was programmed into the “denial of service” category; after that, all traffic coming from that IP was dropped via the IP-Intelligence policy “my_IPI” at the “Global” level/context.

BDoS Details

The dashboard also gives you information about BDoS signatures and their events.

Figure 18: BDoS details

In this example, you can see that the system generated a BDoS signature (Signature Add) at 11:23:30. This signature was then used (Re-USED) for mitigation (Drop). Keep in mind that you only see the details of a signature when it is created. If a signature is re-used and you want to see the details of a signature that may have been created days or weeks earlier, you need to filter for that signature within the time frame in which it was created.

Another view on attacked IPs

The dashboards also give you another, comprehensive view of attacked and targeted (action allow) IP addresses. It is probably best to start mousing over at the inner circle and work outward; you get information per attacked context.

Figure 19: Combined view on sources, destination and vectors

Details about DNS attacks

Within the DNS section, you get details about DNS-related attacks.

Figure 20: DNS attack overview per vector
Figure 21: Detailed DNS attack overview

There is also a different view on Bad Actor activities.

Figure 22: Bad Actor / attack vector / destination overview

Since we hope the graphs are mostly self-explanatory, we don’t want to go through all of them. We also plan to add more or modify them based on your feedback. Now it’s time to talk about another component, which we have already touched on several times in this article.

Attack vector visualization

A second component we have created is the visualization of the stats (incoming, detection, mitigation, etc.) per attack vector. This is an optional part and is not related to the DoS logging.
It is based on the “dos_stat” table and gives a snapshot of the statistics at the interval you have configured for sending the data from BIG-IP into your ELK stack. In my article “Demonstration of Device DoS and Per-Service DoS protection,” I already introduced the “dos_stat” table when I used it within my “show_DoS_stats_script”.

Figure 23: DDoS stat table

This script shows the stats for all vectors, their thresholds, etc. By sending this data into your ELK stack at regular intervals, you can visualize it as graphs. You can then easily see trends or anomalies within a defined time frame, as well as the thresholds (detection/mitigation) the system has calculated.

Figure 24: Activity (detection/mitigation) graph per vector

In this example you can see what the system did during an attack. The green line shows the incoming packet rate for that vector. The yellow line shows the expected, auto-calculated rate (detection rate). The blue line is the auto-calculated mitigation rate, which is very high at the beginning of this graph because the protected context was under no stress. Then we can see that the packet rate increased massively and crossed the detection rate. This is when the DDoS operator needs to be informed, because this rate is not “normal” (based on history) and is therefore suspicious. The high packet rate put stress on the protected context, and the mitigation rate was adjusted to below the incoming rate. At that point, the system started to defend and mitigate. The incoming packet rate then went down again for a short time. Mitigation stopped because the rate was below the mitigation threshold, which was also raised again because the protected context was no longer under stress. Then the flood resumed: the mitigation threshold was adjusted and mitigation started.
Later we can see that the incoming rate sometimes climbed above the detection threshold but was not strong enough to affect the health of the protected context. Therefore, no mitigation took place. At around 11:53 the flood increased again and triggered the mitigation. Please keep in mind that the granularity of this graph depends on how frequently you send the data into the ELK stack, and that the data is always a snapshot of the current stats.

How to configure logging on BIG-IP

tmsh create ltm pool pool_log_server members add { 1.1.1.1:5558 }
tmsh create sys log-config destination remote-high-speed-log HSL_LOG_DEST { pool-name pool_log_server protocol udp }
tmsh create sys log-config destination splunk SPLUNK_LOG_DEST forward-to HSL_LOG_DEST
tmsh create sys log-config publisher KIBANA_LOG_PUBLISHER destinations add { SPLUNK_LOG_DEST }
tmsh create security log profile LOG_PROFILE dos-network-publisher KIBANA_LOG_PUBLISHER protocol-dns-dos-publisher KIBANA_LOG_PUBLISHER protocol-sip-dos-publisher KIBANA_LOG_PUBLISHER ip-intelligence { log-translation-fields enabled log-publisher KIBANA_LOG_PUBLISHER } traffic-statistics { syncookies enabled log-publisher KIBANA_LOG_PUBLISHER }
tmsh modify security log profile global-network dos-network-publisher KIBANA_LOG_PUBLISHER ip-intelligence { log-geo enabled log-rtbh enabled log-scrubber enabled log-shun enabled log-translation-fields enabled log-publisher KIBANA_LOG_PUBLISHER } protocol-dns-dos-publisher KIBANA_LOG_PUBLISHER protocol-sip-dos-publisher KIBANA_LOG_PUBLISHER traffic-statistics { log-publisher KIBANA_LOG_PUBLISHER syncookies enabled }
tmsh modify security dos device-config dos-device-config log-publisher KIBANA_LOG_PUBLISHER

Figure 25: Overview of logging configuration

How to send the dos_stat table data

Modify the crontab on BIG-IP (crontab -e) and add:

* * * * * nb_of_tmms=$(tmsh show sys tmm-info | grep Sys::TMM | wc -l);tmctl -c dos_stat -s context_name,vector_name,attack_detected,stats_rate,drops_rate,int_drops_rate,ba_stats_rate,ba_drops_rate,bd_stats_rate,bd_drops_rate,detection,mitigation_low,mitigation_high,detection_ba,mitigation_ba_low,mitigation_ba_high,detection_bd,mitigation_bd_low,mitigation_bd_high | grep -v "context_name" | sed '/^$/d' | sed "s/$/,$nb_of_tmms/g" | logger -n 1.1.1.1 --udp --port 5558

Adjust the IP address and port as appropriate. A better approach than using the crontab is to use an external monitor: https://support.f5.com/csp/article/K71282813 In any case, keep in mind that more frequent logging generates more data on your logging device!

Conclusion

The DDoS dashboards based on an ELK stack give DDoS operators visibility into their DDoS events. The dashboard consumes logs sent by BIG-IP for L3/4/DNS DDoS events and visualizes them in graphs. These graphs provide relevant information on what kinds of attacks, from which sources, are going to which destinations. Depending on your BIG-IP DoS config, you get “bad actor” details or “attacked destination” details listed. You will also see whether IPs have been blocked by certain IPI categories, and more. In addition, the ELK stack can consume data from the dos_stat table, which gives you details about your network behavior at the vector level. Further, you can see how “auto thresholds” calculate detection and mitigation thresholds. We hope that this article gives you an introduction to the DDoS ELK Dashboards. We also plan to publish another article explaining the underlying architecture.

Sven Mueller & Mohamed Shaat

Configuring Decision Logging for the F5 BIG-IP Global Traffic Manager
I was working on a GTM solution, and with my limited lab I wanted to make sure that the decisions F5 BIG-IP Global Traffic Manager made at the wideIP and pool level were as evident in the logs as they were consistent in my test results. It turns out there are some fancy little checkboxes in the wideIP configuration that you can check to enable such logs. You might notice, however, that upon enabling these checkboxes the logs are nowhere to be found. This is because there are other necessary steps: you need to configure a few objects to get those logs flowing.

Log Publisher

The first object is the log publisher. Given how much detail flows in the decision logging, I’d highly recommend using an HSL profile to log to a remote server, but for the purposes of testing I used the local syslog. This can also be done with tmsh.

sys log-config publisher gtm_decision_logging {
    destinations {
        local-syslog { }
    }
}

DNS Logging Profile

Next, create a DNS logging profile, making sure to select the Log Publisher you created in the previous step. For testing purposes I enabled the log responses and query ID as well, but those are disabled by default. This can also be created in tmsh.

ltm profile dns-logging gtm_decision_logging {
    enable-response-logging yes
    include-query-id yes
    log-publisher gtm_decision_logging
}

Custom DNS Profile

Now create a custom DNS profile. The only custom properties necessary are at the bottom of the profile, where you enable logging and select the logging profile. This can also be configured in tmsh.

ltm profile dns gtm_decision_logging {
    app-service none
    defaults-from dns
    enable-logging yes
    log-profile gtm_decision_logging
}

Apply the DNS Profile

Now that all the objects are created, you can reference the DNS profile in the listener. In tmsh, you can modify the listener by adding the profile or, if one already exists, replacing it.
modify gtm listener gtmlistener profiles replace-all-with { udp_gtm_dns gtm_decision_logging }

Log Details

Once you have all the objects configured and the DNS profile referenced in your listener, the logging should be hitting /var/log/ltm now. For this first query, the emea pool is selected, but there is no probe data for my primary load balancing method, and the none alternate method skips to the fallback, which uses the configured fallback IP to respond to the client.

2015-06-03 08:54:21 ltm1.dc.test qid 11139 from 192.168.102.1#64536: view none: query: my.example.com IN A + (192.168.102.5%0)
2015-06-03 08:54:21 ltm1.dc.test qid 11139 from 192.168.102.1#64536 [my.example.com A] [round robin selected pool (emea)] [pool member check succeeded (vip3:192.168.103.12) - pool member state is available (green)] [QoS skipped pool member (vip3:192.168.103.12) - path has unmeasured RTT] [pool member check succeeded (vip4:192.168.103.13) - pool member state is available (green)] [QoS skipped pool member (vip4:192.168.103.13) - path has unmeasured RTT] [failed to select pool member by preferred load balancing method] [Using none load balancing method] [failed to select pool member by alternate load balancing method] [selected configured fallback IP]
2015-06-03 08:54:21 ltm1.dc.test qid 11139 to 192.168.102.1#64536: [NOERROR qr,aa,rd] response: my.example.com. 30 IN A 192.168.103.99;

In this second request, the emea pool is again selected, but now there is probe data, so the pool member is selected as appropriate.
2015-06-03 08:55:43 ltm1.dc.test qid 6201 from 192.168.102.1#61503: view none: query: my.example.com IN A + (192.168.102.5%0)
2015-06-03 08:55:43 ltm1.dc.test qid 6201 from 192.168.102.1#61503 [my.example.com A] [round robin selected pool (emea)] [pool member check succeeded (vip3:192.168.103.12) - pool member state is available (green)] [QoS selected pool member (vip3:192.168.103.12) - QoS score (2082756232) is higher] [pool member check succeeded (vip4:192.168.103.13) - pool member state is available (green)] [QoS skipped pool member (vip4:192.168.103.13) from two pool members with equal scores] [QoS selected pool member (vip3:192.168.103.12)]
2015-06-03 08:55:43 ltm1.dc.test qid 6201 to 192.168.102.1#61503: [NOERROR qr,aa,rd] response: my.example.com. 30 IN A 192.168.103.12;

In this final request, the americas pool is selected, but there is no valid topology score for the pool members, so the query is refused.

2015-06-03 08:55:53 ltm1.dc.test qid 23580 from 192.168.102.1#59437: view none: query: my.example.com IN A + (192.168.102.5%0)
2015-06-03 08:55:53 ltm1.dc.test qid 23580 from 192.168.102.1#59437 [my.example.com A] [round robin selected pool (americas)] [pool member check succeeded (vip1:192.168.103.10) - pool member state is available (green)] [QoS selected pool member (vip1:192.168.103.10) - QoS score (0) is higher] [pool member check succeeded (vip2:192.168.103.11) - pool member state is available (green)] [QoS skipped pool member (vip2:192.168.103.11) from two pool members with equal scores] [QoS selected pool member (vip1:192.168.103.10)] [topology load balancing method failed to select pool member (vip1:192.168.103.10) - topology score is 0] [failed to select pool member by preferred load balancing method] [selected configured option Return To DNS]
2015-06-03 08:55:53 ltm1.dc.test qid 23580 to 192.168.102.1#59437: [REFUSED qr,rd] response: empty

Yeah, yeah, skip all that and give me the good stuff

If you want to test it quickly, you can save the config below to a file (/var/tmp/gtmlogging.txt in this example) and then merge it in. Finally, modify the wideIP and listener and you’re good to go!

###
### configuration: /var/tmp/gtmlogging.txt
###
sys log-config publisher gtm_decision_logging {
    destinations {
        local-syslog { }
    }
}
ltm profile dns-logging gtm_decision_logging {
    enable-response-logging yes
    include-query-id yes
    log-publisher gtm_decision_logging
}
ltm profile dns gtm_decision_logging {
    app-service none
    defaults-from dns
    enable-logging yes
    log-profile gtm_decision_logging
}

###
### Merge Command
###
tmsh load sys config merge file /var/tmp/gtmlogging.txt

###
### Modify wideIP and Listener
###
tmsh modify gtm wideip my.example.com load-balancing-decision-log-verbosity { pool-member-selection pool-member-traversal pool-selection pool-traversal }
tmsh modify gtm listener gtmlistener profiles replace-all-with { udp_gtm_dns gtm_decision_logging }
tmsh save sys config
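Once decision logging is flowing, the bracketed steps in each entry can be pulled apart for review off-box. The following is a minimal Python sketch and not part of the original configuration: the function name parse_decision_entry and both regular expressions are assumptions inferred only from the three sample entries in the Log Details section, and the sample line is a shortened copy of the first entry. Log formats may differ on other versions.

```python
import re

# Assumed pattern: each decision step is wrapped in square brackets,
# e.g. "[selected configured fallback IP]".
STEP_RE = re.compile(r"\[([^\]]*)\]")

# Assumed entry prefix, inferred from the sample logs above:
# timestamp, hostname, query ID, then client address#port after "from".
HEAD_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"qid (?P<qid>\d+) from (?P<client>[\d.]+)#(?P<port>\d+)"
)


def parse_decision_entry(line):
    """Return the header fields plus the ordered decision steps.

    Returns None for lines that do not start with the assumed
    "... qid N from ip#port" prefix (for example the response lines,
    which use "to" instead of "from").
    """
    head = HEAD_RE.match(line)
    if head is None:
        return None
    entry = head.groupdict()
    # Only search after the prefix so header text is never treated as a step.
    entry["steps"] = STEP_RE.findall(line[head.end():])
    return entry


# Shortened copy of the first decision entry from the Log Details section.
sample = (
    "2015-06-03 08:54:21 ltm1.dc.test qid 11139 from 192.168.102.1#64536 "
    "[my.example.com A] [round robin selected pool (emea)] "
    "[failed to select pool member by alternate load balancing method] "
    "[selected configured fallback IP]"
)

entry = parse_decision_entry(sample)
print(entry["qid"], "->", entry["steps"][-1])
```

The last step in the list is usually the one of interest, since it records what was finally returned to the client (a pool member, the fallback IP, or Return To DNS). Grouping entries by the qid field would tie a decision line back to its query and response lines.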