Implementing BIG-IP WAF logging and visibility with ELK
Scope

This technical article is useful for BIG-IP users familiar with web application security and the implementation and use of the Elastic Stack. This includes application security professionals, infrastructure management operators, and SecDevOps/DevSecOps practitioners. The focus is on WAF logs exclusively. Firewall, Bot, or DoS mitigation logging into the Elastic Stack is the subject of a future article.

Introduction

This article focuses on the configuration required for sending Web Application Firewall (WAF) logs from the BIG-IP Advanced WAF (or BIG-IP ASM) module to an Elastic Stack (a.k.a. Elasticsearch-Logstash-Kibana or ELK).

First, the article goes over the configuration of the BIG-IP. It is configured with a security policy and a logging profile attached to the virtual server that is being protected. This can be configured via the BIG-IP user interface (TMUI) or through the BIG-IP declarative interface (AS3).

Next, the configuration of the Elastic Stack is discussed, including the configuration of filters adapted to processing BIG-IP WAF logs.

Finally, the article provides some initial guidance on the metrics that can be taken into consideration for visibility. It discusses the use of dashboards and provides some recommendations with regard to potentially useful visualizations.

Pre-requisites and Initial Premise

For the purposes of this article and to follow the steps outlined below, the user will need to have at least one BIG-IP Adv. WAF running TMOS version 15.1 or above (note that this may work with previous versions but has not been tested). The target BIG-IP is already configured with:

- A virtual server
- A WAF policy

An operational Elastic Stack is also required.

The administrator will need to have configuration and administrative privileges on both the BIG-IP and the Elastic Stack infrastructure. They will also need to be familiar with the network topology linking the BIG-IP with the Elasticsearch cluster/infrastructure.

It is assumed that you want to use your Elasticsearch (ELK) logging infrastructure to gain visibility into BIG-IP WAF events.

Logging Profile Configuration

An essential part of getting WAF logs to the proper destination(s) is the Logging Profile. The following will go over the configuration of the Logging Profile that sends data to the Elastic Stack.
Overview of the steps:

- Create Logging Profile
- Associate Logging Profile with the Virtual Server

After following the procedure below, log lines sent from the BIG-IP on the wire are comma-separated key/value pairs that look something like the sample below:

Aug 25 03:07:19 localhost.localdomain ASM:unit_hostname="bigip1",management_ip_address="192.168.41.200",management_ip_address_2="N/A",http_class_name="/Common/log_to_elk_policy",web_application_name="/Common/log_to_elk_policy",policy_name="/Common/log_to_elk_policy",policy_apply_date="2020-08-10 06:50:39",violations="HTTP protocol compliance failed",support_id="5666478231990524056",request_status="blocked",response_code="0",ip_client="10.43.0.86",route_domain="0",method="GET",protocol="HTTP",query_string="name='",x_forwarded_for_header_value="N/A",sig_ids="N/A",sig_names="N/A",date_time="2020-08-25 03:07:19",severity="Error",attack_type="Non-browser Client,HTTP Parser Attack",geo_location="N/A",ip_address_intelligence="N/A",username="N/A",session_id="0",src_port="39348",dest_port="80",dest_ip="10.43.0.201",sub_violations="HTTP protocol compliance failed:Bad HTTP version",virus_name="N/A",violation_rating="5",websocket_direction="N/A",websocket_message_type="N/A",device_id="N/A",staged_sig_ids="",staged_sig_names="",threat_campaign_names="N/A",staged_threat_campaign_names="N/A",blocking_exception_reason="N/A",captcha_result="not_received",microservice="N/A",tap_event_id="N/A",tap_vid="N/A",vs_name="/Common/adv_waf_vs",sig_cves="N/A",staged_sig_cves="N/A",uri="/random",fragment="",request="GET /random?name=' or 1 = 1' HTTP/1.1\r\n",response="Response logging disabled"

Please choose one of the methods below. The configuration can be done through the web-based user interface (TMUI), the command line interface (TMSH), or directly with a declarative AS3 REST API call. It could also be done with the BIG-IP native REST API, but that last option is not discussed herein.

TMUI Steps:

Create Profile

1. Connect to the BIG-IP web UI and log in with administrative rights
2. Navigate to Security >> Event Logs >> Logging Profiles
3. Select “Create”
4. Fill out the configuration fields as follows:
   - Profile Name (mandatory)
   - Enable Application Security
   - Set Storage Destination to Remote Storage
   - Set Logging Format to Key-Value Pairs (Splunk)
5. In the Server Addresses field, enter an IP Address and Port, then click on Add
6. Click on Create

Add Logging Profile to virtual server with the policy

1. Select the target virtual server and click on the Security tab (Local Traffic >> Virtual Servers : Virtual Server List >> [target virtual server])
2. Highlight the Log Profile from the Available column and move it to the Selected column (in this example the log profile is “log_all_to_elk”)
3. Click on Update

At this time the BIG-IP will forward logs to the Elastic Stack.

TMSH Steps:

Create profile

1. SSH into the BIG-IP command line interface (CLI)
2. From the tmsh prompt, enter the following:

create security log profile [name_of_profile] application add { [name_of_profile] { logger-type remote remote-storage splunk servers add { [IP_address_for_ELK]:[TCP_Port_for_ELK] { } } } }

For example:

create security log profile dc_show_creation_elk application add { dc_show_creation_elk { logger-type remote remote-storage splunk servers add { 10.45.0.79:5244 { } } } }
3. Ensure that the changes are saved:

save sys config partitions all

Add Logging Profile to virtual server with the policy

1. From the tmsh prompt (assuming you are still logged in), enter the following:

modify ltm virtual [VS_name] security-log-profiles add { [name_of_profile] }

For example:

modify ltm virtual adv_waf_vs security-log-profiles add { dc_show_creation_elk }

2. Ensure that the changes are saved:

save sys config partitions all

At this time the BIG-IP sends logs to the Elastic Stack.

AS3

Application Services 3 (AS3) is a BIG-IP configuration API endpoint that allows the user to create an application from the ground up. For more information on F5’s AS3, refer to link.

In order to attach a security policy to a virtual server, the AS3 declaration can either refer to a policy present on the BIG-IP or refer to a policy stored in XML format and available via HTTP to the BIG-IP (ref. link). The logging profile can be created and associated with the virtual server directly as part of the AS3 declaration. For more information on the creation of a WAF logging profile, refer to the documentation found here.

The following is an example of a part of an AS3 declaration that will create a security log profile that can be used to log to the Elastic Stack:

"secLogRemote": {
    "class": "Security_Log_Profile",
    "application": {
        "localStorage": false,
        "maxEntryLength": "10k",
        "protocol": "tcp",
        "remoteStorage": "splunk",
        "reportAnomaliesEnabled": true,
        "servers": [
            {
                "address": "10.45.0.79",
                "port": "5244"
            }
        ]
    }
}

In the sample above, the ELK stack IP address is 10.45.0.79, and it listens on port 5244 for BIG-IP WAF logs. Note that the log format used in this instance is “Splunk”. There are no declared filters and thus only the illegal requests will get logged to the Elastic Stack. A sample AS3 declaration can be found here.

ELK Configuration

The Elastic Stack configuration consists of creating a new input on Logstash. This is achieved by adding an input/filter/output configuration to the Logstash configuration file. Optionally, the Logstash administrator might want to create a separate pipeline – for more information, refer to this link.
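If you do opt for a separate pipeline, the entry in pipelines.yml is small. A minimal sketch follows; the pipeline id and config path are illustrative choices, not values from the original article:

# /etc/logstash/pipelines.yml -- id and path are illustrative
- pipeline.id: bigip-waf
  path.config: "/etc/logstash/conf.d/bigip-waf.conf"

Running the BIG-IP input in its own pipeline isolates it from other Logstash workloads, so a parsing problem in one pipeline does not stall the other.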
The following is a Logstash configuration known to work with WAF logs coming from BIG-IP:

input {
  syslog {
    port => 5244
  }
}
filter {
  grok {
    match => {
      "message" => [
        "attack_type=\"%{DATA:attack_type}\"",
        ",blocking_exception_reason=\"%{DATA:blocking_exception_reason}\"",
        ",date_time=\"%{DATA:date_time}\"",
        ",dest_port=\"%{DATA:dest_port}\"",
        ",ip_client=\"%{DATA:ip_client}\"",
        ",is_truncated=\"%{DATA:is_truncated}\"",
        ",method=\"%{DATA:method}\"",
        ",policy_name=\"%{DATA:policy_name}\"",
        ",protocol=\"%{DATA:protocol}\"",
        ",request_status=\"%{DATA:request_status}\"",
        ",response_code=\"%{DATA:response_code}\"",
        ",severity=\"%{DATA:severity}\"",
        ",sig_cves=\"%{DATA:sig_cves}\"",
        ",sig_ids=\"%{DATA:sig_ids}\"",
        ",sig_names=\"%{DATA:sig_names}\"",
        ",sig_set_names=\"%{DATA:sig_set_names}\"",
        ",src_port=\"%{DATA:src_port}\"",
        ",sub_violations=\"%{DATA:sub_violations}\"",
        ",support_id=\"%{DATA:support_id}\"",
        "unit_hostname=\"%{DATA:unit_hostname}\"",
        ",uri=\"%{DATA:uri}\"",
        ",violation_rating=\"%{DATA:violation_rating}\"",
        ",vs_name=\"%{DATA:vs_name}\"",
        ",x_forwarded_for_header_value=\"%{DATA:x_forwarded_for_header_value}\"",
        ",outcome=\"%{DATA:outcome}\"",
        ",outcome_reason=\"%{DATA:outcome_reason}\"",
        ",violations=\"%{DATA:violations}\"",
        ",violation_details=\"%{DATA:violation_details}\"",
        ",request=\"%{DATA:request}\""
      ]
    }
    break_on_match => false
  }
  mutate {
    split => { "attack_type" => "," }
    split => { "sig_ids" => "," }
    split => { "sig_names" => "," }
    split => { "sig_cves" => "," }
    split => { "staged_sig_ids" => "," }
    split => { "staged_sig_names" => "," }
    split => { "staged_sig_cves" => "," }
    split => { "sig_set_names" => "," }
    split => { "threat_campaign_names" => "," }
    split => { "staged_threat_campaign_names" => "," }
    split => { "violations" => "," }
    split => { "sub_violations" => "," }
  }
  if [x_forwarded_for_header_value] != "N/A" {
    mutate { add_field => { "source_host" => "%{x_forwarded_for_header_value}" } }
  } else {
    mutate { add_field => { "source_host" => "%{ip_client}" } }
  }
  geoip {
    source => "source_host"
  }
}
output {
  elasticsearch {
    hosts => ['localhost:9200']
    index => "big_ip-waf-logs-%{+YYYY.MM.dd}"
  }
}

After adding the configuration above to the Logstash parameters, you will need to restart the Logstash instance for the new configuration to take effect. The sample above is also available here.

The Elastic Stack is now ready to process the incoming logs. You can start sending traffic to your policy and see logs populating the Elastic Stack.

If you are looking for a test tool to generate traffic to your virtual server, F5 provides a simple WAF tester tool that can be found here.

At this point, you can start creating dashboards on the Elastic Stack that will satisfy your operational needs with the following overall steps:

- Ensure that the log index is being created (Stack Management >> Index Management, or via the API as shown below)
- Create a Kibana Index Pattern (Stack Management >> Index Patterns)
- You can now peruse the logs from the Kibana Discover menu (Discover)
- And start creating visualizations that will be included in your dashboards (Dashboards >> Editing Simple WAF Dashboard)

A complete Elastic Stack configuration can be found here – note that this can be used with both BIG-IP WAF and NGINX App Protect.
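If you prefer to confirm ingestion from the command line before opening Kibana, the standard Elasticsearch APIs can do it directly. A minimal sketch, assuming Elasticsearch answers on localhost:9200 as in the output section above:

# list the daily WAF indices created by the output section above
curl -s 'http://localhost:9200/_cat/indices/big_ip-waf-logs-*?v'

# fetch one parsed document to spot-check the grok-extracted fields
curl -s 'http://localhost:9200/big_ip-waf-logs-*/_search?size=1&pretty'

If the first command returns no index, check that Logstash is listening on port 5244 and that the BIG-IP can reach it.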
Conclusion

You can now leverage the widely available Elastic Stack to log and visualize BIG-IP WAF logs. From a dashboard perspective it may be useful to track the following metrics:

- Request rate
- Response codes
- The distribution of requests in terms of clean, blocked, or alerted status
- The top talkers making requests
- The top URLs being accessed
- The top violator source IPs

An example dashboard can be seen in the screenshot accompanying the original article.

L3/4/DNS DDoS Reporting with Elastic Search and Kibana
Dear Reader,

In this article, in collaboration with my colleague Mohamed Shaath, I would like to show you how to use DDoS reporting and visibility dashboards that we have created based on an ELK (Elasticsearch, Logstash, and Kibana) stack. The goal is to give you templates based on open-source software to address typical questions DDoS operators have and need to answer when an incident happens. Another component we added is the visualization of incoming packets, dropped packets, and detection and mitigation thresholds per attack vector. The idea here is to give you insight into auto-calculated thresholds compared to incoming rates. It will also give you the possibility to see anomalies in traffic behavior. Hopefully, the visualization will also help you with fine-tuning the DoS vector configuration (a typical example of this is the floor value of a vector).

This article will give you an introduction to some of the graphs we provide together with the templates. Feel free to arrange or modify them in the way you need when you use the solution. We are also very happy to get your feedback, so we can optimize the dashboards and graphs in a way that is most useful for DDoS operators.

Fundamental understanding of log events

All DDoS configurations rely on two thresholds, regardless of the chosen threshold mode (manual, fully automatic, multiplier, …): detection and mitigation.

Figure 1: Detection and mitigation rate

“Detection” means: inform the DDoS operator that the incoming rate is above the configured (or auto-calculated, based on history) rate. Do not block traffic; just send out specific log information. The detection value is usually set or calculated to a rate that is just within the expected “normal” rate. That also means everything above that value is not “normal” and therefore suspicious, but not necessarily an attack. The DDoS operator should nonetheless be aware of the event. This is exactly what happens when a packet rate crosses the detection rate: BIG-IP will send out log messages to the log server (when configured). Within the ELK solution we are introducing, we use the “Splunk” logging format, which sends the information in key/value format. That makes understanding the fields much easier. Here is an example of a log message, which is sent out when the packet rate has crossed the detection threshold:

Jun 17 23:08:46 172.30.107.11 action="Allow",hostname="lon-i5800-1.pme.itc.f5net.com",bigip_mgmt_ip="172.30.107.11",context_name="/Common/www_10_103_2_80_80",date_time="Jun 17 2021 22:58:12",dest_ip="10.103.2.80",dest_port="80",device_product="DDoS Hybrid Defender",device_vendor="F5",device_version="15.1.2.1.0.317.10",dos_attack_event="Attack Sampled",dos_attack_id="550542726",dos_attack_name="TCP Push Flood",dos_packets_dropped="0",dos_packets_received="117",errdefs_msgno="23003138",errdefs_msg_name="Network DoS Event",flow_id="0000000000000000",severity="4",dos_mode="Enforced",dos_src="Volumetric, Per-SrcIP, VS-specific attack, metric:PPS",partition_name="Common",route_domain="0",source_ip="10.103.6.10",source_port="39219",vlan="/Common/vlan3006_client"

Explanation of the message content: Action = “Allow” indicates that BIG-IP is not dropping packets (from the DoS point of view); it is just giving the operator the information that within the last second the protected context (here: /Common/www_10_103_2_80_80) has received 117 (dos_packets_received) push packets (dos_attack_name) from source IP 10.103.6.10 (source_ip).
By the way, because this is a “Volumetric, Per-SrcIP, VS-specific attack” (dos_src) log message, it also tells you that the source IP has been identified as a bad actor (also see my article: Increasing accuracy using Bad Actor and Attacked Destination). Therefore, this event was triggered by the Bad Actor configuration of the TCP Push Flood vector.

Mitigation threshold

Once the incoming packet rate has crossed the mitigation threshold of a DoS vector or an attack signature, BIG-IP starts to drop (rate-limit) traffic above that value. This is when we declare being under a DDoS attack, because the protected context (server, service, network, BIG-IP, etc.) will be negatively affected by this high number of packets per second. The BIG-IP DoS device (AFM/DHD) now needs to lower the number of packets hitting the affected context, and that is why it starts to drop packets on the identified vector. Again, this mitigation threshold can be set manually or auto-calculated based on history or a multiplication of the detection threshold. (Explanation of the F5 DDoS threshold modes)

Here is an example of a drop log message:

Jun 17 23:05:03 172.30.107.11 action="Drop",hostname="lon-i5800-1.pme.itc.f5net.com",bigip_mgmt_ip="172.30.107.11",context_name="Device",date_time="Jun 17 2021 22:54:29",dest_ip="10.103.2.80",dest_port="0",device_product="DDoS Hybrid Defender",device_vendor="F5",device_version="15.1.2.1.0.317.10",dos_attack_event="Attack Sampled",dos_attack_id="3221546531",dos_attack_name="Bad TCP flags (all cleared)",dos_packets_dropped="152224",dos_packets_received="152224",errdefs_msgno="23003138",errdefs_msg_name="Network DoS Event",flow_id="0000000000000000",severity="4",dos_mode="Enforced",dos_src="Volumetric, Aggregated across all SrcIP's, Device-Wide attack, metric:PPS",partition_name="Common",route_domain="0",source_ip="10.103.6.10",source_port="12826",vlan="/Common/vlan3006_client"

In this example, the message is an aggregation of all source IPs (dos_src="Volumetric, Aggregated across all SrcIP's, Device-Wide attack") of the dropped packets (dos_packets_dropped="152224") during the last second. Therefore, the source IP (source_ip="10.103.6.10") is just a representative for all source IPs with dropped packets within the last second. This happens when no “bad actor” has been identified, which is usually the case when the bad actor functionality is not configured or when every packet has a different source IP.

Structure of the dashboards

These two main logging events (allow and drop) are what we have adapted for the visualization of the DDoS dashboard. The DDoS operator needs to know when there is an anomaly in the network and which vectors are triggered by the anomaly. The operator also needs to know which destinations are involved and which sources cause the anomaly. It is equally important to know when the network is under attack, what mitigation has taken place on which destinations and sources, how many packets have been dropped, and so on.

When you open the “DDoS Dashboard” and choose the “Overview Dashboard”, you will notice that the dashboard is divided into two halves. On the left side, you get the information when a DDoS device has dropped packets, and on the right side, you get the information about “suspicious” packets, which means traffic was above a detection threshold without being dropped (action “Allow”).

Figure 2: Structure of the dashboard

Within this dashboard, you will also find graphs or tables which do not split the dashboard into two sides.
Here you find combined information from both events/areas (mitigation and suspicious).

Explanation of some dashboards

In the menu section Home/Analytics/Dashboard you will find all the dashboards we created.

Figure 3: Dashboard menu

Let’s briefly explain what the main dashboards are for.

Figure 4: Dashboard overview

DDoS_Dashboard is the board where you can see all events during a chosen timeframe, which you can select in the upper right corner within that dashboard.

Figure 5: Period of time selection

At the top of the page you find the Dashboard Explorer. From here you can easily navigate between all the relevant dashboards without going through the Analytics section of the main menu.

Figure 6: Dashboard Explorer

- DDOS STATS Dashboard: shows details of the rates and thresholds (packet rate, detection and mitigation thresholds, drop rate) for all vectors, including bad actor and attacked destination thresholds. Here you need to select the relevant vector and context to see the details.
- DDOS Network Vectors: shows details of the incoming rate and drop rate per network vector on a one-pager.
- DDOS DNS Vectors: shows details of the incoming rate and drop rate per DNS vector on a one-pager.
- DDOS Bad Header Vectors: shows details of the incoming rate and drop rate per bad header vector on one page.
- DDOS SIP Vector: shows details of the incoming rate and drop rate per SIP vector on a one-pager.

Please note that all “stats” dashboards are based on the “dos_stats” table, which you need to send to your server yourself; it is not populated via the DoS logs. On the GitHub page, you will find instructions on how to do this.

Next, you see the Stats Control Panel.

Figure 7: Stats Control Panel

By default, it will show the events (drop/allow) for all vectors in all contexts (VS/PO, Device) on all DoS devices. But by using the drop-down menu you can filter on specific data. All filters you set can also easily be saved and used again. Kibana gives a lot of flexibility here.

Next, you get to the Top Attacks Timeline, which shows you the top 10 attack vectors that have dropped packets.

Figure 8: Attack Timeline

When you mouse over, you get the number of dropped packets for that vector. To the right of this graph, you see the Attack Event Details.

Figure 9: Attack Event Details

This simply shows you how many logs you have received per log event. Remember, every mechanism (for example per-source event, per-destination, aggregated, …) has its own logs.

The next row shows on the left side how many packets were dropped during the chosen time frame.

Figure 10: Dropped vs. suspicious packets

On the right side you see how many packets were identified as suspicious because the rate was above the detection threshold but not above the mitigation threshold. This event message has the action “Allow”. In the middle graph, you see the relation of suspicious packets vs. dropped packets vs. incoming packets (incoming packets being the sum of dropped and suspicious packets).

The next graph also gives you an overview of received packets vs. dropped packets.

Figure 11: Incoming vs. dropped packets

Here, however, the data comes from the dos_stats table, so again it is only visible when you send that information; keep in mind this is not done via the log messages. This is the part where you send the output of the “tmctl -c dos_stat” command to your log device. If you are not doing this, then you can remove this graph from the dashboard.
The main difference from the middle graph above is that you will see data even when there is no event (allow/drop), because you get a snapshot of the stats at whatever frequency you have configured for sending the “dos_stat” table. Graphs based on log events, of course, can only appear when there is an event and logs are sent. This graph shows all incoming packets counted by all enabled vectors, regardless of whether they are counted on bad actors, attacked destinations, or the global stats per vector; the same goes for the dropped packets. It gives an overall overview of incoming packets vs. dropped packets. To get more details on which vector or mechanism (BA, AD) did the mitigation, you need to go to the DDOS STATS Dashboard.

An important piece of information for a DDoS operator is to know which services (IPs) are under attack and which contexts or protected objects have been involved.

Figure 12: Target information

Of course, also which vectors are used by the attacker. This is what is shown in the next row. On the left two graphs you get this information for dropped packets. On the right two graphs you see it for packets above the detection threshold but below the mitigation threshold.

Attacked IP and Destination Port shows you the attacked IPs including the destination ports. Attacked Protected Objects shows you the context (VS/PO, Device, Global) in relation to the attack vectors. Context “Global” is used for IPI (IP Intelligence). In this example, packets got dropped because source IPs were configured within the IPI policy “my_IPI” under the category “denial of service”. The mitigation was executed at the global level. IPI activities are shown as attack vectors.

Figure 13: IP-Intelligence information

When you mouse over, you can see the full line. More details on the attack vectors and IPI activity are shown lower on the page.

Attacked destination details

In the next row, you find a table with information on IP addresses that have been identified as being attacked by “Attacked Destination Detection” configured on a vector.

Figure 14: Attacked Destination Details

Figure 15: Vector configuration

What are the sources of an attack?

The next graph gives you the information on the identified attackers. “Top Attacker IPs” shows you the top 10 attackers based on aggregated logs. When you have configured “Bad Actor Detection”, you will also get the information for the top 10 “bad actor” IPs. Identified “bad actor” IPs are certainly important information you want to keep an eye on.

Figure 16: Source address information

Bad Actor Details

To get more information on “bad actors”, you can use the “Bad Actor Details” table, which will show you the relevant information. Here is an example:

Figure 17: Bad Actor details

You can see that the UDP flood vector identified a flood for the bad actor IP “4.4.4.4” at 11:02 on the Device level. Most of the packets were dropped (PPS vs. Dropped Packets). Over the next several 30-second intervals, you get further details for that bad actor IP. But at 11:06 you can see that the IP address got programmed into the “denial of service” category, and after that, all traffic coming from that IP got dropped via the IP-Intelligence policy “my_IPI” at the “Global” level/context.

BDoS Details

The dashboard will also give you information about BDoS signatures and their events.

Figure 18: BDoS details

In this example, you can see that the system generated (Signature Add) a BDoS signature at 11:23:30. Then this signature was used (Re-USED) for mitigation (Drop).
Keep in mind you will only see the details of a signature when it gets created. If a signature is re-used and you want to see the details of a signature that was created days or weeks before, you need to filter for that signature within the timeframe it got created.

Another view on attacked IPs

The dashboards also give you another, comprehensive view on attacked and targeted (action allow) IP addresses. Here you probably best start mousing over from the inner circle going outward, and you will get information per attacked context.

Figure 19: Combined view on sources, destinations and vectors

Details about DNS attacks

Within the DNS section, you get details about DNS-related attacks.

Figure 20: DNS attack overview per vector

Figure 21: Detailed DNS attack overview

Also, a different view on Bad Actor activities:

Figure 22: Bad Actor / attack vector / destination overview

Since we hope the graphs are mostly self-explanatory, we won't go through all of them here. We also plan to add more or modify them based on your feedback. Now it's time to talk about another component, which we have already touched on multiple times within this article.

Attack vector visualization

A second component we have created is the visualization of the stats (incoming, detection, mitigation, etc.) per attack vector. This is an optional part and is not related to the DoS logging. It is based on the “dos_stat” table and gives a snapshot of the statistics based on the interval you have configured to send the data from BIG-IP into your ELK stack. In my article “Demonstration of Device DoS and Per-Service DoS protection,” I already introduced you to the “dos_stat” table, when I used it within my “show_DoS_stats_script”.

Figure 23: DDoS stat table

This script shows you the stats for all vectors, their thresholds, and so on. By sending this data frequently into your ELK stack, you can visualize the data and get graphs for it. You can then easily see trends or anomalies within a defined time frame. You can also easily see what thresholds (detection/mitigation) the system has calculated.

Figure 24: Activity (detection/mitigation) graph per vector

In this example you can see what the system did during an attack. The green line shows the incoming packet rate for that vector. The yellow line shows the expected auto-calculated rate (detection rate). The blue line is the auto-calculated mitigation rate, which at the beginning of this graph is very high because the protected context has no stress. Then we can see that the packet rate increased massively and crossed the detection rate. This is when the DDoS operator needs to be informed, because this rate is not “normal” (based on history) and therefore suspicious. This high packet rate had an impact on the stress of the protected context, and the mitigation rate got adjusted below the incoming rate. At that point, the system started to defend and mitigate. But the incoming packet rate went down again for a short time. Here the mitigation stopped because the rate was below the mitigation threshold, which also got increased again because there was no more stress on the protected context. Then the flood happened again, the mitigation threshold got adjusted, and mitigation started. Later we can see the incoming rate sometimes climbed above the detection threshold but was not strong enough to affect the health of the protected context; therefore, no mitigation took place. At around 11:53 we can see the flood increased again and triggered the mitigation.
Please keep in mind that the granularity of this graph depends, of course, on the frequency with which you send the data into the ELK stack, and that the data is always a snapshot of the current stats.

How to configure logging on BIG-IP

tmsh create ltm pool pool_log_server members add { 1.1.1.1:5558 }
tmsh create sys log-config destination remote-high-speed-log HSL_LOG_DEST { pool-name pool_log_server protocol udp }
tmsh create sys log-config destination splunk SPLUNK_LOG_DEST forward-to HSL_LOG_DEST
tmsh create sys log-config publisher KIBANA_LOG_PUBLISHER destinations add { SPLUNK_LOG_DEST }
tmsh create security log profile LOG_PROFILE dos-network-publisher KIBANA_LOG_PUBLISHER protocol-dns-dos-publisher KIBANA_LOG_PUBLISHER protocol-sip-dos-publisher KIBANA_LOG_PUBLISHER ip-intelligence { log-translation-fields enabled log-publisher KIBANA_LOG_PUBLISHER } traffic-statistics { syncookies enabled log-publisher KIBANA_LOG_PUBLISHER }
tmsh modify security log profile global-network dos-network-publisher KIBANA_LOG_PUBLISHER ip-intelligence { log-geo enabled log-rtbh enabled log-scrubber enabled log-shun enabled log-translation-fields enabled log-publisher KIBANA_LOG_PUBLISHER } protocol-dns-dos-publisher KIBANA_LOG_PUBLISHER protocol-sip-dos-publisher KIBANA_LOG_PUBLISHER traffic-statistics { log-publisher KIBANA_LOG_PUBLISHER syncookies enabled }
tmsh modify security dos device-config dos-device-config log-publisher KIBANA_LOG_PUBLISHER

Figure 25: Overview of logging configuration

How to send the dos_stats table data

Modify the crontab on BIG-IP (crontab -e) and add:

* * * * * nb_of_tmms=$(tmsh show sys tmm-info | grep Sys::TMM | wc -l);tmctl -c dos_stat -s context_name,vector_name,attack_detected,stats_rate,drops_rate,int_drops_rate,ba_stats_rate,ba_drops_rate,bd_stats_rate,bd_drops_rate,detection,mitigation_low,mitigation_high,detection_ba,mitigation_ba_low,mitigation_ba_high,detection_bd,mitigation_bd_low,mitigation_bd_high | grep -v "context_name" | sed '/^$/d' | sed "s/$/,$nb_of_tmms/g" | logger -n 1.1.1.1 --udp --port 5558

Modify the IP and port as appropriate. A better approach than using the crontab is to use an external monitor: https://support.f5.com/csp/article/K71282813

Either way, keep in mind that more frequent logging generates more data on your logging device!

Conclusion

The DDoS dashboards based on an ELK stack give DDoS operators visibility into their DDoS events. The dashboard consumes logs sent by BIG-IP based on L3/4/DNS DDoS events and visualizes them in graphs. These graphs provide relevant information on what kinds of attacks from which sources are going to which destinations. Based on your BIG-IP DoS config you get “bad actor” details or “attacked destination” details listed. You will also see IPs that have been blocked by certain IPI categories, and more. In addition to the other information shown, the ELK stack is also able to consume data from the dos_stats table, which gives you details about your network behavior at a vector level. Further, you can see how “auto thresholds” calculate detection and mitigation thresholds.

We hope that this article gives you an introduction to the DDoS ELK dashboards. We also plan to publish another article explaining the underlying architecture.

Sven Mueller & Mohamed Shaath

Configuring Decision Logging for the F5 BIG-IP Global Traffic Manager
I was working on a GTM solution, and with my limited lab I wanted to make sure that the decisions the F5 BIG-IP Global Traffic Manager made at the wideIP and pool level were as evident in the logs as they were consistent in my test results. It turns out there are some fancy little checkboxes in the wideIP configuration that you can check to enable such logs. You might notice, however, that upon enabling these checkboxes the logs are nowhere to be found. This is because there are other necessary steps: you need to configure a few objects to get those logs flowing.

Log Publisher

The first object is the log publisher. Given how much detail flows in decision logging, I’d highly recommend using an HSL profile to log to a remote server, but for the purposes of testing I used the local syslog. This can also be done with tmsh:

sys log-config publisher gtm_decision_logging {
    destinations {
        local-syslog { }
    }
}

DNS Logging Profile

Next, create a DNS logging profile, making sure to select the log publisher you created in the previous step. For testing purposes I enabled the log responses and query ID options as well, but those are disabled by default. This can also be created in tmsh:

ltm profile dns-logging gtm_decision_logging {
    enable-response-logging yes
    include-query-id yes
    log-publisher gtm_decision_logging
}

Custom DNS Profile

Now create a custom DNS profile. The only custom properties necessary are at the bottom of the profile, where you enable logging and select the logging profile. This can also be configured in tmsh:

ltm profile dns gtm_decision_logging {
    app-service none
    defaults-from dns
    enable-logging yes
    log-profile gtm_decision_logging
}

Apply the DNS Profile

Now that all the objects are created, you can reference the DNS profile in the listener. In tmsh, you can modify the listener by adding the profile, or if one already exists, replacing it:

modify gtm listener gtmlistener profiles replace-all-with { udp_gtm_dns gtm_decision_logging }

Log Details

Once you have all the objects configured and the DNS profile referenced in your listener, the logging should be hitting /var/log/ltm. For this first query, the emea pool is selected, but there is no probe data for my primary load balancing method, and the none alternate method skips to the fallback, which uses the configured fallback IP to respond to the client:

2015-06-03 08:54:21 ltm1.dc.test qid 11139 from 192.168.102.1#64536: view none: query: my.example.com IN A + (192.168.102.5%0)
2015-06-03 08:54:21 ltm1.dc.test qid 11139 from 192.168.102.1#64536 [my.example.com A] [round robin selected pool (emea)] [pool member check succeeded (vip3:192.168.103.12) - pool member state is available (green)] [QoS skipped pool member (vip3:192.168.103.12) - path has unmeasured RTT] [pool member check succeeded (vip4:192.168.103.13) - pool member state is available (green)] [QoS skipped pool member (vip4:192.168.103.13) - path has unmeasured RTT] [failed to select pool member by preferred load balancing method] [Using none load balancing method] [failed to select pool member by alternate load balancing method] [selected configured fallback IP]
2015-06-03 08:54:21 ltm1.dc.test qid 11139 to 192.168.102.1#64536: [NOERROR qr,aa,rd] response: my.example.com. 30 IN A 192.168.103.99;

In this second request, the emea pool is again selected, but now there is probe data, so the pool member is selected as appropriate:
2015-06-03 08:55:43 ltm1.dc.test qid 6201 from 192.168.102.1#61503: view none: query: my.example.com IN A + (192.168.102.5%0)
2015-06-03 08:55:43 ltm1.dc.test qid 6201 from 192.168.102.1#61503 [my.example.com A] [round robin selected pool (emea)] [pool member check succeeded (vip3:192.168.103.12) - pool member state is available (green)] [QoS selected pool member (vip3:192.168.103.12) - QoS score (2082756232) is higher] [pool member check succeeded (vip4:192.168.103.13) - pool member state is available (green)] [QoS skipped pool member (vip4:192.168.103.13) from two pool members with equal scores] [QoS selected pool member (vip3:192.168.103.12)]
2015-06-03 08:55:43 ltm1.dc.test qid 6201 to 192.168.102.1#61503: [NOERROR qr,aa,rd] response: my.example.com. 30 IN A 192.168.103.12;

In this final request, the americas pool is selected, but there is no valid topology score for the pool members, so the query is refused:

2015-06-03 08:55:53 ltm1.dc.test qid 23580 from 192.168.102.1#59437: view none: query: my.example.com IN A + (192.168.102.5%0)
2015-06-03 08:55:53 ltm1.dc.test qid 23580 from 192.168.102.1#59437 [my.example.com A] [round robin selected pool (americas)] [pool member check succeeded (vip1:192.168.103.10) - pool member state is available (green)] [QoS selected pool member (vip1:192.168.103.10) - QoS score (0) is higher] [pool member check succeeded (vip2:192.168.103.11) - pool member state is available (green)] [QoS skipped pool member (vip2:192.168.103.11) from two pool members with equal scores] [QoS selected pool member (vip1:192.168.103.10)] [topology load balancing method failed to select pool member (vip1:192.168.103.10) - topology score is 0] [failed to select pool member by preferred load balancing method] [selected configured option Return To DNS]
2015-06-03 08:55:53 ltm1.dc.test qid 23580 to 192.168.102.1#59437: [REFUSED qr,rd] response: empty

Yeah, yeah, skip all that and give me the good stuff

If you want to test it quickly, you can save the config below to a file (/var/tmp/gtmlogging.txt in this example) and then merge it in. Finally, modify the wideIP and listener and you’re good to go!

###
### configuration: /var/tmp/gtmlogging.txt
###
sys log-config publisher gtm_decision_logging {
    destinations {
        local-syslog { }
    }
}
ltm profile dns-logging gtm_decision_logging {
    enable-response-logging yes
    include-query-id yes
    log-publisher gtm_decision_logging
}
ltm profile dns gtm_decision_logging {
    app-service none
    defaults-from dns
    enable-logging yes
    log-profile gtm_decision_logging
}

###
### Merge Command
###
tmsh load sys config merge file /var/tmp/gtmlogging.txt

###
### Modify wideIP and Listener
###
tmsh modify gtm wideip my.example.com load-balancing-decision-log-verbosity { pool-member-selection pool-member-traversal pool-selection pool-traversal }
tmsh modify gtm listener gtmlistener profiles replace-all-with { udp_gtm_dns gtm_decision_logging }
tmsh save sys config
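To confirm the whole chain quickly, you can resolve the wideIP through the listener and watch the decision log arrive in the local syslog. A sketch, assuming the listener answers on 192.168.102.5 as in the sample logs above:

# resolve the wideIP through the GTM listener
dig @192.168.102.5 my.example.com A

# watch the decision entries arrive
tail -f /var/log/ltm | grep my.example.com

If nothing shows up, re-check that the logging profile references the publisher and that the DNS profile is actually attached to the listener.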
iRules logging to multiple locations with ease

Some months back I was at an account where we were developing some iRules to provide logging detail. One of the complications was that some of the infrastructure to support remote logging was in the process of being implemented and was not immediately available. We also wanted to be able to log to local resources as an alternative, and as the probing for requirements continued it appeared that the logging portion of the iRule was going to vastly outweigh its core functionality.

Problem Domain

Typically, when you have a choice between one of two items, a simple if-else construct is sufficient to choose between the different options for logging. This can also be made a little more flexible by adding a variable so that all the logging can be changed from a single location. For example:

when CLIENT_ACCEPTED {
    # Set the type of logging required - 0=local, 1=remote
    set logType 0

    if { $logType == 0 } {
        log -noname local0. "This is a local log event"
    } else {
        log 192.168.50.50 local0. "This is a remote event"
    }
}

In the example above, the variable logType provides the means to either log to the local syslog on the BIG-IP or to a remote syslog server. Examples like this can be seen throughout DevCentral and are well understood. If the choices become more numerous, you risk getting stuck with an if-else-if block that compares the logging type variable to each possibility and then acts on it. A switch statement (reference) can be useful in cleaning up the code to make it easier to read and maintain:

switch $logType {
    0 {log -noname local0. "This is a local log event"}
    1 {log 192.168.50.50 local0. "This is a remote event"}
    2 {HSL::send $hsl $log}
    default {log -noname local0. "DEFAULT: This is a local log event"}
}

Most of the time solutions like these will be fine. Logging statements will get “turned on” or uncommented during the development and debugging cycle and then commented out when placed in production. This preserves the logging code should a problem materialize later, or should a modification be required that mandates scrutiny to evaluate the change. However, writing your logging this way can be inefficient and cumbersome to maintain, especially if your iRule is, or will become, more extensive. Additionally, if the requirements surrounding where and how you will log are fluid, then you may be constantly revisiting the code, which increases the chance that a functional mistake is introduced when only a change to the logging was desired.

Another problem occurs when the choices for logging, in regards to the conjunction, become “and” as opposed to “or”. Say, for example, that you have the following choices with regards to logging:

logType         Requirement 1   Requirement 2   Requirement 3
No logging
Local syslog    X               X               X
Remote syslog                   X
Remote HSL                                      X
STATS profile                                   X

Initially your requirement was to only log information to the local syslog. This is easy and takes little time to implement and maintain. Operational changes a few weeks later evolve the requirements so that remote syslog is also required in addition to local logging. This is still relatively easy, but if you need to log data in three or four places your iRule is going to start to contain more logging code than functional code. Weeks after that, it is discovered that a single remote syslog is not adequate and that the amount of logging is detrimental to performance.
It is now desired that simple messages be logged locally, that remote High Speed Logging be used for the heavy parts, and that a stats profile be used some of the time, on a case-by-case, at-our-whim basis. Now what? Requirements such as these can, and should, be mitigated at design time, but there are occasions when this is either not practical or the entire purpose of the iRule is logging of some sort. It’s important to be able to deliver the iRule in a manner that is easy to maintain and modify but that also doesn’t have extensive branching code blocks just to log informational details. My solution was to take a page out of my previous development experiences, which turned out to be a template for how I write iRules today.

Unix file permissions

Those familiar with a Unix file system know the Spartan mechanism for controlling access to files and directories. The system works by toggling bits in a value that designate whether a specific permission is allowed for the resource it is configured on. With Unix permissions you will hear things like 644 or 777. These refer to whether you can read/write/execute as the owner/group/other user on the system. Each of those bits corresponds to an allowed permission, and its position specifies who is granted that permission. The following table summarizes a single set of permissions:

Octal digit   Text   Binary value   Result
0             ---    000            All access denied
1             --x    001            Execute only
2             -w-    010            Write only
3             -wx    011            Write and Execute
4             r--    100            Read access only
5             r-x    101            Read and Execute
6             rw-    110            Read and Write
7             rwx    111            Everything allowed

Applying this concept to our current problem, we get an elegant solution: using bitwise operators, it’s possible to select not just one option or another, but any combination desired. More importantly, you can very quickly determine which option is relevant using bitwise operations (these are almost universally the fastest operations to perform in code) while avoiding several if-else-if blocks. If you are comfortable with bitwise operations you may want to skip down to the next sections; otherwise, read on.

Twiddling bits

Bitwise operations are very simple to understand, but it helps to have a chart to cement the concept if you are not familiar with them. Below is a simple chart that shows the bitwise result of an AND operation. If both of the inputs are a “1” then the result is a “1” or “True”; otherwise the result of the operation will be a “0” or “False”.

A   B   A and B
0   0   0
0   1   0
1   0   0
1   1   1

To get an example started, let’s look at a few logging options and chart out how they will work. PAY CLOSE ATTENTION TO THE DECIMAL COLUMN:

Option          Decimal   Binary Value   Result
No Logging      0         000            No logging performed
Local Syslog    1         001            Syslog on local device
Remote syslog   2         010            Syslog to remote device
Remote HSL      4         100            Syslog using HSL

Notice how the decimal value increases in powers of 2. This is important.
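Before expanding the table to cover combinations, you can convince yourself of the mechanics in any tclsh session; a value with two bits set answers “true” to two different masks. A minimal sketch:

set logType 3                 ;# binary 011: local and remote syslog
puts [expr {$logType & 1}]    ;# prints 1 (nonzero): local bit is set
puts [expr {$logType & 2}]    ;# prints 2 (nonzero): remote bit is set
puts [expr {$logType & 4}]    ;# prints 0: HSL bit is clear

Note that the result of the AND is the masked value itself, not 1 or 0; any nonzero result counts as true in a Tcl if test.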
If we expand on the previous table by adding the missing decimal values, we get the full list of logging options we can select from:

Option          Decimal   Binary Value   Result
No Logging      0         000            No logging performed
Local Syslog    1         001            Syslog on local device
Remote syslog   2         010            Syslog to remote device
All syslog      3         011            Local and remote syslog
Remote HSL      4         100            Syslog using HSL
Local / HSL     5         101            Local syslog and HSL
Remote / HSL    6         110            Remote syslog and HSL
Kitchen Sink    7         111            Everything

To see why this works, look at the binary value for the “All syslog” option. The value 011 ANDed with either 001 (local) or 010 (remote) yields a nonzero, i.e. “True”, result, so both destinations fire. This is also why, in the previous table, I said to pay close attention to the decimal values: you want EACH option to be represented by a single binary digit – hence a power of two. The value you select can be any combination on this table.

Clear it up with an example

Enough theory; let’s put some of this into code to see how it works:

when RULE_INIT {
    # Sets iRule debugging (0=no logging; 1=local syslog; 2=remote syslog; 4=remote HSL logging)
    set static::log_enable 0

    # logging resources
    set static::r_syslog_srv 192.168.50.50
    set static::hsl_pool HSL_POOL
}

when CLIENT_ACCEPTED {
    # Create HSL handle if using HSL logging
    if {$static::log_enable & 4} { set hsl [HSL::open -proto UDP -pool $static::hsl_pool] }

    # Log based on requested destination
    if {$static::log_enable & 1} {log -noname local0. "logged to local syslog" }
    if {$static::log_enable & 2} {log $static::r_syslog_srv local0. "logged to remote syslog" }
    if {$static::log_enable & 4} {HSL::send $hsl "logged to HSL"}
}

Before you can test this fully, you will need to create a pool named HSL_POOL with at least one syslog server in it, and a syslog server at 192.168.50.50 or the IP address of your remote syslog box.

We have one small piece of housekeeping to do in regards to HSL logging, and that is opening the HSL channel. We do that with a bitwise comparison that checks whether we are indeed logging to HSL, and if so, opens the channel. There is no HSL::close, so there is no need to clean anything up at the end of the iRule.

Now we have a clean framework where we can easily change our logging option by modifying the log_enable variable. This also spares the end user from having to sort through lines of code and make changes that may not be obvious or well known to them. Formatting-wise, this is a lot easier to read and understand, and we also have the option to log to multiple places, or nowhere, if desired. Lastly, it would be easy to expand to other options, like a different remote syslog server, or even to change the logging programmatically based on conditions. A final observation: we get all this by adding only three lines at each point where we want to log. If we were doing this with if-else-if blocks it would require seven blocks (we could ignore the no-logging option as a moot optimization). Even worse, as you add more options your logging code would quickly grow out of control.

One last thing…

There is an unsupported command you can use to condense your logging even further. While the approach above is an improvement over maintaining a lot of logging code, we still have to repeat all the logging options at each spot where we want to log.
In actuality, there is no requirement that says you need to put the same options in each location. It is perfectly valid for only certain log actions to apply in certain locations and not others. That being said, we can condense this even further by using a function. iRules unfortunately does not have a mechanism for creating and using your own functions, but we can use a Tcl command called eval to get the same result. First, let’s encapsulate our log message in a string:

when CLIENT_ACCEPTED {
    # Create HSL handle if using HSL logging
    if {$static::log_enable & 4} { set hsl [HSL::open -proto UDP -pool $static::hsl_pool] }

    # Create log message
    set log "I log, therefore I am"

    # Log based on requested destination
    if {$static::log_enable & 1} {log -noname local0. $log }
    if {$static::log_enable & 2} {log $static::r_syslog_srv local0. $log }
    if {$static::log_enable & 4} {HSL::send $hsl $log}
}

The eval command (reference) takes one or more arguments together, evaluates them as a Tcl script, and returns the result of that script. Keep in mind that this is not a mechanism supported by F5, and the need to use it MUST outweigh the challenge of migrating or dealing with incompatibilities as you update the OS. First the code:

when RULE_INIT {
    # Sets iRule debugging (0 = none; 1 = local syslog; 2 = remote syslog; 4 = remote HSL logging)
    set static::log_enable 0

    # logging resources
    set static::r_syslog_srv 192.168.50.50
    set static::hsl_pool HSL_POOL

    # create log 'function'
    set static::logFunction {
        if {$static::log_enable & 1} {log -noname local0. $log }
        if {$static::log_enable & 2} {log $static::r_syslog_srv local0. $log }
        if {$static::log_enable & 4} {HSL::send $hsl $log}
    }
}
when CLIENT_ACCEPTED {
    # Create HSL handle if using HSL logging
    if {$static::log_enable & 4} { set hsl [HSL::open -proto UDP -pool $static::hsl_pool] }

    # Create log message
    set log "I log, therefore I am"

    # Log based on requested destination
    if { [catch {eval $static::logFunction;} ] } {
        log -noname local0. "Exception caught while trying to log"
    }
}

As you can see, we have moved the block of code that logs up into the RULE_INIT event and essentially made the static variable logFunction equal to that block of code. To call our ‘function’ we set the variable log to the desired message and then call eval on the variable. Just to be careful, we catch any exceptions/errors from evaluating the script and log them locally. If you are not concerned with catching an error in the logging script, the last bit could be condensed further to:

# Log based on requested destination
catch {eval $static::logFunction;}
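Because the “function” is just a string of script, adding a destination later means touching exactly one place. A sketch, assuming a hypothetical second syslog server at 192.168.50.51 assigned to bit 8:

# in RULE_INIT: one new resource and one new line in the function body
set static::r_syslog_srv2 192.168.50.51
set static::logFunction {
    if {$static::log_enable & 1} {log -noname local0. $log }
    if {$static::log_enable & 2} {log $static::r_syslog_srv local0. $log }
    if {$static::log_enable & 4} {HSL::send $hsl $log}
    if {$static::log_enable & 8} {log $static::r_syslog_srv2 local0. $log }
}

Setting log_enable to 9 (binary 1001) would then mean local syslog plus the second remote server, with no changes anywhere else in the iRule.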
This idea could also be used for functional constructs where five to ten different items may or may not be needed during the execution of an iRule but it cannot be determined until runtime. You can even wrap a block of code like this into a switch statement so that a procedural, state machine-like process can be modeled.900Views0likes0CommentsSSL Orchestrator Enhanced Uses Case: Remote Logging
Introduction

This use case allows you to configure the BIG-IP SSL Orchestrator to send detailed logging to a remote syslog server. Logging is an important aspect of SSL Orchestrator operation and troubleshooting. The volume of data created by debug logging is significant and should ideally be sent off-box for analysis and archiving. The following instructions demonstrate how to configure remote logging.

Logging Level

Logging verbosity is configured in the BIG-IP Configuration Utility. Under SSL Orchestrator select Configuration > Logs > Settings. Logging verbosity is set to Error by default. Change this to Debug for the Per-Request Policy and SSL Orchestrator Generic. Click Save when done.

Note: For simplified logging that combines each connection flow into a single summary log, only enable SSL Orchestrator Generic at level Informational or higher. These log settings are global and can be overridden by per-topology logging settings.

Create a Pool for the Syslog Server

Under Local Traffic select Pools. Click Create. Give it a name, Remote_syslog_pool in this example. Give the node a name, syslog_server in this example. Enter the IP address of the syslog server and port 514 for the Service Port. Click Add.

Note: 514 is the common port for syslogd but may be different in your environment.

If desired, add a Health Monitor like gateway_icmp. Use the << to move it from Available to Active. Click Finished when done.

Create Logging Destination

Under SSL Orchestrator select Configuration > Logs > System. Then select Configuration > Log Destinations. Click Create. Give it a name, remote_syslog in this example. Select Remote High-Speed Log as the Type.

Note: The Remote High-Speed Log (HSL) destination uses the data plane, while Remote Syslog uses the management plane. HSL logging is preferred due to better, sustained performance. For more information on HSL click here.

For Pool Name select the pool created previously, Remote_syslog_pool in this example. Set the Protocol to UDP or TCP (typically UDP). Click Finished.

Configure the Log Publisher

From the same screen click Configuration > Log Publishers. Click on sys-sslo-publisher to edit it. Select local-syslog and click >> to move it to Available. Select remote_syslog and click << to move it to Selected. Click Update when done.

The configuration is now complete. Detailed logs should now be sent to your syslog server.

Verify it’s Working

Check the syslog pool statistics. From the BIG-IP Configuration Utility select Local Traffic > Pools. Select Statistics. If it is working you should see a non-zero value for Bits and Packets.

Check your syslog server to verify it is receiving logs from SSL Orchestrator. In this example I’m running a packet capture on the syslog server to check that packets are being sent from the BIG-IP to the syslog server.

In the example above you can see that the BIG-IP (10.0.0.1) is sending packets to the syslog server (10.0.0.2) on UDP port 514. You can also see the details of the syslog message in the circle.
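If you want to reproduce that capture from the BIG-IP side instead, tcpdump on the BIG-IP will show the log records leaving for the collector. A sketch using the example addresses above (0.0 captures on all interfaces):

# on the BIG-IP: watch syslog traffic headed to the collector
tcpdump -nni 0.0 host 10.0.0.2 and udp port 514

Seeing packets here but nothing on the collector usually points at a routing or firewall problem between the two hosts rather than at the logging configuration.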
Note: BIG-IP SSL Orchestrator needs a Self IP address in order to send detailed logging to the syslog server. If deployed in Layer 2 mode you will need to configure a new Self IP address; you cannot assign an IP address to an interface in an L2 vwire group. If deployed in Layer 3 mode you can use an existing Self IP address as long as it can reach the syslog server. Ideally, though, the syslog traffic should not be on the same interface(s) as client/server traffic. In this example BIG-IP is configured with the Self IP 10.0.0.1, which is on the same subnet as the syslog server at IP address 10.0.0.2.

Summary

In this SSL Orchestrator use case you learned how to enable detailed logging on BIG-IP and have the logs sent to a remote syslog server.
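For reference, the same objects can also be built from tmsh rather than the GUI. This sketch mirrors the steps above using the example names and addresses; it is an assumption-laden outline, not validated against any particular version:

tmsh create ltm pool Remote_syslog_pool members add { 10.0.0.2:514 } monitor gateway_icmp
tmsh create sys log-config destination remote-high-speed-log remote_syslog { pool-name Remote_syslog_pool protocol udp }
tmsh modify sys log-config publisher sys-sslo-publisher destinations replace-all-with { remote_syslog }
tmsh save sys config

BIG-IP Logging and Reporting Toolkit - part one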
Joe Malek, one of the many awesome engineers here at F5, took it upon himself to delve deeply into a very interesting but often unsung part of the BIG-IP advanced configuration world: logging and reporting. It’s my great pleasure to share with you his awesome study and the findings therein, along with (eventually) a toolkit to help you get started in the world of custom log manipulation. If you’ve ever questioned or been curious about your options when it comes to information gathering and reporting, this is definitely something you should read. There will be multiple parts, so stay tuned. This one is just the intro.

Logging & Reporting Toolkit - Part 1
Logging & Reporting Toolkit - Part 2
Logging & Reporting Toolkit - Part 3
Logging & Reporting Toolkit - Part 4

Description

F5 products occupy critical positions in application delivery infrastructure. They serve as gateways, proxies, accelerators and traffic flow arbiters. In these roles, customer expectations vary for the degree and amount of event information recorded. Several opportunities exist within our current product capabilities for our customers and partners to produce and consume log messages from and via F5 products. Efforts to date include generating W3C style log messages on LTM via iRules, close integration with leading vendors and ASM (requires askf5 login), and creating relationships with leading vendors to best serve our customers. Significant capabilities exist for customers and partners to create their own logging and reporting solutions.

Problems and opportunity

Across the many products offered by F5, there exists a variety of logging structures. The common log protocols used by F5 products to emit messages are Syslog (requires askf5 login) and SNMP (requires askf5 login), along with built-in iRules capabilities. Though syslog-ng is commonplace, software components tend to vary in transport, verbosity, message formatting and sometimes syslog facility. This can result in a high degree of data density in our logs, and the messages our systems emit can vary from version to version.[i] The combination of these factors results in a challenge that requires a coordinated solution for customers who are compelled by regulation, industry practice, or business process to maintain log management infrastructure that consumes messages from F5 devices.[ii] By utilizing the unique product architecture TMOS employs, sharing its knowledge about networks and applications, as well as capabilities built into iRules, TMOS can provide much of this information to log management infrastructure in a simple and knowledgeable manner. In effect, we can emit messages about appliance state and offload many message logging tasks from application servers. Based on our connection knowledge we can also improve the utility and value of information obtained from vendor-provided log management infrastructure.[iii]

Objectives and success criteria

The success criteria for including an item in the toolkit are:

1. A capability to deliver reports on select items using the leading platforms without requiring core development work on an F5 product.
2. An identified extensibility capability for future customization and report building.
Assumptions and dependencies

- Vendors to include in the toolkit are Splunk, Q1 Labs and PresiNET.
- ASM logging and reporting is sufficient and does not need further explanation.
- Information to be included in sample reports should begin to assist in diagnostic activities, demonstrate the ROI of including F5 devices in an infrastructure, and advise on when F5 devices are nearing capacity.
- Vendor products must be able to accept event data emitted by F5 products. This means that some vendors might have more comprehensive support than others.
- Products currently supported but not in active development are not eligible for inclusion in the toolkit. Examples are older versions of BIG-IP and FirePass, and all WANJet releases.
- Some vendor products will require code modifications on the vendor’s side to understand the data F5 products send them.

[i] As a piece of customer evidence, Microsoft implemented several logging practices around version 9.1. When they upgraded to version 9.4, their log volume increased several-fold because F5 added log messages and changed existing messages. As a result, the existing message taxonomy had to be deprecated, and we caused them to redesign filters and reports and to create a new set of logging practices.

[ii] Regulations such as the Sarbanes-Oxley Act, Gramm-Leach-Bliley Act, Federal Information Security Management Act, PCI DSS, and HIPAA.

[iii] It is common for F5 products to manipulate connections via OneConnect, NATs and SNATs. These operations are unknown to external log collectors, and they pose a challenge when assembling a complete view of the network connections between a client and a server via an F5 device for a single application transaction.

What’s Next?

In the next installment we’ll get into the details of the different vendors in question, their offerings, how they work and integrate with BIG-IP, and more.

Logging and Reporting Toolkit Series: Part Two | Part Three

BIG-IP Logging and Reporting Toolkit – part four
So far we’ve covered the initial problem, the players involved and one in-depth analysis of one of the options (Splunk). Next let’s dig into Q1 Labs’ QRadar offering. The first thing you’ll need to do, just like last time, is make sure your BIG-IP passes syslog traffic off the box. Here’s a simple example of how you can get that done in your config file. These are the same as last time, so nothing shockingly new here, though this bit is important.

Logging & Reporting Toolkit - Part 1
Logging & Reporting Toolkit - Part 2
Logging & Reporting Toolkit - Part 3
Logging & Reporting Toolkit - Part 4

BIG-IP v9:

syslog { remote server 10.10.200.31 }

BIG-IP v10:

syslog { remote server { qradar { host 10.11.100.31 } } }

This will send all syslog messages from the BIG-IP to the QRadar system; both BIG-IP system messages and any messages from iRules. If you’re interested in having iRules log to the QRadar system directly, you can use the HSL statements or the log statements with a destination host defined. For example, RULE_INIT sets ::QRadarHost to “10.10.200.31”, and then in the iRules event you’re interested in you assemble $log_message and send it to the log with: log $::QRadarHost $log_message. A good practice is to also record it locally on something like local0 in case the message doesn’t make it to the QRadar system. In my testing I used a single QRadar system running the log collector and event processor. If you’re using a more sophisticated deployment you’ll need to use the Deployment Manager to ensure that the QRadar log collectors are forwarding messages onto the Event Processor you’re going to work with. My QRadar system was already set up to receive syslog messages on port 514, so there wasn’t anything more to do to get messages flowing. The key to working with QRadar is defining regular expressions to extract the message data you’re interested in – once you have that done, most things follow the same process. In this section I’ll walk through all the tasks needed to extract custom data and build a report for the w3c case. Then I’ll show a summary using NEDS and dashboard data.
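Before getting into the regexes, here is a minimal iRule sketch of the remote-plus-local logging pattern just described. It is illustrative only: the event chosen and the fields assembled into the message are my assumptions, not part of the original article.

when RULE_INIT {
    # QRadar event collector address (assumed value, from the example above)
    set ::QRadarHost "10.10.200.31"
}
when HTTP_REQUEST {
    # Assemble a message from data available in this event (illustrative fields)
    set log_message "client_ip=[IP::client_addr] host=[HTTP::host] uri=[HTTP::uri]"
    # Send to the remote host; BIG-IP emits this via syslog
    log $::QRadarHost $log_message
    # Also record locally on local0 in case the message never reaches QRadar
    log local0. $log_message
}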
Here are my regexes for QRadar for w3c, NEDS and the dashboard data script:

message source | attribute name | regex | capture group | sample message
dashboard script | Compression Deflate uses | deflate\.out\.uses='(\d+)' | 1 | in dc post
dashboard script | Compression LZO uses | lzo\.out\.uses='(\d+)' | 1 | in dc post
dashboard script | Compression Null uses | null\.out\.uses='(\d+)' | 1 | in dc post
dashboard script | Dashboard-messageType | message_type='(.+?)' | 1 | in dc post
dashboard script | Dashboard-reportingSystem | HostName='(.+?)' | 1 | in dc post
dashboard script | Dashboard-routingEnabled | routing='(.+?)' | 1 | in dc post
NEDS iRule | NEDSv1-Flow-clientside-http | "(neds\.f5\.conn\.start\.v1)",(\"[\w\.resp\.v1]+\"\,)+\"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}@\d+\.\d+)\" | 1 | in NEDS Spec
NEDS iRule | NEDSv1-clientIPaddress | "(neds\.f5\.conn\.start\.v1","[\w\.]+)","(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) | 2 | in NEDS Spec
NEDS iRule | NEDSv1-clientPort | "(neds\.f5\.conn\.start\.v1","[\w\.]+)","(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:)(\d{1,5}) | 3 | in NEDS Spec
NEDS iRule | NEDSv1-clientCloseBytesIn | (neds\.f5\.conn\.end\.v1)","([\w\.]+)","(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}@\d+\.\d+)",(\d+.\d+),(\d+),(\d+),(\d+) | 7 | in NEDS Spec
NEDS iRule | NEDSv1-clientCloseBytesOut | (neds\.f5\.conn\.end\.v1)","([\w\.]+)","(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}@\d+\.\d+)",(\d+.\d+),(\d+),(\d+),(\d+),(\d+) | 8 | in NEDS Spec
NEDS iRule | NEDSv1-clientClosePktsIn | (neds\.f5\.conn\.end\.v1)","([\w\.]+)","(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}@\d+\.\d+)",(\d+.\d+),(\d+), | 5 | in NEDS Spec
NEDS iRule | NEDSv1-clientClosePktsOut | (neds\.f5\.conn\.end\.v1)","([\w\.]+)","(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}@\d+\.\d+)",(\d+.\d+),(\d+),(\d+) | 6 | in NEDS Spec
NEDS iRule | NEDSv1-clientCloseTimestamp | (neds\.f5\.conn\.end\.v1)","([\w\.]+)","(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}@\d+\.\d+)",(\d+.\d+), | 4 | in NEDS Spec
NEDS iRule | NEDSv1-clientConnectionIngressVlan | (neds[\w\.]+start\.v1\",\"[\w\.]+",")(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}@\d+\.\d+)\",(\d+\.\d+)\,"(\w+)" | 4 | in NEDS Spec
NEDS iRule | NEDSv1-clientConnectionPolicyName | (neds[\w\.]+start\.v1\",\"[\w\.]+",")(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}@\d+\.\d+)\",(\d+\.\d+)\,"(\w+)"\,(\d+),(\d+),(\d+),\"([\w\.]+)\" | 8 | in NEDS Spec
NEDS iRule | NEDSv1-clientConnectionStartTimestamp | (neds[\w\.]+start\.v1\",\"[\w\.]+",")(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}@\d+\.\d+)\",(\d+\.\d+) | 3 | in NEDS Spec
NEDS iRule | NEDSv1-clientIPProtocol | (neds[\w\.]+start\.v1\",\"[\w\.]+",")(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}@\d+\.\d+)\",(\d+\.\d+)\,"(\w+)",(\d+), | 5 | in NEDS Spec
NEDS iRule | NEDSv1-httpRequestHost | (neds\.f5\.http\.req\.v1)",("[\w\.\"\:\-\@]+)","([\w\.\:\-\@]+)",(\d+\.\d+,\d+),"([\w\.\_\-]+) | 5 | in NEDS Spec
NEDS iRule | NEDSv1-httpRequestServerPort | (neds\.f5\.http\.resp\.v1)","([\w\.]+)","([\d\.:]+)-([\d\.]+):(\d{1,5}) | 5 | in NEDS Spec
NEDS iRule | NEDSv1-httpRequestTCPReplyNumber | (neds\.f5\.http\.req\.v1)",("[\w\.\"\:\-\@]+)","([\w\.\:\-\@]+)",(\d+\.\d+),(\d+) | 5 | in NEDS Spec
NEDS iRule | NEDSv1-httpRequestUserAgent | (neds\.f5\.http\.req\.v1)",("[\w\.\"\,\:\-\@]+)","([\w/\._\%\@]+)",("[\w\@\.]*?"),"([\w/\.\s(;\-\:\)]+) | 5 | in NEDS Spec
NEDS iRule | NEDSv1-httpResponseContentLength | (neds\.f5\.http\.resp\.v1)","([\w\."\,\:\-\@]+)","([\w\/\;\s\=\-]+)","(\d+) | 4 | in NEDS Spec
NEDS iRule | NEDSv1-httpResponseContentType | (neds\.f5\.http\.resp\.v1)","([\w\."\,\:\-\@]+)","([\w\/\;\s\=\-]+) | 3 | in NEDS Spec
NEDS iRule | NEDSv1-httpResponseLBTarget | (neds\.f5\.http\.resp\.v1)","([\w\.\,:\-@/;\s\="]+),"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5})" | 3 | in NEDS Spec
NEDS iRule | NEDSv1-reportingSystem | (\"neds.+[\w]\.v1\"),\"([\w.]+)\" | 2 | in NEDS Spec
NEDS iRule | NEDSv1-responseHTTPContentLength | (neds[\w\.]+resp\.v1\",\"[\w\.]+",")(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}@\d+\.\d+)\",(\d+\.\d+),(\d+),"(\d{3})","([\w\/]+)","(\d+)" | 7 | in NEDS Spec
NEDS iRule | NEDSv1-responseHTTPServerResponseCode | (neds[\w\.]+resp\.v1\",\"[\w\.]+",")(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}-\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}@\d+\.\d+)\",(\d+\.\d+),(\d+),"(\d{3})" | 5 | in NEDS Spec
w3c iRule | W3C Client IP address | client_ip=(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) | 1 | in dc post
w3c iRule | W3C Client Port | client_port=(\d{1,5}) | 1 | in dc post
w3c iRule | W3C Client username | username=([\w]+) | 1 | in dc post
w3c iRule | W3C Content Length | content_length=(\d+) | 1 | in dc post
w3c iRule | W3C HTTP Request | request="(.*)"\ss | 1 | in dc post
w3c iRule | W3C HTTP version | HTTP/(\d\.\d)" | 1 | in dc post
w3c iRule | W3C Host header | host=(.+?) | 1 | in dc post
w3c iRule | W3C Member server | lb_server=(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}:\d{1,5}) | 1 | in dc post
w3c iRule | W3C Server Response Code | server_status=(\d{3}) | 1 | in dc post
w3c iRule | W3C User Agent | user_agent="(.*)" | 1 | in dc post
w3c iRule | W3C Virtual Server name | virtual=(.*?)\s | 1 | in dc post
w3c iRule | w3c Server Port | lb_server=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:(\d{1,5}) | 1 | in dc post
w3c iRule | w3c referer | referer=([\w\./\:\-]+) | 1 | in dc post
w3c iRule | w3c resp_time | resp_time=(\d+) | 1 | in dc post

W3C offload case

Now that BIG-IP is set up to send messages to the QRadar system, double-check that you’ve installed the w3c-client-logging iRule on a vip, and we’ll see what it looks like when everything is put together. Login to your QRadar management console and navigate to the Events tab. You should see events streaming into QRadar if everything is configured correctly. If you can’t find what you’re looking for in the normalized events view, change it to the raw view – I also opted to have my console autorefresh every minute. The last message is the one I’m looking for. If you find a similar message and double-click on it, we can start to extract data from the message and build up some searches and reports. I’ve not fully customized my QRadar deployment, so I’m ignoring the fact that the Log Source for the iRule messages has been identified by the system’s FastIronDsm. Once the Event Viewer is on screen, click on the Extract Property button in the button bar. This will launch the Custom Event Property Definition tool, which will allow you to categorize event elements and write the regex for extracting the information you’re interested in. Right now, I’m interested in the HTTP server status codes.
For your custom field extractions, here’s a sample message from the W3C iRule:

Feb 9 14:23:21 tmm tmm[5088]: Rule w3c-client-logging : virtual=www.f5demo.com_http client_ip=65.197.145.92 client_port=37227 lb_server=10.10.200.1:80 host=www.f5demo.com username= request="GET /compression HTTP/1.0" server_status=301 content_length=322 resp_time=1 user_agent="check_http/1.96 (nagios-plugins 1.4.5)]" referer=

The regular expressions for key/value pairings are pretty easy to create. In this window you can see that the regex has located the item in the log message I’m interested in – it’s highlighted in yellow. Save the regex extraction and you’ll be returned to the Event Viewer, where you can look for the new property listed on the page. Now the attribute shows up on the Event list page, down at the bottom. I’ve already entered several regular expressions for the NEDS data, and since I’ve assigned them all to the same Device Support Module (DSM) they’re showing up on this page; this isn’t a NEDS message, though, so they’re not applicable to this stream. With the extraction we just assigned to this DSM and log source we can return to the Event List and build a search. After the search is built it can be used to filter the events list, and we can build a report from it. In the Event Viewer, click on the Search button and select the New Event Search option. I’ve also added extractions to my system for response time and member server. This helps to further illustrate what’s happening in my environment – I can see which hosts are sending which response codes and get a rough idea of what the client and server performance is. Pick the fields you’re interested in including in the search and click on the ‘Filter’ button. QRadar composes the search and saves it to the list of defined searches. I could also add a regex to extract the BIG-IP name from the message and group by that attribute to get an idea of what’s happening across the various BIG-IPs in my environment. Now the search runs, and here’s my search result: in the last 6 hours there have been 147 HTTP 304 response codes recorded by the system. To turn this search into a report or make it available to the dashboard, click on the Save Criteria button in the toolbar. I’ve found that it helps to group searches together, so I’ve created a group called BIG-IP for all my BIG-IP related searches. For this search to appear on your dashboard you’ll also need to click the “Add item…” button on the dashboard and locate your search. To generate the report, click the Reports tab and find the Actions dropdown in the tool bar – select Create. I’m building a manual report for this step. My report uses a single frame and Events/Logs as the information source. And here’s my report: accounting for the date format, there were a lot of 304’s returned to clients on March 5th, and I probably have data missing from March 10th onwards because my BIG-IP was sending log messages somewhere else.

NEDS case

While the w3c offload case used an iRule with key/value tuples, NEDS uses a comma-delimited string to convey information in the message. I spent some time with the specification and wrote several regular expressions to extract the data. The process is identical to what’s outlined in the w3c case, so I’ll save the screen real estate and skip the screenshots of the process.
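Since the NEDS case reuses the same regex mechanics, it can help to sanity-check an extraction before defining it in QRadar. iRules are Tcl, so the patterns can be exercised offline in a plain tclsh session; this is a hedged, standalone sketch (not part of the original walkthrough) testing the server-status pattern from the table above against a trimmed copy of the sample W3C message:

# Standalone Tcl sketch for testing an extraction - run in tclsh, not on BIG-IP
set msg {Rule w3c-client-logging : virtual=www.f5demo.com_http client_ip=65.197.145.92 server_status=301 content_length=322 resp_time=1}
# Same pattern as the "W3C Server Response Code" row above
if {[regexp {server_status=(\d{3})} $msg -> status]} {
    puts "capture group 1 -> $status"   ;# prints: capture group 1 -> 301
}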
You can find my regular expressions here – I’m fairly new to regular expressions, so I’m sure there are improvements that can be made to make mine more efficient/maintainable. After defining the custom attributes, here’s what I get when viewing a NEDS message in the Event Viewer. To syslog, a NEDS message looks like:

Mar 30 10:44:59 tmm tmm[5088]: Rule networkEventDataStream <HTTP_RESPONSE>: "neds.f5.http.resp.v1","bigip9.f5demo.com","65.197.145.92:42709-65.197.145.93:80@1269971099.951082",1269971099.952527,1,"301","text/html; charset=iso-8859-1","322","10.10.200.1:80","65.197.145.92:42709-10.10.200.1:80"

Here’s the detailed view using the regular expressions for the HTTP response, client close, HTTP request and client accepted messages. Here’s a select result of a search I composed for the connection close data. The “clientCloseBytesOut is not N/A” condition filters out all the non-client-close NEDS messages. The process to generate a report for NEDS data mirrors the process for the W3C case:

1. Save the search
2. Create a report template
3. Add the search to the report template
4. Save the template
5. Run the report

Dashboard data case

To get the dashboard data streaming into QRadar, I had to modify my base script to send the messages via syslog instead of just printing the string. In addition to the QRadar and BIG-IP systems, you’ll need another host with the requisite Perl modules installed to relay the data from the BIG-IP to QRadar. Here’s the script: Dashboard Syslog Perl Script. To use it you’ll need to:

1. On line 66, configure the username the script should use to access the dashboard interface
2. On line 67, configure the password for the username from step 1
3. On line 48, configure the IP address or name for the QRadar event collector
4. Schedule the script to run periodically on your relay host – cron would do nicely.

Once you’ve got the data into QRadar you’ll need to:

1. Find the endpoint-isession-stat data log message and write your regular expressions for the data you’re interested in
2. Find the remote endpoint log message you’re interested in and write your regular expressions for the data you’re interested in
3. Build and save your searches
4. Build and run your report template

Here’s a sample endpoint-isession-stat data log message in syslog:

Mar 15 17:05:46 127.0.0.1 10.11.100.73: device_timestamp='Wed Mar 15 00:05:46 2010 GMT' HostName='bigip3900c.demo.f5demo.com' version.version='10.1.0' message_type='endpoint_isession_stat' name='_tunnel_ctrl_10.20.50.103' peer_ref='00:00:00:00:00:00:00:00:00:00:ff:ff:0a:14:32:67' null.in.uses='292670' null.in.errors='0' null.in.bytes_opt='31371493' null.in.bytes_raw='29030133' null.out.uses='292670' null.out.errors='0' null.out.bytes_opt='31371493' null.out.bytes_raw='29030133' lzo.in.uses='1250' lzo.in.errors='0' lzo.in.bytes_opt='139167' lzo.in.bytes_raw='124178' lzo.out.uses='1250' lzo.out.errors='0' lzo.out.bytes_opt='139166' lzo.out.bytes_raw='124177' deflate.in.uses='0' deflate.in.errors='0' deflate.in.bytes_opt='0' deflate.in.bytes_raw='0' deflate.out.uses='0' deflate.out.errors='0' deflate.out.bytes_opt='0' deflate.out.bytes_raw='0' dedup.in.uses='0' dedup.in.errors='0' dedup.in.bytes_opt='0' dedup.in.bytes_raw='0' dedup.out.uses='0' dedup.out.errors='0' dedup.out.bytes_opt='0' dedup.out.bytes_raw='0' dedup_in.hit_bytes='0' dedup_in.hits='0' dedup_in.hit_hist.bucket_1k='0' dedup_in.hit_hist.bucket_2k='0' dedup_in.hit_hist.bucket_4k='0' dedup_in.hit_hist.bucket_8k='0' dedup_in.hit_hist.bucket_16k='0' dedup_in.hit_hist.bucket_32k='0' dedup_in.hit_hist.bucket_64k='0' dedup_in.hit_hist.bucket_128k='0' dedup_in.hit_hist.bucket_256k='0' dedup_in.hit_hist.bucket_512k='0' dedup_in.hit_hist.bucket_1m='0' dedup_in.hit_hist.bucket_large='0' dedup_in.miss_bytes='0' dedup_in.misses='0' dedup_in.miss_hist.bucket_1k='0' dedup_in.miss_hist.bucket_2k='0' dedup_in.miss_hist.bucket_4k='0' dedup_in.miss_hist.bucket_8k='0' dedup_in.miss_hist.bucket_16k='0' dedup_in.miss_hist.bucket_32k='0' dedup_in.miss_hist.bucket_64k='0' dedup_in.miss_hist.bucket_128k='0' dedup_in.miss_hist.bucket_256k='0' dedup_in.miss_hist.bucket_512k='0' dedup_in.miss_hist.bucket_1m='0' dedup_in.miss_hist.bucket_large='0' dedup_out.hit_bytes='0' dedup_out.hits='0' dedup_out.hit_hist.bucket_1k='0' dedup_out.hit_hist.bucket_2k='0' dedup_out.hit_hist.bucket_4k='0' dedup_out.hit_hist.bucket_8k='0' dedup_out.hit_hist.bucket_16k='0' dedup_out.hit_hist.bucket_32k='0' dedup_out.hit_hist.bucket_64k='0' dedup_out.hit_hist.bucket_128k='0' dedup_out.hit_hist.bucket_256k='0' dedup_out.hit_hist.bucket_512k='0' dedup_out.hit_hist.bucket_1m='0' dedup_out.hit_hist.bucket_large='0' dedup_out.miss_bytes='0' dedup_out.misses='0' dedup_out.miss_hist.bucket_1k='0' dedup_out.miss_hist.bucket_2k='0' dedup_out.miss_hist.bucket_4k='0' dedup_out.miss_hist.bucket_8k='0' dedup_out.miss_hist.bucket_16k='0' dedup_out.miss_hist.bucket_32k='0' dedup_out.miss_hist.bucket_64k='0' dedup_out.miss_hist.bucket_128k='0' dedup_out.miss_hist.bucket_256k='0' dedup_out.miss_hist.bucket_512k='0' dedup_out.miss_hist.bucket_1m='0' dedup_out.miss_hist.bucket_large='0' outgoing.conns_idle_cur='0' outgoing.conns_idle_max='0' outgoing.conns_idle_tot='0' outgoing.conns_active_cur='2' outgoing.conns_active_max='3' outgoing.conns_active_tot='3' outgoing.conns_errors='0' outgoing.conns_passthru_tot='0' incoming.conns_idle_cur='0' incoming.conns_idle_max='0' incoming.conns_idle_tot='0' incoming.conns_active_cur='2' incoming.conns_active_max='6' incoming.conns_active_tot='127' incoming.conns_errors='0' incoming.conns_passthru_tot='0' dedup_status_array='cccc '

Here’s a sample remote endpoint log message in syslog:

Mar 15 17:05:50 127.0.0.1 10.11.100.73: device_timestamp='Wed Mar 15
00:05:46 2010 GMT' HostName='bigip3900c.demo.f5demo.com' version.version='10.1.0' message_type='woc_peer' peer_ref='10.20.50.103' name='bigip3900b.demo.f5demo.com' UUID='cd18:4840:f9e0:' mgmt_addr='10.11.100.72' version='10.1.0' dedup_cache='203588' dedup_action='DEDUP_ACTION_NONE' dedup_cache_refresh_flag='false' dedup_cache_refresh_count='0' state='WOC_PEER_STATE_READY' is_enabled='true' origin='MCP_ORIGIN_CONFIGURED' profile_serverssl='' tunnel_encrypt_data='true' tunnel_port='443' behind_nat='false' source_address='WOC_PEER_NAT_SOURCE_ADDRESS_NONE' config_status='none' routing='true' addr_list=''

And lastly, if you’re an ASM or APM user, there’s an F5 Networks DSM that will recognize your ASM logs; all you need to do is define the hostname or IP address of your QRadar system in your ASM logging profile.

Dire (Load Balancing) Straits: I Want My Client IP
load balancing fu for developers to avoid losing what is vital business data

I want my client IP
Now read the manuals, that’s the way you do it
Give the IP to the SLB
That ain’t workin’, that’s the way to do it
Set the gateway on the server to your SLB
(many apologies to Dire Straits)

My brother called me last week with a load balancing emergency. See, for most retailers the “big day” of the year is the Friday after Thanksgiving. In Wisconsin, particularly for retailers that focus on water sports, the “big day” of the year is Memorial Day, because it’s the first long weekend of the summer. My brother was up late trying to help one of those retailers prepare for the weekend and the inevitable spike in traffic to the websites of people looking for boats and water-skis and other water-related sports “things”. They’d just installed a pair of load balancers, and though the load balancers worked just fine, they had a huge problem on their hands: the application wasn’t getting the client IP but instead was logging the IP address of the load balancers. This is a common problem that crops up the first time an application is horizontally scaled out using a load balancer. The business (sales and marketing) analysts need the IP address of the client in order to properly slice and dice the data collected. The application must log the correct IP address or the data is virtually useless to the business. There are a couple of solutions to this problem; which one you choose will depend on the human resources you have available and whether or not you can change your network architecture. The latter would be particularly challenging for applications deployed in a cloud computing environment that are taking advantage of a load balancing solution deployed as a virtual network appliance or “softADC” along with the applications, because you only have control over the configuration of your virtual images and the virtual network appliance-hosted load balancer.

THE PROBLEM

The problem is that oftentimes load balancers in small-to-medium-sized datacenters and cloud computing environments are deployed in a flat network architecture, with the load balancer essentially in a one-armed (or side-armed) configuration. This means the load balancer needs only one interface, and there’s no complex network routing necessary because the load balancer and the servers are on the same network. It’s also often used as a transparent option for a new load balancing implementation when applications are already live and in use and disrupting service is not an option. But herein lies the source of the problem: because the load balancer and the servers are on the same network, the load balancer must pretend to be the client in order to force the responses back through the load balancer (this is also true of larger and more complex configurations in which the load balancer acts as a full proxy). What happens generally is that the load balancer replaces the client IP address in the network layer with its own IP address to force responses to come back to it, then rewrites the Ethernet header on the way back out. When the web server logs the transaction, the client IP address is the IP address of the load balancer, and that’s what gets written into the logs. This is also the case in larger deployments when the load balancer is a full proxy; the requests appear to come from the load balancer in order to apply the appropriate optimization and security policies on the responses, e.g. content scrubbing and TCP multiplexing.
The load balancer/ADC cannot provide any kind of TCP connection optimization (such as is used to increase VM density) or application acceleration capabilities in such a mode because it never sees the responses. The result is a DSR (Direct Server Return) mode of operation, meaning that while requests traverse the load balancer, the responses don’t. This is the situation my brother found himself in (on his birthday, no less, what a dedicated guy) when he called me. It was Friday afternoon of Memorial Day weekend and the site absolutely had to be up and logging client IP addresses accurately, or the client – a retailer of water-related goods – was going to be very, very, very angry.

THE SOLUTION(s)

You might be thinking the obvious solution to this problem is to turn on the spoofing option on the load balancer. It is, but it isn’t. Or at least it isn’t without a configuration change to the servers. See, the real problem in these scenarios is that the servers are still configured with a default gateway other than the load balancer. Enabling spoofing, i.e. the ability of the load balancer to pretend to be the client IP address, in this situation would have resulted in the web server’s responses still being routed around the load balancer because the servers, having no static route to the client IP, would automatically send them to the default gateway (the router). The solution is to change the network configuration of the web servers to use a default gateway that is the load balancer’s IP address, thus ensuring that responses are routed back through the load balancer regardless of the client IP. Once that’s accomplished, you can enable SNAT and/or spoofing or “preserve client IP” or whatever the vendor calls the functionality, and the web server logs will show the actual IP address of the client instead of the load balancer. Another option is to enable the load balancer to insert the “X-Forwarded-For” custom HTTP header; a minimal iRule sketch of this appears below. Most modern load balancers offer this as a checkbox configuration feature. Every HTTP request is modified by the load balancer to include the custom HTTP header with a value that is the client’s true IP address. It is then the application’s responsibility to extract that value instead of the default for logging and per-client personalization/authorization. The extent of the modification of the application depends on how reliant the application is on the client IP address. If it’s only for logging, then a web/app server configuration change is likely the only change necessary. If the application makes use of that IP to apply policies or make application-specific decisions, then modification to the application itself will be necessary.

THIS is WHY DEVOPS is NECESSARY

As more developers take to the cloud as a deployment option for applications, this very basic – but very frustrating – scenario is likely to occur more often than not as developers become more familiar with load balancers and deploy them in an IaaS environment. IaaS providers offering “auto-scaling” or “load balancing” services as a capability will have taken this problem into consideration when they architected their underlying infrastructure, and thus the customer is relieved of responsibility for the nitty-gritty details.
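For developers who do need to handle this themselves, here is the X-Forwarded-For option from the previous section as a minimal iRule sketch. On BIG-IP the same behavior is available as a checkbox in the HTTP profile; this sketch just shows what that option amounts to:

when HTTP_REQUEST {
    # Hand the true client IP to the application in a custom header; the app
    # then extracts and logs this value instead of the connection's source IP
    HTTP::header insert X-Forwarded-For [IP::client_addr]
}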
Customers deploying a separate virtualized load balancing solution – to take advantage of more robust load balancing algorithms and application delivery features like network-side scripting – will need to be aware of the impact of network configuration on the application’s ability to “see” the client IP address. Even if the customer is not concerned about the ability to extract the real client IP address, it is still important to understand the impact to the return data path, as well as the loss of functionality (optimization, security, acceleration) from routing around the load balancing solution on the application response. This particular knowledge falls into the responsibility demesne of what folks like James Urquhart and Damon Edwards and Shlomo Swidler (among many others) call “devops”: developers whose skill set and knowledge expands into operations in order to successfully take advantage of cloud computing. Developers must be involved with and understand the implications of what has before been primarily an operational concern, because it impacts their application and potentially impacts the ability of their applications to function properly. Development and operations must collaborate on just such concerns because they impact the business and its ability to succeed in its functions, such as understanding the demographics of its customer base by applying GeoLocation technologies to client IP addresses. Without the real client IP address such demographic data mining fails, and the business cannot make the adjustments necessary to its products/services. Business, operations, and development must collaborate to understand the requirements and then implement them in the most operationally efficient manner possible. Devops promises to enable that collaboration by tearing down the wall that has long existed between developers and operations.

BIG-IP Logging and Reporting Toolkit - part two
In this second offering from Joe Malek’s deep dive into some advanced configuration concepts, and more specifically the logging and reporting world, we take a look at the vendors that he investigated, what they offer, and how they integrate with F5 products. He discusses some of the capabilities of each, their strengths and weaknesses, and some of the things you might use each for. If you’ve been wondering what your options are for more in-depth log analysis and reporting, take a look to see what his thoughts are on a couple of the leading solutions.

Logging & Reporting Toolkit - Part 1
Logging & Reporting Toolkit - Part 2
Logging & Reporting Toolkit - Part 3
Logging & Reporting Toolkit - Part 4

Vendor descriptions:

Splunk - http://www.splunk.com/

“IT Search” is Splunk’s self-identified core functionality. Splunk’s software contains multiple ways to obtain data from IT systems; it indexes the data and reports on the data using a web interface. Splunk has invested in creating a Splunk for F5 application containing dashboard-style views into log data for F5 products. Currently included in the application are LTM, GTM, ASM, APM and FirePass. The application is able to consume log messages sent to Splunk servers via syslog – and, by extension, iRules using High Speed Logging. Splunk is deployed as software to be installed on a customer-provided system. Windows, Mac OS, Linux, AIX, and BSD variants are all supported host operating systems. Splunk can receive messages via files, syslog, SNMP, SCP, SFTP, FTP, generic network ports, FIFO queues, directory crawling and scripting. Splunk has a very intuitive and “Google-like” interface allowing users to easily navigate and report on data in the system. Users are able to define reports, indices, dashboards and applications to present data as an organization requires. Upon receipt of data, Splunk can process the data according to in-built training or according to a user-constructed taxonomy.

Q1 Labs - http://www.q1labs.com/

Q1 Labs brings a product called QRadar to market. QRadar combines functionality commonly found in SIEM, log management and network behavior analysis products. Q1 products are able to consume event messages as well as record information on a network connection basis. QRadar is available as a pay-for appliance and a no-charge edition in a virtual machine. The differences between the two editions are the SIEM and advanced correlation functionality; the no-charge edition is a log management tool only. QRadar can receive messages via syslog, SNMP, JDBC connectors, SFTP, FTP, SCP, and SDEE. Additionally, QRadar can obtain network flow information in a port mirror/span mode. Customizing data views and report building are based on regular expressions. Customers can create their own regular expressions and build upon pre-configured expressions for reporting. In the SIEM module, QRadar includes approximately 250 events that can be sequenced together into complex “Offenses” in a manner similar to building a rule in Microsoft Outlook. “Universal Device Support Modules” can be created and shared among Q1 Labs customers.

PresiNET – http://www.presinet.com/

Whereas tcpdump is like an x-ray for your network, Total View One is like an MRI. Total View One enables customers to maximize the use of infrastructure resources and network performance. Total View One sensors collect protocol state information by tracking connections through a network. This is commonly done out-of-line from traffic streams via port mirroring or network tap technologies.
Currently PresiNET has implemented the NEDS specification, which enables Total View One to receive messages from BIG-IP products and process them as if they’d come from a PresiNET sensor. This integration started with the NEDS iRule and specification, and from this PresiNET created their own parser. PresiNET products are delivered as appliances in both a central unit and a sensor unit mode. Optionally, one may subscribe to PresiNET on a managed-service basis. After you install a Total View One product in your network you get access to extensive views of available state information – with little or no additional work. If the included reporting capabilities aren’t enough, you can export data from the system as a CSV file.

What’s Next?

Now that you know who the players are and what they can do, be sure to check back next week to look at how the F5 products generate logs, how these technologies deal with them, and some testing results. To give you more of an idea of what’s to come, I’ll leave you with a look at the facts that will be delivered to the reporting systems from the F5 device(s) to see how they’re handled: virtual server accessed, client IP address, client port, LB decision results, http host, http username, user-agent string, content encoding, requested URI, requested path, content type, content length, request time, server string, server port, status code, device identifier, referrer, host header, response time, VLAN id, IP protocol, IP type of service, connection end time, packets, bytes, anything sent to a dashboard, firewall messages, client source geography, extended application log data, health information for back end filers, audit logs, SNMP trap information, dedup efficacy, compression codec efficacy, wom error counters, link characteristics as known, and system state.

Logging and Reporting Toolkit Series: Part One | Part Three

F5 Friday: Latency, Logging, and Sprawl
#v11 Logging, necessary for a variety of reasons in the data center, can consume resources and introduce undesirable latency. Avoiding that latency improves application performance and, in some cases, the quality of logs.

Logging. It’s mandatory and, in some industries, critical. Logs are used not only for auditing and tracking but for debugging, for data mining and analysis, and, in some tiers of the architecture, replication and synchronization of data. Logs are a critical component across the data center, of that there is no doubt. That’s why it’s particularly frustrating that the performance cost of logging is also one of the highest, lagging only slightly behind graphics. Given that there is very little graphics-related processing that goes on in data center components, disk I/O leaps to the top of the stack when it comes to performance-impeding operations. The latency introduced by writing to a log often impacts the overall performance experienced by end-users because of the resources the logging operations consume on the component. While generally out-of-band and thus non-blocking today, the consumption of resources can still negatively impede performance by draining memory and using CPU cycles. Increasingly, as components are deployed in pairs, triples and more – owing to scaling out both physically and virtually – these logs also introduce “log sprawl” that can increase the cost associated with administration and make it more difficult to troubleshoot. After all, if you aren’t sure through which instance of a device a specific request was sent, you can’t easily find it in the log file. For all these reasons, centralized and generally off-box logging for data center components is becoming more critical. Consider it “logging as a service” if you will. This is not a new concept; centralized syslog servers have long been leveraged to provide a centralized, easier-to-manage log service that can be leveraged by just about every data center component. For load balancing services, the need is not only to centralize web-related logs but to ensure that they are written as fast as possible, to keep up with today’s demanding application environments. BIG-IP is no stranger to the need for high-speed, off-device logging, and with v11 brings an open application, high-speed logging engine to bear.

BIG-IP HIGH-SPEED LOGGING ENGINE

One of the benefits of a unified, internal architecture is the ability to share improvements in the underlying platform across all products ultimately deployed on that platform. This is the case with TMOS, F5’s core application delivery technology. By enabling TMOS with a high-speed logging engine capable of up to 200,000 UDP/TCP messages per second, all modules – LTM, GTM, APM, ASM, WA, etc. – deployed on the TMOS platform automatically gain the benefits. Support for both local and external (off-box) logging enables you to centralize the data in third-party logging engines and meet security and compliance requirements. That means you can, ostensibly, leverage the visibility of a strategic point of control in the network to perform logging of web requests (and responses if required) rather than spread the responsibility across what may be an unknown number of web servers.
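To give a feel for the engine from the iRules side, here is a hedged sketch of high-speed logging a web request; the pool name and message format are assumptions for illustration, not a prescribed configuration:

when CLIENT_ACCEPTED {
    # Open a high-speed logging handle to a pool of remote log servers
    # (assumed pool name; the pool must exist in your configuration)
    set hsl [HSL::open -proto UDP -pool pool_hsl_servers]
}
when HTTP_REQUEST {
    # <190> = syslog priority local7.info; one message per request, no disk I/O
    HSL::send $hsl "<190> client=[IP::client_addr] method=[HTTP::method] uri=[HTTP::uri]\n"
}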
Consider that in a highly virtualized or cloud computing-based architecture, the number of servers required to meet current demand is variable, which makes collection of web server-written logs more difficult unless an off-server log service is leveraged. That’s because virtualized servers often simply write logs to the local disk, which may or may not be persistent enough to meet compliance – or operational – demands. It’s also the case that some upstream infrastructure may modify the request and/or response, leaving logs with incomplete information. This is the case when an external application delivery controller acts as a Cookie gateway, a common function for adding security and consistency to web applications. Thus, logging at a strategic point, closest to the client, provides the most accurate picture of the request. Consider, too, the impact of writing logs in the face of an attack. DDoS counts on the consumption of resources to drain server and network component capacity, and by increasing the number of requests a server has to handle, it also gets an added consumption bonus from the need to write to the log. This is true on upstream network components, which compounds the impact and drains more resources than necessary. By enabling high-speed logging on upstream devices, offloading responsibility to a log service, and eliminating the need for web servers to also write to disk, the impact of a concerted DDoS attack can be more effectively managed. And if you’re going to use an off-server log service, it is more efficient to do so at a point upstream from the web servers and gain the benefits of reducing resource consumption on the servers. Eliminating the resource consumption required by logs on the web server can have a very positive impact on the performance and capacity of the web server, which, when combined with improvements in logging speed and reduced consumption on the BIG-IP, translates into faster web applications and a simplified log management strategy. High speed logging (HSL) is configurable using the GUI (via the Request Logging Profile) and supports the W3C extended log format. Happy Logging!