Concept of F5 Device DoS and DoS profiles
Dear Reader, I´m about to write a series of DDoS articles, which will hopefully help you to better understand how F5 DDoS works and how to configure it. I plan to release them in a more or less regular frequency. Feel free to send me your feedback and let me know which topics are relevant for you and should get covered. This first article is intended to give an explanation of the BIG-IP Device DOS and Per Service-DOS protection (DOS profile). It covers the concepts of both approaches and explains in high level the threshold modes “Fully manual”, “Fully Automatic” and “Multiplier Based Mitigation” including the principles of stress measurement. In later articles I will describe this threshold modes in more detail. In the end of the article you will also find an explanation of the physical and logical data path of the BIG-IP. Device DOS with static DoS vectors The primary goal of “Device DOS” is to protect the BIG-IP itself. BIG-IP has to deal with all packets arriving on the device, regardless of an existing listener (Virtual Server (VS)/ Protected Object (PO)), connection entry, NAT entry or whatever. As soon as a packet hits the BIG-IP it´s CPU has basically to deal with it. Picking up each and every packet consumes CPU cycles, especially under heavy DOS conditions. The more operations each packet needs, depending on configurations, the higher the CPU load gets, because it consumes more CPU cycles. The earlier BIG-IP can drop a packet (if the packet needs to get dropped) the better it is for the CPU load. If the drop process is done in hardware (FPGA), the packet gets dropped before it consumes CPU cycles. Using static DOS vectors helps to lower the number of packets hitting the CPU when under DoS. If the number of packets is above a certain threshold (usually when a flood happens), BIG-IP rate-limits for that specific vector, which keeps the CPU available, because it sees less packets. Figure 1: Principle of Attack mitigation with static DoS vectors The downside with this approach is that the BIG-IP cannot differentiate between "legitimate" and "attacking" packets, when using static DOS rate-limit. BIG-IP (FPGA) just drops on the predicates it gets from the static DOS vector. Predicates are attributes of a packet like protocol “TCP” or “flag” set “ACK”. -Keep in mind, when BIG-IP runs stateful, "bad" packets will get dropped in software anyway by the operating system (TMOS), when it identifies them as not belonging to an existing connection or as invalid/broken (for example bad CRC checksum). This is of course different for SYN packets, because they create connection entries. This is where SYN-Cookies play an important role under DoS conditions and will be explained in a later article. I usually recommend running Behavioral DoS mitigation in conjunction with static DoS vectors. It is way more precise and able to filter out only the attack traffic, but I will discuss this in more detail also in one of the following articles. Manual threshold vs. Fully Automatic Before I start to explain the “per service DoS protection” (DoS profiles), I would like to give you a brief overview of some Threshold modes you can use per DoS vector. There are different ways to set thresholds to activate the detection and rate-limiting on a DOS vector. The operator can do it manual by using the option “Fully Manual” and then fine tune the pre-configured values on the DOS vectors, which can be challenging, because beside doing it all manually it´s mostly difficult to know the thresholds, especially because they are usually related to the day and time of the week. Figure 2: Example of a manual DoS vector configuration That’s why most of the vectors have the option "Fully Automatic" available. It means BIG-IP will learn from the history and “knows” how many packets it usually “sees” at that specific time for the specific vector. This is called the baselining and calculates the detection threshold. Figure 3: Threshold Modes As soon as a flood hits the BIG-IP and crosses the detection threshold for a vector, BIG-IP detects it as an attack for that vector, which basically means it identifies it as an anomaly. However, it does not start to mitigate (drop). This will only happen as soon as a TMM load (CPU load) is also above a certain utilization (Mitigation sensitivity: Low: 78%, Medium 68%, High: 51%). If both conditions (packet rate and TMM/CPU load is too high) are true, the mitigation starts and lowers (rate-limit) the number of packets for this vector going into the BIG-IP for the specific TMM. That means dropping packets by the DOS feature will only happen if necessary, in order to protect the BIG-IP CPU. This is a dynamic process which drops more packets when the attack becomes more aggressive and less when the attack becomes less aggressive. For TCP traffic keep in mind the “invalid” traffic will not get forwarded anyway, which is a strong benefit of running the device stateful. When a DOS vector is hardware supported then FPGAs drop the packets basically at the switch level of the BIG-IP. If it’s not hardware supported, then the packet is dropped at a very early stage of the life cycle of a packet “inside” a BIG-IP. Device DOS counts ALL packets going to the BIG-IP. It doesn´t matter if they have a destination behind the BIG-IP, or when it is the BIG-IP itself. BecauseDevice DOSis to protect the BIG-IP,the thresholdsyou can configure in the “manual configuration” mode on the DOS vectorsare per TMM! This means you configure the maximum number of packets one TMM can handle. Very often operators want to set an overall threshold of let’s say 10k, then they need to divide this limit by the number of TMMs.An Exception is the Sweep and Flood vector. Here the threshold is per BIG-IP and not per TMM. DOS profile for protected Objects Now let’s talk about the “per Service DoS Protection (DoS profile)”. The goal is to protect the service which runs behind the BIG-IP. Important to know, when BIG-IP runs stateful, the service is already protected by the state of BIG-IP (true for TCP traffic). That means a randomized ACK flood for example will never hit the service. On stateless traffic (UDP/ICMP) it is different, here the BIG-IP simply forward packets.Fully Automatic used on a DoS profile discovers thehealth of the service, which works well if its TCP or DNS traffic. (This is done by evaluating the TCP behavior like Window Size, Retransmission, Congestions, or counting DNS requests vs. responses, ....). If the detection rate for that service is crossed (anomaly detected) and stress on the service is identified, the mitigation starts. Again, keep in mind for TCP this will only happen for “legitimate” traffic, because out of the state traffic will never reach the service. It is already protected by the BIG-IP being configured state-full. An example could be a HTTP flood.My recommendation:If you have the option to configureL7 BaDOS(Behavioral DOS running on DHD/A-WAF), then go with that one. It gives much better results and better mitigation options on HTTP floods then just L3/4 DOS, because it does also operate on L7. Please also do thennot configureTCP DOS for “Push Flood”, "bad ACK Flood" and “SYN Flood”, on that DOS profile, except you use a TMOS version newer than 14.1. For other TCP services like POP3, SMTP, LDAP … you can always go with the L3/4 DOS vectors. Since TMOS version 15.0 L7 BaDOS and L3/4 DOS vectors work in conjunction. Stateless traffic (UDP/ICMP) For UDP traffic I would like to separate UDP into DNS traffic and non-DNS-UDP traffic. DNS is a very good example where the health detection mechanism works great. This is done by measuring the ratio of requests vs. responses. For example, if BIG-IP sees 100k queries/sec going to the DNS server and the server sends back 100k answers/sec, the server shows it can handle the load. If it sees 200k queries/sec going to the server and the server sends back only 150k queries/sec, then it is a good indication of that the server cannot handle the load. In that case the BIG-IP would start to rate-limit, when the current rate is also above the detection rate (The rate BIG-IP expects based on the history rates). The 'Device UDP' vector gives you the option to exclude UDP traffic on specific ports from the UDP vector (packet counting). For example, when you exclude port 53 and UDP traffic hits that port, then it will not count into the UDP counter. In that case you would handle the DNS traffic with the DNS vectors. Figure 4 : UDP Port exclusion Here you see an example for port 53 (DNS), 5060 (SIP), 123 (NTP) Auto Detection / Multiplier Based Mitigation If the traffic is non-DNS-UDP traffic or ICMP traffic the stress measurement does not work very accurate, so I recommend going with the“multiplier option” on the DOS profile. Here BIG-IP does the baselining (calculating the detection rate) similar to “fully automatic” mode, except it will kick in (rate-limit) if it sees more than the defined multiplication for the specific vector, regardless of the CPU load. For example, when the calculated detection rate is 250k packets/sec and the multiplication is set to 500 (which means 5 times), then the mitigation rate would be 250k x 5 = 1.250.000 packets/sec. The multiplier feature gives the nice option to configure a threshold based on the multiplication of an expected rate. Figure 5 : Multiplier based mitigation On aDOS profilethethreshold configuration is per service (all packets targeted to the protected service), which actually means on that BIG-IP and NOT per TMM like on the Device level. Here the goal is to set how many packets are allowed to pass the BIG-IP and reach the service. The distribution of these thresholds to the TMMs is done in a dynamic way: Every TMM gets a percentage of the configured threshold, based on the EPS (Events Per Second, which is in this context Packets Per Second) for the specific vector the system has seen in the second before on this TMM. This mechanism protects against hash type of attacks. Physical and logical data path There is aphysical and logical traffic pathin BIG-IP. When a packet gets dropped via a DOS profile on a VS/PO, then BIG-IP will not see it on the device level counter anymore. If the threshold on the device level is reached, the packet gets dropped by the device level and will not get to the DOS profile. So, the physical path is device first and then comes the VS/PO, but the logical path is VS/PO first and then device. That means, when a packet arrives on the device level (physical first), that counter gets incremented. If the threshold is not reached, the packet goes to the VS/PO (DOS profile) level. If the threshold is reached here, then the packet gets dropped and the counter for the device DOS vector decremented. If the packet does not get dropped, then the packet counts into both counters (Device/VS). This means that the VS/PO thresholds should always set lower than device thresholds (remember on Device level you set the thresholds per TMM!). The device threshold should be the maximum the BIG-IP (TMM) can handle overall. It is important to note that mitigation in manual mode are “hard” numbers and do not take into account the stress on the service. In “Fully Automatic” mode on the VS/PO level, mitigation kicks in when the detection threshold is reached AND the stress on the service is too high. Now let’s assume a protected TCP service behind the BIG-IP gets attacked by a TCP attack like a randomized Push/ACK food. Because of the high packet rate hitting that vector on the attached DOS profile for that PO, it will go into 'detect mode'. But because it is a TCP flood and BIG-IP is configured to run state-full, the attack packets will never reach the service behind the BIG-IP and therefore the stress on that service will never go high in this case. The flood is then handled by the session state (CPU) of the BIG-IP until this gets too much under pressure and then the Device DOS will kick in “upfront” and mitigate the flood. If you use the “multiplication” option, then the mitigation kicks in when the packet detection rate + multiplication is reached. This can happen on the VS/PO level and/or on the Device level, because it is independent of stress. “Fully automatic” will not work properly with asymmetric traffic for VS/PO, because here BIG-IP cannot identify if the service is under stress due to the fact that BIG-IP will only see half of the traffic. -Either request or response depending on where it is initiated. For deployments with asymmetric traffic I recommend using the manual or “multiplier option”, on VS/PO configurations. The “Multiplier option” keeps it dynamic in regard to the calculated detection and mitigation rate based on the history. “Fully Automatic” works with Asymmetric traffic on Device DOS, because it measures the CPU (TMM) stress of the BIG-IP itself. Of course, when you run asymmetric traffic, BIG-IP can´t be state-full. I recommend to always start first with the Device DOS configuration to protect the BIG-IP itself against DOS floods and then to focus on the DOS protection for the services behind the BIG-IP. On Device DOS you will probably mostly use “Fully Manual” and “Fully Automatic” threshold modes. To protect services, you will mostly go with “Fully Manual” and “Fully Automatic” or “Auto Detection/ Multiplier based mitigation”. In one of the next articles I will describe in more detail when to use what and why. As mentioned before, additional to the static DoS vectors I recommend enabling Behavioral DOS, which is way smarter than the static DOS vectors and can filter out specifically only the attack traffic which tremendously decrease the chance of false positives. You can enable it like the static DoS vectors on Device- an on VS/PO level. But this topic will be covered in another article as well. With that said I would like to finish my first article. Let me know your feedback. Thank you, sVen Mueller7.2KViews34likes7CommentsExplanation of F5 DDoS threshold modes
Der Reader, In my article “Concept of Device DOS and DOS profile”, I recommended to use the “Fully Automatic” or “Multiplier” based configuration option for some DOS vectors. In this article I would like to explain how these threshold modes work and what is happening behind the scene. When you configure a DOS vector you have the option to choose between different threshold modes: “Fully Automatic”, “Auto Detection / Multiplier Based”, “Manual Detection / Auto Mitigation” and “Fully Manual”. Figure 1: Threshold Modes The two options I normally use on many vectors are “Fully Automatic” and “Auto Detection / Multiplier Based”. But what are these two options do for me? To manually set thresholds is for some vectors not an easy task. I mean who really knows how many PUSH/ACK packets/sec for example are usually hitting the device or a specific service? And when I have an idea about a value, should this be a static value? Or should I better take the maximum value I have seen so far? And how many packets per second should I put on top to make sure the system is not kicking in too early? When should I adjust it? Do I have increasing traffic? Fully Automatic In reality, the rate changes constantly and most likely during the day I will have more PUSH/ACK packets/sec then during the night. What happens when there is a campaign or an event like “Black Friday” and way more users are visiting the webpage then usually? During these high traffic events, my suggested thresholds might be no longer correct which could lead to “good” traffic getting dropped. All this should be taken into consideration when setting a threshold and it ends up being very difficult to do manually. It´s better to make the machine doing it for you and this is what “Fully Automatic” is about. Figure 2: Expected EPS As soon as you use this option, it leverages from the learning it has done since traffic is passing through the BIG-IP, or since you have enforced the relearning, which resets everything learned so far and starts from new. The system continuously calculates the expected rates for all the vectors based on the historic rates of traffic. It takes the information up to one year and calculates them with different weights in order to know which packets rate should be expected at that time and day for that specific vector in the specific context (Device, Virtual Server/Protected Object). The system then calculates a padding on top of this expected rate. This rate is called Detection Rate and is dependent on the “threshold sensitivity” you have configured: Low Sensitivity means 66% padding Medium Sensitivity means 40% padding High Sensitivity means 0% padding Figure 3: Detection EPS As soon as the current rate is above the detection value, the BIG-IP will show the message “Attack detected”, which actually means anomaly detected, because it sees more packets of that specific vector then expected + the padding (detection_rate). But DoS mitigation will not start at that point! Figure 4: Current EPS Keep in mind, when you run the BIG-IP in stateful mode it will drop 'out of state' packets anyway. This has nothing to do with DoS functionalities. But what happens when there is a serious flood and the BIG-IP CPU gets high because of the massive number of packets it has to deal with? This is when the second part of the “Fully Automatic” approach comes into the game. Again, depending on your threshold sensitivity the DOS mitigation starts as soon as a certain level of stress is detected on the CPU of the BIG-IP. Figure 5: Mitigation Threshold Low Sensitivity means 78,3% TMM load Medium Sensitivity means 68,3% TMM load High Sensitivity means 51,6% TMM load Note, that the mitigation is per TMM and therefore the stress and rate per TMM is relevant. When the traffic rate for that vector is above the detection rate and the CPU of the BIG-IP (Device DOS) is “too” busy, the mitigation kicks in and will rate limit on that specific vector. When a DOS vector is hardware supported, FPGAs drop the packets at the switch level of the BIG-IP. If that DOS vector is not hardware supported, then the packet is dropped at a very early stage of the life cycle of a packet inside a BIG-IP. The rate at which are packets dropped is dynamic (mitigation rate), depending on the incoming number of packets for that vector and the CPU (TMM) stress of the BIG-IP. This allows the stress of the CPU to go down as it has to deal with less packets. Once the incoming rate is again below the detection rate, the system declares the attack as ended. Note: When an attack is detected, the packet rate during that time will not go into the calculation of expected rates for the future. This ensures that the BIG-IP will not learn from attack traffic skewing the automatic thresholds. All traffic rates below the detection (or below the floor value, when configured) rate modify the expected rate for the future and the BIG-IP will adjust the detection rate automatically. For most of the vectors you can configure a floor and ceiling value. Floor means that as long as the traffic value is below that threshold, the mitigation for that vector will never kick in. Even when the CPU is at 100%. Ceiling means that mitigation always kicks in at that rate, even when the CPU is idle. With these two values the dynamic and automatic process is done between floor and ceiling. Mitigation only gets executed when the rate is above the rate of the Floor EPS and Detection EPS AND stress on the particular context is measured. Figure 6: Floor and Ceiling EPS What is the difference when you use “Fully Automatic” on Device level compared to VS/PO (DOS profile) level? Everything is the same, except that on VS or Protected Object (PO) level the relevant stress is NOT the BIG-IP device stress, it is the stress of the service you are protecting (web-, DNS, IMAP-server, network, ...). BIG-IP can measure the stress of the service by measuring TCP attributes like retransmission, window size, congestion, etc. This gives a good indication on how busy a service is. This works very well for request/response protocols like TCP, HTTP, DNS. I recommend using this, when the Protected Object is a single service and not a “wildcard” Protected Object covering for example a network or service range. When the Protected Object is a “wildcard” service and/or a UDP service (except DNS), I recommend using “Auto Detection / Multiplier Option”. It works in the same way as the “Fully Automatic” from the learning perspective, but the mitigation condition is not stress, it is the multiplication of the detection rate. For example, the detection rate for a specific vector is calculated to be 100k packets/sec. By default, the multiplication rate is “500”, which means 5x. Therefore, the mitigation rate is calculated to 500k packets/sec. If that particular vector has more than 500k packets/sec those packets would be dropped. The multiplication rate can also be individually configured. Like in the screenshot, where it is set to 8x (800). Figure 7: Auto Detection / Multiplier Based Mitigation The benefit of this mode is that the BIG-IP will automatically learn the baseline for that vector and will only start to mitigate based on a large spike. The mitigation rate is always a multiplication of the detection rate, which is 5x by default but is configurable. When should I use “Fully Manual”? When you want to rate-limit a specific vector to a certain number of packets/sec, then “Fully Manual” is the right choice. Very good examples for that type of vector are the “Bad Header” vector types. These type of packets will never get forwarded by the BIG-IP so dropping them by a DoS vector saves the CPU, which is beneficial under DoS conditions. In the screenshot below is a vector configured as “Fully Manual”. Next I’ll describe what each of the options means. Figure 8: Fully Manual Detection Threshold EPS configures the packet rate/sec (pps) when you will get a log messages (NO mitigation!). Detection Threshold % compares the current pps rate for that vector with the multiplication of the configured percentage (in this example 5 for 500%) with the 1-minute average rate. If the current rate is higher, then you will get a log message. Mitigation Threshold EPS rate limits to that configured value (mitigation). I recommend setting the threshold (Mitigation Threshold EPS) to something relatively low like ‘10’ or ‘100’ on ‘Bad Header’ type of vectors. You can also set it also to ‘0’, which means all packets hitting this vector will get dropped by the DoS function which usually is done in hardware (FPGA). With the ‘Detection Threshold EPS’ you set the rate at which you want to get a log messages for that vector. If you do it this way, then you get a warning message like this one to inform you about the logging behavior: Warning generated: DOS attack data (bad-tcp-flags-all-set): Since drop limit is less than detection limit, packets dropped below the detection limit rate will not be logged. Another use-case for “Fully Manual” is when you know the maximum number of these packets the service can handle. But here my recommendation is to still use “Fully Automatic” and set the maximum rate with the Ceiling threshold, because then the protected service will benefit from both threshold options. Important: Please keep in mind, when you set manual thresholds for Device DoS the thresholds are to protect each TMM. Therefore the value you set is per TMM! An exception to this is the Sweep and Flood vector where the threshold is per BIG-IP/service and not per TMM like on DoS profiles. When using manual thresholds for aDOS profileof a Protected Object thethreshold configuration is per service (all packets targeted to the protected service) NOT per TMM like on the Device level. Here the goal is to set how many packets are allowed to pass the BIG-IP and reach the service. The distribution of these thresholds to the TMMs is done in a dynamic way: Every TMM gets a percentage of the configured threshold, based on the EPS (Events Per Second, which is in this context Packets Per Second) for the specific vector the system has seen in the second before on this TMM. This mechanism protects against hash type of attacks. Ok, I hope this article gives you a better understanding on how ‘Fully Manual’, ‘Fully Automatic’ and ‘Auto Detection / Multiplier Based Mitigation’ works. They are important concepts to understand, especially when they work in conjunction with stress measurement. This means the BIG-IP will only kick in with the DoS mitigation when the protected object (BIG-IP or the service behind the BIG-IP) is under stress. -Why risk false positives, when not necessary? With my next article I will demonstrate you how Device DoS and the DoS profiles work together andhow the stateful concept cooperates with the DoS mitigation. I will show you some DoS commands to test it and also commands to get details from the BIG-IP via CLI. Thank you, sVen Mueller9.8KViews21likes10CommentsDemonstration of Device DoS and Per-Service DoS protection
Dear Reader, This article is intended to show what effect the different threshold modes have on the Device and Per-Service (VS/PO) context. I will be using practical examples to demonstrate those effects. You will get to review a couple of scripts which will help you to do DoS flood tests and “visualizing” results on the CLI. In my first article (https://devcentral.f5.com/s/articles/Concept-of-F5-Device-DoS-and-DoS-profiles), I talked about the concept of F5 Device DoS and Per-Service DoS protection (DoS profiles). I also covered the physical and logical data path, which explains the order of Device DoS and Per-Service DoS using the DoS profiles. In the second article (https://devcentral.f5.com/s/articles/Explanation-of-F5-DDoS-threshold-modes), I explained how the different threshold modes are working. In this third article, I would like to show you what it means when the different modes work together. But, before I start doing some tests to show the behavior, I would like to give you a quick introduction into the toolset I´m using for these tests. Toolset First of all, how to do some floods? Different tools that can be found on the Internet are available for use. Whatever tools you might prefer, just download the tool and run it against your Device Under Test (DUT). If you would like to use my script you can get it from GitHub: https://github.com/sv3n-mu3ll3r/DDoS-Scripts With this script - it uses hping - you can run different type of attacks. Simply start it with: $ ./attack_menu.sh <IP> <PORT> A menu of different attacks will be presented which you can launch against the used IP and port as a parameter. Figure 1:Attack Menu To see what L3/4 DoS events are currently ongoing on your BIG-IP, simply go to the DoS Overview page. Figure 2:DoS Overview page I personally prefer to use the CLI to get the details I´m interested in. This way I don´t need to switch between CLI to launch my attacks and GUI to see the results. For that reason, I created a script which shows me what I am most interested in. Figure 3:DoS stats via CLI You can download that script here: https://github.com/sv3n-mu3ll3r/F5_show_DoS_stats_scripts Simply run it with the “watch” command and the parameter “-c” to get a colored output (-c is only available starting with TMOS version 14.0): What is this script showing you? context_name: This is the context, either PO/VS or the Device in which the vector is running vector_name: This is the name of the DoS vector attack_detected:When it shows “1”, then an attack has been detected, which means the ‘stats_rate’ is above the ‘detection-rate'. stats_rate: Shows you the current incoming pps rate for that vector in this context drops_rate: Shows you the number dropped pps rate in software (not FPGA) for that vector in this context int_drops_rate: Shows you the number dropped pps rate in hardware (FPGA) for that vector in this context ba_stats_rate: Shows you the pps rate for bad actors ba_drops_rate: Shows you the pps rate of dropped ‘bad actors’ in HW and SW bd_stats_rate:Shows you the pps rate for attacked destination bd_drop_rate: Shows you the pps rate for dropped ‘attacked destination’ mitigation_curr: Shows the current mitigation rate (per tmm) for that vector in that context detection: Shows you the current detection rate (per tmm) for that vector in that context wl_count: Shows you the number of whitelist hits for that vector in that context hw_offload: When it shows ‘1’ it means that FPGAs are involved in the mitigation int_dropped_bytes_rate: Gives you the rate of in HW dropped bytes for that vector in that context dropped_bytes_rate: Gives you the rate of in SW dropped bytes for that vector in that context When a line is displayed in green, it means packets hitting that vector. However, no anomaly is detected or anything is mitigated (dropped) via DoS. If a line turns yellow, then an attack - anomaly – has been detected but no packets are dropped via DoS functionalities. When the color turns red, then the system is actually mitigating and dropping packets via DoS functionalities on that vector in that context. Start Testing Before we start doing some tests, let me provide you with a quick overview of my own lab setup. I´m using a DHD (DDoS Hybrid Defender) running on a i5800 box with TMOS version 15.1 My traffic generator sends around 5-6 Gbps legitimate (HTTP and DNS) traffic through the box which is connected in L2 mode (v-wire) to my network. On the “client” side, where my clean traffic generator is located, my attacking clients are located as well by use of my DoS script. On the “server” side, I run a web service and DNS service, which I´m going to attack. Ok, now let’s do some test so see the behavior of the box and double check that we really understand the DDoS mitigation concept of BIG-IP. Y-MAS flood against a protected server Let’s start with a simple Y-MAS (all TCP flags cleared) flood. You can only configure this vector on the device context and only in manual mode. Which is ok, because this TCP packet is not valid and would get drop by the operating system (TMOS) anyway. But, because I want this type of packet get dropped in hardware (FPGA) very early, when they hit the box, mostly without touching the CPU, I set the thresholds to ‘10’ on the Mitigation Threshold EPS and to ‘10’ on Detection Threshold EPS. That means as soon as a TMM sees more then 10 pps for that vector it will give me a log message and also rate-limit this type of packets per TMM to 10 packets per second. That means that everything below that threshold will get to the operating system (TMOS) and get dropped there. Figure 4:Bad TCP Flags vector As soon as I start the attack, which targets the web service (10.103.1.50, port 80) behind the DHD with randomized source IPs. $ /usr/sbin/hping3 --ymas -p 80 10.103.1.50 --flood --rand-source I do get a log messages in /var/log/ltm: Feb5 10:57:52 lon-i5800-1 err tmm3[15367]: 01010252:3: A Enforced Device DOS attack start was detected for vector Bad TCP flags (all cleared), Attack ID 546994598. And, my script shows me the details on that attack in real time (the line is in ‘red’, indicating we are mitigating): Currently 437569 pps are hitting the device. 382 pps are blocked by DDoS in SW (CPU) and 437187 are blocked in HW (FPGA). Figure 5:Mitigating Y-Flood Great, that was easy. :-) Now, let’s do another TCP flood against my web server. RST-flood against a protected server with Fully manual Threshold mode: For this test I´m using the “Fully Manual” mode, which configures the thresholds for the whole service we are protecting with the DoS profile, which is attached to my VS/PO. Figure 6:RST flood with manual configuration My Detection Threshold and my Mitigation Threshold EPS is set to ‘100’. That means as soon as we see more then 100 RST packets hitting the configured VS/PO on my BIG-IP for this web server, the system will start to rate-limit and send a log message. Figure 7:Mitigating RST flood on PO level Perfect. We see the vector in the context of my web server (/Common/PO_10.103.1.50_HTTP) is blocking (rate-limiting) as expected from the configuration. Please ignore the 'IP bad src' which is in detected mode. This is because 'hping' creates randomized IPs and not all of them are valid. RST-flood against a protected server with Fully Automatic Threshold mode: In this test I set the Threshold Mode for the RST vector on the DoS profile which is attached to my web server to ‘Fully Automatic’ and this is what you would most likely do in the real world as well. Figure 8:RST vector configuration with Fully Automatic But, what does this mean for the test now? I run the same flood against the same destination and my script shows me the anomaly on the VS/PO (and on the device context), but it does not mitigate! Why would this happen? Figure 9:RST flood with Fully Automatic configuration When we take a closer look at the screenshot we see that the ‘stats_rate’ shows 730969 pps. The detection rate shows 25. From where is this 25 coming? As we know, when ‘Fully Automatic’ is enabled then the system learns from history. In this case, the history was even lower than 25, but because I set the floor value to 100, the detection rate per TMM is 25 (floor_value / number of TMMs), which in my case is 100/4 = 25 So, we need to recognize, that the ‘stats_rate’ value represents all packets for that vector in that context and the detection value is showing the per TMM value. This value explains us why the system has detected an attack, but why is it not dropping via DoS functionalities? To understand this, we need to remember that the mitigation in ‘Fully Automatic’ mode will ONLY kick in if the incoming rate for that vector is above the detection rate (condition is here now true) AND the stress on the service is too high. But, because BIG-IP is configures as a stateful device, this randomized RST packets will never reach the web service, because they get all dropped latest by the state engine of the BIG-IP. Therefor the service will never have stress caused by this flood.This is one of the nice benefits of having a stateful DoS device. So, the vector on the web server context will not mitigate here, because the web server will not be stressed by this type of TCP attack. This does also explains the Server Stress visualization in the GUI, which didn´t change before and during the attack. Figure 10:DoS Overview in the GUI But, what happens if the attack gets stronger and stronger or the BIG-IP is too busy dealing with all this RST packets? This is when the Device DOS kicks in but only if you have configured it in ‘Fully Automatic’ mode as well. As soon as the BIG-IP receives more RST packets then usually (detection rate) AND the stress (CPU load) on the BIG-IP gets too high, it starts to mitigate on the device context. This is what you can see here: Figure 11:'massive' RST flood with Fully Automatic configuration The flood still goes against the same web server, but the mitigation is done on the device context, because the CPU utilization on the BIG-IP is too high. In the screenshot below you can see that the value for the mitigation (mitigation_curr) is set to 5000 on the device context, which is the same as the detection value. This value resultsfrom the 'floor' value as well. It is the smallest possible value, because the detection and mitigation rate will never be below the 'floor' value. The mitigation rate is calculated dynamically based on the stress of the BIG-IP. In this test I artificially increased the stress on the BIG-IP and therefor the mitigation rate was calculated to the lowest possible number, which is the same as the detection rate. I will provide an explanation of how I did that later. Figure 13:Device context configuration for RST Flood Because this is the device config, the value you enter in the GUI is per TMM and this is reflected on the script output as well. What does this mean for the false-positive rate? First of all, all RST packets not belonging to an existing flow will kicked out by the state engine. At this point we don´t have any false positives. If the attack increases and the CPU can´t handle the number of packets anymore, then the DOS protection on the device context kicks in. With the configuration we have done so far, it will do the rate-limiting on all RST packets hitting the BIG-IP. There is no differentiation anymore between good and bad RST, or if the RST has the destination of server 1 or server 2, and so on. This means that at this point we can face false positives with this configuration. Is this bad? Well, false-positives are always bad, but at this point you can say it´s better to have few false-positives then a service going down or, even more critical, when your DoS device goes down. What can you do to only have false positives on the destination which is under attack? You probably have recognized that you can also enable “Attacked Destination Detection” on the DoS vector, which makes sense on the device context and on a DoS profile which is used on protected object (VS), that covers a whole network. Figure 14:Device context configuration for RST Flood with 'Attacked Destination Detection' enabled If the attack now hits one or multiple IPs in that network, BIG-IP will identify them and will do the rate-limiting only on the destination(s) under attack. Which still means that you could face false positives, but they will be at least only on the IPs under attack. This is what we see here: Figure 15:Device context mitigation on attacked destination(s) The majority of the RST packet drops are done on the “bad destination” (bd_drops_rate), which is the IP under attack. The other option you can also set is “Bad Actor Detection”. When this is enabled the system identifies the source(s) which causes the load and will do the rate limiting for that IP address(es). This usually works very well for amplification attacks, where the attack packets coming from ‘specific’ hosts and are not randomized sources. Figure 16:Device context mitigation on attacked destination(s) and bad actor(s) Here you can see the majority of the mitigation is done on ‘bad actors’. This reduces the chance of false positives massively. Figure 17:Device context configuration for RST Flood with 'Attacked Destination Detection' and 'Bad Actor Detection' enabled You also have multiple additional options to deal with ‘attacked destination’ and ‘bad actors’, but this is something I will cover in another article. Artificial increase BIG-IPs CPU load Before I finish this article, I would like to give you an idea on how you could increase the CPU load of the BIG-IP for your own testing. Because, as we know, with “Fully Automatic” on the device context, the mitigation kicks only in if the packet rate is above the detection rate AND the CPU utilization on the BIG-IP is “too” high. This is sometimes difficult to archive in a lab because you may not be able to send enough packets to stress the CPUs of a HW BIG-IP. In this use-case I use a simple iRule, which I attach to the VS/PO that is under attack. Figure 18:Stress iRule When I start my attack, I send the traffic with a specific TTL for identification. This TTL is configured in my iRule in order to get a CPU intensive string compare function to work on every attack packet. Here is an example for 'hping': $ /usr/sbin/hping3 --rst -p 80 10.103.1.50 --ttl 5 --flood --rand-source This can easily drive the CPU very high and the DDoS rate-limiting kicks in very aggressive. Summary I hope that this article provides you with an even better understanding on what effect the different threshold modes have on the attack mitigation. Of course, keep in mind this are just the ‘static DoS’ vectors. In later articles I will explain also the 'Behavioral DoS' and the Cookie based mitigation, which helps to massively reduce the chance of a false positives. But, also keep in mind, the DoS vectors start to act in microseconds and are very effective for protecting. Thank you, sVen Mueller3.7KViews12likes3CommentsUnderstanding L3/4 and DNS Behavioral DoS Mitigation (BDoS)
Dear Reader, In this article I am writing about another mitigation technology which runs in parallel to the other mitigation methods you can use on a BIG-IP: Dynamic Signatures. These Dynamic Signatures are created by the Behavioral DoS (BDoS) engine. This is a very smart and powerful mechanism to only drop attack traffic without touching legitimate traffic. Introduction We know that the CPU can do smart decisions and is usually able to differentiate between good and bad traffic. Frequently, this is simply accomplished by using the flow table. But, the CPU is limited in performance and this is when the FPGAs (Field Programmable Gate Arrays) come into play. This special purpose chips can handle massive numbers of packets but they are “stupid”. Therefore, the CPU needs to program them with detailed information about which packets should get dropped. The static DoS vectors are relatively simple because they are based on limited attack information such as the protocol, flags, source IPs, destination IPs, and others. But they kick in very fast! This means they (vectors) can program the FPGAs within microseconds and protect the BIG-IP or service. The downside is that the FPGA cannot differentiate if for example a PUSH/ACK packet is valid or one of the attackers because there is not enough information provided. This is where Behavioral DoS can help. How does BDoS work? BDoS is about collecting and working with large amounts of statistical and traffic information. It collects the details that can be obtained from L3/4 and DNS. All this information is called “predicates” and the value of a predicate is called a “bin”. For example, the TTL of a packet is the predicate and it can have a value (bin) from 0-255. By collecting this information all the time, the BDoS engine knows how packets usually look like. It knows the predicates and the bin values together with its rates. This provides a pretty good view on how traffic usually looks like and what the rates are for all this information. When an attack happens, the BDoS engine will be able to identify the ‘abnormal’ bins of the predicates and will create a dynamic signature with all these suspicious predicates. This will give a more precise description of the attack traffic then just saying rate limit “TCP PSH/ACK” packets. Figure 1: Statistical side model A simple example This example is just to explain the principle of BDoS and it uses a very limited set of predicates. Imagine, based on historical information, the system knows that right at this time we usually see 50k PSH/ACK packets on the BIG-IP and all these packets have a TTL between 50 and 60. Further, the load on the BIG-IP is low and the numbers or packets as well as the associated TTL is within the expected norm. Suddenly we see on top of this “normal” traffic 10Mpps of PSH/ACK packets with TTL of 30 and the load of the BIG-IP gets too high. BDoS needs to create a signature describing just the attack traffic to program it into the FPGA and lower the rate of packets going into the CPU. Which PSH/ACK packets should be eliminated? It’s easy: the ones which are not fitting into the “normal”. Which in this case are the PSH/ACK packets with a TTL of 30. Again, this is a very simplified and limited example because it only looks at the predicates protocol (TCP), flags (PSH/ACK), and TTL. In reality, it would of course require way more predicates to create a more specific signature. Example of dynamic signatures Figure 2: Example of a DNS dynamic signature Figure 3: Example of a L3 dynamic signature Behavioral DoS can also be used on HTTP and TLS traffic. But this is done via another Behavioral DoS Engine (BaDOS) which comes from Advanced WAF (A-WAF) and is also included in DDoS Hybrid Defender (DHD). I will discuss BaDOS in another article. How to configure BDoS BDoS can be used on the Device level of BIG-IP to protect the resources of the BIG-IP itself and also on a per Service level via a DoS profile attached to a PO/VS. You can enable BDoS by configuring it on the Network and DNS layer. You simply need to click on the small pencil right of the relevant layer. Figure 4: Network family layers For BDoS you can choose between different Mitigation Modes: “Stress Based Mitigation” and “Manual Multiplier”. These modes are discussed in my article Explanation of F5 DDoS threshold modes. Figure 5: Network properties If you want to run BDoS in “detect only mode”, then you need to set the Mitigation Sensitivity to “None”. This is different than on the static vectors, where you set the state of the vector to “Detect Only”. The same is true for the DoS profile where you can also enable BDoS for the Network and DNS layer and BaDOS for HTTP(S). Here is a useful hint! When you are under attack and the BIG-IP has generate a dynamic attack signature and mitigates based on stress, then the mitigation threshold gets continuously recalculated based on the health of the protected context (device or service). It will always let “some” traffic go through based on the sensitivity level you have configured. But, you have the ability to change the threshold mode from “Fully Automatic” to “Fully Manual” on the specific signatures and set the Mitigation threshold for example to “0”, which will then block the whole attack traffic via the BDOS signature. Figure 6: Dynamic signature with manual threshold Custom Signatures If you are aware of specific attack patterns or you simply want to have HW supported FW rules, then you can create your own signatures. Figure 7: Add manual a signature Under DoS Configuration > Signatures you can choose “Add Signature” in the “Persistent” tab. Figure 8: add predicates to a manual signature A simple example could be an NTP flood signatures, which kicks in if too many NTP packets hit the box. Figure 9: manual NTP predicates After you have created your custom signature you can configure the thresholds and mitigation state in the DoS profiles or on the Device DoS configuration. Figure 10: manual threshold configuration of a manual NTP signature To exclusively handle NTP traffic via this signature and not by the UDP vector, I recommend excluding port 123 from the UDP vector, like you should do with DNS traffic. Figure 12: Exclude port 123 from UDP vector Multilayer DDoS protection Beginning with TMOS version 15, BDoS runs in the Hud-chain before the static DoS vectors. That means that when there is an attack, the static DoS vectors will kick in immediately and protect the service/network behind the BIG-IP and also the BIG-IP itself. Within the next seconds BDoS analyses the traffic and will generate a specific dynamic signature, which describes the attack traffic and will then rate-limit it. Below you can see that F5 DDoS uses a multilayer DDoS approach. When we just look into L3/4 and what I have so far discussed in my articles, then we have covered the stateful engine with its PVA accelerated connection handling, protected by static DoS vectors which kick in in microseconds and then taken over by the granular behavioral DDoS (BDOS) within seconds. Figure 12: Multi-layer DDoS mitigation architecture From the figure you can see that there are even more protection mechanisms, which I will explain in my next articles. Thank you, Sven Mueller2.9KViews7likes3CommentsCookie-based DDoS protection
Dear Reader, DDoS protection starts with being able to identify that you are under attack. Next, you will need to be able to differentiate between good and bad traffic. This will be particularly challenging if the attack (bad) traffic looks like your good traffic. So, how can you ideally filter out only the attack traffic without affecting the good traffic? Well, and finally, you need to have the capacity and performance to deal with the high number of flood packets. In this article, I want to explain to you another layer of F5’s network DDoS protection, which perfectly meets these requirements: cookie-based DDoS protection. Figure 1: Multi-layer DDoS mitigation architecture Cookie-based protection No, I´m not talking about HTTP, we are still within the Network protection. Even on the network layer, BIG-IP can deal with cookies and can use it in a very effective way to mitigate without false/positives. The overall idea is to store connection- related information, encrypted within a packet, which is being forwarded. The response packet will have this cookie included as well and can be used to identify that the packet belongs to an existing connection. Therefore the packet is valid and not just a randomized packet to drive up CPU utilization. When the packet is “wrong” it can be dropped at an early stage by a dedicated chip (FPGA) and doesn´t consume CPU resources. But let´s start talking about attack vectors. SYN flood SYN flood is a very old, but still very effective way for DDoS attacks. The idea is to overwhelm stateful devices with too many half-open connections, which mostly results in (too) high CPU usage and/or fully utilized memory consumption. Attackers just send SYN packets towards the target (server, firewall, etc.). The first packet during a 3-way handshake of a connection setup will let the target immediately allocate memory for the TCP socket data structure holding the tuple, sequence number, and “state” of the session. The target then replies to the client (attacker) with an “SYN-ACK” packet. Under normal circumstances, packet loss or slow communication can cause the SYN-ACK packet to be delayed or lost altogether. The server must decide how long to hold the socket in a half-open state before transmitting a reset and releasing the memory for other tasks. This means the higher this timer is, the faster memory can get filled up under SYN flood conditions. But, on the other hand, the lower the value is, the higher the risk that you drop a valid connection request (for example, because of latency). SYN flood attackers usually spoof the source IP, so the SYN-ACK does not hit the initial source (attacker). This way the SYN flood can also result in an SYN-ACK flood done by the target (server, firewall, etc.), which will be especially painful because the spoof involves only one or a small number of IP addresses. This can be easily identified on the mitigation side and can get blocked or rate-limited. Keep in mind, however, that this could also be a strategy of the attacker… Some interesting side effects are also, that depending on the TCP stack, servers send a RST packet when they release a timed-out connection. Some TCP stacks do also send a RST packet when they receive an SYN-ACK packet for which they don´t have a connection entry. Therefore, often SYN floods do also end in SYN-ACK and RST floods as well. Figure 2: SYN flood But, how can you protect against this type of attack? SYN rate-limiting via static DoS vectors is one option, but it can potentially lead to false/positives. This is true in particular when you cannot limit the mitigation to “attacked destination” and/or “bad actors” (source IPs causing the stress). For further information, please read my article “Increasing accuracy using Bad Actor and Attacked Destination detection“. Behavioral DoS (BDoS) is a very effective way but usually takes a few seconds to kick in. Another strong option is to use F5’s SYN-Cookie mitigation. SYN-Cookie mitigation SYN-Cookie mitigation is an effective way to resist SYN floods. It gets activated when the threshold of the configured number of half-open connections is reached. On AFM (Advanced Firewall Manager) or DHD (DDoS Hybrid Defender), the threshold can be configured via the “tcp-half-open” vector on the device level and also on the per VS/PO level via a DoS profile attached to it. Figure 3: TCP Half Open vector configuration Mitigation threshold means, activate the SYN-cookie process after (in this example) 10.000 half-open connections. Keep in mind, on the Device level you configure the threshold per TMM and within a DoS profile it is per PO/VS to which it is attached. SYN-Cookie mitigation gets deployed by choosing initial TCP sequence numbers in the SYN-ACK response according to the properties of the originating TCP SYN. It encrypts connection information within the sequence number of the responding SYN-ACK packet. When the SYN was sent by a valid client, BIG-IP will receive a well-formed ACK response from the client including the cookie stored within the "ack" field and incremented by one. BIG-IP can now decrypt and reconstruct the connection details and establish the connection by performing the necessary SYN establishment handshakes with the server on behalf of the client. The beauty of this technique is that it smartly protects the memory, but the process of “encrypt” and “decrypt” the cookies consumes CPU, which can be potentially dangerous. To enhance the capability of the BIG-IP systems, SYN cookie functionality is extended to the FPGA, which is radically more effective than software mitigation alone. In this way, BIG-IP can differentiate between “valid” and “bad” SYN packets without negatively impacting memory or CPU performance! Here is a useful hint! As mentioned above, you can configure the activation of the SYN cookie process by defining a threshold of the number of half-open connections. This can be done via the “tcp-half-open” vector on the Device level and also via the DoS profile for a VS/PO. This is my recommended approach. Keep the DoS config in one place and therefore disable the second option (via traditional TMOS configuration). This can be done under System/Configuration: Local Traffic: General, by setting the “Default Per Virtual Server SYN Check Threshold” and “Global SYN Check Threshold” to “0”. Also, disable “Hardware VLAN SYN Cookie Protection”. Figure 4: LTM SYN-Cookie configuration Now it’s all controlled by the “tcp-half-open” vector, which makes it less confusing. Please also keep in mind, SYN Cookie does only work in FPGA when “Auto Last Hop” is enabled! What do the stats of "tcp-half open" mean in the context of SYN-Cookie protection? When we take a look into the stats via the script I provided in my article “Demonstration of Device DoS and Per-Service DoS protection” during an SYN flood attack, we see a rate for int_drops. This only appears when the mitigation threshold (number of half-open connections) is reached. Figure 5: TCP Half Open vector shown in dos_stats output The “stats_rate” shows the number of half-open connections in the last second and the “drop_rate” or “int_drop_rate” = (the number of cookies sent) – (number of valid ACKs received). SYN-ACK flood Let’s talk about another attack vector: SYN-ACK flood. Again, this attack can be done in different ways. The attackers simply send a high rate of randomized SYN-ACK packets towards the victim. Or it is a result of an SYN flood with responding SYN-ACK packets towards a spoofed source IP of the SYN packet. Anyhow, every high packet rate can cause massive damage to the performance of Firewalls, Servers, or other devices and this is what DoS attackers usually try to archive with this type of attack. And again, the question is: how to differentiate between “good” and “bad” SYN/ACK packets? Asking the connection table would increase the CPU and is therefore not the best choice. So, ideally, it is done upfront (CPU) within the hardware chip (FPGA) to be high performant. SYN-ACK flood mitigation By enabling “Only Count Suspicious Events” on the “TCP SYN ACK Flood” vector on the Device level you do enable a cookie mechanism that allows the BIG-IP to differentiate if an SYN-ACK packet belongs to an SYN which has passed the BIG-IP before. Figure 6: TCP SYN ACK Flood vector configuration This mechanism is processed within the FPGA and does therefore not consume CPU. This enables the BIG-IP to be extremely resistant against this type of flood. Please keep in mind, this cookie function was added to the product with version 15.1. Before 15.1, you could also enable “Only Count Suspicious Events”, but it puts the vector into software mode. That means it can differentiate between “good” and “bad” SYN-ACK packets, but the mitigation is done within the CPU. How does SYN-ACK cookie mitigation work When the mechanism is enabled and an SYN packet received on a BIG-IP and is being forwarded, then the initial SNI gets replaced by an own SNI which includes properties of the originating TCP SYN in an encrypted format. When BIG-IP now receives SYN-ACK packets, it expects the initial SNI (cookie) with the ACK field to be incremented by 1. BIG-IP can now decrement and decrypt that value to check if the “cookie properties” fit the properties of the initial SYN packet. If that is the case, then the "ack" field gets replaced with the incremented SNI of the initial SYN packet and it will then be returned to the original client. If the cookie check fails, then it will be counted as a bad SYN-ACK and when the mitigation threshold on the vector is reached the rate-limiting for bad SYN-ACK packets gets activates in the hardware (FPGA). Figure 7: TCP SYN ACK cookie mitigation ACK flood and mitigation Let’s get to a third attack vector: ACK flood. This one is also widely used by attackers and can be very damaging. We already know it’s about the differentiation between valid and bad ACK packets plus being able to use the hardware (FPGA) for mitigation. Here again, the overall idea is to encrypt connection properties within a cookie and store them in the TCP packet. Valid response packets do have this cookie included and can be validated. When the number of invalid ACK packets reaches the mitigation threshold, then the rate-limiting will start. This cookie protection mechanism can be enabled per VLAN on the “bad_ack” vector on the Device configuration. Figure 8: TCP BADACK flood vector (cookie) configuration Simply move the VLANs you want to have this feature being enabled to the “Selected” area. Conclusion TCP floods can be easily used to bring down even strong firewalls, servers, or other security devices. Generally, it is not the volume, in terms of bandwidth, that is at issue here. It is the high packet rate, which in particular puts stateful devices under pressure if they don´t have dedicated hardware (FPGA) supported defense mechanism. With the cookie-based detection and mitigation defense technic, F5 delivers another strong layer within the layered network DDoS architecture. Even the whole cookie process is done within FPGAs, these chips can differentiate between valid and bad TCP packets without checking the connection state table. SYN-ACK and ACK cookies, which got introduced into the product with version 15.1, are being processed even before the Behavioral DDoS layer. SYN cookies are done after the static vectors. One of my next articles will provide an overview of best practices and an explanation on how to integrate DHD (DDoS Hybrid Defender) or AFM (Advanced Firewall Manager) into the Network for DDoS protection. But, next, I will explain the latest part of the Network protection layers, which is the first layer from the data path order: IP-Intelligence/ IP shunning and also RTBH (Remote triggered Black Hole Routing) and Flowspec. I hope this article was useful to you and keep in touch! Thank you, Sven Mueller4.5KViews6likes1CommentIP-Intelligence and IP-Shunning
Dear Reader, With this article we have arrived at the “last” layer of protection, which is actually the first one from the hud-chain point of view: Blocking on the IP layer. The idea is rather simple: When you know that an IP address (or multiple IP addresses) is doing bad things, you block them as early as possible at the speed of line. IP-Intelligence Policy You can create IP-Intelligence (IPI) policies, which describe what should happen (Drop or Allow) with IPs (source or destination) belonging to a specific category. BIG-IP already has a bunch of pre-defined categories, but you can also add your own. Figure 1: Example on an IPI-Policy Important: A great benefit of using this, is that the “Drop” action is done on the FPGA level, which means in hardware. Basically, as soon as a packet hits the BIG-IP. But this is only possible when you set the logging to “No” (no logging, only dropping) or “Limited”. “Limited” means that some packets get leaked into software (CPU) for dropping them there and then they can be logged here as well. By default, every 256th packet goes into software when logging is set to “limited”. This is controlled by the sys db variable “dos.blleaklimit”. You can modify it to values like 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, and so on. The default value is 255. When you set the logging to “Yes”, then all packets will get logged, because the action (Drop) is done always in software. My recommendation is to use “Limited” logging to enable FPGA support for this feature. This is also what we call “shunning” when we drop on FPGA level. You can create multiple policies and assign them to different contexts: VS/PO or Route Domains to have individual configurations for your protected objects (services, networks, etc.) and you can also assign an IPI policy to the global level of BIG-IP, which means this will be executed first and for the whole box. Even when you want to block via IPI only on VS/PO level, my recommendation is to have a global IPI policy attached. It can be empty. Figure 2: Global Policy assignment Figure 3: Per VS/PO Policy assignment Figure 4: Per Route Domain Policy assignment Order of Mitigation Mitigation using the hardware (FPGA) is always done first. Next, traffic goes into software and can be dropped here as well. For IPI the following order takes place. Note that IPI mitigation done on Route Domain level is always in software. Figure 5: Mitigation Order How to get information about “bad” IPs? There are multiple options available. One is to use an external service (3 rd party), which does the categorization of IPs based on their reputation. If you have the IP Intelligence feature licensed, the information is being updated every 5 minutes. Another one is to use BIG-IP to automatically identify “bad actors” via DoS vectors. In my article Increasing accuracy using Bad Actor and Attacked Destination detection, I described how that works and for what it is useful for. But when BIG-IP identifies “bad actors” and this source IP(s) are behaving aggressive for a configurable amount of time, then it may make sense to drop everything coming from this IP(s) for a specific time interval. Figure 6: Bad Actor Detection In this example, the IP(s) which are identified as bad actors and behaving “bad” for 30 seconds get added to the category “denial_of_service” for 600 seconds. Once the “duration time” is over, they get released from the category and the process starts from the beginning, if they are still “bad”. Drop Attacked Destination Another use case to carefully consider is to drop based on attacked destination. Here you can drop all traffic going to a specific destination which is under attack. True, you stop the attack hitting the target, but you also drop all legitimate traffic going there, which would be a great success for the attacker. But, of course, if for whatever reason the attack would potential bring the BIG-IP to its knees this is an option you can choose. You would loose the destination that is under brutal attack but at least you safe all other services on the BIG-IP from being impacted. Figure 7: Attacked Destination Detection Use your up-stream router Think of the following scenario: An attack is about to saturate your Internet-pipe. You mitigate the bad actors on BIG-IP, but your pipe is still full. Why not “signal” the information of bad actors to your upstream router and drop the attackers already there, even before your pipe gets saturated? This is what you can do with using Remote Triggered Black Hole routing (RTBH) or Flowspec. By enabling “Allow External Advertisement” within your “Bad Actor Detection” or “Attacked Destination Detection” you can enable that functionality. Figure 8: External Advertisement I will cover the topic on how to use the Blacklist Publisher to “signal” bad actors including an action (Drop, Rate-limit, DSCP) to an upstream router via RTBH or Flowspec in a dedicated article. I will also explain in the same article how to use the “Scrubbing profile” to execute an action (redirect, rate-limit, DSCP, drop) on networks, PO/VS when these objects reach a certain bandwidth utilization. More options to add bad actors into categories Returning to how you can add IP(s) into categories. In addition to the already discussed options of using F5 subscription service and DoS vectors you can also add manually IPs. You may have your own sources and want to add this information into the categories. Here you have multiple options again. GUI On the Blacklist Categories page, you can manually add/delete IPs for a selected category. Figure 9: Add IP/Geo/FQDN to categories Feed List You can create your own Feed List using HTTP(s) or FTP to regularly download a file and update your categories. Figure 10: External Feed List Don´t forget to then enable your Feed List within your IPI-Policy! Figure 11: Enable Feed Lists TMSH Using tmsh you can also add IPs to categories. Here is an example: $ tmsh run security ip-intelligence category name denial_of_service ip-ttl add { 123.123.123.123,300 } Rest-API Another way is to use Rest-API. Here is an example: $ curl -sk -u sVen:mySecrectPW -H "Content-Type: application/json" -X POST -d '{"command":"run","utilCmdArgs":"name denial_of_service ip-ttl add {123.123.123.123,300}"}' https://mgmnt-address-big-ip/mgmt/tm/security/ip-intelligence/category/ | python -m json.tool Whitelist IPs for IPI/shunning If you want to whitelist IP(s) from not being shunned on IPI level, you need to add this IP(s) to the “whitelist” category. Please we aware that this only works when you use an external feed to add the IP(s). Figure 12: Feed List Properties More useful CLI commands To get an idea on hits per category, you can use a “tmctl” command: # tmctl -w 180 ip_intelligence_stat It gives you an overview of hits per context, category and drop list type. Figure 13: Stats for categories Here you can also see that you can not only drop based on IP. You can also use GEO information and FQDN. If you want to verify the status of an IP, you can use this command: # tmsh show security ip-intelligence info address <ip-address> Figure 13: Example of status of a GEOIP Conclusion IPI/shunning is a very effective and hardware driven way of blocking traffic based on IP information. Especially on amplification and reflection attacks it is a very useful because you are mostly able to identify IP addresses causing the high load. Often operators do also block traffic from countries on which they are sure they do not communicate with for specific services. After finishing the explanation of this last (first) mitigation layer, you hopefully have a better understanding on how F5 L3/4/DNS DDoS mitigation works. Hardware (FPGA) support is an essential element of successful blocking high packet rates. These high-performance chips get dynamically programmed via the “smart” component, the software which runs on CPU, and which also needs to be protected. Therefore, you can configure the DoS mitigation mechanism for protecting your services (server, network, …) behind the BIG-IP, but also BIG-IP itself! Figure 14: Multi-layer DDoS mitigation architecture F5 also provides the ability to protect with hardware (FPGA) support in virtualized environments. Since more and more data center go virtual, the requirement of running DDoS mitigation in virtualized environments has increased. But, because FPGA support is needed for this type of DDoS mitigation, since CPU alone is usually not powerful enough, it can present a problem. Together with Intel F5 provides a strong solution. This solution uses SmartNICs (network card with FPGA on board) within your Hypervisor. DHD (DDoS Hybrid Defender) and AFM (Advanced Firewall Manager) can program these FPGAs in basically the same way as it can be done on dedicated Hardware appliances. This again gives you the power needed for DDoS mitigation, plus the flexibility of virtualized environments. If you are interested in learning more about this, I would like to recommend the article of my colleague Ryan Howard: https://devcentral.f5.com/s/articles/Mitigating-40Gbps-DDoS-Attacks-with-the-new-BIG-IP-VE-for-SmartNICs-Solution My next article will focus on logging and reporting DDoS events. Together, my colleague Mohamed Shaath and I created a DDoS Dashboard including DDoS statistics visibility on all vectors based on an ELK stack. We will share in that article how to install and use it. Thank you, Sven Mueller3.5KViews6likes4CommentsHow BIG-IP can help secure your network against malware (Maze and Cloud Snooper)
In the F5 SIRT and F5 Support, we are often asked how F5 products can be used to defend against threats like ransomware, malware and rootkits. Threats like these can enter a target network through a multitude of vectors specific to the target endpoints – email spam campaigns, phishing and spear-phishing attacks, drive-by download vulnerabilities – but they can also enter the network through direct compromise of internet facing hosts. It’s tempting to look at F5 products and think they aren’t designed to detect a malicious binary being injected into your network, and while (with the exception of ICAP virus scanning of files uploaded to a webserver) that might be true, it is definitely not true to think that F5 products can’t be used as part of a defence-in-depth strategy to protect yourself from both compromise and post compromise exploitation. So let’s talk about a few ways you can bolster your network security using F5 products… The BIG-IP itself The first step is to understand that the BIG-IP is usually found sitting at the border of your network, therefore it is crucially important to protect the BIG-IP from compromise. If the BIG-IP is compromised then it doesn’t matter how good your Advanced WAF policies are or your AFM rulesets, the attacker has a neat jump-box which they can use to pivot into the rest of your network. Fortunately, protecting yourself against exploit here is as simple as following a few basic principles: Don’t expose the management interface to the Internet (if at all possible) Ensure Self IP addresses have appropriate Port Lockdown settings Use strong passwords and definitely don’t leave the passwords at their defaults (fortunately, changing the passwords at installation is now mandated in recent versions) Monitor log files for suspicious activity (like unexpected logins) If you absolutely must expose the management interface to the internet, be sure to set appropriate ACLs to restrict access to specific IP ranges. All of which is documented in our Ask F5 article K13092 Protecting your (other) assets Now that you’ve got the BIG-IP sorted, you can turn your attention to how the BIG-IP(s) in your infrastructure can help protect your internal assets. As we discussed earlier malware can enter your network in a number of different ways, and a BIG-IP can help secure a number of those: Ensuring that your assets can only be accessed via the BIG-IP on ports you control immediately ensures you aren’t accidentally exposing SSH on your webserver, ruling out the possibility of an attacker simply brute-forcing credentials and uploading malware directly. The BIG-IP is (usually!) a full proxy right down to Layer 4, so any exploits or C2 channels that rely on quirks of TCP or utilise unusual header fields to transfer data will fail when the BIG-IP proxies and applies its own optimisations to the server-side flows Placing Advanced WAF in front of your internet-facing web services, configured with an appropriate policy, significantly reduces the possibility of an attacker exploiting a weakness in your web applications to upload a malicious payload (such as SQL Injection or shell command injection) The AFM IPS (in AFM 13.1.0 and later with an IPS subscription license) allows you to scan non-HTTP traffic for potentially malicious activity and take action BIG-IP APM allows you to secure access to potentially vulnerable protocols like RDP, better still, APM layered with Advanced WAF allows you to protect your VPN and secure endpoints against brute force attacks Of course, it is true to say that BIG-IP products are not best placed to stop a user from clicking on a malicious link in an email, but the points above at least help close some of the vectors that malware can enter your network! But, don’t stop there. I’m sure that as part of your day to day, you are on the look-out for IOCs to watch for that might indicate a problem within your network? If you are inspecting your outbound traffic (with BIG-IP AFM or SSLO) then it’s a snap to add rules to those products that will alert you and/or block outbound C2 communication attempts from compromised clients! A practical example – Cloud Snooper Cloud Snooper first appeared in the news in February 2020 billed as “malware that sneaks into your Linux servers”. Reading the Sophos white-paper (links to a non-F5 resource) it becomes clear that the researchers aren’t sure what the original infection vector is here; they know that the key marker is a specific rootkit present on the hosts, but not how that rootkit got there in the first place. Since there isn’t a specific initial infection vector, like most malware, we know that we need to think about everything from the previous two sections: Protect the BIG-IP by ensuring access to the management interface and control-plane is not exposed to attackers Pass inbound traffic through inspection appropriate to the protocols the servers are exposing to the internet (Advanced WAF, AFM) to reduce the potential for malware to be injected into the network through compromise of an internet-facing application The white-paper tells us that the rootkit dropped initially is a local command-and-control system rather than the ultimate exploit; its job is to receive incoming C2 messages and arbitrate between those and malware subsequently installed. It does that by sniffing all of the network traffic arriving at the compromised host and inspecting the TCP source port of every incoming packet. If the source port of the packet matches one of the ‘magic’ numbers hard-coded in the rootkit then it is treated as a command, otherwise it is simply ignored, and the host processes it as normal. So we've a third item to add to our list above: Pass inbound traffic through the BIG-IP in order to block incoming C2 messages destined for the rootkit on any potentially compromised or at-risk host Good news! With even the default configuration in place there’s a good chance that the BIG-IP won’t preserve the TCP source port between the client side and the server side, especially on a system passing a reasonable amount of traffic. But you can go one step further – simply change the “Source Port” option on the Virtual Server from “Preserve” to “Change” and the BIG-IP will always try to change the source port! The C2 messages might arrive at the Virtual Server, but by the time they leave and head to the pool member they’ll be on a new port and the rootkit will completely ignore them. Want to go one step further and ensure that C2 traffic never reaches the server at all? BIG-IP AFM has you covered; just create a Network Firewall rule to deny all traffic with a source port of 1010, 2020, 6060, 7070, 8080 or 9999, assign the rule to a Rule List, assign the Rule List to a Network Firewall Policy and then assign that Network Firewall policy to any Virtual Servers you’re concerned about or even to all traffic passing through the BIG-IP. As the original white-paper notes, regular clients don’t normally use source ports below 32768 so legitimate traffic shouldn’t be affected – but simply by assigning a logging profile with Network Firewall logging enabled will let you see what’s being denied by the rule. See the following chapters in the manual (BIG-IP Network Firewall: Policies and Implementations) for your version for more information: Configuring BIG-IP Network Firewall Policies Local Logging with the Network Firewall If you don’t have AFM, we’ve still got you covered. A simple iRule attached to a Virtual Server provides a simple way to block incoming Cloud Snooper C2 communications: when CLIENT_ACCEPTED { switch -- [TCP::client_port] { 1010 - 2020 - 6060 - 7070 - 8080 - 9999 { # Uncomment the following to log locally, which should be used for debugging purposes only # log local0. "Reject possible Cloud Snooper C2 message from [IP::client_addr] with source port [TCP::client_port]" reject } default { # Not a Cloud Snooper C2 connection, continue as normal } } } Of course, an iRule doesn’t come with the many advantages of AFM's Network Firewall Policies – it’s more computationally expensive, there isn’t simple access to remote logging compatible with leading SIEM systems, and it isn’t as easily administered if needs change. But, still, it’s there and it’s an excellent option – the F5 SIRT makes regular use of iRules in order to stop the bleeding and get customers back up and running in a pinch! Cloud Snooper Conclusion In brief, to protect against Cloud Snooper: Protect the BIG-IP by ensuring access to the management interface and control-plane is not exposed to attackers Pass inbound traffic through inspection appropriate to the protocols the servers are exposing to the internet (Advanced WAF, AFM) to reduce the potential for malware to be injected into the network through compromise of an internet-facing application Pass inbound traffic through the BIG-IP in order to block incoming C2 messages destined for the rootkit on any potentially compromised or at-risk host, setting the "Source Port" option to "Change" on the appropriate Virtual Server or use a Network Firewall ruleset or iRule to block incoming connections with a TCP source port of 1010, 2020, 6060, 7070, 8080 or 9999 Example 2 – Maze The Maze ransomware has been in the news regularly since January 2020 and shares a lot of traits with previously identified ransomware – infection is likely to be through vectors such as phishing, brute force access to network infrastructure like RDP or compromising internet-facing hosts. One thing that is new (but not unique) with Maze is what it does with the data once it’s in; it doesn’t just encrypt files, it syphons them off, so the target is not only being coerced into paying to regain access to their data, but also to avoid their data being exposed on the internet. So, everything we discussed in the previous sections applies equally well to Maze as Cloud Snooper or any other malware infection, with the only difference being we aren't looking for inbound C2 messages but rather outbound data exfiltration: Protect the BIG-IP by ensuring access to the management interface and control-plane is not exposed to attackers Protect servers by placing them behind the BIG-IP and passing outbound traffic through inspection (AFM, SSLO) Pass inbound traffic through inspection appropriate to the protocols the servers are exposing to the internet (Advanced WAF, AFM) to reduce the potential for malware to be injected into the network through compromise of an internet-facing application It’s currently thought that Maze exfiltrates this data en-masse via outbound FTP connections, so protecting against that threat using F5 products is as simple as passing your outbound traffic through a Virtual Server and using Network Firewall policies or iRules to block all outbound FTP or to block or alert on outbound traffic destined to one of the published IOC IP addresses for Maze. Maze Conclusion In brief, to protect against Maze: Protect the BIG-IP by ensuring access to the management interface and control-plane is not exposed to attackers Protect servers by placing them behind the BIG-IP and passing outbound traffic through inspection (AFM, SSLO), blocking outbound FTP connections from any sensitive hosts to prevent Maze from exfiltrating data Pass inbound traffic through inspection appropriate to the protocols the servers are exposing to the internet (Advanced WAF, AFM) to reduce the potential for malware to be injected into the network through compromise of an internet-facing application Conclusion Admit it, you skipped right down to this section, didn’t you? It’s OK – we are all busy people, I understand! Here’s my five-bullet BIG-IP takeaway: Ensure you secure your BIG-IP first; failure to do so renders anything else you do moot. Follow the steps in K13092! Simply placing your servers behind a BIG-IP makes many attacks significantly harder to pull off, by virtue of the BIG-IPs full-proxy architecture If it serves HTTP, put Advanced WAF in front of it Put AFM in front of everything, including your internal clients accessing the internet! If you can’t use one of the products, iRules are your friend and can help you stop inbound and outbound threats And, finally, my general advice five-bullet takeaway – things that apply equally to F5 products and general-purpose computing: Never expose the management interface of anything directly to the internet; if you must, use ACLs to restrict access Visibility is king; you need visibility of traffic in and out of your network to watch for IOCs. If you can’t use F5 products (Advanced WAF, AFM and SSLO have excellent reporting capabilities) then use something and pipe all that data into an SIEM Patch in a timely manner Don’t use weak, insecure or previously leaked passwords Did I mention never exposing the management interface to the internet yet? Those ten things, the first five of which are covered in more detail in the article above, will go a long way to ensuring the security of your network and your assets. Of course, the BIG-IP and good security practices can't obviate the need for good endpoint security, MFA for your users and so on, but you'll at least have squashed a significant amount of attack surface.880Views5likes1Comment2. SYN Cookie: Operation
Introduction As concluded in the last article, in order to avoid allocating space for TCB, the attacked device needs to reject TCP SYN packets sent by clients. In this article I will explain how a system can do this without causing service disruption, and more specifically I will explain how this work in BIG-IP. Implementation Since attacked device need to alter default TCP 3WHS behaviour the best option is implementing SYN Cookie countermeasure in an intermediate device, so you offload your servers from this task. In this way if connection is not legitimate then it is just never forwarded to the backend server and therefore it will not waste any kind of extra resourcess. Since BIG-IP is already in charge of handling application traffic directed towards servers it is the best place to implement SYN Cookie. What BIG-IP does is adding an extra layer, which we can call SYN Cookie Agent, that basically implement a stateless buffer between client and BIG-IP TCP stack. This agent is in charge of handling client’s TCP SYN packets by modifying slightly the standard behaviour of TCP 3WHS, this modification comes in two flavours depending on what type of routing role BIG-IP is playing between client and server. Symmetric routing This is the typical situation. In this environment BIG-IP is sited in the middle of the TCP conversation and all the traffic from/to server pass through it. The SYN Cookie operation in this case is briefly described below: Client sends a TCP SYN packet to BIG-IP. BIG-IP uses a stateless buffer for answering each client SYN with a SYN/ACK. BIG-IP generates the 32 bits sequence number which will be included in SYN/ACK packet sent to the client. BIG-IPencodes essential and mandatory information of the connection in 24 bits. BIG-IP hashes the previous encoded information. BIG-IP also encodes other values like MSS in the remaining bits. BIG-IP generates the SYN/ACK response packet and includes the hash and other encoded information as sequence number for the packet. This is the so calledSYN Cookie. BIG-IP sends SYN/ACK to client. BIG-IP discards the SYN from the stateless buffer, in other words, BIG-IP removes all information related to this TCP connection. At this point no memory has been allocated for TCP connection (TCB). Client sends correct ACK to BIG-IP acknowledging BIG-IP’s sequence number. BIG-IP validates ACK. BIG-IP subtracts one to this ACK. BIG-IP runs the hash function using connection information as input (see point 3-b above). Then it compares it with the hash provided in the ACK, if they match then it means client sent a correct SYN Cookie response and that client is legitimate. BIG-IP uses connection information in the ACK TCP/IP headers to create an entry in connflow. At this time is when BIG-IP builds TCB entry, so it's the first time BIG-IP uses memory to save connection information. BIG-IP increase related SYN Cookie stats. BIG-IP starts and complete a TCP 3WHS with backend server on behalf of the client since we are sure that client is legitimate. Now traffic for the connection is proxied by BIG-IP as usual attending to configured L4 profile in the virtual server. *If ACK packet received by BIG-IP is a spurious ACK then BIG-IP will discard it and no entry will be created in connection table. Since attackers will never send a correct response to a SYN/ACK then you can be sure that TCB entries are never created for them. Only legitimate clients will use BIG-IP resources as shown in diagram: Fig4. TCP SYN Flood attack with SYN Cookie countermeasure Asymmetric routing In an asymmetric environment, also called nPath or DSR, you face a different problem because Big-IP cannot establish a direct TCP 3WHS with the server (point 11 in last section). As you can see in below diagram SYN/ACK packet sent from server to client would not traverse Big-IP, so method used for symmetric routing cannot be used in this case. You need another way to trust in clients and discard them as attackers, so clients then can complete TCP 3WHS directly with the server. Fig5. TCP state diagram section for asymmetric routing + FastL4 In order to circumvent this problem BIG-IP takes the advantage of the fact that applications will try to start a second TCP connection if a first TCP 3WHS is RST by the server. What BIG-IP does in asymmetric routing environments is completing the TCP three way handshake with client, issuing SYN Cookie as I described for symmetric routing, and once BIG-IP confirms that client is trustworthy it will add its IP to SYN Cookie Whitelist, it will send a RST to the client and it will close the TCP connection. At this point client will try to start a new connection and this time BIG-IP will let the client talk directly to the server as it would do under normal circumstances (see diagram above). The process is briefly described below: Client sends a TCP SYN packet to BIG-IP. BIG-IP generates the sequence number as explained for symmetric routing. BIG-IP generates the SYN/ACK response packet and includes the calculated sequence number for the packet. BIG-IP sends SYN/ACK to client and then remove all information related to this TCP connection. At this point no memory has been allocated for TCP connection (TCB). Client sends correct ACK to BIG-IP acknowledging sequence number. BIG-IP validates ACK. BIG-IP substract one to this ACK so it can decode needed information and check hash. BIG-IP increase related SYN Cookie stats. BIG-IP sends a RST to client and discards TCP connection. Big-IP adds client’s IP to Whitelist. Clients starts a new TCP connection. BIG-IP lets client to start this new TCP connection directly to the server since it knows that server is legitimate. *If ACK packet received by BIG-IP is a spurious ACK then BIG-IP will discard it and no entry will be created in connection table. Fig6. TCP 3WHS flow in asymmetric routing DSR whitelist has some important characteristics you need to take into account: By default IP is added 30 seconds to the whitelist (DB Key tm.flowstate.timeout). Client’s IP is added to whitelist for 30 seconds but only if there is no traffic from this client, if BIG-IP have already seen ACK from this client then its IP is removed from whitelist since connection has been already established between client and server. Whitelist is common to all virtual servers, so if a client is whitelisted it will be for all applications. Whitelist it is not mirrored. Whitelist it is not shared among blades. SYN Cookie Challenge When a TCP connection is initiated the TCP SYN packet sent by the client specify certain values that define the connection, some of these values are mandatory, like source and destination IP and port, otherwise you would not be able to identify the correct connection when packets arrive. Some other values are optional and they are used to improve performance, like TCP options or WAN optimizations. Under normal circumstances, upon TCP SYN reception the system creates a TCB entry where all this information is saved. BIG-IP will use this information to set up the TCP connection with the backend server. The problem we face is that when SYN Cookie is in play device does not create the TCB entry, only a limited piece of connection information is collected, so the information that BIG-IP has about the TCP connection is limited. Remember that SYN Cookie is not handled by TCP stack but by the stateless SYN Cookie Agent, so we cannot save the connection details. What BIG-IP does instead is encoding key information in 24 bits, this left only some bits for the rest of data. While there is no room for values like Window Scale information, however we have space for other values, for example 3 bits reserved for MSS value. This limits MSS possible values to 8, the most commonly used values, and the one chosen will the nearest to the original MSS value. However, note that MSS limitation has a workaround through configuration in FastL4 profile that it can help in some cases. The option SYN Cookie MSS in this profile specifies a value that overrides the SYN cookie maximum segment size (MSS) value in the SYN-ACK packet that is returned to the client. Valid values are 0, and values from 256 through 9162. The default is 0, which means no override. You might use this option if backend servers use a different MSS value for SYN cookies than the BIG-IP system does. If this is not enough for some customers, BIG-IP overcomes this problem by using TCP timestamp space to save extra information about TCP connection, in other words we create a second extra SYN Cookie. This space is used to record the client and server side Window Scale values, or the SACK info which are then made available to the TCP stack via this cookie if the connection is accepted. Note that TCP connection will only be accepted if both SYN Cookies (standard and Timestamp) are correct. So all this means that if the client starts a connection with TCP TS set then Big-IP will have more space to encode information and hence the performance will be improved when under TCP SYN flood attack. NOTE: SYN Cookie Timestamp extension (software or hardware) only work for standard virtual servers currently. FastL4 virtual servers only supports MMS option in SYN Cookie mode. Conclusion Now you know how SYN Cookie works under the hoods. In next article I will describe when SYN Cookie is activated and I will give specific details of BIG-IP SYN Cookie operation. Note that limitations related to SYN Cookie commented in this article are not caused specifically by BIG-IP SYN Cookie implementation but by SYN Cookie standard itself. In fact, F5 Networks implements some improvements on SYN Cookie to get a better performance and provide some extra features.3KViews3likes3Comments1. SYN Cookie: Intro
Introduction As for today one of the most common and easiest DDoS attacks to carry out is TCP SYN flood attack. Due to this, many efforts have been dedicated to implement the best solution to mitigate it. This is why you can find different countermeasures with the same goal, like reducing SYN-RECEIVED timer TCP state, increasing backlog, recycling old TCBs, using TCPCT (RFC6013), implementing SYN cache, etc. but the widest spread choice is TCP SYN Cookie. In this article series I explain several aspects of SYN Cookie like how this countermeasure works and how is implemented in BIG-IP, how must be configured, differences between hardware and software SYN Cookie, things to take into account when dealing with SYN Cookie and troubleshooting. All examples shown in these articles are based on TMOS v16, although there could be some slight difference with your specific version it will help you to understand what it is happening in your own system. Note that, SYN Cookie term refers to the countermeasure itself, but sometimes you will read SYN Cookie meaning in fact SYN Cookie challenge. I will talk about the difference throughout the articles. As a final comment, please be aware that I have tried as much as possible to avoid talking about DB keys or internal commands throughout these article series. In any case, as a rule of thumb, please do not change any DB key unless this change is recommended by an F5 engineer. Why SYN Cookie? If you review the well-knownTCP state diagramyou can probably notice a weak point. Below I have extracted the section of TCP diagram that describes aTCP passive openand I have filled in red the specific TCP state that support the TCP SYN flood attack. Reason is that in SYN_RCVD state the system already reserves memory for the incoming connection information, which is saved in the so called TCB (Transmission Control Block). This means that the system starts to consume memory even before the connection is already ESTABLISHED. Fig1. TCP state diagram section This can be dangerous due to a simple reason, if a device running a server and waiting for connections receives a huge enough amount of TCP SYN packets then the service could be unavailable for legitimate users. This is because, according to TCP standard, the server must answer with a SYN/ACK packet to all these TCP SYN packets, create a TCB entry for each connection and also wait for last ACK from client. Therefore, the server will need to allocate memory, even without knowing if these connections will be finished successfully or not. This is the base of a TCP SYN flood attack. An attacker just needs to generate enough amount of TCP SYN packets to overwhelm a server. Any ISP will allow TCP SYN packets inside their network infrastructure since they cannot know if these packets are legitimate or not, and hence, all these packets will reach the victim. TCP SYN flood attack can cause two different symptoms depending on reachability of attacker’s source IPs. Attending to which one you face maybe you could have clues about the origin of the attack: 1.Source IPs from where the TCP SYN flood attack is being run are reachable from the attacked device. In this case when the target victim replies with expected responses (SYN/ACK) to source IPsthe source IPs will response with a RST packet to the server since they are not aware of any TCP connection started from them. Fig2. DDoS TCP SYN Flood attack (bots) This behaviour will cause that the server most probably detects a TCP RST flood attack together with TCP SYN flood attack: Feb 22 10:12:54 slot1/AFM1 err tmm2[14782]: 01010252:3: A Enforced Device DOS attack start was detected for vector TCP RST flood, Attack ID 31508769 The good news in this case is that server will be able to free up space for more TCBs since upon receiving the RST packet TCP state changes to CLOSED and then to LISTEN and therefore reserved space used for TCP connection will be freed up. This is not the common behaviour since what an attacker wants is waste all server resources.Typically, source IPs are random and not reachable from server. Attending to this explanation, if you face a RST flood together with a SYN flood attack maybe attacker is just sending an hping3 from internal subnets, or near to your Big-IP device. 2.Source IPs are not reachable from the attacked device. This is the typical behaviour since, as commented, the idea for TCP SYN flood DDoS attack is consuming all server resources dedicated to TCP connections in order to avoid providing the service to legitimate users. Note that when I say ‘reachable’ I mean that TCP SYN/ACK packets sent by server never has a response. Source IP could be reachable by ICMP or by starting a new TCP connection from server, but unrelated SYN/ACK packets from server are dropped at any point in the path, or even in the client itself. So the result is the same as if they were not reachable at all. In this case TCP SYN/ACK packets sent by the server will be lost and hence space for TCB will be reserved until timeout. Fig3. TCP SYN Flood attack (generic) Note that the word ‘attack’ in this article series does not mandatorily refers to intentional attacker, but also can refer to legitimate devices doing something wrong by error or ignorance. Also a wrong SYN Cookie configuration can warn us about a non real TCP SYN flood attack. Conclusion At this point you know basic theory under TCP SYN flood attack, so the question now is, how can you avoid that a simple TCP SYN packet reserves space for TCB entry? You cannot. Since you have to keep with TCP standard only choice left is rejecting the TCP SYN packet and close the connection. Then the question is how to guarantee that legitimate clients will be able to connect to the server if you close their connections, and solution is, using TCP specification to your benefit. This is what TCP SYN Cookie does and what I will review in the next article, showing how it is implemented in BIG-IP.2.7KViews3likes0Comments5. SYN Cookie: LTM Vs AFM
Introduction At this point I have covered SYN Cookie from LTM perspective, in this article I will explain the important differences between LTM and AFM SYN Cookie. LTM Vs AFM AFM is a security platform, so it adds some extra options when configuring SYN Cookie that you cannot find in LTM. These additional abilities are added to AFM because in AFM SYN Cookie is just another DoS vector, so this allows you to configure characteristics reserved for DoS vectors. In fact AFM SYN Cookie is configured as a DoS vector called TCP Half Open. It is very important to note that when LTM and AFM SYN Cookie are configured at the same time in your BIG-IP then AFM SYN Cookie will work since it has precedence: Fig12. AFM Vs LTM However, before starting describing SYN Cookie in AFM be aware that comparing to other DoS vectors TCP half open has a couple of limitations: -Blacklisting is not allowed. -TCP SYN flood attackers handled by SYN Cookie are not included in IPI You have learnedmain details about LTM SYN Cookie, now the best way to see the advantages of AFM is by listing them: One of the most important advantages of AFM SYN Cookie is that it allows more granularity than LTM SYN Cookie since we can define a different threshold for each virtual server. This means that you can configure different SYN Cookie thresholds for each virtual server, for each VLAN and globally for the device. Remember that LTM SYN cookie only allows us to define a sole threshold for all virtual servers. Fig13. AFM granularity AFM handles SYN Cookie as another DoS vector, so administration is easier since you do not need to create parallel configurations for different types of countermeasures. AFM whitelist is unified, there is only one whitelist for DoS and SYN cookie. This ease the security administration. TCP Half Open DoS vector also allows you to use Auto-Threshold, so you can rely on AFM in order to define the best threshold attending to your environment, after a learning period of course. AFM provides extra information about SYN Cookie in form of events and stats. Logging As I pointed out one of the important advantages of AFM is that it provides extra information about SYN Cookie. In order to get this information first you need to create a Security Log Profile and enable specific options in it: Network firewall will provide specific SYN Cookie events, while DoS protection will provide DoS events for TCP Half open, as it does for any other DoS vector. But you will also get TCP half open stats automatically because AFM enables basic AVR under the hoods. So you will be able to see information in Dashboard. Below tables shows the information you can check in GUI after enabling above options in security log profile: Configuration Let’s see how to configure SYN Cookie in different contexts for AFM as we did for LTM in previous article: Device/Global context Details about the context provided for LTM applies here as well. Also note that SYN cookie MUST be enabled in the protocol profile applied to virtual server in order to AFM Global SYN cookie take the virtual server into account when counting embryonic connections. Configuration options for typical LTM protocol profiles: Remember that DSR whitelist is not the same than SYN Cookie whitelist. DSR whitelist is a special whitelist that automatically add legitimate clients to a temporal whitelist in order to allow SYN Cookie in DSR environments (see article 2 of this series). On the other hand, SYN Cookie whitelist is in fact a DoS whitelist where client’s IPs are added until you decide to remove them. During the time an IP is in this address list the client will not be matched against DoS vectors. Above table refers specifically to DSR whitelist. Configuration options for TCP half open vector through TMSH are common to other vectors, but if you try to configure a non supported option then you will get an error. For example: 01071b13:3: DOS attack data (tcp-half-open): bad actor and auto blacklisting features do not apply to tcp-half-open vector. Since we are working with DoS there are other options you can configure for this SYN Cookie vector, like auto-threshold, floors, multiplier, etc. but in this article I just want to describe the related SYN Cookie usage, so you can compare better with LTM version. VLAN context Details about this context described for LTM applies here. (*) Be aware of ID993269. At this point it is important to give more details about the difference between mitigating a TCP SYN flood attack by using TCP SYN flood DoS vector or by using SYN Cookie. TCP SYN flood vector will just count the number of TCP SYN packets that reach the specific context where it is configured, if this counter value exceeds the threshold that you configured for the SYN Flood vector, then SYN packets will be dropped until counter value is under the threshold again. As you can guess this could disrupt legitimate traffic during the attack. Sometimes this is better than lose totally an application but you need to configure this with care and always as a last resort. On the other hand, SYN Cookie will not drop any connection unless connection comes from an attacker. Virtual server context Details about the context described for LTM applies here. Example for a virtual server using a TCP profile: Conclusion Now you know the difference between SYN Cookie LTM and AFM. Clearly AFM has more versatility, so it could be interesting thinking in moving SYN Cookie configuration from LTM to AFM if we have this module licensed. Configuration will be centralized and might be clearer if you are get used to work with security DoS vectors. In next article I will write about the differences between hardware and software when working with SYN Cookie.1.4KViews3likes0Comments