on 12-Feb-2020 14:26
In my article “Concept of Device DOS and DOS profile”, I recommended using the “Fully Automatic” or “Multiplier” based configuration option for some DOS vectors. In this article I would like to explain how these threshold modes work and what is happening behind the scenes.
When you configure a DOS vector you have the option to choose between different threshold modes:
“Fully Automatic”, “Auto Detection / Multiplier Based”, “Manual Detection / Auto Mitigation” and “Fully Manual”.
Figure 1: Threshold Modes
The two options I normally use on many vectors are “Fully Automatic” and “Auto Detection / Multiplier Based”. But what do these two options do for me?
Setting thresholds manually is not an easy task for some vectors. I mean, who really knows how many PUSH/ACK packets/sec, for example, usually hit the device or a specific service? And when I have an idea about a value, should this be a static value? Or should I rather take the maximum value I have seen so far? And how many packets per second should I put on top to make sure the system is not kicking in too early? When should I adjust it? Is my traffic increasing?
In reality, the rate changes constantly, and most likely I will have more PUSH/ACK packets/sec during the day than during the night.
What happens when there is a campaign or an event like “Black Friday” and way more users than usual are visiting the webpage? During these high-traffic events, my suggested thresholds might no longer be correct, which could lead to “good” traffic getting dropped.
All this should be taken into consideration when setting a threshold, and it ends up being very difficult to do manually. It's better to let the machine do it for you, and this is what “Fully Automatic” is about.
Figure 2: Expected EPS
As soon as you use this option, the system leverages what it has learned since traffic started passing through the BIG-IP, or since you last forced a relearn, which resets everything learned so far and starts from scratch.
The system continuously calculates the expected rates for all the vectors based on the historic rates of traffic. It uses up to one year of data, weighted differently, to determine which packet rate should be expected at that time and day for that specific vector in the specific context (Device, Virtual Server/Protected Object).
The system then calculates a padding on top of this expected rate. The resulting rate is called the Detection Rate, and the padding depends on the “threshold sensitivity” you have configured:
Low Sensitivity means 66% padding
Medium Sensitivity means 40% padding
High Sensitivity means 0% padding
Figure 3: Detection EPS
As soon as the current rate is above the detection value, the BIG-IP will show the message “Attack detected”, which actually means anomaly detected, because it sees more packets of that specific vector than the expected rate plus the padding (detection_rate). But DoS mitigation will not start at that point!
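To make the arithmetic concrete, here is a minimal sketch of how the detection rate follows from the expected rate and the padding percentages listed above. The function and variable names are my own illustration and are not taken from the BIG-IP; the real expected rate comes out of the weighted learning described earlier, which this sketch simply takes as an input.

```python
# Padding per threshold sensitivity, in percent, as listed in the article.
PADDING_PERCENT = {"low": 66, "medium": 40, "high": 0}


def detection_rate(expected_eps, sensitivity):
    """Return the expected rate plus the sensitivity-dependent padding."""
    return expected_eps * (100 + PADDING_PERCENT[sensitivity]) / 100


def anomaly_detected(current_eps, expected_eps, sensitivity):
    """True when the current rate exceeds the detection rate.

    This only models the "Attack detected" message, not mitigation.
    """
    return current_eps > detection_rate(expected_eps, sensitivity)


# Example: an expected rate of 1,000 packets/sec at each sensitivity.
for level in ("low", "medium", "high"):
    print(level, detection_rate(1000, level))
```

With an expected rate of 1,000 packets/sec, a current rate of 1,500 packets/sec would count as an anomaly at Medium sensitivity (detection rate 1,400) but not at Low sensitivity (detection rate 1,660).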
Figure 4: Current EPS
Keep in mind, when you run the BIG-IP in stateful mode it will drop 'out of state' packets anyway. This has nothing to do with DoS functionalities.
But what happens when there is a serious flood and the BIG-IP CPU load gets high because of the massive number of packets it has to deal with? This is when the second part of the “Fully Automatic” approach comes into play.
Again, depending on your threshold sensitivity, the DOS mitigation starts as soon as a certain level of stress is detected on the CPU of the BIG-IP.
Figure 5: Mitigation Threshold
Low Sensitivity means 78.3% TMM load
Medium Sensitivity means 68.3% TMM load
High Sensitivity means 51.6% TMM load
Note that the mitigation is per TMM, and therefore the stress and rate per TMM are what matter.
When the traffic rate for that vector is above the detection rate and the CPU of the BIG-IP (Device DOS) is “too” busy, the mitigation kicks in and rate-limits that specific vector. When a DOS vector is hardware supported, FPGAs drop the packets at the switch level of the BIG-IP. If that DOS vector is not hardware supported, the packet is dropped at a very early stage of its life cycle inside the BIG-IP. The rate at which packets are dropped is dynamic (mitigation rate), depending on the incoming number of packets for that vector and the CPU (TMM) stress of the BIG-IP. This allows the CPU stress to go down, as it has to deal with fewer packets.
Once the incoming rate is again below the detection rate, the system declares the attack as ended. Note: When an attack is detected, the packet rate during that time will not go into the calculation of expected rates for the future. This ensures that the BIG-IP will not learn from attack traffic, which would skew the automatic thresholds. All traffic rates below the detection rate (or below the floor value, when configured) modify the expected rate for the future, and the BIG-IP will adjust the detection rate automatically.
For most of the vectors you can configure a floor and ceiling value.
Floor means that as long as the traffic value is below that threshold, the mitigation for that vector will never kick in. Even when the CPU is at 100%.
Ceiling means that mitigation always kicks in at that rate, even when the CPU is idle.
With these two values set, the dynamic and automatic process operates between floor and ceiling.
Mitigation is only executed when the rate is above both the Floor EPS and the Detection EPS AND stress is measured on the particular context.
Figure 6: Floor and Ceiling EPS
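The interplay of floor, ceiling, detection rate and stress can be sketched as a small decision function. This is only my simplified model of the rules described above for the Device level, not the actual BIG-IP implementation: the names are hypothetical, the handling of a rate exactly equal to a threshold is an assumption, and the dynamic calculation of the mitigation rate itself is left out.

```python
# TMM load (percent) at which mitigation may start, per threshold
# sensitivity, as listed in the article.
STRESS_THRESHOLD_PERCENT = {"low": 78.3, "medium": 68.3, "high": 51.6}


def should_mitigate(current_eps, detection_eps, tmm_load_percent,
                    sensitivity, floor_eps=None, ceiling_eps=None):
    """Decide whether mitigation would start for one vector on one TMM."""
    # Below the floor, mitigation never starts, even at 100% CPU.
    if floor_eps is not None and current_eps < floor_eps:
        return False
    # At or above the ceiling, mitigation always starts, even when idle.
    if ceiling_eps is not None and current_eps >= ceiling_eps:
        return True
    # In between, both conditions must hold: anomaly AND stress.
    stressed = tmm_load_percent >= STRESS_THRESHOLD_PERCENT[sensitivity]
    return current_eps > detection_eps and stressed


# Anomaly without stress: detected, but not mitigated.
print(should_mitigate(120_000, 100_000, 40.0, "medium"))
# Same anomaly while the TMM is under stress: mitigated.
print(should_mitigate(120_000, 100_000, 70.0, "medium"))
```

On a DOS profile the same logic would apply, with the measured stress of the protected service taking the place of the TMM load.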
What is the difference when you use “Fully Automatic” on Device level compared to VS/PO (DOS profile) level?
Everything is the same, except that on VS or Protected Object (PO) level the relevant stress is NOT the BIG-IP device stress; it is the stress of the service you are protecting (web, DNS or IMAP server, network, ...).
BIG-IP can measure the stress of the service by measuring TCP attributes like retransmission, window size, congestion, etc. This gives a good indication of how busy a service is. This works very well for request/response protocols like TCP, HTTP, DNS.
I recommend using this when the Protected Object is a single service and not a “wildcard” Protected Object covering, for example, a network or service range.
When the Protected Object is a “wildcard” service and/or a UDP service (except DNS), I recommend using “Auto Detection / Multiplier Based”.
It works in the same way as “Fully Automatic” from the learning perspective, but the mitigation condition is not stress; it is a multiple of the detection rate.
For example, the detection rate for a specific vector is calculated to be 100k packets/sec. By default, the multiplier is “500”, which means 5x. Therefore, the mitigation rate is calculated to be 500k packets/sec. If that particular vector has more than 500k packets/sec, the packets above that rate would be dropped. The multiplier can also be configured individually, as in the screenshot, where it is set to 8x (800).
Figure 7: Auto Detection / Multiplier Based Mitigation
The benefit of this mode is that the BIG-IP will automatically learn the baseline for that vector and will only start to mitigate on a large spike. The mitigation rate is always a multiple of the detection rate, which is 5x by default but is configurable.
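The example with the 100k detection rate can be written down in a few lines. This is just the arithmetic from the text; the function name is my own, and the multiplier is passed the way it is configured, as a percentage where 500 means 5x.

```python
def multiplier_mitigation_rate(detection_eps, multiplier_percent=500):
    """Mitigation rate as a multiple of the detection rate.

    multiplier_percent uses the configured notation: 500 means 5x.
    """
    return detection_eps * multiplier_percent / 100


# Default multiplier (500 = 5x) on a detection rate of 100k packets/sec.
print(multiplier_mitigation_rate(100_000))
# Multiplier of 800 (8x), as in the screenshot.
print(multiplier_mitigation_rate(100_000, 800))
```

Because the detection rate keeps following the learned baseline, the mitigation rate moves with it, without any stress measurement being involved.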
When should I use “Fully Manual”?
When you want to rate-limit a specific vector to a certain number of packets/sec, then “Fully Manual” is the right choice. Very good examples of that type of vector are the “Bad Header” vector types. These types of packets will never get forwarded by the BIG-IP, so dropping them by a DoS vector saves CPU, which is beneficial under DoS conditions.
The screenshot below shows a vector configured as “Fully Manual”. Next I’ll describe what each of the options means.
Figure 8: Fully Manual
Detection Threshold EPS configures the packet rate per second (pps) at which you will get a log message (NO mitigation!).
Detection Threshold % compares the current pps rate for that vector with the 1-minute average rate multiplied by the configured percentage (in this example 500%, which means 5x). If the current rate is higher, then you will get a log message.
Mitigation Threshold EPS rate-limits the vector to the configured value (mitigation).
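Put together, the three “Fully Manual” options could be modeled like this. Treat it as a rough sketch under my own assumptions: the names are hypothetical, the two detection options are combined with a simple "or", and the per-second accounting on a real BIG-IP may behave differently.

```python
def evaluate_fully_manual(current_eps, one_minute_avg_eps,
                          detection_eps, detection_percent, mitigation_eps):
    """Return (log_message, passed_eps, dropped_eps) for one second."""
    # Detection Threshold EPS: absolute rate that triggers a log message.
    over_absolute = current_eps > detection_eps
    # Detection Threshold %: current rate vs. the 1-minute average
    # multiplied by the configured percentage (500 means 5x).
    over_relative = current_eps > one_minute_avg_eps * detection_percent / 100
    log_message = over_absolute or over_relative
    # Mitigation Threshold EPS: rate-limit the vector to this value.
    passed_eps = min(current_eps, mitigation_eps)
    dropped_eps = current_eps - passed_eps
    return log_message, passed_eps, dropped_eps


# 600 pps against a 1-minute average of 100 pps with 500% configured:
# the relative check fires, and everything above 100 pps is dropped.
print(evaluate_fully_manual(600, 100, 10_000, 500, 100))
```

A Mitigation Threshold EPS of 0 in this model drops every packet of the vector, while the log message still depends only on the detection settings.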
I recommend setting the threshold (Mitigation Threshold EPS) to something relatively low like ‘10’ or ‘100’ on ‘Bad Header’ type vectors.
You can also set it to ‘0’, which means all packets hitting this vector will get dropped by the DoS function, which is usually done in hardware (FPGA).
With the ‘Detection Threshold EPS’ you set the rate at which you want to get a log message for that vector. If you do it this way, then you get a warning message like this one to inform you about the logging behavior:
Warning generated: DOS attack data (bad-tcp-flags-all-set): Since drop limit is less than detection limit, packets dropped below the detection limit rate will not be logged.
Another use-case for “Fully Manual” is when you know the maximum number of these packets the service can handle. But here my recommendation is to still use “Fully Automatic” and set the maximum rate with the Ceiling threshold, because then the protected service will benefit from both threshold options.
Important: Please keep in mind that when you set manual thresholds for Device DoS, the thresholds are there to protect each TMM. Therefore the value you set is per TMM!
An exception to this is the Sweep and Flood vector, where the threshold is per BIG-IP/service, as it is on DoS profiles, and not per TMM.
When using manual thresholds for a DOS profile of a Protected Object, the threshold configuration is per service (all packets targeted to the protected service), NOT per TMM like on the Device level. Here the goal is to set how many packets are allowed to pass the BIG-IP and reach the service. The distribution of these thresholds to the TMMs is done in a dynamic way: every TMM gets a percentage of the configured threshold, based on the EPS (Events Per Second, which in this context means Packets Per Second) the system has seen for the specific vector on this TMM in the previous second. This mechanism protects against hash-type attacks.
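The dynamic distribution to the TMMs can be pictured as a proportional split. The following is a minimal sketch of that idea only; the function name is hypothetical, and the even split when no traffic was seen in the previous second is my assumption, not documented behavior.

```python
def distribute_threshold(threshold_eps, last_second_eps_per_tmm):
    """Split a per-service threshold across TMMs.

    Each TMM receives a share proportional to the EPS it saw for the
    vector in the previous second.
    """
    tmm_count = len(last_second_eps_per_tmm)
    if tmm_count == 0:
        return []
    total = sum(last_second_eps_per_tmm)
    if total == 0:
        # Assumption: with no traffic to weigh by, split evenly.
        return [threshold_eps / tmm_count] * tmm_count
    return [threshold_eps * eps / total for eps in last_second_eps_per_tmm]


# Traffic that mostly lands on the first of four TMMs.
print(distribute_threshold(1000, [900, 50, 50, 0]))
```

In this model the shares always add up to the configured threshold, so traffic that is crafted to land on a single TMM is still limited to the per-service value as a whole.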
Ok, I hope this article gives you a better understanding of how ‘Fully Manual’, ‘Fully Automatic’ and ‘Auto Detection / Multiplier Based Mitigation’ work. They are important concepts to understand, especially when they work in conjunction with stress measurement. This means the BIG-IP will only kick in with the DoS mitigation when the protected object (BIG-IP or the service behind the BIG-IP) is under stress. Why risk false positives when it is not necessary?
In my next article I will demonstrate how Device DoS and the DoS profiles work together and how the stateful concept cooperates with the DoS mitigation. I will show you some DoS commands to test it and also commands to get details from the BIG-IP via CLI.
Thank you, sVen Mueller
Thank you for this new article, which clarified the threshold concept. It also makes me greedy, because your article shows how valuable a dashboard providing such graphs would be for our understanding of the device. Anyone from the dev team here? 🙂
The content of the next article is also promising, keep up the good work!
Thanks a lot Sylvain for your feedback. 🙂
It is really motivating to hear that the articles are helpful. 😊
Not sure why it took a year to be directed here but this should be required reading for anyone doing DDoS work.
Great articles!!! Hope you will continue this series. I wonder if I am missing something or there is a mistake in the diagrams. In red in the top left corner there is the sentence "Mitigation starts, because the expected EPS value is exceeded..." Should it not say detection EPS?
I am also a bit confused by the Mitigation (CPU) Threshold line: should this line not be flat? My understanding is that for a given Threshold Sensitivity the CPU Threshold is constant, so the CPU Threshold line should be flat? Or maybe this is not the CPU Threshold line but the Mitigation rate line?
One more question about using different State settings. Could you share some real life examples when to use:
You never mentioned Manual Detection/Auto Mitigation - is that because this mode is not really useful?
Very good catch! Yes, the sentence in red within the diagram is not correct. It should say "Mitigation starts, because DETECTED EPS value is exceeded..."
The mitigation rate on device level is calculated based on CPU stress. My diagram is meant to show that the CPU is relaxed in the beginning, and therefore the calculated mitigation rate is very high and flat until the CPU gets under pressure.
I hope that helps!?
"Learn Only" is great, when you implement the box. But you should keep the caveats also in mind. First of all, in "Learn Only" mode, the box never mitigates, which can be of course dangerous. It also takes everything as valid traffic, which again means, when there is an attack, it negatively impacts the learning and threshold calculation.
I usually go immediately with "Mitigate" and set the floor to a reasonable value. Soon, I will publish an article about my ELK based DDoS dashboards (https://github.com/elk-f5ddos/DDOS-Dashboard), which also provide packet rate graphs for all vectors. I'm sure this makes fine-tuning way easier.
Please keep in mind that the box also learns with "Mitigate" and "Detect only" mode, unless it detects an anomaly.
"Detect Only" is really only for reporting from my point of view.
I never used "Manual Detection/Auto Mitigation" to be honest.
I hope that helps!?
Thanks a lot for the clarification, it helps a lot. I just wonder whether the Detect Only/Mitigate settings influence Learn Only in any way. I mean, on a fresh AFM install the vectors have some preconfigured settings, so when you change the state from Disabled, some vectors have their threshold preset to Fully Automatic (FA) and some to Fully Manual (FM); for FA, some floor values are preset as well.
The question is whether learning works differently for vectors with an FA preset than for those with an FM preset, or whether it's irrelevant. Learn Only seems to make no sense for vectors with an FM threshold (especially when changed manually from FA).
If the Threshold Mode influences the Learn Only state, what is the advised setting: change every vector that supports FA to FA (if the preset is FM)?
I also wonder what the best way is to verify what was learned (which Detection Thresholds were learned):
Am I right?
Last one: in FA, the Detection EPS is not constant but changes over time. But when looking at a vector (for example via File Type: Device DoS), we see just a single value in the Aggregate column (Detection Threshold EPS section). What does this single value represent? The max Detection EPS value over some period of time, the min value, something else?
Thanks in advance,
Great article that explains the difference between Device DOS protection and using a DOS profile on the VIP.
First of all, thanks again for your reply in the other post.
I have a question regarding the "Detection Threshold %" of the fully manual mode.
I've been trying to get an alert with this threshold on a protected object, but without much success. I configured the detection EPS and mitigation EPS to a very high value, above 40k, to make sure that I didn't get an alert and no mitigation occurred with these two thresholds. I generated a baseline of 160 EPS for about 4 hours and kept it running while I generated an attack of around 20k EPS, leaving the "Detection Threshold %" at 200, but I never got the alert.
The vector I tested was SYN Flood with only the thresholds configured, no bad actor enabled.
Am I missing something here?
Thanks in advance.