L7 DoS Protection with NGINX App Protect DoS

Intro

NGINX security modules ecosystem becomes more and more solid. Current App Protect WAF offering is now extended by App Protect DoS protection module. App Protect DoS inherits and extends the state-of-the-art behavioral L7 DoS protection that was initially implemented on BIG-IP and now protects thousands of workloads around the world.

In this article, I’ll give a brief explanation of underlying ML-based DDoS prevention technology and demonstrate few examples of how precisely it stops various L7 DoS attacks.

Technology

It is important to emphasize the difference between the general volumetric-based protection approach that most of the market uses and ML-based technology that powers App Protect DoS.

Volumetric-based DDoS protection is an old and well-known mechanism to prevent DDoS attacks. As the name says, such a mechanism counts the number of requests sharing the same source or destination, then simply drops or applies rate-limiting after some threshold crossed. For instance, requests sourcing the same IP are dropped after 100 RPS, requests going to the same URL after 200 RPS, and rate-limiting kicks in after 500 RPS for the entire site.

Obviously, the major drawback of such an approach is that the selection criterion is too rough. It can cause erroneous drops of valid user requests and overall service degradation. The phenomenon when a security measure blocks good requests is called a “false positive”.

App Protect DoS implements much more intelligent techniques to detect and fight off DDoS attacks. At a high level, it monitors all ongoing traffic and builds a statistical model in other words a baseline in a process called “learning”. The learning process almost never stops, therefore a baseline automatically adjusts to the current web application layout, a pattern of use, and traffic intensity. This is important because it drastically reduces maintenance cost and reaction speed for the solution. There is no more need to manually customize protection configuration for every application or traffic change.

Infinite learning produces a legitimate question. Why can’t the system learn attack traffic as a baseline and how does it detect an attack then? To answer this question let us define what a DDoS attack is. A DDoS attack is a traffic stream that intends to deny or degrade access to a service. Note, the definition above doesn’t focus on the amount of traffic. ‘Low and slow’ DDoS attacks can hurt a service as severely as volumetric do. Traffic is only considered malicious when a service level degrades. So, this means that attack traffic can become a baseline, but it is not a big deal since protected service doesn’t suffer.

Now only the “service degradation” term separates us from the answer. How does the App Protect DoS measure a service degradation? As humans, we usually measure the quality of a web service in delays. The longer it takes to get a response the more we swear. App Protect DoS mimics human behavior by measuring latency for every single transaction and calculates the level of stress for a service. If overall stress crosses a threshold App Protect DoS declares an attack. Think of it; a service degradation triggers an attack signal, not a traffic volume. Volume is harmless if an application server manages to respond quickly.

Nice! The attack is detected for a solid reason. What happens next? First of all, the learning process stops and rolls back to a moment when the stress level was low. The statistical model of the traffic that was collected during peacetime becomes a baseline for anomaly detection. App Protect DoS keeps monitoring the traffic during an attack and uses machine learning to identify the exact request pattern that causes a service degradation. Opposed to old-school volumetric techniques it doesn’t use just a single parameter like source IP or URL, but actually builds as accurate as possible signature of entire request that causes harm. The overall number of parameters that App Protect DoS extracts from every request is in the dozens. A signature usually contains about a dozen including source IP, method, path, headers, payload content structure, and others. Now you can see that App Protect DoS accuracy level is insane comparing to volumetric vectors.

The last part is mitigation. App Protect DoS has a whole inventory of mitigation tools including accurate signatures, bad actor detection, rate-limiting or even slowing down traffic across the board, which it uses to return service. The strategy of using those is convoluted but the main objective is to be as accurate as possible and make no harm to valid users. In most cases, App Protect DoS only mitigates requests that match specific signatures and only when the stress threshold for a service is crossed. Therefore, the probability of false positives is vanishingly low.

The additional beauty of this technology is that it almost doesn’t require any configuration. Once enabled on a virtual server it does all the job "automagically" and reports back to your security operation center. The following lines present a couple of usage examples.

Demo

Demo topology is straightforward. On one end I have a couple of VMs. One of them continuously generates steady traffic flow simulating legitimate users. The second one is supposed to generate various L7 DoS attacks pretending to be an attacker. On the other end, one VM hosts a demo application and another one hosts NGINX with App Protect DoS as a protection tool. VM on a side runs ELK cluster to visualize App Protect DoS activity.

Workflow of a demo aims to showcase a basic deployment example and overall App Protect DoS protection technology. First, I’ll configure NGINX to forward traffic to a demo application and App Protect DoS to apply for DDoS protection. Then a VM that simulates good users will send continuous traffic flow to App Protect DoS to let it learn a baseline. Once a baseline is established attacker VM will hit a demo app with various DoS attacks. While all this battle is going on our objective is to learn how App Protect DoS behaves, and that good user's experience remains unaffected.

Similar to App Protect WAF App Protect DoS is implemented as a separate module for NGINX. It installs to a system as an apt/yum package. Then hooks into NGINX configuration via standard “load_module” directive.

load_module modules/ngx_http_app_protect_dos_module.so;

Once loaded protection enables under either HTTP, server, or location sections. Depending on what would you like to protect.

app_protect_dos_enable [on|off]

By default, App Protect DoS takes a protection configuration from a local policy file “/etc/nginx/BADOSDefaultPolicy.json”

{ "mitigation_mode" : "standard", "use_automation_tools_detection": "on", "signatures" : "on", "bad_actors" : "on" }

As I mentioned before App Protect DoS doesn’t require complex config and only takes four parameters. Moreover, default policy covers most of the use cases therefore, a user only needs to enable App Protect DoS on a protected object.

The next step is to simulate good users’ traffic to let App Protect DoS learn a good traffic pattern. I use a custom bash script that generates about 6-8 requests per second like an average surfing activity.

While inspecting traffic and building a statistical model of good traffic App Protect DoS sends logs and metrics to Elasticsearch so we can monitor all its activity.

The dashboard above represents traffic before/after App Protect DoS, degree of application stress, and mitigations in place. Note that the rate of client-side transactions matches the rate of server-side transactions. Meaning that all requests are passing through App Protect DoS and there are not any mitigations applied. Stress value remains steady since the backend easily handles the current rate and latency does not increase.

Now I am launching an HTTP flood attack. It generates several thousands of requests per second that can easily overwhelm an unprotected web server. My server has App Protect DoS in front applying all its’ intelligence to fight off the DoS attack. After a few minutes of running the attack traffic, the dashboard shows the following situation.

The attack tool generated roughly 1000 RPS. Two charts on the left-hand side show that all transactions went through App Protect DoS and were reaching a demo app for a couple of minutes causing service degradation. Right after service stress has reached a threshold an attack was declared (vertical red line on all charts).

As soon as the attack has been declared App Protect DoS starts to apply mitigations to resume the service back to life. As I mentioned before App Protect DoS tries its best not to harm legitimate traffic. Therefore, it iterates from less invasive mitigations to more invasive. During the first several seconds when App Protect DoS just detected an attack and specific anomaly signature is not calculated yet. App Protect DoS applies an HTTP redirect to all requests across the board. Such measure only adds a tiny bit of latency for a web browser but allows it to quickly filter out all not-so-intelligent attack tools that can’t follow redirects. In less than a minute specific anomaly signature gets generated. Note how detailed it is.

The signature contains 11 attributes that cover all aspects: method, path, headers, and a payload. Such a level of granularity and reaction time is not feasible neither for volumetric vectors nor a SOC operator armed with a regex engine.

Once a signature is generated App Protect DoS reduces the scope of mitigation to only requests that match the signature. It eliminates a chance to affect good traffic at all. Matching traffic receives a redirect and then a challenge in case if an attacker is smart enough to follow redirects.

After few minutes of observation App Protect DoS identifies bad actors since most of the requests come from the same IP addresses (right-bottom chart). Then switches mitigation to bad actor challenge. Despite this measure hits all the same traffic it allows App Protect DoS to protect itself. It takes much fewer CPU cycles to identify a target by IP address than match requests against the signature with 11 attributes. From now on App Protect DoS continues with the most efficient protection until attack traffic stops and server stress goes away.

The technology overview and the demo above expose only a tiny bit of App Protect DoS protection logic. A whole lot of it engages for more complicated attacks. However, the results look impressive. None of the volumetric protection mechanisms or even a human SOC operator can provide such accurate mitigation within such a short reaction time. It is only possible when a machine fights a machine.

Updated Dec 15, 2022

Version 2.0

devops

NGINX App Protect DoS