The last days of False Positives! Really? - Part 1
You might have heard the buzz and the claims that artificial intelligence (AI) and machine learning (ML) will eradicate the issue of false positives (FPs) in application security (AppSec). Specifically, the claim is that advanced deep-learning algorithms, such as recurrent neural networks (RNNs), are being developed with the goal of drastically increasing the accuracy of threat detection and eliminating FPs. Are we really witnessing the last days of FPs?
In AppSec, FPs occur when systems identify legitimate requests as attacks or violations. AppSec is an expansive topic, but for this article we’ll focus on web application and anti-fraud protection.
FPs not only reduce security accuracy and cause threat-alert fatigue, but they can also cause severe financial losses, as the following quotes illustrate:
- "A 2015 report from Javelin Strategy and Research estimates that only one in five fraud predictions is correct and that the errors can cost a bank $118 billion in lost revenue, as declined customers then refrain from using that credit card."
- "Businesses today face a threat landscape that produces an average of 450,000 new potential threats every day. According to the Ponemon ‘Cost of Malware Containment’ report, security teams can expect to log almost 17,000 malware alerts in a typical week (~50% are FPs) ==> +100 alerts per hour ==> +21,000 hours each year ==> 2,625 eight-hour shifts chasing down FPs! And we’re only talking about the time spent on malware—there are plenty of other alerts to be concerned with as well."
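The arithmetic in the second quote can be checked quickly. The short Python sketch below uses only the figures quoted above; the variable names are ours, and the 21,000 hours figure is taken from the quote as given rather than derived.

```python
# Figures taken from the quote above (Ponemon "Cost of Malware Containment").
ALERTS_PER_WEEK = 17_000
HOURS_PER_YEAR_ON_FPS = 21_000
SHIFT_HOURS = 8

HOURS_PER_WEEK = 7 * 24  # 168 hours in a week

alerts_per_hour = ALERTS_PER_WEEK / HOURS_PER_WEEK
shifts_per_year = HOURS_PER_YEAR_ON_FPS / SHIFT_HOURS

print(f"{alerts_per_hour:.1f} alerts per hour")             # 101.2 alerts per hour
print(f"{shifts_per_year:.0f} eight-hour shifts per year")  # 2625 eight-hour shifts per year
```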
How do security systems detect attacks?
Attack detection techniques and FP reduction techniques go hand in hand, since FPs result from systems identifying legitimate traffic as attacks. The following methods help detect attacks.
Rule-based detection
Rule-based detection (sometimes referred to as signature-based detection) inspects application traffic against a set of predefined rules. The rules evaluate the specific values of an HTTP request or response element. Elements include Layer 7 properties, such as the URL, HTTP headers, parameters, and payload, as well as Layer 2-6 properties, such as ports and IP addresses. When elements of the traffic match the patterns defined in a rule, the system considers the traffic malicious and triggers a violation. You can format rules as signatures that identify attacks, or classes of attacks, on a web application and its components, or you can implement rules directly as security controls on the WAF system. Inspection in WAF systems is typically based on packets and sessions.
Because detection relies on the patterns defined in rules, you must constantly update the rules to keep up with the latest attacks. Some advanced web application firewalls (WAFs), like the BIG-IP ASM system, enable you to define custom (user-defined) signatures that offer additional flexibility for specific applications and environments. Custom signatures also let you patch issues quickly when a new vulnerability is discovered.
For example, the following simplistic F5 BIG-IP ASM custom signature triggers a violation when a header of an incoming request contains the keyword backdoor (case-insensitive).
<?xml version="1.0" encoding="utf-8"?>
<signatures export_version="15.0.0">
  <sig id="300000000">
    <rev num="2">
      <sig_name>custom-sig-01</sig_name>
      <rule>headercontent:"backdoor"; nocase;</rule>
      <last_update>2019-08-21 15:18:31</last_update>
      <apply_to>Request</apply_to>
      <risk>1</risk>
      <accuracy>1</accuracy>
      <doc>Test of a Custom Signature</doc>
      <attack_type>Cross Site Scripting (XSS)</attack_type>
      <systems>
        <system_name>Microsoft Windows</system_name>
        <system_name>Unix/Linux</system_name>
      </systems>
    </rev>
  </sig>
</signatures>
Note: The rule pattern in the signature is usually obfuscated to avoid reverse engineering.
Rule-based detection is efficient at detecting known attacks; however, it does not easily detect new and zero-day attacks.
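To make both points concrete, here is a minimal Python sketch of rule-based matching. It is not how the BIG-IP ASM system is implemented; the Rule class, the matched_rules function, and the sample headers are hypothetical, and the single rule only approximates the custom signature above with a case-insensitive regular expression applied to each header line.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a name plus a pattern applied to header lines."""
    name: str
    pattern: re.Pattern


# Approximates the custom signature: header content "backdoor", ignoring case.
RULES = [Rule("custom-sig-01", re.compile(r"backdoor", re.IGNORECASE))]


def matched_rules(headers: dict[str, str]) -> list[str]:
    """Return the names of all rules whose pattern appears in any header line."""
    return [
        rule.name
        for rule in RULES
        if any(rule.pattern.search(f"{name}: {value}") for name, value in headers.items())
    ]


# A known attack string, and a trivially altered variant of it.
known = {"User-Agent": "curl/8.0", "X-Cmd": "open BackDoor now"}
variant = {"User-Agent": "curl/8.0", "X-Cmd": "open b4ckd00r now"}

print(matched_rules(known))    # ['custom-sig-01']
print(matched_rules(variant))  # []
```

The second request slips through because no rule describes it yet, which is exactly the limitation with new and zero-day attacks.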
Anomaly-based detection
Anomaly-based detection works by monitoring traffic in real time to distinguish normal traffic from anomalous traffic. Instead of inspecting request data against signature rules, anomaly-based detection looks for behavioral patterns in traffic that deviate from the norm. Unusual traffic that is linked with malicious activities is identified as an attack. Traffic is also compared with earlier traffic: if the unusual traffic is very similar to previous traffic and does not contain malicious patterns, it is likely not an attack.
Anomaly-based detection considers both the point in time and the context. It observes the statistical variation when behavior changes, and it considers the context of the application and the environment (including internal and external factors). It measures factors like frequency, variation, and volume.
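As a rough illustration of measuring volume against a learned norm, the Python sketch below flags a request rate that deviates strongly from a baseline. It assumes a simple z-score test; the is_anomalous function, the threshold of 3 standard deviations, and the sample rates are all hypothetical, and real products use considerably richer models.

```python
from statistics import mean, stdev


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag `observed` when it lies more than `threshold` sample standard
    deviations away from the mean of the baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical requests per minute from one client during normal operation.
baseline_rpm = [42, 38, 45, 40, 44, 39, 41, 43]

print(is_anomalous(baseline_rpm, 47))   # False
print(is_anomalous(baseline_rpm, 400))  # True
```

A deviation on its own only says that traffic is unusual. As described above, whether it is an attack still depends on context, such as similarity to earlier traffic and the presence of malicious patterns.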
Machine Learning
ML extends the capability of anomaly-based detection and opens new horizons, moving from programmed behavioral analysis to an intelligent, self-learning detection mechanism. With ML, the system (the machine) is exposed to data sets and is taught (the learning part of ML) how to detect threats by using advanced algorithms such as deep-learning recurrent neural networks (RNNs). To ensure robust learning, systems need powerful algorithms and good, reliable, and up-to-date data sets. Quality data sets provide examples of both malicious traffic and legitimate traffic from which the system can learn. Data feeds usually originate from various sources, including customer traffic, threat campaigns, and experimental data.
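The idea of learning from labeled examples can be sketched in a few lines of Python. This sketch deliberately swaps the RNNs mentioned above for a much simpler technique, a naive Bayes classifier over payload tokens with Laplace smoothing, so that it runs with the standard library alone. The training samples, labels, and function names are hypothetical, and six samples are far too few for real use.

```python
import math
import re
from collections import Counter

# Hypothetical labeled data set: (request payload, label).
TRAINING = [
    ("id=1 OR 1=1 --", "malicious"),
    ("name=<script>alert(1)</script>", "malicious"),
    ("q=UNION SELECT password FROM users", "malicious"),
    ("id=42", "legitimate"),
    ("name=alice&page=2", "legitimate"),
    ("q=red shoes size 9", "legitimate"),
]


def tokenize(payload: str) -> list[str]:
    """Lowercase the payload and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", payload.lower())


def train(samples):
    """Count tokens per label and samples per label."""
    token_counts = {}
    label_counts = Counter()
    for payload, label in samples:
        label_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokenize(payload))
    return token_counts, label_counts


def classify(payload, token_counts, label_counts):
    """Return the label with the highest Laplace-smoothed log-probability."""
    vocab = {tok for counts in token_counts.values() for tok in counts}
    total = sum(label_counts.values())
    scores = {}
    for label, counts in token_counts.items():
        # +1 in the denominator reserves probability mass for unseen tokens.
        denom = sum(counts.values()) + len(vocab) + 1
        score = math.log(label_counts[label] / total)
        for tok in tokenize(payload):
            score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("id=7 OR 1=1", *model))      # malicious
print(classify("name=bob&page=3", *model))  # legitimate
```

Neither test payload appears in the training data. The classifier generalizes from token statistics, which also shows why the quality and freshness of the data set matter so much.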
Combining rule-based and anomaly-based techniques
Rule-based techniques excel at detecting known attacks, and anomaly-based techniques are theoretically better at detecting new attacks, so modern security systems like the BIG-IP ASM system use both rule-based and anomaly-based detection to efficiently guard your applications from malicious traffic.
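A combined decision can be sketched as follows. This is only an illustration of the principle, not the BIG-IP ASM logic: the signature, the z-score threshold, the verdict function, and the sample traffic are all hypothetical.

```python
import re
from statistics import mean, stdev

# Rule-based part: one known-attack pattern (hypothetical).
SIGNATURE = re.compile(r"backdoor", re.IGNORECASE)


def rule_hit(payload: str) -> bool:
    """True when the payload matches the known-attack signature."""
    return SIGNATURE.search(payload) is not None


def rate_anomaly(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """True when the observed rate is more than `threshold` sample standard
    deviations away from the baseline mean."""
    sigma = stdev(baseline)
    return sigma > 0 and abs(observed - mean(baseline)) / sigma > threshold


def verdict(payload: str, baseline: list[float], observed_rate: float) -> list[str]:
    """Return the reasons a request is flagged; an empty list means allow."""
    reasons = []
    if rule_hit(payload):
        reasons.append("signature")
    if rate_anomaly(baseline, observed_rate):
        reasons.append("anomaly")
    return reasons


# Hypothetical requests per minute observed during normal operation.
baseline_rpm = [42, 38, 45, 40, 44, 39, 41, 43]

print(verdict("cmd=backdoor", baseline_rpm, 41))   # ['signature']
print(verdict("cmd=b4ckd00r", baseline_rpm, 400))  # ['anomaly']
print(verdict("q=shoes", baseline_rpm, 41))        # []
```

The obfuscated payload evades the signature but is still caught by its abnormal request rate, while the known pattern is caught even at a normal rate.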
We have now reviewed how WAFs detect attacks, so in the next article in this series we'll investigate the different techniques used to reduce the FP rate (how often the system mistakenly detects legitimate traffic as an attack).