AI Security : Prompt Injection and Insecure Output Handling.

In this article, I discuss OWASP LLM Top 10 and two major attack methodologies.

OWASP LLM Top 10

OWASP Top 10, well known to security engineers, is a widely used ranking of the frequently used attack methodologies to the web services. OWASP also published Top 10 ranking of AI security a list of the Top 10 attacks against Large Language Models (LLM): OWASP Top 10 for Large Language Model Applications

The current Top 10 ranking is (ver. 1.1) :

Prompt Injection
Insecure Output Handling
Training Data Poisoning
Model Denial of Service
Supply Chain Vulnerabilities
Sensitive Information Disclosure
Insecure Plugin Design
Excessive Agency
Over-reliance
Model Theft

I have previously written about attacks against AI. I had discussed "Adversarial Examples" which corresponds to 3 in this ranking (LLM03 Training Data Poisoning) . In this article, I am going to discuss 1, and 2.

LLM01: Prompt Injection

Prompt Injection attack is to manipulate a LLM by crafted inputs. Those causes the target LLM to perform an unintended action. Most LLMs constrain the content of their answers to prevent them from giving answers that could have dangerous consequences. Thus, if you ask a question that contains dangerous content to the LLM, the LLM will not be answered as is. As like Web Application Firewall (WAF), the LLM inspects the input to detect and halt answers, which is not desirable. So a prompt attack is to insert some strings to evade inspection.

fromAdversarial Attacks and Defenses in Large Language Models: Old and New Threats

However, by typing something other than the usual words, it may be able to circumvent the restriction and force the LLM to give an unintended answer. This is the equivalent of Jailbreak on an iPhone. Just as there are WAF evasion attacks that use special strings to evade WAF's inspection, there are constraint evasion attacks against LLMs as well.

For example, if you write the characters for asking questions in a picture instead of typing them in, you can bypass the LLM's checks, which use pattern matching to determine whether the input is constrained or not.

ArtPrompt is a paper which examines the possibility of bypassing this constraint by using ASCII art as well as pictures.

From: ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

OWASP suggests input validation, strict input guidelines, and context-aware filtering for this attack. Very simply, using human-eye check will be one way to do that. As like WAF signatures, the development of LLM prompt check patterns will happen in the future.

LLM02: Unsecured output handling

If LLM output is sent to the backend system without detailed checks, backend vulnerabilities may be exploited, causing unintended consequences.

LLM not only answers questions, but also creates applications that match the request, by using LLM application framework like Langchain. The application executes by itself and works as a backend application.

From Demystifying RCE Vulnerabilities in LLM-Integrated Apps

So this time the output matters. Unintended behavior in the output to this application can lead to risks such as leakage of private information in the LLM. In this model the output is application, thus, if there is a vulnerability the attacker might be able to run code which leaks system information, which is similar to XSS or SQL Injection attack. OWASP suggests evaluating outputs for mitigation.

Updated Oct 04, 2024

Version 2.0