Securing the LLM User Experience with an AI Firewall

As artificial intelligence (AI) seeps into the core day-to-day operations of enterprises, a need exists to exert control over the intersection point of AI-infused applications and the actual large language models (LLMs) that answer the generated prompts. This control point should impose security rules that automatically prevent issues such as personally identifiable information (PII) being inadvertently exposed to LLMs. The solution must also counteract motivated, intentional misuse such as jailbreak attempts, where an LLM is manipulated into producing often ridiculous answers, with the ensuing screenshots used in attempts to discredit the service.

Beyond the security aspect and the overriding concerns of regulated industries, other drivers include basic fiscal prudence: ensuring that the token consumption of each offered LLM does not get out of hand. This discussion around observability and policy enforcement for LLM consumption has given rise to a class of solutions most frequently referred to as AI Firewalls or AI Gateways (AI GWs).

An AI FW might be leveraged through a browser plugin, or perhaps by applying a software development kit (SDK) during the coding of AI applications. Arguably, the most scalable and most easily deployed approach to inserting AI FW functionality into live traffic to LLMs is to use a reverse proxy. A modern approach includes the F5 Distributed Cloud service, coupled with an AI FW/GW service, cloud-based or self-hosted, that can inspect traffic intended for LLMs like those of OpenAI, Azure OpenAI, or privately operated LLMs such as those downloaded from Hugging Face.

A key value offered by this topology, a reverse proxy handing off LLM traffic to an AI FW, which in turn can allow traffic to reach target LLMs, stems from the fact that traffic is seen, and thus controllable, in both directions. Should an issue be present in a user's submitted prompt, also known as an "inference" request, it can be flagged; PII leakage is a frequent concern at this point. In addition, any LLM responses to prompts are also seen on the reverse path: consider a corrupted LLM producing toxic content in its generated replies. Not good.

To achieve a highly performant reverse proxy approach to secured LLM access, one that can span a global set of users, F5 worked with Prompt Security to deploy an end-to-end AI security layer. This article explores the efficacy and performance of the live solution.


Impose LLM Guardrails with the AI Firewall and Distributed Cloud


An AI firewall such as the Prompt Security offering can get in-line with AI LLM flows through multiple means. API calls from Curl or Postman can be modified to transmit to Prompt Security when trying to reach targets such as OpenAI or Azure OpenAI Service. Simple firewall rules can prevent direct employee access to these well-known API endpoints, making the Prompt Security route the sanctioned method of engaging with LLMs.
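The redirection described above amounts to pointing an OpenAI-style request at the firewall's FQDN instead of the vendor's. As a minimal sketch, the helper below assembles such a request; the gateway hostname, key, and function name are hypothetical, not values from this deployment:

```python
# Sketch: building an OpenAI-style chat request aimed at an AI firewall
# reverse proxy rather than api.openai.com. The gateway base URL and
# API key below are placeholders; substitute your organization's values.
import json

def build_chat_request(gateway_base, api_key, prompt):
    """Assemble (url, headers, body) for a chat completion call that
    traverses the AI firewall instead of the vendor API directly."""
    url = f"{gateway_base}/v1/chat/completions"   # same path OpenAI expects
    headers = {
        "authorization": f"Bearer {api_key}",
        "content-type": "application/json",
    }
    body = json.dumps({
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "https://ai-gw.example.com", "sk-placeholder", "Hello")
```

Because the path and payload are unchanged, existing clients only need their base URL swapped, which is what makes the firewall route easy to sanction.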

A number of other methods could be considered, but each has concerns. Browser plug-ins have the advantage of working outside the encryption of the TLS layer, much as a browser's developer tools let users clearly see the targets and HTTP headers of HTTPS transactions that are encrypted on the wire. Prompt Security supports plugins. A downside of browser plug-ins, however, is manageability: how does one enforce and maintain across-the-board usage? Simply consider the headache of non-corporate assets used in the work environment.

Another approach, interesting for non-browser, thick desktop applications (think of an IDE like VSCode), is an agent model, whereby outbound traffic is handled by an on-board local proxy. Again, Prompt can fit in this model; however, enforcing the agent, like the browser approach, may not always be easy or aligned with complete A-to-Z security of all endpoints.

One of the simplest options is to ingest LLM traffic through a network-centric approach. An F5 Distributed Cloud HTTPS load balancer, for instance, can ingest LLM-bound traffic and thoroughly secure it at the API layer with features such as WAF policy and DDoS mitigation. HTTP-based control-plane security is the focus here, as opposed to the encapsulated requests a user is sending to an LLM. The HTTPS load balancer can in turn hand off traffic intended for the likes of OpenAI to the AI gateway for prompt-aware inspections.

F5 Distributed Cloud (XC) is a good architectural fit for inserting a third-party AI firewall service in-line with an organization's inferencing requests. Simply project an FQDN for the consumption of AI services into the global DNS (in this article we used the domain name ""), advertising one single IP address mapped to the name. This DNS advertisement can be done with XC. The IP address, through BGP-4 support for anycast, will direct any traffic destined to it to the closest of 27 international points of presence in the XC global fabric. Traffic from a user in Asia may be attracted to the Singapore or Mumbai F5 sites, whereas a user in Western Europe might enter the F5 network in Paris or Frankfurt.

As depicted, a distributed HTTPS load balancer can be configured; "distributed" reflects the fact that traffic ingressing at any of the global sites can be intercepted by the load balancer. Normally, the Server Name Indication (SNI) value in the TLS Client Hello can be easily used to pick the correct load balancer to process this traffic.
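The SNI-based selection above is, in essence, a lookup from the hostname presented in the Client Hello to a load balancer configuration. A minimal sketch, with illustrative hostnames and load balancer names that are not drawn from the article's actual deployment:

```python
# Sketch: selecting an HTTPS load balancer from the SNI hostname in a
# TLS Client Hello. Hostnames and LB names are purely illustrative.
LB_TABLE = {
    "ai-gw.example.com": "llm-firewall-lb",
    "shop.example.com": "ecommerce-lb",
}

def select_load_balancer(sni, default="reject"):
    """Return the load balancer configured for the presented SNI value;
    unknown hostnames fall through to a default action."""
    return LB_TABLE.get(sni.lower(), default)
```

Because SNI is sent in cleartext before any HTTP bytes flow, the proxy can make this decision without first decrypting the session.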

The first step in AI security is the traditional set of reverse proxy core security features, all imposed by the XC load balancer. These features, to name just a few, might include geo-IP service policies to preclude traffic from specified regions, automatic malicious user detection, and API rate limiting; many capabilities are bundled together. Clean traffic can then be selected for forwarding to an origin pool member, which is the standard operation of any load balancer. In this case, the Prompt Security service is the exclusive member of our origin pool. For this article it is a cloud-instantiated service; options also exist to forward to Prompt implemented on a Kubernetes cluster or running on a Distributed Cloud AppStack Customer Edge (CE) node.

Block Sensitive Data with Prompt Security In-Line

AI inferences, upon reaching Prompt’s security service, are subjected to a wide breadth of security inspections.   Some of the more important categories would include:

  • Sensitive data leakage: although potentially contained in LLM responses, the larger proportion of risk intuitively lies within the requesting prompt, with a user perhaps inadvertently disclosing data that should never reach an LLM
  • Source code fragments within submissions to LLMs: various programming languages may be scanned for and blocked, as the code may be enterprise intellectual property
  • OWASP LLM Top 10 high-risk violations, such as LLM jailbreaking, where the intent is to make the LLM behave and generate content not aligned with the service's intentions; the goal may be embarrassing "screenshots", such as having a chatbot for automobile vendor A actually recommend a vehicle from vendor B
  • OWASP prompt injection detection, considered one of the most dangerous threats, as the intention is for rogue users to exfiltrate valuable data from sources the LLM may have privileged access to, such as backend databases
  • Token-layer attacks, such as unauthorized and excessive use of tokens for LLM tasks, the so-called "Denial of Wallet" threat
  • Content moderation, ensuring a safe interaction with LLMs devoid of toxicity and racial or gender discriminatory language, and an overall curated AI experience aligned with the productivity gains that LLMs promise

To demonstrate sensitive data leakage protection, a Prompt Security policy was active that blocked LLM requests containing, among many PII fields, an exposed mailing address. To reach OpenAI GPT-3.5 Turbo, one of the most popular and cost-effective models in the OpenAI lineup, prompts were sent to an F5 XC HTTPS load balancer at the advertised address. Traffic not violating the comprehensive F5 WAF security rules was proxied to the Prompt Security SaaS offering. The prompt below clearly includes a mailing address in the data portion.


The ensuing prompt is intercepted by both the F5 and Prompt Security solutions. The first interception point, the distributed HTTPS load balancer offered by F5, provides rich details on the transaction, and since no WAF rules or other security policies are violated, the transaction is forwarded to Prompt Security. The following screenshot shows some of the interesting details surrounding the completed transaction.

As highlighted, the transaction was successful at the HTTP layer, producing a 200 OK outcome. The traffic originated in the municipality of Ashton, in Canada, and was received into Distributed Cloud at F5's Toronto (tr2-tor) RE site. The full details of the targeted URL path, such as the OpenAI /v1/chat/completions endpoint, and the user-agent involved, vscode-restclient, are both provided.

Although the HTTP transaction was successful, the actual AI prompt was rejected, as hoped for, by Prompt Security. Drilling into the Activity Monitor in the Prompt UI, one can see a detailed verdict on the transaction.

Following the yellow highlights above, the prompt was blocked, and the violation is "Sensitive Data". The specific offending content, the New York City street address, is flagged as a precluded entity type of "mailing address". Other fields that are potential blocking candidates with Prompt's solution include various international passport and driver's license formats, credit card numbers, emails, and IP addresses, to name but a few.

A nice, time-saving feature of the Prompt Security user interface is the ability to simply choose an individual security framework of interest, such as GDPR or PCI, and the solution will automatically invoke the related sensitive data types to detect.

An important idea to grasp: the solution from Prompt is much more nuanced and advanced than simple REGEX matching; it invokes the power of AI itself to secure customer journeys into safe AI usage. Machine learning models, often transformer-based, have been fine-tuned and orchestrated to interpret the overall tone and tenor of prompts, gaining a real semantic understanding of what is being conveyed in order to counteract simple obfuscation attempts. For instance, using spelled-out numbers such as "one, two, three" to circumvent regex rules predicated on numerals being present will not succeed.
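The regex limitation described above is easy to demonstrate. A minimal sketch, with an intentionally simplified digit-based rule (not one of Prompt's actual detectors), catches a card number written in digits but is blind to the same value spelled out in words, which a semantic model can still recognize:

```python
# Sketch: why plain REGEX falls short. A digit-based rule catches a
# card number written in digits but misses the same data spelled out
# in words. The pattern is deliberately simplistic for illustration.
import re

# Roughly: 13-16 digits optionally separated by spaces or hyphens.
DIGIT_RULE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

leaked_digits = "card: 4111 1111 1111 1111"
leaked_words = ("card: four one one one one one one one "
                "one one one one one one one one")

print(bool(DIGIT_RULE.search(leaked_digits)))  # caught
print(bool(DIGIT_RULE.search(leaked_words)))   # evaded
```

A fine-tuned model that understands what the prompt is conveying flags both forms; the pattern matcher only ever sees the first.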

This AI-infused ability to interpret context and intent allows for preset industry guidelines for safe LLM enforcement. For instance, simply indicating that the business sector is financial will allow the Prompt Security solution to pass judgement on, and block if desired, financial reports, investment strategy documents, and revenue audits, to name just a few. Similar awareness for sectors such as healthcare or insurance is simply a pull-down menu item away in the policy builder.

Source Code Detection

A common use case for LLM security solutions is the identification and, potentially, blocking of submissions of enterprise source code to LLM services. In this scenario, this small snippet of Python is delivered to the Prompt service:

from random import choices

def trial():
    return 2_500 <= sorted(choices(range(10_000), k=5))[2] < 7_500

sum(trial() for i in range(10_000)) / 10_000


A policy is in place for Python and JavaScript detection and was invoked as hoped for.


curl --request POST \
  --url \
  --header 'authorization: Bearer sk-oZU66yhyN7qhUjEHfmR5T3BlbkFJ5RFOI***********' \
  --header 'content-type: application/json' \
  --header 'user-agent: vscode-restclient' \
  --data '{"model": "gpt-3.5-turbo","messages": [{"role": "user","content": "def trial():\n    return 2_500 <= sorted(choices(range(10_000), k=5))[2] < 7_500\n\nsum(trial() for i in range(10_000)) / 10_000"}]}'



Content Moderation for Interactions with LLMs

One common manner of preventing LLM responses from veering into undesirable territory is for the service provider to implement a detailed system prompt, a set of guidelines that the LLM should be governed by when responding to user prompts. For instance, the system prompt might instruct the LLM to serve as a polite, helpful, and succinct assistant for customers purchasing shoes in an online e-commerce portal. A request for help involving the trafficking of narcotics should, intuitively, be denied.
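In OpenAI-style APIs, such a guideline rides along as the first message in the request, with the "system" role. A minimal sketch, with illustrative wording for the shoe-store assistant described above:

```python
# Sketch: attaching a scoping system prompt to an OpenAI-style
# messages array. The system prompt text is illustrative only.
def make_messages(user_prompt):
    """Return a messages list whose system prompt scopes the assistant
    to the shoe store, ahead of the user's own prompt."""
    system_prompt = (
        "You are a polite, helpful, succinct assistant for customers "
        "purchasing shoes in an online store. Refuse unrelated requests."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
```

The system prompt is the provider's first guardrail; the sections below show why it should not be the only one.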

Defense in depth has traditionally meant no single point of failure. In the above scenario, screening both the user prompt and the ensuing LLM response for a wide range of topics leads to a more ironclad security outcome. The following demonstrates some of the topics Prompt Security can intelligently seek out; in this simple example, the topic of "News & Politics" has been singled out to block as a demonstration.

Testing can be performed with this easy Curl command, asking for a prediction on a possible election result in Canadian politics:

curl --request POST \
  --url \
  --header 'authorization: Bearer sk-oZU66yhyN7qhUjEHfmR5T3Blbk*************' \
  --header 'content-type: application/json' \
  --header 'user-agent: vscode-restclient' \
  --data '{"model": "gpt-3.5-turbo","messages": [{"role": "user","content": "Who will win the upcoming Canadian federal election expected in 2025"}],"max_tokens": 250,"temperature": 0.7}'

The response, available in the Prompt Security console, is also presented to the user, in this case a Curl user working within the VSCode IDE. The response has been largely truncated for brevity; fields of interest include an HTTP "X-" header indicating the transaction utilized the F5 site in Toronto, and the number of tokens consumed in the request and response.
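Those token counts come back in the standard `usage` object of an OpenAI-style completion response. A minimal sketch of extracting them; the sample payload is fabricated for illustration, not taken from the test above:

```python
# Sketch: pulling token counts from an OpenAI-style completion
# response, the same figures surfaced in the Prompt console.
# The sample payload below is fabricated for illustration.
sample_response = {
    "model": "gpt-3.5-turbo",
    "usage": {
        "prompt_tokens": 18,
        "completion_tokens": 112,
        "total_tokens": 130,
    },
}

def token_usage(resp):
    """Return (prompt, completion, total) token counts, defaulting to
    zero when the usage object is absent."""
    u = resp.get("usage", {})
    return (u.get("prompt_tokens", 0),
            u.get("completion_tokens", 0),
            u.get("total_tokens", 0))
```

Logging these three numbers per user is the raw material for the token-consumption policies discussed later in the article.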


Advanced LLM Security Features

Many of the AI security concerns are given prominence by the OWASP Top Ten for LLMs, an evolving and curated list of potential concerns around LLM usage from subject matter experts.   Among these are prompt injection attacks and malicious instructions often perceived as benign by the LLM.   Prompt Security uses a layered approach to thwart prompt injection.   For instance, during the uptick in interest in ChatGPT, DAN (Do Anything Now) prompt injection was widespread and a very disruptive force, as discussed here.

User prompts will be closely analyzed for the presence of the various DAN templates that have evolved over the past 18 months.   More significantly, the use of AI itself allows the Prompt solution to recognize zero-day bespoke prompts attempting to conduct mischief.   The interpretative powers of fine-tuned, purpose-built security inspection models are likely the only way to stay one step ahead of bad actors.

Another chief concern is protection of the system prompt, the guidelines that rein in unwanted behavior of the offered LLM service, and what instructed our LLM earlier in its role as a shoe sales assistant. The system prompt, if somehow manipulated, would be a significant breach in AI security; havoc could be created by an LLM led astray. As such, Prompt Security offers a policy to compare the user-provided prompt, the configured system prompt in the API call, and the response generated by the LLM. If a similarity threshold with the system prompt is exceeded in the other fields, the transaction can be immediately blocked.
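As a crude stand-in for that comparison, one can score textual similarity between the system prompt and the other fields and block above a threshold. The sketch below uses Python's stdlib `difflib`, not Prompt Security's actual models, and the 0.6 threshold is an arbitrary illustration:

```python
# Sketch: a naive version of the system prompt leakage check, using
# stdlib difflib rather than Prompt Security's models. The threshold
# is an arbitrary illustrative value.
from difflib import SequenceMatcher

def leaks_system_prompt(system_prompt, llm_response, threshold=0.6):
    """Flag an LLM response whose textual similarity to the system
    prompt meets or exceeds the configured threshold."""
    ratio = SequenceMatcher(None, system_prompt, llm_response).ratio()
    return ratio >= threshold
```

A response that parrots the system prompt back verbatim scores 1.0 and is blocked; unrelated text scores near zero and passes.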

An interesting advanced safeguard is support for a "canary" word: a specific value that a well-behaved LLM should never present in any response, ever. Detection of the canary word by the Prompt solution raises an immediate alert.

One particularly broad and powerful feature of the AI firewall is the ability to find secrets, meaning tokens or passwords, frequently for cloud-hosted services, that are revealed within user prompts. Prompt Security offers the ability to scour LLM traffic for in excess of 200 meaningful values. As a small representative sample of the industry's breadth of secrets, these can all be detected and acted upon:

  • Azure Storage Keys Detector
  • Artifactory Detector
  • Databricks API tokens
  • GitLab credentials
  • NYTimes Access Tokens
  • Atlassian API Tokens

Besides simple blocking, a useful redaction option can be chosen. Rather than risk compromise of credentials, an obfuscated value will instead be seen by the LLM.
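Conceptually, redaction is a substitution pass over the prompt before it is forwarded. A minimal sketch, with deliberately simplified patterns that are illustrations, not Prompt Security's 200+ detectors:

```python
# Sketch: redacting secrets in-line instead of blocking outright.
# These two patterns are simplified illustrations only; they are not
# the detectors used by Prompt Security.
import re

SECRET_PATTERNS = {
    "openai_style_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "gitlab_style_pat": re.compile(r"glpat-[A-Za-z0-9_\-]{20,}"),
}

def redact(text, mask="[REDACTED]"):
    """Replace anything matching a known secret pattern with a mask so
    an obfuscated value, not the credential, reaches the LLM."""
    for pattern in SECRET_PATTERNS.values():
        text = pattern.sub(mask, text)
    return text
```

The benefit over blocking is that the user's legitimate question still reaches the LLM; only the credential is withheld.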

F5 Positive Security Models for AI Endpoints

The AI traffic delivered to and received from Prompt Security's AI firewall is both discovered and subjected to API-layer policies by the F5 load balancer. Consider the token-awareness features of the AI firewall: excessive token consumption can trigger an alert and even transaction blocking. This behavior, a boon when LLMs like the premium OpenAI GPT-4 models may carry substantial costs, allows organizations to automatically shut down a malicious actor who illegitimately got hold of an OPENAI_API_KEY value and bombarded the LLM with prompts. This is often referred to as a "Denial of Wallet" situation.

F5 Distributed Cloud, with its focus upon the API layer, has congruent safeguards. Each unique user of an API service is tracked to monitor transactional consumption. By setting safeguards for API rate limiting, an excessive load placed upon the API endpoint will result in an HTTP 429 "Too Many Requests" response to abusive behavior.
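The behavior above can be sketched as a per-user sliding-window limiter that admits requests under a quota and answers 429 beyond it. This is a pure-Python illustration of the concept, not F5's implementation, and the limit values are arbitrary:

```python
# Sketch: per-user sliding-window API rate limiting of the kind the XC
# load balancer applies, answering 429 once a caller exceeds the quota
# within the window. Illustrative only, not F5's implementation.
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, limit, window_s=60.0):
        self.limit = limit          # max requests per window
        self.window_s = window_s    # window length in seconds
        self.hits = defaultdict(list)   # user -> request timestamps

    def check(self, user, now=None):
        """Return 200 if the request is admitted, 429 if throttled."""
        now = time.monotonic() if now is None else now
        # Keep only the timestamps still inside the window.
        recent = [t for t in self.hits[user] if now - t < self.window_s]
        if len(recent) >= self.limit:
            self.hits[user] = recent
            return 429
        recent.append(now)
        self.hits[user] = recent
        return 200
```

Tracking each user separately is what lets the proxy throttle one abusive key without degrading service for everyone else.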

A key feature of F5 API security is that it is actionable in both directions and is an in-line offering, unlike some API solutions that reside out of band and consume proxy logs for reporting and threat detection. With the automatic discovery of API endpoints, as seen in the following screenshot, the F5 administrator can see the full URL path, which in this case exercises the familiar OpenAI /v1/chat/completions endpoint.

As highlighted by the arrow, the schema of traffic to API endpoints is fully downloadable as an OpenAPI Specification (OAS), formerly known as a Swagger file. This layer of security means fields in API headers and bodies can be validated for syntax, such that a field whose schema expects a floating-point number will see any different encoding, such as a string, blocked in real time in either direction.
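The per-field check amounts to comparing each value's actual type against its declared schema type. A minimal sketch covering just the number-versus-string case from the text; real OAS validation handles far more (formats, ranges, nested objects), and the schema fields shown are illustrative:

```python
# Sketch: per-field syntax validation of the kind an OpenAPI schema
# enables. A "number" field carrying a string is rejected. Minimal
# illustration; real OAS validation covers far more than this.
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}

def validate_body(body, schema):
    """Return True only if every schema field is present in the body
    and matches its declared type."""
    return all(
        name in body and TYPE_CHECKS[decl](body[name])
        for name, decl in schema.items()
    )

# Illustrative schema: two fields from a chat completion request body.
schema = {"temperature": "number", "model": "string"}
```

Applied in-line in both directions, this rejects a malformed request before it reaches the LLM and a malformed response before it reaches the user.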

A possible and valuable use case: allow initial unfettered access to a service such as OpenAI, by means of Prompt Security's AI firewall service, for perhaps 48 hours. After a baseline of API endpoints has been observed, the API definition can be loaded from the saved Swagger file at the end of this "observation" period. The loaded version can be fully pruned of undesirable or disallowed endpoints; all future traffic must conform or be dropped.

This is an example of a "positive security model", considered a gold standard by many risk-averse organizations. Simply put, a positive security model allows only what has been explicitly agreed upon and rejects everything else. This ability to learn and review your own traffic, and then present Prompt Security only with the LLM endpoints an organization wants exposed, is an interesting example of complementing an AI security solution with rich API-layer features.


The world of AI and LLMs is rapidly seeing investment, in time and money, from virtually all economic sectors; the promise of rapid dividends in the knowledge economy is hard to resist. As with any rapid deployment of new technology, safe consumption is not guaranteed, and it is not built in. Although LLM providers often suggest guardrails are baked into their offerings, a 30-second Internet search will surface firsthand accounts showing that unexpected outcomes when invoking AI are real. Brand reputation is at stake, and false information can be hallucinated or coerced out of LLMs by determined parties.

By combining the ability to ingest traffic from globally dispersed users at high speed with a first level of security protections, F5 Distributed Cloud can be leveraged as an onboarding point for LLM workloads. As depicted in this article, Prompt Security can in turn handle traffic egressing F5's distributed HTTPS load balancers and provide state-of-the-art AI safeguards, including sensitive data detection, content moderation, and other OWASP-aligned mechanisms such as jailbreak and prompt injection mitigation. Other deployment models exist, including deploying Prompt Security's solution on-premises, self-hosting it in cloud tenants, and running it on Distributed Cloud CE nodes themselves.

Updated Jun 24, 2024
Version 2.0
