BIG-IP and OPSWAT Together Implement DLP for AI Traffic

For many years, malware and data loss prevention (DLP) solutions have used the ICAP support in BIG-IP to add valuable tiers of security to the application delivery controller. A typical use case is to have the payloads of inbound enterprise traffic scanned by anti-virus (AV) signature-based solutions adjacent to BIG-IP, thereby reducing the risk of employee computers being exposed to rogue content. Another common security use case is to scan outbound traffic, frequently looking for disclosure of sensitive values, such as personally identifiable information (PII), that enterprise policy should prevent. A related risk is the unwanted, possibly inadvertent, release of enterprise intellectual property. In these scenarios, a DLP solution adjacent to BIG-IP can address the concerns of data exfiltration.

With the rise of ChatGPT from OpenAI in the day-to-day activities of enterprise workforces, a renewed examination of DLP for BIG-IP was conducted. In this article, we evaluate complementing a BIG-IP, configured in transparent forward proxy mode, with the Proactive DLP module of OPSWAT’s MetaDefender solution.

The Lab Setup and Test DLP Data

The BIG-IP was set up as a transparent forward proxy as this allows for a simple approach to intercept enterprise traffic as it egresses a location towards Internet services, such as a publicly hosted AI solution. In this article, OpenAI’s ChatGPT was selected as the AI component due to its wide embrace in the market. The ICAP protocol was harnessed, specifically ICAP Request Mode, to steer copies of traffic bound for OpenAI to be reviewed by OPSWAT MetaDefender. Should a DLP policy violation be detected, the ICAP protocol is used to immediately signal the BIG-IP to reject completion of the transaction request. Details around the setup specifics are provided towards the end of the article.
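To make the ICAP Request Mode (REQMOD) step more concrete, the following is a minimal sketch of how an intercepted HTTP request can be wrapped in an ICAP message, following the layout in RFC 3507. It is not the BIG-IP implementation; the host names, service name, and request path are hypothetical placeholders chosen only for illustration.

```python
def build_reqmod(icap_host: str, service: str,
                 http_header: bytes, http_body: bytes = b"") -> bytes:
    """Wrap one HTTP request in an ICAP REQMOD message (RFC 3507 layout)."""
    # The encapsulated HTTP header section must end with a blank line.
    if not http_header.endswith(b"\r\n\r\n"):
        http_header = http_header.rstrip(b"\r\n") + b"\r\n\r\n"

    # Offsets in the Encapsulated header are byte positions inside the
    # encapsulated section, so the body starts right after the HTTP header.
    body_part = "req-body" if http_body else "null-body"
    icap_header = (
        f"REQMOD icap://{icap_host}/{service} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        "Allow: 204\r\n"
        f"Encapsulated: req-hdr=0, {body_part}={len(http_header)}\r\n"
        "\r\n"
    ).encode("ascii")

    # An encapsulated body is always transferred with chunked encoding.
    chunked = b""
    if http_body:
        chunked = b"%x\r\n" % len(http_body) + http_body + b"\r\n0\r\n\r\n"
    return icap_header + http_header + chunked


# Hypothetical example: a chat submission steered to the DLP service.
header = (
    b"POST /chat HTTP/1.1\r\n"
    b"Host: ai.example.com\r\n"
    b"Content-Type: application/json\r\n"
)
message = build_reqmod("dlp.example.com", "reqmod", header,
                       b'{"prompt": "hello"}')
print(message.decode("ascii"))
```

The "Allow: 204" header tells the ICAP server it may answer with a short "no modification needed" reply instead of echoing the whole request back, which keeps the clean-traffic path inexpensive.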

The objectives of the test were to send ChatGPT both form-based textual traffic and file-based uploaded content, including handwritten documents, to exercise the Optical Character Recognition (OCR) capabilities of the DLP service.
To leverage compliant but synthetic data samples, credit card numbers (CCNs), American Social Security Numbers (SSNs) and other fully artificial identities were used. In this case, the bulk of the test data came from dlptest.com/sample-data.

Block SSN and CCN Values In ChatGPT Submissions

A common starting point in DLP deployments is to ensure awareness and actionability exist for common sore spots, such as American Social Security numbers or any apparent references to a credit card number. The following demonstrates an operator attempting to create a basic customer list from source data that includes CCNs, and the failure, by design, to get the request through the F5 proxy.
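As a rough illustration of how this class of detection can work, the sketch below pairs pattern matching with the Luhn checksum for card numbers and a format check for SSNs. It is a simplified assumption of the general technique, not OPSWAT's detection logic; the function names and the sample values (a widely used test card number and a made-up SSN) are ours.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CCN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# AAA-GG-SSSS, excluding area/group/serial values that are never issued.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_violations(text: str) -> list[tuple[str, str]]:
    """Return (type, matched text) pairs for likely CCNs and SSNs."""
    hits = []
    for m in CCN_CANDIDATE.finditer(text):
        if luhn_valid(re.sub(r"[ -]", "", m.group())):
            hits.append(("CCN", m.group()))
    for m in SSN_PATTERN.finditer(text):
        hits.append(("SSN", m.group()))
    return hits


print(find_violations("pay with 4111 1111 1111 1111, ssn 123-45-6789"))
```

The checksum step matters because a bare digit-count pattern would flag order numbers and timestamps; requiring a valid Luhn result is one common way to reduce false positives.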

Using Microsoft Edge as our browser, we can check developer tools to gain insight into how the proxy was able to reject the above request, which violated enterprise guidelines.

Within the OPSWAT MetaDefender GUI a running log is shown, including both clean traffic and inspected traffic with violations. We can see that the three DLP violations found in the above attempted use of ChatGPT are distilled into one violation entry.

Clicking into the event, we see rich details a SecOps team member would expect. Notice that the actual credit card value is partially obfuscated to the operator by default.
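Partial obfuscation of this kind can be sketched in a few lines. The masking format below (first six and last four digits kept) is an assumption for illustration only and is not necessarily the format MetaDefender displays.

```python
def mask_ccn(value: str, keep_first: int = 6, keep_last: int = 4,
             fill: str = "X") -> str:
    """Hide the middle digits of a card number, leaving separators in place."""
    total_digits = sum(ch.isdigit() for ch in value)
    out, seen = [], 0
    for ch in value:
        if ch.isdigit():
            seen += 1
            # Digits after the first block and before the last block are hidden.
            hidden = keep_first < seen <= total_digits - keep_last
            out.append(fill if hidden else ch)
        else:
            out.append(ch)
    return "".join(out)


print(mask_ccn("4111 1111 1111 1111"))  # prints 4111 11XX XXXX 1111
```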

Optical Character Recognition (OCR) for Sensitive Data in Images

In the past, methods to actively thwart DLP solutions included exfiltrating sought-after data as images, such as screenshots taken with a tool like Windows Snipping Tool. Another approach could be simply handwriting sensitive data and sending PDF scans or images of such manual notes through a secured network. OCR combats this by detecting the presence of sensitive data, regardless of how it is presented in an upload.

To evaluate this, the OPSWAT MetaDefender OCR feature was adjusted to “Best”, meaning additional time could be allotted to study images in depth. In addition, the sensitivity gauge for possible violations was set to “low”. This makes false positives a concern, but the preference was to err on the side of finding any potential violations. An attempt was then made to upload the following scribbled note, containing a CCN, to ChatGPT in order to receive a transcription, one of the more popular day-to-day use cases for generative AI.

As noted in Developer Tools in Edge, the file was not successfully uploaded to the chat interface and an HTTP 403 code was seen. The OPSWAT solution, as configured, was able to determine a potential violation due to a credit card value in the handwritten note. The confidence level was presented as low, suggesting a more detailed evaluation comparing true violations versus false positives might be a good idea for SecOps.

The same setting in OPSWAT MetaDefender was used for multiple workflows concurrently, specifically file processes (where archives such as zip files were analyzed) and file processes without archives. The Proactive DLP settings in question are highlighted in the following image.

Safeguard Against the Misuse of Costly Public Cloud Credentials

The power of generative AI is seen frequently in content creation, from interesting emails to internal memos and best-practice guidelines. Consider the case of HR and IT support teams empowered with ChatGPT to prepare comprehensive and effective onboarding procedures for corporate new hires. To expedite onboarding to enterprise cloud accounts, one might prefer a hands-on approach from support rather than generated documentation that reveals things like AWS or Azure credentials. Such values may be considered sensitive due to the large costs that could be incurred if a third party were to learn them.
If IT support were to engage ChatGPT with the following request, using fictitious AWS credentials, the F5 and OPSWAT technology could easily block the requested content creation.

Please prepare a quick start set of instructions, one page maximum, explaining how our employees can configure AWS console access from a browser. Make sure to use our corporate access credentials provided below.
$ export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXNNPLE
$ export AWS_SECRET_ACCESS_KEY=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
$ export AWS_DEFAULT_REGION=us-west-2

Within the OPSWAT MetaDefender console, we get a clear indication that the AWS parameters were the rationale for denying the transaction.
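For readers curious how such credentials can be recognized, here is a minimal sketch based on the publicly documented shape of AWS keys: a 20-character access key ID beginning with "AKIA" and a 40-character secret. The patterns and function name are our own illustrative assumptions, not the rules OPSWAT ships, and the sample values are the fictitious ones from the request above.

```python
import re

# Long-term access key IDs: "AKIA" followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# A 40-character secret assigned to the conventional variable name.
SECRET_KEY = re.compile(
    r"AWS_SECRET_ACCESS_KEY\s*=\s*[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+])"
)


def find_aws_credentials(text: str) -> list[str]:
    """Return labels for the credential types found (never the values)."""
    found = []
    if ACCESS_KEY_ID.search(text):
        found.append("access key ID")
    if SECRET_KEY.search(text):
        found.append("secret access key")
    return found


prompt = """Make sure to use our corporate access credentials provided below.
$ export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXNNPLE
$ export AWS_SECRET_ACCESS_KEY=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
$ export AWS_DEFAULT_REGION=us-west-2
"""
print(find_aws_credentials(prompt))
```

Returning only labels, rather than the matched secrets, mirrors the general practice of keeping sensitive values out of logs.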

Extensibility – The Key to a Good DLP Solution

Due to the vast array of data that one enterprise might consider sensitive, and another might not, the ability to support a rich set of potential data types is critical. Consider a short representative list of potential concerns where actionability, through transaction blocking or field obfuscation, may be advantageous:

• International Bank Account Numbers (IBAN)
• NSFW (Not Safe for Work) content in text and images
• Watermarks added to documents, with adjustable text position and opacity
• Taxpayer Identification Numbers (TIN) and banking SWIFT codes
• Medical and dental images following the DICOM standard

In the case of OPSWAT Proactive DLP, the above, along with other DLP table stakes like full REGEX rules, are available. As one last demonstration, the option to detect and block United Kingdom National Health Service (NHS) numbers was enabled. Using ChatGPT to organize data, and specifically to present data in response to simple plain English filtering requests, is a reasonable use case in many organizations. However, guardrails are easily implemented through BIG-IP and OPSWAT, as the following example shows.

The above demonstrates the request was blocked from being sent to ChatGPT, based upon the UK NHS identifiers in the payload.
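NHS numbers lend themselves to validation because the tenth digit is a modulus 11 check digit over the first nine. The sketch below shows that publicly documented check; it is an illustration of the general approach and makes no claim about how OPSWAT implements its detector. The sample value 943 476 5919 is a commonly used example number.

```python
import re

# Ten digits, commonly written in a 3-3-4 grouping.
NHS_CANDIDATE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")


def nhs_number_valid(candidate: str) -> bool:
    """Validate the modulus 11 check digit of a 10-digit NHS number."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 10:
        return False
    # Weights 10 down to 2 are applied to the first nine digits.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check value of 10 means the number can never be valid.
    return check != 10 and check == digits[9]


def find_nhs_numbers(text: str) -> list[str]:
    """Return candidate strings in the text that pass the check digit test."""
    return [m.group() for m in NHS_CANDIDATE.finditer(text)
            if nhs_number_valid(m.group())]


print(find_nhs_numbers("Patient 943 476 5919 seen; ref 123 456 7890"))
```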

The DLP solution has declared with medium certainty the presence of the embargoed data type, and indicates the decision was made within eighty-eight milliseconds. As per a normal ICAP Request mode implementation, the F5 BIG-IP will accordingly block the transaction.
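The blocking decision can be pictured from the ICAP client's side. Under RFC 3507, a REQMOD reply of 204 means the request needs no modification, while a 200 reply may carry either an adapted request or an HTTP response to return to the client in place of forwarding the request. The following sketch classifies a raw ICAP response on that basis; it is an illustrative assumption about the exchange, not BIG-IP code, and the sample responses are hand-written rather than captured from MetaDefender.

```python
def icap_verdict(raw_response: bytes) -> str:
    """Classify a REQMOD reply as 'allow', 'block', 'modified', or 'error'."""
    head, _, _ = raw_response.partition(b"\r\n\r\n")
    lines = head.decode("ascii", errors="replace").split("\r\n")
    # The status line looks like: ICAP/1.0 204 No Content
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if status == 204:
        return "allow"       # forward the original request unchanged
    if status == 200:
        # An encapsulated HTTP response (res-hdr) replaces the request,
        # for example with a 403 page returned to the browser.
        if "res-hdr" in headers.get("encapsulated", ""):
            return "block"
        return "modified"    # an adapted request to forward instead
    return "error"


clean = b'ICAP/1.0 204 No Content\r\nISTag: "sample"\r\n\r\n'
violation = (
    b'ICAP/1.0 200 OK\r\nISTag: "sample"\r\n'
    b"Encapsulated: res-hdr=0, null-body=26\r\n\r\n"
    b"HTTP/1.1 403 Forbidden\r\n\r\n"
)
print(icap_verdict(clean), icap_verdict(violation))
```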

Brief Overview of the F5 and OPSWAT Proactive DLP Setup

The BIG-IP was configured as a transparent forward proxy. This involves setting up a forwarding virtual server for HTTPS, HTTP, and other remaining IP-based protocols. An internal virtual server is also configured to forward traffic to an OPSWAT MetaDefender ICAP server for inspection prior to the transparent proxy forwarding it onward. With this simple setup, no proxy settings need be configured on client devices, whether through manual settings, a PAC file, or, as is frequently the case, an Active Directory group policy object (GPO). Simply set the inside interface of the BIG-IP as your test PC’s default gateway and you are good to go.

The setup and steps required are documented here. Specifically, this article followed the “Overview: Configuring transparent forward proxy in inline mode” procedure, where the BIG-IP serves as a default gateway. Alternative instructions are provided on the same page where an existing router continues to be used and instead the transparent proxy is implemented by means of WCCP. In both cases, a simple access policy steers traffic to the forwarding virtual server; as such both the LTM (Local Traffic Manager) and APM (Access Policy Manager) modules were utilized.

OPSWAT MetaDefender Proactive DLP overview and settings can be found here. MetaDefender requires two modules, Core and ICAP Server, both of which can co-exist on a single Windows or Linux server platform. Here is the top of the Proactive DLP tab. The entire DLP setup can be maintained through settings found within this tab.

Summary

With the rise of generative AI consumption in day-to-day business activities, the expectation exists that sensitive data will not be exposed, both in regulated industries and in any enterprise following sound security practices. Realistically, the actual end user may not be familiar with enterprise policies, or may be focused on other top-of-mind tasks. As such, an enterprise-grade DLP solution, coupled with a BIG-IP and its high-rate TLS interception ability, can serve a renewed purpose: mitigating unwanted data leakage into AI offerings like OpenAI’s ChatGPT.

Interestingly, OWASP’s 2025 Top Ten list for LLMs, which ranks security concerns around large language models in a top-down order, has placed Sensitive Information Disclosure at the second-highest spot. In this article, an F5 BIG-IP set as a transparent forward proxy was coupled through the ICAP protocol with a data loss prevention (DLP) offering from OPSWAT. Numerous examples of sensitive data were both identified and blocked prior to transmission to ChatGPT, thereby achieving a policy-driven and centrally managed network DLP approach to sound data security practices.

Published Dec 06, 2024

Version 1.0

security

Steve_Gorman

