Protect multi-cloud and Edge Generative AI applications with F5 Distributed Cloud
Introduction
The release of ChatGPT in 2022 saw Generative AI and Large Language Models (LLMs) move from a theoretical field of study to a driving force behind a growing number of real applications. Bloomberg estimates that the GenAI market will reach $1.3 trillion by 2032, an explosive increase from roughly $40 billion in 2022. The same research points to the synergy between the rollout of new GenAI applications and the ongoing move of workloads to the public clouds.
The public cloud providers (AWS, Google, and Microsoft) seem well positioned to supply the massive computational power GenAI demands, and there is already stiff competition among them to attract developers and enterprises by expanding their GenAI-supporting features. Customers wanting to leverage the best tools and functionalities from each cloud provider may end up deploying their applications in a distributed way, across multiple cloud providers.
This approach also has drawbacks: the complexity of operating different environments makes it harder to find the diverse skills needed, the lack of unified visibility hinders operations, and inconsistent policy enforcement can lead to security vulnerabilities.
Securing distributed GenAI workloads with F5 Distributed Cloud
F5’s response, with Distributed Cloud, is to simplify connectivity and security across clouds. It can serve both legacy and modern applications, ensuring a consistent SaaS experience. It abstracts the application delivery and security layers away from the underlying infrastructure, preventing vendor lock-in and facilitating workload migrations between public cloud providers. It also integrates seamlessly with an extensive partner ecosystem, allowing third-party service insertion while avoiding lock-in there as well.
As a testament to the speed of development in this area, a new direction is already being explored: running GenAI at the Edge. This move is partially driven by the power consumption (and therefore cost) projected if GenAI models keep following the current trend of being deployed mainly in data centers; see Tirias Research’s “Generative AI Breaks The Data Center,” Parts 1 and 2.
Generation latency, security, and privacy regulations are other reasons to consider deploying GenAI models at the Edge, at least for inference and potentially fine-tuning, while training may remain in the cloud. Research papers such as “An Overview on Generative AI at Scale with Edge-Cloud Computing” show some potential future directions for architecting GenAI applications.
Research has also been carried out on the environmental impact of GenAI, for example in “Reducing the Carbon Impact of Generative AI Inference (today and in 2035)”. One of the mitigation measures proposed there is the intelligent distribution of requests, which improves the carbon footprint while maintaining the user experience by minimizing user-response latency.
Edge computing has the potential to offer low latency, custom security, privacy compliance, and better cost management. The downsides are similar to the multi-cloud scenario: multi-vendor complexity drives up the total cost of ownership (TCO) and increases time to market (TTM).
F5’s Distributed Cloud AppStack offers a fully integrated stack that enables a consistent deployment model, whether on-premises or in a public/private cloud, lowering TCO and shortening TTM.
F5 can protect LLMs wherever they are deployed. In a scenario where a private LLM needs to be protected, NGINX App Protect can provide API security by enforcing the OpenAPI specification, ensuring that only compliant requests are submitted to the LLM:
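Conceptually, this is a positive security model: any request the published API contract does not declare is rejected before it reaches the model. The minimal Python sketch below illustrates the idea; the /v1/completions endpoint and its fields are hypothetical placeholders, and in practice NGINX App Protect performs the enforcement directly from the OpenAPI file, not in application code.

```python
# Conceptual sketch of OpenAPI-based positive security: any request not
# declared in the specification is rejected before it reaches the model.
# The endpoint and field names below are hypothetical; NGINX App Protect
# performs the real enforcement directly from the OpenAPI file.
ALLOWED = {
    # (method, path) -> required JSON body fields
    ("POST", "/v1/completions"): {"prompt", "max_tokens"},
}

def is_compliant(method: str, path: str, body: dict) -> bool:
    """Return True only for requests that match the published contract."""
    required = ALLOWED.get((method, path))
    if required is None:
        return False                    # unknown endpoint or verb: reject
    return required.issubset(body)      # reject if mandatory fields are missing

# An undeclared endpoint is blocked before ever reaching the LLM.
assert is_compliant("POST", "/v1/completions", {"prompt": "hi", "max_tokens": 16})
assert not is_compliant("POST", "/v1/admin", {"cmd": "dump"})
```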
In a scenario where the LLM and the GenAI front-end application are deployed in different locations (as in a multi-cloud deployment), F5 Distributed Cloud can provide seamless connectivity across any environment with its AppConnect feature, while also protecting the connection with the WAF function:
Where "Inference at the Edge" is needed, either due to security, regulatory or latency concerns, F5 Distributed Cloud can easily provide an unified deployment environment, portable across different sites, that can also benefit from the full security stack available at the Regional Edge level.
For more information on the various ways to deploy F5 Distributed Cloud, along with implementation examples (both manual, through the SaaS console, and automated), you can consult the “Deploy WAF on any Edge with F5 Distributed Cloud” DevCentral article.
For a demo of how NGINX App Protect and F5 Distributed Cloud Multicloud Networking can secure GenAI workloads, including protection against OWASP’s Sensitive Information Disclosure (LLM06), you can check the following recording:
For more details on the step-by-step procedure to set up these demos through the Distributed Cloud console, as well as the corresponding automation scripts, you can check the "F5 Distributed Cloud Terraform Examples" GitHub repository.
As shown in these demos, F5 Distributed Cloud enables GenAI applications to be distributed across multiple public clouds (as well as on-prem and private clouds), seamlessly connecting their components with a unified, single-pane-of-glass Multicloud Networking (MCN) solution.
The F5 XC MCN solution employs Customer Edge sites as "portals" between different environments, allowing services from one environment to be exposed in another. In the demo above, the remote LLM service running in AWS/EKS is advertised as local in GCP/GKE, where it is consumed by the GenAI application. Since the service is exposed through an XC HTTP Load Balancer object, a wide range of security features can be enabled for it, helping secure the MCN connection.
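From the application’s point of view, the cross-cloud plumbing is invisible: the front-end simply calls a local service name. A minimal sketch of that consumer side, where llm.internal.example stands in for whatever domain is configured on the XC HTTP Load Balancer (the endpoint, payload, and response fields are hypothetical):

```python
import json
import urllib.request

# Hypothetical domain advertised locally in GCP/GKE by the XC HTTP Load
# Balancer; the actual backend runs in AWS/EKS, reached via the Customer
# Edge sites.
LLM_ENDPOINT = "http://llm.internal.example/v1/completions"

def ask_llm(prompt: str) -> str:
    """Call the LLM as if it were a local service; XC routes the request
    across clouds and can apply WAF/API security in transit."""
    payload = json.dumps({"prompt": prompt, "max_tokens": 64}).encode()
    req = urllib.request.Request(
        LLM_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]  # response schema is illustrative
```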
F5 XC Secure MCN (S-MCN) is therefore a complete solution, connecting and securing multicloud and on-prem deployments, regardless of their location.
API Discovery and enforcement is one of the critical F5 Distributed Cloud features in this context. Another is API Rate Limiting, enabling protection against OWASP’s Model Denial of Service (LLM04); you can check the “Protect LLM applications against Model Denial of Service” article for an implementation example.
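Rate limiting matters for LLMs because each request can consume significant compute time, so even a modest request flood translates into degraded service and large costs. Conceptually, per-client rate limiting behaves like the token-bucket sketch below; the limits shown are illustrative, not product defaults, and in practice the policy is configured on the XC load balancer rather than in application code.

```python
import time

class TokenBucket:
    """Conceptual model of per-client API rate limiting (illustrative
    numbers only; the real enforcement happens in F5 XC)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate                  # tokens refilled per second
        self.burst = burst                # maximum bucket size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True                   # forward the request to the LLM
        return False                      # reject: mitigates Model DoS (LLM04)

# Example: each client gets 2 LLM calls per second with a burst of 5.
bucket = TokenBucket(rate=2.0, burst=5)
print([bucket.allow() for _ in range(7)])  # first 5 pass, then rejections
```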
To help accelerate LLM execution, F5 Distributed Cloud can leverage GPU resources on Distributed Cloud sites where such hardware is available, and it also supports Virtual GPU (vGPU) applications on Distributed Cloud VMware sites with NVIDIA Tesla T4 vGPU software.
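Because the deployment model stays the same across sites, an inference workload can opportunistically use whatever accelerator a site exposes. A minimal sketch, assuming a PyTorch-based container image (the tiny linear layer below is just a stand-in for the deployed model):

```python
import torch
import torch.nn as nn

# Use the GPU when the Distributed Cloud site exposes one (for example a
# Tesla T4 vGPU slice on a VMware site); fall back to CPU elsewhere so the
# same container image remains portable across sites.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 16).to(device)      # stand-in for the deployed LLM
x = torch.randn(1, 16, device=device)
y = model(x)                               # inference runs on the GPU if present
print(f"inference ran on: {device}")
```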
Conclusion
F5 Distributed Cloud’s capabilities allow customers to use a single platform for the connectivity, application delivery, and security of GenAI applications in any cloud location and at the Edge, with a consistent and simplified operational model: a game changer that streamlines the operational experience for DevOps, NetOps, and SecOps.
Resources
How F5 can help mitigate threats against Generative AI applications
Deploy WAF on any Edge with F5 Distributed Cloud
F5 XC Terraform Examples GitHub repository
F5 Hybrid Security Architectures GitHub repository