How I Did It - "Securing NVIDIA Triton Inference Server with NGINX Plus Ingress Controller"
Welcome to another edition of "How I Did It," where we explore a variety of F5 and partner technologies and dive into the details of their implementation; well, at least how I do it 😀.
In this installment, we step into the world of AI and machine learning (ML) and take a look at how F5's NGINX Plus Ingress Controller can provide secure, scalable external access to NVIDIA's Triton Inference Servers hosted on Kubernetes.
NVIDIA Triton Inference Server is a powerful tool for deploying trained machine learning models into production, and it is designed to run on Kubernetes. Once deployed, models can process incoming data and generate predictions or insights in real time. The platform supports a variety of deep learning frameworks, including TensorFlow, PyTorch, and ONNX Runtime, and it manages model deployment, scaling, and versioning, providing a seamless experience for developers and operators.
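To make that concrete, here is a minimal sketch of what running Triton on Kubernetes can look like. The resource names, image tag, and storage details below are illustrative assumptions, not values taken from the demo repo's Helm chart; the ports, however, are Triton's defaults for HTTP, gRPC, and metrics.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-server              # hypothetical name, not from the demo repo
  labels:
    app: triton-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: triton-server
  template:
    metadata:
      labels:
        app: triton-server
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:24.01-py3   # illustrative image tag
        command: ["tritonserver", "--model-repository=/models"]
        ports:
        - containerPort: 8000      # HTTP/REST inference (Triton default)
        - containerPort: 8001      # gRPC inference (Triton default)
        - containerPort: 8002      # Prometheus metrics (Triton default)
        volumeMounts:
        - name: model-store
          mountPath: /models
      volumes:
      - name: model-store
        persistentVolumeClaim:
          claimName: triton-models # assumes a pre-provisioned PVC holding the model repository
---
apiVersion: v1
kind: Service
metadata:
  name: triton-server              # hypothetical Service fronting the pods above
spec:
  selector:
    app: triton-server
  ports:
  - name: http
    port: 8000
  - name: grpc
    port: 8001
  - name: metrics
    port: 8002
```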
NGINX Plus Ingress Controller is architected to provide and manage external connectivity to applications running on Kubernetes. It enhances the standard Kubernetes Ingress capabilities with advanced features such as SSL/TLS termination, traffic throttling, and advanced routing based on request attributes. NGINX Plus can also integrate with external authentication services and add enhanced security through NGINX App Protect. The controller is designed to improve the performance, reliability, and security of applications deployed on Kubernetes, making it a popular choice for managing ingress traffic in production environments.
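On the ingress side, NGINX Ingress Controller adds custom resources such as VirtualServer for exactly this kind of TLS termination and routing. The sketch below shows a minimal example pointing at the hypothetical Triton service from the previous sketch; the hostname and TLS secret name are placeholders, not values from the demo repo.

```yaml
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: triton-vs                  # hypothetical resource name
spec:
  host: triton.example.com         # placeholder hostname
  tls:
    secret: triton-tls             # TLS termination; assumes a pre-created TLS secret
  upstreams:
  - name: triton-http
    service: triton-server         # the hypothetical Service from the sketch above
    port: 8000
  routes:
  - path: /
    action:
      pass: triton-http            # proxy all matching requests to the Triton upstream
```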
Deployment Overview
The self-guided demonstration provides a working example of how NGINX Plus Ingress Controller can deliver secure external access, as well as load balancing, to a Kubernetes-hosted NVIDIA Triton Inference Server cluster (see below).
Demonstration resources and step-by-step guidance are available on GitHub. The repository is based on forks of both the NVIDIA Triton Inference Server and NGINX Plus Ingress Controller repos. It includes a Helm chart, along with instructions for installing a scalable NVIDIA Triton Inference Server and NGINX Plus Ingress Controller in an on-premises or cloud-based Kubernetes cluster.
In addition to the Triton Inference Server and ingress controller, the repo deploys Prometheus and Grafana (along with the NGINX Plus dashboard) to aid in autoscaling and to provide visibility into the Triton server service, as well as monitoring of ingress/egress traffic (see below).
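One common way those Prometheus metrics can drive autoscaling is through a Kubernetes HorizontalPodAutoscaler fed by an adapter such as prometheus-adapter. The sketch below is a minimal example under that assumption; the metric name and target value are illustrative, not taken from the demo repo.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-hpa                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-server            # matches the hypothetical Deployment above
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: avg_time_queue_us    # illustrative custom metric surfaced via prometheus-adapter
      target:
        type: AverageValue
        averageValue: "50000"      # scale out when average queue time exceeds ~50 ms per pod
```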
Check it Out
Want to get a feel for it before putting hands to keys? The video below provides a step-by-step walkthrough of configuring and deploying the demonstration environment.
Try it Out
Liked what you saw? If so (and I hope you did), try it out for yourself. The F5 DevCentral repo includes nearly everything you need (with the exception of an NGINX Plus license and certificates) to deploy a fully functioning environment; all you need is a Kubernetes cluster to host the deployment. With respect to licensing, F5 provides a fully functional 30-day trial license if needed.
Additional Links
Triton Server & NGINX Plus Ingress Controller Demo Repo
NVIDIA Triton Inference Server Overview
NVIDIA Triton Inference Server Repos
NGINX Plus Ingress Controller Overview