NGINX Plus
39 TopicsF5 NGINX Plus R33 Licensing and Usage Reporting
Beginning with F5 NGINX Plus version R33, all customers are required to deploy a JSON Web Token (JWT) license for each commercial instance of NGINX Plus. Each instance is responsible for validating its own license status. Furthermore, NGINX Plus will report usage either to the F5 NGINX licensing endpoint or to the F5 NGINX Instance Manager for customers who are connected. For those customers who are disconnected or operate in an air-gapped environment, usage can be reported directly to the F5 NGINX Instance Manager. To learn more about the latest features of NGINX R33, please check out the recent blog post. Install or Upgrade NGINX Plus R33 To successfully upgrade to NGINX Plus R33 or perform a fresh installation, begin by downloading the JWT license from your F5 account. Once you have the license, place it in the F5 NGINX directory before proceeding with the upgrade. For a fresh installation, after completing the installation, also place the JWT license in the NGINX directory. For further details, please refer to the provided instructions. This video provides a step-by-step guide on installing or upgrading to NGINX Plus R33. Report Usage to F5 in Connected Environment To effectively report usage data to F5 within a connected environment using NGINX Instance Manager, it's important to ensure that port 443 is open. The default configuration directs the usage endpoint to send reports directly to the F5 licensing endpoint at product.connect.nginx.com. By default, usage reporting is enabled, and it's crucial to successfully send at least one report on installation for NGINX to process traffic. However, you can postpone the initial reporting requirement by turning off the directive in your NGINX configuration. This allows NGINX Plus to handle traffic without immediate reporting during a designated grace period. To configure usage reporting to F5 using NGINX Instance Manager, update the usage endpoint to reflect the fully qualified domain name (FQDN) of the NGINX Instance Manager. For further details, please refer to the provided instructions. This video shows how to report usage in the connected environment using NGINX Instance Manager. Report Usage to F5 in Disconnected Environment using NGINX Instance Manager In a disconnected environment without an internet connection, you need to take certain steps before submitting usage data to F5. First, in NGINX Plus R33, update the `usage report` directive within the management block of your NGINX configuration to point to your NGINX Instance Manager host. Ensure that your NGINX R33 instances can access the NGINX Instance Manager by setting up the necessary DNS entries. Next, in the NMS configuration in NGINX Instance Manager, modify the ‘mode of operation’ to disconnected, save the file, and restart NGINX Instance Manager. There are multiple methods available for adding a license and submitting the initial usage report in this disconnected environment. You can use a Bash script, REST API, or the web interface. For detailed instructions on each method, please refer to the documentation. This video shows how to report usage in disconnected environments using NGINX Instance Manager. Conclusion The transition to NGINX Plus R33 introduces important enhancements in licensing and usage reporting that can greatly improve your management of NGINX instances. With the implementation of JSON Web Tokens (JWT), you can validate your subscription and report telemetry data more effectively. To ensure compliance and optimize performance, it’s crucial to understand the best practices for usage reporting, regardless of whether you are operating in a connected or disconnected environment. Get started today with a 30-day trial, and contact us if you have any questions. Resources NGINX support documentation Blog announcementproviding a comprehensive summary of the new features in this release.80Views1like0CommentsF5 NGINX Plus R33 Release Now Available
We’re excited to announce the availability of NGINX Plus Release 33 (R33). The release introduces major changes to NGINX licensing, support for post quantum cryptography, initial support for QuickJS runtime in NGINX JavaScript and a lot more.391Views1like0CommentsUpcoming Action Required: F5 NGINX Plus R33 Release and Licensing Update
Hello community! The upcoming release of NGINX Plus R33 is scheduled for this quarter. This release brings changes to our licensing process, aligning it with industry best practices and the rest of the F5 licensing programs. These updates are designed to better serve our commercial customers by providing improved visibility into usage, streamlined license tracking, and enhanced customer service. Key Changes in NGINX Plus R33 Release: Q4, 2024 New Requirement: All commercial NGINX Plus instances will now require the placement of a JSON Web Token (JWT). This JWT file can be downloaded from your MyF5 account. License Validation: NGINX Plus instances will regularly validate their license status with the F5 licensing endpoint for connected customers. Offline environments can manage this through the NGINX Instance Manager. Usage Reporting: NGINX Plus R33 introduces a new requirement for commercial product usage reporting. NGINX's adoption of F5's standardized approach ensures easier and more precise license and usage tracking. Once our customers are utilizing R33 together with NGINX’s management options, tasks such as usage reporting and renewals will be much more streamlined and straightforward. Additionally, NGINX instance visibility and management will be much easier. Action Required To ensure a smooth transition and uninterrupted service, please take the following steps: Install the JWT: Make sure to install the JWT on all your commercial NGINX Plus instances. This is crucial to avoid any interruptions. Additional Steps: Refer to our detailed guide for any other necessary steps.See here for additional required next steps. IMPORTANT: Failure to followthese steps will result in NGINX Plus R33 and subsequent release instances not functioning. Critical Notes JWT Requirement: JWT files are essential for the startup of NGINX Plus R33. NGINX Ingress Controller: Users of NGINX Ingress Controller should not upgrade to NGINX Plus R33 until the next version of the Ingress Controller is released. No Changes for Earlier Versions: If you are using a version of NGINX Plus prior to R33, no action is required. Resources We are preparing a range of resources to help you through this transition: Support Documentation: Comprehensive support documentation will be available upon the release of NGINX Plus R33. Demonstration Videos: We will also provide demonstration videos to guide you through the new processes upon the release of NGINX Plus R33. NGINX Documentation: For more detailed information, visit our NGINX documentation. Need Assistance? If you have any questions or concerns, please do not hesitate to reach out: F5 Representative: Contact your dedicated representative for personalized support. MyF5 Account: Support is readily available through your MyF5 account. Stay tuned for more updates. Thank you for your continued partnership.848Views0likes0CommentsNGINX Virtual Machine Building with cloud-init
Traditionally, building new servers was a manual process. A system administrator had a run book with all the steps required and would perform each task one by one. If the admin had multiple servers to build the same steps were repeated over and over. All public cloud compute platforms provide an automation tool called cloud-init that makes it easy to automate configuration tasks while a new VM instance is being launched. In this article, you will learn how to automate the process of building out a new NGINX Plus server usingcloud-init.470Views3likes4CommentsSecuring and Scaling Hybrid Apps with F5/NGINX (Part 3)
In part 2 of our series, I demonstrated how to configure ZT (Zero Trust) use cases centering around authentication with NGINX Plus in hybrid environments. We deployed NGINX Plus as the external LB to route and authenticate users connecting to my Kubernetes applications. In this article, we explore other areas of the ZT spectrum configurable on the External LB Service, including: Authorization and Access Encryption mTLS Monitoring/Auditing ZT Use case #1: Authorization Many people think that authentication and authorization can be used interchangeably. However, they both mean different things. Authentication involves the process of verifying user identities based on the credentials presented. Even though authenticated users are verified by the system, they do not necessarily have the authority to access protected applications. That is where authorization comes into play. Authorization involves the process of verifying the authority of an identity before granting access to application. Authorization in the context of OIDC authentication involves retrieving claims from user ID tokens and setting conditions to validate whether the user is authorized to enter the system. An authenticated user is granted an ID token from the IdP with specific user information through JWT claims. The configuration of these claims is typically set from the IdP. Revisiting the OIDC auth use case configured in the previous section, we can retrieve the ID tokens of authenticated users from the NGINX key-value store. $ curl -i http://localhost:8010/api/9/http/keyvals/oidc_acess_tokens Then we can view the decoded value of the ID token using jwt.io. Below is an example of decoded payload data from the ID token. { "exp": 1716219261, "iat": 1716219201, "admin": true, "name": "Micash", "zone_info": "America/Los_Angeles" "jti": "9f8ff4bd-4857-4e12-9634-e5876f786f98", "iss": "http://idp.f5lab.com:8080/auth/realms/master", "aud": "account", "typ": "Bearer", "azp": "appworld2024", "nonce": "gMNK3tu06j6tp5-jGa3aRhkj4F0P-Z3e04UfcFeqbes" } NGINX Plus has access to these claims as embedded variables. They are accessed by prefixing $jwt_claim_ to the desired field (for example, $jwt_claim_admin for the admin claim). We can easily set conditions on these claims and block unauthorized users before they even reach the back-end applications. Going back to our frontend.conf file in the previous part of our series. We can set $jwt_flag variable to 0 or 1 based on the value of the admin JWT claim. We then use the jwt_claim_require directive to validate the ID token. ID tokens with admin claims set to false will be rejected. map $jwt_claim_admin $jwt_status { "true" 1; default 0; } server { include conf.d/openid_connect.server_conf; # Authorization code flow and Relying Party processing error_log /var/log/nginx/error.log debug; # Reduce severity level as required listen [::]:443 ssl ipv6only=on; listen 443 ssl; server_name example.work.gd; ssl_certificate /etc/ssl/nginx/default.crt; # self-signed for example only ssl_certificate_key /etc/ssl/nginx/default.key; location / { # This site is protected with OpenID Connect auth_jwt "" token=$session_jwt; error_page 401 = @do_oidc_flow; auth_jwt_key_request /_jwks_uri; # Enable when using URL auth_jwt_require $jwt_status; proxy_pass https://cluster1-https; # The backend site/app } } Note: Authorization with NGINX Plus is not restricted to only JWT tokens. You can technically set conditions on a variety of attributes, such as: Session cookies HTTP headers Source/Destination IP addresses ZT use case #2: Mutual TLS Authentication (mTLS) When it comes to ZT, mTLS is one of the mainstream use cases falling under the Zero Trust umbrella. For example, enterprises are using Service Mesh technologies to stay compliant with ZT standards. This is because Service Mesh technologies aim to secure service to service communication using mTLS. In many ways, mTLS is similar to the OIDC use case we implemented in the previous section. Only here, we are leveraging digital certificates to encrypt and authenticate traffic. This underlying framework is defined by PKI (Public Key Infrastructure). To explain this framework in simple terms we can refer to a simple example; the driver's license you carry in your wallet. Your driver’s license can be used to validate your identity, the same way digital certificates can be used to validate the identity of applications. Similarly, only the state can issue valid driver's licenses, the same way only Certificate Authorities (CAs) can issue valid certificates to applications. It is also important that only the state can issue valid certificates. Therefore, every CA must have a private secure key to sign and issue valid certificates. Configuring mTLS with NGINX can be broken down in two parts: Ingress mTLS; Securing SSL client traffic and validating client certificates against a trusted CA. Egress mTLS; securing SSL upstream traffic and offloading authentication of TLS material to a trusted HTTPS back-end server. Ingress mTLS You can configure ingress mTLS on the NLK deployment by simply referencing the trusted certificate authority adding the ssl_client_certificate directive in the server context. This will configure NGINX to validate client certificates with the referenced CA. Note: If you do not have a CA, you can create one using OpenSSL or Cloudflare PKI and TLS toolkits server { listen 443 ssl; status_zone https://cafe.example.com; server_name cafe.example.com; ssl_certificate /etc/ssl/nginx/default.crt; ssl_certificate_key /etc/ssl/nginx/default.key; ssl_client_certificate /etc/ssl/ca.crt; } Egress mTLS Egress mTLS is a slight alternative to ingress mTLS where NGINX verifies certificates of upstream applications rather than certificates originating from clients. This feature can be enabled by adding the proxy_ssl_trusted_certificate directive to the server context. You can reference the same trusted CA we used for verification when configuring ingress mTLS or reference a different CA. In addition to verifying server certificates, NGINX as a reverse-proxy can pass over certs/keys and offload verification to HTTPS upstream applications. This can be done by adding the proxy_ssl_certificate and proxy_ssl_certificate_key directives in the server context. server { listen 443 ssl; status_zone https://cafe.example.com; server_name cafe.example.com; ssl_certificate /etc/ssl/nginx/default.crt; ssl_certificate_key /etc/ssl/nginx/default.key; #Ingress mTLS ssl_client_certificate /etc/ssl/ca.crt; #Egress mTLS proxy_ssl_certificate /etc/nginx/secrets/default-egress.crt; proxy_ssl_certificate_key /etc/nginx/secrets/default-egress.key; proxy_ssl_trusted_certificate /etc/nginx/secrets/default-egress-ca.crt; } ZT use case #3: Secure Assertion Markup Language (SAML) SAML (Security Assertion Markup Language) is an alternative SSO solution to OIDC. Many organizations may choose between SAML and OIDC depending on requirements and IdPs they currently run in production. SAML requires a SP (Service Provider) to exchange XML messages via HTTP POST binding to a SAML IdP. Once exchanges between the SP and IdP are successful, the user will have session access to the protected backed applications with one set of user credentials. In this section, we will configure NGINX Plus as the SP and enable SAML with the IdP. This will be like how we configured NGINX Plus as the relying party in an OIDC authorization code flow (See ZT Use case #1). Setting up the IdP The one prerequisite is setting up your IdP. In our example, we will set up the Microsoft Entra ID on Azure. You can use the SAML IdP of your choosing. Once the SAML application is created in your IdP, you can access the SSO fields necessary to link your SP (NGINX Plus) to your IdP (Microsoft Entra ID). You will need to edit the basic SAML configuration by clicking on the pencil icon next to Editin Basic SAML Configuration, as seen in the figure above. Add the following values and click Save: Identifier (Entity ID) -- https://fourth.run.place Reply URL (Assertion Consumer Service URL) -- https://fourth.run.place/saml/acs Sign on URL: https://fourth.run.place Logout URL (Optional): https://fourth.run.place/saml/sls Finally download the Certificate (Raw) from Microsoft Entra ID and save it to your NGINX Plus instance. This certificate is used to verify signed SAML assertions received from the IdP. Once the certificate is saved on the NGINX Plus instance, extract the public key from the downloaded certificate and convert it to SPKI format. We will use this certificate later when we configure NGINX Plus in the next section. $ openssl x509 -in demo-nginx.der -outform DER -out demo-nginx.der $ openssl x509 -inform DER -in demo-nginx.der -pubkey -noout > demo-nginx.spki Configuring NGINX Plus as the SAML Service Provider After the IdP is setup, we can configure NGINX Plus as the SP to exchange and validate XML messages with the IdP. Once logged into the NGINX Plus instance, simply clone the nginx SAML GitHub repo. $ git clone https://github.com/nginxinc/nginx-saml.git && cd nginx-saml Copy the config files into the /etc/nginx/conf.d directory. $ cp frontend.conf saml_sp.js saml_sp.server_conf saml_sp_configuration.conf /etc/nginx/conf.d/ Notice that by default, frontend.conf listens on port 8010 with clear text http. You can merge kube_lb.conf into frontend.conf to enable TLS termination and update the upstream context with application endpoints you wish to protect with SAML. Finally we will need to edit the saml_sp_configuration.conf file and update variables in the map context based on the parameters of your SP and IdP: $saml_sp_entity_id; https://fourth.run.place $saml_sp_acs_url; https://fourth.run.place/saml/acs $saml_sp_sign_authn; false $saml_sp_want_signed_response; false $saml_sp_want_signed_assertion; true $saml_sp_want_encrypted_assertion; false $saml_idp_entity_id; Unique identifier that identifies the IdP to the SP. This field is retrieved from your IdP $saml_idp_sso_url; This is the login URL and is also retrieved from the IdP $saml_idp_verification_certificate; Variable referencing the certificate downloaded from the previous section when setting up the IdP. This certificate will verify signed assertions received from the IdP. Use the full directory (/etc/nginx/conf.d/demo-nginx.spki) $saml_sp_slo_url; https://fourth.run.place/saml/sls $saml_idp_slo_url; This is the logout URL retrieved from the IdP $saml_sp_want_signed_slo; true The remaining variables defined in saml_sp_configuration.conf can be left unchanged, unless there is a specific requirement for enabling them. Once the variables are set appropriately, we can reload NGINX Plus. $ nginx -s reload Testing Now we will verify the SAML flow. open your browser and enter https://fourth.run.place in the address bar. This should redirect me to the IDP login page. Once you login with your credentials, I should be granted access to my protected application ZT use case #4: Monitoring/Auditing NGINX logs/metrics can be exported to a variety of 3rd party providers including: Splunk, Prometheus/Grafana, cloud providers (AWS CloudWatch and Azure Monitor Logs), Datadog, ELK stack, and more. You can monitor NGINX metrics and logs natively with NGINX Instance Manager or NGINX SaaS. The NGINX Plus API provides me a lot of flexibility by exporting metrics to any third-party tool that accepts JSON. For example, you can export NGINX Plus API metrics to our native real-time dashboard from part 1. native real-time dashboard from part 1 Whichever tool I chose, monitoring/auditing my data generated from my IT systems is key to understanding and optimizing my applications. Conclusion Cloud providers offer a convenient way to expose Kubernetes Services to the internet. Simply create Kubernetes Service of type: LoadBalancer and external users connect to your services via public entry point. However, cloud load balancers do nothing more than basic TCP/HTTP load balancing. You can configure NGINX Plus with many Zero Trust capabilities as you scale out your environment to multiple clusters in different regions, which is what we will cover in the next part of our series.223Views2likes0CommentsSecuring and Scaling Hybrid apps with F5 NGINX (Part 2)
If you attended a cybersecurity trade-show lately, you may have noticed the term “Zero Trust (ZT)” advertised on almost every booth. It may seem like most security companies are offering the same value proposition: ‘Securing apps with ZT’. Its commonality stems from the fact that ZT is a broad term that can span endless use cases. ZT is not a feature or capability, rather a philosophy embraced by IT security leaders based on the idea that all traffic entering and exiting a system is not trusted and must be scrutinized before passing through. Organizations are shifting to a zero-trust mindset due to the increased complexity of cyber-attacks. Perimeter based firewalls are no longer sufficient in securing digital resources. In Part 1 of our series, we configured NGINX Plus as an external load balancer to route and terminate TLS traffic to cluster nodes. In this part of the series, we leverage the same NGINX Plus deployment to enable ZT use cases that will improve the security posture of your hybrid applications. NOTE: Part 1 of the series is a prerequisite for enabling the ZT use cases in our examples. Please ensure that part 1 is completed before starting with part 2 Part 1 of the series ZT Use case #1: OIDC Authentication OIDC (OpenID Connect) is an authentication layer on top of the OAuth 2.0 framework. Many organizations will choose OIDC to authenticate digital identities and enable SSO (Single-Sign-On) for consumer applications. With Single-Sign-on technologies, users gain access to multiple applications with one set of user credentials by authenticating their identities through an IdP (Identity Provider). I can configure NGINX Plus to operate as an OIDC relaying party to exchange and validate ID tokens with the IdP (Identity Provider), in addition to basic reverse-proxy load balancing configured in part 1. I will extend the architecture in part 1 with an IdP and configure NGINX Plus as the identity aware proxy. Prerequisites for NGINX Plus Before configuring NGINX Plus as the OIDC identity aware proxy: 1. Installation of NJS is required. $ sudo apt-get install nginx-plus-module-njs 2. Load the NJS module into the NGINX configuration by adding the following line at the top of yournginx.conf file. load_module modules/ngx_http_js_module.so; 3. Clone the OIDC GitHub repository in your directory of choice cd /home/ubuntu && git clone --branch R28 https://github.com/nginxinc/nginx-openid-connect.git Setting up the IdP The IdP will manage and store digital identities to mitigate attackers from impersonating users to steal sensitive information. There are many IdP vendors to choose from; Okta, Ping Identity, Azure AD. We chose Okta as the IdP in our example moving forward. If you do not have access to an IdP, you can quickly get started with the Okta Command Line Interface (CLI) and run the okta register command to sign up for a new account. Once account creation is successful, we will use the Okta CLI to preconfigure Okta as the IdP, creating what Okta calls an app integration. Other IdPs will have different nomenclatures defining an application integration. For example, Azure AD will call them App registrations. If you are not using Okta, you can follow the documentation of the IdP you are using and skip to the next section (Configuring NGINX as the OpenID Connect relying party). 1. Run the okta login command to authenticate the Okta Cli with your Okta developer account. Enter your Okta domain and API token at the prompts $ okta login Okta Org URL: https://your-okta-domain Okta API token: your-api-token 2. Create the app integration $ okta apps create --app-name=mywebapp --redirect-uri=https://<nginx-plus-hostname>:443/_codexch where --app-name defines the application name (here, mywebapp) --redirect-uri defines the URI to which sign-ins are redirected to the NGINX Plus. <nginx-plus-hostname> should resolve to the NGINX Plus external IP configured in part 1. We use port 443 since TLS termination is configured on NGINX Plus from part 1. Recall we used self-signed certificates and keys to configure TLS on NGINX Plus. In a production environment, we recommend using certs/keys issued by a trusted certificate authority such as Let’s Encrypt. Once the command from step #2 is completed, the client ID and Secret generated from the app integration can be found in ${HOME}/.okta.env Configuring NGINX as the OpenID Connect relying party Now that we have finished setting up our IdP, we can now start configuring NGINX Plus as the OpenID Connect relying party. Once logged into the NGINX Plus instance, simply run the configuration script from your home directory. $ ./nginx-openid-connect/configure.sh -h <nginx-plus-hostname> -k request -i <YOURCLIENTID> -s <YOURCLIENTSECRET> –xhttps://dev-xxxxxxx.okta.com/.well-known/openid-configuration where -h defines the hostname of NGINX Plus -k defines how NGINX will retrieve JWK files to validate JWT signatures. The JWK file is retrieved from a subrequest to the IdP -i defines the Client ID generated from the IdP -s defines the Client Secret generated from the IdP -x defines the URL of the OpenID configuration endpoint. Using Okta as the example, the URL starts with your Okta organization domain, followed by the path URI /.well-known/openid-configuration The configure script will generate OIDC config files for NGINX Plus. We will copy the generated config files into the /etc/nginx/conf.d directory from part 1. $ sudo cp frontend.conf openid_connect.js openid_connect.server_conf openid_connect_configuration.conf /etc/nginx/conf.d/ You will notice by default that frontend.conf listens on port 8010 with clear text http. We need to merge kube_lb.conf into frontend.conf to enable both use cases from part 1 and 2. The resulting frontend.conf should look something like this: https://gist.github.com/nginx-gists/af067326734063da6a4ff42146873262 Finally, I will need to edit the openid_connect_configuration.conf file and modify my client secret to the one generated by my Okta IdP. Reload NGINX Plus for the new config to take effect. $ nginx -s reload Testing the Environment Now we are ready to test our environment in action. To summarize, we set up an IdP and configured NGINX Plus as the identity aware proxy to validate user ID tokens before the entering the Kubernetes cluster. To test the environment, we will open a browser and enter the hostname of NGINX Plus into the address field. You should be redirected to your IdP login page. Note: The host name should resolve to the Public IP of the NGINX Plus machine. Once you are prompted with the IdP login page from your browser, you can access the Kubernetes pods once the user credentials are validated. User credentials should be defined from the IdP. Once you are logged into your application, the ID token of the authenticated is stored in the NGINX Plus Key-Value Store. Enabling PKCE with OIDC In the previous section, we learned how to configure NGINX Plus as the OIDC relying party to authenticate user identities attempting connections to protected Kubernetes applications. However, there are few cases where attackers can intercept code exchange requests issued from the IdP and hijack your ID tokens and gain access to your sensitive applications. PKCE is an extension of the OIDC Authorization code flow designed to protect against authorization code interception and theft. PKCE provides an extra layer of security where the attacker will need to provide a code verifier in addition to the authorization code in exchange for the ID token from the IdP. In our current setup, NGINX Plus will send a random generated PKCE code verifier as a query parameter when redirecting users to the IdP login page. The IdP will use this PKCE code verifier as extra validation when the authorization code is exchanged for the ID token. PKCE needs to be enabled from both NGINX Plus and the IdP. To enable PKCE verification on NGINX Plus, edit the openid_connect_configuration.conf file and modify $oidc_pkce_enable to 1 and reload NGINX Plus. Depending on the IdP you are using, a checkbox should be available to enable PKCE. Testing the Environment To test that PKCE is working, we will open a browser and enter the NGINX Plus host name once again. You should be redirected to the login page, only this time you will notice the URL had slightly changed with additional query parameters: code_challenge_method – Method used to hash the plain code verifier (most likely SHA256) code_challenge – The hashed value of the plain code verifier NGINX Plus will provide this plain code verifier along with the authorization code in exchange for the ID token. NGINX Plus will then validate the ID token and store it in cache. Extending OIDC with 3rd party Systems Customers may need to integrate their OIDC workflow with proprietary Auth/Auth systems already in production. For example, additional metadata pertaining to a user may need to be collected from an external Redis Cache or JFrog Artifactory. We can fill this gap by extending our diagram from the previous section. In addition to token validation with NGINX Plus, I pull response data from JFrog Artifactory and pass it to the backend applications once users are authenticated. Note: I am using JFrog Artifactory as an example 3rd party endpoint here. I can technically use any endpoint I want. Testing the Environment To test our environment in action, I will make a few updates to my NGINX OIDC configuration. You can pull our updated GitHub repository and use it as reference for updating your NGINX configuration. Update #1: Adding the njs source code The first update is extending NGINX Plus with njs and retrieving response data from our 3rd party system. Add the KvOperations.js source file in your /etc/nginx/njs directory. Update #2: frontend.conf I am adding lines 37-39 to frontend.conf so that NGINX Plus will initiate the sub-request to our 3rd party system after users are authenticated with OIDC. We are setting the URI of this sub-request to /kvtest. More on this on the next update. Update #3: openid_connect.server_conf We are adding lines 35-48 to openid_connect.server_conf consisting of two internal redirects in NGINX: /kvtest; Internal redirect from sub-requests with URI /kvtest will functions in KvOperations.js /auth; Internal redirect for sub-requests with URI /authwill be proxied to the 3rd party endpoint. You can replace the <artifactory-url> in line 47 with your own endpoints Update #4: openid_connect_configuration.conf This update is optional and applies when passing dynamic variables to your 3rd party endpoints. You can dynamically update variables on the fly by sending POST requests to the NGINX Plus Key-Value store. $ curl -iX POST -d '{"admin", "<input-value>"}' http://localhost:9000/api/7/http/<keyval-zone-name> We are defining/instantiating the new Key-Value store in lines 102-104.Once the updates are complete, I can test the optimized OIDC environment by troubleshooting/verifying the application is on the receiving end of my dynamic input values. Wrapping it up I covered a subset of ZT use cases with NGINX in hybrid architectures. The use cases presented in this article center around authentication. In the next part of our series, I will cover more ZT use case that include: Alternative authentication methods (SAML) Encryption Authorization and Access Control Monitoring/Auditing310Views0likes0CommentsScalable AI Deployment: Harnessing OpenVINO and NGINX Plus for Efficient Inference
Introduction In the realm of artificial intelligence (AI) and machine learning (ML), the need for scalable and efficient AI inference solutions is paramount. As organizations deploy increasingly complex AI models to solve real-world problems, ensuring that these models can handle high volumes of inference requests becomes critical. NGINX Plus serves as a powerful ally in managing incoming traffic efficiently. As a high-performance web server and reverse proxy server, NGINX Plus is adept at load balancing and routing incoming HTTP and TCP traffic across multiple instances of AI model serving environments. The OpenVINO Model Server, powered by Intel's OpenVINO toolkit, is a versatile inference server supporting various deep learning frameworks and hardware acceleration technologies. It allows developers to deploy and serve AI models efficiently, optimizing performance and resource utilization. When combined with NGINX Plus capabilities, developers can create resilient and scalable AI inference solutions capable of handling high loads and ensuring high availability. Health checks allow NGINX Plus to continuously monitor the health of the upstream OVMS instances. If an OVMS instance becomes unhealthy or unresponsive, NGINX Plus can automatically route traffic away from it, ensuring that inference requests are processed only by healthy OVMS instances. Health checks provide real-time insights into the health status of OVMS instances. Administrators can monitor key metrics such as response time, error rate, and availability, allowing them to identify and address issues proactively before they impact service performance. In this article, we'll delve into the symbiotic relationship between the OpenVINO Model Server, and NGINX Plus to construct a robust and scalable AI inference solution. We'll explore setting up the environment, configuring the model server, harnessing NGINX Plus for load balancing, and conducting testing. By the end, readers will gain insights into how to leverage Docker, the OpenVINO Model Server, and NGINX Plus to build scalable AI inference systems tailored to their specific needs. Flow explanation: Now, let's walk through the flow of a typical inference request. When a user submits an image of a zebra for inference, the request first hits the NGINX load balancer. The load balancer then forwards the request to one of the available OpenVINO Model Server containers, distributing the workload evenly across multiple containers. The selected container processes the image using the optimized deep-learning model and returns the inference results to the user. In this case, the object is named zebra. OpenVINO™ Model Server is a scalable, high-performance solution for serving machine learning models optimized for Intel® architectures. The server provides an inference service via gRPC, REST API, or C API -- making it easy to deploy new algorithms and AI experiments. You can visit https://hub.docker.com/u/openvino for reference. Setting up: We'll begin by deploying model servers within containers. For this use case, I'm deploying the model server on a virtual machine (VM). Let's outline the steps to accomplish this: Get the docker image for OpenVINO ONNX run time docker pull openvino/onnxruntime_ep_ubuntu20 You can also visit https://docs.openvino.ai/nightly/ovms_docs_deploying_server.html for OpenVINO model server deployment in a container environment. Begin by creating a docker-compose file following the structure below: https://raw.githubusercontent.com/f5businessdevelopment/F5openVino/main/docker-compose.yml version: '3' services: resnet1: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9001 volumes: - ./models:/models ports: - "9001:9001" resnet2: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9002 volumes: - ./models:/models ports: - "9002:9002" # Add more services for additional containers resnet3: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9003 volumes: - ./models:/models ports: - "9003:9003" resnet4: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9004 volumes: - ./models:/models ports: - "9004:9004" resnet5: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9005 volumes: - ./models:/models ports: - "9005:9005" resnet6: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9006 volumes: - ./models:/models ports: - "9006:9006" resnet7: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9007 volumes: - ./models:/models ports: - "9007:9007" resnet8: image: openvino/model_server:latest command: > --model_name=resnet --model_path=/models/resnet50 --layout=NHWC:NCHW --port=9008 volumes: - ./models:/models ports: - "9008:9008" Make sure you have Docker and Docker Compose installed on your system. Place your model files in the `./models/resnet50` directory on your local machine. Save the provided Docker Compose configuration to a file named `docker-compose.yml`. Run the following command in the directory containing the `docker-compose.yml` file to start the services: docker-compose up -d You can now access the OpenVINO Model Server instances using the specified ports (e.g., `http://localhost:9001` for `resnet1` and `http://localhost:9002` for `resnet2`). - Ensure that the model files are correctly placed in the `./models/resnet50` directory before starting the services. Set up an NGINX Plus proxy server. You can refer to https://docs.nginx.com/nginx/admin-guide/installing-nginx/installing-nginx-plus/ for NGINX Plus installation also You have the option to configure VMs with NGINX Plus on AWS by either: Utilizing the link provided below, which guides you through setting up NGINX Plus on AWS via the AWS Marketplace: NGINX Plus on AWS Marketplace or Following the instructions available on GitHub at the provided repository link. This repository facilitates spinning up VMs using Terraform on AWS and deploying VMs with NGINX Plus under the GitHub repository - F5 OpenVINO The NGINX Plus proxy server functions as a proxy for upstream model servers. Within the upstream block, backend servers (model_servers) are defined along with their respective IP addresses and ports. In the server block, NGINX listens on port 80 to handle incoming HTTP/2 requests targeting the specified server name or IP address. Requests directed to the root location (/) are then forwarded to the upstream model servers utilizing the gRPC protocol. The proxy_set_header directives are employed to maintain client information integrity while passing requests to the backend servers. Ensure to adjust the IP addresses, ports, and server names according to your specific setup. Here is an example configuration that is also available at GitHubhttps://github.com/f5businessdevelopment/F5openVino upstream model_servers { server 172.17.0.1:9001; server 172.17.0.1:9002; server 172.17.0.1:9003; server 172.17.0.1:9004; server 172.17.0.1:9005; server 172.17.0.1:9006; server 172.17.0.1:9007; server 172.17.0.1:9008; zone model_servers 64k; } server { listen 80 http2; server_name 10.0.0.19; # Replace with your domain or public IP location / { grpc_pass grpc://model_servers; health_check type=grpc grpc_status=12; # 12=unimplemented proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; } } If you are using gRPC with SSL please refer to the detailed configuration at NGINX Plus SSL Configuration Here is the explanation: upstream model_servers { server 172.17.0.1:9001; # Docker bridge network IP and port for your container server 172.17.0.1:9002; # Docker bridge network IP and port for your container .... .... } This section defines an upstream block named model_servers, which represents a group of backend servers. In this case, there are two backend servers defined, each with its IP address and port. These servers are typically the endpoints that NGINX will proxy requests to. server { listen 80 http2; server_name 10.1.1.7; # Replace with your domain or public IP This part starts with the main server block. It specifies that NGINX should listen for incoming connections on port 80 using the HTTP/2 protocol (http2), and it binds the server to the IP address 10.1.1.7. Replace this IP address with your domain name or public IP address. location / { grpc_pass grpc://model_servers; health_check type=grpc grpc_status=12; # 12=unimplemented proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; } Within the location/block, NGINX defines how to handle requests to the root location. In this case, it's using gRPC (grpc_pass grpc://model_servers;) to pass the requests to the upstream servers defined in the model_servers block. The proxy_set_header directives are used to set headers that preserve client information when passing requests to the backend servers. These headers include Host, X-Real-IP, and X-Forwarded-For. Health checks with type=grpc enable granular monitoring of individual gRPC services and endpoints. You can verify the health of specific gRPC methods or functionalities, ensuring each service component is functioning correctly. In summary, this NGINX configuration sets up a reverse proxy server that listens for HTTP/2 requests on port 80 and forwards them to backend servers (model_servers) using the gRPC protocol. It's commonly used for load balancing or routing requests to multiple backend servers. Inference Testing: This is how you can conduct testing. On the client side, we utilize a script named predict.py. Below is the script for reference # Import necessary libraries import numpy as np from classes import imagenet_classes from ovmsclient import make_grpc_client # Create a gRPC client to communicate with the server # Replace "10.1.1.7:80" with the IP address and port of your server client = make_grpc_client("10.1.1.7:80") # Open the image file "zebra.jpeg" in binary read mode with open("zebra.jpeg", "rb") as f: img = f.read() # Send the image data to the server for prediction using the "resnet" model output = client.predict({"0": img}, "resnet") # Extract the index of the predicted class with the highest probability result_index = np.argmax(output[0]) # Print the predicted class label using the imagenet_classes dictionary print(imagenet_classes[result_index]) This script imports necessary libraries, establishes a connection to the server at the specified IP address and port, reads an image file named "zebra.jpeg," sends the image data to the server for prediction using the "resnet" model, retrieves the predicted class index with the highest probability, and prints the corresponding class label. Results: Execute the following command from the client machine. Here, we are transmitting this image of Zebra to the model server. python3 predict.py zebra.jpg #run the Inference traffic zebra. The prediction output is 'zebra'. Let's now examine the NGINX Plus logs cat /var/log/nginx/access.log 10.1.1.7 - - [13/Apr/2024:00:18:52 +0000] "POST /tensorflow.serving.PredictionService/Predict HTTP/2.0" 200 4033 "-" "grpc-python/1.62.1 grpc-c/39.0.0 (linux; chttp2)" This log entry shows that a POST request was made to the NGINX server at the specified timestamp, and the server responded with a success status code (200). The request was made using gRPC, as indicated by the user agent string. Conclusion: Using NGINX Plus, organizations can achieve a scalable and efficient AI inference solution. NGINX Plus can address disruptions caused by connection timeouts/errors, sudden spikes in request rates, or changes in network topology. OpenVINO Model Server optimizes model performance and inference speed, utilizing Intel hardware acceleration for enhanced efficiency. NGINX Plus acts as a high-performance load balancer, distributing incoming requests across multiple model server instances for improved scalability and reliability. Together, this enables seamless scaling of AI inference workloads, ensuring optimal performance and resource utilization. You can look at this video for reference: https://youtu.be/Sd99woO9FmQ References: https://hub.docker.com/u/openvino https://docs.nginx.com/nginx/deployment-guides/amazon-web-services/high-availability-keepalived/ https://www.nginx.com/blog/nginx-1-13-10-grpc/ https://github.com/f5businessdevelopment/F5openVino.git https://docs.openvino.ai/nightly/ovms_docs_deploying_server.html741Views1like0Comments