Issue with worker_connections limits in Nginx+
Hello Nginx Community,

We are using Nginx+ for our Load Balancer and have encountered a problem where the current worker_connections limit is insufficient. I need our monitoring system to check the current value of worker_connections for each Nginx worker process to ensure that the active worker_connections are below the maximum allowed. The main issue is that I cannot determine the current number of connections for each Nginx worker process.

In my test configuration, I set worker_connections to 28 (which is a small value used only for easily reproducing the issue). With 32 worker processes, the total capacity should be 32 * 28 = 896 connections.

Using the /api/9/connections endpoint, we can see the total number of active connections:

```json
{
  "accepted": 2062055,
  "dropped": 4568,
  "active": 9,
  "idle": 28
}
```

Despite the relatively low number of active connections, the log file continually reports that worker_connections are insufficient.

Additionally, as of Nginx+ R30, there is an endpoint providing per-worker connection statistics (accepted, dropped, active, and idle connections, plus total and current requests). However, the reported values for active connections are much lower than 28:

```sh
$ curl -s http://<some_ip>/api/9/workers | jq | grep active
  "active": 2,
  "active": 0,
  "active": 1,
  "active": 2,
  "active": 1,
  "active": 1,
  "active": 0,
  "active": 0,
  "active": 3,
  "active": 0,
  "active": 0,
  "active": 0,
  "active": 2,
  "active": 2,
  "active": 0,
  "active": 1,
  "active": 0,
  "active": 0,
  "active": 0,
  "active": 0,
  "active": 0,
  "active": 0,
  "active": 0,
  "active": 2,
  "active": 1,
  "active": 2,
  "active": 1,
  "active": 0,
  "active": 1,
  "active": 0,
  "active": 0,
  "active": 1,
```

Could you please help us understand why the active connections are reported as lower than the limit, yet we receive logs indicating that worker_connections are not enough? Thank you for your assistance.
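For reference, the per-worker statistics described above can be pulled into a simple monitoring check along these lines. This is only a sketch: the nesting of the connection counters under a `connections` object is an assumption about the R30 workers API response, and the limit of 28 is taken from the test configuration above.

```sh
# Sketch: report each worker's active/idle connections against the configured limit.
# Assumptions: NGINX Plus R30+, /api/9/workers reachable, counters nested under ".connections".
LIMIT=28
curl -s http://<some_ip>/api/9/workers | \
  jq -r --argjson limit "$LIMIT" \
    '.[] | "worker \(.id): active=\(.connections.active) idle=\(.connections.idle) limit=\($limit)"'
```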
NGINX Virtual Machine Building with cloud-init

Traditionally, building new servers was a manual process. A system administrator had a run book with all the steps required and would perform each task one by one. If the admin had multiple servers to build, the same steps were repeated over and over. All public cloud compute platforms provide an automation tool called cloud-init that makes it easy to automate configuration tasks while a new VM instance is being launched. In this article, you will learn how to automate the process of building out a new NGINX Plus server using cloud-init.
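As a taste of what that looks like, here is a minimal cloud-init user-data sketch that bootstraps NGINX Plus on first boot. It is illustrative only: the repository URLs and helper files follow the public NGINX Plus installation guide for Ubuntu at the time of writing, and the repository certificate and key placeholders must be replaced with your own; verify the exact steps against the current documentation for your distribution.

```yaml
#cloud-config
# Illustrative sketch: install NGINX Plus on an Ubuntu VM at launch time.
write_files:
  - path: /etc/ssl/nginx/nginx-repo.crt     # your NGINX Plus repo certificate
    permissions: '0600'
    content: |
      <your NGINX Plus repository certificate>
  - path: /etc/ssl/nginx/nginx-repo.key     # your NGINX Plus repo key
    permissions: '0600'
    content: |
      <your NGINX Plus repository key>
runcmd:
  - apt-get update && apt-get install -y apt-transport-https lsb-release ca-certificates wget gnupg2
  - wget -qO - https://cs.nginx.com/static/keys/nginx_signing.key | gpg --dearmor > /usr/share/keyrings/nginx-archive-keyring.gpg
  - printf "deb [signed-by=/usr/share/keyrings/nginx-archive-keyring.gpg] https://pkgs.nginx.com/plus/ubuntu %s nginx-plus\n" "$(lsb_release -cs)" > /etc/apt/sources.list.d/nginx-plus.list
  - wget -q -O /etc/apt/apt.conf.d/90pkgs-nginx https://cs.nginx.com/static/files/90pkgs-nginx
  - apt-get update && apt-get install -y nginx-plus
  - systemctl enable --now nginx
```

You would pass this file as the instance user data when launching the VM; cloud-init runs it once on first boot.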
Securing and Scaling Hybrid Apps with F5/NGINX (Part 3)

In part 2 of our series, I demonstrated how to configure ZT (Zero Trust) use cases centering around authentication with NGINX Plus in hybrid environments. We deployed NGINX Plus as the external LB to route and authenticate users connecting to our Kubernetes applications. In this article, we explore other areas of the ZT spectrum configurable on the External LB Service, including:

- Authorization and access
- Encryption (mTLS)
- Monitoring/auditing

ZT Use case #1: Authorization

Many people think that authentication and authorization can be used interchangeably. However, they mean different things. Authentication is the process of verifying user identities based on the credentials presented. Even though authenticated users are verified by the system, they do not necessarily have the authority to access protected applications. That is where authorization comes into play. Authorization is the process of verifying the authority of an identity before granting access to an application.

Authorization in the context of OIDC authentication involves retrieving claims from user ID tokens and setting conditions to validate whether the user is authorized to enter the system. An authenticated user is granted an ID token from the IdP with specific user information through JWT claims. The configuration of these claims is typically set from the IdP.

Revisiting the OIDC auth use case configured in the previous section, we can retrieve the ID tokens of authenticated users from the NGINX key-value store.

```sh
$ curl -i http://localhost:8010/api/9/http/keyvals/oidc_access_tokens
```

Then we can view the decoded value of the ID token using jwt.io. Below is an example of decoded payload data from the ID token.

```json
{
  "exp": 1716219261,
  "iat": 1716219201,
  "admin": true,
  "name": "Micash",
  "zone_info": "America/Los_Angeles",
  "jti": "9f8ff4bd-4857-4e12-9634-e5876f786f98",
  "iss": "http://idp.f5lab.com:8080/auth/realms/master",
  "aud": "account",
  "typ": "Bearer",
  "azp": "appworld2024",
  "nonce": "gMNK3tu06j6tp5-jGa3aRhkj4F0P-Z3e04UfcFeqbes"
}
```

NGINX Plus has access to these claims as embedded variables. They are accessed by prefixing $jwt_claim_ to the desired field (for example, $jwt_claim_admin for the admin claim). We can easily set conditions on these claims and block unauthorized users before they even reach the back-end applications.

Going back to our frontend.conf file from the previous part of our series, we can set the $jwt_status variable to 0 or 1 based on the value of the admin JWT claim. We then use the auth_jwt_require directive to validate the ID token. ID tokens with the admin claim set to false will be rejected.

```nginx
map $jwt_claim_admin $jwt_status {
    "true"  1;
    default 0;
}

server {
    include conf.d/openid_connect.server_conf; # Authorization code flow and Relying Party processing
    error_log /var/log/nginx/error.log debug;  # Reduce severity level as required

    listen [::]:443 ssl ipv6only=on;
    listen 443 ssl;
    server_name example.work.gd;

    ssl_certificate /etc/ssl/nginx/default.crt; # self-signed for example only
    ssl_certificate_key /etc/ssl/nginx/default.key;

    location / {
        # This site is protected with OpenID Connect
        auth_jwt "" token=$session_jwt;
        error_page 401 = @do_oidc_flow;

        auth_jwt_key_request /_jwks_uri; # Enable when using URL

        auth_jwt_require $jwt_status;

        proxy_pass https://cluster1-https; # The backend site/app
    }
}
```

Note: Authorization with NGINX Plus is not restricted to JWT tokens. You can set conditions on a variety of attributes, such as the following (an illustrative sketch follows the list):

- Session cookies
- HTTP headers
- Source/destination IP addresses
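As an illustration (not from the original article), the same auth_jwt_require pattern can combine the JWT claim with other request attributes. The trusted IP range and header name below are assumptions made for the sketch only.

```nginx
# Sketch: gate access on source IP and a custom header in addition to the admin JWT claim.
geo $ip_allowed {
    default    0;
    10.0.0.0/8 1;   # assumed internal range -- adjust to your network
}

map $http_x_internal_tool $header_allowed {
    "true"  1;
    default 0;
}

server {
    include conf.d/openid_connect.server_conf;
    listen 443 ssl;
    server_name example.work.gd;
    ssl_certificate /etc/ssl/nginx/default.crt;
    ssl_certificate_key /etc/ssl/nginx/default.key;

    location / {
        auth_jwt "" token=$session_jwt;
        error_page 401 = @do_oidc_flow;
        auth_jwt_key_request /_jwks_uri;
        # All listed values must be non-empty and non-zero for the request to pass
        auth_jwt_require $jwt_status $ip_allowed $header_allowed;
        proxy_pass https://cluster1-https;
    }
}
```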
ZT use case #2: Mutual TLS Authentication (mTLS)

When it comes to ZT, mTLS is one of the mainstream use cases falling under the Zero Trust umbrella. For example, enterprises are using Service Mesh technologies to stay compliant with ZT standards, because Service Mesh technologies aim to secure service-to-service communication using mTLS. In many ways, mTLS is similar to the OIDC use case we implemented in the previous section; only here, we are leveraging digital certificates to encrypt and authenticate traffic. This underlying framework is defined by PKI (Public Key Infrastructure).

To explain this framework in simple terms, consider a familiar example: the driver's license you carry in your wallet. Your driver's license can be used to validate your identity, the same way digital certificates can be used to validate the identity of applications. Similarly, only the state can issue valid driver's licenses, the same way only Certificate Authorities (CAs) can issue valid certificates to applications. Because only the CA may issue valid certificates, every CA must protect a private key used to sign and issue them.

Configuring mTLS with NGINX can be broken down into two parts:

- Ingress mTLS: securing SSL client traffic and validating client certificates against a trusted CA.
- Egress mTLS: securing SSL upstream traffic and offloading authentication of TLS material to a trusted HTTPS back-end server.

Ingress mTLS

You can configure ingress mTLS on the NLK deployment by referencing the trusted certificate authority with the ssl_client_certificate directive in the server context. This configures NGINX to validate client certificates against the referenced CA.

Note: If you do not have a CA, you can create one using OpenSSL or the Cloudflare PKI and TLS toolkits (see the sketch after the configuration examples below).

```nginx
server {
    listen 443 ssl;
    status_zone https://cafe.example.com;
    server_name cafe.example.com;

    ssl_certificate /etc/ssl/nginx/default.crt;
    ssl_certificate_key /etc/ssl/nginx/default.key;

    ssl_client_certificate /etc/ssl/ca.crt;
    ssl_verify_client on;   # request and verify client certificates
}
```

Egress mTLS

Egress mTLS is a slight alternative to ingress mTLS where NGINX verifies the certificates of upstream applications rather than certificates originating from clients. This feature can be enabled by adding the proxy_ssl_trusted_certificate directive to the server context. You can reference the same trusted CA we used for verification when configuring ingress mTLS, or reference a different CA. In addition to verifying server certificates, NGINX as a reverse proxy can present its own cert/key so that verification can be offloaded to the HTTPS upstream applications. This can be done by adding the proxy_ssl_certificate and proxy_ssl_certificate_key directives in the server context.

```nginx
server {
    listen 443 ssl;
    status_zone https://cafe.example.com;
    server_name cafe.example.com;

    ssl_certificate /etc/ssl/nginx/default.crt;
    ssl_certificate_key /etc/ssl/nginx/default.key;

    # Ingress mTLS
    ssl_client_certificate /etc/ssl/ca.crt;
    ssl_verify_client on;

    # Egress mTLS
    proxy_ssl_certificate /etc/nginx/secrets/default-egress.crt;
    proxy_ssl_certificate_key /etc/nginx/secrets/default-egress.key;
    proxy_ssl_trusted_certificate /etc/nginx/secrets/default-egress-ca.crt;
    proxy_ssl_verify on;   # verify the upstream certificate against the trusted CA above
}
```
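If you need a throwaway CA and client certificate to try ingress mTLS, an OpenSSL sketch like the following can be used. The file names and subjects are illustrative only.

```sh
# Create a private test CA
openssl genrsa -out ca.key 4096
openssl req -x509 -new -key ca.key -sha256 -days 365 -subj "/CN=test-ca" -out ca.crt

# Create and sign a client certificate
openssl genrsa -out client.key 2048
openssl req -new -key client.key -subj "/CN=test-client" -out client.csr
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
    -sha256 -days 365 -out client.crt

# Test against the mTLS-protected endpoint
curl --cert client.crt --key client.key --cacert ca.crt https://cafe.example.com
```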
ZT use case #3: Secure Assertion Markup Language (SAML)

SAML (Security Assertion Markup Language) is an alternative SSO solution to OIDC. Many organizations choose between SAML and OIDC depending on their requirements and the IdPs they currently run in production. SAML requires an SP (Service Provider) to exchange XML messages via HTTP POST binding with a SAML IdP. Once exchanges between the SP and IdP are successful, the user gains session access to the protected backend applications with one set of user credentials.

In this section, we will configure NGINX Plus as the SP and enable SAML with the IdP. This is similar to how we configured NGINX Plus as the relying party in an OIDC authorization code flow (see ZT Use case #1).

Setting up the IdP

The one prerequisite is setting up your IdP. In our example, we will set up Microsoft Entra ID on Azure. You can use the SAML IdP of your choosing. Once the SAML application is created in your IdP, you can access the SSO fields necessary to link your SP (NGINX Plus) to your IdP (Microsoft Entra ID). You will need to edit the basic SAML configuration by clicking the pencil icon next to Edit in Basic SAML Configuration. Add the following values and click Save:

- Identifier (Entity ID): https://fourth.run.place
- Reply URL (Assertion Consumer Service URL): https://fourth.run.place/saml/acs
- Sign on URL: https://fourth.run.place
- Logout URL (Optional): https://fourth.run.place/saml/sls

Finally, download the Certificate (Raw) from Microsoft Entra ID and save it to your NGINX Plus instance. This certificate is used to verify signed SAML assertions received from the IdP. Once the certificate is saved on the NGINX Plus instance, extract the public key from the downloaded certificate and convert it to SPKI format. We will use this certificate later when we configure NGINX Plus in the next section.

```sh
$ openssl x509 -in demo-nginx.der -outform DER -out demo-nginx.der
$ openssl x509 -inform DER -in demo-nginx.der -pubkey -noout > demo-nginx.spki
```

Configuring NGINX Plus as the SAML Service Provider

After the IdP is set up, we can configure NGINX Plus as the SP to exchange and validate XML messages with the IdP. Once logged into the NGINX Plus instance, simply clone the NGINX SAML GitHub repo.

```sh
$ git clone https://github.com/nginxinc/nginx-saml.git && cd nginx-saml
```

Copy the config files into the /etc/nginx/conf.d directory.

```sh
$ cp frontend.conf saml_sp.js saml_sp.server_conf saml_sp_configuration.conf /etc/nginx/conf.d/
```

Notice that by default, frontend.conf listens on port 8010 with clear-text HTTP. You can merge kube_lb.conf into frontend.conf to enable TLS termination, and update the upstream context with the application endpoints you wish to protect with SAML.

Finally, we will need to edit the saml_sp_configuration.conf file and update the variables in the map context based on the parameters of your SP and IdP (an illustrative excerpt of the map blocks follows the list):

- $saml_sp_entity_id: https://fourth.run.place
- $saml_sp_acs_url: https://fourth.run.place/saml/acs
- $saml_sp_sign_authn: false
- $saml_sp_want_signed_response: false
- $saml_sp_want_signed_assertion: true
- $saml_sp_want_encrypted_assertion: false
- $saml_idp_entity_id: the unique identifier that identifies the IdP to the SP. This value is retrieved from your IdP.
- $saml_idp_sso_url: the login URL, also retrieved from the IdP.
- $saml_idp_verification_certificate: the certificate downloaded in the previous section when setting up the IdP; it verifies signed assertions received from the IdP. Use the full path (/etc/nginx/conf.d/demo-nginx.spki).
- $saml_sp_slo_url: https://fourth.run.place/saml/sls
- $saml_idp_slo_url: the logout URL retrieved from the IdP.
- $saml_sp_want_signed_slo: true

The remaining variables defined in saml_sp_configuration.conf can be left unchanged, unless there is a specific requirement for enabling them.
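For orientation, the variables above are set inside map blocks in saml_sp_configuration.conf, roughly like the excerpt below. This is a sketch using the example values from this article; the repository's file defines many more variables and may differ in detail.

```nginx
# Illustrative excerpt of the map context in saml_sp_configuration.conf
map $host $saml_sp_entity_id {
    default "https://fourth.run.place";
}
map $host $saml_sp_acs_url {
    default "https://fourth.run.place/saml/acs";
}
map $host $saml_idp_verification_certificate {
    default "/etc/nginx/conf.d/demo-nginx.spki";
}
```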
Once the variables are set appropriately, we can reload NGINX Plus.

```sh
$ nginx -s reload
```

Testing

Now we will verify the SAML flow. Open your browser and enter https://fourth.run.place in the address bar. This should redirect you to the IdP login page. Once you log in with your credentials, you should be granted access to the protected application.

ZT use case #4: Monitoring/Auditing

NGINX logs/metrics can be exported to a variety of third-party providers, including Splunk, Prometheus/Grafana, cloud providers (AWS CloudWatch and Azure Monitor Logs), Datadog, the ELK stack, and more. You can monitor NGINX metrics and logs natively with NGINX Instance Manager or NGINX SaaS. The NGINX Plus API provides a lot of flexibility by exporting metrics to any third-party tool that accepts JSON. For example, you can export NGINX Plus API metrics to our native real-time dashboard from part 1. Whichever tool you choose, monitoring and auditing the data generated by your IT systems is key to understanding and optimizing your applications.

Conclusion

Cloud providers offer a convenient way to expose Kubernetes Services to the internet. Simply create a Kubernetes Service of type LoadBalancer, and external users connect to your services via a public entry point. However, cloud load balancers do nothing more than basic TCP/HTTP load balancing. You can configure NGINX Plus with many Zero Trust capabilities as you scale out your environment to multiple clusters in different regions, which is what we will cover in the next part of our series.
Securing and Scaling Hybrid apps with F5 NGINX (Part 2)

If you attended a cybersecurity trade show lately, you may have noticed the term "Zero Trust (ZT)" advertised on almost every booth. It may seem like most security companies are offering the same value proposition: "securing apps with ZT". Its commonality stems from the fact that ZT is a broad term that can span endless use cases. ZT is not a feature or capability, but rather a philosophy embraced by IT security leaders based on the idea that all traffic entering and exiting a system is not trusted and must be scrutinized before passing through. Organizations are shifting to a zero-trust mindset due to the increased complexity of cyber-attacks. Perimeter-based firewalls are no longer sufficient for securing digital resources.

In Part 1 of our series, we configured NGINX Plus as an external load balancer to route and terminate TLS traffic to cluster nodes. In this part of the series, we leverage the same NGINX Plus deployment to enable ZT use cases that will improve the security posture of your hybrid applications.

NOTE: Part 1 of the series is a prerequisite for enabling the ZT use cases in our examples. Please ensure that Part 1 is completed before starting with Part 2.

ZT Use case #1: OIDC Authentication

OIDC (OpenID Connect) is an authentication layer on top of the OAuth 2.0 framework. Many organizations will choose OIDC to authenticate digital identities and enable SSO (Single Sign-On) for consumer applications. With single sign-on technologies, users gain access to multiple applications with one set of user credentials by authenticating their identities through an IdP (Identity Provider).

I can configure NGINX Plus to operate as an OIDC relying party to exchange and validate ID tokens with the IdP, in addition to the basic reverse-proxy load balancing configured in Part 1. I will extend the architecture from Part 1 with an IdP and configure NGINX Plus as the identity-aware proxy.

Prerequisites for NGINX Plus

Before configuring NGINX Plus as the OIDC identity-aware proxy:

1. Install the NJS module.

```sh
$ sudo apt-get install nginx-plus-module-njs
```

2. Load the NJS module into the NGINX configuration by adding the following line at the top of your nginx.conf file.

```nginx
load_module modules/ngx_http_js_module.so;
```

3. Clone the OIDC GitHub repository in your directory of choice.

```sh
cd /home/ubuntu && git clone --branch R28 https://github.com/nginxinc/nginx-openid-connect.git
```

Setting up the IdP

The IdP will manage and store digital identities to prevent attackers from impersonating users and stealing sensitive information. There are many IdP vendors to choose from: Okta, Ping Identity, Azure AD. We chose Okta as the IdP in our example moving forward.

If you do not have access to an IdP, you can quickly get started with the Okta Command Line Interface (CLI) and run the okta register command to sign up for a new account. Once account creation is successful, we will use the Okta CLI to preconfigure Okta as the IdP, creating what Okta calls an app integration. Other IdPs use different nomenclature for an application integration; for example, Azure AD calls them app registrations. If you are not using Okta, you can follow the documentation of the IdP you are using and skip to the next section (Configuring NGINX as the OpenID Connect relying party).

1. Run the okta login command to authenticate the Okta CLI with your Okta developer account. Enter your Okta domain and API token at the prompts.

```sh
$ okta login
Okta Org URL: https://your-okta-domain
Okta API token: your-api-token
```
2. Create the app integration.

```sh
$ okta apps create --app-name=mywebapp --redirect-uri=https://<nginx-plus-hostname>:443/_codexch
```

where:

- --app-name defines the application name (here, mywebapp)
- --redirect-uri defines the URI to which sign-ins are redirected on NGINX Plus. <nginx-plus-hostname> should resolve to the NGINX Plus external IP configured in Part 1. We use port 443 since TLS termination is configured on NGINX Plus from Part 1. Recall that we used self-signed certificates and keys to configure TLS on NGINX Plus. In a production environment, we recommend using certs/keys issued by a trusted certificate authority such as Let's Encrypt.

Once the command from step #2 is completed, the client ID and secret generated from the app integration can be found in ${HOME}/.okta.env.

Configuring NGINX as the OpenID Connect relying party

Now that we have finished setting up our IdP, we can start configuring NGINX Plus as the OpenID Connect relying party. Once logged into the NGINX Plus instance, simply run the configuration script from your home directory.

```sh
$ ./nginx-openid-connect/configure.sh -h <nginx-plus-hostname> -k request -i <YOURCLIENTID> -s <YOURCLIENTSECRET> -x https://dev-xxxxxxx.okta.com/.well-known/openid-configuration
```

where:

- -h defines the hostname of NGINX Plus
- -k defines how NGINX will retrieve JWK files to validate JWT signatures. The JWK file is retrieved from a subrequest to the IdP.
- -i defines the client ID generated from the IdP
- -s defines the client secret generated from the IdP
- -x defines the URL of the OpenID configuration endpoint. Using Okta as the example, the URL starts with your Okta organization domain, followed by the path URI /.well-known/openid-configuration.

The configure script will generate OIDC config files for NGINX Plus. We will copy the generated config files into the /etc/nginx/conf.d directory from Part 1.

```sh
$ sudo cp frontend.conf openid_connect.js openid_connect.server_conf openid_connect_configuration.conf /etc/nginx/conf.d/
```

You will notice that by default, frontend.conf listens on port 8010 with clear-text HTTP. We need to merge kube_lb.conf into frontend.conf to enable both use cases from Part 1 and Part 2. The resulting frontend.conf should look something like this: https://gist.github.com/nginx-gists/af067326734063da6a4ff42146873262

Finally, I will need to edit the openid_connect_configuration.conf file and change the client secret to the one generated by my Okta IdP. Reload NGINX Plus for the new config to take effect.

```sh
$ nginx -s reload
```

Testing the Environment

Now we are ready to test our environment in action. To summarize, we set up an IdP and configured NGINX Plus as the identity-aware proxy to validate user ID tokens before they enter the Kubernetes cluster. To test the environment, we will open a browser and enter the hostname of NGINX Plus into the address field. You should be redirected to your IdP login page.

Note: The hostname should resolve to the public IP of the NGINX Plus machine.

Once you are prompted with the IdP login page from your browser, you can access the Kubernetes pods once the user credentials are validated. User credentials should be defined in the IdP. Once you are logged into your application, the ID token of the authenticated user is stored in the NGINX Plus key-value store.
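Two quick checks can confirm the flow without a browser. These are illustrative only: the keyval zone name and the API port/version are assumptions based on the reference nginx-openid-connect configuration and may differ in your build.

```sh
# An unauthenticated request should be answered with a redirect to the IdP
curl -k -I https://<nginx-plus-hostname>/

# Inspect the stored ID tokens through the NGINX Plus API
# (assumes the API is reachable on port 8010 as in the reference config)
curl -s http://localhost:8010/api/9/http/keyvals/oidc_id_tokens | jq
```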
Enabling PKCE with OIDC

In the previous section, we learned how to configure NGINX Plus as the OIDC relying party to authenticate user identities attempting connections to protected Kubernetes applications. However, there are cases where attackers can intercept code exchange requests issued from the IdP, hijack your ID tokens, and gain access to your sensitive applications.

PKCE is an extension of the OIDC authorization code flow designed to protect against authorization code interception and theft. PKCE provides an extra layer of security where the attacker must provide a code verifier in addition to the authorization code in exchange for the ID token from the IdP.

In our current setup, NGINX Plus will send a randomly generated PKCE code verifier as a query parameter when redirecting users to the IdP login page. The IdP will use this PKCE code verifier as extra validation when the authorization code is exchanged for the ID token.

PKCE needs to be enabled on both NGINX Plus and the IdP. To enable PKCE verification on NGINX Plus, edit the openid_connect_configuration.conf file, set $oidc_pkce_enable to 1, and reload NGINX Plus. Depending on the IdP you are using, a checkbox should be available to enable PKCE.

Testing the Environment

To test that PKCE is working, we will open a browser and enter the NGINX Plus hostname once again. You should be redirected to the login page, only this time you will notice that the URL has slightly changed, with additional query parameters:

- code_challenge_method: the method used to hash the plain code verifier (most likely SHA256)
- code_challenge: the hashed value of the plain code verifier

NGINX Plus will provide the plain code verifier along with the authorization code in exchange for the ID token. NGINX Plus will then validate the ID token and store it in cache.

Extending OIDC with 3rd-party Systems

Customers may need to integrate their OIDC workflow with proprietary Auth/Auth systems already in production. For example, additional metadata pertaining to a user may need to be collected from an external Redis cache or JFrog Artifactory. We can fill this gap by extending our diagram from the previous section. In addition to token validation with NGINX Plus, I pull response data from JFrog Artifactory and pass it to the backend applications once users are authenticated.

Note: I am using JFrog Artifactory as an example 3rd-party endpoint here. I can technically use any endpoint I want.

Testing the Environment

To test our environment in action, I will make a few updates to my NGINX OIDC configuration. You can pull our updated GitHub repository and use it as a reference for updating your NGINX configuration.

Update #1: Adding the njs source code

The first update extends NGINX Plus with njs and retrieves response data from our 3rd-party system. Add the KvOperations.js source file to your /etc/nginx/njs directory.

Update #2: frontend.conf

I am adding lines 37-39 to frontend.conf so that NGINX Plus will initiate the sub-request to our 3rd-party system after users are authenticated with OIDC. We are setting the URI of this sub-request to /kvtest. More on this in the next update.

Update #3: openid_connect.server_conf

We are adding lines 35-48 to openid_connect.server_conf, consisting of two internal redirects in NGINX (an illustrative njs sketch follows the list):

- /kvtest: sub-requests with URI /kvtest will call functions in KvOperations.js
- /auth: sub-requests with URI /auth will be proxied to the 3rd-party endpoint. You can replace the <artifactory-url> in line 47 with your own endpoints.
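For orientation, a handler wired to the /kvtest location might look roughly like the sketch below. This is not the repository's actual KvOperations.js; the header name and response handling are assumptions made purely for illustration.

```js
// Illustrative njs sketch: call the internal /auth location (proxied to the
// 3rd-party endpoint) and forward selected response data.
async function kvTest(r) {
    try {
        const reply = await r.subrequest('/auth', { method: 'GET' });
        if (reply.status !== 200) {
            r.return(reply.status);
            return;
        }
        // Hypothetical header carrying the metadata retrieved from the endpoint
        r.headersOut['X-User-Metadata'] = reply.responseText;
        r.return(200);
    } catch (e) {
        r.error(`subrequest to /auth failed: ${e.message}`);
        r.return(502);
    }
}

export default { kvTest };
```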
Update #4: openid_connect_configuration.conf

This update is optional and applies when passing dynamic variables to your 3rd-party endpoints. You can dynamically update variables on the fly by sending POST requests to the NGINX Plus key-value store.

```sh
$ curl -iX POST -d '{"admin": "<input-value>"}' http://localhost:9000/api/7/http/<keyval-zone-name>
```

We are defining/instantiating the new key-value store in lines 102-104. Once the updates are complete, I can test the optimized OIDC environment by verifying that the application is on the receiving end of my dynamic input values.

Wrapping it up

I covered a subset of ZT use cases with NGINX in hybrid architectures. The use cases presented in this article center around authentication. In the next part of our series, I will cover more ZT use cases, including:

- Alternative authentication methods (SAML)
- Encryption
- Authorization and access control
- Monitoring/auditing
Scalable AI Deployment: Harnessing OpenVINO and NGINX Plus for Efficient Inference

Introduction

In the realm of artificial intelligence (AI) and machine learning (ML), the need for scalable and efficient AI inference solutions is paramount. As organizations deploy increasingly complex AI models to solve real-world problems, ensuring that these models can handle high volumes of inference requests becomes critical.

NGINX Plus serves as a powerful ally in managing incoming traffic efficiently. As a high-performance web server and reverse proxy, NGINX Plus is adept at load balancing and routing incoming HTTP and TCP traffic across multiple instances of AI model-serving environments.

The OpenVINO Model Server, powered by Intel's OpenVINO toolkit, is a versatile inference server supporting various deep learning frameworks and hardware acceleration technologies. It allows developers to deploy and serve AI models efficiently, optimizing performance and resource utilization. When combined with NGINX Plus capabilities, developers can create resilient and scalable AI inference solutions capable of handling high loads and ensuring high availability.

Health checks allow NGINX Plus to continuously monitor the health of the upstream OVMS instances. If an OVMS instance becomes unhealthy or unresponsive, NGINX Plus can automatically route traffic away from it, ensuring that inference requests are processed only by healthy OVMS instances. Health checks also provide real-time insights into the health status of OVMS instances. Administrators can monitor key metrics such as response time, error rate, and availability, allowing them to identify and address issues proactively before they impact service performance.

In this article, we'll delve into the symbiotic relationship between the OpenVINO Model Server and NGINX Plus to construct a robust and scalable AI inference solution. We'll explore setting up the environment, configuring the model server, harnessing NGINX Plus for load balancing, and conducting testing. By the end, readers will gain insights into how to leverage Docker, the OpenVINO Model Server, and NGINX Plus to build scalable AI inference systems tailored to their specific needs.

Flow explanation

Now, let's walk through the flow of a typical inference request. When a user submits an image of a zebra for inference, the request first hits the NGINX load balancer. The load balancer forwards the request to one of the available OpenVINO Model Server containers, distributing the workload evenly across multiple containers. The selected container processes the image using the optimized deep-learning model and returns the inference results to the user; in this case, the object is identified as a zebra.

OpenVINO™ Model Server is a scalable, high-performance solution for serving machine learning models optimized for Intel® architectures. The server provides an inference service via gRPC, REST API, or C API, making it easy to deploy new algorithms and AI experiments. You can visit https://hub.docker.com/u/openvino for reference.

Setting up

We'll begin by deploying model servers within containers. For this use case, I'm deploying the model server on a virtual machine (VM). Let's outline the steps to accomplish this.

Get the Docker image for the OpenVINO ONNX runtime:

```sh
docker pull openvino/onnxruntime_ep_ubuntu20
```

You can also visit https://docs.openvino.ai/nightly/ovms_docs_deploying_server.html for OpenVINO Model Server deployment in a container environment.
Begin by creating a docker-compose file following the structure below (also available at https://raw.githubusercontent.com/f5businessdevelopment/F5openVino/main/docker-compose.yml):

```yaml
version: '3'
services:
  resnet1:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9001
    volumes:
      - ./models:/models
    ports:
      - "9001:9001"
  resnet2:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9002
    volumes:
      - ./models:/models
    ports:
      - "9002:9002"
  # Add more services for additional containers
  resnet3:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9003
    volumes:
      - ./models:/models
    ports:
      - "9003:9003"
  resnet4:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9004
    volumes:
      - ./models:/models
    ports:
      - "9004:9004"
  resnet5:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9005
    volumes:
      - ./models:/models
    ports:
      - "9005:9005"
  resnet6:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9006
    volumes:
      - ./models:/models
    ports:
      - "9006:9006"
  resnet7:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9007
    volumes:
      - ./models:/models
    ports:
      - "9007:9007"
  resnet8:
    image: openvino/model_server:latest
    command: >
      --model_name=resnet
      --model_path=/models/resnet50
      --layout=NHWC:NCHW
      --port=9008
    volumes:
      - ./models:/models
    ports:
      - "9008:9008"
```

Make sure you have Docker and Docker Compose installed on your system. Place your model files in the `./models/resnet50` directory on your local machine, and save the provided Docker Compose configuration to a file named `docker-compose.yml`. Run the following command in the directory containing the `docker-compose.yml` file to start the services:

```sh
docker-compose up -d
```

You can now access the OpenVINO Model Server instances using the specified ports (e.g., `http://localhost:9001` for `resnet1` and `http://localhost:9002` for `resnet2`). Ensure that the model files are correctly placed in the `./models/resnet50` directory before starting the services.

Next, set up an NGINX Plus proxy server. You can refer to https://docs.nginx.com/nginx/admin-guide/installing-nginx/installing-nginx-plus/ for NGINX Plus installation. You also have the option to configure VMs with NGINX Plus on AWS by either:

- Utilizing the link below, which guides you through setting up NGINX Plus on AWS via the AWS Marketplace: NGINX Plus on AWS Marketplace, or
- Following the instructions available in the provided GitHub repository, which facilitates spinning up VMs using Terraform on AWS and deploying VMs with NGINX Plus: F5 OpenVINO

The NGINX Plus proxy server functions as a proxy for the upstream model servers. Within the upstream block, backend servers (model_servers) are defined along with their respective IP addresses and ports. In the server block, NGINX listens on port 80 to handle incoming HTTP/2 requests targeting the specified server name or IP address. Requests directed to the root location (/) are then forwarded to the upstream model servers using the gRPC protocol.
The proxy_set_header directives are employed to maintain client information integrity while passing requests to the backend servers. Be sure to adjust the IP addresses, ports, and server names according to your specific setup. Here is an example configuration, also available at GitHub: https://github.com/f5businessdevelopment/F5openVino

```nginx
upstream model_servers {
    server 172.17.0.1:9001;
    server 172.17.0.1:9002;
    server 172.17.0.1:9003;
    server 172.17.0.1:9004;
    server 172.17.0.1:9005;
    server 172.17.0.1:9006;
    server 172.17.0.1:9007;
    server 172.17.0.1:9008;
    zone model_servers 64k;
}

server {
    listen 80 http2;
    server_name 10.0.0.19;  # Replace with your domain or public IP

    location / {
        grpc_pass grpc://model_servers;
        health_check type=grpc grpc_status=12; # 12=unimplemented
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

If you are using gRPC with SSL, please refer to the detailed configuration at NGINX Plus SSL Configuration.

Here is the explanation:

```nginx
upstream model_servers {
    server 172.17.0.1:9001; # Docker bridge network IP and port for your container
    server 172.17.0.1:9002; # Docker bridge network IP and port for your container
    ...
}
```

This section defines an upstream block named model_servers, which represents a group of backend servers. In this excerpt, two backend servers are shown, each with its IP address and port (the full configuration lists all eight). These servers are the endpoints that NGINX will proxy requests to.

```nginx
server {
    listen 80 http2;
    server_name 10.1.1.7;  # Replace with your domain or public IP
```

This part starts the main server block. It specifies that NGINX should listen for incoming connections on port 80 using the HTTP/2 protocol (http2), and it binds the server to the IP address 10.1.1.7. Replace this IP address with your domain name or public IP address.

```nginx
    location / {
        grpc_pass grpc://model_servers;
        health_check type=grpc grpc_status=12; # 12=unimplemented
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
```

Within the location / block, NGINX defines how to handle requests to the root location. In this case, it uses gRPC (grpc_pass grpc://model_servers;) to pass requests to the upstream servers defined in the model_servers block. The proxy_set_header directives set headers that preserve client information when passing requests to the backend servers. These headers include Host, X-Real-IP, and X-Forwarded-For. Health checks with type=grpc enable granular monitoring of individual gRPC services and endpoints. You can verify the health of specific gRPC methods or functionalities, ensuring each service component is functioning correctly.

In summary, this NGINX configuration sets up a reverse proxy server that listens for HTTP/2 requests on port 80 and forwards them to backend servers (model_servers) using the gRPC protocol. It's commonly used for load balancing or routing requests to multiple backend servers.
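To observe the health checks in action, you can query the NGINX Plus API for the upstream state. This is a sketch only: it assumes the API module is enabled (for example, via an api endpoint in a server block listening on port 9000), and the API version and port should be adjusted to your installation.

```sh
# Illustrative check of upstream peer state and health-check counters
curl -s http://localhost:9000/api/9/http/upstreams/model_servers | \
  jq '.peers[] | {server, state, health_checks}'
```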
Inference Testing

This is how you can conduct testing. On the client side, we use a script named predict.py. Below is the script for reference:

```python
# Import necessary libraries
import numpy as np
from classes import imagenet_classes
from ovmsclient import make_grpc_client

# Create a gRPC client to communicate with the server
# Replace "10.1.1.7:80" with the IP address and port of your server
client = make_grpc_client("10.1.1.7:80")

# Open the image file "zebra.jpeg" in binary read mode
with open("zebra.jpeg", "rb") as f:
    img = f.read()

# Send the image data to the server for prediction using the "resnet" model
output = client.predict({"0": img}, "resnet")

# Extract the index of the predicted class with the highest probability
result_index = np.argmax(output[0])

# Print the predicted class label using the imagenet_classes dictionary
print(imagenet_classes[result_index])
```

This script imports the necessary libraries, establishes a connection to the server at the specified IP address and port, reads an image file named "zebra.jpeg", sends the image data to the server for prediction using the "resnet" model, retrieves the predicted class index with the highest probability, and prints the corresponding class label.

Results

Execute the following command from the client machine. Here, we are transmitting an image of a zebra to the model server.

```sh
python3 predict.py zebra.jpg   # run the inference traffic
zebra
```

The prediction output is 'zebra'.

Let's now examine the NGINX Plus logs:

```sh
cat /var/log/nginx/access.log
10.1.1.7 - - [13/Apr/2024:00:18:52 +0000] "POST /tensorflow.serving.PredictionService/Predict HTTP/2.0" 200 4033 "-" "grpc-python/1.62.1 grpc-c/39.0.0 (linux; chttp2)"
```

This log entry shows that a POST request was made to the NGINX server at the specified timestamp, and the server responded with a success status code (200). The request was made using gRPC, as indicated by the user-agent string.

Conclusion

Using NGINX Plus, organizations can achieve a scalable and efficient AI inference solution. NGINX Plus can address disruptions caused by connection timeouts/errors, sudden spikes in request rates, or changes in network topology. The OpenVINO Model Server optimizes model performance and inference speed, utilizing Intel hardware acceleration for enhanced efficiency. NGINX Plus acts as a high-performance load balancer, distributing incoming requests across multiple model server instances for improved scalability and reliability. Together, this enables seamless scaling of AI inference workloads, ensuring optimal performance and resource utilization.

You can look at this video for reference: https://youtu.be/Sd99woO9FmQ

References:

- https://hub.docker.com/u/openvino
- https://docs.nginx.com/nginx/deployment-guides/amazon-web-services/high-availability-keepalived/
- https://www.nginx.com/blog/nginx-1-13-10-grpc/
- https://github.com/f5businessdevelopment/F5openVino.git
- https://docs.openvino.ai/nightly/ovms_docs_deploying_server.html
How I did it - "Securing Nvidia Triton Inference Server with NGINX Plus Ingress Controller"

In this installment of "How I did it", we step into the world of AI and machine learning (ML) and take a look at how F5's NGINX Plus Ingress Controller can provide secure and scalable external access to Nvidia's Triton Inference Servers hosted on Kubernetes.
Securing and Scaling Hybrid Application with F5 NGINX (Part 1)

If you are using Kubernetes in production, then you are likely using an ingress controller. The ingress controller is the core engine managing traffic entering and exiting the Kubernetes cluster. Because the ingress controller is a deployment running inside the cluster, how do you route traffic to the ingress controller? How do you route external traffic to internal Kubernetes Services?

Cloud providers offer a simple, convenient way to expose Kubernetes Services using an external load balancer. Simply deploy a managed Kubernetes service (EKS, GKE, AKS) and create a Kubernetes Service of type LoadBalancer. The cloud provider will host and deploy a load balancer providing a public IP address, and external users can connect to Kubernetes Services using this public entry point. However, this integration only applies to managed Kubernetes services hosted by cloud providers. If you are deploying Kubernetes in private-cloud/on-prem environments, you will need to deploy your own load balancer and integrate it with the Kubernetes cluster. Furthermore, Kubernetes load balancing integrations in the cloud are limited to TCP load balancing and generally lack visibility into metrics, logs, and traces.

We propose:

- A solution that applies regardless of the underlying infrastructure running your workloads
- Guidance around sizing to avoid bottlenecks from high traffic volumes
- Application delivery use cases that go beyond basic TCP/HTTP load balancing

In the solution depicted below, I deploy NGINX Plus as the external LB service for Kubernetes and route traffic to the NGINX Ingress Controller. The NGINX Ingress Controller then routes the traffic to the application backends. The NLK (NGINX Load Balancer for Kubernetes) deployment is a new controller by NGINX that monitors specified Kubernetes Services and sends API calls to manage the upstream endpoints of the NGINX external load balancer.

In this article, I will deploy the components both inside the Kubernetes cluster and NGINX Plus as the external load balancer.

Note: I like to deploy both the NLK and the Kubernetes cluster in the same subnet to avoid network issues. This is not a hard requirement.

Prerequisites

This blog assumes you have experience operating in Kubernetes environments. In addition, you have the following:

- Access to a Kubernetes environment: bare metal, Rancher Kubernetes Engine (RKE), VMware Tanzu Kubernetes (VTK), Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), Microsoft Azure Kubernetes Service (AKS), or Red Hat OpenShift
- NGINX Ingress Controller – deploy NGINX Ingress Controller in the Kubernetes cluster. Installation instructions can be found in the documentation.
- NGINX Plus – deploy NGINX Plus on a VM or bare metal with SSH access. This will be the external LB service for the Kubernetes cluster. Installation instructions can be found in the documentation. You must have a valid license for NGINX Plus; you can get started today by requesting a 30-day free trial.

Setting up the Kubernetes environment

I start with deploying the back-end applications. You can deploy your own applications, or you can deploy our basic café application as an example.

```sh
$ kubectl apply -f cafe.yaml
```

Now I will configure routes and TLS settings for the ingress controller.

```sh
$ kubectl apply -f cafe-secret.yaml
$ kubectl apply -f cafe-virtualserver.yaml
```

To ensure the ingress rules are successfully applied, you can examine the output of kubectl get vs. The VirtualServer definition should be in the Valid state.

```
NAMESPACE   NAME      STATE   HOST               IP   PORTS
default     cafe-vs   Valid   cafe.example.com
```
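For reference, a minimal VirtualServer along the lines of the cafe-virtualserver.yaml applied above might look like the sketch below. The upstream service name and TLS secret name are assumptions; the example repository's actual manifest may differ.

```yaml
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: cafe-vs
spec:
  host: cafe.example.com
  tls:
    secret: cafe-secret        # assumed name from cafe-secret.yaml
  upstreams:
    - name: coffee
      service: coffee-svc      # assumed backend Service name
      port: 80
  routes:
    - path: /coffee
      action:
        pass: coffee
```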
Setting up NGINX Plus as the external LB

A fresh install of NGINX Plus provides the default.conf file in the /etc/nginx/conf.d directory. We will add two additional files to this directory. Simply copy the following files into your /etc/nginx/conf.d directory:

- dashboard.conf: enables the real-time monitoring dashboard for NGINX Plus
- kube_lb.conf: the NGINX configuration acting as the external load balancer for Kubernetes. You can change the configuration file to fit your requirements. In part 1 of this series, we enable basic routing and TLS for one cluster.

You will also need to generate TLS certs/keys and place them in the /etc/ssl/nginx folder of the NGINX Plus instance. For the sake of this example, we will generate a self-signed certificate with openssl.

```sh
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout default.key -out default.crt -subj "/CN=NLK"
```

Note: Using self-signed certificates is for testing purposes only. In a live production environment, we recommend using a secure vault that will hold your secrets and trusted CAs (Certificate Authorities).

Now I can validate the configuration and reload nginx for the changes to take effect.

```sh
$ nginx -t
$ nginx -s reload
```

I can now connect to the NGINX Plus dashboard by opening a browser and entering http://<external-ip-nginx>:9000/dashboard.html#upstreams

The HTTP upstream table should be empty, as we have not deployed the NLK Controller yet. We will do that in the next section.

Installing the NLK Controller

You can install the NLK Controller as a Kubernetes deployment that configures upstream endpoints for the external load balancer using the NGINX Plus API. First, we will create the NLK namespace:

```sh
$ kubectl create ns nlk
```

And apply the RBAC settings for the NLK deployment:

```sh
$ kubectl apply -f serviceaccount.yaml
$ kubectl apply -f clusterrole.yaml
$ kubectl apply -f clusterrolebinding.yaml
$ kubectl apply -f secret.yaml
```

The next step is to create a ConfigMap defining the API endpoint of the NGINX Plus external load balancer. The API endpoint is used by the NLK Controller to configure the NGINX Plus upstream endpoints. We simply modify the nginx-hosts field in the manifest from our GitHub repository to the IP address of the NGINX external load balancer:

```yaml
nginx-hosts: http://<nginx-plus-external-ip>:9000/api
```

Apply the updated ConfigMap and deploy the NLK controller:

```sh
$ kubectl apply -f nkl-configmap.yaml
$ kubectl apply -f nkl-deployment
```

I can verify that the NLK controller deployment is running and the ConfigMap data is applied:

```sh
$ kubectl get pods -o wide -n nlk
$ kubectl describe cm nginx-config -n nlk
```

You should see the NLK deployment in the Running status, and the URL should be defined under nginx-hosts. The URL is the NGINX Plus API endpoint of the external load balancer.
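For reference, the ConfigMap being inspected above might look roughly like this. It is a sketch built from the fields described in this article; the repository's manifest may contain additional fields.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: nlk
data:
  # Points the NLK controller at the NGINX Plus API of the external LB
  nginx-hosts: "http://<nginx-plus-external-ip>:9000/api"
```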
Now that the NLK Controller is successfully deployed, the external load balancer is ready to route traffic to the cluster. The final step is deploying a Kubernetes Service of type NodePort to expose the Kubernetes cluster to NGINX Plus.

```sh
$ kubectl apply -f nodeport.yaml
```

There are a couple of things to note about the NodePort Service manifest. The fields on lines 7 and 14 are required for the NLK deployment to configure the external load balancer appropriately:

- the nginxinc.io/nlk-cluster annotation
- the port name, which must match the upstream block definition in the NGINX Plus configuration (see line 42 in kube_lb.conf) and be prefixed with nlk-

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
  annotations:
    nginxinc.io/nlk-cluster1-https: "http"   # Must be added
spec:
  type: NodePort
  ports:
  - port: 443
    targetPort: 443
    protocol: TCP
    name: nlk-cluster1-https
  selector:
    app: nginx-ingress
```

Once the service is applied, note down the assigned node port selecting the NGINX Ingress Controller deployment. In this example, that node port is 32222.

```sh
$ kubectl get svc -o wide -n nginx-ingress

NAME            TYPE       CLUSTER-IP   PORT(S)         SELECTOR
nginx-ingress   NodePort   x.x.x.x      443:32222/TCP   app=nginx-ingress
```

If I reconnect to my NGINX Plus dashboard, the upstream tab should be populated with the worker node IPs of the Kubernetes cluster, matching the node port of the nginx-ingress Service (32222). You can list the node IPs of your cluster to make sure they match the IPs in the dashboard upstream tab.

```sh
$ kubectl get nodes -o wide | awk '{print $6}'

INTERNAL-IP
10.224.0.6
10.224.0.5
10.224.0.4
```

Now I can connect to the Kubernetes application from our local machine. The hostname we used in our example (cafe.example.com) should resolve to the IP address of the NGINX Plus load balancer.

Wrapping it up

Most enterprises deploying Kubernetes in production will install an ingress controller. It is the de facto standard for application delivery in container orchestrators like Kubernetes. DevOps/NetOps engineers are now looking for guidance on how to scale out their Kubernetes footprint in the most efficient way possible. Because enterprises are embracing the hybrid approach, they will need to implement their own integrations outside of cloud providers. The solution we propose:

- is applicable to hybrid environments (particularly on-prem)
- provides sizing information to avoid bottlenecks from large traffic volumes
- offers enterprise load balancing capabilities that stretch beyond a TCP LoadBalancing Service

In the next part of our series, I will dive into the third bullet point in much more detail and cover Zero Trust use cases with NGINX Plus, providing an extra layer of security in your hybrid model.