NGINX App Protect Deployment in AWS Cloud
Introduction
An official AWS AMI for NGINX App Protect was released recently, and it brings two big benefits. First, the official image on the AWS Marketplace eliminates the need to manually pre-build an AMI for your WAF deployment: it contains all the necessary code and packages on top of the OS of your choice. The second benefit, even more important from my perspective, is that the official AMI lets you pay as you go for the NGINX App Protect software instead of purchasing a year-long license. A pay-as-you-go licensing model is much better suited to modern, dynamic cloud environments.
This article takes the new AMIs as an opportunity to walk through one way to deploy and automate an NGINX App Protect WAF. To make it more useful, I'll simulate a production-like environment with the following requirements:
- Flexibility. The number of instances scales up and down smoothly.
- Redundancy. Loss of an instance or an entire datacenter doesn't cause a service outage.
- Automation. Deployment and day-to-day operations are automated.
Architecture
The high-level architecture follows a common deployment pattern for a highly available system. An AWS VPC runs an application load balancer with a set of EC2 instances running the NGINX App Protect software behind it. The load balancer manages TLS certificates, receives traffic, and distributes it across the EC2 instances. The NGINX App Protect VMs inspect the traffic and forward it to the application backend. Everything is simple so far.
Diagram 1. High Level Architecture.
Since the system aims to be production-like, redundancy is a must. A deeper dive into the AWS architecture on the diagram below reveals more details.
Diagram 2. VPC Architecture.
The VPC has two subnets distributed across two availability zones. Load balancer legs and WAF instances are present in each subnet. This workload distribution provides geographic resiliency: even if an entire AWS datacenter in one zone goes down, the WAF instances in the other zone keep working, so the deployment keeps handling traffic and applications remain available to the public. This scenario suggests a rule of thumb.
Rule: Always keep instance load below fifty percent so the deployment survives the loss of up to half of its instances. For example, with two instances each running at 45% utilization, the survivor lands at roughly 90% if one zone fails; at 60% each, it would be overloaded.
Each tier lives in its own security group. The load balancer security group allows access from any IP to the HTTPS port for data traffic. The WAF security group allows HTTP access from the load balancer and SSH access from trusted hosts for administration purposes.
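For illustration, here is a minimal Terraform sketch of how the WAF tier's security group could be expressed. The identifiers "alb_sg", "var.vpc_id", and "var.trusted_cidrs" are assumptions made for this example, not names from the actual code:

# Sketch only: WAF tier security group. "alb_sg", "var.vpc_id", and
# "var.trusted_cidrs" are assumptions made for this example.
resource "aws_security_group" "nap_sg" {
  name   = "nap-sg"
  vpc_id = var.vpc_id

  # HTTP data traffic is allowed only from the load balancer security group
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  # SSH administration is allowed only from trusted hosts
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = var.trusted_cidrs
  }

  # Allow all outbound traffic so clean requests can reach the backends
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}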
Data traffic enters the load balancer's public IPs and then reaches one of the WAF instances via private IPs. Blocking response pages are served directly from the WAF VMs. Clean traffic is forwarded straight to the application backends, regardless of their location.
Automation
Automated deployment and operations are the de facto standard for modern systems. As with any other system, WAF automation should cover both deployment and configuration. Deployment automation sets up the underlying AWS infrastructure; configuration automation takes care of distributing the WAF policy across all WAF instances. The following diagram shows the approach I used to automate the NGINX App Protect deployment.
Diagram 3.
GitLab is used as the CI/CD platform. A GitLab pipeline sets up and configures the entire system from the ground up. The first stage uses Terraform to create all necessary AWS resources, such as the VPC, subnets, load balancer, and EC2 instances based on the official NGINX App Protect AMI. The second stage provisions the WAF policy across all instances.
CI/CD Pipeline
Let's take a closer look at the GitLab pipeline. The first stage simply uses Terraform to create the AWS resources shown in diagram 2.
terraform:
  stage: terraform
  image:
    name: hashicorp/terraform:0.13.5
  before_script:
    - cd terraform
    - terraform init
  script:
    - terraform plan -out "planfile"
    - terraform apply -input=false "planfile"
  artifacts:
    paths:
      - terraform/hosts.cfg
The second stage applies the WAF policy across all NGINX App Protect instances created by Terraform.
provision:
  stage: provision
  image:
    name: 464d41/ansible
  before_script:
    - eval $(ssh-agent -s)
    - echo $ANSIBLE_PRIVATE_KEY | base64 -d | ssh-add -
    - export ANSIBLE_REMOTE_USER=ubuntu
    - cd provision
    - ansible-galaxy install nginxinc.nginx_config
  script:
    - ansible-playbook -i ../terraform/hosts.cfg nap-playbook.yaml
  only:
    changes:
      - "terraform/*"
      - "provision/**/*"
      - ".gitlab-ci.yml"
WAF Deployment Automation. Terraform
There are a couple of important snippets in the Terraform code I would like to highlight.
...omitted...
module "nap" {
  source = "terraform-aws-modules/ec2-instance/aws"
  providers = {
    aws = aws.us-west-2
  }
  version        = "~> 2.0"
  instance_count = 2

  name          = "nap.us-west-2c.int"
  ami           = "ami-045c0c07ba6b04fcc"
  instance_type = "t2.medium"
  root_block_device = [
    {
      volume_type = "gp2"
      volume_size = 8
    }
  ]
  associate_public_ip_address = true
  key_name                    = "aws-f5-nap"
  vpc_security_group_ids      = [module.nap_sg.this_security_group_id, data.aws_security_group.allow-traffic-from-trusted-sources.id]
  subnet_id                   = data.aws_subnet.public.id
}

resource "local_file" "hosts_cfg" {
  content = templatefile("hosts.tmpl",
    {
      nap_instances = module.nap.public_ip
    }
  )
  filename = "hosts.cfg"
}
...omitted...
A community module is used to create the EC2 instances. It saves the time of implementing my own and lets the deployment scale up and down by simply changing the "instance_count" or "instance_type" values. The "ami" value points to the official NGINX App Protect AMI, so there is no need to pre-bake custom images or buy per-instance licenses. All instances have a public IP address assigned for management purposes; only GitLab IPs are allowed to access them. Data traffic arrives from the load balancer through private IPs.
Notice that Terraform creates a local "hosts.cfg" file containing the list of WAF VM IPs that Terraform manages, so Ansible in the next stage always knows which instances to provision.
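For reference, a minimal "hosts.tmpl" could look like the sketch below: it iterates over the public IPs exported by the module and renders them into an Ansible inventory group. The group name "nap" is an assumption for this example:

# Sketch of hosts.tmpl; the "nap" group name is an assumption.
[nap]
%{ for ip in nap_instances ~}
${ip}
%{ endfor ~}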
WAF Configuration Automation. Ansible
Ansible generates the NGINX and App Protect configuration and applies it across all instances created by Terraform. The NGINX team has developed a set of Ansible roles that wrap these operations, so instead of dealing with complex Jinja templates you define the NGINX configuration directly as Ansible playbook parameters. Ansible automatically compiles these parameters into the NGINX config file and distributes it across the hosts.
The following listing gives an example of a playbook that configures NGINX. First, it copies a custom App Protect policy to all hosts.
---
- name: Converge
  hosts: all
  gather_facts: false
  become: yes
  tasks:
    - name: Copy App Protect Policy
      copy:
        src: ./app-protect/custom-policy.json
        dest: /etc/nginx/custom-policy.json
The next task configures general NGINX daemon parameters.
    - name: Configure NGINX and App Protect
      include_role:
        name: nginxinc.nginx_config
      vars:
        nginx_config_debug_output: true

        nginx_config_main_template_enable: true
        nginx_config_main_template:
          template_file: nginx.conf.j2
          conf_file_name: nginx.conf
          conf_file_location: /etc/nginx/
          modules:
            - modules/ngx_http_app_protect_module.so
          user: nginx
          worker_processes: auto
          pid: /var/run/nginx.pid
          error_log:
            location: /var/log/nginx/error.log
            level: warn
          worker_connections: 1024
          http_enable: true
          http_settings:
            default_type: application/octet-stream
            access_log_format:
              - name: main
                format: |
                  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"'
            access_log_location:
              - name: main
                location: /var/log/nginx/access.log
            keepalive_timeout: 65
            cache: false
            rate_limit: false
            keyval: false
            server_tokens: "off"
          stream_enable: true
          http_custom_includes:
            - "/etc/nginx/sites-enabled/*.conf"
The remaining variables of the same task configure a virtual server with App Protect enabled on it:
        nginx_config_http_template_enable: true
        nginx_config_http_template:
          app:
            template_file: http/default.conf.j2
            conf_file_name: default.conf
            conf_file_location: /etc/nginx/conf.d/
            servers:
              server1:
                listen:
                  listen_localhost:
                    ip: 0.0.0.0
                    port: 80
                    opts:
                      - default_server
                server_name: localhost
                access_log:
                  - name: main
                    location: /var/log/nginx/access.log
                locations:
                  frontend:
                    location: /
                    proxy_pass: http://app_servers
                    proxy_set_header:
                      header_host:
                        name: Host
                        value: $host
                    app_protect:
                      enable: true
                      policy_file: /etc/nginx/custom-policy.json
            upstreams:
              app_upstream:
                name: app_servers
                servers:
                  app_server_1:
                    address: 35.167.144.13
                    port: 80
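To make the mapping concrete, the role compiles the parameters above into a config file along these lines. This is an approximation of the generated /etc/nginx/conf.d/default.conf, not its exact output:

# Approximate render of the playbook vars above; actual role output may differ.
upstream app_servers {
    server 35.167.144.13:80;
}

server {
    listen 0.0.0.0:80 default_server;
    server_name localhost;
    access_log /var/log/nginx/access.log main;

    location / {
        app_protect_enable on;
        app_protect_policy_file /etc/nginx/custom-policy.json;
        proxy_set_header Host $host;
        proxy_pass http://app_servers;
    }
}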
Once the pipeline finishes successfully, the NGINX App Protect WAF cluster is deployed, configured, and ready to inspect traffic.
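As a quick smoke test, with a blocking policy in place a request carrying an obvious attack signature should return the App Protect blocking page, while a clean request passes through to the backend. The hostname below is a placeholder, not the deployment's actual DNS name:

# Hypothetical hostname; replace with your load balancer's DNS name.
curl "https://waf-example.elb.amazonaws.com/"
curl "https://waf-example.elb.amazonaws.com/?a=<script>alert(1)</script>"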
Conclusion
This is one option for what a production-grade NGINX App Protect deployment could look like. A simple, redundant architecture automated from the ground up makes the WAF deployment easy to manage and lets a team focus on application development and security instead of keeping a WAF up and running. Official AMIs enable pay-as-you-go licensing, so the deployment scales up and down without overpaying for static licenses.
Full listings of the configuration files are available in the repo. Feel free to reach out with questions and suggestions. Thanks for reading!