NGINX App Protect Deployment in AWS Cloud
Introduction
An official AWS AMI for NGINX App Protect was released recently, and it brings two big benefits. First, the official image on the AWS Marketplace eliminates the need to manually pre-build an AMI for your WAF deployment: it contains all the necessary code and packages on top of the OS of your choice. The second benefit, even more important from my perspective, is that the official AMI lets you pay as you go for the NGINX App Protect software instead of purchasing a year-long license. A pay-as-you-go licensing model is much better suited to modern, dynamic cloud environments.
This article takes the new AMIs as an opportunity to walk through one way to deploy and automate an NGINX App Protect WAF. To make it more useful, I'll simulate a production-like environment with the following requirements:
- Flexibility. The number of instances scales up and down smoothly.
- Redundancy. Loss of an instance or an entire datacenter doesn't cause a service outage.
- Automation. Deployment and day-to-day operations are automated.
Architecture
The high-level architecture follows a common deployment pattern for a highly available system. An AWS VPC runs an application load balancer with a set of EC2 instances running the NGINX App Protect software behind it. The load balancer manages TLS certificates, receives traffic, and distributes it across the EC2 instances. The NGINX App Protect VMs inspect the traffic and forward it to the application backend. Everything is simple so far.
Diagram 1. High Level Architecture.
Since the system aims to be production-like, redundancy is a must. A deeper dive into the AWS architecture on the diagram below reveals more details.
Diagram 2. VPC Architecture.
The VPC has two subnets distributed across two availability zones. Load balancer legs and WAF instances are present in each subnet. This workload distribution provides geographic resiliency: even if an entire AWS datacenter in one zone goes down, the WAF instances in the other zone keep working, so the deployment keeps handling traffic and applications remain available to the public. This scenario suggests a rule of thumb.
Rule: Always keep instance load below fifty percent so the deployment survives the loss of up to half of its instances. For example, with two instances each running at 45% utilization, the survivor lands at roughly 90% if one zone fails; at 60% each, it would be overloaded.
Each tier lives in its own security group. The load balancer security group allows access from any IP to the HTTPS port for data traffic. The WAF security group allows HTTP access from the load balancer and SSH access from trusted hosts for administration purposes.
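For illustration, here is a minimal Terraform sketch of how the WAF tier's security group could be expressed. The identifiers "alb_sg", "var.vpc_id", and "var.trusted_cidrs" are assumptions made for this example, not names from the actual code:

# Sketch only: WAF tier security group. "alb_sg", "var.vpc_id", and
# "var.trusted_cidrs" are assumptions made for this example.
resource "aws_security_group" "nap_sg" {
  name   = "nap-sg"
  vpc_id = var.vpc_id

  # HTTP data traffic is allowed only from the load balancer security group
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  # SSH administration is allowed only from trusted hosts
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = var.trusted_cidrs
  }

  # Allow all outbound traffic so clean requests can reach the backends
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}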
Data traffic enters the load balancer's public IPs and then reaches one of the WAF instances via private IPs. Blocking response pages are served directly from the WAF VMs. Clean traffic is forwarded straight to the application backends, regardless of their location.
Automation
Automated deployment and operations are the de facto standard for modern systems. As with any other system, WAF automation should cover both deployment and configuration. Deployment automation sets up the underlying AWS infrastructure; configuration automation takes care of distributing the WAF policy across all WAF instances. The following diagram shows the approach I used to automate the NGINX App Protect deployment.
Diagram 3.
GitLab is used as the CI/CD platform. A GitLab pipeline sets up and configures the entire system from the ground up. The first stage uses Terraform to create all necessary AWS resources, such as the VPC, subnets, load balancer, and EC2 instances based on the official NGINX App Protect AMI. The second stage provisions the WAF policy across all instances.
CI/CD Pipeline
Let's take a closer look at the GitLab pipeline. The first stage simply uses Terraform to create the AWS resources shown in diagram 2.
terraform:
  stage: terraform
  image:
    name: hashicorp/terraform:0.13.5
  before_script:
    - cd terraform
    - terraform init
  script:
    - terraform plan -out "planfile"
    - terraform apply -input=false "planfile"
  artifacts:
    paths:
      - terraform/hosts.cfg
The second stage applies the WAF policy across all NGINX App Protect instances created by Terraform.
provision:
  stage: provision
  image:
    name: 464d41/ansible
  before_script:
    - eval $(ssh-agent -s)
    - echo $ANSIBLE_PRIVATE_KEY | base64 -d | ssh-add -
    - export ANSIBLE_REMOTE_USER=ubuntu
    - cd provision
    - ansible-galaxy install nginxinc.nginx_config
  script:
    - ansible-playbook -i ../terraform/hosts.cfg nap-playbook.yaml
  only:
    changes:
      - "terraform/*"
      - "provision/**/*"
      - ".gitlab-ci.yml"
WAF Deployment Automation. Terraform
There are a couple of important snippets in the Terraform code I would like to highlight.
...omitted...
module "nap" {
  source = "terraform-aws-modules/ec2-instance/aws"
  providers = {
    aws = aws.us-west-2
  }
  version        = "~> 2.0"
  instance_count = 2

  name          = "nap.us-west-2c.int"
  ami           = "ami-045c0c07ba6b04fcc"
  instance_type = "t2.medium"
  root_block_device = [
    {
      volume_type = "gp2"
      volume_size = 8
    }
  ]
  associate_public_ip_address = true
  key_name                    = "aws-f5-nap"
  vpc_security_group_ids      = [module.nap_sg.this_security_group_id, data.aws_security_group.allow-traffic-from-trusted-sources.id]
  subnet_id                   = data.aws_subnet.public.id
}

resource "local_file" "hosts_cfg" {
  content = templatefile("hosts.tmpl",
    {
      nap_instances = module.nap.public_ip
    }
  )
  filename = "hosts.cfg"
}
...omitted...
A community module is used to create the EC2 instances. It saves the time of implementing my own and lets the deployment scale up and down by simply changing the "instance_count" or "instance_type" values. The "ami" value points to the official NGINX App Protect AMI, so there is no need to pre-bake custom images or buy per-instance licenses. All instances have a public IP address assigned for management purposes; only GitLab IPs are allowed to access them. Data traffic arrives from the load balancer through private IPs.
Notice that Terraform creates a local "hosts.cfg" file containing the list of WAF VM IPs that Terraform manages, so Ansible in the next stage always knows which instances to provision.
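For reference, a minimal "hosts.tmpl" could look like the sketch below: it iterates over the public IPs exported by the module and renders them into an Ansible inventory group. The group name "nap" is an assumption for this example:

# Sketch of hosts.tmpl; the "nap" group name is an assumption.
[nap]
%{ for ip in nap_instances ~}
${ip}
%{ endfor ~}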
WAF Configuration Automation. Ansible
Ansible generates the NGINX and App Protect configuration and applies it across all instances created by Terraform. The NGINX team has developed a set of Ansible roles that wrap these operations, so instead of dealing with complex Jinja templates you define the NGINX configuration directly as Ansible playbook parameters. Ansible automatically compiles these parameters into the NGINX config file and distributes it across the hosts.
The following listing gives an example of a playbook that configures NGINX. First, it copies a custom App Protect policy to all hosts.
---
- name: Converge
  hosts: all
  gather_facts: false
  become: yes
  tasks:
    - name: Copy App Protect Policy
      copy:
        src: ./app-protect/custom-policy.json
        dest: /etc/nginx/custom-policy.json
The next task configures general NGINX daemon parameters.
    - name: Configure NGINX and App Protect
      include_role:
        name: nginxinc.nginx_config
      vars:
        nginx_config_debug_output: true

        nginx_config_main_template_enable: true
        nginx_config_main_template:
          template_file: nginx.conf.j2
          conf_file_name: nginx.conf
          conf_file_location: /etc/nginx/
          modules:
            - modules/ngx_http_app_protect_module.so
          user: nginx
          worker_processes: auto
          pid: /var/run/nginx.pid
          error_log:
            location: /var/log/nginx/error.log
            level: warn
          worker_connections: 1024
          http_enable: true
          http_settings:
            default_type: application/octet-stream
            access_log_format:
              - name: main
                format: |
                  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"'
            access_log_location:
              - name: main
                location: /var/log/nginx/access.log
            keepalive_timeout: 65
            cache: false
            rate_limit: false
            keyval: false
            server_tokens: "off"
          stream_enable: true
          http_custom_includes:
            - "/etc/nginx/sites-enabled/*.conf"
The remaining variables of the same task configure a virtual server with App Protect enabled on it:
        nginx_config_http_template_enable: true
        nginx_config_http_template:
          app:
            template_file: http/default.conf.j2
            conf_file_name: default.conf
            conf_file_location: /etc/nginx/conf.d/
            servers:
              server1:
                listen:
                  listen_localhost:
                    ip: 0.0.0.0
                    port: 80
                    opts:
                      - default_server
                server_name: localhost
                access_log:
                  - name: main
                    location: /var/log/nginx/access.log
                locations:
                  frontend:
                    location: /
                    proxy_pass: http://app_servers
                    proxy_set_header:
                      header_host:
                        name: Host
                        value: $host
                    app_protect:
                      enable: true
                      policy_file: /etc/nginx/custom-policy.json
            upstreams:
              app_upstream:
                name: app_servers
                servers:
                  app_server_1:
                    address: 35.167.144.13
                    port: 80
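To make the mapping concrete, the role compiles the parameters above into a config file along these lines. This is an approximation of the generated /etc/nginx/conf.d/default.conf, not its exact output:

# Approximate render of the playbook vars above; actual role output may differ.
upstream app_servers {
    server 35.167.144.13:80;
}

server {
    listen 0.0.0.0:80 default_server;
    server_name localhost;
    access_log /var/log/nginx/access.log main;

    location / {
        app_protect_enable on;
        app_protect_policy_file /etc/nginx/custom-policy.json;
        proxy_set_header Host $host;
        proxy_pass http://app_servers;
    }
}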
Once the pipeline finishes successfully, the NGINX App Protect WAF cluster is deployed, configured, and ready to inspect traffic.
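As a quick smoke test, with a blocking policy in place a request carrying an obvious attack signature should return the App Protect blocking page, while a clean request passes through to the backend. The hostname below is a placeholder, not the deployment's actual DNS name:

# Hypothetical hostname; replace with your load balancer's DNS name.
curl "https://waf-example.elb.amazonaws.com/"
curl "https://waf-example.elb.amazonaws.com/?a=<script>alert(1)</script>"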
Conclusion
This is one option for what a production-grade NGINX App Protect deployment could look like. A simple, redundant architecture automated from the ground up makes the WAF deployment easy to manage and lets a team focus on application development and security instead of keeping a WAF up and running. Official AMIs enable pay-as-you-go licensing, so the deployment scales up and down without overpaying for static licenses.
Full listings of the configuration files are available in the repo. Feel free to reach out with questions and suggestions. Thanks for reading!