An Illustrated Hands-on Intro to AWS VPC Networking
Quick Intro
If you know a bit of networking but feel uncomfortable touching AWS networking resources, then this article is for you.
We're going to go through real AWS configuration and you can follow along to solidify your understanding.
I'll walk through what I personally do to create 2 simple virtual machines running the Amazon Linux AMI: one in a private subnet and another in a public subnet.
I will assume you already have an AWS account and corresponding credentials. If not, please go ahead and create your free tier AWS account.
Just keep in mind that Amazon's equivalent of a Virtual Machine (VM) is known as an EC2 instance.
VPC, Subnets, Route Tables and Internet Gateways
In short, we can think of Virtual Private Cloud (VPC) as our personal Data Centre. Our little private space in the cloud.
Because it's our personal Data Centre, networking-wise we should have our own CIDR block.
When we first create our VPC, a CIDR block is a compulsory field.
Think of a CIDR block as the major subnet from which all the other smaller subnets are derived.
When we create subnets, we create them as smaller chunks of the CIDR block.
After we create subnets, there should be just a local route to access "objects" that belong to or are attached to the subnet.
Other than that, if we need access to the Internet, we should create an Internet Gateway (IGW), attach it to our VPC and add a default route pointing to the IGW to the route table.
That should take care of it all.
Our Topology for Reference
This summarises what we're going to do. It might be helpful to use it as a reference while you follow along:
Don't worry if you don't understand everything in the diagram above. As you follow along this hands-on article, you can come back to it and everything should make sense.
What we'll do here
I'll explain the following VPC components as we go along configuring them:
- Subnets
- Route Tables
- Internet Gateway
- NAT Gateway
- Egress-Only Gateway
- Quick Recap (I'll just quickly summarise what we've done so far because our little virtual DC should be ready to go now!)
We'll then perform the tests:
- Launching EC2 Instance from Amazon Marketplace (That's where we create a virtual machine)
- First attempt to connect via SSH (that's where we try to connect to our instance via SSH but fail! Hold on, I'll fix it!)
- Network ACLs and Security Groups (that's where I point the features that are to blame for our previous failed attempt and fix what's wrong)
- Connect via SSH again (now we're successful)
Note that we only test our Public instance in the main article, as configuring the Private instance would be very repetitive, so I added the Private instance config to the Appendix section:
- Spinning Up Private EC2 Instance
VPC Components
The first logical question I get asked by those with little experience with AWS is which basic components do we need to build our core VPC infrastructure?
First we pick an AWS Region:
This is the region where we are going to physically run our virtual infrastructure, i.e. our VPC.
Even though your infrastructure is in the Cloud, Amazon has Data Centres (DC) around the world in order to provide high availability for your resources if you need it.
With that in mind, Amazon has many DCs located in many different Regions (EU, Asia Pacific, US East, US West, etc).
Within a Region, the more specific locations of AWS DCs are called Availability Zones (AZ).
Each AZ is where you'll find one (or more) DCs.
So, we create a VPC within a Region and specify a CIDR block and optionally request an Amazon assigned /56 IPv6 CIDR block:
If you're a Network Engineer, this should sound familiar, right? Except for the fact that we're configuring our virtual DC in the Cloud.
Subnets
Now that we've got our own VPC, we need to create subnets within the CIDR block we defined (192.168.0.0/16).
Notice that I also selected the option to retrieve an Amazon-provided IPv6 CIDR block above.
That's because we can't choose an IPv6 CIDR block. We've got to stick to what Amazon automatically assigns to us if we want to use IPv6 addresses.
For IPv6, Amazon always assigns a fixed /56 CIDR block and we can only create /64 subnets.
Also, IPv6 addresses are always Public and there is no NAT by design.
Our assigned CIDR block here was 2600:1f18:263e:4e00::/56.
Let's imagine we're hosting webserver/database tiers in 2 separate subnets, but keep in mind this is for lab test purposes only.
A real configuration would likely have instances in multiple AZs.
For our Public WebServer Subnet, we'll use 192.168.1.0/24 and 2600:1f18:263e:4e01::/64.
For our Private Database Subnet, we'll use 192.168.2.0/24 and 2600:1f18:263e:4e02::/64.
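If you'd like to sanity-check an addressing plan before typing it into the console, here's a minimal sketch using Python's standard ipaddress module. It assumes the two IPv6 subnets are the /64s obtained by using 01 and 02 as the subnet ID inside our /56 block, and the helper name check_subnets is just something I made up for illustration:

```python
import ipaddress

# CIDR blocks taken from the article.
vpc_v4 = ipaddress.ip_network("192.168.0.0/16")
vpc_v6 = ipaddress.ip_network("2600:1f18:263e:4e00::/56")

# Assumption: "01" and "02" are the subnet IDs inside the /56 block.
subnets = {
    "Public WebServer Subnet": ("192.168.1.0/24", "2600:1f18:263e:4e01::/64"),
    "Private Database Subnet": ("192.168.2.0/24", "2600:1f18:263e:4e02::/64"),
}

def check_subnets(subnets):
    """Return {subnet name: True if both its CIDRs sit inside the VPC blocks}."""
    results = {}
    for name, (v4, v6) in subnets.items():
        net4 = ipaddress.ip_network(v4)
        net6 = ipaddress.ip_network(v6)
        results[name] = net4.subnet_of(vpc_v4) and net6.subnet_of(vpc_v6)
    return results

# A /56 leaves 8 bits for the subnet ID when every subnet is a /64.
ipv6_subnet_count = 2 ** (64 - vpc_v6.prefixlen)

if __name__ == "__main__":
    for name, ok in check_subnets(subnets).items():
        print(name, "fits inside the VPC" if ok else "does NOT fit")
    print("Possible /64 subnets in our /56:", ipv6_subnet_count)
```

The same check catches typos such as a subnet that falls outside the VPC range, which the console would otherwise reject.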
Here's how we create our Public WebServer Subnet on Availability Zone us-east-1a:
Here's how we configure our Private Database Subnet:
Notice that I put Private Database Subnet in a different Availability Zone.
In real life, we'd likely create 1 public and 1 private subnet in one Availability Zone and another public and private subnet in a different Availability Zone for redundancy purposes as mentioned before.
For this article, I'll stick to our config above for simplicity's sake.
This is just a learn-by-doing kind of article! :)
Route Tables
If we now look at the Route Table, we'll see that we now have 2 local routes similar to what would appear if we had configured 2 interfaces on a physical router:
However, that's the default/main route table that AWS automatically created for our DevCentral VPC.
If we want our Private Subnet to be really private, i.e. no Internet access for example, we can create a separate route table for it.
Let's create 2 route tables, one named Public RT and the other Private RT:
Private RT should be created in the same way as above with a different name.
The last step is to associate our Public subnet to our Public RT and Private subnet to our Private RT.
The association will bind the subnet to the route table, making them directly connected routes:
Up to now, both tables look similar, but as we configure the Internet Gateway in the next section, they will look different.
Internet Gateway
Yes, we want to make them different because we want Public RT to have direct access to the Internet.
In order to accomplish that we need to create an Internet Gateway and attach it to our VPC:
And lastly, we create default IPv4/IPv6 routes in Public RT pointing to the Internet Gateway we've just created:
So our Public route table will now look like this:
EC2 instances created within Public Subnet should now have Internet access both using IPv4 and IPv6.
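To picture how the Public RT decides where a packet goes, here's a small sketch of a longest-prefix-match lookup in Python. It's only a model: the target labels "local" and "igw" are placeholders rather than real AWS resource IDs, and I'm assuming the local routes cover the whole VPC CIDR blocks:

```python
import ipaddress

# Hypothetical model of our Public RT as (destination, target) pairs.
PUBLIC_RT = [
    ("192.168.0.0/16", "local"),
    ("2600:1f18:263e:4e00::/56", "local"),
    ("0.0.0.0/0", "igw"),
    ("::/0", "igw"),
]

def lookup(route_table, destination):
    """Return the target of the most specific route matching destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for cidr, target in route_table:
        net = ipaddress.ip_network(cidr)
        # Skip routes of the other IP version or that don't contain the address.
        if addr.version != net.version or addr not in net:
            continue
        # Keep the route with the longest prefix seen so far.
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, target)
    return best[1] if best else None

if __name__ == "__main__":
    for dest in ("192.168.2.10", "8.8.8.8", "2001:4860:4860::8888"):
        print(dest, "->", lookup(PUBLIC_RT, dest))
```

Traffic for addresses inside the VPC matches the more specific local route, and everything else falls through to the default route pointing to the Internet Gateway.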
NAT Gateway
Our database server in the Private subnet will likely need outbound Internet access to install updates or for ssh access, right?
So, first let's create a Public Subnet where our NAT gateway should reside:
We then create a NAT gateway in the above Public Subnet with an Elastic (Public) IPv4 address attached to it:
Yes, NAT Gateways need a Public (Elastic) IPv4 address that is routable over the Internet.
Next, we associate NAT Public Subnet with our Public Route Table, so that the NAT gateway itself can reach the Internet through the Internet Gateway:
Lastly, we create a default route in our Private RT pointing to NAT gateway for IPv4 Internet traffic:
We're pretty much done with IPv4.
What about IPv6 Internet access in our Private subnet?
Egress-Only Gateway
As we know, IPv6 doesn't have NAT and all IPv6 addresses are global. So the trick to make an EC2 instance using IPv6 behave as if it were using a "private" IPv4 address behind NAT is to create an Egress-only Gateway and point a default IPv6 route to it.
As the name implies, an Egress-only Gateway only allows outbound Internet traffic.
Here we create one and then add a default IPv6 route (::/0) pointing to it:
Quick Recap
What we've done so far:
- Created VPC
- Created 2 Subnets (Private and Public)
- Created 2 Route tables (one for each Subnet)
- Attached Public Subnet to Public RT and Private Subnet to Private RT
- Created 1 Internet Gateway and added default routes (IPv4/IPv6) to our Public RT
- Created 1 NAT Gateway and added default IPv4 route to our Private RT
- Created 1 Egress-only Gateway and added default IPv6 route to our Private RT
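The recap above can also be written down as a tiny data model. This is just a sketch for reasoning about the topology: the labels are made up for illustration, and it only looks at routing (Security Groups, NACLs and public IP assignment still apply on top of it):

```python
# Default routes per route table. Labels are illustrative, not real AWS IDs.
ROUTE_TABLES = {
    "Public RT": {"ipv4_default": "internet-gateway",
                  "ipv6_default": "internet-gateway"},
    "Private RT": {"ipv4_default": "nat-gateway",
                   "ipv6_default": "egress-only-gateway"},
}

# Which route table each subnet is associated with.
ASSOCIATIONS = {
    "Public WebServer Subnet": "Public RT",
    "Private Database Subnet": "Private RT",
}

# Can a host on the Internet open a NEW connection through this gateway?
ALLOWS_INBOUND_INITIATED = {
    "internet-gateway": True,
    "nat-gateway": False,
    "egress-only-gateway": False,
}

def internet_exit(subnet, ip_version):
    """Return the gateway used by the subnet's default route for IPv4 or IPv6."""
    route_table = ROUTE_TABLES[ASSOCIATIONS[subnet]]
    return route_table[f"ipv{ip_version}_default"]

def accepts_new_inbound(subnet, ip_version):
    """Routing-wise only: can the Internet initiate traffic towards the subnet?"""
    return ALLOWS_INBOUND_INITIATED[internet_exit(subnet, ip_version)]

if __name__ == "__main__":
    for subnet in ASSOCIATIONS:
        for version in (4, 6):
            print(subnet, f"IPv{version}", "->", internet_exit(subnet, version),
                  "| new inbound allowed:", accepts_new_inbound(subnet, version))
```

Both subnets can reach the Internet, but only the Public one can be reached from it, which is exactly the difference between the two route tables.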
Are we ready to finally create an EC2 instance running Linux, for example, to test Internet connectivity from both Private and Public subnets?
Launching EC2 Instance from Amazon Marketplace
Before we get started, let's create a key-pair to access our EC2 instance via SSH:
Our EC2 instances are accessed using a key-pair rather than a password.
Notice that it automatically downloads the private key for us.
Ok, let's create our EC2 instance.
We need to click on Launch Instance and Select an image from AWS Marketplace:
As seen above, I picked Amazon Linux 2 AMI for testing purposes. I selected the t2.micro type that only has 1 vCPU and 1 GB of memory.
For the record, AWS Marketplace is a repository of AWS official images and Community images.
Images are known as Amazon Machine Images (AMI).
Amazon has many instance types based on the number of vCPUs available, memory, storage, etc.
Think of it as how powerful you'd like your EC2 instance to be.
We then configure our Instance Details by clicking on Next: Configure Instance Details button:
I'll sum up what I've selected above:
Network: we selected our VPC (DevCentral)
Subnet: Public WebServer Subnet
Auto-assign Public IP: Enabled
Auto-assign IPv6 IP: Enabled
The reason we selected "Enabled" for the auto-assignment of IP addresses is that we want Amazon to automatically assign an Internet-routable Public IPv4 address to our instance.
IPv6 addresses are always Internet-routable, but I want Amazon to auto-assign an IPv6 address for me here, so I selected Enabled for Auto-assign IPv6 IP too.
Notice that if we scroll down in the same screen above we could've also specified our private IPv4 address in the range of Public WebServer Subnet (192.168.1.0/24):
The Public IPv4 address is automatically assigned by Amazon, but once the instance is stopped or terminated it goes back to the Amazon Public IPv4 pool (a simple reboot keeps it).
There is no guarantee that the same IPv4 address will be re-used.
If we need a fixed Public IPv4 address, we would need to allocate an Elastic IPv4 address instead and then associate it with our EC2 instance.
IPv6 address is greyed out because we opted for an auto-assigned IPv6 address, remember?
We could've gone ahead and selected our storage type by clicking on Next: Add Storage but I'll skip this.
I'll add a Name tag of DevCentral-Public-Instance, select default Security Group assigned to our VPC as well as our previously created key-pair and lastly click on Launch to spin our instance up (Animation starts at Step 4):
After that, if we click on Instances, we should see our instance is now assigned a Private as well as a Public IPv4 address:
After a while, Instance State should change to Running:
First Attempt to Connect via SSH
If we click on Connect button above, we will get the instructions on how to SSH to our Public instance:
Let's give it a go then:
It didn't work!
That used to drive me crazy when I got started with AWS, until I learned about Network ACLs and Security Groups!
Network ACLs and Security Groups
When we create a VPC, a default NACL and a Security Group are also created.
Every EC2 instance's network interface belongs to a Security Group, and the subnet it sits in has an associated NACL protecting it.
A NACL is a stateless firewall that filters traffic entering and leaving a subnet.
A Security Group is a stateful firewall that filters traffic entering and leaving an EC2 instance, more specifically its vNIC.
The following simplified diagram shows that:
What's the difference between a stateful and a stateless firewall?
A Security Group (stateful) rule that allows outbound HTTP traffic also allows the return traffic corresponding to that outbound request back in.
This is why it's called stateful as it keeps track of session state.
A NACL (stateless) rule that allows outbound HTTP traffic does not allow the return traffic unless you create an inbound rule to allow it.
This is why it's called stateless as it does not keep track of session state.
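If the stateful/stateless distinction still feels abstract, here's a toy model of both behaviours in Python. It's a sketch of the concept only, not of how AWS implements NACLs or Security Groups. The class names, the IP addresses and the ephemeral port 50000 are all made up for illustration:

```python
from typing import NamedTuple

class Packet(NamedTuple):
    direction: str  # "out" or "in", relative to our instance
    src: str
    sport: int
    dst: str
    dport: int

class Rule(NamedTuple):
    direction: str
    ports: range    # allowed destination ports

def matches(rules, pkt):
    """True if any rule allows this packet's direction and destination port."""
    return any(r.direction == pkt.direction and pkt.dport in r.ports for r in rules)

class StatelessFirewall:
    """NACL-like: every packet is judged on the rules alone."""
    def __init__(self, rules):
        self.rules = list(rules)

    def allows(self, pkt):
        return matches(self.rules, pkt)

class StatefulFirewall:
    """Security-Group-like: remembers allowed flows and lets replies through."""
    def __init__(self, rules):
        self.rules = list(rules)
        self.tracked = set()

    def allows(self, pkt):
        # A reply is the tracked flow with source and destination swapped.
        reverse = (pkt.dst, pkt.dport, pkt.src, pkt.sport)
        if reverse in self.tracked:
            return True
        if matches(self.rules, pkt):
            self.tracked.add((pkt.src, pkt.sport, pkt.dst, pkt.dport))
            return True
        return False

# Hypothetical HTTP request from our instance and the server's reply.
request = Packet("out", "192.168.1.10", 50000, "203.0.113.10", 80)
reply = Packet("in", "203.0.113.10", 80, "192.168.1.10", 50000)
outbound_http = [Rule("out", range(80, 81))]

if __name__ == "__main__":
    for firewall in (StatefulFirewall(outbound_http), StatelessFirewall(outbound_http)):
        name = type(firewall).__name__
        print(name, "request allowed:", firewall.allows(request))
        print(name, "reply allowed:", firewall.allows(reply))
```

With only the outbound HTTP rule, the stateful firewall lets the reply in while the stateless one drops it. Adding an inbound rule for the ephemeral port range, e.g. Rule("in", range(1024, 65536)), is what makes the stateless one pass the reply too.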
Now let's try to work out why our SSH traffic was blocked.
Is the problem in the default NACL?
Let's have a look.
This is what we see when we click on Subnets → Public WebServer Subnet:
As we can see above, the default NACL is NOT blocking our SSH traffic as it's allowing everything IN/OUT.
Is the problem the default Security Group?
This is what we see when we click on Security Groups → sg-01.db... → Inbound Rules:
Yes! SSH traffic from my external client machine is being blocked by the above inbound rule.
The above rule says that our EC2 instance should allow ANY inbound traffic coming from other instances that also belong to the above Security Group.
That means that our external client traffic will not be accepted.
We don't need to check outbound rules here because we know that a stateful firewall automatically allows the return traffic of an accepted inbound SSH session back out.
Creating a new Security Group
To fix the above issue, let's do what we should've done while we were creating our EC2 instance.
We first create a new Security Group:
A newly created non-default SG comes with no inbound rules, i.e. nothing is allowed in, not even traffic coming from other instances that belong to the security group itself.
There's always an implicit deny all rule in a security group, i.e. whatever is not explicitly allowed is denied.
For this reason, we'll explicitly allow SSH access like this:
In the real world, you'd specify a narrower source range for security purposes.
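To see why the default Security Group rejected our SSH session and what difference the source range makes, here's a rough sketch of inbound rule evaluation in Python. The rule format, the group name "default-sg" and the client addresses (taken from documentation ranges) are all assumptions for illustration, not the AWS API:

```python
import ipaddress

def sg_allows_inbound(rules, source_ip, port, source_groups=()):
    """Return True if any allow rule matches; otherwise the traffic is denied."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if port not in rule["ports"]:
            continue
        if "cidr" in rule:
            # Rule keyed on a source address range.
            net = ipaddress.ip_network(rule["cidr"])
            if addr.version == net.version and addr in net:
                return True
        elif rule.get("source_group") in source_groups:
            # Rule keyed on the sender being a member of a security group.
            return True
    return False

ALL_PORTS = range(0, 65536)
SSH = range(22, 23)

# Default SG: anything, but only from members of the same group.
default_sg = [{"ports": ALL_PORTS, "source_group": "default-sg"}]
# Our new SG: SSH from anywhere (fine for a lab).
open_ssh_sg = [{"ports": SSH, "cidr": "0.0.0.0/0"}]
# A narrower variant: SSH only from a hypothetical office range.
office_ssh_sg = [{"ports": SSH, "cidr": "203.0.113.0/24"}]

if __name__ == "__main__":
    client = "198.51.100.7"  # hypothetical external client
    print("default SG:", sg_allows_inbound(default_sg, client, 22))
    print("open SSH SG:", sg_allows_inbound(open_ssh_sg, client, 22))
    print("office-only SG:", sg_allows_inbound(office_ssh_sg, client, 22))
```

The external client only gets through the rule that covers its address, which is why narrowing the source range is the usual practice outside a lab.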
And lastly we change our EC2 instance's SG to the new one by going to EC2 → Instances → <Name of Instance> → Networking → Change Security Groups:
Another window should appear and here's what we do:
Connecting via SSH again
Now let's try to connect via SSH again:
It works!
That's why it's always a good idea to create your own NACL and Security Group rules rather than sticking to the default ones.
Appendix - Spinning Up EC2 instance in Private Subnet
Let's create our private EC2 instance to test Internet access using our NAT gateway and Egress-Only Gateway here.
Our Private RT has a NAT gateway for IPv4 Internet access and an Egress-Only Gateway for IPv6 Internet access as shown below:
When we create our private EC2 instance, we won't enable Auto-assign Public IP (for IPv4) as seen below:
It's not shown here, but when I got to the Security Group configuration part I selected the previous security group I created that allows SSH access from everyone for testing purposes.
We could have created a new SG and added an SSH rule allowing access only from our local instances that belong to our 192.168.0.0/16 range to be more restrictive.
Here's my Private Instance config if you'd like to replicate:
Here's the SSH info I got when I clicked on Connect button:
Here's my SSH test:
All Internet tests passed and you should now have a good understanding of how to configure basic VPC components.
I'd advise you to have a look at our full diagram again and any feedback about the animated GIFs would be appreciated. Did you like them? I found them better than using static images.