AWS VPC for Software Engineers

February 02, 2020

I've been working with AWS for about 3 years now. Going to the cloud can be extremely overwhelming and stressful. I jumped head first on a company that wanted to setup a brand new AWS account from scratch, setup pipelines for CI/CD, dockernize all services and use CloudFormation for provisioning everything. When I joined the company I had never logged into AWS console before and during my first meeting people kept talking about EC2 and I couldn't for the life of me figure out what that means. I ended up leading most of this migration and now, 3 years later, we're about 95% migrated. In this post I want to talk about how I finally managed to understand AWS VPC from a developer's perspective.

VPC

When I was a teen we used to have a lot of places called "LAN HOUSE" where you would pay per hour to use computers. They were connected to the internet and to each other, allowing games like Counter Strike to be played in LAN (Local Area Network) mode. A Virtual Private Cloud is a fancy name for a network of computers. It helped me associate these terms and assume a VPC is almost like an Internet Café from the 90's, early 2000's: computers in the same physical place directly connected to each other, forming a network and sharing an internet access point.

Subnets

Let me start by saying something that might be obvious for some, but took me longer than I'd like to admit to understand: There's nothing special about a subnet that makes it public or private. Every subnet is pretty much the same. The difference is in the Route Table, not the subnet itself.

Now that I got that out of the way, let's go back to the beginning. If you've been fiddling with computers in the 90's, you've probably configured a network card once in your life where you had an IP address of 192.168.1.1, 255.255.255.0 and a default gateway 192.168.1.254. Perhaps you didn't have these exact numbers, but if you did it's not coincidence that I know them. A subnet is like a boundary for a local network. On Linux, it's common to refer to the subnet mask as 192.168.1.0/24 and that's how AWS refers to it as well.

Travelling back to the Internet Café example, I remember once going to the mall and there was a cyber café there, the biggest I've ever seen. There was 3 floors and maybe more than 70 computers. It was amazing for a 13 year old kid. Now let's pretend for the sake of fun that each floor has it's own local network. People on the 3rd floor cannot play together in LAN mode with people on the 2nd floor. They are in different subnets. Computers in the 1st floor use IP addresses like 192.168.1.1 and the 3rd floor uses 192.168.3.1. The owner of the cyber cafe decided to define 3 subnets, one for each floor and restrict communication between them. One subnet would have a mask of 192.168.1.0/24, the 2nd would have 192.168.2.0/24 and the 3rd would be 192.168.3.0/24. On it's own, these subnets would not be able to communicate with each other.

Route Table

The route table contains rules and patterns on how a network package is dispatched through the local network area. AWS offers a special rule called local, which means the VPC itself. This is how Subnets are able to communicate with each other. When something on subnet 1 wants to communicate with an IP that belongs to subnet 2, the local entry in the Route Table defines how that communication will be established. I think the Route Table doesn't get the attention it deserves. There are so many content talking about public and private subnet that doesn't touch on this: The Route Table is responsible for defining whether a subnet is public or private. These are human readable constructs. All subnets are practically the same. What AWS defined as a public subnet is a subnet that has an Internet Gateway in it's Route Table. In other words, a device running on a public subnet can communicate with the internet. A private subnet, to communicate with the internet, will require a NAT Gateway in the Route Table.

NAT

NAT stands for Network Address Translation and is used to translate local network addresses into public routable addresses. Understanding IPv4 is key to understanding why NAT even exists. Let's remember. IPv4 has 3 classes of private ip addresses: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. Any local computer network will have IP addresses that are inside these subnet definition. They are reserved as private ips and can never be routed on the internet. Put differently, a website will never have an ip address of 192.168.56.1 on the internet. With the exception of loopback interface, pretty much all other ip addresses on IPv4 is public. It means that your internet device at your home is capable of sending a request to 55.55.55.55 and getting a response back.

The easiest way to understand NAT is to use your home as an example. Your smartphone or laptop is assigned a local ip address from your internet modem. When you try to communicate with the internet, the network card in your laptop sends a request to your internet modem containing information you're trying to retrieve. The modem will then make a note of your computer's local ip address and use it's own public ip address to talk to the internet. That is the essence of translating network addresses. It translates a private one with a public one. Once the response from the internet comes back to the modem, it can then check who had requested that information (your laptop) and translate back to local address so that the package reaches your computer.

Things running on a private subnet will have a private ip address and will need NAT to communicate with the internet. Nothing on the internet can communicate directly with a device in a private subnet.

Things running on a public subnet will also have a private ip address. And this was partially the confusing part for me for a long time. The reality is that any compute device you launch on a VPC will either be part of a public or private subnet but will have a private ip address either way. The key difference is that if you auto-asign public ip to said device, it will be able to communicate directly to the internet without NAT. The Route Table of a public subnet contains an internet gateway rule and not a NAT Gateway rule.

Bonus confusion: A NAT Gateway is a compute device. as such, it has a network card and is in a subnet. We place the NAT Gateway in a public subnet because it needs to be able to communicate with the internet. When a device in a private subnet tries to reach the internet, it goes more or less like this:

Your device with a private IP prepares a network packet
The Route Table says the packet should go to the NAT Gateway
The NAT Gateway translates the private ip into a public IP
The NAT Gateway's Route Table routes the packet to the Internet Gateway
The packet reaches the internet and comes back to the NAT Gateway
The NAT Gateway translates the IP back to your device's private ip.
The packet reaches back to your device.

Availability Zones

AWS is much more complex and interesting than a cyber café with a few computers connected directly via a network cable. A VPC exists within an AWS Region and is available on several availability zones. This means AWS has to establish a secure channel between 2 different cities to establish communication between two subnets in the same VPC. This, however, is completely abstrated away and not that much relevant for a day to day work with AWS. When configuring the Route Table with local construct, AWS will handle any communication between 2 subnets in different AZs.

VPC Endpoint

Once I finally understood subnets and how the Route Table is responsible for internet communication (either via IG or NAT), it was mind-blowing to understand how AWS leverage this to offer VPC Endpoint.

Some AWS services are VPC-agnostic, such as S3, SQS, SNS, DynamoDB, etc. They are computeless resources from the account perspective. Any compute resource you want to run on your AWS Account has to run inside a VPC (except Lambda, but that's a whole other story). Services that are API-driven and does not require any compute resources on our side does not live inside our VPC. The only way to access them is via the public internet. What if AWS bought a range of public ip addresses and assign it to a specific service? For instance, S3 could be running on 54.231.0.0/17. This means we can leverage our Route Table for a specific rule. If an internet packet is directed to an IP address that matches this rule, use this VPC endpoint. Instead of falling back to the Internet Gateway, AWS can actually identify that you're going to communicate with S3 and use a dedicated infrastructure for that so that you don't have to pay for NAT. It is still a NAT that is making it all possible, but from AWS perspective, they built a special kind of NAT just for S3 so that they can charge less and use a structure optimized for that.

Conclusion

This post covers many components such as AWS VPC, Subnet, Internet Gateway, NAT Gateway, VPC Endpoint, etc. Perhaps one or two terms used in the text might be semantically incorrect or from a Network Administrator perspective I might be saying a lot of crap, but I decided to publish this anyway because for me, a developer, this knowledge helps a lot to debug and understand AWS infrastructure.

Hope you enjoyed the reading. If you have any questions, send them my way on Twitter.

Cheers.