AWS VPC NAT Instance Failover and High Availability
Amazon Virtual Private Cloud
(VPC) is a great way to set up an isolated portion of AWS and control
the network topology. It also lets you extend your data center and
use AWS for burst requirements. With the latest VPC for Everyone
announcement, what was earlier "Classic" and "VPC" in AWS will soon be
only VPC. That is, every deployment in AWS will be in a VPC, even if
it does not need all the additional features that VPC provides. Over
time, one might start using VPC features such as
multiple Subnets, network isolation, and Network ACLs. Those who have
already worked with VPCs understand the role of the NAT Instance in a VPC.
A typical VPC with Public and Private Subnets is laid out as follows:
- A Public Subnet that has direct internet connectivity through the Internet Gateway. Web Instances can be placed within the Public Subnet
- The custom Route Table associated with Public Subnet will have the necessary routing information to route traffic to the Internet Gateway
- A NAT Instance is also provisioned in the Public Subnet
- A Private Subnet that has outbound internet connectivity through the NAT Instance in the Public Subnet
- The Main Route Table is associated with the Private Subnet by default. It has the necessary routing information to route internet traffic to the NAT Instance
- Instances in the Private Subnet use the NAT Instance for outbound internet connectivity, for example, to store DB backups from a standby in S3 or to run background programs that call external web services
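The routing described in the list above can be modelled in a few lines of Python. This is only an illustrative sketch: the CIDR ranges and the target names (`igw-example`, `nat-instance-example`) are hypothetical placeholders, and the lookup simply picks the most specific matching route, which is how a VPC route table selects a target.

```python
import ipaddress

# Hypothetical IDs and CIDR ranges, chosen only for illustration.
PUBLIC_ROUTE_TABLE = {
    "10.0.0.0/16": "local",        # traffic inside the VPC
    "0.0.0.0/0": "igw-example",    # everything else goes to the Internet Gateway
}
MAIN_ROUTE_TABLE = {
    "10.0.0.0/16": "local",                 # traffic inside the VPC
    "0.0.0.0/0": "nat-instance-example",    # everything else goes to the NAT Instance
}


def next_hop(route_table, destination_ip):
    """Return the target of the most specific route that matches destination_ip."""
    address = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if address in ipaddress.ip_network(cidr)
    ]
    # The route with the longest prefix wins.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target
```

With these tables, `next_hop(MAIN_ROUTE_TABLE, "10.0.1.25")` resolves to `local`, while an external address such as `203.0.113.10` resolves to the NAT Instance from the Private Subnet and to the Internet Gateway from the Public Subnet.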
|Public and Private Subnets with multiple Availability Zones|
- Additional Subnets (Public and Private) are created in another Availability Zone
- Both Private Subnets are associated with the Main Route Table
- Both Public Subnets are associated with the same custom Route Table
- Instances in both Private Subnets continue to use the same NAT Instance for outbound internet connectivity
|NAT Instance High Availability|
- Each Subnet is associated with its own Route Table
- NAT1 is provisioned in Public Subnet 1
- NAT2 is provisioned in Public Subnet 2
- Private Subnet 1's Route Table (RT) has a routing entry to NAT1 for internet traffic
- Private Subnet 2's Route Table (RT) has a routing entry to NAT2 for internet traffic
|NAT Instance HA Illustration|
A script can be installed on both NAT Instances so that they monitor each other and take over the other's route if one of them fails. For example, if NAT1 detects that NAT2 is not responding to its ping requests, it can update the Route Table of Private Subnet 2 to send internet traffic through NAT1. Once NAT2 becomes operational again, the route can be swapped back. AWS has good documentation on this, along with a sample script for the swap.
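The takeover step can be sketched roughly as follows. This is not the AWS sample script, only a minimal illustration under some assumptions: `ec2` is expected to behave like a boto3 EC2 client (for example `boto3.client("ec2")`), whose `replace_route` call accepts `RouteTableId`, `DestinationCidrBlock`, and `InstanceId`; `peer_is_healthy` is a hypothetical callable wrapping whatever health check you use (such as a ping); and the function name and IDs are made up for the example.

```python
DEFAULT_ROUTE = "0.0.0.0/0"  # the internet-bound route in a Private Subnet's Route Table


def take_over_if_peer_down(ec2, peer_is_healthy, peer_route_table_id, my_instance_id):
    """Point the peer's Private Subnet default route at this NAT if the peer is down.

    ec2                 -- object with a boto3-style replace_route method (assumption)
    peer_is_healthy     -- hypothetical callable returning True while the peer responds
    peer_route_table_id -- Route Table of the Private Subnet normally served by the peer
    my_instance_id      -- instance ID of the NAT Instance running this check
    Returns True if the route was changed, False otherwise.
    """
    if peer_is_healthy():
        return False
    ec2.replace_route(
        RouteTableId=peer_route_table_id,
        DestinationCidrBlock=DEFAULT_ROUTE,
        InstanceId=my_instance_id,
    )
    return True
```

On NAT1, this would be called periodically with the Route Table of Private Subnet 2 and NAT1's own instance ID. Swapping back once NAT2 recovers is the same call with NAT2's instance ID. Passing the client and the health check in as arguments keeps the logic easy to exercise without touching a real VPC.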
Apart from HA, the above architecture also provides better overall throughput, since under normal conditions both NAT Instances serve the outbound internet traffic of the VPC. If there are workloads that require a lot of outbound internet connectivity, having more than one NAT Instance makes sense. Of course, you are still limited to one NAT Instance per Subnet.