Lessons Learned From 1,000 of SoftNAS' AWS VPC Configurations
In this webinar, one company covers what they've learned from a host of AWS VPC configurations that they've made for companies of all sizes.
Summary: Missed our webinar, "What We Learned from 1,000 AWS VPC Configurations"? No worries. Watch the recording of our webinar, access the slides, and read below for a word-for-word summary of the webinar.
In this webinar, we covered what we learned from 1,000 AWS VPC configurations that we've configured for companies of all sizes: small businesses, Fortune 100 companies and everything in between. We covered some of the lessons that we personally learned in over 1,000 different AWS VPC deployments and configurations that we've seen here at SoftNAS.
See the slides on Slideshare: What We Learned from 1,000 AWS VPC Configurations.
Whitepaper: SoftNAS Architecture on AWS: Read the Whitepaper.
AWS VPC Configuration with SoftNAS Cloud NAS: See the guide here.
SoftNAS Cloud NAS on the AWS Marketplace: Visit SoftNAS on the AWS Marketplace.
What is an AWS VPC?
We're going to cover some AWS VPC information and the lessons that we've learned from the 1,000+ AWS VPC deployments and scenarios that we've seen over the past few years. We'll cover how SoftNAS Cloud NAS fits into the AWS VPC. I'll give a quick little demo of SoftNAS Cloud NAS high availability in an AWS VPC, and then we're going to leave some time at the end to cover a Q&A for any AWS VPC questions you might have.
Let's talk about the AWS VPC, or Amazon Virtual Private Cloud. What exactly is an AWS VPC? It's a virtual network dedicated to your environment. It's logically isolated from other virtual networks in the cloud; think of it as your own private data center or private network within your AWS account. It provides a location for launching resources, such as EC2 instances (a SoftNAS instance or any other EC2 instances), that you want to isolate in your own private environment.
It gives you configuration options that are not generally available in EC2-Classic, such as configuring your private IP address ranges, setting up subnets, controlling routing via route tables, setting up different networking gateways (we'll cover the different types of gateways that are available), and getting very granular with your security settings through both network ACLs and security groups.
When you talk about the main features available within an AWS VPC, you can talk about control. You configure your IP address range, how routing works, whether or not you're going to allow VPN access, and what the architecture of the different subnets within the AWS VPC is going to be. You have security options such as security groups and network ACLs, as well as specific routing rules that can be configured. A VPC also enables features such as running multiple network interfaces (NICs) and static private IP addresses, and some EC2 instance types, such as the T2s, can only be launched in a VPC.
You can use an AWS VPC to create an AWS hybrid cloud by leveraging the AWS Direct Connect service, which lets you extend your premises into the AWS cloud over a high-bandwidth, low-latency connection. There are also networking advantages such as VPC peering, where you connect your AWS VPC to another AWS VPC. This could be done within your organization, or you could use it to connect to other organizations for specific services or access if that were required. Plus you have features like VPC endpoints and flow logs; flow logs in particular can help you troubleshoot connectivity issues or problems gaining access to specific services within the VPC itself.
AWS VPC Topology
Just a couple of notes on AWS VPC topology. An AWS VPC lives in a single region, but it spans multiple availability zones: each subnet you create can live in a different availability zone, or you can put them all in a single availability zone should you like. All of the subnets you create within an AWS VPC can route to each other by default. The overall CIDR block for a single VPC can be anywhere from a /16 to a /28, and the CIDR of each subnet you create within the AWS VPC is configurable within that same range. You also choose your own IP prefix, so whether you want the 10 network, the 50 network, or whatever you'd like your private IP address range to be, that's configurable within your AWS VPC topology.
How do you gain access to the AWS VPC, and how do the resources within it gain access out? There are several different types of gateways, and one of the things we get asked quite frequently is what each of these gateways does and how it works. There's the IGW, the VGW, and the CGW: what do they mean, and how do they function? Hopefully this gives you a brief and easy explanation.
The internet gateway (IGW) is exactly what it sounds like: you point specific resources within your AWS VPC at it via route tables so they can reach the outside world, or you can leverage a NAT instance (more about NAT later). When it comes to providing VPN access into your VPC, whether via Direct Connect, where you have dedicated bandwidth to your AWS VPC, or via a hardware-based VPN, there are two parts: the virtual private gateway (VGW), which is the AWS side of a VPN connection, and the customer gateway (CGW), which is the customer side. Most of the major VPN hardware vendors have supported configuration templates that you can download directly from the AWS console.
How do packets actually flow within an AWS VPC? Let's take a sample setup and talk about how the packets flow and how instances connect. In our example, we have three subnets: 10.0.0.0, 10.0.1.0, and 10.0.2.0. We've got three instances: instance A is connected to subnet 1, instance C is connected to subnet 3, and instance B has two elastic network interfaces (ENIs) connected to two different subnets, subnet 1 and subnet 2. Understanding the logical flow of packets within an AWS VPC was an eye-opening experience for me and my team, and it allowed us to troubleshoot and deploy environments a lot better.
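The sizing rules above are easy to check before you create anything. The sketch below uses Python's standard `ipaddress` module to validate a VPC CIDR against the /16 to /28 range and carve equal-size subnets out of it. The function names and the choice of /24 subnets (matching the three subnets in the packet-flow example) are assumptions for illustration, not AWS tooling.

```python
import ipaddress
from itertools import islice

# Assumption: IPv4 VPC CIDR blocks must be between /16 and /28, as described above.
MIN_PREFIX, MAX_PREFIX = 16, 28

def validate_vpc_cidr(cidr: str) -> ipaddress.IPv4Network:
    """Parse a VPC CIDR and reject prefix lengths outside /16 to /28."""
    net = ipaddress.IPv4Network(cidr)  # raises ValueError if host bits are set
    if not MIN_PREFIX <= net.prefixlen <= MAX_PREFIX:
        raise ValueError(f"{cidr}: prefix must be between /{MIN_PREFIX} and /{MAX_PREFIX}")
    return net

def carve_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list:
    """Return the first `count` equal-size /new_prefix subnets inside the VPC."""
    vpc = validate_vpc_cidr(vpc_cidr)
    if not vpc.prefixlen <= new_prefix <= MAX_PREFIX:
        raise ValueError(f"subnet prefix must be between /{vpc.prefixlen} and /{MAX_PREFIX}")
    return [str(s) for s in islice(vpc.subnets(new_prefix=new_prefix), count)]

# Three /24 subnets in a 10.0.0.0/16 VPC (hypothetical sizing).
print(carve_subnets("10.0.0.0/16", 24, 3))
# ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24']
```

Going "a little bigger" with the VPC block costs nothing here: a /16 leaves room for 256 such /24 subnets.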
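Route tables choose the most specific matching route (longest prefix match), which is how one table can send VPC-local traffic to `local`, an on-premises range to the virtual private gateway, and everything else to the internet gateway. A minimal sketch, assuming a hypothetical public-subnet route table with placeholder target IDs (`igw-example`, `vgw-example`) and a made-up on-premises range of 192.168.0.0/16:

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical route table for a public subnet; target IDs are placeholders.
PUBLIC_ROUTES: Dict[str, str] = {
    "10.0.0.0/16": "local",           # VPC-wide local route, added automatically
    "192.168.0.0/16": "vgw-example",  # on-premises range via the virtual private gateway
    "0.0.0.0/0": "igw-example",       # everything else goes out the internet gateway
}

def lookup(routes: Dict[str, str], dest_ip: str) -> Optional[str]:
    """Return the target of the most specific (longest-prefix) route containing dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    best_target, best_len = None, -1
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and net.prefixlen > best_len:
            best_target, best_len = target, net.prefixlen
    return best_target

for ip in ("10.0.2.15", "192.168.10.5", "8.8.8.8"):
    print(ip, "->", lookup(PUBLIC_ROUTES, ip))
# 10.0.2.15 -> local
# 192.168.10.5 -> vgw-example
# 8.8.8.8 -> igw-example
```

In a private subnet, the `0.0.0.0/0` entry would point at a NAT instance instead of the internet gateway.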
Accessing the AWS VPC
Let's talk about how instance A and instance B connect to each other over subnet 1. They both live in the same subnet, so the routing table is the first thing a packet hits, and that routing table automatically includes a local route covering all traffic within the overall CIDR of the AWS VPC. Next, the packet hits the ARP table and the outbound portion of the firewall, and then a source/destination check occurs (a configurable option within AWS). Then it hits the outbound security group, which by default is wide open: all traffic is allowed out. On the receiving side, the packet is checked against instance B's inbound security group, passes a second source/destination check, and then hits the firewall before it actually flows into instance B.
People will say things like, "I can SSH but I can't ping," or the reverse, and most of the connectivity problems people troubleshoot here come down to the security groups, primarily on the inbound side, because outbound is open by default. This is usually the first place to check when you have a connectivity issue within an AWS VPC: make sure the security group isn't blocking the traffic based on, for example, the source or destination IP or port number.
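Security group rules are allow-only and stateful, so the question is always whether some rule admits the initiating packet; replies are allowed automatically and are not modeled here. The sketch below assumes a newly created custom group (no inbound rules, one allow-all outbound rule) and hypothetical addresses 10.0.0.10 for instance A and 10.0.0.20 for instance B, and shows why the inbound side is where same-subnet traffic usually gets dropped:

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Rule:
    protocol: str   # "tcp", "udp", "icmp", or "-1" meaning all traffic
    from_port: int
    to_port: int
    cidr: str

def _allow_all() -> List[Rule]:
    return [Rule("-1", 0, 65535, "0.0.0.0/0")]

@dataclass
class SecurityGroup:
    inbound: List[Rule] = field(default_factory=list)         # no inbound rules: nothing gets in
    outbound: List[Rule] = field(default_factory=_allow_all)  # outbound open by default

def permits(rules: List[Rule], protocol: str, port: int, peer_ip: str) -> bool:
    """Security groups contain allow rules only: traffic passes if any rule matches."""
    addr = ipaddress.ip_address(peer_ip)
    for r in rules:
        if r.protocol not in ("-1", protocol):
            continue
        # "All traffic" rules cover every port; this sketch also ignores ports for ICMP.
        port_ok = r.protocol == "-1" or protocol == "icmp" or r.from_port <= port <= r.to_port
        if port_ok and addr in ipaddress.ip_network(r.cidr):
            return True
    return False

sg_a, sg_b = SecurityGroup(), SecurityGroup()
print(permits(sg_a.outbound, "tcp", 22, "10.0.0.20"))  # True: A may send
print(permits(sg_b.inbound, "tcp", 22, "10.0.0.10"))   # False: B's inbound side drops it
sg_b.inbound.append(Rule("tcp", 22, 22, "10.0.0.0/24"))
print(permits(sg_b.inbound, "tcp", 22, "10.0.0.10"))   # True after opening SSH from subnet 1
```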
AWS VPC Packet Flow
Where it gets a bit more complicated is how packets flow between instance B and instance C. Recall that instance B lives in two subnets (subnet 1 and subnet 2) and instance C lives in subnet 3. If instance B wants to talk to instance C, it can go out either subnet 1 or subnet 2, but the same rules apply: the packet hits the route table, the firewall, the source/destination check, and the outbound security group.
The packet checks the route table to make sure there's a route to the destination network, and because it's going to a different subnet, it also checks the outbound network ACL. On the receiving side, it checks the inbound network ACL before it checks the security group. So instances that live in different subnets have an extra layer of checks between them. This is important information, and hopefully you'll find it useful. For me and my team, once we really understood how the packets flow, where they go, and how everything is checked, troubleshooting connectivity issues became a lot easier.
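Network ACLs behave differently from security groups: entries are evaluated in ascending rule-number order, the first match decides (allow or deny), anything unmatched hits the catch-all deny, and they are stateless, so return traffic needs its own rule. A minimal sketch with a hypothetical NACL on subnet 3 (instance C); the rule numbers and ranges are made up, and ICMP type/code matching is simplified away:

```python
import ipaddress
from typing import List, NamedTuple

class AclEntry(NamedTuple):
    rule_number: int
    protocol: str   # "tcp", "udp", "icmp", or "-1" meaning all traffic
    from_port: int
    to_port: int
    cidr: str
    action: str     # "allow" or "deny"

def evaluate_nacl(entries: List[AclEntry], protocol: str, port: int, peer_ip: str) -> str:
    """Check entries in ascending rule-number order; the first match decides.
    If nothing matches, the catch-all '*' entry denies the packet."""
    addr = ipaddress.ip_address(peer_ip)
    for e in sorted(entries, key=lambda entry: entry.rule_number):
        if e.protocol not in ("-1", protocol):
            continue
        port_ok = e.protocol in ("-1", "icmp") or e.from_port <= port <= e.to_port
        if port_ok and addr in ipaddress.ip_network(e.cidr):
            return e.action
    return "deny"

# Hypothetical inbound NACL on subnet 3.
inbound = [
    AclEntry(100, "tcp", 22, 22, "10.0.0.0/16", "allow"),
    AclEntry(90, "tcp", 22, 22, "10.0.1.0/24", "deny"),  # lower number, checked first
]
# Stateless: C's replies leave on an ephemeral port and need their own outbound rule.
outbound = [AclEntry(100, "tcp", 1024, 65535, "10.0.0.0/16", "allow")]

print(evaluate_nacl(inbound, "tcp", 22, "10.0.0.5"))      # allow: B via subnet 1, rule 100
print(evaluate_nacl(inbound, "tcp", 22, "10.0.1.5"))      # deny: B via subnet 2, rule 90 wins
print(evaluate_nacl(outbound, "tcp", 49152, "10.0.0.5"))  # allow: return traffic
print(evaluate_nacl(inbound, "udp", 53, "10.0.0.5"))      # deny: no match, catch-all
```

The second case shows why it matters which of instance B's two interfaces the packet leaves from: the same instance can be allowed via one subnet and denied via the other.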
Lessons Learned: AWS VPC Best Practices
Let's talk about some of the lessons we've learned across all of these AWS VPC scenarios. I've personally had my hands on 85% of these 1,000-plus AWS VPCs, either through tests I've run and created myself or by engaging with customers who are deploying or troubleshooting SoftNAS Cloud NAS in their environments.
The first best practice is to be very organized with your AWS environment. We highly recommend that you use tags; you'll thank us for that tip one day, because as you keep adding instances, route tables, subnets, and all kinds of other resources, it's nice to know what's associated with what. Simply using tags will make troubleshooting so much easier. Plan your CIDR block very carefully, too. We suggest you go a little bigger than you think you need, not smaller.
Remember that for every subnet you create, AWS reserves five IP addresses, so every subnet carries a five-address overhead right off the top. Avoid overlapping CIDR blocks: you may not want to do it today, but down the road you may want to peer this AWS VPC with another AWS VPC, and if the CIDR blocks overlap, the peering won't work and you'll find yourself in a configuration nightmare trying to get those VPCs to peer. Always save some space for future expansion. There's no cost associated with a bigger CIDR block, so don't undersize your IP space just to keep things looking clean and easy at the start.
You can subnet your way to success, so understand what your subnet strategy is going to be. I suggest aligning your subnets to different tiers as closely as possible, such as a DMZ/proxy layer, an ELB layer if you're going to use load balancers, and an application or database layer. Remember that if a subnet is not explicitly associated with a route table, it uses the main route table by default. That little tip has caught a lot of people in my dealings: they created a route table and a subnet and thought they had associated the subnet with the route table, but they hadn't, so the packets aren't flowing where they think they are.
I suggest putting everything in a private subnet by default and placing only ELBs and filtering or monitoring services in your public subnet. You can use NAT to gain access to public networks, and I highly recommend (you'll see this again later) a dual NAT configuration for redundancy. There are some great CloudFormation templates available for setting up highly available NAT instances; make sure you size those instances properly for the amount of traffic you're going to push through your network.
You can set up AWS VPC peering for access to other AWS VPCs within your environment, or perhaps to a customer or partner environment. I also highly suggest using VPC endpoints to reach services like S3 instead of going out over a NAT instance or an internet gateway. Endpoints are very easy to configure, and reaching something like S3 from your instance through an endpoint is more efficient and has lower latency than going out over a NAT or an internet gateway.
Control your access. Don't get lazy and just put in a default route to the internet gateway so you can get out. I see a lot of people do this, and it comes back to cause them problems later on. As I mentioned, use redundant NAT instances; Amazon provides some great CloudFormation templates for creating a highly available, redundant NAT instance, and again, size those instances properly. The default NAT instance size is an m1.small, which may or may not suit your needs depending on the amount of traffic you're going to push. I also highly recommend using IAM for access control, especially assigning IAM roles to instances. Remember that IAM roles cannot be assigned to running instances; the role has to be set at instance creation time. Using IAM roles means you don't have to keep populating AWS keys in specific products in order to access those API services.
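The five-address overhead and the overlap rule can both be checked with the standard library. A sketch, assuming IPv4 and the five reserved addresses described above (AWS reserves the first four addresses and the last address of each subnet); the helper names are hypothetical:

```python
import ipaddress

# AWS reserves five addresses per subnet: the first four and the last one.
RESERVED_PER_SUBNET = 5

def usable_addresses(subnet_cidr: str) -> int:
    """Addresses actually available for ENIs in an AWS subnet."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - RESERVED_PER_SUBNET

def peerable(vpc_cidr_a: str, vpc_cidr_b: str) -> bool:
    """VPC peering requires non-overlapping CIDR blocks."""
    return not ipaddress.ip_network(vpc_cidr_a).overlaps(ipaddress.ip_network(vpc_cidr_b))

print(usable_addresses("10.0.1.0/24"))           # 251
print(usable_addresses("10.0.1.0/28"))           # 11
print(peerable("10.0.0.0/16", "10.0.128.0/20"))  # False: overlapping
print(peerable("10.0.0.0/16", "10.1.0.0/16"))    # True
```

On a /28, the overhead eats almost a third of the subnet, which is one more reason to size up rather than down.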
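The main-route-table fallback is easy to audit once you have the subnet list and the explicit associations from your account. The sketch below works on plain dictionaries with hypothetical subnet and route table IDs rather than calling any AWS API:

```python
from typing import Dict, List

def effective_route_table(subnet_id: str, associations: Dict[str, str], main_table: str) -> str:
    """Subnets with no explicit association silently fall back to the main route table."""
    return associations.get(subnet_id, main_table)

def implicitly_on_main(subnet_ids: List[str], associations: Dict[str, str]) -> List[str]:
    """Audit helper: list subnets that are NOT explicitly associated with any table."""
    return [s for s in subnet_ids if s not in associations]

# Hypothetical IDs: the database subnet's association was never actually made.
subnets = ["subnet-dmz", "subnet-app", "subnet-db"]
associations = {"subnet-dmz": "rtb-public", "subnet-app": "rtb-private"}

print(effective_route_table("subnet-db", associations, "rtb-main"))  # rtb-main
print(implicitly_on_main(subnets, associations))                    # ['subnet-db']
```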
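With a role attached, the AWS SDKs pick up temporary credentials from the instance metadata automatically, so the main design work is writing a tight policy. A minimal sketch of a least-privilege identity policy; the bucket name and the two S3 actions are hypothetical examples, and a real role would list whatever its software actually calls:

```python
import json
from typing import Iterable

def instance_role_policy(bucket: str, actions: Iterable[str]) -> dict:
    """Build a least-privilege identity policy for an instance role.

    Grants only the listed S3 actions, and only on objects in one bucket.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }

policy = instance_role_policy("example-app-bucket", ["s3:PutObject", "s3:GetObject"])
print(json.dumps(policy, indent=2))
```

Bucket-level actions such as `s3:ListBucket` would need the bucket ARN itself (`arn:aws:s3:::example-app-bucket`) rather than the `/*` object ARN.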
SoftNAS and AWS VPC
How does SoftNAS Cloud NAS fit into AWS VPCs? We have a highly available storage architecture built on our SNAP HA capability, which provides high availability across multiple availability zones. It leverages our underlying secure block replication, SnapReplicate. We highly recommend running SNAP HA in high-availability mode, which gives you a no-downtime guarantee plus five-nines uptime. It's also important to remember that Amazon provides no SLA unless you run in a multi-AZ deployment; a single-AZ deployment has no SLA within AWS.
We have two methods of deploying cross-zone high availability here at SoftNAS. The first uses elastic IPs. You have two separate controllers, each in its own availability zone, placed in the public subnet, and we assign each node an elastic IP (EIP) address. A third elastic IP address serves as the VIP, or virtual IP. You configure SnapReplicate between the two instances to provide the underlying block replication, and the elastic IP address acting as the VIP is assigned to whichever controller is primary. Whatever you serve over NFS, CIFS, or iSCSI is mounted or mapped to that elastic IP address. If the storage instance fails and anything triggers our HA monitor, which checks things like file system health and network health at multiple levels, the VIP elastic IP address moves from the primary controller to the secondary controller. This works with any of the AWS storage types backing SoftNAS, such as EBS or S3, to give you a highly available architecture using elastic IP addresses.
The second mode uses a private virtual IP address, where both SoftNAS Cloud NAS instances live within a private subnet and have no access out. It uses the same underlying SnapReplicate and monitoring technology. The difference is that you pick a virtual IP address that is outside the CIDR block of your AWS VPC, your clients map to it, and an entry for it is automatically placed into the route table. Should a failover occur, we update the route table automatically to route the traffic to whichever controller should be primary at the time. This is probably the more common way of deploying SoftNAS in a highly available architecture.
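The route-based failover can be pictured as a single /32 host route whose target flips between the two controllers' ENIs. This is a hypothetical model on a plain dictionary, not SoftNAS's implementation; the VIP and the ENI IDs are made up. On AWS, the EC2 `ReplaceRoute` API is one way to repoint an existing route.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class PrivateVipPair:
    vip: str            # hypothetical VIP outside the VPC CIDR
    primary_eni: str    # hypothetical ENI IDs
    secondary_eni: str

    @property
    def host_route(self) -> str:
        return f"{self.vip}/32"

    def install(self, route_table: Dict[str, str]) -> None:
        """Create the host route that sends VIP traffic to the current primary."""
        route_table[self.host_route] = self.primary_eni

    def fail_over(self, route_table: Dict[str, str]) -> None:
        """Swap roles and repoint the host route at the new primary's ENI."""
        self.primary_eni, self.secondary_eni = self.secondary_eni, self.primary_eni
        route_table[self.host_route] = self.primary_eni

routes = {"10.0.0.0/16": "local"}
pair = PrivateVipPair("172.16.0.100", "eni-node-a", "eni-node-b")
pair.install(routes)
print(routes["172.16.0.100/32"])  # eni-node-a
pair.fail_over(routes)
print(routes["172.16.0.100/32"])  # eni-node-b
```

Clients never change their mount target; only the route table entry moves.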
Common AWS VPC Mistakes
Here are a couple of common mistakes that our support team sees customers make. First, each instance in these deployments requires two ENIs (NICs), and both of those NICs need to be in the same subnet. Check this when you're creating your instances or adding the ENIs. Second, one of the health checks we perform is a ping between the two instances, and the security group isn't always open to allow that ICMP health check; if we can't reach the other instance, an automatic failover will occur. We also use an S3 bucket in our HA deployment as a third-party witness, so if you deploy SoftNAS in a private subnet, it needs access to S3, either via NAT or via an S3 endpoint configured within the VPC.
And again, as I mentioned just a few moments ago, for private HA the virtual IP address must not fall within the CIDR block of the AWS VPC. If your CIDR is 10.0.0.0/16, you need to pick a virtual IP address outside that range, whatever works best for you; if the VIP falls within the CIDR block of the AWS VPC, the route failover mechanism we're leveraging will not function properly.
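The mistakes above lend themselves to a pre-flight check. A minimal sketch, assuming you have already gathered each node's ENI subnets, whether ICMP is open between the nodes, and whether S3 is reachable; the `StorageNode` type, the function, and all IDs and addresses are hypothetical, not part of SoftNAS:

```python
import ipaddress
from dataclasses import dataclass
from typing import List

@dataclass
class StorageNode:
    name: str
    eni_subnets: List[str]  # subnet ID of each attached ENI (hypothetical IDs)

def ha_preflight(nodes: List[StorageNode], vpc_cidr: str, vip: str,
                 icmp_open_between_nodes: bool, s3_reachable: bool) -> List[str]:
    """Return a list of problems; an empty list means none of these mistakes was found."""
    problems = []
    for node in nodes:
        if len(node.eni_subnets) != 2:
            problems.append(f"{node.name}: expected 2 ENIs, found {len(node.eni_subnets)}")
        elif node.eni_subnets[0] != node.eni_subnets[1]:
            problems.append(f"{node.name}: ENIs are in different subnets {node.eni_subnets}")
    if not icmp_open_between_nodes:
        problems.append("security group blocks ICMP between the nodes; the ping health check will fail")
    if not s3_reachable:
        problems.append("no path to S3 for the witness bucket; add NAT or an S3 endpoint")
    if ipaddress.ip_address(vip) in ipaddress.ip_network(vpc_cidr):
        problems.append(f"VIP {vip} is inside the VPC CIDR {vpc_cidr}")
    return problems

# Each node sits in its own AZ, with both of its ENIs in that AZ's subnet.
nodes = [StorageNode("node-a", ["subnet-1", "subnet-1"]),
         StorageNode("node-b", ["subnet-2", "subnet-2"])]
print(ha_preflight(nodes, "10.0.0.0/16", "10.0.9.9", True, True))      # flags only the VIP
print(ha_preflight(nodes, "10.0.0.0/16", "172.16.0.100", True, True))  # no problems
```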
AWS VPC Q&A – Questions from Webinar Attendees
- We use VLANs in our data centers for isolation purposes today. What VPC construct is recommended to replace VLANs in AWS?
- That would be subnets. You could use subnets, or if you want a stronger isolation mechanism, create another VPC to isolate those resources further and then connect the two VPCs with VPC peering.
- You said to use IAM for access control. What do you see in terms of IAM best practices for AWS VPC security?
- The biggest issue is third-party products or custom software you've built on, say, your web server. Anything that uses AWS API resources needs an access key and a secret key. You can either (a) store that access key and secret key in some type of text file and have the software reference it, or (b), the easier way, create an IAM role with the minimum level of permissions you need and attach it to your instance at launch time. The role itself can only be assigned at launch time; however, the role's permissions can be modified on the fly, so you can add or remove permissions should the need arise.
- When you're troubleshooting complex VPC networks, what approaches and tools have you found most effective?
- We love to use traceroute, and ICMP when it's available, but I also like VPC Flow Logs, which let me see what's going on at a much more granular level, and tools like CloudTrail, so I know what API calls were made by which user and can really understand what happened.
- What do you recommend for VPN and intrusion detection?
- There are a lot of options available. We've got some experience with Cisco, Juniper, and Fortinet for things like VPN, so use whoever you have. As far as IDS goes, Alert Logic seems to be a popular solution; I see a lot of customers use that particular product. Some people also like open source tools such as Snort.
- Any recommendations around secure jump box configurations within an AWS VPC?
- If you're going to deploy most of your resources within a private subnet and you're not going to use a VPN, one way a lot of people handle this is to configure a quick jump box. By that I mean taking a server, Windows or Linux depending on your preference, putting it in the public subnet, and allowing access only from a specific set of IP addresses over SSH (for Linux) or RDP (for Windows). That puts you inside the network and lets you reach the resources within the private subnet.
- Do people also use VPNs to access the jump box for added security?
- Some people do that. Sometimes they'll put the jump box behind the VPN and connect to it over the VPN, because they don't want it exposed to the internet in any way, shape, or form. It's just a matter of your organization's security policies.
- Any performance or other considerations when designing the VPC?
- It's important to understand that each instance has its own amount of available resources for both network I/O and storage I/O. Take a 10 Gb instance, for example the c3.8xlarge. That's not 10 Gb of network bandwidth plus 10 Gb of storage bandwidth; it's 10 Gb for the instance. So if you're pushing a high amount of I/O from both a network and a storage perspective, that 10 Gb is shared between your network traffic and access to the underlying EBS storage network. This confuses a lot of people: it's 10 Gb for the instance, not just a 10 Gb network pipe.
- Why use an elastic IP instead of the virtual IP?
- What if you have people who want to access this from outside of AWS? We have some customers whose servers are primarily within AWS, but they want to access files from systems that aren't inside the AWS VPC, so you could leverage it that way. To be honest, this was the first way we built HA, because at first it was the only method that let us share an IP address and work around some public cloud limitations, such as the lack of Layer 2 broadcast.
- This next question is about AWS VPC tagging. Any best practices or examples?
- Yes. I see people take different services, such as web, application, and database, and tag everything, including the security groups, with that particular tag. For people deploying SoftNAS, I'd recommend simply using the name SoftNAS as the tag. It's really up to you, but I do suggest that you use tags. They will make your life a lot easier.
- Is storage level encryption a feature of SoftNAS Cloud NAS or does the customer need to implement that on their own?
- As of our version available today, which is 3.3.3, on AWS you can leverage the underlying EBS encryption, and we provide encryption for Amazon S3 as well. Our next release, due out at the end of the month, adds encrypted storage pools, which encrypt the underlying disk devices.
- Private virtual IP for HA: does the subnet this VIP would be part of need to be added to the AWS VPC routing table?
- That's automatically taken care of. What will happen is that when you select that VIP address in the private subnet, it will automatically add a host route into the routing table to allow the clients to route that traffic, and if you need some more assistance, you can contact us via support and we'll happily help you get that set up.
- Can you clarify the requirement that, in an HA pair with two NICs, both have to be in the same subnet?
- Each instance needs two NICs (ENIs), and both of those ENIs need to be in the same subnet.
- Do you have HA capability across regions? What options are available if you need to replicate data across regions? Is the data encrypted at-rest, in-flight, etc.?
- We cannot do HA with automatic failover across regions. However, we can do SnapReplicate across regions, and then you could do a manual failover should the need arise. The data we transfer via SnapReplicate is sent over SSH, so it is encrypted in flight, and it can be encrypted at rest as well. You can replicate across regions, across data centers, and even across different cloud providers.
- Can AWS VPC peering span regions?
- The answer is no, it cannot.
- Can an HA endpoint be created internally to AWS for use with Direct Connect?
- Absolutely. You could create an HA pair of SoftNAS Cloud NAS, leverage Direct Connect from your data center, and access that highly available storage.
- When using S3 as a backend and a write cache, is it possible to read the file while it's still in cache?
- Yes, it is. I'm assuming you're asking about the eventual consistency challenges of Amazon S3 in the US Standard region. Because of the way we deal with S3, treating each bucket as its own hard drive, we don't have to deal with those S3 consistency challenges.
- Regarding subnets, in the example where a host lives in two subnets, can you clarify whether both of these subnets are in the same AZ?
- In the examples I've used, each of these subnets is assumed to be in its own separate availability zone. If you want to discuss this further, please feel free to reach out.
- Is there a white paper on the website dealing with the proper engineering for SoftNAS Cloud NAS for our storage pools, EBS vs. S3, etc.?
- Yes. Our SoftNAS architecture whitepaper was co-written by SoftNAS and Amazon Web Services and covers proper configuration settings, options, etc. We also have a pre-sales architecture team that can help you with best practices, configurations, and those types of things from an AWS perspective. Please contact firstname.lastname@example.org and someone will be in touch.
- How do you solve the HA and failover problem?
- We do a couple of different things here. When we set up HA, we create an S3 bucket that acts as a third-party witness. Before anything is allowed to take over as the master (or primary) controller, whichever term you prefer, it queries that S3 bucket to make sure it's entitled to take over. The other thing we do is shut down the old source node after a takeover. You don't want a situation where a node is flapping up and down, kind of up but kind of not, and keeps trying to take over. So whenever a takeover occurs, whether manual or automatic, the old source node is shut down. That event is logged, and we assume you'll go investigate why the failover took place. If you have questions about that in a production scenario, email@example.com is always available.
- Can we monitor SoftNAS logs using Splunk or Sumo Logic, and which log files should we monitor?
- Absolutely, and we also provide some built-in log monitoring. The key logs are SnapReplicate.log, which covers all of your SnapReplicate and HA functionality, and snserv.log, the SoftNAS server log, which covers everything done via StorageCenter. And of course, because this is a Linux-based operating system, monitoring /var/log/messages is always a good idea. That's just a sampling. For more specifics, please reach out to sales and let one of our pre-sales consultants help you.
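The witness-then-fence sequence from the answer about solving the HA and failover problem can be sketched as a compare-and-set on a shared record followed by shutting down the old primary. This is a hypothetical in-memory model: the `Witness` class stands in for the S3 witness bucket, and the epoch counter is our assumption about how such arbitration might work, not SoftNAS's documented protocol.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Witness:
    """In-memory stand-in for the shared S3 witness record (hypothetical)."""
    primary: str
    epoch: int = 0

    def claim(self, node: str, seen_epoch: int) -> bool:
        """Grant primacy only if nobody has claimed it since the caller last looked."""
        if seen_epoch != self.epoch:
            return False
        self.primary, self.epoch = node, self.epoch + 1
        return True

@dataclass
class HaPair:
    witness: Witness
    running: Set[str]
    log: List[str] = field(default_factory=list)

    def take_over(self, candidate: str, seen_epoch: int) -> bool:
        old = self.witness.primary
        if candidate == old or not self.witness.claim(candidate, seen_epoch):
            self.log.append(f"{candidate}: takeover refused")
            return False
        # Fence the old primary so it cannot flap back and fight for the VIP.
        self.running.discard(old)
        self.log.append(f"{candidate}: took over from {old}; {old} shut down")
        return True

witness = Witness(primary="node-a")
pair = HaPair(witness, running={"node-a", "node-b"})
print(pair.take_over("node-b", seen_epoch=0))  # True: node-b is primary, node-a stopped
print(pair.take_over("node-a", seen_epoch=0))  # False: stale epoch, claim rejected
print(pair.log)
```

The log entries mirror the point in the answer: every takeover leaves a record explaining which node was shut down, so someone can investigate why.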
Looks like that's all the questions. We hope that you found it to be useful and insightful and that you gained something out of it, and didn't feel as if we just completely marketed to you. Our goal here was just to pass on some of the lessons that we've learned when configuring AWS VPC deployments for our customers and some of the different things that we've seen. As you're making that journey to deploying in the cloud or you're already operational in the cloud, maybe this webinar saved you time from tripping over some of the things that other customers have tripped over.
Published at DZone with permission of Taran Soodan, DZone MVB. See the original article here.