CloudFormation is the standard way to provision AWS resources. But developing a template is a lot of work. Let’s speed up development and maintenance by working together on high-quality templates:
- Reviewed by experts
- Highly available
- Easy to deploy and update
- Built-in monitoring, logging, and visibility
- Automatically tested
As a maintainer of Free Templates for AWS CloudFormation, I recently reflected on the project. I came up with the following questions to see how things are going.
How to Keep Stacks Updated?
Many of our clients use our Free Templates for AWS CloudFormation. Sometimes they create a stack on their own; sometimes I help them to create the first stacks. But many of those stacks never get updated after creation. That’s a shame in three ways:
- There are security fixes
- There are bugs
- There are improvements to be made
I've thought about a way to make it super easy to create and update stacks. At the moment, I'm experimenting with pipelines (CodePipeline) for each template. But I have not yet found an easy way to:
- Define acceptance and production environments in a general way that works for all templates
- Distribute the changes either from a repo or directly from S3 (should we host the templates or should the user clone them?)
How to Assure That Templates Are Working?
In February 2017, I added a test suite to the project. Before that, we maintained the tests in a private repository. But now the tests are also open source. What do we test? E.g.:
So each time the master branch changes, we run the entire test suite and create many CloudFormation stacks, which takes hours to complete and costs us hundreds of US dollars each month. But this has paid off. We can make changes to the templates and can be sure that everything is still working.
We are always trying to improve the tests and reduce the time it takes to run them. At the moment, we cannot start the tests automatically on new pull requests, and we are a bit concerned about potential abuse (each PR creates AWS charges on our end). The test suite is written in Java, and large parts are developed only for this project. I’m thinking about launching a project to make it easy to test CloudFormation templates, both by inspecting the template and by inspecting the running stack. Let me know if this is interesting to you!
What Is a Production-Ready Template?
I make mistakes. That’s why I always request a review by another expert. The posts on this blog, each template, and all pages of AWS in Action were reviewed by my brother Andreas. He always catches something that I have missed. He adds a new perspective to the problem. He questions the whole approach and asks me to solve it differently. You can imagine that this drives me crazy from time to time. But in the end, the result is always better than before. Even if you don’t have a brother or sister with similar interests, you can still use a pull request to ask a stranger for a review. Maybe you'll become friends one day! Pull requests work pretty well.
A production-ready template needs to be secure. Security on AWS is complex, but we always try to follow the principle of least privilege. Keep security groups as tight as possible, and avoid * in IAM policies. Security groups are the easier part to control; the hard part is IAM policies. That’s why I started the Complete AWS IAM Reference: a place where anyone can find the information that is needed to create solid IAM policies. The two projects are closely linked. I couldn’t maintain one without the other, though I did not understand that when I started them. Today, I know that the reference stays up-to-date because of the templates that I develop. And the templates give me the chance to test what’s in the reference.
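To make the advice about avoiding * concrete, here is a minimal sketch of a wildcard check for an IAM policy document. It is not part of the project or of the IAM reference; the function name `find_wildcards` and the example policy are made up for illustration.

```python
def find_wildcards(policy):
    """Return (statement index, field, value) for each wildcard in an Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = statement.get(field, [])
            if isinstance(values, str):  # a single string is allowed instead of a list
                values = [values]
            for value in values:
                # Flag a bare "*" and service-wide actions such as "s3:*".
                if value == "*" or (field == "Action" and value.endswith(":*")):
                    findings.append((index, field, value))
    return findings


# Hypothetical policy used only to exercise the check.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:log-group:app:*",
        },
    ],
}

if __name__ == "__main__":
    for finding in find_wildcards(EXAMPLE_POLICY):
        print(finding)
```

Treat the findings as prompts for a review, not as hard failures: some IAM actions do not support resource-level permissions and only work with * as the resource.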
Reliable infrastructure is king. One component to achieve this is a highly available architecture. No single point of failure means less trouble. This approach also enables things like rolling updates and deployments without downtime. Now you can patch your system at any time, not only during short maintenance windows at night. AWS makes it super easy to run highly available infrastructure. Many services support it out of the box. Other services need more care. And sometimes you need to be aware of the limitations. In the Free Templates for AWS CloudFormation project, we document limitations. Otherwise, everything is highly available by default.
Elasticity is likely one of the reasons you use AWS. From the beginning, we wanted our templates to be scalable. Our Jenkins template adds build agents when builds queue up. The WordPress template adds instances when load goes up. I recently read Brendan Gregg’s book Systems Performance, where I learned about queuing theory. One key takeaway was that if the utilization goes above 80%, the waiting time explodes. That’s why we now scale up when usage hits 60%. In most scenarios, this should leave enough time to bring up new servers before you hit 80%. Still, this is not an optimal solution. That’s why Andreas started to work on a load test that you can use to adjust the scaling thresholds more easily.
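The effect of utilization on waiting time can be illustrated with a small calculation. The sketch below assumes the simplest textbook model, an M/M/1 queue (one server, random arrivals, random service times), where the mean response time equals the mean service time divided by (1 - utilization). That model is my choice for illustration; it is not necessarily the one used in the book, and real servers will behave differently.

```python
def response_time_factor(utilization):
    """Mean response time divided by mean service time in an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    # Print how much slower a request gets as the server fills up.
    for percent in (50, 60, 70, 80, 90, 95):
        factor = response_time_factor(percent / 100)
        print(f"{percent:3d}% utilization -> {factor:5.1f}x service time")
```

Under this model, a request takes 2.5 times its service time at 60% utilization, 5 times at 80%, and 10 times at 90%. Scaling at 60% therefore leaves some headroom while new servers boot.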
AWS infrastructure can become complex. And as we all know, things go wrong in complex systems. That’s why you need as much information as possible about the system that is running. Two sources are monitoring and logging. CloudWatch Metrics stores monitoring information from almost all AWS services. CloudWatch Logs is a place where you can store and search your logs. Free Templates for AWS CloudFormation have always had the CloudWatch Logs agent installed on EC2 instances. In April 2017, we started to add CloudWatch Alarms. A CloudWatch Alarm observes a CloudWatch Metric. If the metric crosses a threshold, an alert is sent to an SNS topic. The Alert Topic template provides the SNS topic and also defines who will receive the alerts. We support email subscriptions and also integrate with our chatbot marbot, so you don’t miss an alert from Amazon Web Services. Another important piece is to make monitoring and logging data visible. Since July 2017, CloudFormation supports CloudWatch Dashboards, so we can now start to add dashboards to our templates. We will work on this in the coming months.
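As a rough illustration of how an alarm ties a metric to an SNS topic, the sketch below builds a CloudFormation resource of type AWS::CloudWatch::Alarm as a Python dictionary and prints it as JSON. The parameter name `AlertTopicArn`, the resource name `AutoScalingGroup`, and the 80% threshold are assumptions for this example, not values taken from the project's templates.

```python
import json


def cpu_alarm(threshold_percent=80):
    """Build an alarm that notifies an SNS topic when average CPU is too high."""
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmDescription": "Average CPU utilization is too high",
            "Namespace": "AWS/EC2",
            "MetricName": "CPUUtilization",
            "Statistic": "Average",
            "Period": 300,  # seconds per data point
            "EvaluationPeriods": 1,
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": threshold_percent,
            # Hypothetical template parameter that holds the SNS topic ARN.
            "AlarmActions": [{"Ref": "AlertTopicArn"}],
            # Hypothetical Auto Scaling group defined elsewhere in the same template.
            "Dimensions": [
                {"Name": "AutoScalingGroupName", "Value": {"Ref": "AutoScalingGroup"}}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps({"CPUTooHighAlarm": cpu_alarm()}, indent=2))
```

The printed fragment belongs in the Resources section of a template that also defines the `AlertTopicArn` parameter and the `AutoScalingGroup` resource.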
How to Reuse and Modularize Templates?
All templates that use EC2 instances need a VPC. All templates need a place to send alerts to. As you can see, some modules can be separated out into dedicated templates. Luckily, AWS added Cross-Stack References in September 2016. Since then, you only provide a reference to the parent VPC stack, and all the IDs (VPC, subnets, route tables) are fetched automatically. This makes your life a lot easier and removes a big source of errors. But that’s not yet optimal; you still need to pass the stack references around. I started to experiment with a small command line tool that makes it easier to create and update stacks based on our Free Templates for AWS CloudFormation. I’m not yet convinced, but I will continue to experiment.
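The mechanics of a cross-stack reference can be sketched as two template fragments, built here as Python dictionaries. The parent stack exports a value under a name derived from its stack name, and the child stack imports it with Fn::ImportValue. The export suffix `-VPC`, the parameter name `ParentVPCStack`, and the CIDR block are assumptions for this example and may differ from the names used in the actual templates.

```python
import json


def import_from(parent_parameter, key):
    """Build the intrinsic that imports '<parent stack name>-<key>'."""
    return {"Fn::ImportValue": {"Fn::Sub": "${" + parent_parameter + "}-" + key}}


# Parent (VPC) template: exports the VPC ID under '<stack name>-VPC'.
PARENT_TEMPLATE = {
    "Resources": {
        "VPC": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}}
    },
    "Outputs": {
        "VPC": {
            "Value": {"Ref": "VPC"},
            "Export": {"Name": {"Fn::Sub": "${AWS::StackName}-VPC"}},
        }
    },
}

# Child template: only needs the parent stack's name as a parameter.
CHILD_TEMPLATE = {
    "Parameters": {"ParentVPCStack": {"Type": "String"}},
    "Resources": {
        "SecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Example security group",
                "VpcId": import_from("ParentVPCStack", "VPC"),
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(CHILD_TEMPLATE, indent=2))
```

The child never sees a VPC ID as a parameter. It receives only the parent stack name, which is the single value you still have to pass around.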
How to Create a Sustainable Library?
Last but not least, proper documentation is important. It makes it easy to get started, and it reduces the number of GitHub issues. We moved to Read the Docs in May 2017, and since then, we have versioned documentation with hosted, versioned templates as well. This should make it easier for you to learn about the templates. We can still improve the docs. At the moment, we assume a certain knowledge of AWS. But we could make it easier for AWS newcomers to use the templates.
We are happy about the state of Free Templates for AWS CloudFormation. A few things we are experimenting with at the moment:
- Adding dashboards to all templates in separate templates to make monitoring data visible
- Offering a solution to create and update a stack based on our templates with CodePipeline to make updating templates easier
- Improving the runtime of the test suite
- Finding a better solution to create stacks that depend on each other (e.g. with a CLI tool)
- Enabling AWS newcomers to use our templates: The docs assume certain AWS knowledge at the moment
- Shipping load testing capabilities for templates that use auto scaling to allow you to fine-tune the scale-up and scale-down thresholds