Best Practices for Infrastructure as Code with Terraform, Kubernetes, and Helm (Part 1)

Steven Hermans

May. 26, 20 · Opinion

In this series I’m going to explain how to set up your workspace to accomplish Infrastructure as Code with Terraform, Kubernetes, and Helm. This setup is based on my real-world experience as a DevOps Engineer working with these techniques for over 3 years.

Concepts that this series will cover:

Disaster Recovery and Infrastructure as Code.
Setting up a remote workspace.
File structure.
Storing secrets.
Setting up a Terraform project.
Deploying applications with Helm.
Backup and restore process.

Disaster Recovery (DR) and Infrastructure as Code (IaC)

In this article, I’ll tell you some things you need to know about a Disaster Recovery Plan and Infrastructure as Code. Disaster Recovery is the process of bringing your application back online and (at least partly) functional, in any way possible, after a major outage. It is good to have a plan for that. Infrastructure as Code, on the other hand, ensures that the current state of the infrastructure is written down in code, which helps a lot during a DR event.

Mean Time to Repair (MTTR)

A DR Plan and IaC have one thing in common: both reduce the Mean Time to Repair during an outage. All outages are avoidable, but they still happen, even to the best of us. Therefore, you should focus not only on how to prevent an outage but also on how to reduce the time it takes to repair it or to go back to the previous working state.
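To make the metric concrete: MTTR is commonly computed as the total time spent repairing outages divided by the number of outages. The following is a minimal sketch of that arithmetic; the function name `mean_time_to_repair` and the incident log are hypothetical and only serve as an illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Return the average outage duration as a timedelta.

    `incidents` is a list of (started, resolved) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual outage durations, starting from a zero timedelta.
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: one outage of 30 minutes and one of 90 minutes.
incidents = [
    (datetime(2020, 5, 1, 10, 0), datetime(2020, 5, 1, 10, 30)),
    (datetime(2020, 5, 9, 14, 0), datetime(2020, 5, 9, 15, 30)),
]
mttr = mean_time_to_repair(incidents)
print(mttr)  # 1:00:00
```

Tracking this number over time shows whether a DR Plan and IaC are actually shortening your outages.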

Change Management

To reduce the Mean Time to Repair, it is important to have a clear overview of the changes that are made to the infrastructure and to the applications running on it. Next to server overloads, changes are among the leading causes of outages. Hence the phrase: “Version everything!”

Git is a useful and easy tool for tracking changes. To use Git for this, you first need to manage your infrastructure as code. Several tools are really helpful in accomplishing IaC, like Terraform and Helm. I’ll dig deeper into these tools in one of the next articles.

Deploy and Rollback Changes

I’ve read that some Ops teams use Continuous Deployment (CD) for deploying infrastructure changes. While this sounds like a good idea, it has some drawbacks.

The first one is that you are not hands-on when things start to break, and the changes being deployed are no longer at the top of your mind. This will eventually increase the Mean Time to Repair. Besides that, how are you going to do complex maintenance, like a database migration, through CD?

My personal preference is to always be at the controls when deploying something, so that if it goes wrong you have every option open to resolve it quickly.
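One way to stay at the controls is to put a manual confirmation between planning and applying a change. The sketch below is only an illustration of that idea, not a setup described in this series: the wrapper `gated_apply` and the plan file name `tfplan` are assumptions, while `terraform plan -out=...` and `terraform apply <planfile>` are the standard Terraform CLI commands.

```python
import subprocess


def gated_apply(run=subprocess.run, confirm=input, plan_file="tfplan"):
    """Plan, let the operator review the output, and apply only after a typed 'yes'."""
    # Save the plan to a file so that exactly what was reviewed is what gets applied.
    run(["terraform", "plan", f"-out={plan_file}"], check=True)
    answer = confirm("Apply this plan? Type 'yes' to continue: ")
    if answer.strip().lower() != "yes":
        print("Aborted; nothing was changed.")
        return False
    # Applying a saved plan file does not prompt again, so the gate above is the only one.
    run(["terraform", "apply", plan_file], check=True)
    return True


# Usage (hypothetical): call gated_apply() from a directory that contains
# an initialized Terraform project.
```

The `run` and `confirm` parameters exist only so the function can be tried out without Terraform installed, by passing in stand-in functions.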

Consistency

Another thing that IaC solves is inconsistency throughout the infrastructure. In my experience, two servers that should be identical often become inconsistent over time.
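This kind of drift is easy to picture once server settings are written down as data. The sketch below compares two hypothetical servers; the function `find_drift` and the settings shown are made up for illustration, and a value of `None` means the setting is missing on that server.

```python
def find_drift(expected, actual):
    """Return {setting: (expected_value, actual_value)} for every setting that differs."""
    drift = {}
    # Look at every setting that appears on either server.
    for key in sorted(expected.keys() | actual.keys()):
        want = expected.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical settings for two servers that should be identical.
web_1 = {"nginx": "1.18.0", "max_connections": 1024, "tls": "1.2"}
web_2 = {"nginx": "1.18.0", "max_connections": 4096}

for key, (want, have) in find_drift(web_1, web_2).items():
    print(f"{key}: web-1={want!r} web-2={have!r}")
```

When both servers are created from the same code, there is a single definition to compare against, and differences like these have nowhere to hide.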

Backup and Restore Process

An important part of a DR Plan is a clear process for restoring backups. Making backups is usually automated, so you gain experience with it over time: when it breaks, you have to fix it. The restore process, on the other hand, is something you will hopefully never need. Nevertheless, it should be clear how it is done, because when you do need it, you won’t have much time to figure it out.

In the end, one thing is really important when doing Ops. Modern infrastructures change many times a day, so a mistake that causes a disruption is not rare. Mistakes are inevitable; what matters is how you respond to them. Therefore, focus on the Mean Time to Repair.

In the next article, I’ll talk about how to set up your workspace to get started with IaC. Stay tuned!

Opinions expressed by DZone contributors are their own.
