Policy-as-Code for Terraform in Regulated Environments

Treat your security and compliance rules like tests that run every time you execute terraform plan. Learn how Policy-as-Code (PaC) lets you do that.

Jas Tandon

Oct. 01, 25 · Analysis


Why Does It Matter?

A regulated workload is one that falls under compliance requirements: industry standards that govern how data is processed, stored, and managed. Such workloads need to be clean and should be assessed against controls we can prove. Examples include least-privilege access, encryption at rest, clear network boundaries, and auditability.

Frameworks give those controls a shape. NIST SP 800-53 Rev. 5, Security and Privacy Controls for Information Systems and Organizations, provides a comprehensive set of security and privacy controls, and the CIS Foundations Benchmarks translate security best practices into cloud-specific configuration checks. Neither enforces itself, though. The controls are enforced only if you build them into your pipeline.

Policy-as-Code (PaC) does exactly that. It converts written controls into repeatable programs that can be reviewed like any other code.

Think of a policy as a small file in Git that is peer-reviewed just like any other change. It is versioned and executed automatically just before infrastructure is created or modified.

This makes it a gatekeeper for your infrastructure, the first line of defense. When a policy fails, you get a deterministic result along with a short but informative explanation that is easy to understand and interpret.

That record also substantiates that these policies are actually enforced by your infra team.

The 3 Practical Tiers of the Enforcement Model

1. Git Pull Request Checks

Every time you create a pull request (PR) on GitHub, you run a lightweight Infrastructure-as-Code scan. Think of it as a spell check that flags the obvious mistakes early on: a security group opened to the world, a storage bucket without encryption, or a resource missing ownership tags.

Tools like Checkov work with raw HCL, and better yet, can analyze the output of a Terraform plan so that they see values as they are determined by modules, variables, and data sources.

Terraform can export the plan in a machine-readable form:

    terraform plan -out=tfplan
    terraform show -json tfplan > tfplan.json

Feeding this JSON to the scanner gives higher fidelity than HCL-only checks and cuts false positives.
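To make the plan-aware idea concrete, here is a minimal sketch of what a scanner does with that JSON. It reads the resource_changes list from the plan and flags security groups that expose an admin port to the world. The port set, the function name, and the inline sample are assumptions for illustration, and the sketch only looks at inline ingress blocks on aws_security_group, not at rules declared as separate resources.

```python
import json

ADMIN_PORTS = {22, 3389}  # assumption: SSH and RDP are the admin ports we care about


def open_admin_ingress(plan: dict) -> list[str]:
    """Return addresses of planned security groups exposing an admin port to 0.0.0.0/0."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        # "after" is null when the resource is only being deleted
        after = (rc.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            if "0.0.0.0/0" not in (rule.get("cidr_blocks") or []):
                continue
            all_ports = str(rule.get("protocol")) == "-1"  # "-1" means every protocol and port
            low = rule.get("from_port") or 0
            high = rule.get("to_port") or 0
            if all_ports or any(low <= port <= high for port in ADMIN_PORTS):
                findings.append(rc["address"])
                break
    return findings


# Inline stand-in for the relevant part of tfplan.json (hypothetical resources)
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {
      "address": "aws_security_group.web",
      "type": "aws_security_group",
      "change": {
        "actions": ["create"],
        "after": {
          "ingress": [
            {"protocol": "tcp", "from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}
          ]
        }
      }
    },
    {
      "address": "aws_security_group.internal",
      "type": "aws_security_group",
      "change": {
        "actions": ["create"],
        "after": {
          "ingress": [
            {"protocol": "tcp", "from_port": 22, "to_port": 22, "cidr_blocks": ["10.0.0.0/8"]}
          ]
        }
      }
    }
  ]
}
""")

print(open_admin_ingress(SAMPLE_PLAN))
```

Against a real plan, you would replace the inline sample with json.load() on the tfplan.json file produced above.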

2. Terraform Plan-Time Gate Enforcement

Next up is enforcing decisions against what Terraform will actually change. In Terraform Cloud/Enterprise, you can group policies into policy sets and attach them globally or to selected workspaces and projects.

Each policy has an enforcement level (advisory, soft-mandatory, or hard-mandatory), so you can evolve the strategy from a “nudge” to a hard “block” as the rule stabilizes.

If you prefer open tooling, Open Policy Agent (OPA) evaluates Rego policies directly against the plan JSON from Terraform. Either way, this is where the non-negotiables live: encryption, network exposure on admin ports, and production IAM scope.
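The three enforcement levels are easier to reason about when you see them as a decision function. The sketch below is an illustration of the semantics, not Terraform's implementation: an advisory failure only warns, a soft-mandatory failure blocks unless someone with authority overrides it, and a hard-mandatory failure always blocks. The PolicyResult type, the gate function, and the rule names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PolicyResult:
    name: str
    level: str   # "advisory", "soft-mandatory", or "hard-mandatory"
    passed: bool


def gate(results: list[PolicyResult], override_approved: bool = False) -> tuple[bool, list[str]]:
    """Decide whether the run may proceed and explain every failed policy."""
    allowed = True
    messages = []
    for result in results:
        if result.passed:
            continue
        if result.level == "advisory":
            messages.append(f"WARN  {result.name}: failed, run continues")
        elif result.level == "soft-mandatory":
            if override_approved:
                messages.append(f"WARN  {result.name}: failed, overridden")
            else:
                allowed = False
                messages.append(f"BLOCK {result.name}: failed, override required")
        elif result.level == "hard-mandatory":
            allowed = False  # an override never rescues a hard-mandatory failure
            messages.append(f"BLOCK {result.name}: failed, no override possible")
        else:
            raise ValueError(f"unknown enforcement level: {result.level}")
    return allowed, messages


results = [
    PolicyResult("required-tags", "advisory", passed=False),
    PolicyResult("approved-cidrs", "soft-mandatory", passed=False),
    PolicyResult("ebs-encryption", "hard-mandatory", passed=True),
]
allowed, messages = gate(results)
print(allowed)
print("\n".join(messages))
```

Promoting a rule from nudge to block is then a one-word change to its level, which is exactly what makes a phased rollout cheap.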

3. Organizational Guardrails

Pipelines aren’t the only way that infrastructure changes, so each cloud also needs guardrails at the organization level.

AWS Service Control Policies (SCPs). These limit the maximum permissions available to principals in a member account. Notably, they apply even to the member account root user, and they require AWS Organizations in “all features” mode. We can use them to forbid unwanted actions globally or to force certain patterns like “no creation of public IPs” or “only approved regions.”
Azure Policy. Policy definitions carry effects like audit, deny, and deployIfNotExists (which remediates by deploying the missing resources). Microsoft’s guidance is to start with audit, then move to deny once you understand the impact, which is the right order for a rollout like this. Azure also has first-class exemptions with scope and expiration dates, which prevent exceptions from lingering indefinitely.
GCP Organization Policy. It enforces constraints at the organization, folder, or project level, so a constraint can sit at whichever level matches the scope of the rule.
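As an illustration of the “only approved regions” pattern, the sketch below builds an SCP document that denies any request whose region is outside an allow-list, using the aws:RequestedRegion condition key. The function name and the region list are assumptions. It is deliberately simplified: production region guardrails usually exempt global services (IAM, for example) with a NotAction list, which is left out here.

```python
import json


def region_allowlist_scp(approved_regions: list[str]) -> dict:
    """Build a simplified SCP that denies requests outside the approved regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "Action": "*",  # simplification: real policies usually carve out global services
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(approved_regions)
                    }
                },
            }
        ],
    }


# Hypothetical allow-list; substitute the regions your organization has approved
scp = region_allowlist_scp(["us-east-1", "eu-west-1"])
print(json.dumps(scp, indent=2))
```

Generating the document from a list keeps the approved regions in one reviewed place instead of scattered across hand-edited JSON.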

Finally, we should keep runtime posture checks enabled. For example, AWS Security Hub implements the CIS AWS Foundations checks, so any configuration drift surfaces as a finding.

A Small Policy Pack With Big Payoff

The idea is that small changes should yield big results. We can achieve that with five rules that deliver risk reduction almost immediately and are simple to explain.

Encryption at rest with customer-managed keys for object stores, disks, and databases. This maps to NIST cryptographic controls and gives you clean rotation and ownership stories.
No 0.0.0.0/0 (all-traffic) ingress on administrative ports. Restrict access to an approved CIDR set or a bastion. This rule is important and is overlooked in many organizations.
No IAM wildcards in production environments. Allow only role-based, least-privilege access. This puts a backstop against unsanctioned roles gaining access to resources.
Standard ownership tags on every resource (e.g., Owner, CostCenter, Env). This helps route incidents quickly and keeps cost reporting honest and clean.
Region allow-list via SCP/Azure Policy/GCP constraints. This keeps data in approved jurisdictions and keeps disaster-recovery plans sane.

Each of these rules must live as a short policy with a clear failure message (example: “EBS volumes must use a customer-managed KMS key”), a unit test, and a link to the control it enforces.
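Here is a minimal sketch of that shape for the EBS example: one short rule, one failure message, one linked control, and one unit test. It evaluates the plan JSON in Python to show the structure rather than any particular policy engine. The rule ID, the mapping to NIST SP 800-53 SC-28, and the sample addresses are assumptions, and treating any explicitly set kms_key_id as customer-managed is a simplification.

```python
POLICY = {
    "id": "ebs-cmk",                    # hypothetical rule ID
    "control": "NIST SP 800-53 SC-28",  # assumed control mapping; use your own catalogue
    "message": "EBS volumes must use a customer-managed KMS key",
}


def check_ebs_cmk(plan: dict) -> list[str]:
    """Return one failure line per planned EBS volume without a KMS key."""
    failures = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_ebs_volume":
            continue
        change = rc.get("change") or {}
        after = change.get("after")
        if after is None:
            continue  # the volume is being deleted, nothing to check
        # A key created in the same plan is unknown until apply
        key_pending = bool((change.get("after_unknown") or {}).get("kms_key_id"))
        if after.get("encrypted") and (after.get("kms_key_id") or key_pending):
            continue
        failures.append(f'{rc["address"]}: {POLICY["message"]} [{POLICY["control"]}]')
    return failures


def test_check_ebs_cmk() -> None:
    """Unit test: a volume with a key passes, one without a key fails."""
    def volume(name: str, after: dict) -> dict:
        return {
            "address": f"aws_ebs_volume.{name}",
            "type": "aws_ebs_volume",
            "change": {"actions": ["create"], "after": after, "after_unknown": {}},
        }

    plan = {
        "resource_changes": [
            # placeholder ARN, not a real key
            volume("good", {"encrypted": True, "kms_key_id": "arn:aws:kms:us-east-1:111122223333:key/example"}),
            volume("bad", {"encrypted": True, "kms_key_id": None}),
        ]
    }
    assert check_ebs_cmk(plan) == [
        "aws_ebs_volume.bad: EBS volumes must use a customer-managed KMS key [NIST SP 800-53 SC-28]"
    ]


test_check_ebs_cmk()
print("ebs-cmk policy test passed")
```

Keeping the message and control ID next to the rule means a failed check already carries the explanation and the audit reference.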

Example Guardrails (Rego + Sentinel + Terraform)

A. OPA/Rego (Used By Conftest/Checkov Custom Rules)

Require KMS-encrypted S3 and forbid public ACLs.

Enforce required tags on all resources.
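Whatever engine evaluates it, the tag rule reduces to a set difference between the required keys and the tags on each planned resource. The sketch below shows that logic in Python against the plan JSON. It is not Rego and is only meant to show the comparison a Rego rule would express. The required keys come from the policy pack above, while the function name and sample resources are hypothetical.

```python
REQUIRED_TAGS = {"Owner", "CostCenter", "Env"}


def missing_tags(plan: dict) -> dict[str, list[str]]:
    """Map each planned resource address to the required tag keys it lacks."""
    missing = {}
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after")
        if not after or "tags" not in after:
            continue  # deleted resource, or a type with no tags attribute
        absent = REQUIRED_TAGS - set(after.get("tags") or {})
        if absent:
            missing[rc["address"]] = sorted(absent)
    return missing


# Hypothetical plan fragment
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_instance.api",
            "type": "aws_instance",
            "change": {"actions": ["create"], "after": {"tags": {"Owner": "payments"}}},
        },
        {
            "address": "aws_instance.worker",
            "type": "aws_instance",
            "change": {
                "actions": ["create"],
                "after": {"tags": {"Owner": "payments", "CostCenter": "cc-100", "Env": "prod"}},
            },
        },
    ]
}

print(missing_tags(sample_plan))
```

Returning the missing keys per resource, rather than a bare pass or fail, is what makes the failure message actionable.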

B. Sentinel (Terraform Cloud/Enterprise)

Block unencrypted EBS volumes.

Require business tags on every resource.

C. Terraform Safety Belts (In Code)

Protect crown-jewel resources from deletion.
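In Terraform itself, the usual safety belt is a lifecycle block with prevent_destroy = true on the resource. A plan-time check can back that up by refusing any plan that deletes or replaces a protected type. The sketch below shows that complementary check in Python against the plan JSON. The list of protected types and the function name are assumptions for illustration.

```python
# Assumption: these are the resource types your organization treats as crown jewels
PROTECTED_TYPES = {"aws_kms_key", "aws_db_instance", "aws_s3_bucket"}


def planned_deletions(plan: dict) -> list[str]:
    """Return addresses of protected resources the plan would delete or replace."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc.get("type") in PROTECTED_TYPES
        # a replacement shows up as a delete paired with a create
        and "delete" in (rc.get("change") or {}).get("actions", [])
    ]


# Hypothetical plan fragment
sample_plan = {
    "resource_changes": [
        {"address": "aws_db_instance.ledger", "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
        {"address": "aws_s3_bucket.audit_logs", "type": "aws_s3_bucket",
         "change": {"actions": ["update"]}},
        {"address": "aws_instance.scratch", "type": "aws_instance",
         "change": {"actions": ["delete"]}},
    ]
}

print(planned_deletions(sample_plan))
```

The check catches the case where someone removes the lifecycle block in the same change that destroys the resource.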

Rollout That Balances Safety and Speed

Phase 1 (Advisory)

Turn on PR scanning and the Terraform plan-time evaluation in advisory mode. In Terraform, that means keeping the policy set advisory; in Azure, prefer audit first. Then monitor for a while and measure signal-to-noise. Once you have enough data, tune or rewrite any rule that fires too often or explains itself poorly.

Phase 2 (Targeted Enforcement)

This is where you step up a notch. Promote only the highest-impact rules to hard fail in production environments (namely encryption, public exposure on admin ports, and IAM wildcards). Everything else can remain advisory until the rules are tuned and the noise has died down.

Phase 3 (Organizational Guardrails)

This is when you start enforcing restrictions at the organization level. Begin with SCPs or the equivalent policies for region restrictions, public IP defaults, and permitted services. The goal of this phase is a last line of defense that still holds if someone tries to change infrastructure outside of Terraform.

Phase 4 (Runtime Posture and Feedback)

This phase is ongoing. Keep the CIS benchmark checks enabled. When a pattern shows up repeatedly in the findings, write a preventive policy for it, add a unit test based on that observation, and link the policy to the control ID in its header for audit traceability.

Notes and Common Pitfalls

Organisations keep shifting their strategies, and infrastructure evolves with them, so there is always more that could be done and plenty of room for mistakes. These are the tenets I try to follow:

Adopt by degrees. There is a reason that enforcement levels exist. Begin with advisory, then move to hard fails once the rule is stable and specific.
Prefer plan-aware checks. Evaluating the plan means policies see resolved values from modules and data sources, which reduces false positives compared to HCL-only scanning.
Keep one source of truth. It may seem fine to run both Sentinel and OPA and maintain each rule set by hand, but that can be counterproductive, because drift between two policy sets is costly. If you do run both, store the rule metadata (e.g., owners, control IDs, parameters like approved CIDRs) in a single catalogue and generate the policy variants from it.
Manage exceptions deliberately. Always track who owns each exception and when it expires. Azure’s exemption model is a good pattern, and you can copy it even if your cloud doesn’t provide the same feature.
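The exception tenet is straightforward to copy on any cloud. The sketch below models an exception as a record with an owner and an expiry date, honors it only until that date, and lists the ones that have lapsed so they get reviewed. It is modeled loosely on the idea behind Azure's exemptions, not on its API, and every name and value in it is hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyException:
    rule_id: str
    resource: str
    owner: str     # who answers for this exception
    expires: date  # last day the exception is honored


def is_exempt(rule_id: str, resource: str,
              exceptions: list[PolicyException], today: date) -> bool:
    """True if an unexpired exception covers this rule and resource."""
    return any(
        e.rule_id == rule_id and e.resource == resource and today <= e.expires
        for e in exceptions
    )


def lapsed(exceptions: list[PolicyException], today: date) -> list[PolicyException]:
    """Exceptions past their expiry date, due for removal or renewal."""
    return [e for e in exceptions if e.expires < today]


# Hypothetical exception register
register = [
    PolicyException("ebs-cmk", "aws_ebs_volume.legacy", "storage-team", date(2025, 12, 31)),
    PolicyException("required-tags", "aws_instance.poc", "platform-team", date(2025, 6, 30)),
]

today = date(2025, 10, 1)
print(is_exempt("ebs-cmk", "aws_ebs_volume.legacy", register, today))
print([e.resource for e in lapsed(register, today)])
```

Because the register is just data in Git, adding or renewing an exception goes through the same peer review as the policies themselves.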

Closing

To summarize, Policy-as-Code isn’t just about slapping some documentation on your deployments. It converts expectations into small, testable programs that run every time the infrastructure changes. It is achievable with three tiers of enforcement and should be rolled out in phases. The payoff is fewer incidents and better evidence: the rules become part of the build, not an afterthought.


Opinions expressed by DZone contributors are their own.
