Smart Controls for Infrastructure as Code with LLMs
By generating context-aware controls and performing sophisticated code reviews, LLMs significantly enhance our ability to build secure and efficient cloud infrastructure.
Join the DZone community and get the full member experience.
Join For FreeInfrastructure as Code (IaC) has transformed how we manage and provision infrastructure in the cloud. It enabled developers to consider compute, storage, network, and other infrastructure components as software which was not the case before infra was modeled as code. This approach has addressed multiple challenges including consistency and repeatability. IaC provides guarantees that identical environments will be created every time for a given IaC template, improving reliability and minimizing drift in configuration. Whereas manual provisioning was prone to errors, which can lead to inconsistencies between environments. IaC also integrates with version control systems such as Git, enabling teams to review changes, track changes, rollback to prior states, and collaborate on infrastructure definitions using code — similar to application development. IaC can also help reduce the costs through automated provisioning and de-provisioning of resources, optimizing the utilization and reducing idle resource costs.
Risks and Challenges
IaC introduced significant risks such as increased blast radius despite the benefits stated above. A single error or misconfiguration could propagate across multiple environments, potentially affecting entire production systems because IaC facilitates deployments at scale. This could result in widespread outages or security vulnerabilities. As an example, a single line of code can accidentally allow public access to an S3 bucket that could expose sensitive organizational data if overlooked while writing or reviewing code.
IaC templates that lack proper security validations can result in security vulnerabilities. Templates might create resources with overly permissive access, open network ports, or unencrypted data stores resulting in security gaps. While IaC simplifies deployment, maintaining templates requires ongoing effort and the inherent complexity and steep learning curve present additional challenges.
Essential Controls
Given these risks, establishing robust controls is critical to effective and secure use of IaC. Organizations must proactively enforce security, compliance, and operational policies for policy enforcement. This ensures that deployed infrastructure adheres to defined standards such as encryption requirements, network segmentation, and least privilege access.
Manual reviews for large-scale IaC deployments are insufficient. Automated tools must validate IaC templates against established policies before deployment and continuously audit deployed infrastructure for compliance. Integrating security checks early in the development lifecycle catches misconfigurations and vulnerabilities before they reach production, reducing costs associated with the mitigation and impact.
How Large Language Models Can Help ?
There is rapid improvement in large language models (LLMs) and the larger context window provided by most models can be used to read and generate code. Reasoning models that can break problems and reason about those have resulted in effective usage in code generation and review. Models such as Claude demonstrated effectively that they can understand and generate code. LLMs can be used effectively for IaC because of these improvements in the models. Let us see how these models can help in IaC development and validations
Accelerating IaC Development
LLMs can generate IaC code for CloudFormation, Terraform, or CDK directly from natural language descriptions. This reduces manual effort, speeds up initial setup, and helps ensure adherence to best practices.
Example: When an engineer describes AWS infrastructure requirements in plain English LLM can translate this into executable code.
User Input: "I need a Lambda function to process S3 events, and it needs to be connected to a DynamoDB table. Let us call that table CustomerData. Provide necessary permissions to access table" LLM Generated CDK.
Note: This is a sample code snippet.
const myTable = new dynamodb.Table(this, 'CustomerDataTable', {
partitionKey: {
name: 'id',
type: dynamodb.AttributeType.STRING
},
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});
const myLambda = new lambda.Function(this, 'S3EventProcessorLambda', {
runtime: lambda.Runtime.NODEJS_18_X,
handler: 'index.handler',
code: lambda.Code.fromAsset('lambda'),
environment: {
TABLE_NAME: myTable.tableName,
},
});
myTable.grantReadWriteData(myLambda);
myLambda.addEventSource(new S3EventSource(myBucket, {
events: [s3.EventType.OBJECT_CREATED],
}));
Policy Enforcement and Security
LLMs enable teams to enforce security and compliance policies with minimal manual effort, shifting from reactive to proactive security management. Lets see some examples:
-
Automated Policy Generation and Validation: Intent-Driven Policies: LLMs can infer infrastructure intent and criticality. For example, if a database is named "ProductionMarketingDB," an LLM understands its high availability and encryption requirements and can automatically generate appropriate security and operational policies.
-
Dynamic Policy Enforcement: Instead of static rule sets, LLMs can create dynamic policies that adapt based on infrastructure context. A "production" database might automatically require encryption at rest and in transit, multi-AZ deployment, and restricted network access.
-
Intelligent Code Review: LLMs can review IaC code for compliance with coding guidelines, security best practices, and organizational policies. By providing the set of code review guidelines to use they provide actionable feedback and suggest fixes, augmenting traditional human code reviews.
Code Example - The following Python example demonstrates how LLMs can be integrated into IaC pipelines for policy enforcement using AWS Bedrock for AWS infra. This validator checks for common security issues like overly permissive IAM policies:
import json
import os
import argparse
import sys
import boto3
class ClaudeBedrockClient:
"""
Claude API client using AWS Bedrock Runtime.
"""
def __init__(self, region_name='us-east-1'):
self.bedrock_runtime = boto3.client(
service_name='bedrock-runtime',
region_name=region_name
)
def send_prompt(self, prompt_text, model_id="anthropic.claude-3-sonnet-20240229-v1:0"):
print(f"--- Sending prompt to Claude via Bedrock ({model_id}) ---")
body = json.dumps({
"prompt": f"\n\nHuman: {prompt_text}\n\nAssistant:",
"max_tokens_to_sample": 2000,
"temperature": 0.1,
"top_p": 1.0,
"stop_sequences": ["\n\nHuman:"],
})
try:
response = self.bedrock_runtime.invoke_model(
modelId=model_id,
contentType='application/json',
accept='application/json',
body=body
)
response_body = json.loads(response.get('body').read())
claude_response_text = response_body.get('completion')
return claude_response_text
except Exception as e:
print(f"Error invoking Bedrock model: {e}")
return f"Error: Could not get response from Claude via Bedrock. {e}"
def load_template(template_path):
"""Loads and parses a CloudFormation template from the given path."""
try:
with open(template_path, 'r') as f:
template = json.load(f)
return template
except FileNotFoundError:
print(f"Error: Template file not found at {template_path}")
return None
except json.JSONDecodeError:
print(f"Error: Invalid JSON in template file at {template_path}")
return None
def validate_iac_policies(template, claude_client):
"""
Validates IaC template against security policies using Claude.
"""
# Define security policies in natural language
security_policies = """
Security Policy Requirements:
1. No IAM policies should allow wildcard (*) actions on all resources
2. S3 buckets must not allow public read or write access
3. Database instances must have encryption enabled
4. EC2 instances in production environments must not have public IP addresses
5. IAM roles should follow principle of least privilege
6. No hardcoded credentials or secrets should be present
"""
# Create prompt for Claude to analyze the template
prompt = f"""
You are a security auditor reviewing an Infrastructure as Code template.
{security_policies}
Please analyze the following CloudFormation template and identify any violations of the security policies listed above.
For each violation, provide:
1. The specific resource that violates the policy
2. What the violation is
3. A recommended fix
Template to analyze:
{json.dumps(template, indent=2)}
Provide your analysis in the following format:
VIOLATIONS FOUND: [number]
For each violation:
RESOURCE: [resource name]
VIOLATION: [description]
RECOMMENDATION: [how to fix]
---
If no violations are found, respond with "VIOLATIONS FOUND: 0"
"""
return claude_client.send_prompt(prompt)
def main():
parser = argparse.ArgumentParser(description='Validate IaC template using Claude')
parser.add_argument('template_path', help='Path to CloudFormation template')
parser.add_argument('--region', default='us-east-1', help='AWS region for Bedrock')
args = parser.parse_args()
# Load template
template = load_template(args.template_path)
if not template:
sys.exit(1)
# Initialize Claude client
claude_client = ClaudeBedrockClient(region_name=args.region)
# Validate policies
print("Validating template against security policies...")
validation_result = validate_iac_policies(template, claude_client)
print("\n" + "="*50)
print("VALIDATION RESULTS")
print("="*50)
print(validation_result)
# Exit with error code if violations found
if "VIOLATIONS FOUND: 0" not in validation_result:
print("\n⚠️ Template validation failed - security violations detected!")
sys.exit(1)
else:
print("\n✅ Template validation passed - no security violations found!")
sys.exit(0)
if __name__ == "__main__":
main()
Save this as validate_iac.py file and then run below commands
# First synthesize your CDK app to CloudFormation
cdk synth > cdk-output.json
# Then validate the synthesized template
python validate_iac.py cdk-output.json --region <<bedrock region to use>>
Expanding Controls
Above script can be expanded to enforce other controls too. Another example is to extend LLM validator's capabilities to include cost optimization controls. You can add a method that identifies potential over-provisioning. This could identify cases such as where large EC2 instance types (m5.24xlarge) is defined for non-production environments or expensive database tiers (db.r5.24xlarge) without clear justification, the LLM could flag these configurations.
Conclusion
Infrastructure as Code has transformed how we all build and manage infrastructure in the cloud. This provides speed, consistency, and capability to collaborate in building infra. However, this benefit came with responsibilities and risks, particularly increasing the blast radius of human mistakes and the need for constant security validations. This requires dynamic and intelligent controls that are proactive rather than being reactive.
Large language models come to our rescue here. By introducing automation in IaC development, generating context-aware controls and policies, and performing sophisticated code reviews, LLMs enhance our ability to deliver secure, compliant, and efficient cloud infrastructure. They help enforce best practices, making IaC more powerful and less risky. As LLMs continue to evolve, their role in making IaC more robust and intelligent will only grow, paving the way for even more secure and efficient cloud operations.
Opinions expressed in this article are solely mine and do not express the views or opinions of my employer.
Opinions expressed by DZone contributors are their own.
Comments