Automating AWS Glue Infra and Code Reviews With RAG and Amazon Bedrock

Automate AWS Glue reviews with infra-first RAG governance, enforcing enterprise standards, reducing manual work, and shifting checks left.

pooja chhabra

Jan. 29, 26 · Tutorial

Likes (0)

Comment

Save

1.9K Views

In many enterprises, the transition from a "working" pipeline to a "production-ready" pipeline is gated by a manual checklist. In most enterprises, a “simple” Glue review involves answering questions like:

Is the Glue job deployed?
Was it provisioned via CloudFormation?
Does the expected crawler exist?
Is the code production-grade?
Does it follow internal best practices?

Traditionally, a senior engineer would spend 4–6 hours per use case and manually:

Cross-references CloudFormation
Opens job scripts
Reviews against a checklist stored somewhere
Writes feedback by hand

In a fast-scaling organization, this creates a bottleneck. To solve this, I developed a utility that implements shift-left governance — moving the review process from the final "pre-production" stage directly into the development cycle.

So instead of waiting for a Senior Engineer to find a missing Crawler or a hardcoded S3 path near production deployment, the developer runs an automated script during development.

Proactive vs. reactive: Issues are fixed while in the development phase.
Cost reduction: It is significantly cheaper that 4-6hrs a senior engineer would spend.
Empowerment: Developers receive instant, expert-level feedback without waiting on a human reviewer.

The Solution Architecture: RAG-Powered Governance

The core of this implementation is a retrieval-augmented generation (RAG) pattern. While LLMs are smart, they don't natively know your specific "Enterprise Internal Standards."

The RAG Approach: Grounding the Review

Instead of hardcoding rules into a prompt, we store the Enterprise Best Practices Checklist as a document in Amazon S3.

Design Principle: Deterministic Infrastructure Validation Before Probabilistic Inference

The key insight was simple: A Glue job review is meaningless without infrastructure context.

So instead of building “another code reviewer,” let's build a system that:

Starts with real deployed infrastructure
Verifies it against enterprise standards
Reviews code only if infra checks pass
Uses GenAI with strict grounding

The High-Level Architecture

At a glance, the system works like this:

The reviewer runs a local CLI command with a use case name.
The system discovers all Glue jobs for that use case.
It validates:
- Glue job existence
- Deployment status
- CloudFormation provenance
- Expected crawlers
It retrieves enterprise review standards from Amazon S3, along with the glue job script to be reviewed against this checklist.
It runs a GenAI-powered code review.
It generates Markdown reports per job.

All fully automated.

Step 1: Discover Glue Jobs Automatically

Instead of hardcoding job names, I used the AWS Glue Data Catalog to discover jobs associated with a use case.

This allows the review process to:

Stay decoupled from naming conventions
Work across environments
Scale as pipelines grow

Step 2: Infrastructure Validation (The Missing Piece)

Before touching the code, the system validates the infrastructure.

For each Glue job, it checks:

Does the job exist?
Is it deployed?
Was it created via CloudFormation?
Does required crawler exist?

Example output:

    Plain Text
   
   Glue Job Exists

- Resource: Glue Job (<gluejob name>)  

                    Status: Deployed

                    Source: IaC (CloudFormation: <stack name>)

                    Compliance: Project standards met

                    If something is missing:

                    Crawler Doesn't Exist

                    <crawler name> Expected

                    - Compliance: Project standards not met

Step 3: The RAG Approach (Grounding the review)

Once infra checks pass, the system sends the script and the RAG-retrieved checklist to a Claude Sonnet 3.5 LLM hosted on Amazon Bedrock.

Here’s how:

The enterprise checklist is stored in Amazon S3
Relevant checklist is retrieved at runtime
Glue job script is retrieved from job metadata at runtime.
They are injected directly into the LLM prompt
The LLM evaluates code only against retrieved standards

The model does not invent rules.
It reasons strictly within enterprise-defined constraints.

This allows the governance rules to evolve (e.g., adding a new requirement) just by updating an S3 file, without changing a single line of the review code.

Why Not Free-Form GenAI?

Unconstrained GenAI:

Hallucinates standards
Produces inconsistent feedback
Is impossible to audit

That’s unacceptable in an enterprise.

What the Code Review Output Looks Like

Each Glue job gets a Markdown report like this:

    Markdown
   
   ## Code Structure & Readability

Is the code modular and function-based?

- Status: NEEDS IMPROVEMENT

- Explanation: Logic is implemented in a single main function. But there is repetitive code for listing objects in S3 buckets.

- Recommendation: Create a separate function for listing S3 objects to reduce repetition.

Every checklist item includes:

Pass/fail status
Clear explanation
Actionable recommendation

Exactly how a senior engineer would review — but consistently.

Performance and Productivity Gains

By shifting governance left and using RAG, we can turn a slow, expert-driven, manual process into a fast, deterministic, self-service workflow:

metric	manual process	rag-automated process	improvement
Review Time	~4 hrs	~3-4 Minutes	98% Faster
Rule Updates	Manual Re-training	Update S3 File	Instant
Reviewer dependency	Senior engineer	Self-service	Reallocation of senior expertise to high-impact architecture
Consistency	Human-dependent	100% Policy-aligned	No subjective bias
Review quality	Variable	Standardised	Institutionalisation of "Best Practices.
Cost	High-Cost Engineering Hours	Nominal API/Compute Cost. (<$0.10 per PR via Amazon Bedrock)	>99% Reduction in Direct Review Expense
Audit trail	Manually 1 to 1 shared review comments	Markdown artifacts	High-fidelity visibility for stakeholders

Net result: ~95% reduction in review time.

Conclusion: Governance as a Service

This implementation proves that "Enterprise Standards" don't have to be a manual burden. By utilizing a RAG approach with Amazon Bedrock, a living governance engine can be created that executes in minutes and is active from the development phase itself. This allows senior engineers to focus on architecture and innovation, while the "bottleneck" of checklist-checking is eliminated.

The code for this implementation is available in my GitHub repository.

Clone URL: https://github.com/chhabrapooja/infra-aware-code-review-automation.git

I will make the repository public following the publication of this article.

Why This Works (and Scales)

Some reasons are:

Infra comes first. Most review tools ignore infrastructure. This one starts there.
Grounded GenAI implementation. RAG ensures correctness, consistency, and auditability.
Standards are decoupled from code. Updating the checklist requires no redeployment.
Decoupling infrastructure by lifecycle and/or use case. Instead of maintaining a single stack, we implemented granular stack segmentation based on the specific pipeline stage to locate specific resources instantly.

Future Roadmap

CI/CD Integration: Trigger reviews automatically on GitHub Pull Requests.

AWS RAG

Opinions expressed by DZone contributors are their own.

Related

Trending