DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook
  • AI Paradigm Shift: Analytics Without SQL
  • The ORM Is Over: AI-Written SQL Is the New Data Access Layer
  • SELECT AI Query Integration Using Oracle Autonomous Database 26AI and OpenAI

Trending

  • The Developer's Guide to Context-Aware AI: When Your Code Documentation Becomes Intelligent
  • Build Self-Managing Data Pipelines With an LLM Agent
  • Native SQL in Java Without JDBC Boilerplate — Meet Ujorm3
  • Why Round-Robin Won't Save You: Load Balancing Challenges in Data Streaming Services With Heterogeneous Traffic
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. When Valid SQL Was Still the Wrong Answer

When Valid SQL Was Still the Wrong Answer

A personal project exploring why AI-generated SQL isn't always trustworthy and how semantic context, validation, and governance improve analytics accuracy.

By 
Anusha Kovi user avatar
Anusha Kovi
DZone Core CORE ·
Jun. 22, 26 · Analysis
Likes (0)
Comment
Save
Tweet
Share
117 Views

Join the DZone community and get the full member experience.

Join For Free

Editor’s Note: The following is an article written for and published in DZone’s 2026 Trend Report, Cognitive Databases, Intelligent Data: Unified Infrastructure for Vector Search, AI-Optimized Queries, and Hybrid Workloads.


I started working on a personal project with a simple question: If AI can analyze a database schema and generate SQL, what still makes the answer hard to trust?

The first version of my prototype worked at a surface level. A user could ask a business question, and the system would retrieve the relevant schema, generate SQL, run the query against a sample analytics database, and return a result. Technically, that felt like progress. But then I tested a question like, What is monthly revenue? The SQL ran. The database returned an answer. Still, I could not say the answer was truly correct because the meaning of revenue was not clear enough.

After that test, I stopped treating the prototype as a text-to-SQL demo. I started handling it as an experiment in the database context that an AI assistant needs: metric definitions, semantic retrieval, validation rules, and governance signals.

Governed contextual retrieval for analytics

The semantic registry and validation layer help move the prototype from raw text-to-SQL toward governed, context-aware analytics.

Governance Layer

Governance Layer



The Problem: Context Is Not the Same as Understanding

The model could write SQL; the problem was that SQL execution alone did not prove the answer was right. In my database, monthly revenue could mean net revenue after refunds, gross order value, or paid revenue. Active customer could mean a customer with a login, a purchase, or both. Even if the model retrieved the right tables, it still needed to understand which business definition to use.

The main failure signals were:

  • Metric names that had more than one possible meaning
  • Multiple date columns that could change the answer
  • Joins that were technically possible but not analytically safe
  • Missing filters like date range, status, or region
  • Questions that needed clarification before execution

The Constraints: Keep It Small, But Make It Reliable

Because this was a personal project, I was not trying to build a full BI platform or enterprise data catalog. I wanted to focus on one narrow and realistic piece: how to make an AI-generated database answer more trustworthy before it reaches the user.

Prototype boundaries

Constraint

Design Response

Small project scope

Focused on a few high-risk metrics

Ambiguous business terms

Created explicit metric contracts

The schema alone was not enough

Added semantic retrieval over definitions

No analyst review step

Added validation before SQL execution

Simple user experience

Used clarification instead of exposing schema complexity


The hard part was keeping the prototype lightweight without making it too shallow. I wanted enough structure to make the answers safer, but not so much complexity that the project turned into a full governance platform.

The Tradeoffs: What I Changed in the Pipeline

The first major decision was to add a lightweight semantic registry. Each metric contract included the metric name, definition, grain, default date column, required filters, safe dimensions, and approved join path.

YAML
 
metric: net_revenue
definition: paid_amount - refunded_amount
grain: month
date_column: payment_date
required_filters:
  - payment_status = 'completed'
clarify_if_missing:
  - date_range


I almost relied only on schema retrieval because it was easier to build. But the schema only tells the model what exists. It does not tell the model what is correct for a specific business question.

The second decision was to retrieve both schema metadata and metric definitions. This made the retrieval step more useful because the model was both matching a question to tables and grounding the answer in business meaning.

The third decision was to validate SQL before execution. The generated query had to pass checks for allowed tables, approved joins, required filters, and selected dimensions. If it failed, the system either regenerated the query with stronger constraints or asked the user a clarification question.

Decisions, tradeoffs, and outcomes

Choice

Tradeoff

Outcome

Add metric contracts

More setup work

Clearer business meaning

Retrieve semantic context, not just schema

More retrieval complexity

Better grounding

Validate before execution

Slightly slower response

Fewer misleading answers


What Changed

The biggest lesson was that query execution is not the same as analytical correctness. A query can run successfully and still answer the wrong question. The prototype became more reliable when I stopped treating the database as just tables and columns. The model needed more context: what the metric means, which joins are safe, which filters are required, and when it should ask the user for clarification.

My main takeaway was practical: Before tuning prompts, define the meaning layer the prompt is expected to respect.

For intelligent data systems, the interesting work is not only faster retrieval or cleaner SQL. It is the connection between data, definitions, governance rules, and answers people can trust.

This is an excerpt from DZone’s 2026 Trend Report, Cognitive Databases, Intelligent Data: Unified Infrastructure for Vector Search, AI-Optimized Queries, and Hybrid Workloads.

Read the Free Report

AI sql

Opinions expressed by DZone contributors are their own.

Related

  • Keeping AI-Powered BI Honest: A Human-in-the-Loop (HITL) Playbook
  • AI Paradigm Shift: Analytics Without SQL
  • The ORM Is Over: AI-Written SQL Is the New Data Access Layer
  • SELECT AI Query Integration Using Oracle Autonomous Database 26AI and OpenAI

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook