DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • AWS EventBridge as Your System's Nervous System: The Architecture Nobody Talks About
  • Cost Is a Distributed Systems Bug
  • From Distributed Monolith to Composable Architecture on AWS
  • How to Backup SQL Server RDS to an S3 Bucket

Trending

  • The Third Culture: Blending Teams With Different Management Models
  • No More Cheap Claude: 4 First Principles of Token Economics in 2026
  • Detecting Bugs and Vulnerabilities in Java With SonarQube
  • Has AI-Generated SQL Impacted Data Quality? We Reviewed 1,000 Incidents
  1. DZone
  2. Data Engineering
  3. Databases
  4. A Practical Guide to Modernizing Data Serving Architectures Using DBSQL

A Practical Guide to Modernizing Data Serving Architectures Using DBSQL

Learn how we replaced our traditional RDS serving layer with direct lakehouse querying to improve performance, reduce cost, and achieve real-time data freshness.

By 
PRAVEEN DOOTAGANI user avatar
PRAVEEN DOOTAGANI
·
Nov. 06, 25 · Analysis
Likes (0)
Comment
Save
Tweet
Share
2.6K Views

Join the DZone community and get the full member experience.

Join For Free

If you’ve ever built analytical dashboards, chances are you’ve worked with the old “batch-load-to-RDS” pattern, where we load the data into RDS from a batch process and read from that operational data store, such as AWS RDS or other databases. We certainly did in our case — and for a while, it served us well.

Our applications pulled data from a lakehouse like Snowflake or Delta Lake, transformed it in AWS Glue, and loaded it into Amazon RDS. The RDS database powered Lambda APIs that fed dashboards and front-end apps.

But as our platform scaled, the pattern started to show its age.

  • Data latency: Batch updates meant dashboards were always several hours — sometimes a full day — behind the source.
  • Operational overhead: ETL pipelines broke, schema changes cascaded, and recovery processes drained engineering time.
  • Rising costs: We were paying for RDS clusters and ETL compute even when no one was querying data.
  • Data drift: Analysts saw one version of data in Snowflake, while users saw another through the RDS API.

Eventually, it became clear that our batch-based design wasn’t sustainable. We needed something faster, leaner, and closer to real time.

Our Legacy Setup: The Batch-Driven Design

Here’s what our old data flow looked like:

Plain Text
 
UI → API (AWS Lambda) → RDS  ←  AWS Glue Batch Job → DataLake / Snowflake


Each night, an AWS Glue job populated RDS tables from Snowflake and Delta Lake. The API then served those tables to dashboards and analytical apps.

It worked — until a point, batch jobs stretched past six hours, RDS costs skyrocketed, and every schema tweak required multiple coordinated changes. The whole thing was rigid and costly to maintain. Yes, including the engineer's time.

The Shift: Moving to a Query-at-Source Architecture

The goal was simple: get rid of the middleman. We wanted to eliminate the persistent RDS layer and stop pushing data through nightly ETL cycles. Instead, the API would query governed data sources directly through Databricks SQL.

The new data flow looks like this:

Plain Text
 
UI → API (AWS Lambda) → Databricks SQL (JDBC) → DataLake / Snowflake


That single change removed multiple moving parts and simplified our serving layer dramatically. We have a centralized team that handles DB SQL, so the infrastructure maintenance is also taken off of the engineering team.  

Key Building Blocks

  • Frontend: Interactive dashboards for data health and monitoring.
  • API layer: A stateless REST API built on AWS Lambda, connecting directly to Databricks SQL via JDBC.
  • Query engine: A Databricks SQL endpoint governed by Unity Catalog to handle access and security.
  • Data sources: Curated Delta tables and federated data from Snowflake and the lakehouse.

Implementation details:

  • Lambda uses token-based authentication to establish JDBC sessions securely.
  • The API translates incoming requests into parameterized SQL queries, ensuring safe filtering and injection prevention.
  • Databricks SQL executes those queries in real time against Delta and federated tables.
  • Results are streamed directly back to the client — no staging, no lag.

The Results: Leaner, Faster, Cheaper

Once the migration was complete, the impact was immediate. Our dashboards started showing near real-time data, and the ETL pipelines that once ran overnight were gone.

Below is a snapshot of how our data flow evolved — from the traditional batch-driven RDS setup to a live query architecture powered by Databricks SQL:

Before vs. After: Evolution of the Data-Serving Architecture

Evolution of the data-serving architecture


By removing the RDS layer and batch ETL jobs, we:

  • Reduced infrastructure costs by nearly 70%.
  • Eliminated 4–6 hours per week of maintenance and monitoring effort.
  • Removed a 1–2 hour nightly batch window, improving data freshness to near real time.
  • Simplified the stack to just two components: AWS Lambda and Databricks SQL.

The setup is now completely serverless — scaling on demand and removing the need for fixed infrastructure.

Lessons Learned and Optimization Tips

A Query-at-Source model isn’t plug-and-play; it needs careful optimization to perform well at scale.
Here are some things we learned along the way:

Challenges

  • Queries on large, unoptimized datasets can be slower than pre-aggregated RDS tables.
  • High concurrency from API-driven requests can cause queueing.
  • Lambda cold starts add connection overhead for JDBC.

Optimizations That Worked

  • Result caching: Databricks SQL caches repeated queries, returning results in milliseconds.
  • Serverless endpoints: Automatically scale to handle varying query loads.
  • Partitioning and Z-ordering: Optimize Delta tables to minimize data scanned.
  • API guardrails: Pagination, projection pushdown, and LIMIT clauses ensure efficient queries.
  • Pre-aggregated tables: For sub-second latency dashboards, we maintain lightweight summary tables directly in the lakehouse.

The Takeaway

Moving to a Query-at-Source pattern transformed how we serve analytical data. What began as a cost-optimization exercise turned into a mindset shift — letting the lakehouse itself become the serving layer.

Modern query engines like Databricks SQL are fast, secure, and scalable enough to power live analytical APIs — if paired with the right optimizations. For teams struggling with stale dashboards, rising database bills, or brittle ETL pipelines, this architecture offers a clear path to simplicity, speed, and sustainability.

AWS Architecture sql

Opinions expressed by DZone contributors are their own.

Related

  • AWS EventBridge as Your System's Nervous System: The Architecture Nobody Talks About
  • Cost Is a Distributed Systems Bug
  • From Distributed Monolith to Composable Architecture on AWS
  • How to Backup SQL Server RDS to an S3 Bucket

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook