
Data Observability: Reliability in the AI Era

For GenAI, data observability must prioritize resolution, pipeline efficiency, and streaming and vector infrastructures.

By Lior Gavish · Dec. 02, 23 · Opinion

When we introduced the concept of data observability four years ago, it resonated with organizations that had unlocked new value…and new problems thanks to the modern data stack. 

Now, four years later, we are seeing organizations grapple with the tremendous potential…and tremendous challenges posed by generative AI. 

The answer today is the same as it was then: improve data product reliability by getting full context and visibility into your data systems. However, the systems and processes are evolving in this new AI era, and data observability must evolve with them.

Perhaps the best way to think about it is to consider AI as just another data product, and data observability as the living, breathing system that monitors ALL of your data products. Reliability and visibility into what is a very black box are just as critical for building trust in LLMs as they were for building trust in analytics and ML.

For GenAI in particular, this means data observability must prioritize resolution, pipeline efficiency, and streaming/vector infrastructures. Let’s take a closer look at what that means.

Going Beyond Anomalies

Software engineers have long since gotten a handle on application downtime, thanks in part to observability solutions like New Relic and Datadog (which, by the way, just reported a stunning quarter).

Data teams, on the other hand, recently reported that data downtime nearly doubled year over year and that each hour was getting more expensive.

Image courtesy of Monte Carlo.

Data products — analytical, ML, and AI applications — need to become just as reliable as those software applications to truly become enmeshed within critical business operations. How?

Well, when you dig deeper into the data downtime survey, a trend starts to emerge: the average time-to-resolution for an incident (once detected) rose from 9 to 15 hours.

In our experience, most data teams (perhaps influenced by the common practice of data testing) start the conversation around detection. While early detection is critically important, teams vastly underestimate the significance of making incident triage and resolution efficient. Just imagine hopelessly jumping between dozens of tools, trying to figure out how an anomaly came to be or whether it even matters. That typically ends with fatigued teams that ignore alerts and suffer through data downtime.

Monte Carlo has accelerated the root cause analysis of this data freshness incident by correlating it to a dbt model error resulting from a GitHub pull request in which the model code was incorrectly modified with the insertion of a semicolon on line 113. Image courtesy of Monte Carlo.

Data observability is characterized by the ability to accelerate root cause analysis across data, system, and code and to proactively set data health SLAs across the organization, domain, and data product levels.
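To make the detection-plus-SLA side of that concrete, here is a minimal sketch of a data freshness check against a warehouse table. The table name, timestamp column, SLA threshold, and print-based alerting are illustrative assumptions, not Monte Carlo's API; a production system would attach lineage context and page the on-call owner instead.

```python
# Minimal freshness check: alert when a table stops receiving data
# within its SLA window. All names and thresholds are hypothetical.
from datetime import datetime, timedelta, timezone
import sqlite3  # stand-in for your warehouse driver (Snowflake, BigQuery, ...)

FRESHNESS_SLA = timedelta(hours=6)  # assumed SLA for this data product

def check_freshness(conn: sqlite3.Connection, table: str, ts_column: str) -> bool:
    """Return True if the table received data within the SLA window."""
    row = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()
    if row[0] is None:
        print(f"ALERT: {table} has never loaded data")
        return False
    # Assumes the column holds ISO-8601 timestamps in UTC.
    last_loaded = datetime.fromisoformat(row[0]).replace(tzinfo=timezone.utc)
    lag = datetime.now(timezone.utc) - last_loaded
    if lag > FRESHNESS_SLA:
        print(f"ALERT: {table} is {lag} stale (SLA: {FRESHNESS_SLA})")
        return False
    return True
```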

The Need for Speed (and Efficiency)

Data engineers are going to be building more pipelines faster (thanks, GenAI!), and tech debt is going to accumulate right alongside them. That means degraded query, DAG, and dbt model performance.

Slow-running data pipelines cost more, are less reliable, and deliver a poor experience to data consumers. That won't cut it in the AI era, when data is needed as soon as possible. Especially not when the economy is forcing everyone to take a judicious approach to expenses.

That means pipelines need to be optimized and monitored for performance, and data observability has to cater to that need.
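One lightweight starting point, sketched below: time each pipeline stage and flag runs that degrade well past their historical norm. The stage names, baselines, and degradation factor here are hypothetical assumptions, not any specific tool's API.

```python
# Hedged sketch of pipeline performance monitoring: time each stage
# and alert when a run degrades past an assumed historical baseline.
import time
from contextlib import contextmanager

BASELINE_SECONDS = {"extract": 120.0, "transform": 300.0}  # hypothetical baselines
DEGRADATION_FACTOR = 1.5  # alert when a stage runs 50% slower than baseline

@contextmanager
def timed_stage(name: str):
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        baseline = BASELINE_SECONDS.get(name)
        if baseline is not None and elapsed > baseline * DEGRADATION_FACTOR:
            print(f"ALERT: stage '{name}' took {elapsed:.0f}s "
                  f"(baseline {baseline:.0f}s)")

# Usage: wrap each stage so slow runs surface immediately.
with timed_stage("transform"):
    time.sleep(0.1)  # placeholder for the real transformation work
```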

Observing the GenAI Data Stack

This will shock no one who has been in the data engineering or machine learning space for the last few years, but LLMs perform better in areas where the data is well-defined, structured, and accurate. 

Not to mention, there are few enterprise problems worth solving that don't require at least some context about the enterprise. This is typically proprietary data — whether user IDs, transaction history, shipping times, or unstructured data from internal documents, images, and videos. It typically lives in a data warehouse or lakehouse. I can't tell a GenAI chatbot to cancel my order if it has no idea who I am, what my past interactions were, or what the company's cancellation policy is.

Ugh, fine. Be that way, ChatGPT 3.5. Image courtesy of Monte Carlo.

To solve these challenges, organizations are typically turning to RAG or to pre-training/fine-tuning approaches, both of which require smart and reliable data pipelines. In an (oversimplified) nutshell, RAG involves giving the LLM additional context through a database (oftentimes a vector database…) that regularly ingests data from a pipeline, while fine-tuning or pre-training involves tailoring how the LLM performs on specific or specialized types of requests by providing it with a training corpus of similar data points.
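Here is a deliberately toy, dependency-free sketch of that RAG flow: embed the question, retrieve the nearest context from a "vector database" that an ingestion pipeline keeps fresh, and prepend it to the prompt. The embedding function, in-memory index, and document contents are hypothetical stand-ins for a real embedding model and vector store.

```python
# Toy RAG flow: embed -> retrieve nearest context -> augment the prompt.
# The embedding, "vector database," and documents are illustrative only.
import math

def embed(text: str) -> list[float]:
    # Placeholder: a real system would call an embedding model here.
    vec = [0.0] * 64
    for i, ch in enumerate(text.lower()):
        vec[i % 64] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized above, so the dot product suffices.
    return sum(x * y for x, y in zip(a, b))

# In-memory stand-in for a vector database a pipeline keeps fresh.
docs = [
    "Orders can be cancelled within 30 days of purchase.",
    "Shipping typically takes 3-5 business days.",
]
index = [(embed(d), d) for d in docs]

def build_prompt(question: str) -> str:
    q = embed(question)
    context = max(index, key=lambda pair: cosine(q, pair[0]))[1]
    # A real system would now send this augmented prompt to the LLM.
    return f"Context: {context}\n\nQuestion: {question}"

print(build_prompt("Can I cancel my order?"))
```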

Data observability needs to help data teams deliver reliability and trust in this emerging stack.

In the Era of AI, Data Engineering Is More Important Than Ever

Data engineering has never been a slowly evolving field. If we had started talking to you about Spark clusters ten years ago, you would have politely nodded your head and then crossed the street.

To paraphrase a Greek data-engineer philosopher, the only constant is change. To this we would add that the only constants in data engineering are the eternal requirements for more: more data, more reliability, and more speed (but at less cost, please and thank you).

GenAI will be no different, and we see data observability as an essential bridge to this future that is suddenly here.


Published at DZone with permission of Lior Gavish.

Opinions expressed by DZone contributors are their own.
