
How to Scale RAG and Build More Accurate LLMs

Retrieval augmented generation (RAG) needs the right data architecture to scale efficiently. Learn how data streaming helps data and application teams innovate.

By Andrew Sellers · Aug. 08, 24 · Analysis

Retrieval augmented generation (RAG) has emerged as a leading pattern to combat hallucinations and other inaccuracies that affect large language model (LLM) content generation. However, RAG needs the right data architecture around it to scale effectively and efficiently. A data streaming approach provides the foundation for that architecture, supplying LLMs with large volumes of continuously enriched, trustworthy data to generate accurate results. This approach also allows data and application teams to work and scale independently to accelerate innovation.

Foundational LLMs like GPT and Llama are trained on vast amounts of data and can often generate reasonable responses about a broad range of topics, but they still generate erroneous content. As Forrester noted recently, public LLMs “regularly produce results that are irrelevant or flat wrong,” because their training data is weighted toward publicly available internet data. In addition, these foundational LLMs are completely blind to the corporate data locked away in customer databases, ERP systems, corporate wikis, and other internal data sources. This hidden data must be leveraged to improve accuracy and unlock real business value.

RAG allows data teams to contextualize prompts in real time with domain-specific company data. Having this additional context makes it far more likely that the LLM will identify the right pattern in the data and provide a correct, relevant response. This is critical for popular enterprise use cases like semantic search, content generation, or copilots, where outputs must be based on accurate, up-to-date information to be trustworthy.

Why Not Just Train an LLM on Company-Specific Data?

Current best practices for generative AI often necessitate creating foundation models by training billion-parameter transformers on massive amounts of data, making this approach prohibitively expensive for most organizations. For example, OpenAI has said it spent more than $100 million to train GPT-4. Research and industry are beginning to produce promising results for small language models and less expensive training methods, but those aren’t generalizable and commoditized yet. Fine-tuning an existing model is another, less resource-intensive approach and may also become a good option in the future, but this technique still requires significant expertise to get right. One of the benefits of LLMs is that they democratize access to AI, but having to hire a team of PhDs to fine-tune a model largely negates that benefit.

RAG is the best option today, but it must be implemented in a way that provides accurate and up-to-date information and in a governed manner that can be scaled across applications and teams. To see why an event-driven architecture is the best fit for this, it’s helpful to look at four patterns of GenAI application development.

1. Data Augmentation

An application must be able to pull relevant contextual information, which is typically achieved by using a vector database to look up semantically similar information, usually encoded in semi-structured or unstructured text. This means gathering data from disparate operational stores and “chunking” it into manageable segments that retain their meaning. These chunks are then embedded and stored in the vector database, where they can be retrieved and coupled with prompts.
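The chunk, embed, and look-up flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `chunk_text`, `embed`, and `VectorStore` are hypothetical names, the "embedding" is a toy bag-of-words count standing in for a real embedding model, and the in-memory list stands in for a real vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 10) -> list[str]:
    """Group whole sentences into chunks of roughly max_words words.

    Sentences are never split, so a long sentence becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._rows.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


document = (
    "Returns are accepted within 30 days of purchase. "
    "Refunds are issued to the original payment method. "
    "Standard shipping takes five business days. "
    "Express shipping takes two business days."
)
store = VectorStore()
for chunk in chunk_text(document):
    store.add(chunk)

top = store.search("How long does express shipping take?", k=1)
print(top)
```

A production system would replace `embed` with a call to an embedding model and `VectorStore` with a real vector database, but the shape of the flow stays the same.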

An event-driven architecture is beneficial here because it’s a proven method for integrating disparate sources of data from across an enterprise in real time to provide reliable and trustworthy information. By contrast, a more traditional ETL (extract, transform, load) pipeline that uses cascading batch operations is a poor fit because the information will often be stale by the time it reaches the LLM. An event-driven architecture ensures that when changes are made to the operational data store, those changes are carried over to the vector store that will be used to contextualize prompts. Organizing this data as streaming data products also promotes reusability, so these data transformations can be treated as composable components that can support data augmentation for multiple LLM-enabled applications.
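To make the idea of carrying operational changes over to the vector store concrete, here is a minimal sketch that applies a stream of change events to an in-memory index. Everything in it is an assumption made for illustration: `ChangeEvent`, `VectorIndex`, and `toy_embed` are hypothetical, the list `stream` stands in for a topic on a streaming platform, and a `None` value is used as a delete marker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record emitted when an operational row changes."""
    key: str
    value: Optional[str]  # None marks a delete


def toy_embed(text: str) -> list[float]:
    """Placeholder embedding: ASCII letter frequencies, not a real model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class VectorIndex:
    """In-memory stand-in for a vector store keyed by source record."""

    def __init__(self) -> None:
        self.entries: dict[str, tuple[str, list[float]]] = {}

    def apply(self, event: ChangeEvent) -> None:
        """Upsert or delete so the index mirrors the latest state per key."""
        if event.value is None:
            self.entries.pop(event.key, None)
        else:
            self.entries[event.key] = (event.value, toy_embed(event.value))


# Stand-in for events arriving on a stream, in order.
stream = [
    ChangeEvent("sku-1", "Blue running shoe, in stock"),
    ChangeEvent("sku-2", "Red trail shoe, in stock"),
    ChangeEvent("sku-1", "Blue running shoe, sold out"),
    ChangeEvent("sku-2", None),
]

index = VectorIndex()
for event in stream:
    index.apply(event)

print(index.entries["sku-1"][0])
```

Because each event is applied as it arrives, the index reflects the "sold out" update immediately instead of waiting for the next batch run.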

2. Inference

Inference involves engineering prompts with data prepared in the previous steps and handling responses from the LLM. When a prompt from a user comes in, the application gathers relevant context from the vector database or an equivalent service to generate the best possible prompt.
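As a rough sketch of this prompt-engineering step, the function below combines a user question with retrieved context under a simple size budget. The name `build_prompt`, the instruction wording, and the character budget are all assumptions for illustration; the context chunks are assumed to arrive ranked best-first from the retrieval step.

```python
def build_prompt(question: str, context_chunks: list[str], max_chars: int = 500) -> str:
    """Assemble a grounded prompt, keeping retrieved context within a budget."""
    selected, used = [], 0
    for chunk in context_chunks:  # assumed to be ranked best-first
        if used + len(chunk) > max_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    context = "\n".join(f"- {c}" for c in selected)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "Is the blue running shoe available?",
    ["Blue running shoe, sold out", "Restock expected next week"],
)
print(prompt)
```

Real applications usually budget in tokens rather than characters, but the trade-off is the same: more context improves grounding until it crowds out the question or exceeds the model's limits.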

Applications like ChatGPT often take a few seconds to respond, which is an eternity in distributed systems. Using an event-driven approach means this communication can take place asynchronously between services and teams. With an event-driven architecture, services can be decomposed along functional specializations, which allows application development teams and data teams to work separately to achieve their objectives of performance and accuracy.

Further, by having decomposed, specialized services rather than monoliths, these applications can be deployed and scaled independently. This helps decrease time to market, since new inference steps can be added as consumer groups and the organization can template the infrastructure for instantiating them quickly.
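The asynchronous hand-off between services can be illustrated with a small sketch. It uses `asyncio.Queue` objects as stand-ins for request and response streams and a hypothetical `fake_llm` coroutine in place of a slow model call; none of these names come from a specific streaming platform.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    """Stand-in for a slow model call; a real one would be a network request."""
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"


async def inference_worker(requests: asyncio.Queue, responses: asyncio.Queue) -> None:
    """Consume prompts, call the model, and publish each result."""
    while True:
        request_id, prompt = await requests.get()
        responses.put_nowait((request_id, await fake_llm(prompt)))
        requests.task_done()


async def main(prompts: list[str], workers: int = 2) -> dict[int, str]:
    requests: asyncio.Queue = asyncio.Queue()
    responses: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.create_task(inference_worker(requests, responses))
        for _ in range(workers)
    ]

    # The producer publishes requests without waiting for any answer.
    for request_id, prompt in enumerate(prompts):
        requests.put_nowait((request_id, prompt))

    await requests.join()  # returns once every request has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)

    results: dict[int, str] = {}
    while not responses.empty():
        request_id, answer = responses.get_nowait()
        results[request_id] = answer
    return results


results = asyncio.run(main(["order status?", "return policy?", "store hours?"]))
print(results)
```

Changing `workers` scales the inference side without touching the producer, which loosely mirrors how independently deployed consumers scale against a shared stream.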

3. Workflows

Reasoning agents and inference steps are often linked into sequences where the next LLM call is based on the previous response. This is useful in automating complex tasks where a single LLM call will not be sufficient to complete a process. Another reason for decomposing agents into chains of calls is that popular LLMs today tend to return better results when asked multiple, simpler questions, although this is changing.
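A chain of calls in which each step consumes the previous response might look like the following sketch. The `run_chain` helper, the prompt templates, and the deterministic `stub_llm` are hypothetical; the stub simply wraps its input in angle brackets so the example runs offline and the data flow is visible.

```python
from typing import Callable

LLM = Callable[[str], str]


def run_chain(llm: LLM, steps: list[str], user_input: str) -> list[str]:
    """Run prompt templates in order, feeding each response into the next step."""
    outputs = []
    previous = user_input
    for template in steps:
        previous = llm(template.format(previous=previous))
        outputs.append(previous)
    return outputs


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a model call."""
    return f"<{prompt}>"


steps = [
    "Extract the product from: {previous}",
    "Look up stock for: {previous}",
    "Draft a reply based on: {previous}",
]
trace = run_chain(stub_llm, steps, "Do you have blue running shoes?")
for line in trace:
    print(line)
```

Keeping every intermediate output in `trace` makes it easier to inspect or validate each step, rather than only the final answer.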

As the example workflow below illustrates, with a data streaming platform, the web development team can work independently from the backend system engineers, allowing each team to scale according to its needs. The data streaming platform enables this decoupling of technologies, teams, and systems. 

[Figure: Retail example: Data streaming and RAG]

4. Post-Processing

Despite our best efforts, LLMs can still generate erroneous results, so we need a way to validate outputs and enforce business rules to prevent those errors from causing harm. 

Typically, LLM workflows and dependencies change much more quickly than the business rules that determine whether outputs are acceptable. In the example above, we again see good use of decoupling with a data streaming platform: The compliance team validating LLM outputs can operate independently to define the rules without needing to coordinate with the team building the LLM applications. 
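One way to picture this separation is a set of rules that are defined independently of the application and applied to every output. The sketch below is illustrative only: the two rules, the 20% discount limit, and the `validate` helper are invented for the example and are not taken from any particular compliance framework.

```python
import re
from typing import Callable, Optional

# A rule returns a violation message, or None when the output is acceptable.
Rule = Callable[[str], Optional[str]]


def no_email_addresses(text: str) -> Optional[str]:
    """Flag outputs that appear to leak an email address."""
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text):
        return "contains an email address"
    return None


def max_discount(limit_percent: int) -> Rule:
    """Build a rule that flags any percentage above limit_percent."""
    def rule(text: str) -> Optional[str]:
        for match in re.finditer(r"(\d+)\s*%", text):
            if int(match.group(1)) > limit_percent:
                return f"promises a discount above {limit_percent}%"
        return None
    return rule


def validate(output: str, rules: list[Rule]) -> list[str]:
    """Apply every rule and collect the violations, in rule order."""
    return [v for v in (rule(output) for rule in rules) if v is not None]


# Owned by the compliance team; the application only calls validate().
rules = [no_email_addresses, max_discount(20)]

ok = validate("You qualify for a 10% discount.", rules)
bad = validate("Email bob@example.com for 50% off.", rules)
print(ok, bad)
```

Because `rules` is just data passed to `validate`, the rule set can change on its own schedule while the LLM workflow that produces the outputs evolves separately.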

Conclusion

RAG is a powerful model for improving the accuracy of LLMs and making generative AI applications viable for enterprise use cases. But RAG is not a silver bullet. It needs to be surrounded by an architecture and data delivery mechanisms that allow teams to build multiple generative AI applications without reinventing the wheel, and in a manner that meets enterprise standards for data governance and quality. 

A data streaming model is the simplest and most efficient way to meet these needs, allowing teams to unlock the full power of LLMs to drive new value for their business. As technology becomes the business and AI enhances this technology, those firms that compete effectively will incorporate AI to augment and streamline more and more processes.  

By having a common operating model for RAG applications, the enterprise can bring the first use case to market quickly while also accelerating delivery and reducing costs for every use case that follows.


Published at DZone with permission of Andrew Sellers. See the original article here.

Opinions expressed by DZone contributors are their own.
