
Mixtral: Generative Sparse Mixture of Experts in DataFlows

Explore the use of a new type of GenAI LLM with streaming pipelines in this tutorial about how to build a real-time LLM flow with Mixtral AI's new open model.

By Tim Spann · Mar. 13, 24 · Tutorial

“The Mixtral-8x7B Large Language Model (LLM) is a pre-trained generative Sparse Mixture of Experts.”

When I saw this model come out, it seemed interesting and accessible, so I gave it a try. With the proper prompting, it performs well, though I am not sure if it's better than Google Gemma, Meta Llama 2, or Mistral (via Ollama) for my use cases.

Today I will show you how to utilize the new Mixtral LLM with Apache NiFi. This will require only a few steps to run Mixtral against your text inputs.


This model can be run via the lightweight serverless REST API or with the transformers library. You can also use this GitHub repository. The context window can hold up to 32k tokens, and prompts can be entered in English, Italian, German, Spanish, and French. You have many options for utilizing this model, but I will show you how to build a real-time LLM pipeline utilizing Apache NiFi.
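As a quick illustration of the serverless route, here is a minimal sketch of calling the Hugging Face Inference API for Mixtral from Python with only the standard library. The endpoint URL format and the `HF_TOKEN` environment variable are assumptions about your setup, and the payload uses a simplified single-instruction prompt rather than the full template we build later.

```python
import json
import os
import urllib.request

# Assumed serverless Inference API endpoint for the hosted Mixtral instruct model.
API_URL = ("https://api-inference.huggingface.co/models/"
           "mistralai/Mixtral-8x7B-Instruct-v0.1")


def build_payload(question: str, max_new_tokens: int = 512) -> dict:
    """Wrap a question in Mixtral's [INST] instruction format."""
    return {
        "inputs": f"<s>[INST]{question}[/INST]</s>",
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def ask_mixtral(question: str) -> str:
    """POST the prompt to the Inference API and return the generated text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + os.environ["HF_TOKEN"],
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        # Text-generation endpoints return a list: [{"generated_text": "..."}]
        return json.loads(resp.read())[0]["generated_text"]


if __name__ == "__main__":
    print(ask_mixtral("What does Apache NiFi do?"))
```

The same POST is exactly what NiFi's InvokeHTTP processor will issue for us later in the flow.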

One key thing to decide is what kind of input you are going to have (chat, code generation, Q&A, document analysis, summary, etc.). Once you have decided, you will need to do some prompt engineering and tweak your prompt. In the following section, I include a few guides to help you improve your prompt-building skills, and I will cover some basic prompt engineering in my walk-through tutorial.

Guides To Build Your Prompts Optimally

  • Mixtral: Prompt Engineering Guide
  • Getting Started with Mixtral 8X7B

Constructing the prompt correctly is critical to making this work well, which is why we are building it with NiFi.

Overview of the Flow


Step 1: Build and Format Your Prompt

In building our application, the following is the basic prompt template that we are going to use.

Prompt Template

{ 
"inputs": 
"<s>[INST]Write a detailed complete response that appropriately 
answers the request.[/INST]
[INST]Use this information to enhance your answer: 
${context:trim():replaceAll('"',''):replaceAll('\n', '')}[/INST] 
User: ${inputs:trim():replaceAll('"',''):replaceAll('\n', '')}</s>" 
}  

You will enter this prompt in a ReplaceText processor in the Replacement Value field.
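Outside NiFi, the same prompt assembly can be sketched in Python. The `trim()`/`replaceAll()` chain in the Expression Language above strips double quotes and newlines from the user input and context so the JSON body stays valid; the function names here are illustrative.

```python
import json


def clean(text: str) -> str:
    """Mirror the NiFi EL chain: trim(), strip double quotes, strip newlines."""
    return text.strip().replace('"', "").replace("\n", "")


def build_prompt(inputs: str, context: str) -> str:
    """Assemble the same Mixtral [INST] prompt the ReplaceText processor emits."""
    body = {
        "inputs": (
            "<s>[INST]Write a detailed complete response that appropriately "
            "answers the request.[/INST]"
            f"[INST]Use this information to enhance your answer: "
            f"{clean(context)}[/INST] "
            f"User: {clean(inputs)}</s>"
        )
    }
    return json.dumps(body)
```

Without the cleaning step, a stray quote or newline in the question would break the JSON payload sent to the model.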

Step 2: Build Our Call to the HuggingFace REST API To Run Against the Model

Add an InvokeHTTP processor to your flow, setting the HTTP URL to the Mixtral API URL.

Step 3: Query To Convert and Clean Your Results

We use the QueryRecord processor to clean and convert the HuggingFace results, grabbing the generated_text field.

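QueryRecord runs SQL over each record; the equivalent extraction in plain Python, assuming the usual Hugging Face response shape of a list containing one `generated_text` entry, looks like this:

```python
def extract_generated_text(hf_response: list) -> dict:
    """Keep only the generated_text field, as the QueryRecord step does."""
    # Hugging Face text-generation endpoints return [{"generated_text": "..."}].
    return {"generated_text": hf_response[0]["generated_text"]}
```

Everything else in the raw response is dropped before the record moves downstream.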

Step 4: Add Metadata Fields

We use the UpdateRecord processor to add metadata fields, with JSON readers and writers and the Literal Value Replacement Value Strategy. The fields we add are populated from flowfile attributes.


Overview of Send to Kafka and Slack:


Step 5: Add Metadata to Stream

We use the UpdateAttribute processor to set the Content Type to "application/json" and the model type to Mixtral.


Step 6: Publish This Cleaned Record to a Kafka Topic

We send it to our local Kafka broker (running in Docker or elsewhere) and to our flank-mixtral8x7B topic. If the topic doesn't exist, NiFi and Kafka will automagically create it for you.
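For reference, the same publish step outside NiFi can be sketched with the kafka-python client. The topic name comes from the flow above; the broker address and the serialization helper are assumptions.

```python
import json

TOPIC = "flank-mixtral8x7B"  # topic name used in the flow above


def to_kafka_value(record: dict) -> bytes:
    """Serialize one cleaned record as UTF-8 JSON, as NiFi's JSON writer does."""
    return json.dumps(record).encode("utf-8")


def publish(record: dict, bootstrap: str = "localhost:9092") -> None:
    """Send a single record to the Mixtral topic (requires kafka-python)."""
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap)
    producer.send(TOPIC, to_kafka_value(record))
    producer.flush()
```

The import is deferred into the function so the serialization helper can be used without a broker or the client installed.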


Step 7: Retry the Send

If something goes wrong, we will try to resend three times, then fail.
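The retry behavior above can be sketched as a plain loop: three attempts, then re-raise so the flowfile routes to failure. The helper name and backoff value are illustrative.

```python
import time


def send_with_retry(send, payload, attempts: int = 3, backoff_seconds: float = 1.0):
    """Try send(payload) up to `attempts` times, then re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except Exception:
            if attempt == attempts:
                raise  # three strikes: route to failure
            time.sleep(backoff_seconds)
```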


Overview of Pushing Data to Slack:


Step 8: Send the Same Data to Slack for User Reply

The first step is to split into a single record to send one at a time. We use the SplitRecord processor for this.


As before, reuse the JSON Tree Reader and JSON Record Set Writer, and choose "1" for Records Per Split.
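SplitRecord with Records Per Split set to 1 simply turns one multi-record flowfile into many single-record flowfiles. In Python terms, the chunking it performs is:

```python
def split_records(records: list, per_split: int = 1) -> list:
    """Yield lists of `per_split` records, mimicking NiFi's SplitRecord."""
    return [records[i:i + per_split] for i in range(0, len(records), per_split)]
```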

Step 9: Make the Generated Text Available for Messaging

We utilize EvaluateJsonPath to extract the Generated Text from Mixtral (on HuggingFace).


Step 10: Send the Reply to Slack

We use the PublishSlack processor, which is new in Apache NiFi 2.0. It requires your channel name or channel ID. We choose the Publish Strategy of "Use 'Message Text' Property". For Message Text, use the Slack Response Template below.


For the final reply to the user, we need a Slack response template formatted for how we wish to communicate. Below is an example that covers the basics.

Slack Response Template

===============================================================================================================
HuggingFace ${modelinformation} Results on ${date}:

Question: ${inputs}

Answer:
${generated_text}

=========================================== Data for nerds ====

HF URL: ${invokehttp.request.url}
TXID: ${invokehttp.tx.id}

== Slack Message Meta Data ==

ID: ${messageid} Name: ${messagerealname} [${messageusername}]
Time Zone: ${messageusertz}

== HF ${modelinformation}  Meta Data ==

Compute Characters/Time/Type: ${x-compute-characters} / ${x-compute-time}/${x-compute-type}

Generated/Prompt Tokens/Time per Token: ${x-generated-tokens} / ${x-prompt-tokens} : ${x-time-per-token}

Inference Time: ${x-inference-time}  // Queue Time: ${x-queue-time}

Request ID/SHA: ${x-request-id} / ${x-sha}

Validation/Total Time: ${x-validation-time} / ${x-total-time}
===============================================================================================================
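NiFi substitutes the `${...}` attributes at send time. The same rendering can be sketched with Python's `string.Template`, which uses the identical `${name}` syntax; this sketch covers only the simple fields, since hyphenated header names such as `x-compute-time` are not valid `Template` identifiers.

```python
from string import Template

# Reduced version of the Slack response template above.
SLACK_TEMPLATE = Template(
    "HuggingFace ${modelinformation} Results on ${date}:\n\n"
    "Question: ${inputs}\n\n"
    "Answer:\n${generated_text}\n"
)


def render_slack_message(attrs: dict) -> str:
    """Fill the ${...} placeholders the way NiFi's Expression Language does."""
    # safe_substitute leaves any missing placeholder untouched instead of raising.
    return SLACK_TEMPLATE.safe_substitute(attrs)
```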

When this is run, the replies will look like the images below in Slack.

Slack response to the question "What does Apache NiFi do?"

Slack response to the question "What does Apache Iceberg do?"

You have now sent a prompt to Hugging Face, had it run against Mixtral, sent the results to Kafka, and responded to the user via Slack.

We have now completed a full Mixtral application with zero code.

Conclusion

You have now built a full round trip utilizing Apache NiFi, HuggingFace, and Slack to build a chatbot utilizing the new Mixtral model.

Summary of Learnings

  1. Learned how to build a decent prompt for HuggingFace Mixtral
  2. Learned how to clean up streaming data
  3. Built a HuggingFace REST call that can be reused
  4. Processed HuggingFace model call results
  5. Sent your first Kafka message
  6. Formatted and built Slack calls
  7. Built a full DataFlow for GenAI

If you need additional tutorials on utilizing the new Apache NiFi 2.0, check out:

  • Apache NiFi 2.0.0-M2 Out!

For additional information on building Slack bots:

  • Building a Real-Time Slackbot With Generative AI
  • Building an LLM Bot for Meetups and Conference Interactivity

Also, thanks for following my tutorial. I am working on additional Apache NiFi 2 and Generative AI tutorials that will be coming to DZone.

Finally, if you are in Princeton, Philadelphia, or New York City please come out to my meetups for in-person hands-on work with these technologies.

Resources

  • Mixtral of Experts
  • Mixture of Experts Explained
  • mistralai/Mixtral-8x7B-v0.1
  • Mixtral Overview
  • Invoke the Mixtral 8x7B model on Amazon Bedrock for text generation
  • Running Mixtral 8x7b on M1 16GB
  • Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts by Mistral AI
  • Retro-Engineering a Database Schema: Mistral Models vs. GPT4, LLama2, and Bard (Episode 3)
  • Comparison of Models: Quality, Performance & Price Analysis
  • A Beginner’s Guide to Fine-Tuning Mixtral Instruct Model


Published at DZone with permission of Tim Spann, DZone MVB. See the original article here.
