DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Generic and Dynamic API: MuleSoft
  • JQueue: A Library to Implement the Outbox Pattern
  • How to Integrate a Distributed Database With Event Streaming
  • A Camunda Greenhouse: Part 3

Trending

  • Good Data, Bad Metric: A Mutation Testing Pattern for Analytics Engineering
  • Why Your Test Automation Is Always Behind the Code And the Architecture That Fixes It
  • Is the Data Warehouse Dead? 3 Patterns From Enterprise Architecture That Answer This Question
  • Implementing Observability in Distributed Systems Using OpenTelemetry
  1. DZone
  2. Data Engineering
  3. Data
  4. Event Streaming With MuleSoft

Event Streaming With MuleSoft

This article offers an integration pattern solution to transmitting order data as a stream at scale i.e., thousands of events per second.

By 
Srini Rangaswamy user avatar
Srini Rangaswamy
·
Jun. 24, 20 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
11.0K Views

Join the DZone community and get the full member experience.

Join For Free

Event Streaming — A Brief Overview

Events are everywhere -- order placed, shares traded, health condition reported, temperature threshold reached, and so on. 

Event streaming is “continuous” transmission along with real-time processing of events as it is being generated for making time-critical business decisions.

Sample use cases include click-stream data analysis to identify user behavior and optimizing the website in real-time, log/reporting analytics, machine learning-based alerts/actions.

At a high-level there are three stages in event processing: 

  • Generation (source/producer)
  • Transmission (i.e., data movement)
  • Processing (destination/consumer)

In many modern/IoT applications the event generator will potentially have the ability to transmit it as well. But in the enterprise setting events typically accumulate in a data store like database and from there it has to be transmitted to the destination like a data lake or warehouse for processing.

Sample Use Case

In a retail scenario, the order data (event) is generated by a Distributed Order Management (DOM) and then stored in an Order table. Now you need a solution to stream (continuously transfer at scale) the order data to the destination for analytics processing and take time-sensitive actions like forecasting inventory levels or predicting the next best offer in real-time. 

This article offers a solution to solve the above use case of transmitting order data as a stream at scale i.e., thousands of events per second. 

Below is the flow diagram and where MuleSoft capability fits in.

data streaming with mulesoft

Note: MuleSoft natively supports the processing of input streams with different strategies that are different than transmitting data streams as explained in this article.

Solution Design

The solution provides an integration pattern design that will primarily focus on the continual transmission of events at scale from the event database to the event consumer. The design consists of the following three steps and here is a link to the Github project.

  1. Data ingestion
  2. Data transformation
  3. Data delivery

Data Ingestion

The first and the critical step is data ingestion and in this case, it is to fetch order records from the database as soon as it is inserted. This is achieved in two parallel steps:

  • Continually publish time window ({t0, t1}, {t1, t2}..) covering the time scale
  • Fetch records using the time window using multiple threads and/or workers for scale

A scheduler is used in step an above to publish the time window at the desired frequency e.g. every 100 ms. Note there will be scheduler only runs in the primary node in a multi-worker scenario.

A VM queue is used to publish and consume time windows. The database query is a simple one-line Select statement filtered by time window {t0, t1}, {t1, t2}, and so on (it is required that each event record has a timestamp column). See the design below.

Data Transformation 

Transformation can be as simple or complex as needed to meet the needs of the consumer which can be accomplished using Mule DataWeave script. In this case, the selected records are transformed into a JSON array.

Data Delivery 

This step sends the transformed JSON array to the event destination for further processing. An out-of-the-box Mule connector could be used to post the payload to public cloud service or an on-premise data storage solution.

API-Led Approach

Below is the API-led design where Event Streaming is a headless API and Streaming Manager provides controls to start/stop streaming and rewind streaming start time. To simulate the event producer an Event System API is designed to continually load events into the Order database.

inventory database


Performance and Scalability

The solution design is performance tested in the following setting: Order table hosted in Oracle database in AWS, event streaming API running in MuleSoft CloudHub with 4 workers each 0.2 vCore, and the event destination in Google Cloud. At the lower end, the solution streamed 7,000 - 10,000 events/sec using less than 10% CPU leaving plenty of room for further scaling.

To mitigate any potential performance impact on the database server due to continuous concurrent select queries it is recommended to index the SQL where clause column (timestamp in this case) and to tune cache size.

Summary

In today's always-on business environment responding to market events and customer behavior in real-time is critical and event streaming/processing enables that. This article offers a solution to stream data at scale using the MuleSoft platform and provides an API-led approach creating a reusable integration design pattern.

Event Database MuleSoft Data (computing)

Opinions expressed by DZone contributors are their own.

Related

  • Generic and Dynamic API: MuleSoft
  • JQueue: A Library to Implement the Outbox Pattern
  • How to Integrate a Distributed Database With Event Streaming
  • A Camunda Greenhouse: Part 3

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook