Real-Time Streaming Pattern: Analyzing Trends
Using the Twitter API, we learn how to perform analysis of trends using real-time streaming patterns with an open source big data platform.
Join the DZone community and get the full member experience.Join For Free
This week, we continue to look at data processing patterns used to build event triggered stream processing applications, the use cases that the patterns relate to, and how you would go about implementing them within Wallaroo.
This purpose of these posts is to help you understand the data processing use cases that Wallaroo is best designed to handle and how you can go about building them right away.
I will be looking at the Wallaroo application builder, the part of your application that hooks into the Wallaroo framework, and some of the business logic of the pattern.
You should also check out all the posts in the "Wallaroo in Action" category.
Pattern: Analyzing Trends
What makes stream processing different from alternatives like batch processing is that we continuously run our application logic over data as it comes in. Rather than running that logic in periodic intervals.
Similar to triggering alerts based on an event, sometimes you want to perform more detailed analysis on events in your application. We can all benefit from more monitoring or testing of user interface improvements.
Some specific examples could include:
- A/B testing.
- Analyzing rage clicking in your application.
- Determining infrastructure load based on user location.
- Sentiment analysis of reviews for your product.
- Updating the "Most Popular" filter for an e-commerce website.
In this post, we continue to use Twitter's tweet API like in previous Wallaroo posts; Real-time Streaming Pattern: Preprocessing for Sentiment Analysis and Identifying Trending Twitter Hashtags in Real-time with Wallaroo.
Like in the more detailed "Identifying Trending Twitter Hashtags in Real-time with Wallaroo" post, imagine we are creating an application showing the trending hashtags on Twitter.
The application itself requires only two computations. One to extract the hashtags in a tweet and a stateful computation to keep the count of hashtags.
Wallaroo Application Builder
ab.new_pipeline( "Analyzing Trends", wallaroo.TCPSourceConfig(in_host, in_port, in_decoder) ) ab.to_parallel(extract_hashtags) ab.to_stateful(compute_hashtags, HashtagState, "hashtags state") ab.to_sink(wallaroo.TCPSinkConfig(out_host, out_port, encoder)) return ab.build()
ab.new_pipeline( "Analyzing Trends", wallaroo.TCPSourceConfig(in_host, in_port, in_decoder) )
We define a new pipeline named
"Analyzing Trends" and the source of the data, in this case it's coming from a socket over TCP.
Our first computation is on that in parallel calls
extract_hashtags. Like other examples in the past,
extract_hashtags isn't modifying state and by calling
to_parallel we're able to send the computation to all the workers available.
ab.to_stateful(compute_hashtags, HashtagState, "hashtags state")
Takes a commutation, a state class, and a name.
HashtagState describes how we would define our state. More information on
State in our documentation.
ab.to_sink(wallaroo.TCPSinkConfig(out_host, out_port, encoder)) return ab.build()
Lastly we tell Wallaroo the host and port to send the results. This could be another application listening on that host and port or a Kafka topic. In the case of our Twitter trending hashtags example, this data eventually was rendered on a webpage so the user could see the top 10 hashtags in real-time.
There are many different use-cases for stream processing and I hope this provided a good overview on how you could go about integrating Wallaroo into your infrastructure.
If you're interested in running the example application from the Identifying Trending Twitter Hashtags in Real-time with Wallaroo example you can find the repository here.
Wallaroo's lightweight API gives you the ability to construct your data processing pipeline and run whatever application logic you need to power your application.
Give Wallaroo a Try
We hope that this post has piqued your interest in Wallaroo!
If you are just getting started, we recommend you try our Docker image, which allows you to get Wallaroo up and running in only a few minutes.
Wallaroo provides a robust platform that enables developers to implement business logic within a streaming data pipeline quickly.
Published at DZone with permission of Erik Nilsen, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.