Reactive Big Data on OpenShift In-Memory Data Grids

Follow a demo about how Infinispan-based in-memory data grids can help you deal with the problems of real-time big data and how to do big data analytics.

By Galder Zamarreno · May. 08, 17 · Opinion

Thanks a lot to everyone who attended the Infinispan sessions I gave at Great Indian Developer Summit! Your questions after the talks were really insightful.

One of the talks I gave was titled Big Data in Action With Infinispan (slides are available here), where I was looking at how Infinispan-based in-memory data grids can help you deal with the problems of real-time Big Data and how to do Big Data analytics.

During the talk, I live-coded a demo showing both the real-time and analytics parts, running on top of OpenShift and using Vert.x to join the different parts. The demo repository contains background information and instructions for getting started, but I thought it'd be useful to give focused step-by-step instructions in this blog post.

Set Up

Before we start with any of the demos, it's necessary to run some setup steps:

  1. Check out git repo: git clone https://github.com/galderz/swiss-transport-datagrid.

  2. Install OpenShift Origin or Minishift to get an OpenShift environment running on your own machine. I decided to use OpenShift Origin, so the instructions below are tailored for that environment, but similar instructions could be used with Minishift.

  3. Install Anaconda for Python 3 (this is required to run Jupyter notebook for plotting).

Demo Domain

Once the setup is complete, it's time to talk about the demos before we run them.

Both demos below work with the same application domain: Swiss Rail Transport Systems. In this domain, we differentiate between physical stations, trains, station boards that are located in stations, and finally stops, which are individual entries in station boards.
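To make the domain concrete, here is a minimal sketch of the four concepts in plain Python. The class and field names (`delay_min`, `departure`, and so on) are assumptions chosen for illustration; the demo repository defines its own Java classes, which may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Station:
    """A physical station. Field names here are hypothetical."""
    id: int
    name: str


@dataclass(frozen=True)
class Train:
    """A train service heading to a final destination."""
    id: str
    name: str
    destination: str


@dataclass(frozen=True)
class Stop:
    """One entry on a station board: a train calling at a station."""
    train: Train
    departure: str      # scheduled departure time, e.g. "08:15"
    delay_min: int = 0  # minutes of delay; 0 means on time

    @property
    def is_delayed(self) -> bool:
        return self.delay_min > 0


@dataclass
class StationBoard:
    """The board located in one station, holding its stops."""
    station: Station
    stops: List[Stop] = field(default_factory=list)

    def delayed_stops(self) -> List[Stop]:
        # Only the entries that are currently running late.
        return [stop for stop in self.stops if stop.is_delayed]
```

A board is then just a station plus its list of stops, and `delayed_stops()` gives the entries a delay dashboard would care about.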

Real-Time Demo

The first demo works with real-time data from station boards around the country and presents a centralized dashboard of delayed trains. The components interact with each other as follows:

Infinispan, which provides the in-memory data grid storage, and Vert.x, which provides the glue for the centralized delayed dashboard to interact with Infinispan, all run within OpenShift cloud.

Within the cloud, the Injector verticle cycles through station board data and injects it into Infinispan. Also within the cloud, another Vert.x verticle uses Infinispan's Continuous Query to listen for delayed station board entries and pushes them onto the Vert.x event bus. From there, a SockJS bridge delivers them over WebSockets to the dashboard. The centralized dashboard is written with JavaFX and runs outside the cloud.
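The idea behind a continuous query can be sketched in a few lines of plain Python: a store that notifies a listener whenever an entry starts or stops matching a predicate. This is a toy model only. `ContinuousQueryCache`, its methods, the `event_bus` list, and the keys are all hypothetical names made up for this sketch; it is not Infinispan's or Vert.x's API.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Predicate = Callable[[Any], bool]


class ContinuousQueryCache:
    """Toy key/value store that notifies listeners as entries start or
    stop matching a query. Illustrative only; not Infinispan's API."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._queries: List[Tuple[Predicate, Set[str], Callable, Optional[Callable]]] = []

    def add_continuous_query(self, predicate, on_join, on_leave=None) -> None:
        # Each query tracks the set of keys that currently match it.
        self._queries.append((predicate, set(), on_join, on_leave))

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        for predicate, matched, on_join, on_leave in self._queries:
            if predicate(value):
                if key not in matched:   # entry newly matches the query
                    matched.add(key)
                    on_join(key, value)
            elif key in matched:         # entry no longer matches
                matched.remove(key)
                if on_leave is not None:
                    on_leave(key)


# Stand-in for the event bus: the dashboard would consume these messages.
event_bus: List[str] = []

cache = ContinuousQueryCache()
cache.add_continuous_query(
    predicate=lambda stop: stop["delay_min"] > 0,
    on_join=lambda key, stop: event_bus.append(f"delayed:{key}:{stop['delay_min']}"),
    on_leave=lambda key: event_bus.append(f"cleared:{key}"),
)

# Injector role: write station board entries into the store.
cache.put("basel/IC-1", {"delay_min": 0})   # on time, no event
cache.put("bern/IR-15", {"delay_min": 4})   # becomes delayed
cache.put("bern/IR-15", {"delay_min": 0})   # delay cleared
```

The point of the design is that the dashboard never polls: the store pushes only the changes that matter, and everything else stays inside the grid.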

To run the demo, do the following:

  1. Start OpenShift Origin if you've not already done so: oc cluster up --public-hostname=127.0.0.1.

  2. Deploy all the OpenShift cloud components:

    cd ~/swiss-transport-datagrid
    ./deploy-all.sh

  3. Open the OpenShift console and verify that all pods are up.

  4. Load the GitHub repository into your favorite IDE and run the delays.query.continuous.fx.FxApp JavaFX application. This will load the centralized dashboard. Within seconds, delayed trains will start appearing.

Analytics Demo

The second demo focuses on how you can use Infinispan for offline analytics. In particular, this demo tries to answer the following question:

What is the time of the day when there is the biggest ratio of delayed trains?

Once again, this demo runs on top of OpenShift cloud, using Infinispan as the in-memory data grid for storage and Vert.x for gluing components together.

To answer this question, the Infinispan data grid will be loaded with three weeks' worth of data from station boards using a Vert.x verticle. Once the data is loaded, the Jupyter notebook will invoke a RESTful HTTP endpoint, which will invoke a Vert.x verticle called AnalyticsVerticle.

This verticle will invoke a remote server task which will use Infinispan Distributed Java Streams to calculate the two pieces of information required to answer the question: per hour, how many trains are going through the system, and out of those, how many are delayed.

An important aspect to bear in mind about this server task is that it will only be executed in one of the nodes in the cluster. It does not matter which one. In turn, this node will ship the lambdas required to do the computation to each of the nodes so that they can be executed against their local data. The other nodes will reply with the results, and the node where the server task was invoked will aggregate them.
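The scatter-and-aggregate pattern described above can be sketched in plain Python. Each "node" here is just a list of local `(hour, delay_minutes)` pairs, and the function names `count_local`, `aggregate`, and `run_task` are hypothetical; the real demo uses Infinispan Distributed Java Streams, which this sketch does not attempt to reproduce.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Stop = Tuple[int, int]             # (hour of day 0-23, delay in minutes)
Partial = Tuple[Counter, Counter]  # (trains per hour, delayed trains per hour)


def count_local(stops: Iterable[Stop]) -> Partial:
    """Work that runs on one node, against that node's local data only."""
    total: Counter = Counter()
    delayed: Counter = Counter()
    for hour, delay in stops:
        total[hour] += 1
        if delay > 0:
            delayed[hour] += 1
    return total, delayed


def aggregate(partials: Iterable[Partial]) -> Partial:
    """Work that runs on the invoking node: merge every node's reply."""
    total: Counter = Counter()
    delayed: Counter = Counter()
    for node_total, node_delayed in partials:
        total.update(node_total)      # Counter.update adds counts together
        delayed.update(node_delayed)
    return total, delayed


def run_task(nodes: List[List[Stop]]) -> Partial:
    """Ship the same computation to every node, then combine the results."""
    return aggregate(count_local(local_stops) for local_stops in nodes)
```

Only the small per-hour counters travel between nodes; the station board entries themselves never leave the node that owns them.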

The results will be sent back to the original invoker, the Jupyter notebook, which will plot them.

Here is the demo step-by-step guide:

  1. Start OpenShift Origin and deploy all components as shown in the previous demo.

  2. Start the Jupyter notebook:

    cd ~/swiss-transport-datagrid/analytics/analytics-jupyter
    ~/anaconda/bin/jupyter notebook

  3. Once the notebook opens, click to open the live-demo.ipynb notebook and execute each of the cells in order. You should end up with a plot of the results.

So, the answer to the question is 2 AM! That's because the last connecting trains of the day wait for each other to avoid leaving passengers stranded.
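For reference, the last step of such an analysis, turning per-hour counts into a ratio and picking the peak hour, might look roughly like the sketch below. The function names are hypothetical and the notebook in the repository may do this differently.

```python
from typing import Dict


def delayed_ratio_per_hour(total: Dict[int, int], delayed: Dict[int, int]) -> Dict[int, float]:
    """Ratio of delayed trains to all trains, for each hour that saw traffic."""
    return {
        hour: delayed.get(hour, 0) / count
        for hour, count in total.items()
        if count > 0  # skip empty hours to avoid dividing by zero
    }


def peak_hour(ratios: Dict[int, float]) -> int:
    """Hour of the day with the highest ratio of delayed trains."""
    return max(ratios, key=ratios.get)
```

Note that the question is about the ratio, not the absolute number of delays: an hour with very few trains can top the ranking if most of them run late.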

Conclusion

This has been a summary of the demos that I presented at Great Indian Developer Summit, with the intention of getting you running these demos as quickly as possible. The repository contains more detailed information on these demos. If anything is unclear or any of the instructions above is not working, please let us know!

Once again, a very special thanks to Alexandre Masselot for being the inspiration for these demos. Merci, Alex! Over the next few months, we will be enhancing the demo and hopefully, we'll be able to do some more live demonstrations at other conferences.


Published at DZone with permission of Galder Zamarreno, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
