
A Fun Example of Streaming Data Into Minecraft

This project shows how to use three data engineering tools to visualize data in a video game. It aims to solve a common data engineering problem with a twist to make it fun.

Angel Alvarado is a senior software engineer at One Degree, a San Francisco-based non-profit, and also helps run the Molanco data engineering community. In his spare time, Angel enjoys playing Minecraft with his 11-year-old cousin. Recently, Angel found a fun way to combine his gaming with data engineering. This blog entry, reposted from the original with Angel's kind permission, picks up the story...

Data engineering can get complex quickly, and keeping track of the hundreds of tools and data platforms in the industry can be overwhelming. The following project is about how to use three data engineering tools to visualize data in a video game. It aims to solve a common data engineering problem with a twist to make it fun and entertaining.

I got involved with Minecraft thanks to my 11-year-old cousin who lives in Mexico City. He actually taught me how to play Minecraft and inspired me to combine it with data engineering.

After playing for a while, I realized that I actually enjoyed it and that it was a great way for my little cousin and me to connect even though 2,000 miles separated us.

Eventually, I decided to combine data engineering with this video game, and for this, I used the following data engineering tools:

  • Apache Kafka: Here at the Molanco Data Engineering community, this is our preferred tool when it comes to processing events in real time. If you are looking into building publisher/subscriber distributed systems, this is a great piece of software to start with. Lately, some of our members have been moving away from Kinesis and instead using Kafka for their data architectures.

  • StreamSets Data Collector: If you are a fan of ETL and love developing customized ETL processes, I'd encourage you to look at StreamSets. We've been using this tool for 1.5 years now. It's open-source, and seeing how quickly it has matured gives us hope that it's here to stay. SDC provides dozens of connectors out of the box: connectors to Hadoop, Hive, Elasticsearch, SQL databases, Jython processors, and much more.

  • Docker: This may not be news to anybody, but we believe that microservices and containers are where the industry is heading. If you are in the DevOps and/or data engineering worlds and you are still using VMs, it's about time to explore Docker; it's going to be worth it. Docker was used for this project to allow anyone to replicate it with just one command: $ docker compose up.

The plan for this project was to create a map of the world where we could see the location of users visiting a website in real time. Luckily, StreamSets helped us to get ahead of the game and we were able to do this really quickly using StreamSets Data Collector!
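To make the idea of a real-time world map a little more concrete, here is a minimal sketch of one step such a pipeline needs: turning a visitor's latitude and longitude into a block position on a flat world map. This is not the code from the project. The map size, the JSON event shape (lat and lon fields), and the function names are all assumptions made for illustration, and the projection is a plain equirectangular one.

```python
import json

# Hypothetical map size in blocks; the original post does not give dimensions.
MAP_WIDTH = 360   # blocks along the x axis (west to east)
MAP_DEPTH = 180   # blocks along the z axis (north to south)


def to_block(lat, lon, width=MAP_WIDTH, depth=MAP_DEPTH):
    """Map latitude/longitude to (x, z) on a flat equirectangular world map."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    # Longitude -180..180 maps to columns 0..width-1, west to east.
    x = min(int((lon + 180.0) / 360.0 * width), width - 1)
    # Latitude 90..-90 maps to rows 0..depth-1, north to south.
    z = min(int((90.0 - lat) / 180.0 * depth), depth - 1)
    return x, z


def event_to_block(raw_event):
    """Decode one JSON visit event (assumed shape) into a block position."""
    event = json.loads(raw_event)
    return to_block(float(event["lat"]), float(event["lon"]))
```

In a setup like the one described here, a function along these lines could run on each event read from Kafka, with the resulting (x, z) pair deciding where a block is placed on the in-game map.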

This project was presented at Strata Data Conference 2018 and at Data Day México 2018.

You can also find the slides here.

References

The easiest way to replicate this project is by using the code in GitHub. Feel free to reach out with any questions.


Published at DZone with permission of
