Streaming ETL With Apache Flink
Streaming data computation is becoming more and more common with the growing Big Data landscape. Many enterprises are also adopting or moving towards streaming for message passing instead of relying solely on REST APIs.
Apache Flink has emerged as a popular framework for streaming data computation in a very short amount of time. Compared to Apache Spark, it has several advantages: it is lightweight and developer-friendly, offers rich APIs and high throughput, and has an active and vibrant community.
When I started working on a new project where I had to process streaming data (e.g. events, server logs), after initial research, I found Flink to be the most suitable framework for my particular use case.
This series is based on my work with Flink, progressing from a simple example to a subset of real-world use cases. I also share a few exceptions I ran into and ways to solve them, which should help beginners.
Below are links to the articles in this series, with a short summary of each. (All code examples are available on GitHub.)
Part 1 - A getting-started guide. I share an example of computing the sum of integers generated as a stream, using a custom SourceFunction and a TumblingWindow (fixed size, fixed time, non-overlapping).
Part 2 - Improving upon Part 1, in this article I share an example of keyed data stream computation. This one uses Flink's reduce and sum methods to achieve the same result.
Part 3 - Changing gears, I take a subset of a real-world use case of Flink (see this post from zalando.com). I share an example of how to process connectivity events to identify a simple pattern.
Part 4 - Improving upon the example from part 3, I share how to achieve the same result using Flink's CEP.
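To make the vocabulary of Parts 1 and 2 concrete before you open the articles, here is a minimal plain-Java sketch of what a keyed sum over tumbling windows computes. It deliberately does not use Flink's API, and it is not the code from the series: the class TumblingWindowSketch, the Event type, and the method names are all hypothetical, invented for illustration. It assumes each event carries a non-negative timestamp in milliseconds and shows only the "fixed size, non-overlapping" property: every event lands in exactly one window per key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from Flink.
public class TumblingWindowSketch {

    /** Hypothetical event: a key, an integer value, and a timestamp in milliseconds. */
    public static final class Event {
        public final String key;
        public final int value;
        public final long timestampMs;

        public Event(String key, int value, long timestampMs) {
            this.key = key;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    /** Start of the window that contains the timestamp (assumes timestampMs >= 0). */
    public static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    /** Sums values per key and per window; result keys have the form "key@windowStart". */
    public static Map<String, Integer> sumPerKeyAndWindow(List<Event> events, long windowSizeMs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Event e : events) {
            String bucket = e.key + "@" + windowStart(e.timestampMs, windowSizeMs);
            // merge() plays the role of a reduce step: combine the old sum with the new value.
            sums.merge(bucket, e.value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("a", 1, 0L),
                new Event("a", 2, 4000L),
                new Event("b", 5, 4999L),
                new Event("a", 3, 5000L),   // first event of the second 5-second window
                new Event("b", 7, 9999L));

        // prints {a@0=3, a@5000=3, b@0=5, b@5000=7}
        System.out.println(sumPerKeyAndWindow(events, 5000L));
    }
}
```

A real streaming job differs in an important way: the input never ends, so the framework has to decide when a window is complete and emit its result, whereas this sketch simply processes a finished list.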
Upcoming Articles on Flink
1. Moving toward more real-world deployment use cases, I will share how to set up and use Flink in cluster mode. The setup will also include a database and Grafana to complete the tutorial end to end.
2. AWS Kinesis Stream with the same example as above.