
Data Streaming Using Apache Flink and Apache Ignite

How to build a simple data streaming application using Apache Flink and Apache Ignite and create a stream processing topology.

By Saikat Maitra · Sep. 29, 18 · Tutorial

Stream Processing Topology

Repo: https://github.com/samaitra/streamers

Apache IgniteSink offers a streaming connector for injecting Flink data into the Ignite cache: the sink emits its input data into an Ignite cache. The key feature to note is the performance and scale that both Apache Flink and Apache Ignite offer. Apache Flink can process unbounded and bounded data sets and has been designed to run stateful streaming applications at scale: application computation is distributed and executed concurrently across the cluster, and Flink is optimized for local state access within tasks, checkpointing that local state for durability. Apache Ignite, in turn, provides streaming capabilities that allow high-scale data ingestion into its in-memory data grid.

In this article, we will discuss how to build a data streaming application using Apache Flink and Apache Ignite. A data streaming application ingests large finite and infinite volumes of data into the Ignite cluster in an optimized, fault-tolerant way; the ingestion rate can scale to millions of events per second.
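Before walking through the setup, here is a minimal sketch of what such a topology can look like in Java; the actual job lives in the streamers repository linked above. It reads lines from the Kafka topic created below, computes word counts over short time windows, prints them (this is what shows up in the .out file later), and pipes the counts into IgniteSink. The class name, the example-ignite.xml config file, the window size, and the consumer group ID are illustrative assumptions, and the IgniteSink lifecycle (constructor taking a cache name and Ignite config file, setAllowOverwrite, setAutoFlushFrequency, start()) follows the Ignite flink module's documented usage at the time; the topic mytopic and the cache testCache match the steps that follow.

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.util.Collector;
import org.apache.ignite.sink.flink.IgniteSink;

public class WordCountStreamer {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "streamers");   // assumed group ID

        // IgniteSink takes the target cache name and an Ignite Spring config
        // file on the classpath ("example-ignite.xml" is an assumption; it
        // must define a cache named "testCache").
        IgniteSink<Map<String, Integer>> igniteSink =
            new IgniteSink<>("testCache", "example-ignite.xml");
        igniteSink.setAllowOverwrite(true);
        igniteSink.setAutoFlushFrequency(10);
        igniteSink.start();

        DataStream<Tuple2<String, Integer>> counts = env
            .addSource(new FlinkKafkaConsumer010<>(
                "mytopic", new SimpleStringSchema(), props))
            // Split each line into (word, 1) pairs.
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override public void flatMap(String line,
                        Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.toLowerCase().split("\\W+"))
                        if (!word.isEmpty())
                            out.collect(new Tuple2<>(word, 1));
                }
            })
            .keyBy(0)                          // key by word
            .timeWindow(Time.seconds(5))       // assumed window size
            .sum(1);                           // count per window

        // Written to log/flink-*-taskexecutor-*.out, e.g. "lorem : 1".
        counts.map(new MapFunction<Tuple2<String, Integer>, String>() {
            @Override public String map(Tuple2<String, Integer> t) {
                return t.f0 + " : " + t.f1;
            }
        }).print();

        // IgniteSink consumes Map entries, so wrap each count in a
        // single-entry map keyed by the word.
        counts.map(new MapFunction<Tuple2<String, Integer>, Map<String, Integer>>() {
            @Override public Map<String, Integer> map(Tuple2<String, Integer> t) {
                Map<String, Integer> entry = new HashMap<>();
                entry.put(t.f0, t.f1);
                return entry;
            }
        }).addSink(igniteSink);

        env.execute("streamers");
    }
}

IgniteSink wraps an IgniteDataStreamer internally, which is where the high ingestion rate comes from; setAllowOverwrite controls whether existing cache entries are overwritten, and setAutoFlushFrequency controls how often buffered data is flushed to the cluster.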

Setup: Download and Start Flink

Download a binary from the Flink downloads page (https://flink.apache.org/downloads.html). You can pick any Hadoop/Scala combination you like. If you plan to just use the local file system, any Hadoop version will work fine. Go to the download directory.

Unpack the Downloaded Archive

$ cd ~/Downloads        # Go to download directory
$ tar xzf flink-*.tgz   # Unpack the downloaded archive
$ cd flink-1.5.0

Start a Local Flink Cluster

$ ./bin/start-cluster.sh  # Start Flink

Check the Dispatcher’s web front-end at http://localhost:8081 and make sure everything is up and running. The web front-end should report a single available TaskManager instance.

[Screenshot: Flink Dispatcher web front-end, Overview page]

You can also verify that the system is running by checking the log files in the log directory:

$ tail log/flink-*-standalonesession-*.log

Download Kafka

Download a binary from the downloads page (https://kafka.apache.org/downloads). You can pick Apache Kafka version 0.10.2.2 with Scala 2.11.

Start a Zookeeper Server

$ ./bin/zookeeper-server-start.sh ./config/zookeeper.properties

Start Broker

$ ./bin/kafka-server-start.sh ./config/server.properties

Create Topic “mytopic”

$ ./bin/kafka-topics.sh --create --topic mytopic --zookeeper localhost:2181 --partitions 1 --replication-factor 1

Describe the Topic "mytopic"

$ ./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic mytopic

Produce Something Into the Topic (Write Something and Hit Enter)

$ ./bin/kafka-console-producer.sh --topic mytopic --broker-list localhost:9092

Consume From the Topic Using the Console Consumer

$ ./bin/kafka-console-consumer.sh --topic mytopic --zookeeper localhost:2181

Clone Apache Ignite

As of this writing, IgniteSink support for data streaming applications in a Flink cluster is available in the master branch.

$ git clone https://github.com/apache/ignite

Build Apache Ignite

$ mvn clean install -DskipTests

Build the Flink Program

$ git clone https://github.com/samaitra/streamers
$ cd streamers
$ mvn clean package

Submit the Flink Program

$ ./bin/flink run streamers-1.0-SNAPSHOT.jar

Produce Something in the Topic (Write Something and Hit Enter)

$ ./bin/kafka-console-producer.sh --topic mytopic --broker-list localhost:9092

The .out file will print the counts at the end of each time window as long as words are flowing in. For example:

$ tail -f log/flink-*-taskexecutor-*.out
lorem : 1
bye : 1
ipsum : 4

Ignite REST Service

To check the cache key values, you can use the Ignite REST service:

$ curl -X GET http://localhost:8080/ignite\?cmd\=getall\&k1\=jam\&cacheName\=testCache
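Equivalently, a key can be read back with the Ignite Java API. Below is a minimal sketch, assuming the same (hypothetical) example-ignite.xml config file used in the topology sketch earlier, plus the testCache name and jam key from the curl example:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class CacheCheck {
    public static void main(String[] args) {
        // Connect as a client node; "example-ignite.xml" is the same
        // (assumed) config file handed to IgniteSink above.
        Ignition.setClientMode(true);
        try (Ignite ignite = Ignition.start("example-ignite.xml")) {
            IgniteCache<String, Integer> cache = ignite.cache("testCache");
            // "jam" is the key queried in the curl example above.
            System.out.println("jam -> " + cache.get("jam"));
        }
    }
}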

Scan Cache

To scan all the keys in an Ignite cache, the following REST call can be used (the URL must be quoted so the shell does not interpret the ampersands):

$ curl -X GET "http://localhost:8080/ignite?cmd=qryscanexe&pageSize=10&cacheName=testCache"
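The same scan can be done from Java with a ScanQuery. A short sketch, under the same assumptions as the previous snippet:

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

public class CacheScan {
    public static void main(String[] args) {
        Ignition.setClientMode(true);
        try (Ignite ignite = Ignition.start("example-ignite.xml")) {
            IgniteCache<String, Integer> cache = ignite.cache("testCache");
            // Iterate over every entry in the cache, across all nodes.
            try (QueryCursor<Cache.Entry<String, Integer>> cursor =
                     cache.query(new ScanQuery<String, Integer>())) {
                for (Cache.Entry<String, Integer> e : cursor)
                    System.out.println(e.getKey() + " : " + e.getValue());
            }
        }
    }
}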

Ignite Web Console

Ignite Web Console Build Instructions

  1. Install MongoDB (version >=3.2.0 <=3.4.15) using the instructions from http://docs.mongodb.org/manual/installation.
  2. Install Node.js (version >=8.0.0) using the installer from https://nodejs.org/en/download/current for your OS.
  3. Change directory to 'modules/web-console/backend' and run npm install --no-optional to download the backend dependencies.
  4. Change directory to 'modules/web-console/frontend' and run npm install --no-optional to download the front-end dependencies.
  5. Build the ignite-web-agent module following the instructions in 'modules/web-console/web-agent/README.txt'.
  6. Copy ignite-web-agent-<version>.zip from 'modules/web-console/web-agent/target' to the 'modules/web-console/backend/agent_dists' folder.
  7. Unzip ignite-web-agent-<version>.zip in 'modules/web-console/backend/agent_dists'.
  8. Run './ignite-web-agent.sh' inside the ignite-web-agent-<version> folder.

Steps 1-4 should be executed once.

Run Ignite Web Console in Development Mode

  1. Configure MongoDB to run as a service, or, in a terminal, change directory to $MONGO_INSTALL_DIR/server/3.2/bin and start MongoDB by executing mongod.
  2. In a new terminal, change directory to 'modules/web-console/backend'. If dependencies changed, run npm install --no-optional; then run npm start to start the backend.
  3. In a new terminal, change directory to 'modules/web-console/frontend'. If dependencies changed, run npm install --no-optional; then start Webpack in development mode with npm run dev.
  4. In the browser, open http://localhost:9000.

The web console can be used to scan the cache and view all of its contents.

To stop Flink when you’re done, type:

$ ./bin/stop-cluster.sh

Summary

We covered how to build a simple data streaming application using Apache Flink and Apache Ignite, creating a stream processing topology that streams data in a distributed, scalable, and fault-tolerant way and can process unbounded data sets consisting of millions of events.


Published at DZone with permission of Saikat Maitra.
