How a Stream Works

In this post, we take a look at Node.js streams and Java Streams, along with tools like Apache Kafka and Amazon Kinesis, with a brief overview of each.

by Raphael Amoedo · Jun. 27, 18

A stream is a sequence of elements. An array is a data structure that stores a sequence of values. So is a stream just an array? Not really; let's look at what a stream actually is and how it works.

First of all, streams don't store elements; an array does. So, no, a stream is not an array. Also, while collections and arrays have a finite size, streams don't have to. But if a stream doesn't store elements, how can it be a sequence of elements?

A stream is actually a sequence of data being moved from one point to another, computed on demand. So a stream has at least one source, such as an array, a list, or an I/O resource. Take a file as an example: when a file is opened for editing, all or part of it stays in memory to allow changes, so only once it is closed is there a guarantee that no data will be lost or damaged.

Fortunately, a stream can read/write data chunk by chunk, without buffering the whole file at once. Just so you know, a buffer is a region of physical memory (usually RAM) used to temporarily hold data while it is being moved from one place to another.
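
As an illustration of chunked reading, here is a minimal Java sketch (the file name and chunk size are hypothetical) that reads a file a few kilobytes at a time instead of loading it whole:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChunkedRead {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8192]; // only 8 KB is held in memory at a time
        try (InputStream in = Files.newInputStream(Paths.get("large-file.log"))) { // hypothetical file
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                // process the chunk here, e.g. write it to another stream
                System.out.println("read " + bytesRead + " bytes");
            }
        }
    }
}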

Node.js has four stream types that are worth mentioning:

  • Writable - streams to which data can be written (for example, writing to a file, sending an HTTP request/response).
  • Readable - streams from which data can be read (for example, reading from a file, receiving an HTTP request/response).
  • Duplex - streams that are both Readable and Writable (for example, a TCP socket).
  • Transform - Duplex streams that can modify or transform the data as it is written and read (for example, zlib compression).

Functions that operate on a stream and produce another stream are known as filters and can be connected in pipelines. Here is an example using a Java Stream:

Arrays.asList(10,3,13,4,1,52)
  .stream()
      .filter(number -> number % 2 == 0) //10,4,52
      .sorted() //4,10,52
      .skip(1) //10,52
      .forEach(System.out::println); //prints 10 and prints 52

When it comes to Java Streams, what is interesting is that they provide lazy evaluation. The Javadoc says:

Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.

So, if I do this:

List<Integer> numbers = Arrays.asList(10,3,13,4,1,52);
Stream<Integer> numberStream = numbers.stream()
      .filter(number -> number % 2 == 0) //10,4,52
      .sorted() //4,10,52
      .skip(1) //10,52
      .peek(System.out::println); //performs an action on each element as it passes through the pipeline

The stream is not executed yet, because it is smart enough to wait for a terminal operation to be called, like forEach, reduce, anyMatch, and so on. In addition to having a declarative style, it is also smart enough to stop processing as soon as the terminal operation is satisfied. For example:

Integer integer = Arrays.asList(10,3,13,4,1,52)
    .stream()
      .filter(number -> number % 2 == 0)
      .sorted()
      .skip(1)
      .peek(System.out::println) //it prints only 10 instead of 10 and 52
      .findFirst().get();

Because sorted() needs every element before it can produce any output, filter runs on the whole stream above; but once sorting is done, skip and peek only process as many elements as findFirst needs. Let's see this example:

Integer integer = Arrays.asList(10,3,13,4,1,52)
    .stream()
      .filter(number -> number % 2 == 0)
      .findFirst().get();

(Image: Java streams)

Some might think that filter would run on every element and only then would findFirst pick the result, but as I said: the Java stream is smart enough not to.
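
To make that visible, here is a small sketch (the trace message is my own) that puts peek before filter to show which elements are actually visited; only the first element, 10, is processed before findFirst short-circuits:

Integer first = Arrays.asList(10,3,13,4,1,52)
    .stream()
      .peek(number -> System.out.println("visited: " + number)) //prints only "visited: 10"
      .filter(number -> number % 2 == 0)
      .findFirst().get();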

Another interesting thing about Java Streams is parallel streams:

Arrays.asList(10,3,13,4,1,52,2,6,8)
    .parallelStream()
      .filter(number -> number % 2 == 0)
      .forEach(number -> System.out.println(Thread.currentThread())); //prints which thread is being executed

When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
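
As a small sketch of that combining step (using the same numbers as above), a parallel reduction such as a sum splits the work across substreams and merges the partial results:

int sum = Arrays.asList(10,3,13,4,1,52,2,6,8)
    .parallelStream()
      .filter(number -> number % 2 == 0)
      .mapToInt(Integer::intValue)
      .sum(); //partial sums from the substreams are combined into 82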

(Image: Java parallel streams)

Now that you know the concepts of how streams work, let's look at some tools.

Apache Kafka

Kafka is a distributed streaming platform that has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
  • Store streams of records in a fault-tolerant, durable way.
  • Process streams of records as they occur.

Its purpose is to enable real-time processing of streams, and it supports many data sources through Kafka Connect (such as JDBC, ActiveMQ, REST APIs, and others). Some use cases are messaging, website activity tracking, metrics, log aggregation, stream processing, event sourcing, and commit logs.
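
As a minimal sketch of the publish side using the Kafka Java client (the broker address and the "page-views" topic are assumptions, not something from this article):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PageViewProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // publish one website-activity record to a hypothetical "page-views" topic
            producer.send(new ProducerRecord<>("page-views", "user-17", "/pricing"));
        }
    }
}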

Here is the anatomy of an application that uses the Kafka Streams API. It provides a logical view of a Kafka Streams application that contains multiple stream threads, each of which contains multiple stream tasks.

(Image: anatomy of a Kafka Streams application)
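
To give a feel for what such an application looks like in code, here is a minimal Kafka Streams sketch (the application id, broker address, and topic names are assumptions) that reads from one topic, transforms each value, and writes to another:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStreamApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");     // assumed application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic"); // hypothetical topic names
        source.mapValues(value -> value.toUpperCase())
              .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start(); // the runtime creates the stream threads and tasks described above
    }
}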

Amazon Kinesis

Amazon Kinesis is the fully managed Amazon Web Services (AWS) offering for collecting, processing, and analyzing video and data streams in real time. Amazon lists four capabilities:

  • Kinesis Video Streams - Capture, process, and store video streams.

  • Kinesis Data Streams - Capture, process, and store data streams.

  • Kinesis Data Firehose - Load data streams into AWS data stores.

  • Kinesis Data Analytics - Analyze data streams with standard SQL. 

The purpose is also to enable real-time processing of streams; some use cases are building video analytics applications, moving from batch to real-time analytics, building real-time applications, and analyzing IoT device data.
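
As a hedged sketch of writing a record into a Kinesis data stream with the AWS SDK for Java v2 (the stream name, partition key, and payload are assumptions):

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class KinesisPutExample {
    public static void main(String[] args) {
        // assumes AWS credentials and region are configured in the environment
        try (KinesisClient kinesis = KinesisClient.create()) {
            PutRecordRequest request = PutRecordRequest.builder()
                    .streamName("my-data-stream") // hypothetical stream name
                    .partitionKey("sensor-42")    // determines which shard receives the record
                    .data(SdkBytes.fromUtf8String("{\"temperature\": 21.5}"))
                    .build();
            kinesis.putRecord(request);
        }
    }
}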

Here is how Kinesis Data Streams usually works:

(Image: how Amazon Kinesis Data Streams works)

Summarizing

That's how streams work. We looked a little at Node.js streams and Java Streams, as well as tools like Apache Kafka and Amazon Kinesis, with a brief overview of each. That's it; I hope you liked it.
