What Is Data Streaming?

Data streaming is an extremely important process in the world of big data. Read on to learn a little more about how it helps in real-time analyses and data ingestion.

by Garrett Alley · Nov. 05, 18 · Analysis

Data Streaming Defined

Visualize a river. Where does the river begin? Where does the river end? Intrinsic to our understanding of a river is the idea of flow: from where we stand, the river has no beginning and no end. Data streaming is ideally suited to data like this, with no discrete beginning or end. For example, data from a traffic light is continuous and has no "start" or "finish."

Data streaming is the process of sending data records continuously rather than in batches. It is generally useful for data sources that send small records (often kilobytes in size) in a continuous flow as the data is generated. This includes a wide variety of sources, such as telemetry from connected devices, log files generated by customers using your web applications, e-commerce transactions, and information from social networks or geospatial services.

Traditionally, data is moved in batches. Batch processing handles large volumes of data at once, with long delays between runs; for example, a job might run every 24 hours. While this can be an efficient way to handle large volumes of data, it doesn't work for data that is meant to be streamed, because that data can be stale by the time it is processed.
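The difference between the two styles can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: the function names and the sample readings are made up for the example. The batch function needs the whole data set before it can answer, while the streaming function yields an updated answer after every record.

```python
from typing import Iterable, Iterator


def batch_average(records: list[float]) -> float:
    """Batch style: wait for the full data set, then compute once."""
    return sum(records) / len(records)


def streaming_average(records: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated average after every record."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_average(readings))            # one result, after all data has arrived
print(list(streaming_average(readings)))  # one result per record, as it arrives
```

The streaming version keeps only a count and a running total, so it works just as well on a source that never ends.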

Data streaming is well suited to time series data and to detecting patterns over time, such as tracking the length of a web session. Most IoT data is also a good fit. Traffic sensors, health sensors, transaction logs, and activity logs are all good candidates for data streaming.
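As a rough sketch of the web session example, the tracker below measures session length from a stream of (user, timestamp) events. The inactivity timeout, the class names, and the event shape are all assumptions made for this illustration; real systems define sessions in their own ways.

```python
from dataclasses import dataclass

# Assumption: a session ends after 30 minutes without activity.
SESSION_TIMEOUT = 30 * 60  # seconds


@dataclass
class Session:
    start: float
    last_seen: float

    @property
    def length(self) -> float:
        return self.last_seen - self.start


class SessionTracker:
    """Tracks session length per user, one event at a time."""

    def __init__(self, timeout: float = SESSION_TIMEOUT) -> None:
        self.timeout = timeout
        self.open: dict[str, Session] = {}
        self.closed: list[tuple[str, float]] = []  # (user_id, session length)

    def on_event(self, user_id: str, ts: float) -> None:
        session = self.open.get(user_id)
        # Too long since the last event: close the old session.
        if session is not None and ts - session.last_seen > self.timeout:
            self.closed.append((user_id, session.length))
            session = None
        if session is None:
            self.open[user_id] = Session(start=ts, last_seen=ts)
        else:
            session.last_seen = ts


tracker = SessionTracker(timeout=60)
for user, ts in [("a", 0), ("a", 30), ("b", 40), ("a", 200)]:
    tracker.on_event(user, ts)
print(tracker.closed)
```

Each event updates a small amount of per-user state, so session lengths are available while users are still active rather than after a nightly job.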

This streamed data is often used for real-time aggregation and correlation, filtering, or sampling. Data streaming allows you to analyze data in real time and gives you insights into a wide range of activities, such as metering, server activity, geolocation of devices, or website clicks.
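Filtering and real-time aggregation can be sketched with plain Python generators. The event fields (`ts`, `type`, `page`) and the function names are hypothetical, and the sketch assumes events arrive in timestamp order, which real streams do not always guarantee.

```python
from collections import Counter
from typing import Iterable, Iterator


def only_clicks(events: Iterable[dict]) -> Iterator[dict]:
    """Filtering: pass through click events and drop everything else."""
    for event in events:
        if event["type"] == "click":
            yield event


def tumbling_counts(
    events: Iterable[dict], window_seconds: int
) -> Iterator[tuple[int, dict]]:
    """Aggregation: count events per page in fixed, non-overlapping windows.

    Assumes events arrive in timestamp order.
    """
    current_window = None
    counts: Counter = Counter()
    for event in events:
        window = int(event["ts"] // window_seconds)
        if current_window is not None and window != current_window:
            # The window has closed: emit its start time and counts.
            yield current_window * window_seconds, dict(counts)
            counts = Counter()
        current_window = window
        counts[event["page"]] += 1
    if current_window is not None:
        yield current_window * window_seconds, dict(counts)


# Hypothetical clickstream records, used only for illustration.
events = [
    {"ts": 1, "type": "click", "page": "/home"},
    {"ts": 2, "type": "view", "page": "/home"},
    {"ts": 5, "type": "click", "page": "/cart"},
    {"ts": 12, "type": "click", "page": "/home"},
]

for window_start, counts in tumbling_counts(only_clicks(events), 10):
    print(window_start, counts)
```

Because each stage is a generator, records flow through the filter and the aggregation one at a time, and a window's result is emitted as soon as the first record of the next window arrives.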

Consider the following scenarios:

  • A financial institution tracks market changes and adjusts customer portfolios based on configured constraints (such as selling when a certain stock value is reached).
  • A power grid monitors throughput and generates alerts when certain thresholds are reached.
  • A news source streams clickstream records from its various platforms and enriches the data with demographic information so that it can serve articles that are relevant to the audience demographic.
  • An e-commerce site streams clickstream records to find anomalous behavior in the data stream and generates a security alert if the clickstream shows abnormal behavior.
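The power grid scenario above reduces to a threshold check applied to each reading as it arrives. The sketch below is one simple way to express it; the function name, the reading format, and the limit are assumptions for illustration. It alerts only when a reading first crosses the limit, so a sustained overload produces one alert rather than one per reading.

```python
from typing import Iterable, Iterator


def threshold_alerts(
    readings: Iterable[tuple[int, float]], limit: float
) -> Iterator[tuple[int, float]]:
    """Yield (timestamp, value) each time the value rises above the limit."""
    above = False
    for ts, value in readings:
        if value > limit and not above:
            yield ts, value
        above = value > limit


# Hypothetical (timestamp, load) readings, used only for illustration.
readings = [(0, 90.0), (1, 105.0), (2, 110.0), (3, 95.0), (4, 101.0)]
for ts, value in threshold_alerts(readings, limit=100.0):
    print(f"ALERT at t={ts}: load {value} exceeds limit")
```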

Data Streaming Challenges

Data streaming is a powerful tool, but a few challenges are common when working with streaming data sources. Plan for the following:

  • Plan for scalability.
  • Plan for data durability.
  • Incorporate fault tolerance in both the storage and processing layers.
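One common way to approach fault tolerance in the processing layer is checkpointing: the processor periodically records how far it has read, so that after a crash it resumes from the last checkpoint instead of starting over. The sketch below is a simplified illustration under several assumptions: the "stream" is an in-memory list standing in for a durable log, the checkpoint is a local JSON file, and the `stop_after` parameter exists only to simulate a crash.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"offset": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write
    never leaves a half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def process(log: list, path: str, stop_after=None) -> dict:
    """Sum the records in the log, checkpointing after each one.

    stop_after simulates a crash after that many records (illustration only).
    """
    state = load_checkpoint(path)
    handled = 0
    for offset in range(state["offset"], len(log)):
        if stop_after is not None and handled == stop_after:
            break
        # The offset and the running result are saved together, so a
        # restart neither skips a record nor counts one twice.
        state = {"offset": offset + 1, "total": state["total"] + log[offset]}
        save_checkpoint(path, state)
        handled += 1
    return state


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    log = [1, 2, 3, 4, 5]
    print(process(log, checkpoint, stop_after=2))  # "crashes" after two records
    print(process(log, checkpoint))                # resumes from the checkpoint
```

Checkpointing after every record keeps the example short; real systems usually checkpoint less often and rely on a storage layer that is itself replicated.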

Data Streaming Tools

With the growth of streaming data has come a number of solutions geared toward working with it. The following list shows a few popular tools:

  • Amazon Kinesis. Amazon Kinesis is a managed, scalable, cloud-based service that allows real-time processing of large data streams.
  • Apache Kafka. Apache Kafka is a distributed publish-subscribe messaging system which integrates applications and data streams.
  • Apache Flink. Apache Flink is a streaming data flow engine which provides facilities for distributed computation over data streams.
  • Apache Storm. Apache Storm is a distributed real-time computation system. Storm is used for distributed machine learning, real-time analytics, and numerous other cases, especially with high data velocity.

Published at DZone with permission of Garrett Alley, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
