Apache Kafka Is NOT Real Real-Time Data Streaming!

Learn how Apache Kafka enables low latency real-time use cases in milliseconds, but not in microseconds; learn from stock exchange use cases at NASDAQ.

By Kai Wähner · Mar. 17, 23 · Presentation

Real-time data beats slow data. It is that easy! But what is real-time? The term always needs to be defined when discussing a use case. Apache Kafka is the de facto standard for real-time data streaming. Kafka is good enough for almost all real-time scenarios. But dedicated proprietary software is required for niche use cases. Kafka is NOT the right choice if you need microsecond latency! This article explores the architecture of NASDAQ that combines critical stock exchange trading with low-latency streaming analytics.

Kafka Real Time?

What Is Real-Time Data Streaming?

Apache Kafka is the de facto standard for data streaming. However, every business has a different understanding of real-time data. And Kafka cannot solve every real-time problem.

Hard real-time requires a deterministic system with guaranteed, bounded latency and no spikes. It is a requirement for embedded systems, typically written in programming languages like C, C++, or Rust, that run safety-critical software such as flight control systems or collaborative robots (cobots). Apache Kafka is not the right technology for safety-critical latency requirements.

Soft real-time is data processing in a non-deterministic network with potential latency spikes. Data is processed in near real-time. That can be microseconds, milliseconds, seconds, or slower.

Real-Time Categories for Data Streaming With Apache Kafka

I typically see three kinds of real-time use cases. But even here, Apache Kafka does not fit every category:

  1. Critical real-time: A limited set of use cases that require data processing in microseconds.
    • Famous use case: Trading markets in financial services. This is NOT Kafka.
  2. Low-latency real-time: Fast data processing is required in tens or hundreds of milliseconds to enable specific use cases.
    • Examples: Sensor analytics in manufacturing, end-to-end data correlation in ride-hailing between mobile apps and backends, and fraud detection in instant payments. This is Kafka.
  3. Near real-time: Fast data processing improves the business process but is not mandatory.
    • For instance: Data ingestion (streaming ETL) into a data warehouse is better in seconds than a batch process that runs every night. This is Kafka.
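The three categories can be sketched as a small classification helper. This is only an illustration: the names `RealTimeCategory`, `classify`, and `kafka_fits` are hypothetical, and the threshold values are my own assumptions chosen to match the rough orders of magnitude above (microseconds, tens or hundreds of milliseconds, seconds), not hard limits.

```python
from enum import Enum


class RealTimeCategory(Enum):
    CRITICAL = "critical real-time"
    LOW_LATENCY = "low-latency real-time"
    NEAR_REAL_TIME = "near real-time"


# Illustrative thresholds (assumptions, not taken from any specification).
CRITICAL_MAX_SECONDS = 0.001    # below one millisecond, i.e. the microsecond range
LOW_LATENCY_MAX_SECONDS = 1.0   # tens or hundreds of milliseconds


def classify(latency_budget_seconds: float) -> RealTimeCategory:
    """Map an end-to-end latency budget to one of the three categories."""
    if latency_budget_seconds <= 0:
        raise ValueError("latency budget must be positive")
    if latency_budget_seconds < CRITICAL_MAX_SECONDS:
        return RealTimeCategory.CRITICAL
    if latency_budget_seconds < LOW_LATENCY_MAX_SECONDS:
        return RealTimeCategory.LOW_LATENCY
    return RealTimeCategory.NEAR_REAL_TIME


def kafka_fits(category: RealTimeCategory) -> bool:
    """Kafka fits every category except critical (microsecond) real-time."""
    return category is not RealTimeCategory.CRITICAL
```

For example, a 15-microsecond budget (`classify(15e-6)`) lands in the critical category, while a 50-millisecond fraud check or a 5-second ETL step lands in a category where Kafka is a good fit.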

Note: This article focuses on Apache Kafka as it is the de facto standard for data streaming. However, the same is true for many complementary or competing technologies, like Spark Streaming, Apache Flink, Apache Pulsar, or Redpanda.

Let’s look at a concrete example from the financial services industry.

NASDAQ Market Feeds vs. Real-Time Data

Ruchir Vani, the Director of Software Engineering at Nasdaq, presented "Improving the Reliability of Market Data Subscription Feeds" at "Current 2022: The Next Generation of Kafka Summit" in Austin, Texas.

The Nasdaq Stock Market (National Association of Securities Dealers Automated Quotations Stock Market) is an American stock exchange based in New York City. It is ranked second on the list of stock exchanges by market capitalization of shares traded, behind the New York Stock Exchange. The exchange platform is owned by Nasdaq, Inc. While most people only know the stock exchange, it is just the tip of the iceberg.

Nasdaq Exchange/Technology

Source: Nasdaq, Inc.

Nasdaq Cloud Data Service Powered by Apache Kafka

The Nasdaq Cloud Data Service has democratized access to financial data for companies, researchers, and educators. These downstream consumers have different requirements and SLAs.

The core engine for processing the market data feeds requires sub-15 microsecond latency. This is NOT Kafka but dedicated (expensive) proprietary software. Consumers need to be co-located and use optimized applications to leverage data at that speed.

Speed

Source: Nasdaq, Inc.

Nasdaq, Inc. wanted to capture more market share by providing additional services on top of the critical market feed. The result is Nasdaq Data Link Streaming, a service built on top of that feed and powered by Kafka:

Data Link Streaming

Source: Nasdaq, Inc.

The following architecture shows the combination. Critical real-time workloads run in the Nasdaq Data Centers. The data feeds are replicated to the public cloud for further processing and analytics. Market traders need to co-locate with the critical real-time engine. Other internal and external subscribers (like research and education consumers) consume from the cloud with low latency, in near real-time, or even in batches from the historical data store:
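The tiering in this architecture can be sketched as a simple routing rule. This is not Nasdaq's implementation: the `Subscriber` type, the `delivery_path` function, the path labels, and the threshold values are all hypothetical and only mirror the three access paths described above.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions for this sketch only).
MICROSECOND_RANGE_SECONDS = 0.001   # budgets below this need the critical engine
STREAMING_MAX_SECONDS = 60.0        # budgets above this can be served in batches


@dataclass(frozen=True)
class Subscriber:
    name: str
    max_latency_seconds: float
    co_located: bool = False


def delivery_path(subscriber: Subscriber) -> str:
    """Pick the access path that matches a subscriber's latency budget."""
    if subscriber.max_latency_seconds < MICROSECOND_RANGE_SECONDS:
        # Microsecond latency is only achievable next to the exchange engine.
        if not subscriber.co_located:
            raise ValueError("microsecond consumers must be co-located")
        return "proprietary feed in the exchange data center"
    if subscriber.max_latency_seconds <= STREAMING_MAX_SECONDS:
        # Low-latency and near real-time consumers read from the cloud.
        return "Kafka-based streaming in the public cloud"
    return "batch from the historical data store"
```

A market trader with a microsecond budget is routed to the co-located feed, while a research or education consumer with a budget of seconds or hours is routed to the cloud stream or the historical store.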

Architecture

Source: Nasdaq, Inc.

Real Real-Time Is Proprietary (Expensive) Technology and Rarely Needed

Real-time data beats slow data. Apache Kafka and similar technologies, like Apache Flink, Spark Streaming, Apache Pulsar, Redpanda, Amazon Kinesis, Google Cloud Pub/Sub, RabbitMQ, and so on, enable low-latency real-time messaging or data streaming.

Apache Kafka became the de facto standard for data streaming because Kafka is good enough for almost all use cases. Most use cases do not even care if end-to-end processing takes 10ms, 100ms, or 500ms (as downstream applications are not built for that speed anyway). Niche scenarios require dedicated technology. Kafka is NOT the right choice if you need microsecond latency! The NASDAQ example showed how critical proprietary technology and low-latency data streaming work very well together.
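Whether 10ms, 100ms, or 500ms is good enough depends on the latency budget of the use case and on how often spikes occur. A minimal sketch of such a check, assuming you already collect end-to-end latency samples in milliseconds; the function names `percentile` and `meets_budget` are hypothetical, and the nearest-rank percentile is just one common convention:

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples (0 < pct <= 100)."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_budget(latencies_ms: Sequence[float], budget_ms: float, pct: float = 99.0) -> bool:
    """True if the chosen percentile of observed latency stays within the budget."""
    return percentile(latencies_ms, pct) <= budget_ms
```

Checking a high percentile rather than the average matters in a soft real-time system, because occasional latency spikes are expected and an average hides them.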

If you want to see more use cases, read my article about low-latency data streaming with Apache Kafka and cloud-native 5G infrastructure.

What kind of real-time do you need in your projects? When do you need critical real-time? If you “just” need low latency, what use case is Kafka not good enough for?


Published at DZone with permission of Kai Wähner, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
