DZone

The Latest Big Data Topics

Hadoop on AmpereOne Reference Architecture
Hadoop on AmpereOne M shows improved throughput, scaling, and efficiency, with setup, tuning, and benchmark insights for optimizing big data workloads.
April 3, 2026
by RamaKrishna Nishtala
· 5,348 Views
End-to-End Streaming Optimization: Kafka to Delta With Exactly-Once Guarantees
Kafka feeds the stream, Spark tracks progress via checkpoints, and Delta's transaction log ensures every event lands exactly once, even across failures and restarts.
April 1, 2026
by Seshendranath Balla Venkata
· 2,455 Views · 2 Likes
Delta Change Data Feed Deep Dive: Building Incremental Pipelines Without Complexity
Delta CDF in Databricks enables pipelines to process only changed rows with commit metadata, simplifying incremental ETL without full scans.
April 1, 2026
by Seshendranath Balla Venkata
· 2,752 Views · 1 Like
Queues Don't Absorb Load — They Delay Bankruptcy
Queues hide overload. Without back-pressure, limits, and scaling, lag just grows until failure. Bound queues, alert on lag, fail fast, and plan capacity.
March 30, 2026
by David Iyanu Jonathan
· 1,577 Views · 2 Likes
Scaling Kafka Consumers: Proxy vs. Client Library for High-Throughput Architectures
Scaling Apache Kafka consumption requires new patterns; proxy layers and client libraries offer practical solutions for high-throughput workloads.
March 30, 2026
by Kai Wähner (DZone Core)
· 1,289 Views · 3 Likes
Stop Leap-Second AI Drift in IoT Streams With PySpark
Leap seconds can corrupt timestamps and trigger AI drift in fintech IoT systems. Learn about drift types and how PySpark streaming fixes them in real time.
March 27, 2026
by Ram Ghadiyaram (DZone Core)
· 2,047 Views · 1 Like
From Stream to Strategy: How TOON Enhances Real-Time Kafka Processing for AI
The TOON data format specifically targets the propagation of structured, validated, and semantically consistent data, thereby reducing ambiguity in real time.
March 27, 2026
by Gautam Goswami (DZone Core)
· 1,881 Views
The Phantom Write Problem: Why Your Idempotency Implementation Is Silently Losing Data
A practical explanation of why idempotent APIs still produce phantom writes in production, and a race-free, transactional pattern to prevent them.
March 24, 2026
by Saumya Tyagi
· 2,831 Views · 2 Likes
From DLT to Lakeflow Declarative Pipelines: A Practical Migration Playbook
Migrating from DLT to Lakeflow is mostly an API refactor, swapping DLT for pipelines, separating streaming and materialized tables, and updating CDC logic.
March 19, 2026
by Seshendranath Balla Venkata
· 3,848 Views · 1 Like
How Piezoelectric Energy Harvesting Is Solving the Battery Waste Crisis in Industrial IoT
Industrial piezoelectric sensors decouple IIoT reliability from battery dependence that compromises data resolution and responsiveness.
March 18, 2026
by Emily Newton
· 3,553 Views
Online Feature Store for AI and Machine Learning with Apache Kafka and Flink
A real-time feature store powered by Apache Kafka and Flink enables fast, scalable AI personalization with fresh data and low-latency processing.
March 16, 2026
by Kai Wähner (DZone Core)
· 2,802 Views · 1 Like
Why Reporting Is the Hardest Problem in Enterprise SaaS (And How We Solved It in Workday)
Workday solves real-time reporting with unified data, in-memory architecture, and embedded analytics for fast, secure insights.
March 13, 2026
by Suresh Kurapati
· 3,001 Views
How We Rebuilt a Legacy HBase + Elasticsearch System Using Apache Iceberg, Spark, Trino, and Doris
We replaced HBase + Elasticsearch with an Iceberg lakehouse, cutting cost and complexity while supporting analytics and near-real-time access.
March 10, 2026
by Mikhail Povolotskii
· 3,852 Views · 1 Like
Square, SumUp, Shopify: Data Streaming for Real-Time Point-of-Sale (POS)
POS systems are transforming into real-time, AI-driven platforms, fueled by mobile payments, Kafka, and Flink to empower every retail merchant.
March 9, 2026
by Kai Wähner (DZone Core)
· 3,251 Views
Databricks Lakeflow Spark Declarative Pipelines Migration From Non‑Unity Catalog to Unity Catalog
Migrating DLT to Unity Catalog mainly involves updating table references, permissions, and removing path-based access while keeping pipeline logic largely unchanged.
March 4, 2026
by Seshendranath Balla Venkata
· 1,048 Views · 1 Like
5 Surprising Truths About Scaling Apache Spark
Strategies for optimizing Apache Spark performance by addressing core bottlenecks like data shuffling, join inefficiencies, and excessive data scanning.
March 3, 2026
by Anurag Malik
· 935 Views
Unified Intelligence: Mastering the Azure Databricks and Azure Machine Learning Integration
Bridge the gap between Big Data and production ML. Learn to integrate Azure Databricks with Azure Machine Learning for a seamless, scalable end-to-end MLOps workflow.
February 27, 2026
by Jubin Abhishek Soni (DZone Core)
· 1,210 Views
The Hidden Cost of Custom Logic: A Performance Showdown in Apache Spark
A deep dive into PySpark UDF performance, showing why standard Python UDFs slow pipelines and when to use Pandas UDFs or native Spark functions instead.
February 26, 2026
by Abhilash Rao Mesala
· 1,571 Views
AWS SageMaker HyperPod: Distributed Training for Foundation Models at Scale
Master distributed training at scale with AWS SageMaker HyperPod's resilient cluster management and high-performance interconnects.
February 19, 2026
by Jubin Abhishek Soni (DZone Core)
· 1,392 Views
Green AI in Practice: How I Track GPU Hours, Energy, CO₂, and Cost for Every ML Experiment
Practice Green AI by tracking GPU-hours, energy, and cost for every ML run, so you pick models that are not just accurate, but also cheaper, leaner, and greener.
February 13, 2026
by Sai Teja Erukude
· 2,172 Views · 2 Likes