DZone

The Latest Big Data Topics

Online Feature Store for AI and Machine Learning with Apache Kafka and Flink
Real-time feature stores powered by Apache Kafka and Flink enable fast, scalable AI personalization with fresh data and low-latency processing.
March 16, 2026
by Kai Wähner, DZone Core
· 2,853 Views · 1 Like
Why Reporting Is the Hardest Problem in Enterprise SaaS (And How We Solved It in Workday)
Workday solves real-time reporting with unified data, in-memory architecture and embedded analytics for fast, secure insights.
March 13, 2026
by Suresh Kurapati
· 3,059 Views
How We Rebuilt a Legacy HBase + Elasticsearch System Using Apache Iceberg, Spark, Trino, and Doris
We replaced HBase + Elasticsearch with an Iceberg lakehouse, cutting cost and complexity while supporting analytics and near-real-time access.
March 10, 2026
by Mikhail Povolotskii
· 3,927 Views · 1 Like
Square, SumUp, Shopify: Data Streaming for Real-Time Point-of-Sale (POS)
POS systems are transforming into real-time, AI-driven platforms, fueled by mobile payments, Kafka, and Flink to empower every retail merchant.
March 9, 2026
by Kai Wähner, DZone Core
· 3,296 Views
Databricks Lakeflow Spark Declarative Pipelines Migration From Non‑Unity Catalog to Unity Catalog
Migrating DLT to Unity Catalog mainly involves updating table references, permissions, and removing path-based access while keeping pipeline logic largely unchanged.
March 4, 2026
by Seshendranath Balla Venkata
· 1,131 Views · 1 Like
5 Surprising Truths About Scaling Apache Spark
Strategies for optimizing Apache Spark performance by addressing core bottlenecks like data shuffling, join inefficiencies, and excessive data scanning.
March 3, 2026
by Anurag Malik
· 987 Views
Unified Intelligence: Mastering the Azure Databricks and Azure Machine Learning Integration
Bridge the gap between Big Data and production ML. Learn to integrate Azure Databricks with Azure Machine Learning for a seamless, scalable end-to-end MLOps workflow.
February 27, 2026
by Jubin Abhishek Soni, DZone Core
· 1,278 Views
The Hidden Cost of Custom Logic: A Performance Showdown in Apache Spark
A deep dive into PySpark UDF performance, showing why standard Python UDFs slow pipelines and when to use Pandas UDFs or native Spark functions instead.
February 26, 2026
by Abhilash Rao Mesala
· 1,658 Views
AWS SageMaker HyperPod: Distributed Training for Foundation Models at Scale
Master distributed training at scale with AWS SageMaker HyperPod's resilient cluster management and high-performance interconnects.
February 19, 2026
by Jubin Abhishek Soni, DZone Core
· 1,449 Views
Green AI in Practice: How I Track GPU Hours, Energy, CO₂, and Cost for Every ML Experiment
Practice Green AI by tracking GPU-hours, energy, and cost for every ML run, so you pick models that are not just accurate, but also cheaper, leaner, and greener.
February 13, 2026
by Sai Teja Erukude
· 2,548 Views · 2 Likes
A Pattern for Intelligent Ticket Routing in ITSM
Manual ticket routing is a hidden tax on IT efficiency. Here is an architectural pattern for using Logistic Regression and Skype status APIs to automate this.
February 10, 2026
by Dippu Kumar Singh
· 1,135 Views
Model Context Protocol Vs Agent2Agent: Practical Integration with Enterprise Data
MCP is production-ready for LLM-to-tool integration; A2A enables emerging multi-agent collaboration. They complement, not compete, and neither replaces Spark or Airflow.
February 9, 2026
by Ram Ghadiyaram, DZone Core
· 1,450 Views · 1 Like
How Global Payment Processors like Stripe and PayPal Use Apache Kafka and Flink to Scale
How top payment processor companies like Stripe, PayPal, Payoneer, and Worldline use data streaming for real-time payments and fraud detection.
February 3, 2026
by Kai Wähner, DZone Core
· 1,542 Views · 3 Likes
Building an OCR Data Pipeline: From Unstructured Images to Structured Data
How to treat OCR text as just another data source — build a repeatable ingestion, transformation, and validation workflow for unstructured data.
January 28, 2026
by Punitha Ponnuraj
· 2,904 Views · 1 Like
Efficient Sampling Approach for Large Datasets
In this article, we will learn about the central limit theorem and how it helps with random sampling in big-data-related problems.
January 22, 2026
by Rajesh Vakkalagadda
· 1,152 Views
MERGE and Liquid Clustering: Common Performance Issues
A practical look at common pitfalls and performance challenges when using MERGE operations on liquid-clustered Delta tables, and how to avoid them.
January 21, 2026
by Avi Yehuda
· 1,624 Views
Parallel S3 Writes for Massive Sparse DataFrames: How to Maintain Row Order Without Blowing Memory
Learn how to write massive sparse Pandas DataFrames to S3 without OOM errors by using Spark to parallelize index-based chunks while preserving row order.
January 16, 2026
by pooja chhabra
· 1,508 Views · 2 Likes
DevSecOps for MLOps: Securing the Full Machine Learning Lifecycle
Why ML systems are uniquely vulnerable to security attacks — and how MLSecOps closes the gaps in data, models, and pipelines.
January 15, 2026
by Igboanugo David Ugochukwu, DZone Core
· 2,026 Views · 2 Likes
Apache Spark 4.0: What’s New for Data Engineers and ML Developers
Spark 4.0 brings Spark Connect, enhanced SQL (PIPE, VARIANT), richer Python APIs, and advanced streaming — modernizing Spark for faster, more flexible 2025 workloads.
January 12, 2026
by harshraj bhoite
· 2,084 Views
Serverless Spark Isn't Always the Answer: A Case Study
Processing 500M+ records with 100 concurrent users under a 5-minute SLA demands smart architecture. We evaluate seven compute models and why hybrid approaches often win.
January 12, 2026
by Janani Annur Thiruvengadam, DZone Core
· 1,549 Views · 1 Like