DZone

The Latest Big Data Topics

A Pattern for Intelligent Ticket Routing in ITSM
Manual ticket routing is a hidden tax on IT efficiency. Here is an architectural pattern for using Logistic Regression and Skype status APIs to automate this.
February 10, 2026
by Dippu Kumar Singh
· 1,023 Views
Model Context Protocol Vs Agent2Agent: Practical Integration with Enterprise Data
MCP is production-ready for LLM-to-tool integration; A2A enables emerging multi-agent collaboration. They complement, not compete, and neither replaces Spark or Airflow.
February 9, 2026
by Ram Ghadiyaram, DZone Core
· 1,318 Views · 1 Like
How Global Payment Processors like Stripe and PayPal Use Apache Kafka and Flink to Scale
How leading payment processors such as Stripe, PayPal, Payoneer, and Worldline use data streaming for real-time payments and fraud detection.
February 3, 2026
by Kai Wähner, DZone Core
· 1,489 Views · 3 Likes
Building an OCR Data Pipeline: From Unstructured Images to Structured Data
How to treat OCR text as just another data source — build a repeatable ingestion, transformation, and validation workflow for unstructured data.
January 28, 2026
by Punitha Ponnuraj
· 2,798 Views · 1 Like
Efficient Sampling Approach for Large Datasets
In this article, we will learn about the central limit theorem and how it helps with random sampling in big-data-related problems.
January 22, 2026
by Rajesh Vakkalagadda
· 1,095 Views
MERGE and Liquid Clustering: Common Performance Issues
A practical look at common pitfalls and performance challenges when using MERGE operations on liquid-clustered Delta tables, and how to avoid them.
January 21, 2026
by Avi Yehuda
· 1,582 Views
Parallel S3 Writes for Massive Sparse DataFrames: How to Maintain Row Order Without Blowing Memory
Learn how to write massive sparse Pandas DataFrames to S3 without OOM errors by using Spark to parallelize index-based chunks while preserving row order.
January 16, 2026
by pooja chhabra
· 1,464 Views · 2 Likes
DevSecOps for MLOps: Securing the Full Machine Learning Lifecycle
Why ML systems are uniquely vulnerable to security attacks — and how MLSecOps closes the gaps in data, models, and pipelines.
January 15, 2026
by Igboanugo David Ugochukwu, DZone Core
· 1,940 Views · 2 Likes
Apache Spark 4.0: What’s New for Data Engineers and ML Developers
Spark 4.0 brings Spark Connect, enhanced SQL (PIPE, VARIANT), richer Python APIs, and advanced streaming — modernizing Spark for faster, more flexible 2025 workloads.
January 12, 2026
by harshraj bhoite
· 2,009 Views
Serverless Spark Isn't Always the Answer: A Case Study
Processing 500M+ records with 100 concurrent users under a 5-minute SLA demands smart architecture. We evaluate seven compute models and why hybrid approaches often win.
January 12, 2026
by Janani Annur Thiruvengadam, DZone Core
· 1,486 Views · 1 Like
The Rise of Diskless Kafka: Rethinking Brokers, Storage, and the Kafka Protocol
Diskless Kafka stores all event data in object storage rather than on broker-local disks, enabling scalable and cost-efficient data streaming architectures.
January 9, 2026
by Kai Wähner, DZone Core
· 1,704 Views · 2 Likes
Multi-Region Apache Kafka using Synchronous Replication for Disaster Recovery With Zero Data Loss (RPO=0)
Kafka isn’t one-size-fits-all. Choose between self-managed, serverless, or BYOC deployments. New RPO=0 options now enable zero data loss for real-time applications.
January 9, 2026
by Kai Wähner, DZone Core
· 1,564 Views · 2 Likes
The Hidden Security Risks in ETL/ELT Pipelines for LLM-Enabled Organizations
As LLMs enter data pipelines, ETL/ELT becomes part of the AI security boundary, where untrusted inputs can introduce upstream risks.
January 7, 2026
by Vivek Venkatesan
· 3,327 Views · 2 Likes
Solving the Cold Start Problem in Edge AI: A Guide to Data-Saving Learning
Update edge AI models efficiently using Mix Up and contribution sampling to overcome domain shift with minimal data, ensuring continuous evolution without forgetting.
January 6, 2026
by Dippu Kumar Singh
· 3,545 Views
Metadata, Not Data Volume, Is the Real Bottleneck in Modern Data Lakes
In Apache Iceberg data lakes, growing snapshots and manifests often make metadata resolution — not data scanning — the primary performance bottleneck.
January 6, 2026
by Vivek Venkatesan
· 3,277 Views
LLMs in Data Engineering: How Generative AI is Changing ETL and Analytics
LLMs reshape data engineering by automating ETL tasks, enabling natural language analytics, and empowering faster, smarter decision-making without replacing engineers.
January 1, 2026
by harshraj bhoite
· 2,642 Views · 1 Like
Rethinking Cloud Compliance With an AI-Driven Approach
Learn how AI transforms cloud compliance with continuous monitoring, automated risk assessment, and intelligent data governance for secure operations.
December 30, 2025
by Atish Kumar Dash
· 1,767 Views · 2 Likes
Data Modeling: From ERwin to the Cloud
Learn how data modeling has evolved from ERwin to cloud-native tools, boosting efficiency, governance, and AI-driven schema design.
December 24, 2025
by Anisha Sagi
· 1,427 Views · 1 Like
JavaScript Data Grid Comparison: 8 Popular Options Reviewed
I reviewed eight top JavaScript data grids and compared them by performance, customization, accessibility, cost, integration, and devX.
December 24, 2025
by Marina Chernyuk
· 3,828 Views · 3 Likes
Implementing Automated Validation and Anomaly Detection
Ensure high-quality data in large-scale pipelines with automated validation, anomaly detection, and scalable frameworks that maintain accuracy and consistency.
December 23, 2025
by Venkataram Poosapati
· 1,728 Views · 1 Like