The Latest Big Data Topics

The Rise of Diskless Kafka: Rethinking Brokers, Storage, and the Kafka Protocol
Diskless Kafka stores all event data in object storage without using brokers for scalable and cost-efficient data streaming architectures.
January 9, 2026
by Kai Wähner, DZone Core
· 1,777 Views · 2 Likes
Multi-Region Apache Kafka using Synchronous Replication for Disaster Recovery With Zero Data Loss (RPO=0)
Kafka isn’t one-size-fits-all. Choose between self-managed, serverless, or BYOC deployments. New RPO=0 options now enable zero data loss for real-time applications.
January 9, 2026
by Kai Wähner, DZone Core
· 1,614 Views · 2 Likes
The Hidden Security Risks in ETL/ELT Pipelines for LLM-Enabled Organizations
As LLMs enter data pipelines, ETL/ELT becomes part of the AI security boundary, where untrusted inputs can introduce upstream risks.
January 7, 2026
by Vivek Venkatesan
· 3,381 Views · 2 Likes
Solving the Cold Start Problem in Edge AI: A Guide to Data-Saving Learning
Update edge AI models efficiently using Mix Up and contribution sampling to overcome domain shift with minimal data, ensuring continuous evolution without forgetting.
January 6, 2026
by Dippu Kumar Singh
· 3,579 Views
Metadata, Not Data Volume, Is the Real Bottleneck in Modern Data Lakes
In Apache Iceberg data lakes, growing snapshots and manifests often make metadata resolution — not data scanning — the primary performance bottleneck.
January 6, 2026
by Vivek Venkatesan
· 3,319 Views
LLMs in Data Engineering: How Generative AI is Changing ETL and Analytics
LLMs reshape data engineering by automating ETL tasks, enabling natural language analytics, and empowering faster, smarter decision-making without replacing engineers.
January 1, 2026
by harshraj bhoite
· 2,783 Views · 1 Like
Rethinking Cloud Compliance With an AI-Driven Approach
Learn how AI transforms cloud compliance with continuous monitoring, automated risk assessment, and intelligent data governance for secure operations.
December 30, 2025
by Atish Kumar Dash
· 1,837 Views · 2 Likes
Data Modeling: From ERwin to the Cloud
Learn how data modeling has evolved from ERwin to cloud-native tools, boosting efficiency, governance, and AI-driven schema design.
December 24, 2025
by Anisha Sagi
· 1,455 Views · 1 Like
JavaScript Data Grid Comparison: 8 Popular Options Reviewed
I reviewed eight top JavaScript data grids and compared them by performance, customization, accessibility, cost, integration, and devX.
December 24, 2025
by Marina Chernyuk
· 4,028 Views · 3 Likes
Implementing Automated Validation and Anomaly Detection
Ensure high-quality data in large-scale pipelines with automated validation, anomaly detection, and scalable frameworks that maintain accuracy and consistency.
December 23, 2025
by Venkataram Poosapati
· 1,805 Views · 1 Like
Bridging the Gap Between Data Lakes and Warehouses
Data lakehouses combine the flexibility of data lakes with the reliability, performance, and governance features of data warehouses.
December 23, 2025
by Venkataram Poosapati
· 1,189 Views · 2 Likes
Event-Driven Architecture's Dark Secret: Why 80% of Event Streams Are Wasted Resources
Your Kafka topics are bleeding money. Default retention, universal idempotency checks, and unmanaged DLQs waste 80% of event stream resources without anyone noticing.
December 16, 2025
by Dinesh Elumalai, DZone Core
· 3,212 Views · 8 Likes
Building Cost-Efficient ETL with Apache Spark Structured Streaming
Smart tuning of Spark Structured Streaming — auto-scaling, checkpoint management, and efficient file formats — can cut ETL costs nearly in half while improving latency.
December 16, 2025
by harshraj bhoite
· 1,273 Views · 1 Like
AI Data Storage: Challenges, Capabilities, and Comparative Analysis
Deep dive into the storage challenges in AI scenarios, critical storage capabilities, and comparative analysis of storage products.
December 15, 2025
by Rui Su
· 1,823 Views
Streaming vs In-Memory DataWeave: Designing for 1M+ Records Without Crashing
MuleSoft’s default in-memory DataWeave can’t handle million-record files. Streaming solves this by processing data efficiently without OutOfMemory errors.
December 15, 2025
by Sree Harsha Meka
· 1,543 Views · 2 Likes
Escaping the "Excel Trap": Building an AI-Assisted ETL Pipeline Without a Data Team
Escape Excel silos. Use GitHub Copilot to generate Python pipelines that transform static spreadsheets into dynamic dashboards without manual coding.
December 15, 2025
by Dippu Kumar Singh
· 1,867 Views
Reproducibility as a Competitive Edge: Why Minimal Config Beats Complex Install Scripts
Complex install scripts create fragility, drift, and wasted hours. Reproducibility gives you a real competitive edge in speed, quality, and operational clarity.
December 9, 2025
by Con Hrisikos
· 1,514 Views · 1 Like
How to Prevent Quality Failures in Enterprise Big Data Systems
Ensure reliable data pipelines with medallion architecture: Bronze, Silver, Gold layers catch quality issues early, preventing silent failures and bad decisions.
December 9, 2025
by Ram Ghadiyaram, DZone Core
· 1,509 Views · 1 Like
Is TOON the Next Lightweight Hero in Event Stream Processing With Apache Kafka?
TOON is a compact, token-efficient serialization format that can be used to ingest stream data into an Apache Kafka topic, which is especially beneficial for LLM-specific pipelines.
November 28, 2025
by Gautam Goswami, DZone Core
· 4,013 Views · 1 Like
AWS Airflow vs Step Functions: The Data Engineering Orchestration Dilemma
When you're building data pipelines in AWS, choosing between Managed Airflow and Step Functions isn't just a technical decision — it's a strategic one.
November 27, 2025
by Janani Annur Thiruvengadam, DZone Core
· 4,749 Views · 2 Likes