DZone
The Latest Big Data Topics

Bridging the Gap Between Data Lakes and Warehouses
Data lakehouses combine the flexibility of data lakes with the reliability, performance, and governance features of data warehouses.
December 23, 2025
by Venkataram Poosapati
· 1,157 Views · 2 Likes
Event-Driven Architecture's Dark Secret: Why 80% of Event Streams Are Wasted Resources
Your Kafka topics are bleeding money. Default retention, universal idempotency checks, and unmanaged DLQs waste 80% of event stream resources without anyone noticing.
December 16, 2025
by Dinesh Elumalai, DZone Core
· 3,142 Views · 8 Likes
Building Cost-Efficient ETL with Apache Spark Structured Streaming
Smart tuning of Spark Structured Streaming — auto-scaling, checkpoint management, and efficient file formats — can cut ETL costs nearly in half while improving latency.
December 16, 2025
by harshraj bhoite
· 1,226 Views · 1 Like
AI Data Storage: Challenges, Capabilities, and Comparative Analysis
Deep dive into the storage challenges in AI scenarios, critical storage capabilities, and comparative analysis of storage products.
December 15, 2025
by Rui Su
· 1,745 Views
Streaming vs In-Memory DataWeave: Designing for 1M+ Records Without Crashing
MuleSoft’s default in-memory DataWeave can’t handle million-record files. Streaming solves this by processing data efficiently without OutOfMemory errors.
December 15, 2025
by Sree Harsha Meka
· 1,485 Views · 2 Likes
Escaping the "Excel Trap": Building an AI-Assisted ETL Pipeline Without a Data Team
Escape Excel silos. Use GitHub Copilot to generate Python pipelines that transform static spreadsheets into dynamic dashboards without manual coding.
December 15, 2025
by Dippu Kumar Singh
· 1,815 Views
Reproducibility as a Competitive Edge: Why Minimal Config Beats Complex Install Scripts
Complex install scripts create fragility, drift, and wasted hours. Reproducibility gives you a real competitive edge in speed, quality, and operational clarity.
December 9, 2025
by Con Hrisikos
· 1,453 Views · 1 Like
How to Prevent Quality Failures in Enterprise Big Data Systems
Ensure reliable data pipelines with medallion architecture: Bronze, Silver, Gold layers catch quality issues early, preventing silent failures and bad decisions.
December 9, 2025
by Ram Ghadiyaram, DZone Core
· 1,461 Views · 1 Like
Is TOON the Next Lightweight Hero in Event Stream Processing With Apache Kafka?
TOON is a compact, token-efficient serialization format that can be used to ingest stream data into an Apache Kafka topic, which is especially beneficial for LLM-specific pipelines.
November 28, 2025
by Gautam Goswami, DZone Core
· 3,820 Views · 1 Like
AWS Airflow vs Step Functions: The Data Engineering Orchestration Dilemma
When you're building data pipelines in AWS, choosing between Managed Airflow and Step Functions isn't just a technical decision — it's a strategic one.
November 27, 2025
by Janani Annur Thiruvengadam, DZone Core
· 4,505 Views · 2 Likes
Optimizing Trino Performance With Materialized Views in a Data Lake
In this article, learn how Trino materialized views boosted our Iceberg-based data lake, improving real-time query speed, reducing load, and cutting costs.
November 27, 2025
by Mikhail Povolotskii
· 1,381 Views
Revamping Real-Time Data Ingestion for Scalable Media Intelligence
AI-augmented microservices streaming architecture (Spring Boot, Kafka, Elasticsearch) to handle 8.64 million real-time media articles per day.
November 25, 2025
by Sindhukumar Rajakumaran
· 1,267 Views
Advanced Usage of Decodable in Swift: Handling Dynamic Keys
Use DynamicKey to safely decode JSON with unpredictable keys — it avoids fragile if let chains and makes your decoding logic flexible and maintainable.
November 20, 2025
by Ada Kirsch
· 1,380 Views · 1 Like
Iceberg Compaction and Fine-Grained Access Control: Performance Challenges and Solutions
Implementing fine-grained access control on Apache Iceberg can create major performance challenges. Learn how Glue, Redshift, and Athena handle FGAC at scale.
November 19, 2025
by Janani Annur Thiruvengadam, DZone Core
· 3,106 Views · 2 Likes
Meta Data: How Data about Your Data is Optimal for AI
Metadata enhances AI performance by providing crucial context for models. Learn key benefits, implementation strategies, and real-world examples for smarter AI systems.
November 19, 2025
by Kevin Vu
· 1,354 Views
Databricks vs Snowflake: Complete Architecture Mapping for Enterprise AI and Big Data
This guide maps core data, big data, and AI/ML concepts between Databricks and Snowflake, with examples, diagrams, and a framework for choosing or combining the two.
November 13, 2025
by Ram Ghadiyaram, DZone Core
· 6,932 Views · 7 Likes
Event-Driven Architecture Patterns: Real-World Lessons From IoT Development
Real-world lessons on building and optimizing event-driven systems for IoT and microservices, with practical code examples.
November 10, 2025
by Bhavna Hirani
· 7,660 Views · 6 Likes
Hudi vs. Delta vs. Iceberg: How to Choose the Right Lakehouse Table Format
Hudi excels at real-time upserts, Delta handles ACID workloads, and Iceberg supports large-scale analytics with flexible schemas.
November 10, 2025
by harshraj bhoite
· 3,436 Views · 1 Like
Regression Analysis for Time Series Data: Models and Applications
Ready to use regression analysis for time series data? Explore how this method works in practice to effectively predict future outcomes and drive growth.
November 10, 2025
by Oleksandr Stefanovskyi
· 1,491 Views · 1 Like
Master Production-Ready Big Data, Apache Spark Jobs in Databricks and Beyond: An Expert Guide
Refine Apache Spark performance in Databricks with strategies. Includes expert insights, PySpark examples, and diagrams for efficient data processing.
November 6, 2025
by Ram Ghadiyaram, DZone Core
· 1,710 Views · 1 Like