DZone
The Latest Big Data Topics

5 Critical Databricks Performance Hacks That Most Engineers Miss (100x Faster Queries)
In this article, learn to boost Databricks performance with five proven optimization strategies covering UDFs, AQE, Delta Lake, broadcasts, and Photon acceleration.
November 3, 2025
by Ram Ghadiyaram (DZone Core)
· 3,204 Views · 4 Likes
Delta Lake 4.0 and Delta Kernel: What's New in the Future of Data Lakehouses
Delta Lake 4.0 pushes the lakehouse forward with flexible schemas, stronger transactions, remote, multi‑engine access, self‑optimizing performance, and AI‑ready storage.
November 3, 2025
by Sairamakrishna BuchiReddy Karri
· 3,249 Views · 2 Likes
An Open-Source ChatGPT App Generator
Create fully functional ChatGPT apps using AI — no coding needed. Build and deploy interactive UI widgets directly inside ChatGPT in just minutes.
October 31, 2025
by Thomas Hansen (DZone Core)
· 1,832 Views · 4 Likes
When Coalesce Is Slower Than Repartition: A Spark Performance Paradox
In this article, learn why repartition() can outperform coalesce() in Apache Spark — and how Catalyst optimizer pushdown can throttle your job’s parallelism.
October 30, 2025
by Janani Annur Thiruvengadam (DZone Core)
· 2,582 Views · 3 Likes
Debugging a Spark Driver Out of Memory (OOM) Issue With Large JSON Data Processing
This article draws on real-world debugging experience and aims to provide insights into Spark's memory management challenges.
October 29, 2025
by Raju Ansari
· 2,281 Views · 2 Likes
Unlocking Scalable Data Lakes: Building With Apache Iceberg, AWS Glue, and S3
Apache Iceberg + AWS Glue + S3 bring ACID, schema evolution, and time travel to data lakes—fixing schema drift, small files, and cost sprawl at enterprise scale.
October 28, 2025
by Vivek Venkatesan
· 3,259 Views · 1 Like
Set Up Spring Data Elasticsearch With Basic Authentication
A guide to configuring SSL communication with Elasticsearch via Spring Data Elasticsearch, with the connection additionally secured by Basic authentication.
October 27, 2025
by Arnošt Havelka (DZone Core)
· 2,046 Views
Enterprise-Grade Document Intelligence: Cloud Big Data AI With YOLOv9 and Spark on AWS
Automate document analysis with YOLOv9, Apache Spark, and AWS. Boost speed, accuracy, and fraud detection across finance, healthcare, insurance, and more.
October 27, 2025
by Ram Ghadiyaram (DZone Core)
· 2,453 Views · 4 Likes
The Dark Side of Apache Iceberg’s Data Time Travel Feature
This article examines the hidden drawbacks of the Apache Iceberg time travel query feature and explains how to address them.
October 22, 2025
by Pravin Dwiwedi
· 2,729 Views · 1 Like
Building Scalable CRM Systems: Architecture Patterns and Data Modeling Strategies
A hands-on guide to building scalable CRM systems with the right architecture, data models, and performance and security strategies.
October 22, 2025
by Chitrapradha Ganesan
· 3,762 Views · 2 Likes
A Fresh Look at Optimizing Apache Spark Programs
Optimize Spark jobs by tuning configurations, writing efficient code (DataFrames, broadcast joins), using optimized storage, and monitoring the Spark UI and logs.
October 14, 2025
by Nataraj Mocherla
· 2,933 Views · 2 Likes
How Developers Use Synthetic Data to Stress-Test Models in Noisy Markets
Synthetic data lets quants stress-test equity strategies beyond noisy markets, preserving volatility, and building resilience before risking real capital.
October 14, 2025
by Jay Mehta
· 1,282 Views · 1 Like
Operationalizing Responsible AI: Turning Ethics Into Engineering
This article offers guidance on building a reliable AI system in production by incorporating bias mitigation strategies.
October 13, 2025
by Jofia Jose Prakash
· 2,113 Views · 2 Likes
Apache Iceberg REST Catalog: The Key to Vendor-Agnostic Data Interoperability
In this article, I demonstrate how Iceberg data can be accessed through the Iceberg REST Catalog from Data Mesh with a simple Python application.
October 13, 2025
by Pravin Dwiwedi
· 2,500 Views · 1 Like
Introduction to Spring Data Elasticsearch 5.5
Getting started with the latest version of Spring Data Elasticsearch 5.5 and Elasticsearch 8.18 as a NoSQL database for our data storage.
October 10, 2025
by Arnošt Havelka (DZone Core)
· 2,942 Views · 2 Likes
8 Challenges in Multimodal Training Data Creation
Creating high-quality multimodal training data is essential yet complex, involving challenges in synchronization, scalability, context capture, and tooling.
October 8, 2025
by Chirag Shivalker
· 2,871 Views
7 AWS Services Every Data Engineer Should Master
In 2025, S3, Glue, Lambda, Athena, Redshift, EMR, and Kinesis form the core AWS toolkit for building fast, reliable, and scalable data pipelines.
October 6, 2025
by Sai Mounika Yedlapalli
· 4,138 Views · 3 Likes
From Big Data to Agents: My Decade Building Systems
How a simple scraper, a few dashboards, and a lot of curiosity turned into agentic systems that actually ship value. A builder’s path.
October 3, 2025
by Nacho Corcuera
· 2,709 Views · 2 Likes
Building a Scalable and Reliable Marketing Data Stack on GCP
A resilient marketing data stack on GCP leverages BigQuery, Pub/Sub, and Dataflow to deliver real-time insights, handle schema drift, and scale analytics.
October 2, 2025
by Shafeeq Ur Rahaman
· 1,305 Views · 2 Likes
Salesforce Data Cloud: Setting Up and Using the Ingestion API
In this guide, learn to use Salesforce Data Cloud Ingestion API for real-time and bulk data ingestion to deliver accurate, personalized customer experiences.
October 2, 2025
by Ramesh Bellamkonda
· 3,487 Views · 2 Likes