DZone
The Latest Big Data Topics

The Future of Data Lakehouses: Apache Iceberg Explained
This blog post is the first in a three-part series exploring Apache Iceberg and its role in modern data architectures and the emergence of data lakehouses.
February 25, 2025
by Fawaz Ghali, PhD, DZone Core
· 4,031 Views · 5 Likes
The Hidden Cost of Dirty Data in AI Development
Dirty data weakens AI, increases costs, introduces bias, and causes compliance risks. Strong data governance ensures reliable AI outcomes.
February 25, 2025
by Ilya Dudkin, DZone Core
· 3,961 Views · 3 Likes
Deduplication of Videos Using Fingerprints, CLIP Embeddings
Video deduplication optimizes storage by removing duplicates using techniques like segmentation, embeddings, and clustering to manage massive datasets efficiently.
February 21, 2025
by Praneeth Reddy Vatti
· 6,768 Views · 5 Likes
Scaling Image Deduplication: Finding Needles in a Haystack
Learn to efficiently deduplicate 100M+ images using distributed architectures, embeddings, FAISS for ANN search, and clustering to ensure accurate results.
February 20, 2025
by Praneeth Reddy Vatti
· 5,654 Views · 3 Likes
Data Pattern Automation With AI and Machine Learning
Pattern recognition and AI improve data workflows, automate insights, and drive efficiency in business processes across industries.
February 19, 2025
by Sandip Gami
· 4,030 Views · 2 Likes
ETL Generation Using GenAI
Learn how GenAI automates ETL pipelines, generates code, adapts to schema changes, and improves data processes with speed, efficiency, and precision.
February 14, 2025
by Ramesh Daddala
· 6,357 Views · 2 Likes
Loading XML into MongoDB
Learn how to export XML data to MongoDB using SmartXML ETL tools, simplifying the process and ensuring efficient data handling and storage.
February 12, 2025
by Luca Sanders
· 6,657 Views · 1 Like
The Right ETL Architecture for Multi-Source Data Integration
Dedicated ETL pipelines are easy to set up but hard to scale, while common pipelines offer efficiency at the cost of complexity. Know which one to choose.
February 12, 2025
by Murat Balkan, DZone Core
· 5,971 Views · 1 Like
SQL as the Backbone of Big Data and AI Powerhouses
SQL powers Big Data and AI with tools like BigQuery, remaining a cornerstone of data-driven innovation through its simplicity and adaptability.
February 11, 2025
by Medha Gupta
· 3,436 Views · 1 Like
Relational DB Migration to S3 Data Lake Via AWS DMS, Part I
This article discusses the challenges faced during relational database migration to AWS using DMS, including source data, logging, and network bandwidth issues.
February 7, 2025
by Vijay Bhosale
· 5,211 Views · 3 Likes
Control Your Services With OTEL, Jaeger, and Prometheus
In this article, we will combine OTEL, Jaeger, and Prometheus for faster, centralized observability and quick troubleshooting in distributed systems.
February 7, 2025
by Ilia Ivankin
· 5,027 Views · 3 Likes
Best Practices for Scaling Kafka-Based Workloads
Kafka is a widely used technology with many features and capabilities. This article explains best practices for configuring Kafka producers and consumers.
February 6, 2025
by Narendra Lakshmana gowda
· 5,825 Views · 1 Like
The Role of DQ Checks in Data Pipelines
Why are DQ checks critical for every data pipeline, and what are some of the different types of DQ alerts you can set up to enhance the reliability of your pipeline?
February 5, 2025
by Ajay Krishnan Prabhakaran
· 5,386 Views
Mastering the Transition: From Amazon EMR to EMR on EKS
Optimize your data processing by seamlessly transitioning from Amazon EMR to EMR on EKS. Discover best practices and tips for a smooth migration.
February 5, 2025
by Satrajit Basu, DZone Core
· 6,897 Views · 4 Likes
AI Regulation in the U.S.: Navigating Post-EO 14110
The Trump administration revoked EO 14110, shifting the U.S. toward a market-driven AI strategy to spur innovation and investment.
February 5, 2025
by Frederic Jacquet, DZone Core
· 4,464 Views · 5 Likes
Teradata Performance and Skew Prevention Tips
Use unique identifiers, such as a Teradata Primary Index, to distribute data evenly across AMPs and improve database performance by 70%.
February 4, 2025
by Sudheer Kumar Lagisetty
· 4,192 Views · 10 Likes
Data Governance Essentials: Policies and Procedures (Part 6)
Learn how data quality, policies, and procedures strengthen data governance by ensuring accuracy, compliance, and security for better decision-making.
February 4, 2025
by Sukanya Konatam
· 5,350 Views · 4 Likes
All You Need to Know About Apache Spark
Apache Spark is a fast, open-source cluster computing framework for big data, supporting ML, SQL, and streaming. It’s scalable, efficient, and widely used.
February 3, 2025
by Abhishek Trehan
· 4,118 Views · 2 Likes
How Apache Flink and Apache Paimon Influence Data Streaming
Apache Flink is a crucial component of Apache Paimon since it offers the real-time processing power that enhances Paimon's strong consistency and storage features.
January 28, 2025
by Gautam Goswami, DZone Core
· 5,759 Views · 4 Likes
CubeFS: High-Performance Storage for Cloud-Native Apps
CubeFS is a CNCF-graduated, cloud-native distributed file system delivering scalable performance, fault tolerance, and smooth Kubernetes integration.
January 24, 2025
by Sai Sandeep Ogety, DZone Core
· 7,679 Views · 8 Likes