Big Data Resources

Data Lake vs. Data Warehouse vs. Data Lakehouse

The pros and cons of the legacy data warehouse, the more recent data lake, and contemporary data lakehouse architectures.

Updated March 21, 2025

by Noa Shavit

Patch Management in the Age of IoT: Challenges and Solutions

IoT patch management tackles risks via automation, lightweight patches, and centralized tools, ensuring security despite device variety and resource limits.

March 20, 2025

by andrew vereen

Comparing DuckDB, Snowflake, and Databricks

In this article, we do an in-depth comparison of DuckDB, Snowflake, and Databricks to help you find the best data processing platform for your organization.

March 19, 2025

by Noa Shavit

Building a Distributed Multi-Language Data Science System

The starter consists of hexagonal microservices (a MERN monorepo, Spring Boot with Camel, and Flask), plus a Gateway and Eureka, all communicating via REST, GraphQL, gRPC, and AMQP.

March 18, 2025

by Alexander Eleseev

Bridging Cloud and On-Premises Log Processing

This article examines the feasibility of ingesting logs from public cloud platforms into on-premises services and highlights the key points to consider.

March 18, 2025

by Avinash Ibbandi

Best Practices for Data Warehouses in Microsoft Fabric

Leverage Microsoft Fabric for unified data warehousing; follow best practices for schema, ingestion, transformation, security, optimization, and continuous monitoring.

March 17, 2025

by Aravind Nuthalapati

ETL With Large Language Models: AI-Powered Data Processing

LLMs transform ETL with schema-less extraction, adaptive transformations, and multi-modal support, enabling scalable, efficient, and accessible data workflows.

March 10, 2025

by Suri (thammuio)

IoT Communication Protocols for Efficient Device Integration

In this blog post, we explore the most widely used IoT protocols and why selecting the right protocol matters for efficient device integration.

March 5, 2025

by Richard Kaplan

Harnessing Real-Time Insights With Streaming SQL on Kafka

Streaming SQL enables real-time data processing and analytics by querying Kafka topics directly, yielding actionable insights without complex code.

March 5, 2025

by Rama Krishna Panguluri

AI Agents for Data Warehousing

AI agents are transforming data warehousing by improving efficiency, accuracy, and automation across many aspects of data management.

March 4, 2025

by Ajay Tanikonda

Materialized Views in Data Stream Processing With RisingWave

Materialized views enhance stream processing through incremental computation, enabling efficient retrieval of aggregated or pre-processed data.

March 3, 2025

by Gautam Goswami

Modern Data Processing Libraries: Beyond Pandas

In this article, we explore alternatives to pandas for data processing and analysis and compare them on performance.

March 3, 2025

by Vidyasagar (Sarath Chandra) Machupalli FBCS

Doris Lakehouse Integration: A New Approach to Data Analysis

Doris Lakehouse Integration bridges data lakes and warehouses, enabling seamless access, faster queries, unified management, and greater data value.

February 28, 2025

by Darren Xu

Exploring IoT's Top WebRTC Use Cases

WebRTC can handle both high-quality media streaming and efficient data sharing, making it a versatile tool for device developers.

February 28, 2025

by Carsten Rhod Gregersen

Modern ETL Architecture: dbt on Snowflake With Airflow

Build a scalable ETL pipeline with dbt, Snowflake, and Airflow, and address data engineering challenges with modular architecture, CI/CD, and best practices.

February 27, 2025

by Digvijay Waghela

Top Methods to Improve ETL Performance Using SSIS

Improve ETL performance in SSIS with parallel extraction, optimized transformations, and proper configuration of concurrency, batch sizes, and data types.

February 27, 2025

by DZone Editorial

Cloud-Driven Analytics Solution Strategy in Healthcare

Detailed insights into compute resource management, cluster optimization, storage efficiency, and cost governance in cloud-based environments.

February 27, 2025

by Abrar Ahmed Syed

How to Scale Elasticsearch to Solve Your Scalability Issues

Scaling Elasticsearch requires balancing sharding, query performance, and memory tuning for optimal efficiency in high-traffic, real-time applications.

February 26, 2025

by Vivek Kumar

Spark Job Optimization

Spark jobs can be optimized to maximize resource utilization in a cluster, improving performance and reducing costs for large-scale data processing.

February 25, 2025

by Chandra Shekar r Chekuri

The Future of Data Lakehouses: Apache Iceberg Explained

This blog post is the first in a three-part series exploring Apache Iceberg and its role in modern data architectures and the emergence of data lakehouses.

February 25, 2025

by Fawaz Ghali, PhD
