Why are DQ checks critical for every data pipeline, and what are some of the different types of DQ alerts you can set up to enhance the reliability of your pipeline?
Apache Spark is a fast, open-source cluster computing framework for big data, supporting ML, SQL, and streaming. It’s scalable, efficient, and widely used.
Apache Flink pairs naturally with Apache Paimon, supplying the real-time processing power that complements Paimon's strong consistency and storage features.
Explore two essential IoT protocols: OPC UA, for secure, structured industrial device communication, and MQTT, a lightweight real-time protocol for telemetry.
Learn how to build fault-tolerant, reactive event-driven applications using Spring WebFlux, Apache Kafka, and a Dead Letter Queue to handle failed messages without data loss.
Dark data is the vast body of unstructured information that organizations collect but rarely use, such as emails, customer interactions, and sensor data.
To make long-term trend analysis easier, we can leverage datelists, which store each metric value in an array, one entry per date, in sequential order.
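The datelist idea can be sketched in a few lines: keep one array per entity, where position i holds the metric for the i-th day after a fixed start date, so trend queries become cheap array slices instead of full-history scans. This is a minimal illustrative sketch; the class and method names (`Datelist`, `record`, `trailing_sum`) are invented for the example, not from any library.

```python
from datetime import date

class Datelist:
    """Per-entity metric history as a sequential array, one slot per day."""

    def __init__(self, start: date):
        self.start = start
        self.values: list[int] = []  # values[i] = metric on start + i days

    def record(self, day: date, value: int) -> None:
        idx = (day - self.start).days
        # Pad any skipped days with 0 so positions stay aligned to dates.
        while len(self.values) <= idx:
            self.values.append(0)
        self.values[idx] = value

    def value_on(self, day: date) -> int:
        idx = (day - self.start).days
        return self.values[idx] if 0 <= idx < len(self.values) else 0

    def trailing_sum(self, day: date, window: int) -> int:
        # Long-term trend queries reduce to simple array slices.
        end = (day - self.start).days + 1
        lo = max(0, end - window)
        return sum(self.values[lo:end])
```

In a warehouse setting the same layout is typically a single array column keyed by entity, appended to daily, which avoids rescanning months of fact rows for every trailing-window metric.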
A Data-First IDP integrates governance, traceability, and quality into workflows, transforming how data is managed and enabling scalable, AI-ready ecosystems.
Learn to implement Slowly Changing Dimension Type 2 (SCD2) in a data warehouse to track historical data, ensure data integrity, and enable scalability.
This article introduces process mining, explaining its key elements and practical applications for discovering and analyzing workflows using event data.
This article examines how QML can harness the principles of quantum mechanics to achieve significant computational advantages over classical approaches.