Data Resources

Data Governance Essentials: Policies and Procedures (Part 6)

Learn how data quality, policies, and procedures strengthen data governance by ensuring accuracy, compliance, and security for better decision-making.

February 4, 2025

by Sukanya Konatam

Processing Cloud Data With DuckDB And AWS S3

DuckDB's ability to read data directly from cloud storage, such as AWS S3, makes it particularly powerful for modern data architectures.

February 4, 2025

by Anil Kumar Moka

SOC 2 Made Simple: Your Guide to Certification

This guide breaks down the SOC 2 Type 2 certification process into practical steps, from preparation to the audit, with some tips and tools to make the journey smoother.

February 4, 2025

by Roman Misyurin

Best Practices for API Rate Limits and Quotas

Rate limits protect your infrastructure, and quotas help you monetize your APIs. Both are key parts of a healthy API strategy.

February 3, 2025

by Derric Gilling

All You Need to Know About Apache Spark

Apache Spark is a fast, open-source cluster computing framework for big data, supporting ML, SQL, and streaming. It’s scalable, efficient, and widely used.

February 3, 2025

by Abhishek Trehan

90% Cost Reduction With Prefix Caching for LLMs

Up to 70% of prompts in LLM applications are repetitive. Prefix caching can reduce inference costs by up to 90%, thus optimizing performance and saving money.

February 3, 2025

by Mahak Shah

Pydantic: Simplifying Data Validation in Python

Pydantic is a powerful Python library that uses type annotations to validate data structures. Learn about its key features with code examples.

February 3, 2025

by Vidyasagar (Sarath Chandra) Machupalli FBCS

The Quest for HA and DR in Loki

Minimize data loss and business disruption by implementing high availability and configuring disaster recovery for Loki with AWS S3 as the object store.

January 31, 2025

by Pavan N G

Creating a Service for Sensitive Data With Spring and Redis

In some cases, user-sensitive data cannot be stored permanently. Let's create a simple application that handles such data using Spring and Redis.

January 31, 2025

by Alexander Rumyantsev

Building a Machine Learning Pipeline Using PySpark

This article discusses building an efficient ML pipeline with PySpark, covering data loading, preprocessing, model training, and evaluation for large datasets.

January 30, 2025

by Abhishek Trehan

Bridging Graphviz and Cytoscape.js for Interactive Graphs

Make static Graphviz digraphs interactive and compatible with Cytoscape by converting DOT-format graphs into Cytoscape JSON using Python.

January 30, 2025

by Puneet Malhotra

SmartXML: An Alternative to XPath for Complex XML Files

We'll discuss SmartXML, an XPath alternative for parsing complex XML files, converting them to SQL, and loading the results into a database seamlessly.

January 30, 2025

by Luca Sanders

Scaling Read Your Own Writes Consistency

This article is intended for distributed systems practitioners looking to understand and implement Read Your Own Writes consistency in production environments.

January 30, 2025

by Ganapathy Subramanian Ramachandran

Metal and the Simulated Annealing Algorithm

The Simulated Annealing algorithm described in this article demonstrates its effectiveness as a powerful tool for finding optimal solutions to complex problems.

January 29, 2025

by Vitaly Kuznetsov (Ippolitov)

Scrape Amazon Product Reviews With Python

Let's learn how to use Python scripts to scrape the Amazon website ethically and extract product review data.

January 29, 2025

by Juveria dalvi

Implement RAG With PGVector, LangChain4j, and Ollama

Experiment with vector search and RAG to improve query results and optimize the storage of text embeddings, using Bruce Springsteen's album data as the example dataset.

January 28, 2025

by Gunter Rotsaert

How Apache Flink and Apache Paimon Influence Data Streaming

Apache Flink is a crucial component of Apache Paimon since it offers the real-time processing power that enhances Paimon's strong consistency and storage features.

January 28, 2025

by Gautam Goswami

Vector Storage, Indexing, and Search With MariaDB

Since MariaDB 11.7, you can store vectors for generative AI applications in a single database. Learn more about these new features.

January 28, 2025

by Alejandro Duarte

Get Started With Vector Search in Azure Cosmos DB

Learn how to enable and use vector search in Azure Cosmos DB for NoSQL with a step-by-step guide in Python, TypeScript, .NET, and Java using a movie dataset.

January 27, 2025

by Abhishek Gupta

Top Tools for Object Storage and Data Management

Explore the best tools for object storage, including MinIO, Cyberduck, and more, to efficiently manage and store unstructured data in modern cloud environments.

January 27, 2025

by Vidyasagar (Sarath Chandra) Machupalli FBCS

The Latest Data Topics