
Navigating Vector Databases and Search Through the Prism of Colors

Vector index and vector search are widely available in databases. Understanding the core approach makes it simpler to choose and use it.

By Keshav Murthy · Oct. 09, 23 · Tutorial

Vector technology in AI, often referred to by its implementations, vector indexes and vector search, offers a robust mechanism to index and query high-dimensional representations of images, text, audio, and video. Its strengths show across a range of uses: similarity-driven search, multi-modal retrieval, dynamic recommendation engines, and platforms built on the Retrieval Augmented Generation (RAG) paradigm. Because of this potential impact on so many use cases, vectors have emerged as a hot topic. Yet anyone trying to answer "what precisely is vector search?" is often greeted by a barrage of terms: AI, LLM, and generative AI, to name a few. This article aims to paint a clearer picture (quite literally) by likening the concept to something we all know: colors.

Infinite hues bloom,

A million shades dance and play,

Colors light our world.

The so-called "official colors" alone span three long Wikipedia pages. While it's straightforward to store and search these colors by name using conventional search indices like those in Elasticsearch or Couchbase FTS, there's a hitch. Think about the colors Navy and Ocean. Intuitively, they feel closely related, evoking images of deep, serene waters. Yet, linguistically, they share no common ground. This is where traditional search engines hit a wall.

The typical workaround? Synonyms. You could map Navy to a plethora of related terms: blue, azure, ocean, turquoise, sky, and so on. But now, consider the gargantuan task of doing this for every color name. Moreover, these lists don't give us a measure of the closeness between colors. Is azure closer to navy than sky is? A list won't tell you that. To put it simply, finding similarities among colors by name is a daunting task. Crafting relationships between colors to gauge their similarity is even more challenging.

The simple solution is the well-known RGB model. Encoding colors as RGB vectors solves both the similarity and the distance problems.

When we talk about a color's RGB values, we're essentially referencing its coordinates in a 3D space. The vector (R, G, B) is defined by three components: the intensity of Red (R), the intensity of Green (G), and the intensity of Blue (B). Each component typically ranges from 0 to 255, giving 256 possible values per dimension and 256 x 256 x 256 = 16,777,216 (over 16 million) unique combinations, each representing a distinct color. For instance, the vector (255, 0, 0) signifies the full intensity of red with no contributions from green or blue, resulting in the color red.
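The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `pack_rgb` and `unpack_rgb` are made up for this example and do not come from any particular database.

```python
# Each channel takes one of 256 values, so three channels give 256**3 colors.
TOTAL_COLORS = 256 ** 3


def pack_rgb(r, g, b):
    """Pack an (R, G, B) triple into a single 24-bit integer (3 bytes)."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in the range 0..255")
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Recover the (R, G, B) triple from a 24-bit integer."""
    return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


print(TOTAL_COLORS)                      # 16777216
print(hex(pack_rgb(255, 0, 0)))          # 0xff0000 (pure red)
print(unpack_rgb(pack_rgb(0, 0, 128)))   # (0, 0, 128)
```

Packing and unpacking are exact inverses, which is why three bytes are enough to address every color in the space.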

Figure: RGB

Here are sample RGB values for some colors:

  • Navy: (0, 0, 128)
  • Turquoise: (64, 224, 208)
  • Orange: (255, 165, 0)
  • Green: (0, 128, 0) 
  • Gray: (128, 128, 128)

Each triple can be seen as a vector that identifies a unique point in a color space containing 16,777,216 colors. Treating RGB values as vectors offers a significant advantage: the spatial proximity of two vectors gives a measure of color similarity. Colors that are close in appearance will have vectors that are close in the RGB space. This vector representation, therefore, not only provides a means to encode colors but also allows for an intuitive understanding of color relationships and similarities.
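As a minimal sketch of this idea, the snippet below ranks the sample colors listed above by their Euclidean distance from a query color. The function name `rank_by_similarity` and the query color (0, 0, 255) are illustrative choices, not part of any product API.

```python
import math

# Sample colors from the list above.
COLORS = {
    "Navy": (0, 0, 128),
    "Turquoise": (64, 224, 208),
    "Orange": (255, 165, 0),
    "Green": (0, 128, 0),
    "Gray": (128, 128, 128),
}


def rank_by_similarity(query, colors=COLORS):
    """Return (name, distance) pairs sorted from most to least similar."""
    distances = [(name, math.dist(query, rgb)) for name, rgb in colors.items()]
    return sorted(distances, key=lambda pair: pair[1])


# A query that is not in the table: (0, 0, 255), pure blue.
for name, distance in rank_by_similarity((0, 0, 255)):
    print(f"{name:10s} {distance:7.2f}")
```

No synonym list is involved: the ranking, and the distance attached to each result, falls out of the vector arithmetic alone.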

Similarity Searching

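To find the colors closest to (148, 201, 44) in the RGB space, we vary each of the R, G, and B values by one up and one down to create the search space. This generates 3 x 3 x 3 = 27 color combinations, including the original color itself, each at a known distance from it. Strictly speaking, this cube contains every color whose channels each differ by at most 1 (a Chebyshev distance of 1); the Euclidean distances inside it range from 0 to √3 ≈ 1.73, and only the six colors that differ in a single channel lie at a Euclidean distance of exactly 1.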
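The search space described above can be sketched in Python as follows. The function name `neighbors` and its `step` parameter are assumptions made for this illustration.

```python
import math
from itertools import product


def neighbors(center, step=1):
    """Enumerate every color whose channels each differ from center by at most step."""
    ranges = [
        range(max(0, c - step), min(255, c + step) + 1)  # clamp to valid channel values
        for c in center
    ]
    return list(product(*ranges))


center = (148, 201, 44)
cube = neighbors(center)
print(len(cube))  # 27

# Print each neighbor with its Euclidean distance from the center.
for color in cube:
    print(color, round(math.dist(center, color), 3))
```

Raising `step` widens the cube, at the cost of a search space that grows with the cube of its side length.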

This is like identifying a small cube inside a larger RGB cube...

Figure: Identifying a smaller cube inside the larger RGB cube

(147, 200, 43), (147, 200, 44), (147, 200, 45)
(147, 201, 43), (147, 201, 44), (147, 201, 45)
(147, 202, 43), (147, 202, 44), (147, 202, 45)
(148, 200, 43), (148, 200, 44), (148, 200, 45)
(148, 201, 43), (148, 201, 44), (148, 201, 45)   <- the middle entry, (148, 201, 44), is the original color
(148, 202, 43), (148, 202, 44), (148, 202, 45)
(149, 200, 43), (149, 200, 44), (149, 200, 45)
(149, 201, 43), (149, 201, 44), (149, 201, 45)
(149, 202, 43), (149, 202, 44), (149, 202, 45)

All 27 of these colors, the original included, are similar to our original color (148, 201, 44). The same principle extends to larger distances and to other ways of calculating distance. The figure below shows how this works when RGB values are stored, indexed, and searched in a database.
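To make the point about different distance calculations concrete, here is a small sketch that applies three common metrics to the same candidates. The candidate list, the `within` helper, and the radius of 2 are all illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.dist(a, b)


def manhattan(a, b):
    """Sum of the per-channel differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest single-channel difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def within(query, candidates, radius, metric=euclidean):
    """Return the candidates whose distance from query is at most radius."""
    return [c for c in candidates if metric(query, c) <= radius]


original = (148, 201, 44)
candidates = [(148, 201, 45), (149, 202, 45), (150, 201, 44), (160, 190, 60)]

for metric in (euclidean, manhattan, chebyshev):
    print(metric.__name__, within(original, candidates, radius=2, metric=metric))
```

The same radius admits different colors under different metrics, so the choice of metric is part of what "similar" means in a given search.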

Figure: Similarity search on colors via the RGB model

Hopefully, this gives you a good understanding of how the RGB model encodes colors and solves the similarity search problem.

Let's replace the RGB model with an LLM and feed it text and images about tennis. We then search for "French Open." Even though none of the input text or images mentions "French Open" directly, the similarity search still returns Djokovic and the two tennis images! That's the magic of the LLM and vector search.

Figure: Replacing the RGB model with an LLM, with text and images about tennis as input

Vector indexing and vector search follow the same path. RGB encodes 16 million colors in 3 bytes, but real-world data is more complicated. Languages, images, and videos are much more complicated. Hence, vector databases use not three but 300, 3,000, or more dimensions to encode data. Because of this, we need novel methods to store, index, and run similarity searches efficiently. However, the core principle is the same. More on how vector indexing and searching are done in a future blog!
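As a closing illustration that the core principle carries over to more dimensions, the sketch below runs the same distance-based search over 300-dimensional vectors. The vectors are random numbers standing in for real embeddings, and the names `nearest` and `doc-42` are hypothetical; no embedding model is involved.

```python
import math
import random


def nearest(query, vectors, k=3):
    """Brute-force search: return the k ids whose vectors are closest to query."""
    ranked = sorted(vectors, key=lambda item_id: math.dist(query, vectors[item_id]))
    return ranked[:k]


random.seed(7)  # fixed seed so the run is repeatable
DIMENSIONS = 300

# Random vectors standing in for real embeddings (an assumption for this sketch).
vectors = {
    f"doc-{i}": [random.random() for _ in range(DIMENSIONS)]
    for i in range(1000)
}

# Build a query by nudging one stored vector slightly.
query = [x + random.uniform(-0.01, 0.01) for x in vectors["doc-42"]]
print(nearest(query, vectors, k=3))  # doc-42 should come first
```

This linear scan compares the query against every stored vector, which is the cost that dedicated vector indexes are designed to avoid.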
