Over a million developers have joined DZone.

A Simple Search by Image Engine in F#

·

Last week I was playing with a photomosaic composer toy project and needed a simple search by image engine. By search by image I mean searching a database for an image similar to a given one. In this tutorial I will show you how you can implement this functionality –with some obvious limitations– in an extremely simple way just by looking at an image’s color distribution.

If you are looking for feature similarity (shapes, patterns, etc.) you most likely need edge detection algorithms (linear filters or other similar methods), which give excellent results but are usually quite complicated. I suppose that’s the way most image search engines work. Alternatively this paper describes the sophisticated color-based approach used by Google’s skin-detection engine.

In many cases however, finding images with a perceptually similar color distribution can be enough.
If you are in this situation, you may get away with a very simple technique that still gives pretty good results with a minimal implementation effort. The technique is long known and widely used, but if you have no experience in image processing this step-by-step guide may be a fun and painless warm-up to the topic.

I’ll show the concept with the help of F# code, but the approach is so straightforward that you should understand it even without prior knowledge of the language.

TL;DR:

This is the high level outline of the process.

Just once, to build a database “index”:

  • Create a normalized 8-bit color distribution histogram of each image in the database.

For every query:

  • Create a normalized 8-bit color distribution histogram of the query image.
  • Search the database for the histogram closest to the query using some probability distribution distance function.

If you are still interested in the details of each step, please read on.

Extracting an image’s color signature

Given that we want to compare images, we’ll have to transform them into something that can be easily compared. We could just compute the average color of all pixels in an image, but this is not very useful in practice. Instead, we will use a color histogram, i.e. we will count the number of pixels of each possible color.

A color histogram is created in four steps:

  1. Load/decode the image into an array of pixels.
  2. Downsample the pixels to 8-bit “truecolor” in order to reduce the color space to 256 distinct colors.
  3. Count the number of pixels for each given color.
  4. Normalize the histogram (to allow the comparison of images with different size).

1. Loading the image

This is almost trivial in most languages/framework. Here’s the F# code using the System.Windows.Media APIs:

open System.Windows.Media
open System.Windows.Media.Imaging

/// Returns the pixels of an image as a byte array,
/// in form [Alpha, B, G, R, Alpha, B, G, R, ...]
let getPixels imgPath =
    let s = new BitmapImage (new System.Uri (imgPath))
    let source =
        if s.Format <> PixelFormats.Bgr32
        then new FormatConvertedBitmap (s, PixelFormats.Bgr32, null, 0.)
             :> BitmapSource
        else s :> BitmapSource

    let width = source.PixelWidth
    let height = source.PixelHeight    

    let pixels = Array.create (width * height * 4) 0uy
    source.CopyPixels (pixels, width * 4, 0)
    pixels

2. Downsampling 32-bit color to 8-bit

 

8bit

With the help of some basic bitwise operations we reduce pixels from 32 bits down to 8. We discard the alpha channel and keep 2 bits for blue (out of the original 8), 3 for red and 3 for green (we discard the least significant bits of each color component). The result is that each pixel (being a byte) can represent one of exactly 256 colors. We obviously loose some color detail because we cannot represent all the original gradients, but having a smaller color space keeps the histogram size manageable.

/// Combines the given R, G, B values (8 bits each)
/// into a single byte.
let to8bpp red green blue =
    let b = blue >>> 6
    let g = green >>> 5
    let r = red >>> 5
    0uy ||| (r <<< 5) ||| (g <<< 2) ||| b

/// Converts 32-bits ABGR to 8-bits "truecolor": 2 bits for blue,
/// 3 for green, 3 for red (RGB).
/// Expects an array in form [ Alpha, B, G, R, Alpha, B, G, R, ... ]
/// Returns an array of pixels where each pixel is a byte (RGB).
let to8bit px32bpp =
    [| for i in 0..4..((Array.length px32bpp) - 4) ->
        to8bpp px32bpp.[i + 2] px32bpp.[i + 1] px32bpp.[i]
    |]

Note: in general 8-bit images use a palette, i.e. every pixel value is a pointer to a color in a 256-color palette. That way the palette can be optimized to only include the most frequent color in the image. In our case the benefit would not be worth the trouble as we would need a common palette across all the images anyways (plus the above method is faster and simpler).

3. Creating the histogram

 

histogram

Nothing special here: we just count the number of pixels that are of a given color. The histogram is nothing more than a 256-elements array of integers (plus the image file name). You can read it like “this image has 23 “light green” pixels, 10 “dark red” pixels, etc.”
We then normalize the histogram by dividing each value by the total number of pixels so that each color amount is a float value in the 0 .. 1 range, where for ex. 0.3 means that a picture has 30% of pixels of that given color.

type Histogram = {
    Data : float array
    FileName : string
}

let makeHistogram (fileName : string) =
    // gets the image pixels
    let pixels = getPixels fileName |> to8bit

    // creates an empty histogram
    let histogram = Array.create 256 0.
    let pixelCount = Array.length pixels

    // counts the number of occurrences of every color
    for i in 0..pixelCount - 1 do
        let color = int pixels.[i]
        histogram.[color] <- histogram.[color] + 1.

    // normalizes the histogram
    let normalized = [| for i in 0..histogram.Length - 1 ->
                        System.Math.Round(histogram.[i] / float pixelCount, 4)
                     |]

    // returns the image "signature"
    { Histogram.Data = normalized, FileName = fileName }

Comparing color histograms

Now we have a collection of histograms (the database) and a query histogram. In order to find the best matching image, we need a way to measure how similar two histograms are. In other words we need a distance function that quantifies the similarity between two histograms (and thus between two images).

You probably have noticed that a normalized histogram is in fact a discrete probability distribution. Every value is between 0 and 1 and the sum of all values is 1. This means we can use statistical “goodness of fit” tests to measure the distance between two histograms. For example the chi-squared test is one of those. We are going to use a slight variation of it, called quadratic-form distance. It is pretty effective in our case because it reduces the importance of differences between large peaks.
The test is defined as follows (p and q are the two histograms we are comparing):

distQF(p,q)=12 ni=0(piqi)2pi+qi

 

the implementation is straightforward:

let quadFormDistance (hist1 : float array) (hist2 : float array) =
        Array.map2 (fun pi qi ->
                        if pi = 0. && qi = 0.
                        then 0.
                        else 0.5 * (pown (pi - qi) 2) / (pi + qi))
                   hist1
                   hist2
        |> Array.sum

The more two histograms are different, the larger is the return value of this test. The test returns 0 for two identical histograms.

A more sophisticated option is the Jensen-Shannon divergence, that is a  a smoothed version of the Kullback-Leibler divergence. While being more complicated, it has the interesting property that its square root is a metric, i.e. it defines a metric space (in layman’s terms, a space where the distance between two points can be measured, and where the distance A → B → C cannot be shorter than the direct distance A → B). This property is going to be useful in the next post when we’ll optimize our search algorithm.

The Kullback-Leibler and Jensen-Shannon divergences are defined as:

distKL(p,q)=ni=0pilnpiqi

 

distJS(p,q)=12distKL(p,12(p+q))+12distKL(q,12(p+q))

 

This is the corresponding F# code:

let KullbackLeiblerDist p q =
    Array.map2 (fun pi qi -> 
                    if pi = 0.
                    then 0.
                    else pi * log (pi / qi))
               p q
    |> Array.sum

let JensenShannonDist p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.)
                       p q
    (KullbackLeiblerDist p m) / 2. + (KullbackLeiblerDist q m) / 2.

let distance p q =
    sqrt (JensenShannonDist p q)

This paper includes an interesting comparison of various distance functions.

At this point our problem is almost solved. All we have to do is iterating through all the samples measuring the distance between query and sample and selecting the histogram with the smallest distance:

/// Returns the histogram closest to the given one.
let nearestNeighbor sample samples =
    samples
    |> Seq.map (fun s -> (s.FileName, distance s.Data sample.Data))
    |> Seq.sortBy (fun (fn, dist) -> dist)
    |> Seq.head
    |> fst

Notice that I use head because I’m only interested in the best matching item. I could truncate the list at any given length to obtain n matching items in order of relevance.

Optimizing the search

Maybe you’ve noticed one detail: for every query, we need to walk the full database computing the distance function between our query histogram and each image. Not very smart. If  the database contains billions of images that’s not going to be a very fast search. Also if we perform a large number of queries in a short time we are going to be in trouble.

If you expected a quick and easy answer to this issue I’m going to disappoint you. However, the good news is that this problem is very interesting, much more so than it may look at first sight. This will be the topic of the next post, where I’ll write about VP-Trees, BK-Trees, and Locality-Sensitive Hashing.

Grab the source

The complete F# source of this tutorial is available on GitHub (a whopping 132-lines of code).

Thanks to my brother Lorenzo for the review and feedback.

Topics:

Published at DZone with permission of Francesco De Vittori, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}