A Simple Search by Image Engine in F#
Last week I was playing with a photomosaic composer toy project and needed a simple search by image engine. By search by image I mean searching a database for an image similar to a given one. In this tutorial I will show you how you can implement this functionality (with some obvious limitations) in an extremely simple way, just by looking at an image’s color distribution.
If you are looking for feature similarity (shapes, patterns, etc.) you most likely need edge detection algorithms (linear filters or other similar methods), which give excellent results but are usually quite complicated. I suppose that’s the way most image search engines work. Alternatively, this paper describes the sophisticated color-based approach used by Google’s skin-detection engine.
In many cases, however, finding images with a perceptually similar color distribution can be enough.
If you are in this situation, you may get away with a very simple technique that still gives pretty good results with minimal implementation effort. The technique has long been known and is widely used, but if you have no experience in image processing this step-by-step guide may be a fun and painless warm-up to the topic.
I’ll show the concept with the help of F# code, but the approach is so straightforward that you should understand it even without prior knowledge of the language.
TL;DR:
This is the high-level outline of the process.
Just once, to build a database “index”:
- Create a normalized 8-bit color distribution histogram of each image in the database.
For every query:
- Create a normalized 8-bit color distribution histogram of the query image.
- Search the database for the histogram closest to the query using some probability distribution distance function.
If you are still interested in the details of each step, please read on.
Extracting an image’s color signature
Given that we want to compare images, we’ll have to transform them into something that can be easily compared. We could just compute the average color of all pixels in an image, but this is not very useful in practice, because very different images can share the same average. Instead, we will use a color histogram, i.e. we will count the number of pixels of each possible color.
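To see why the average alone is a weak signature, here is a minimal sketch that uses grayscale values instead of full colors. The `averageOf` and `countsOf` helpers and the two tiny "images" are hypothetical, introduced only for this illustration: both images have the same average, yet their distributions have nothing in common.

```fsharp
// Hypothetical helpers, for illustration only.
// Pixels are grayscale values in the 0..255 range.
let averageOf (pixels : int array) =
    Array.averageBy float pixels

// Number of pixels for each distinct value, sorted by value.
let countsOf (pixels : int array) =
    pixels |> Array.countBy id |> Array.sortBy fst

// Half black, half white.
let checkerboard = [| 0; 255; 0; 255 |]
// Almost uniform mid gray.
let flatGray = [| 127; 128; 127; 128 |]

printfn "%f %f" (averageOf checkerboard) (averageOf flatGray) // same average
printfn "%A" (countsOf checkerboard)                          // only 0 and 255
printfn "%A" (countsOf flatGray)                              // only 127 and 128
```

The histogram keeps exactly the information the average throws away, which is why it makes a more useful signature.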
A color histogram is created in four steps:
- Load/decode the image into an array of pixels.
- Downsample the pixels to 8-bit “truecolor” in order to reduce the color space to 256 distinct colors.
- Count the number of pixels for each given color.
- Normalize the histogram (to allow the comparison of images of different sizes).
1. Loading the image
This is almost trivial in most languages/frameworks. Here’s the F# code using the System.Windows.Media APIs:

    open System.Windows.Media
    open System.Windows.Media.Imaging

    /// Returns the pixels of an image as a byte array,
    /// in the form [b, g, r, unused, b, g, r, unused, ...]
    /// (the in-memory byte order of the Bgr32 pixel format).
    let getPixels imgPath =
        let s = new BitmapImage (new System.Uri (imgPath))
        // Converts the image to Bgr32 if it uses another pixel format
        let source =
            if s.Format <> PixelFormats.Bgr32 then
                new FormatConvertedBitmap (s, PixelFormats.Bgr32, null, 0.) :> BitmapSource
            else
                s :> BitmapSource
        let width = source.PixelWidth
        let height = source.PixelHeight
        // 4 bytes per pixel; the stride is the size in bytes of one row
        let pixels = Array.create (width * height * 4) 0uy
        source.CopyPixels (pixels, width * 4, 0)
        pixels
2. Downsampling 32-bit color to 8-bit
With the help of some basic bitwise operations we reduce pixels from 32 bits down to 8. We discard the fourth byte of each pixel (the alpha channel, which Bgr32 leaves unused) and keep 2 bits for blue (out of the original 8), 3 for red and 3 for green, discarding the least significant bits of each color component. The result is that each pixel (now a single byte) can represent one of exactly 256 colors. We obviously lose some color detail because we cannot represent all the original gradients, but having a smaller color space keeps the histogram size manageable.

    /// Combines the given r, g, b values (8 bits each)
    /// into a single byte laid out as rrrgggbb.
    let to8bpp red green blue =
        let b = blue >>> 6
        let g = green >>> 5
        let r = red >>> 5
        0uy ||| (r <<< 5) ||| (g <<< 2) ||| b

    /// Converts 32-bit pixels to 8-bit "truecolor": 3 bits for red,
    /// 3 for green, 2 for blue.
    /// Expects an array in the form [b, g, r, unused, b, g, r, unused, ...]
    /// Returns an array of pixels where each pixel is a single byte.
    let to8bit px32bpp =
        [| for i in 0..4..((Array.length px32bpp) - 4) ->
            to8bpp px32bpp.[i + 2] px32bpp.[i + 1] px32bpp.[i] |]
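A few worked values may make the bit layout easier to follow. The snippet below repeats `to8bpp` so it can run on its own (the explicit `byte` annotations are my addition) and prints the 8-bit value for some sample colors that I picked purely for illustration:

```fsharp
/// Same bit layout as in the article: rrrgggbb.
let to8bpp (red : byte) (green : byte) (blue : byte) =
    let b = blue >>> 6   // keeps the 2 most significant bits
    let g = green >>> 5  // keeps the 3 most significant bits
    let r = red >>> 5    // keeps the 3 most significant bits
    (r <<< 5) ||| (g <<< 2) ||| b

printfn "%d" (to8bpp 255uy 0uy 0uy)       // pure red   -> 224 (0b11100000)
printfn "%d" (to8bpp 0uy 255uy 0uy)       // pure green -> 28  (0b00011100)
printfn "%d" (to8bpp 0uy 0uy 255uy)       // pure blue  -> 3   (0b00000011)
printfn "%d" (to8bpp 255uy 255uy 255uy)   // white      -> 255
printfn "%d" (to8bpp 128uy 128uy 128uy)   // mid gray   -> 146 (0b10010010)
// A slightly different red lands in the same bucket as pure red:
printfn "%d" (to8bpp 250uy 10uy 20uy)     // -> 224
```

The last line shows the intended loss of detail: nearby colors collapse into the same one of the 256 buckets.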
Note: in general 8-bit images use a palette, i.e. every pixel value is an index into a 256-color palette. That way the palette can be optimized to only include the most frequent colors in the image. In our case the benefit would not be worth the trouble, as we would need a common palette across all the images anyway (plus the above method is faster and simpler).
3. Creating the histogram
Nothing special here: we just count the number of pixels that are of a given color. The histogram is nothing more than a 256-element array of counts (plus the image file name). You can read it like “this image has 23 ‘light green’ pixels, 10 ‘dark red’ pixels, etc.”
We then normalize the histogram by dividing each value by the total number of pixels, so that each color amount is a float value in the 0 .. 1 range, where for example 0.3 means that 30% of the picture’s pixels are of that given color.

    type Histogram =
        { Data : float array
          FileName : string }

    let makeHistogram (fileName : string) =
        // Gets the image pixels, downsampled to 8 bits
        let pixels = getPixels fileName |> to8bit
        // Creates an empty histogram
        let histogram = Array.create 256 0.
        let pixelCount = Array.length pixels
        // Counts the number of occurrences of every color
        for i in 0..pixelCount - 1 do
            let color = int pixels.[i]
            histogram.[color] <- histogram.[color] + 1.
        // Normalizes the histogram (values rounded to 4 decimals)
        let normalized =
            [| for i in 0..histogram.Length - 1 ->
                System.Math.Round (histogram.[i] / float pixelCount, 4) |]
        // Returns the image "signature"
        { Data = normalized; FileName = fileName }
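To try the counting and normalization steps without loading a file, here is a minimal sketch that works on an in-memory array of 8-bit pixels. `histogramOf` is a hypothetical helper of mine, not part of the original source, and unlike the code above it does not round the values:

```fsharp
/// Hypothetical helper: normalized 256-bin histogram of 8-bit pixels.
let histogramOf (pixels : byte array) =
    let counts = Array.create 256 0.
    // Counts the number of occurrences of every color
    for p in pixels do
        counts.[int p] <- counts.[int p] + 1.
    // Divides by the pixel count so that the bins sum to 1
    counts |> Array.map (fun c -> c / float pixels.Length)

// A 4-pixel "image": two red pixels, one green, one blue (rrrgggbb values).
let small = [| 224uy; 224uy; 28uy; 3uy |]
// The same image repeated twice, i.e. same proportions, twice the size.
let large = Array.append small small

let h = histogramOf small
printfn "%f %f %f" h.[224] h.[28] h.[3]       // 0.5, 0.25 and 0.25
printfn "%b" (histogramOf small = histogramOf large)
```

The second print illustrates why normalization matters: two images with the same color proportions get the same histogram regardless of their size.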
Comparing color histograms
Now we have a collection of histograms (the database) and a query histogram. In order to find the best matching image, we need a way to measure how similar two histograms are. In other words, we need a distance function that quantifies the similarity between two histograms (and thus between two images).
You have probably noticed that a normalized histogram is in fact a discrete probability distribution: every value is between 0 and 1 and the sum of all values is 1. This means we can use statistical “goodness of fit” tests to measure the distance between two histograms. The chi-squared test is one of those. We are going to use a slight variation of it, called quadratic-form distance here (the same formula is often referred to as the chi-squared histogram distance). It is pretty effective in our case because it reduces the importance of differences between large peaks.
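The test is defined as follows (p and q are the two histograms we are comparing):
$$\mathrm{dist}_{QF}(p, q) = \frac{1}{2} \sum_{i=0}^{n} \frac{(p_i - q_i)^2}{p_i + q_i}$$
The implementation is straightforward (bins where both values are 0 contribute nothing, which also avoids a division by zero):

    let quadFormDistance (hist1 : float array) (hist2 : float array) =
        (hist1, hist2)
        ||> Array.map2 (fun pi qi ->
            if pi = 0. && qi = 0. then 0.
            else 0.5 * (pown (pi - qi) 2) / (pi + qi))
        |> Array.sum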
The more two histograms differ, the larger the return value of this test. The test returns 0 for two identical histograms.
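A quick sanity check on tiny 3-bin histograms (the function works on any two arrays of equal length, so the real 256 bins are not needed for a demo). The function is repeated so the snippet runs on its own, and the sample histograms are made up for illustration:

```fsharp
let quadFormDistance (hist1 : float array) (hist2 : float array) =
    (hist1, hist2)
    ||> Array.map2 (fun pi qi ->
        if pi = 0. && qi = 0. then 0.
        else 0.5 * (pown (pi - qi) 2) / (pi + qi))
    |> Array.sum

let a = [| 0.5; 0.5; 0. |]
let b = [| 0.5; 0.25; 0.25 |]
let c = [| 0.; 0.; 1. |]

printfn "%f" (quadFormDistance a a)   // identical histograms: 0
printfn "%f" (quadFormDistance a b)   // partial overlap: 1/6, about 0.1667
printfn "%f" (quadFormDistance a c)   // no color in common: 1
```

For normalized histograms the result stays between 0 (identical) and 1 (no bin in common), which makes the values easy to interpret.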
A more sophisticated option is the Jensen-Shannon divergence, which is a smoothed version of the Kullback-Leibler divergence. While being more complicated, it has the interesting property that its square root is a metric, i.e. it defines a metric space (in layman’s terms, a space where the distance between two points can be measured, and where the distance a → b → c cannot be shorter than the direct distance a → c). This property is going to be useful in the next post when we’ll optimize our search algorithm.
The Kullback-Leibler and Jensen-Shannon divergences are defined as:
$$\mathrm{dist}_{KL}(p, q) = \sum_{i=0}^{n} p_i \ln \frac{p_i}{q_i}$$
$$\mathrm{dist}_{JS}(p, q) = \frac{1}{2}\, \mathrm{dist}_{KL}\!\left(p, \tfrac{1}{2}(p + q)\right) + \frac{1}{2}\, \mathrm{dist}_{KL}\!\left(q, \tfrac{1}{2}(p + q)\right)$$
This is the corresponding F# code:

    let kullbackLeiblerDist p q =
        (p, q)
        ||> Array.map2 (fun pi qi ->
            // A bin with pi = 0 contributes nothing
            if pi = 0. then 0. else pi * log (pi / qi))
        |> Array.sum

    let jensenShannonDist p q =
        // m is the average of the two distributions
        let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
        (kullbackLeiblerDist p m) / 2. + (kullbackLeiblerDist q m) / 2.

    let distance p q = sqrt (jensenShannonDist p q)
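The sketch below repeats the three functions so it can run on its own and checks a few properties on made-up 2-bin distributions: identical inputs give 0, inputs with no bin in common give sqrt(ln 2), and the triangle inequality holds for one sample triple. That last check is only an example, not a proof. It also shows why the raw Kullback-Leibler divergence is awkward for histograms: it blows up whenever `q` has an empty bin where `p` does not.

```fsharp
let kullbackLeiblerDist p q =
    (p, q)
    ||> Array.map2 (fun pi qi ->
        if pi = 0. then 0. else pi * log (pi / qi))
    |> Array.sum

let jensenShannonDist p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (kullbackLeiblerDist p m) / 2. + (kullbackLeiblerDist q m) / 2.

let distance p q = sqrt (jensenShannonDist p q)

let a = [| 1.; 0. |]
let b = [| 0.5; 0.5 |]
let c = [| 0.; 1. |]

printfn "%f" (distance a a)              // identical: 0
printfn "%f" (distance a c)              // disjoint: sqrt (ln 2), about 0.8326
printfn "%f" (kullbackLeiblerDist a c)   // plain KL is infinite here
// Going through b is never shorter than going straight from a to c.
printfn "%b" (distance a c <= distance a b + distance b c)
```

Because Jensen-Shannon always compares against the average of the two distributions, it never divides by an empty bin, so it stays finite and bounded.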
This paper includes an interesting comparison of various distance functions.
At this point our problem is almost solved. All we have to do is iterate through all the samples, measuring the distance between query and sample, and select the histogram with the smallest distance:

    /// Returns the file name of the histogram closest to the given one.
    let nearestNeighbor sample samples =
        samples
        |> Seq.map (fun s -> (s.FileName, distance s.Data sample.Data))
        |> Seq.sortBy (fun (fn, dist) -> dist)
        |> Seq.head
        |> fst
Notice that I use head because I’m only interested in the best matching item. I could truncate the list at any given length to obtain n matching items in order of relevance.
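A minimal sketch of that variation follows. `nearestNeighbors` is a hypothetical name, and the record type, the distance function and the sample data are repeated or made up so the snippet runs on its own. It uses `Seq.truncate` rather than `Seq.take`, so asking for more items than the database contains simply returns everything instead of throwing:

```fsharp
type Histogram = { Data : float array; FileName : string }

// Jensen-Shannon based distance, as in the article.
let kullbackLeiblerDist p q =
    (p, q)
    ||> Array.map2 (fun pi qi -> if pi = 0. then 0. else pi * log (pi / qi))
    |> Array.sum

let distance p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    sqrt ((kullbackLeiblerDist p m) / 2. + (kullbackLeiblerDist q m) / 2.)

/// Hypothetical variation: file names of the n closest histograms, best first.
let nearestNeighbors n (query : Histogram) (samples : Histogram seq) =
    samples
    |> Seq.sortBy (fun s -> distance s.Data query.Data)
    |> Seq.truncate n
    |> Seq.map (fun s -> s.FileName)
    |> Seq.toList

// Made-up 3-bin histograms and file names, for illustration only.
let query = { Data = [| 0.5; 0.5; 0. |]; FileName = "query.jpg" }
let database =
    [ { Data = [| 0.; 0.; 1. |]; FileName = "far.jpg" }
      { Data = [| 0.5; 0.5; 0. |]; FileName = "same.jpg" }
      { Data = [| 0.6; 0.4; 0. |]; FileName = "close.jpg" } ]

printfn "%A" (nearestNeighbors 2 query database)   // ["same.jpg"; "close.jpg"]
```

When only the single best match is needed, `Seq.minBy` would avoid sorting the whole sequence.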
Optimizing the search
Maybe you’ve noticed one detail: for every query, we need to walk the full database, computing the distance function between our query histogram and each image. Not very smart. If the database contains billions of images, that’s not going to be a very fast search. Also, if we perform a large number of queries in a short time we are going to be in trouble.
If you expected a quick and easy answer to this issue, I’m going to disappoint you. However, the good news is that this problem is very interesting, much more so than it may look at first sight. This will be the topic of the next post, where I’ll write about VP-trees, BK-trees, and locality-sensitive hashing.
Grab the source
The complete F# source of this tutorial is available on GitHub (a whopping 132 lines of code).
Thanks to my brother Lorenzo for the review and feedback.
Published at DZone with permission of Francesco De Vittori, DZone MVB. See the original article here.