Searching Images by Visual Similarity With the Clarifai API
It sure would be amazing if you could just show your computer a picture and say, “Find me images that look like this.” Wait... you CAN do that! And here’s how.
When you’re searching for images, words are often not enough to find exactly what you need. Wouldn’t it be amazing if you could just show your computer a picture and say, “Find me images that look like this?” With the Clarifai API, you can search for any image by visual similarity. Here’s how!
The Clarifai Search API has a variety of different ways for you to query your inputs. In one of our previous posts, we talked about searching your content by geolocation. In this post, we will have an image do all of the talking. You won’t need to do any training on your dataset; simply upload images and then you can search over those images seamlessly.
For the first step, you need to have Python 3 installed for your operating system (this tutorial uses version 3.6.2). Open a terminal and run python --version. If you also have a version of Python 2 installed, you may need to run python3 --version instead. You should see output reading:
Python 3.6.2. If neither of these work, you may need to check if Python is included in your
Many people who develop in Python suggest having a virtual environment. These help to manage any application-specific dependencies. Having one won't be necessary for this tutorial but if you are curious, you can read more about them in the Python documentation.
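The terminal check for the Python version can also be done from inside Python itself. The sketch below is only an illustration: the helper name meets_minimum is hypothetical, and treating any 3.6 or later interpreter as good enough is an assumption, since the tutorial itself was written against 3.6.2.

```python
import sys


def meets_minimum(version_info, minimum=(3, 6)):
    """Return True if the (major, minor) pair is at least `minimum`.

    The (3, 6) default is an assumption based on the version this
    tutorial was written against.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter that is actually running this script.
print("Running Python", sys.version.split()[0])
print("New enough for this tutorial:", meets_minimum(sys.version_info))
```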
Adding Your Inputs
You will need a dataset of inputs to search against. These inputs are indexed in your application so that Clarifai can "see" each image. To a machine, an image is nothing but vectors that describe what each pixel looks like. When you perform a search using an image, Clarifai looks at how close these vectors are to one another to determine whether the images are visually similar. Visual similarity works best when your dataset includes objects that are relevant to what you'd like to search for. If you were to add a dataset full of only food and then search using an image of a dog, your search results wouldn't be as strong.
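To make "how close these vectors are" a little more concrete, here is a minimal sketch using cosine similarity. This is an assumption for illustration only: the article does not say which measure Clarifai uses internally, and the three short vectors below are made up rather than real image features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up feature vectors, purely for illustration.
cookie = [0.9, 0.1, 0.3]
biscuit = [0.8, 0.2, 0.4]
dog = [0.1, 0.9, 0.2]

print("cookie vs. biscuit:", cosine_similarity(cookie, biscuit))
print("cookie vs. dog:", cosine_similarity(cookie, dog))
```

With these toy numbers, the cookie vector scores higher against the biscuit than against the dog, which is the same intuition behind searching a food dataset with a food image.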
Here's a dataset of images from ImageNet on food. You should see a file containing nothing but lines of URLs of images of food. Save this file as food-data.txt in the same directory as your code. We will then read this file and upload the images in batches.
# upload.py
import os

from clarifai.rest import ClarifaiApp
from clarifai.rest import Image as ClImage

app = ClarifaiApp(api_key='YOUR_API_KEY')

FILE_NAME = 'food-data.txt'
FILE_PATH = os.path.join(os.path.curdir, FILE_NAME)

# Counter variables
current_batch = 0
counter = 0
batch_size = 32

with open(FILE_PATH) as data_file:
    # One image URL per line
    images = [url.strip() for url in data_file]
    row_count = len(images)
    print("Total number of images:", row_count)

    while counter < row_count:
        print("Processing batch: #", (current_batch + 1))
        imageList = []

        # Collect up to batch_size images; the last batch may be shorter
        for current_index in range(counter, counter + batch_size):
            try:
                imageList.append(ClImage(url=images[current_index]))
            except IndexError:
                break

        app.inputs.bulk_create_images(imageList)
        counter = counter + batch_size
        current_batch = current_batch + 1
Wowza! Over one thousand images being uploaded effortlessly.
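If the counter bookkeeping in the upload script feels fiddly, the batches can also be formed with list slicing. The sketch below shows only the batching step and makes no API calls; the helper name chunk and the example.com URLs are hypothetical stand-ins for the contents of food-data.txt.

```python
def chunk(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder URLs standing in for the lines of food-data.txt.
urls = ["http://example.com/{}.jpg".format(i) for i in range(70)]

batches = list(chunk(urls, 32))
print("Number of batches:", len(batches))
print("Batch sizes:", [len(batch) for batch in batches])
```

Slicing never runs past the end of the list, so the final, shorter batch needs no special handling. Each batch could then be turned into ClImage objects and passed to bulk_create_images, as in the upload script.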
Searching Using an Image
The amazing part you've been waiting for is being able to search by visual similarity using an image. Let's say you wanted to find out if there is anything similar to this picture of cookies. All you would need from here is:
# search.py
from clarifai.rest import ClarifaiApp

app = ClarifaiApp(api_key='YOUR_API_KEY')

# Search using a URL
search = app.inputs.search_by_image(url='https://images-gmi-pmc.edge-generalmills.com/cbc3bd78-8797-4ac9-ae98-feafbd36aab7.jpg')

for search_result in search:
    print("Score:", search_result.score, "| URL:", search_result.url)
The search_by_image() function returns a list of Image objects that wrap the response, but here we just print the score and the URL associated with each result. You can also use image bytes or a filename to query against.
# Response
Score: 0.8486366 | URL: http://farm4.static.flickr.com/3502/4000853007_0f1e33cdc0.jpg
Score: 0.79205513 | URL: http://farm4.static.flickr.com/3213/3080197227_e4b28c76ae.jpg
Score: 0.7901007 | URL: http://farm1.static.flickr.com/222/470272746_1674448c07.jpg
Score: 0.741455 | URL: http://farm4.static.flickr.com/3317/3289848643_bf1f2e7b5b.jpg
Score: 0.7173992 | URL: http://farm4.static.flickr.com/3620/3473362088_c90b72c819.jpg
Score: 0.68365324 | URL: http://farm1.static.flickr.com/150/365771958_06e87421d1.jpg
Score: 0.6734046 | URL: http://farm4.static.flickr.com/3077/3160541712_b879bf7a22.jpg
Score: 0.6723133 | URL: http://farm4.static.flickr.com/3399/3185443954_26bf37dc8a.jpg
Score: 0.66024935 | URL: http://farm3.static.flickr.com/2481/3943873688_f094d211a3.jpg
Score: 0.6529919 | URL: http://farm3.static.flickr.com/2124/2239822705_419fffe609.jpg
Score: 0.6473093 | URL: http://farm2.static.flickr.com/1319/540399285_142ae1822e.jpg
Score: 0.63564813 | URL: http://farm3.static.flickr.com/2030/2310386017_8741472785.jpg
Score: 0.6230167 | URL: http://farm4.static.flickr.com/3338/3501581193_be17c2d04e.jpg
Score: 0.61543244 | URL: http://farm1.static.flickr.com/228/499181350_b01a280789.jpg
Score: 0.61172754 | URL: http://farm4.static.flickr.com/3169/2802528597_0483e7aa39.jpg
Score: 0.6057524 | URL: http://farm3.static.flickr.com/2111/2413962121_41b412c39c.jpg
Score: 0.60092676 | URL: http://farm1.static.flickr.com/118/296736486_e721b93e82.jpg
Score: 0.60023034 | URL: http://farm3.static.flickr.com/2264/2410275528_d7a69df963.jpg
Score: 0.5992471 | URL: http://farm3.static.flickr.com/2317/2435454915_2947203717.jpg
Score: 0.5968622 | URL: http://farm3.static.flickr.com/2664/3979835380_7748ddf164.jpg
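Since a search can also start from a filename, here is a hedged sketch of what that might look like. The helper name search_by_local_file is hypothetical, and the filename keyword argument is an assumption about the Python client; check the signature of search_by_image() in the version you have installed before relying on it.

```python
def search_by_local_file(app, path):
    """Return (score, url) pairs for images similar to the file at `path`.

    `app` is expected to be a ClarifaiApp instance. The `filename`
    keyword is an assumption about the installed client version.
    """
    results = app.inputs.search_by_image(filename=path)
    return [(result.score, result.url) for result in results]


# Example usage (requires the clarifai package and a valid API key):
# from clarifai.rest import ClarifaiApp
# app = ClarifaiApp(api_key='YOUR_API_KEY')
# for score, url in search_by_local_file(app, 'cookies.jpg'):
#     print("Score:", score, "| URL:", url)
```

Passing the app in as an argument keeps the helper easy to try against a stand-in object before spending API calls on it.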
The search response will show a value for score from 0 to 1. A score closer to 1 means the image is more visually similar; a score closer to 0 means it is less visually similar. The response from Clarifai also defaults to the top 20 results. If you want to change that, you can pass a different value for the
per_page parameter in the
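Because every result carries a score between 0 and 1, one simple follow-up is to keep only the results above a cutoff. The sketch below works on plain (score, URL) pairs copied from the response above and makes no API calls; the helper name filter_by_score and the 0.7 cutoff are arbitrary choices for illustration, not recommendations from Clarifai.

```python
def filter_by_score(results, min_score):
    """Keep only the (score, url) pairs whose score is at least `min_score`."""
    return [(score, url) for score, url in results if score >= min_score]


# A few (score, url) pairs taken from the sample response.
results = [
    (0.8486366, "http://farm4.static.flickr.com/3502/4000853007_0f1e33cdc0.jpg"),
    (0.741455, "http://farm4.static.flickr.com/3317/3289848643_bf1f2e7b5b.jpg"),
    (0.5968622, "http://farm3.static.flickr.com/2664/3979835380_7748ddf164.jpg"),
]

# 0.7 is an arbitrary cutoff chosen for this example.
strong_matches = filter_by_score(results, 0.7)

for score, url in strong_matches:
    print("Score:", score, "| URL:", url)
```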
You're probably thinking to yourself, "That's all it took?" And the answer is yes! The part that took the longest was getting the dataset. Otherwise, performing our search was only five lines of code. The Search API doesn't stop here:
Published at DZone with permission of Prince Wilson, DZone MVB. See the original article here.