Computer vision is an exciting and quickly growing set of data science technologies. It has a broad range of applications, from industrial quality control to disease diagnosis. I have dabbled with a few technologies that fall under this umbrella before, and I decided it would be a worthwhile endeavor to rapidly prototype an image recognition web application that uses a neural network.
I used a deep learning framework called Caffe, created by the Berkeley Vision and Learning Center. There are several other comparable deep learning frameworks like Chainer, Theano, and Torch7 that were candidates, but I chose Caffe due to my previous experience with it. Caffe has a set of Python bindings, which is what I made use of for this project. If you’re interested in more theory behind deep learning and neural networks, I recommend this page by Michael Nielsen.
To begin, I installed all the Caffe dependencies onto an AWS t2.medium instance running Ubuntu 14.04 LTS. (Installation instructions for 14.04 LTS can be found here.) I elected to build Caffe in CPU-only mode, skipping CUDA and a GPU, because I'm not training my own neural network for this project. I obtained two pre-trained models from the BVLC Model Zoo, called GoogleNet and AlexNet. Both of these models were trained on ImageNet, which is a standard set of about 14 million images.
Now that I had all the prerequisites installed, I opened up the Exaptive IDE and started a fresh Xap (what we like to call web applications built in Exaptive). I started by creating a new Python component for writing the Caffe code necessary to identify an image. I named the new component “GoogleNet” after the neural net model I want to use first.
My new GoogleNet component in the IDE, ready for coding.
Then I wrote the Caffe code in Python.
First, we instantiate a caffe image classifier.
```python
net = caffe.Classifier(
    reference_model,
    reference_pretrained,
    mean=imagenet_mean,
    channel_swap=(2, 1, 0),   # Caffe expects BGR channel order
    raw_scale=255,
    image_dims=(256, 256))
```
The reference_model is a filepath to a set of config options for the network. Caffe provides a stock model for this. The reference_pretrained is another filepath that points to the pretrained GoogleNet model from the model zoo.
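To make those variables concrete, here is a hedged sketch of what they might look like. The paths are hypothetical (a stock Caffe checkout ships files under similar names, but yours will vary), and a random stand-in array is used in place of the real ImageNet mean file just to illustrate the per-channel reduction that `caffe.Classifier` expects:

```python
import numpy as np

# Hypothetical paths -- adjust for your own Caffe install.
caffe_root = "/home/ubuntu/caffe/"
reference_model = caffe_root + "models/bvlc_googlenet/deploy.prototxt"
reference_pretrained = caffe_root + "models/bvlc_googlenet/bvlc_googlenet.caffemodel"

# Caffe distributes the ImageNet mean as a (3, 256, 256) array;
# caffe.Classifier wants a single mean value per channel, hence the
# double .mean(1). A random stand-in array shows the reduction:
mean_blob = np.random.rand(3, 256, 256)   # in practice, np.load() the mean file
imagenet_mean = mean_blob.mean(1).mean(1)  # shape (3,): one value per channel
```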
We grab the input image filepath and use Caffe methods to load it.
```python
image_file = inevents["image"]
input_image = caffe.io.load_image(image_file)
```
Then we simply call predict on our image classifier with the input image as an argument.
```python
output = net.predict([input_image])
predictions = output[0]  # predict returns one row of class scores per input image
predicted_class_index = predictions.argmax()
```
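Because `predict` accepts a list of images, it returns a 2-D array with one row of class probabilities per input. A tiny stand-in array (five invented classes instead of GoogleNet's 1000) illustrates the indexing:

```python
import numpy as np

# Stand-in for net.predict() output: one row of class probabilities
# per input image. The five-class scores here are invented.
output = np.array([[0.05, 0.10, 0.60, 0.20, 0.05]])
predictions = output[0]                      # scores for the single input image
predicted_class_index = predictions.argmax() # index of the top class
print(predicted_class_index)  # 2
```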
Then we get the top three predictions for our image.
```python
# Indices of the three highest-scoring classes (in no particular order)
ind = np.argpartition(predictions, -3)[-3:]
```
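`np.argpartition` is cheaper than a full sort: it only guarantees that the indices of the three largest values land in the last three slots. A small demo with invented scores shows the behavior:

```python
import numpy as np

# Toy class scores (invented for illustration).
predictions = np.array([0.01, 0.40, 0.05, 0.30, 0.20, 0.04])

# The last three slots hold the indices of the three largest scores,
# though not sorted among themselves.
ind = np.argpartition(predictions, -3)[-3:]
print(sorted(ind.tolist()))  # [1, 3, 4]
```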
Then we make some nice HTML to return to the text component.
```python
# Sort the top-3 indices from highest to lowest confidence
sorted_ind = ind[np.argsort(predictions[ind])][::-1]

pretty_text = "<h3>GoogleNet:</h3>"
for i, idx in enumerate(sorted_ind):
    pretty_text += "#%d. %s (%2.1f%%) <br>" % (
        i + 1, name_map[idx], predictions[idx] * 100)
return pretty_text
```
Note that we're looking up each predicted index in a name_map, which maps ImageNet class IDs to human-readable class names. The resulting pretty_text is then returned to the user.
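One way to build such a name_map is from Caffe's synset words file, where each line pairs an ImageNet synset ID with its class names. The sketch below parses a two-line sample string rather than the real file, so the exact file name and location are left to your install:

```python
# Two sample lines in the synset-words format: "<synset id> <names>".
sample = """n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus"""

# Drop the synset ID and keep the readable names, indexed by class number.
name_map = [line.split(" ", 1)[1].strip() for line in sample.splitlines()]
print(name_map[1])  # goldfish, Carassius auratus
```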
The drop target will be used to hand an image to the neural net component.
The file drop target, ready to accept our images.
Three components later, we’re ready to identify some images.
At this point, the code was done. I added some HTML and inline styles, then saved this Xap and opened it in another tab. Here is the page when we load it.
Then we drag in a picture. I used a picture of a bunny and a kitten. The app processes for a few seconds and then I see:
It works! You can see the neural net's predictions (and their ImageNet ID numbers) along with the percent certainty the neural net assigns to each. From here, we've laid the groundwork for plenty of other applications, because we can substitute any pre-trained neural net model. For example, if a model existed for a life-sciences application, all we'd need to do is upload that model, point the component we just wrote at it instead of the GoogleNet model, and get results from this same web app.
To illustrate this, I added a second component that uses the AlexNet model, so that I get results for the same image from two separate neural net models trained on the same set of images.
The AlexNet component only differs from the GoogleNet by the model filepath we use in the code.
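Since the two components differ only in their file paths, a single lookup table could drive both. This is a hypothetical sketch (the names and paths are invented for illustration), not how the Xap components are actually wired:

```python
# Map each network name to its config and weights paths (paths invented).
MODELS = {
    "GoogleNet": {
        "model": "models/bvlc_googlenet/deploy.prototxt",
        "pretrained": "models/bvlc_googlenet/bvlc_googlenet.caffemodel",
    },
    "AlexNet": {
        "model": "models/bvlc_alexnet/deploy.prototxt",
        "pretrained": "models/bvlc_alexnet/bvlc_alexnet.caffemodel",
    },
}

def classifier_args(name):
    """Return the (model, pretrained) path pair for a named network."""
    entry = MODELS[name]
    return entry["model"], entry["pretrained"]
```

With this in place, swapping networks is just a matter of passing a different name when constructing the classifier.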
Running the same image through both neural nets, we now get:
All told, this process of writing the code and wiring up components took me just under an hour. As I wrote before, we can substitute any Caffe neural network model and use it through this basic Xap. From here, I think it'd be interesting to create a neural network training interface as a Xap. It would be helpful to have a nice front-end for training neural networks: specifying the number of hidden layers, configuring the composition of those layers, and visualizing the test scores of the new models. Perhaps a followup blog post will be in order once that's done.