Deep learning is a phrase that follows us everywhere. Even non-technical people are aware of how artificial intelligence can change our world. I totally agree with Andrew Ng that “AI is the new electricity” and that it can make our world better — especially the part where evil robots kill us all! (Just kidding... but even those “robots” must start somewhere.) If a robot wants to kill humans, it must have eyes to see humans. And what if a robot wants to kill men but doesn’t want to hurt women? In that case, our robot must recognize whether a human is male or female. Next, should our robot spare young or old people? It must know who is old and who is young.
Alright, enough jokes — let's get started.
In this blog, we will present our deep learning project that can locate facial attributes on an image and predict age and gender (exactly what our robot needs). The name of this project is Facelyzr, just like in my last blog where we presented Twitalyzr. I guess we are into some weird “lyzr” stuff… What's next, a foodlyzr? Who knows?
The idea behind Facelyzr is to take a picture with a face as input and give information about that face as output. For example:
For educational and research purposes, we used two datasets: CelebA and IMDB. This project was a great opportunity for me to learn TensorFlow and to dig deeper into deep learning. One solution would have been to take a model already pre-trained on a similar task and train only the classifiers for ours. But I wanted to build something from scratch using the idea of unsupervised pre-training, so I decided to combine unsupervised and supervised learning: I trained a stacked denoising convolutional autoencoder on images of faces and then attached classifiers on top of it.
Convolutional Stacked Denoising Autoencoder
With an autoencoder, our network can learn what is important by itself. It's like, Please don’t tell me how to do my job. Just give me those images, and I’ll see what I can do. So the autoencoder takes an image (or any other input data), finds some internal structure, and uses this internal structure to reconstruct the original input.
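To make the idea concrete, here is a minimal sketch of an autoencoder: a linear encoder and decoder trained by gradient descent to reconstruct their own input. All the shapes and data here are made up for illustration — the real Facelyzr model is convolutional and built in TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical low-rank data standing in for images: 200 samples with
# 20 observed features driven by 5 underlying factors, so a 5-unit
# bottleneck can in principle reconstruct the input well.
Z = rng.normal(size=(200, 5))
B = rng.normal(size=(5, 20))
X = Z @ B

# One linear encoder and one linear decoder (just the idea, not the
# actual convolutional architecture).
W_enc = rng.normal(scale=0.1, size=(20, 5))
W_dec = rng.normal(scale=0.1, size=(5, 20))

def mse(X, W_enc, W_dec):
    """Mean squared reconstruction error."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

mse_before = mse(X, W_enc, W_dec)
lr = 0.002
for _ in range(2000):
    code = X @ W_enc              # encode: compress to 5 dimensions
    err = code @ W_dec - X        # how far the reconstruction misses
    # Gradient descent on the mean squared reconstruction error.
    W_dec -= lr * code.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)
mse_after = mse(X, W_enc, W_dec)
```

Notice that no labels appear anywhere: the input itself is the training target, which is exactly why this counts as unsupervised learning.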
The building block of a stacked denoising autoencoder (SDAE) is the autoencoder. Each layer has its own autoencoder, and during training, we train layer after layer without backpropagating through the entire network. Basically, every layer learns to reconstruct its own input. To make the network more robust, we add noise to each layer's input so that the network learns to reconstruct the uncorrupted input. The first convolution layer can learn to detect edges, colors, etc. The following convolution layers use those features to build more complex things such as patterns and textures. Deeper layers can learn to detect noses, eyes, ears, etc., depending on what data the network is trained on. One layer of the stacked denoising autoencoder is presented in the image below.
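The greedy layer-by-layer scheme can be sketched like this: corrupt each layer's input with noise, train that layer's autoencoder to reconstruct the clean input, then feed its features to the next layer. The linear layers, sizes, and noise level below are placeholders — Facelyzr's layers are convolutional.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_denoising_layer(X, n_hidden, noise_std=0.3, lr=0.01, steps=500):
    """Train one denoising autoencoder layer and return its encoder weights.
    The input is corrupted with Gaussian noise, but the reconstruction
    target is the CLEAN input."""
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))
    for _ in range(steps):
        X_noisy = X + noise_std * rng.normal(size=X.shape)  # corrupt input
        code = X_noisy @ W_enc
        err = code @ W_dec - X        # compare against the clean input
        W_dec -= lr * code.T @ err / len(X)
        W_enc -= lr * X_noisy.T @ (err @ W_dec.T) / len(X)
    return W_enc

# Greedy layer-wise training: each layer's autoencoder is trained on the
# output of the previous layer, with no gradients flowing between layers.
X = rng.normal(size=(200, 32))        # stand-in for flattened face images
W1 = train_denoising_layer(X, 16)
H1 = X @ W1                           # layer 1 features feed layer 2
W2 = train_denoising_layer(H1, 8)
H2 = H1 @ W2                          # features from the stacked encoder
```

The key design choice is that each layer is trained in isolation: only after a layer is done do its (fixed) features become the next layer's training data.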
After training the autoencoder, we can throw away the reconstruction part because we don’t need it for making predictions. We only care about the first part of the network: the encoder. The encoder acts as a feature extractor, producing features for every input image. Using those features, we can train any classifier or regressor to estimate, for example, age, gender, or both.
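For instance, a simple logistic-regression head can be trained on the frozen encoder features. Everything below is synthetic — the feature matrix stands in for the real encoder's output, and the labels are generated so the example is self-contained.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical encoder output: 400 face images, each reduced to a
# 10-dimensional feature vector (a stand-in for the convolutional
# encoder's features).
features = rng.normal(size=(400, 10))
true_w = rng.normal(size=10)
labels = (features @ true_w > 0).astype(float)  # pretend 1 = male, 0 = female

# Logistic-regression classifier trained on the frozen features;
# the encoder itself is never touched.
w = np.zeros(10)
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(features @ w)))            # predicted P(label=1)
    w -= lr * features.T @ (p - labels) / len(features)  # cross-entropy gradient

accuracy = ((features @ w > 0).astype(float) == labels).mean()
```

Swapping the head for a regressor trained on ages instead of binary labels gives the age estimator in the same way.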
Our pre-trained model worked very well on gender prediction: we achieved 98% accuracy on the test set. We also trained the model to estimate age; that model achieved a root mean square error (RMSE) of 8 on the test set, so it still needs some tuning. Given these results for gender recognition, our robot killer would accidentally kill women in some cases. Sorry, ladies...
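To make the RMSE metric concrete, here is how it is computed on a handful of made-up age predictions (these are not the project's actual outputs). An RMSE of 8 roughly means a typical age estimate is off by about 8 years.

```python
import numpy as np

# Hypothetical ground-truth ages and model predictions, for illustration only.
true_age = np.array([25, 40, 33, 61, 19])
pred_age = np.array([30, 35, 40, 55, 22])

# Root mean square error: square the miss on each example, average, take the root.
rmse = np.sqrt(np.mean((pred_age - true_age) ** 2))  # ≈ 5.37 here
```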
In this blog, I presented the idea of a project named Facelyzr, still under development, that uses both unsupervised and supervised deep learning to predict facial attributes on an image. The model assumes there is a face centered in the image. I plan to extend it with face detection and face localization without using fancy libraries or pre-trained state-of-the-art models. Eventually, I plan to open-source all my work on this project, so stay tuned. Many thanks to Andrew Ng and Andrej Karpathy. They are really great instructors, and because of them, I stepped into the awesome world of neural networks.