Home-Made Java Face Recognition Application
In this post, we are going to develop a Java face recognition application using deeplearning4j.
In this post, we are going to develop a Java face recognition application using deeplearning4j. The application offers a GUI and the flexibility to register new faces, so feel free to try it with your own images. Additionally, you can check out the free open-source code as part of the Packt video course Java Machine Learning for Computer Vision, together with many new improvements to the applications from previous posts in Java.
Face Recognition Applications
Face recognition has always been an important problem to solve because of its sensitivity with regard to security and its close relation to people's identities. For many years, face recognition applications were well known especially in criminology, where cameras and sometimes even satellites were used to search for wanted persons. Nowadays, in the Deep Learning era, face recognition is found everywhere, from simple applications like unlocking your phone to systems offering state-of-the-art accuracy.
Let's first visit the challenges related to face recognition and then see how they are solved using Deep Learning techniques.
Face Recognition Challenges
Face Verification Problem
In previous posts, we have already seen Image Classification and Object Detection, where we were mostly concerned with finding out whether an image represents a certain class: is it a dog? Is it a car? We also saw how to mark the classified object with a bounding box.
Now we are going one step further by uniquely identifying the objects.
So we are not just checking whether the image is a car or not, but additionally finding out whether it is specifically my car, your car, or someone else's car (for animal classification, we would need to find out whether this is John's dog or Maria's dog rather than just a dog).
Face verification is no different; the logic is simply extended to the human face.
The question is not whether it is a human or not, but rather whether it is the person with identification X, or the company employee with some identification number.
Face Recognition Problem
Then we have the face recognition problem, where we need to do face verification for a group of people instead of just one: is a new person any of the persons in a certain group?
Although face recognition and verification can be thought of as the same problem, the reason we treat them differently is that face recognition can be much harder.
For instance, let's suppose we achieved a face verification accuracy of 98% for verifying that a person is who they claim to be, which maybe is not that bad. If we apply that model, with its 2% error rate, to face recognition with, let's say, 16 people, it obviously is not going to work well: the 2% error compounds over 16 persons, giving roughly 16 × 2% = 32% chance of at least one misidentification.
So for face recognition to work well and have reasonable accuracy, taking into account also the sensitive nature of the problem, we will need something like 99.99% accuracy.
One-Shot Learning Problem
Usually, with face recognition, we have only one photo of each person to recognize, so the next challenge is related to what is known as the "one-shot learning problem."
Let's say that we want to recognize employees as they come in. Usually, we will really have only one photo of each employee, or maybe a few of them in the best-case scenario.
With the knowledge and applications we have seen so far, we can, of course, feed all these photos to a Neural Network to learn and then have the network predict a class for each of the employees. As intuitive as it may sound, that will not really work well, for the reasons below.
- The convolution architectures seen so far had great results, but they were trained with thousands of images per class and millions of images in total. Here we have really little data available.
- Additionally, it will not scale well. For instance, what happens when we have a new employee? We need to modify the Neural Network by adding a new class and then train the network all over again. In a few words, each time we have a new employee, the network needs to be modified and re-trained.
Similarity Function
The high-level solution to the above problems is implemented through a similarity function. Instead of trying to learn to recognize specific persons' faces as classes, what if we learn a function d that measures how similar or different two images are?
d(face_1, face_2) -> degree of difference between the face images
If the function returns a value smaller than a constant γ, we know that the images are quite similar. Otherwise, we know they are different.
Supposing that on the left we have the employees' faces and on the right a person coming in, for each of the comparisons we will get a number that is big when the images are different and small when they are similar. So, in this case, we know that the person is the third employee in our group, since he has the lowest number, below our threshold of, e.g., γ = 0.8.
Additionally, this solution also scales well since a new person joining would mean just a new comparison to execute. We do not need to retrain since the Neural Network has learned a generic function to distinguish faces rather than specific faces.
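To make the idea concrete, below is a minimal Java sketch of how d could be computed once every face has been mapped to a fixed-length embedding array. This is an illustrative sketch only; the method names are hypothetical helpers, not code from the application:

// Minimal sketch: Euclidean (L2) distance between two face embeddings.
static double distance(double[] face1, double[] face2) {
    double sum = 0;
    for (int i = 0; i < face1.length; i++) {
        double diff = face1[i] - face2[i];
        sum += diff * diff;
    }
    return Math.sqrt(sum);
}

// The faces are considered the same person when d is below the threshold γ.
static boolean isSamePerson(double[] face1, double[] face2, double gamma) {
    return distance(face1, face2) < gamma; // e.g., gamma = 0.8 as above
}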
The similarity function is just a high-level description of the solution, so let's see below two ways it is implemented in practice.
Siamese Networks
We will still continue to use convolution architectures with many convolution layers and fully connected layers, with the exception that the last prediction layer (the softmax layer) will not be used, or will be cut off.
We will feed the first image X1 to the network, then grab the last fully connected layer activations F(X1) and save them in memory.
We will repeat the same for the second image X2 that we want to compare, or the newly arriving employee. So now we have the encoded activations for the second image, F(X2), saved in memory.
Notice that the network here stays the same for both images. That's where the Siamese name comes in, since we use the same network (or two cloned networks) for both of the images, and in practice this happens in parallel.
Now, iteration by iteration (through the forward step and back-propagation), the Neural Network will learn the function d shown in the picture.
In a few words, the goal of our learning is to shift the difference toward a small or large number, depending on whether the images are of the same person or of different people.
Only when the encoded values are similar will we predict that the two images show the same person. Recalling the previous section, this is exactly what was referred to as the similarity function: d denotes the distance, in this case the distance between the activations of the last layer of a very deep convolution network.
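As a rough sketch of the Siamese setup with deeplearning4j, the two forward passes and the comparison could look like the snippet below. ComputationGraph.outputSingle and INDArray.distance2 are real deeplearning4j/ND4J APIs, but model (a ComputationGraph like the one built later in this post), image1, image2, and gamma are assumed, pre-existing variables:

// One network, two forward passes: compute F(X1) and F(X2), then compare.
INDArray encoding1 = model.outputSingle(image1); // F(X1), saved in memory
INDArray encoding2 = model.outputSingle(image2); // F(X2)
double d = encoding1.distance2(encoding2);       // L2 distance between activations
boolean samePerson = d < gamma;                  // similar encodings -> same person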
Triplet Loss
Triplet loss is another great way to solve the face recognition problem, and the one we will use for our Java application. The name triplet comes from the fact that we use three images as just one training sample. Similar to before, we will use the activations of the last fully connected layer of some very deep Neural Network.
We are going to first choose the base image, or anchor image, which will be compared with two other images; through a forward step, we get the activation of the last layer, F(A).
Together with it we take a different image representing the same person, called the positive image, and get its last-layer activation F(P). Recalling the previous section, we want our similarity function d(A, P), the difference between the anchor and positive image activations, to be as close to zero as possible in this case, since these images represent the same person after all.
Now, keeping the same anchor image, we are going to choose an image that represents a different person, the negative image, with activation F(N). The function d(A, N), the difference between the anchor and negative image activations, will in this case be bigger than zero, and we want the difference to be big in order to emphasize the fact that these are different face images.
Triplet loss is explained in more detail, through diagrams and a slightly more formal definition, in Java Machine Learning for Computer Vision. Anyway, after some simple math steps, the combined formula for the positive and negative case comparisons with the anchor image looks like below:

d(A, P) + ε ≤ d(A, N)
ε is introduced to prevent the Neural Network from finding weights such that the distances for the negative and positive cases are the same (the difference would then be zero and easily satisfy the condition). In this way, the Neural Network has to work harder to make sure that there is at least a minimum distance ε between the positive and negative cases.
Iteration by iteration, the Neural Network will try to learn the above function in order to satisfy the inequality. It will try to push the positive case difference d(A, P) to lower values and the negative case difference d(A, N) to larger values, so that d(A, P) − d(A, N) is at most −ε (moving ε to the other side of the inequality).
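As a hedged illustration, here is how the triplet condition above could be turned into a loss value in plain Java, reusing the hypothetical distance helper from the earlier sketch (in the real application, the training signal comes from the network's loss layer, not from code like this):

// Triplet loss sketch: the loss is zero only once d(A, P) + ε <= d(A, N).
static double tripletLoss(double[] a, double[] p, double[] n, double epsilon) {
    double positive = distance(a, p); // d(A, P): pushed toward small values
    double negative = distance(a, n); // d(A, N): pushed toward large values
    return Math.max(positive - negative + epsilon, 0.0);
}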
Suggestions for Choosing Triplets
Choosing triplets has a really big impact on how well and how efficiently the network learns, so we need to carefully choose the triplets following the guidelines below:
- When negative examples N are chosen randomly, the condition is easily satisfied and the network learns little.
- Choose triplets that are hard to train on: positive cases (images of the same face) that are as different as possible, so that d(A, P) is a big value, and negative cases (images of different faces) that are as similar as possible to the person's face (the anchor), so that d(A, N) is as low as possible (see the sketch after this list).
- During the training, you still need a few positive pictures per person, or triplets.
- After training, you can apply it to the one-shot learning problem of having only one picture per person.
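For the "hard" triplets mentioned above, one simple (hypothetical) strategy is hard-negative mining: for each anchor, pick the negative whose embedding is closest to the anchor, so that d(A, N) starts out as low as possible. A sketch, reusing the distance helper from earlier (List is java.util.List):

// Hard-negative mining sketch: the closest wrong face is the hardest negative.
static double[] pickHardNegative(double[] anchor, List<double[]> negatives) {
    double[] hardest = null;
    double best = Double.MAX_VALUE;
    for (double[] candidate : negatives) {
        double d = distance(anchor, candidate); // d(A, N) for this candidate
        if (d < best) {
            best = d;
            hardest = candidate;
        }
    }
    return hardest;
}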
Java Application
The code can be freely found on GitHub as part of the video course. Although it offers all the flexibility to develop or borrow existing models, deeplearning4j face recognition has some known issues and does not yet offer pre-trained weights through transfer learning.
The Code
So, in order to build the Java application, we will need to use the weights from the existing Keras OpenFace model found in its GitHub repository.
- As the first step, we need to build the Neural Network architecture, which is based on Inception networks (first introduced by GoogLeNet; detailed information can be found here). The full implementation code is not shown here, as it is simple but long:
buildBlock3a(graph);
buildBlock3b(graph);
buildBlock3c(graph);
buildBlock4a(graph);
buildBlock4e(graph);
buildBlock5a(graph);
buildBlock5b(graph);
graph.addLayer("avgpool",
        new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.AVG, new int[]{3, 3},
                new int[]{1, 1})
                .convolutionMode(ConvolutionMode.Truncate)
                .build(),
        "inception_5b")
        .addLayer("dense", new DenseLayer.Builder().nIn(736).nOut(encodings)
                .activation(Activation.IDENTITY).build(), "avgpool")
        .addVertex("encodings", new L2NormalizeVertex(new int[]{}, 1e-12), "dense")
        .setInputTypes(InputType.convolutional(96, 96, inputShape[0])).pretrain(true);
/* Uncomment in case of training the network; graph.setOutputs should then be "lossLayer":
.addLayer("lossLayer", new CenterLossOutputLayer.Builder()
        .lossFunction(LossFunctions.LossFunction.SQUARED_LOSS)
        .activation(Activation.SOFTMAX).nIn(128).nOut(numClasses).lambda(1e-4).alpha(0.9)
        .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer).build(),
        "embeddings")*/
graph.setOutputs("encodings");
Training face recognition Neural Networks is especially computationally expensive and at the same time requires quite a lot of effort due to the careful triplet selection. Thanks to transfer learning, we can use already-trained Neural Network weights, even from other languages and frameworks. In this way, we can use all the face detection knowledge those Neural Networks gained during training.
The weights are read from CSV files found originally in the Keras OpenFace repository and then copied into the Java code. The full code can be found in this class on the GitHub repository (loadWeights). Some effort is needed in order to adapt the weights from Keras to the deeplearning4j internal organization of convolution and dense layers. Notice how for dense layers we first need the weights (w) and then the bias (b), while for convolution layers it is the other way around.
static void loadWeights(ComputationGraph computationGraph) throws IOException {
    Layer[] layers = computationGraph.getLayers();
    for (Layer layer : layers) {
        List<double[]> all = new ArrayList<>();
        String layerName = layer.conf().getLayer().getLayerName();
        if (layerName.contains("bn")) {
            // Batch normalization: gamma (w), beta (b), mean (m), and variance (v)
            all.add(readWightsValues(BASE + layerName + "_w.csv"));
            all.add(readWightsValues(BASE + layerName + "_b.csv"));
            all.add(readWightsValues(BASE + layerName + "_m.csv"));
            all.add(readWightsValues(BASE + layerName + "_v.csv"));
            layer.setParams(mergeAll(all));
        } else if (layerName.contains("conv")) {
            // Convolution layers expect the bias first and then the weights
            all.add(readWightsValues(BASE + layerName + "_b.csv"));
            all.add(readWightsValues(BASE + layerName + "_w.csv"));
            layer.setParams(mergeAll(all));
        } else if (layerName.contains("dense")) {
            // Dense layers are the other way around: weights first, then bias
            double[] w = readWightsValues(BASE + layerName + "_w.csv");
            all.add(w);
            double[] b = readWightsValues(BASE + layerName + "_b.csv");
            all.add(b);
            layer.setParams(mergeAll(all));
        }
    }
}
Basically, these are the main parts of the application, apart from the Java Swing GUI and other low-level utilities, which can be freely explored in the code.
Running Application and Showcase
It is possible to run the application from source by simply executing the RunFaceRecognition class. After running the application, a Java GUI like the one below will be shown:
It is possible to register your own images (Register New Member button), which will then be shown as a member below, and then to test whether other pictures of the new member (Choose Face Image) match or not (Who Is? button).
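Under the hood, matching comes down to the same distance comparison we saw earlier. Below is a hedged sketch of the idea, inside a method that declares throws IOException: NativeImageLoader (from DataVec) and ComputationGraph.outputSingle are real APIs, but model, members (a map from member name to stored encoding), gamma, and the file name are assumed for illustration; the actual application code may differ, e.g. in its preprocessing details:

NativeImageLoader loader = new NativeImageLoader(96, 96, 3); // 96x96 RGB input
INDArray face = loader.asMatrix(new File("newFace.jpg"));    // hypothetical file
INDArray encoding = model.outputSingle(face);                // compute the embedding

String match = null;
double best = gamma; // accept only distances below the threshold γ
for (Map.Entry<String, INDArray> member : members.entrySet()) {
    double d = encoding.distance2(member.getValue());
    if (d < best) {
        best = d;
        match = member.getKey();
    }
}
System.out.println(match != null ? "Recognized: " + match : "Unknown face");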
Limitations
In the future, further consolidation may be needed in the way we load the weights, and the model may still need some tuning, so please stay tuned as the code will continually be improved toward state-of-the-art accuracy.
Please notice that the OpenFace model is quite small compared to real systems, so the accuracy may not be the best, but it is quite promising and clearly shows the power of the concepts explored in this post.
Enjoy!