
Detect Facial Emotions in a Meeting


Maybe you can't read minds, but with this AI solution, reading faces might be the next best thing.



After a conference is over, it is time to take stock. It is important to know whether the conference was actually a success and whether the established goals and objectives were reached, and audience opinion is a key part of that assessment.

When evaluating audience opinion, facial emotions can provide relevant information. Analyzing the facial emotions of every person who attends a conference by hand requires one or more observers who watch the audience and take notes on its reactions, either during the conference or, more generally, at the end of it. With this method, it is impossible to know for certain how the whole audience reacted at specific moments of the conference. Additionally, if the conference is given online, there is no practical way to observe the audience's emotions at all.

In this article, we present a system that analyzes the facial expressions of people attending a meeting from the images captured by a camera. The program identifies faces, tracks them even as they move, and generates a report of facial expressions at the end of the meeting.

Proposed Solution

To develop this system, we used OpenCV's "Haarcascade Frontalface" detector. This filter works on the grayscale pixel values of the image to find faces. Its parameters were defined from a database of pictures in which the face area, and more specifically the eyes and forehead region, was selected by hand. The filter compares the amount of dark and light pixels in each region and decides, using a sliding window, whether a face is present or not.
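As a rough illustration of how such a cascade is typically applied with OpenCV (not the exact code of our system; the image path is a placeholder):

# Minimal sketch: detecting frontal faces with OpenCV's Haar cascade.
# The cascade file ships with OpenCV; 'frame.jpg' is a placeholder image.
import cv2

cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread('frame.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # the filter works in grayscale

# Returns one (x, y, w, h) rectangle per detected face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)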


We also used a convolutional neural network to identify facial expressions. This model is trained on the FER-2013 dataset, which contains grayscale images of faces classified as {angry, disgusted, happy, sad, surprised, neutral}.
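For reference, here is a minimal sketch of the kind of CNN that can be trained on 48x48 grayscale FER-2013-style images; the architecture below is illustrative, not the exact model we used:

# Illustrative only: a small CNN for 48x48 grayscale facial expression images.
# This is not the exact architecture used in our system.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

num_classes = 6  # the emotion labels listed above

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])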

Methodology

The following process was developed to create a system that identifies faces in a video, analyzes their emotions, and generates a report of those emotions.

The first part of the proposed solution consists of extracting images from a webcam that records the people attending the conference. A frame is captured every 0.2 seconds.
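A minimal sketch of this sampling step (the camera index, the stop condition, and the use of time.sleep are assumptions for illustration):

# Minimal sketch: grabbing a frame from the webcam every 0.2 seconds.
import time
import cv2

capture = cv2.VideoCapture(0)      # device index 0 is an assumption
frames = []
while len(frames) < 100:           # e.g. stop after 100 sampled frames
    ok, frame = capture.read()
    if not ok:
        break
    frames.append(frame)
    time.sleep(0.2)                # sample roughly every 0.2 seconds
capture.release()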

Once we have an image, we analyze it with the "Haarcascade Frontalface" detector from OpenCV, which outputs the coordinates of the regions of the image where faces were found.

We then take each face region and pass it through a HOG (Histogram of Oriented Gradients) filter to extract a descriptor of that segment of the image, which allows us to track the face from frame to frame, as sketched below.
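As a sketch of this idea (not our exact implementation), a HOG descriptor can be computed for each face crop with scikit-image and matched against the descriptors of the previous frame by nearest distance; the helper functions below are illustrative:

# Sketch only: HOG descriptors for face crops, matched across frames
# by nearest distance. prev_descriptors is hypothetical tracking state.
import cv2
import numpy as np
from skimage.feature import hog

def face_descriptor(gray_frame, box):
    x, y, w, h = box
    crop = cv2.resize(gray_frame[y:y + h, x:x + w], (64, 64))
    return hog(crop, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def match_to_previous(descriptor, prev_descriptors):
    # return the index of the most similar previously seen face
    distances = [np.linalg.norm(descriptor - d) for d in prev_descriptors]
    return int(np.argmin(distances))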

Finally, we pass the faces through a convolutional neural network to identify the emotion of each face detected in the image.


Implementation

This system was implemented in Python 3.6.

The first step consists of reading images from the camera and searching for faces; this is done with OpenCV. Then, we apply the HOG filter to the regions where faces were found in order to track them. By doing this, we always get the coordinates of each person's face in the same order.

# loading models (get_labels and load_detection_model come from helper
# utilities; load_model is from Keras)
emotion_labels = get_labels('fer2013')
face_detection = load_detection_model(detection_model_path)
emotion_classifier = load_model(emotion_model_path, compile=False)

# read the current frame, then detect and track the faces in it
image = cv2.imread(filepath)
faces, detected = pipeline.detect_and_track(image)
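The pipeline object above is a helper of our system whose code is not listed in this article. As an illustrative sketch only, a detect-and-track step in this style could be organized as follows, reusing the HOG descriptor helper sketched earlier (the matching threshold is an assumed constant):

# Illustrative sketch: a detect-and-track helper in the style of the pipeline
# object used above. The real implementation is not shown in this article.
import cv2
import numpy as np

MATCH_THRESHOLD = 5.0                          # assumed tuning constant

class FaceTrackingPipeline:
    def __init__(self, face_cascade):
        self.face_cascade = face_cascade
        self.known = []                        # one HOG descriptor per person

    def detect_and_track(self, image):
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        boxes = self.face_cascade.detectMultiScale(gray, 1.1, 5)
        faces = {}                             # person index -> bounding box
        for box in boxes:
            d = face_descriptor(gray, box)     # HOG helper sketched earlier
            idx = -1
            if self.known:
                dists = [np.linalg.norm(d - k) for k in self.known]
                idx = int(np.argmin(dists))
            if idx < 0 or dists[idx] > MATCH_THRESHOLD:
                self.known.append(d)           # a face we have not seen before
                idx = len(self.known) - 1
            else:
                self.known[idx] = d            # refresh the stored descriptor
            faces[idx] = box
        return faces, len(boxes) > 0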

The next step consists of extracting the regions of the image where faces were found. These regions are converted to grayscale and used as the input of the emotion detection model. The predicted label is looked up in "emotion_labels," which gives the name of the detected emotion.

# emotion prediction
# detect_faces returns the bounding boxes of the faces in the grayscale frame
face_boxes = detect_faces(face_detection, gray_image)
# gray_face is one face region cropped from gray_image and resized to the
# classifier's input size
emotion_prediction = emotion_classifier.predict(gray_face)
emotion_probability = np.max(emotion_prediction)
emotion_label_arg = np.argmax(emotion_prediction)
emotion_text = emotion_labels[emotion_label_arg]
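Putting these pieces together, a simplified version of the per-face classification loop could look like the following; the crop boundaries, the 48x48 input size, and the normalization are assumptions about details not shown above:

# Simplified sketch of the per-face emotion classification loop.
# The 48x48 input size and the normalization are assumptions for illustration.
import cv2
import numpy as np

def classify_faces(gray_image, boxes, emotion_classifier, emotion_labels):
    results = []
    for (x, y, w, h) in boxes:
        gray_face = gray_image[y:y + h, x:x + w]
        gray_face = cv2.resize(gray_face, (48, 48)).astype('float32') / 255.0
        gray_face = np.expand_dims(np.expand_dims(gray_face, 0), -1)  # (1, 48, 48, 1)
        emotion_prediction = emotion_classifier.predict(gray_face)
        emotion_label_arg = int(np.argmax(emotion_prediction))
        results.append(emotion_labels[emotion_label_arg])
    return results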

Finally, we assign a number to each emotion: happy=5, surprised=4, neutral=3, disgusted=2, and sad=1. These numbers are saved in an array where each row holds the detected emotions of one person over time. The tracking step is what makes this per-person bookkeeping possible.

# assign numbers and save them for each detected face
if emotion_text == 'happy':
    num_emotion = 5
elif emotion_text == 'surprise':
    num_emotion = 4
elif emotion_text == 'neutral':
    num_emotion = 3
elif emotion_text == 'disgust':
    num_emotion = 2
elif emotion_text == 'sad':
    num_emotion = 1
m_frames[ibox] = num_emotion

At the end of the analysis, the system saves the array that contains the detected emotions of each person and aggregates them into global statistics. We use this array to generate the graphs, and each graph is linked to the picture of the corresponding person, so that the web page can display that person's graph when their picture is clicked.
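As an illustration, a per-person graph can be produced from one row of the saved array with matplotlib; the file names are placeholders, and the 0.2-second interval and the 1-5 mapping follow the description above:

# Sketch: plotting one person's emotion scores over time from the saved array.
import numpy as np
import matplotlib.pyplot as plt

emotions_per_person = np.load('detected_emotions.npy')   # shape: (people, frames)
person_index = 0
scores = emotions_per_person[person_index]
timestamps = np.arange(len(scores)) * 0.2                 # one frame every 0.2 s

plt.plot(timestamps, scores)
plt.yticks([1, 2, 3, 4, 5],
           ['sad', 'disgusted', 'neutral', 'surprised', 'happy'])
plt.xlabel('time (s)')
plt.ylabel('detected emotion')
plt.savefig('person_0_emotions.png')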

Results

We developed a user-friendly web page so that users can interact with the emotion detection system in an easy way.

The web page opens the user's camera, receives the images, and processes them in near real time. When the user presses the "Stop" button, the system generates a video from the analyzed images, a graph showing the global reaction of the participants, and a button that takes the user to another page where they can view a graph for each detected person.


Summary and Future Work

In this project, we proposed a solution to the problem of detecting emotions by combining a facial expression recognition model with a face tracker. We are able to generate a report of the emotions of each person during a meeting, even if a person leaves the room and comes back later.

From the results of the system, we identified two ways of improving this solution: one concerning the performance of the model used to detect emotions, and the other concerning the algorithm used to find the faces in the image.

The emotion detection model used in this project has low accuracy, so to make the system work better, it will be necessary to retrain it, or to develop our own model, so that people's facial expressions can be identified from different angles.

The Haar cascade filter that we use to detect faces has the drawback that, if the image does not have enough contrast, it will not find all the faces. This could be solved by adding a pre-processing step that increases the contrast of the image.
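One possible pre-processing step, which we have not implemented yet, is adaptive histogram equalization, for example OpenCV's CLAHE:

# Possible pre-processing step (not yet part of the system): boosting contrast
# with CLAHE (Contrast Limited Adaptive Histogram Equalization) in OpenCV.
import cv2

gray = cv2.cvtColor(cv2.imread('frame.jpg'), cv2.COLOR_BGR2GRAY)  # placeholder image
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)
# 'enhanced' would then be passed to the Haar cascade instead of 'gray'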

