
Tutorial: How to Build a Progressive Web App (PWA) with Face Recognition and Speech Recognition

In this tutorial, I discuss advanced PWA features that give you access to hardware APIs. We are going to build an app with face recognition and speech recognition, which is now possible directly in the browser. The native experience users expect from native apps is being brought to the web, and that opens up a whole new range of possibilities.

By Peter Eijgermans · Updated Dec. 20, 2020 · Tutorial

This is a follow-up to my second tutorial on PWAs, but you can also follow along if you haven't read the second one or my first tutorial about PWAs. We are going to focus on some new Web APIs:

  • Face detection API, for face recognition in the browser. https://justadudewhohacks.github.io/face-api.js/docs/index.html
  • Web speech API for enabling 'Speech-to-text' in this app. https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API

We add these APIs to our existing PWA for taking 'selfies.' With face detection, we predict your emotion, your gender, and your age.

To record an accompanying text ('Speech-to-text') for your 'selfie,' you can easily use the Web Speech API.

Experimental Web Platform features

The above APIs only work if you have enabled 'Experimental Web Platform features' in your Chrome browser via the URL: chrome://flags

Figure 1
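Even with the flag enabled, it is good practice to check in code that these APIs are actually available before using them. A minimal sketch of such a guard (the warning messages are my own, not part of the project code):

JavaScript

// Hypothetical feature check before using the camera and speech recognition.
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
  console.warn('getUserMedia is not available; the camera features will not work.');
}
if (!('webkitSpeechRecognition' in window)) {
  console.warn('webkitSpeechRecognition is not available; Speech-to-text will not work.');
}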

Project Setup

As a starting point for the tutorial, clone the following Github repository:

Shell

git clone https://github.com/petereijgermans11/progressive-web-app



Then in your terminal, move to the following directory:  

Shell

cd pwa-article/pwa-app-native-features-rendezvous-init



and install the dependencies through:

Shell

npm i && npm start



Open your app on: http://localhost:8080

Figure 2

Public URL for Your Mobile

There are many ways to access our localhost:8080 from a remote mobile device. You can use ngrok for this (https://ngrok.com/).

Install ngrok via:

Shell

npm install -g ngrok


And run the following command in your terminal:

Shell

ngrok http 8080


This command generates a public URL for you. Then browse to the generated URL on your mobile in Chrome.

Face Recognition With JavaScript

Face detection and face recognition are among the most widely used applications of artificial intelligence, and their use has grown considerably in recent years.

In this tutorial, we will expand the existing app with face recognition that even works in the browser. We predict your emotion, your gender, and your age based on your 'selfie.' We are using the Face-api.js here.

Face-api.js contains a JavaScript API for face recognition in the browser. This API is implemented on top of the tensorflow.js API.

The output of this app looks as shown below:

Figure 3

The steps to implement the above are as follows:

Step 1: Face-api

Face-api.js contains a JavaScript API for face recognition in the browser. This face-api is already available in the folder: public/src/lib

Step 2: Models

Models are the trained data that we will use to detect features of your 'selfie.' These models are already available in the folder: public/src/models

Step 3: index.html

In the index.html file, we import:

  • the existing facedetection.css for the styling (see listing 1);
  • face-api.min.js, which is the Face detection API for processing the model data and extracting the features (see listing 3);
  • facedetection.js where we will write our logic.

First, import the styling into index.html:

HTML

<link rel="stylesheet" href="src/css/facedetection.css">


Listing 1


Place the code below in the index.html file, directly below the tag: <div id="create-post">

We use the existing video tag to take a "selfie" (see listing 2), and a "result container" to display the predicted emotion, gender, and age.

HTML

<video id="player" autoplay></video>
<div class="container-faceDetection">
</div>
<canvas id="canvas" width="320px" height="240px"></canvas>
<div class="result-container">
   <div id="emotion">Emotion</div>
   <div id="gender">Gender</div>
   <div id="age">Age</div>
</div>


Listing 2


Place the code below at the bottom of the index.html so that we can use the Face-detection API:

HTML

<script src="src/lib/face-api.min.js"></script>
<script src="src/js/facedetection.js"></script>


Listing 3


Step 4: Import Models Into the PWA

Here we create a separate function in feed.js to start the video streaming (listing 4).

Move the following existing code from the initializeMedia() function into a separate function called startVideo() (see the resulting code in listing 4).

This function is responsible for the video streaming:

JavaScript

const startVideo = () => {
   navigator.mediaDevices.getUserMedia({video: {facingMode: 'user'}, audio: false})
       .then(stream => {
           videoPlayer.srcObject = stream;
           videoPlayer.style.display = 'block';
           videoPlayer.setAttribute('autoplay', '');
           videoPlayer.setAttribute('muted', '');
           videoPlayer.setAttribute('playsinline', '');
       })
       .catch(error => {
           console.log(error);
       });
}


Listing 4


In the existing feed.js, we use Promise.all to load the models for the face API asynchronously. Once these models are properly loaded, we call the created startVideo() function (see listing 5).

Place the following code at the bottom of the initializeMedia() function:

JavaScript

Promise.all([
   faceapi.nets.tinyFaceDetector.loadFromUri("/src/models"),
   faceapi.nets.faceLandmark68Net.loadFromUri("/src/models"),
   faceapi.nets.faceRecognitionNet.loadFromUri("/src/models"),
   faceapi.nets.faceExpressionNet.loadFromUri("/src/models"),
   faceapi.nets.ageGenderNet.loadFromUri("/src/models")
]).then(startVideo);


Listing 5

Step 5: Implement the facedetection.js

Below are the functions of the Face detection API, which we use in our app:

faceapi.detectSingleFace

faceapi.detectSingleFace uses the SSD Mobilenet V1 face detector by default. You can configure the detection by passing an options object along with the videoPlayer element. To detect multiple faces, replace detectSingleFace with detectAllFaces, as shown in the sketch below.
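For example, if you want results for every face in the stream instead of just one, the same chain can be used with detectAllFaces. A hedged sketch, not part of the tutorial code:

JavaScript

// Hypothetical variant: detect every face in the video frame instead of a single one.
// This must run inside an async function (for instance, the setInterval callback in Listing 6).
const detections = await faceapi
  .detectAllFaces(videoPlayer, new faceapi.TinyFaceDetectorOptions())
  .withFaceLandmarks()
  .withFaceExpressions()
  .withAgeAndGender();
// `detections` is now an array with one entry per detected face.
detections.forEach(d => console.log(d.age, d.gender, d.expressions));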

withFaceLandmarks

This function is used to detect 68 face landmarks.

withFaceExpressions

This function detects all faces in an image, recognizes face expressions from each face, and returns an array.

withAgeAndGender

This function detects all faces in an image, estimates the age and gender of each face, and returns an array.

Place the code below in the existing file with the name: facedetection.js, under the existing code (see listing 6).

The functions above are called here to perform the face recognition.

First, a playing event handler is added to the videoPlayer, which fires when the video camera is active.

The variable videoPlayer contains the HTML <video> element. Your video tracks are rendered in this element.

Then a canvas element named canvasForFaceDetection is created for the face recognition and placed in the container-faceDetection container.

The setInterval() function runs faceapi.detectSingleFace every 100 milliseconds. The call is awaited using async/await, and finally, the results of the face recognition are displayed in the emotion, gender, and age fields.

JavaScript

videoPlayer.addEventListener("playing", () => {
  const canvasForFaceDetection = faceapi.createCanvasFromMedia(videoPlayer);
  let containerForFaceDetection = document.querySelector(".container-faceDetection");
  containerForFaceDetection.append(canvasForFaceDetection);

  const displaySize = { width: 500, height: 500 };
  faceapi.matchDimensions(canvasForFaceDetection, displaySize);

  setInterval(async () => {
    const detections = await faceapi
      .detectSingleFace(videoPlayer, new faceapi.TinyFaceDetectorOptions())
      .withFaceLandmarks()
      .withFaceExpressions()
      .withAgeAndGender();

    const resizedDetections = faceapi.resizeResults(detections, displaySize);
    canvasForFaceDetection.getContext("2d").clearRect(0, 0, 500, 500);

    faceapi.draw.drawDetections(canvasForFaceDetection, resizedDetections);
    faceapi.draw.drawFaceLandmarks(canvasForFaceDetection, resizedDetections);
    if (resizedDetections && Object.keys(resizedDetections).length > 0) {
      const age = resizedDetections.age;
      const interpolatedAge = interpolateAgePredictions(age);
      const gender = resizedDetections.gender;
      const expressions = resizedDetections.expressions;
      const maxValue = Math.max(...Object.values(expressions));
      const emotion = Object.keys(expressions).filter(
        item => expressions[item] === maxValue
      );
      document.getElementById("age").innerText = `Age - ${interpolatedAge}`;
      document.getElementById("gender").innerText = `Gender - ${gender}`;
      document.getElementById("emotion").innerText = `Emotion - ${emotion[0]}`;
    }
  }, 100);
});


Listing 6
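
Listing 6 calls a helper named interpolateAgePredictions that is not shown above; it belongs to the existing project code. Purely as an illustration, a minimal sketch of such a smoothing helper could look like this (the queue size and rounding are assumptions, not taken from the original source):

JavaScript

// Hypothetical sketch: smooth the per-frame age prediction by averaging
// the last few values, so the displayed age doesn't jump around.
const predictedAges = [];

const interpolateAgePredictions = (age) => {
  predictedAges.push(age);
  if (predictedAges.length > 30) {
    predictedAges.shift();          // keep only the most recent predictions
  }
  const avg = predictedAges.reduce((sum, value) => sum + value, 0) / predictedAges.length;
  return Math.round(avg);           // round for display
};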


Web Speech API

The interface we are going to build for the Web Speech API will look like the one shown below (see figure 4). As you can see, the screen contains an input field to enter text. But it is also possible to record text using 'Speech-to-text.' For this, you have to click on the microphone icon in the input field.

Under this input field, you can also choose the desired language via a select box.

Figure 4

The steps to implement the above are as follows:

Step 1: index.html

In the index.html file, we import:

  • the existing speech.css for the styling (see listing 7);
  • speech.js where we will write our logic for the 'Speech-to-Text' feature (see listing 9).

First, import the styling into index.html:

HTML

<link rel="stylesheet" href="src/css/speech.css">


Listing 7


Place the code below in the index.html, directly below the tag: <form> (see listing 8)

In the <div id ="info"> - section, the info texts are placed that can be displayed as soon as this API is used.

With the startButton onclick-event, you can start and use the API.

Finally, with updateCountry onchange-event you can select the desired language via a select box.

HTML

<div id="info">
   <p id="info_start">Click on the microphone icon and begin speaking.</p>
   <p id="info_speak_now">Speak now.</p>
   <p id="info_no_speech">No speech was detected. You may need to adjust your
       <a href="//support.google.com/chrome/bin/answer.py?hl=en&amp;answer=1407892">
           microphone settings</a>.</p>
   <p id="info_no_microfoon" style="display:none">
       No microphone was found. Ensure that a microphone is installed and that
       <a href="//support.google.com/chrome/bin/answer.py?hl=en&amp;answer=1407892">
           microphone settings</a> are configured correctly.</p>
   <p id="info_allow">Click the "Allow" button above to enable your microphone.</p>
   <p id="info_denied">Permission to use microphone was denied.</p>
   <p id="info_blocked">Permission to use microphone is blocked. To change,
       go to chrome://settings/contentExceptions#media-stream</p>
   <p id="info_upgrade">Web Speech API is not supported by this browser.
       Upgrade to <a href="//www.google.com/chrome">Chrome</a>
       version 25 or later.</p>
</div>

<div class="right">
   <button id="start_button" onclick="startButton(event)">
       <img id="start_img" src="./src/images/mic.gif" alt="Start"></button>
</div>
<div class="input-section mdl-textfield mdl-js-textfield mdl-textfield--floating-label div_speech_to_text">
   <span id="title" contenteditable="true" class="final"></span>
   <span id="interim_span" class="interim"></span>
   <p>
</div>
<div class="center">
   <p>
   <div id="div_language">
       <select id="select_language" onchange="updateCountry()"></select>
       <select id="select_dialect"></select>
   </div>
</div>


Listing 8
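
The select_language box in Listing 8 calls updateCountry() when its value changes; that helper, together with the code that fills the select boxes, lives in the existing speech.js and is not shown in this article. Purely as an illustration, a minimal sketch of such a handler might look like this (the langs array and its layout are assumptions made for the example):

JavaScript

// Hypothetical sketch: fill the dialect select box based on the chosen language.
// The `langs` structure (language name followed by its dialects) is an assumption.
const langs = [
  ['English', ['en-US', 'United States'], ['en-GB', 'United Kingdom']],
  ['Nederlands', ['nl-NL']]
];

const updateCountry = () => {
  const languageIndex = select_language.selectedIndex;
  select_dialect.options.length = 0;                 // clear previous dialects
  const dialects = langs[languageIndex].slice(1);    // everything after the language name
  for (const [code, label] of dialects) {
    select_dialect.options.add(new Option(label || code, code));
  }
  select_dialect.style.visibility = dialects.length > 1 ? 'visible' : 'hidden';
};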


Place the code below at the bottom of index.html, so we can use the Web Speech API (see listing 9):

HTML

<script src="src/js/speech.js"></script>


Listing 9


Step 2: Implement the Web Speech API

This code in the speech.js initializes the Web Speech Recognition API (see listing 10):

JavaScript

if ('webkitSpeechRecognition' in window) {
    start_button.style.display = 'inline-block';
    recognition = new webkitSpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = true;

    recognition.onstart = () => {
       recognizing = true;
       showInfo('info_speak_now');
       start_img.src = './src/images/mic-animate.gif';
    };

    recognition.onresult = (event) => {
       let interim_transcript = '';
       for (let i = event.resultIndex; i < event.results.length; ++i) {
          if (event.results[i].isFinal) {
             final_transcript += event.results[i][0].transcript;
          } else {
             interim_transcript += event.results[i][0].transcript;
          }
       }
       final_transcript = capitalize(final_transcript);
       title.innerHTML = linebreak(final_transcript);
       interim_span.innerHTML = linebreak(interim_transcript);
    };

    recognition.onerror = (event) => {
      ....
    };

    recognition.onend = () => {
      ...
    };
}


Listing 10

First, we check whether 'webkitSpeechRecognition' is available in the window object. The window object represents the browser window (global JavaScript objects are properties of window).

If 'webkitSpeechRecognition' is available in the window object, a 'webkitSpeechRecognition' object is created via: recognition = new webkitSpeechRecognition();

We then set the following properties of the Speech API:

recognition.continuous = true

This property determines whether results are continuously returned for each recognition.

recognition.interimResults = true

This property determines whether the Speech recognition returns intermediate results.

Event Handler

recognition.onstart(event)

This event handler is executed when the SpeechRecognition API is started (see Listing 10).

The info text 'Speak now.' is displayed, and an animated, pulsating microphone is shown: mic-animate.gif.


recognition.onresult(event)

This event handler is executed when the SpeechRecognition API returns a result.

The SpeechRecognitionEvent's results property returns a SpeechRecognitionResultList, which is indexed like a two-dimensional list. In the loop, the isFinal property determines whether a result from the list is 'final' or 'interim,' and the transcript property converts each result to a string.


recognition.onend(event)

This event handler is executed when Speech Recognition is terminated.

No info text is displayed, only the default icon of a microphone.

 

recognition.onerror(event)

This event handler handles the errors. A matching info text is also shown with the error message.
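
The bodies of onerror and onend are elided in Listing 10 and live in the full project code. Purely as an illustration, handlers in this style might look roughly like this (the specific error codes handled and the use of the ignore_onend flag are assumptions based on the surrounding code, not the project's actual implementation):

JavaScript

// Hypothetical sketch of the elided handlers, not the project's actual code.
recognition.onerror = (event) => {
  if (event.error === 'no-speech') {
    showInfo('info_no_speech');          // nothing was said
    ignore_onend = true;
  } else if (event.error === 'audio-capture') {
    showInfo('info_no_microfoon');       // no microphone found
    ignore_onend = true;
  } else if (event.error === 'not-allowed') {
    showInfo('info_denied');             // permission denied
    ignore_onend = true;
  }
  start_img.src = './src/images/mic.gif';
};

recognition.onend = () => {
  recognizing = false;
  start_img.src = './src/images/mic.gif';
  if (ignore_onend) {
    return;                              // an error handler already showed a message
  }
  showInfo('');                          // hide the info texts again
};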


Starting the Speech Recognition

Add the following code at the top of the already existing speech.js to start the Web Speech API, using the startButton event (see listing 11):

JavaScript

const startButton = (event) => {
   if (recognizing) {
       recognition.stop();
       return;
   }
   final_transcript = '';
   recognition.lang = select_dialect.value;
   recognition.start();
   ignore_onend = false;
   title.innerHTML = '';
   interim_span.innerHTML = '';
   start_img.src = './src/images/mic-slash.gif';
   showInfo('info_allow');
   start_timestamp = event.timeStamp;
};


Listing 11


This code starts the Web Speech Recognition API using recognition.start(). The start() call triggers a start event, which is handled by the recognition.onstart() event handler (see Listing 10).

Furthermore, the selected language is set with recognition.lang, and the microphone image is updated (the pulsating microphone appears once onstart fires).
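
Listings 10 and 11 both call showInfo(), a small helper in the existing speech.js that toggles which info paragraph from Listing 8 is visible. It is not shown in this article; a minimal sketch of how such a helper could work, written here purely as an assumption:

JavaScript

// Hypothetical sketch: show only the requested info paragraph inside <div id="info">.
const showInfo = (id) => {
  const info = document.getElementById('info');
  if (id) {
    for (const child of info.children) {
      child.style.display = child.id === id ? 'inline' : 'none';
    }
    info.style.visibility = 'visible';
  } else {
    info.style.visibility = 'hidden';   // no id given: hide the info block
  }
};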

Conclusion

The web is getting more sophisticated day by day. More native features are being brought on board, because the number of web users is far greater than the number of native app users. The native experience users know from native apps is being brought to the web to keep them there, without the need to fall back to native apps.

After this introduction, you can continue with the extensive tutorial at: https://github.com/petereijgermans11/progressive-web-app/tree/master/pwa-workshop.

See also my previous article, in which I discuss some advanced PWA features that provide access to hardware APIs, such as the Media Capture API, the Geolocation API, and the Background Sync API.

And take a look at my first article, in which I discuss the basics of building PWAs and then provide a comprehensive tutorial on creating your first PWA.

Follow or like me on Twitter: https://twitter.com/EijgermansPeter.


