[Part-4] Text to Action: Wake Word Detection Speech to Calendar Event

Build a hands-free voice assistant with wake word detection that converts "Hey Calendar" commands into Google Calendar events using Web Speech API and AI.

By Vivek Vellaiyappan Surulimuthu · Jul. 24, 25 · Tutorial

Welcome to the fourth installment of our “Text to Action” series, where we’re building intelligent systems that transform natural language into real-world actions using AI.

In [Part-1] Text to Action: Build a Smart Calendar AI Assistant, we established our foundation by creating an Express.js backend that connects to Google Calendar’s API. This gave us the ability to programmatically create calendar events through an exposed API endpoint.

In [Part-2] Text to Action: Words to Calendar Events, we added natural language processing (NLP) capabilities, enabling users to type descriptions like “Schedule a team meeting tomorrow at 3pm” and have our system intelligently transform these words into calendar events.

In [Part-3] Text to Action: Adding Voice Control to Your Smart Calendar, we implemented voice commands with a press-and-hold interface, creating a hands-free way to schedule events by speaking directly to the system.

Today, we’re implementing wake word detection — our first attempt at truly hands-free calendar management. Simply say “Hey Calendar, schedule a meeting tomorrow at 3pm” without pressing any buttons.

What We’re Building



We’re adding wake word detection to our existing application that will:

  • Continuously listen for the wake phrase “Hey Calendar” using the Web Speech API
  • Process spoken commands automatically when the wake word is detected
  • Provide voice feedback and visual status indicators
  • Reset automatically after each command for repeated use

Important Note: This implementation uses the Web Speech API, which provides immediate functionality but has limitations in accuracy and consistency.

Consider this Version 1 of wake word detection — in Part 5, we’ll implement a much more reliable solution using custom machine learning models.

Demo flow: “Hey Calendar” Detection → Command Extraction → NLP Processing → Calendar Event Creation → Voice Confirmation

This creates a complete hands-free calendar assistant that responds naturally to voice commands.

The Wake Word Flow

Here’s what happens when you say “Hey Calendar, schedule a team meeting tomorrow at 2pm”:

  1. Always Listening: System continuously monitors for “Hey Calendar”
  2. Wake Word Detected: Full transcript is captured and wake word identified
  3. Command Extraction: Everything after “Hey Calendar” becomes the command
  4. Voice Confirmation: “Yes, processing your command” provides immediate feedback
  5. NLP Processing: Command sent to existing /api/text-to-event endpoint (Part 2)
  6. Calendar Creation: Event created using Google Calendar API (Part 1)
  7. Success Feedback: Visual and voice confirmation of created event
  8. Auto Reset: System returns to listening for next wake word
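The eight steps reduce to a three-state cycle: off, listening, and processing. Here is a minimal sketch of that cycle as a pure transition function. The state and event names are hypothetical and do not appear in the article's code:

```javascript
// Hypothetical states and events for the wake word cycle (illustrative names only)
const TRANSITIONS = {
  off: { enable: 'listening' },
  listening: { wakeWord: 'processing', disable: 'off' },
  processing: { done: 'listening', disable: 'off' },
};

// Return the next state, or stay put if the event doesn't apply in this state
function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  return next || state;
}

// Walk one full command: enable, hear "Hey Calendar", finish, return to listening
let state = 'off';
for (const event of ['enable', 'wakeWord', 'done']) {
  state = nextState(state, event);
  console.log(`${event} -> ${state}`);
}
```

An event that doesn't apply in the current state leaves the state unchanged. This mirrors how onresult returns early while isProcessingCommand is true.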

The beauty is that this requires zero backend changes — we reuse all the infrastructure from Parts 1–3.

Core Implementation

Setting Up Continuous Listening

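// This setup code is assumed to run inside a function (for example, a
// DOMContentLoaded handler), which is what makes the early `return` legal;
// the handlers in the following sections share that scope.

// Check browser compatibility
if (!('webkitSpeechRecognition' in window) && !('SpeechRecognition' in window)) {
  alert('Your browser does not support the Speech Recognition API. Please use Chrome, Edge, or Safari.');
  return;
}

// Initialize speech recognition (prefer the unprefixed constructor)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

// Configure for continuous listening
recognition.continuous = true;      // keep the session open across pauses
recognition.interimResults = true;  // deliver partial results while speaking
recognition.lang = 'en-US';

// Simple state management
let isWakeWordEnabled = false;
let isProcessingCommand = false;
let lastProcessedCommand = '';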


Wake Word Detection Logic

The heart of our system is the speech recognition processing that listens for “Hey Calendar”:

recognition.onresult = (event) => {
  if (isProcessingCommand) return; // Prevent processing while busy
  
  let finalTranscript = '';
  
  // Extract final transcript
  for (let i = event.resultIndex; i < event.results.length; i++) {
    if (event.results[i].isFinal) {
      finalTranscript += event.results[i][0].transcript;
    }
  }
  
  // Process final results only
  if (finalTranscript) {
    processTranscript(finalTranscript.toLowerCase().trim());
  }
};

function processTranscript(transcript) {
  // Prevent duplicate command processing
  if (transcript === lastProcessedCommand) {
    return;
  }
  
  // Simple but effective wake word detection
  if (transcript.includes('hey calendar')) {
    lastProcessedCommand = transcript;
    handleWakeWordCommand(transcript);
  }
}
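Comparing against lastProcessedCommand ignores an identical transcript until a different wake word command is processed. This means repeating the exact same request back to back does nothing. If that matters, one alternative is to suppress repeats only within a short time window. This is a sketch, not the article's code. The name createDuplicateFilter and the 3-second default are assumptions, and the clock is injectable for testing:

```javascript
// Hypothetical alternative to the lastProcessedCommand check:
// ignore a transcript only if the same text was accepted within windowMs.
function createDuplicateFilter(windowMs = 3000, now = () => Date.now()) {
  let lastText = '';
  let lastTime = -Infinity;

  // Returns true if the transcript should be processed
  return function shouldProcess(text) {
    const t = now();
    if (text === lastText && t - lastTime < windowMs) {
      return false; // same text, too soon: treat as a duplicate
    }
    lastText = text;
    lastTime = t;
    return true;
  };
}
```

Inside processTranscript, you could call shouldProcess(transcript) in the wake word branch, just before handleWakeWordCommand. That call would replace the lastProcessedCommand comparison.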


Command Processing

When the wake word is detected, we extract the command and process it:

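function handleWakeWordCommand(transcript) {
  isProcessingCommand = true;
  
  // Visual feedback (statusEl is the page's status element)
  statusEl.textContent = 'Wake word detected! Processing command...';
  
  // Extract only the command that follows the wake word
  // (replace() would also keep any words spoken *before* "hey calendar")
  const wakeWord = 'hey calendar';
  const command = transcript
    .slice(transcript.indexOf(wakeWord) + wakeWord.length)
    .replace(/^[\s,.!?]+/, '') // drop punctuation right after the wake word
    .trim();
  
  if (command) {
    speak('Yes, processing your command.');
    processVoiceCommand(command); // Reuses function from Part 3
  } else {
    // Don't say the wake phrase aloud: once listening resumes,
    // the microphone could pick it up and trigger another command
    speak('I heard the wake word, but no command. Please try again.');
    resetToListening();
  }
}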
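A plain includes('hey calendar') check misses transcripts where punctuation lands between the two words, such as "hey, calendar". A somewhat more tolerant matcher is sketched below with a hypothetical extractWakeCommand helper. It accepts any non-letter characters between and after the wake words and returns only the text that follows them:

```javascript
// Hypothetical helper: find "hey calendar" even with punctuation or extra
// spaces between the words, and return only what follows it.
// Returns null when the wake phrase is absent.
function extractWakeCommand(transcript) {
  // \b before "hey" keeps "they calendar" from matching;
  // [^a-z]+ absorbs separators such as ", " or ". " (case-insensitive)
  const match = /\bhey[^a-z]+calendar\b[^a-z]*/i.exec(transcript);
  if (!match) return null;
  return transcript.slice(match.index + match[0].length).trim();
}
```

A null result means no wake word was found. An empty string means the wake word arrived without a command, which corresponds to the "no command" branch above.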


Auto-Restart Mechanism

A critical feature is automatically restarting speech recognition when it stops:

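// Start recognition only if wake word mode is on and we're not busy.
// start() throws (for example, an InvalidStateError) if recognition is
// already running, so report failure instead of letting the error escape.
function tryStartRecognition() {
  if (!isWakeWordEnabled || isProcessingCommand) return true;
  try {
    recognition.start();
    return true;
  } catch (error) {
    return false;
  }
}

recognition.onend = () => {
  // Auto-restart if still enabled and not processing
  if (isWakeWordEnabled && !isProcessingCommand) {
    setTimeout(() => {
      if (!tryStartRecognition()) {
        // Retry once after a longer delay if the restart failed
        setTimeout(tryStartRecognition, 1000);
      }
    }, 100);
  }
};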
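Both handleWakeWordCommand and processVoiceCommand call resetToListening(), but the article doesn't show it. The sketch below is one plausible version. It assumes the function only needs to clear the busy flag, update the status text, and resume recognition. The names createResetToListening, state, and setStatus are hypothetical, and dependencies are passed in so the logic can run outside a browser:

```javascript
// Hypothetical sketch of resetToListening: clear the busy flag and resume
// listening. Dependencies are injected so the logic can be tested.
function createResetToListening({ state, recognition, setStatus }) {
  return function resetToListening() {
    state.isProcessingCommand = false;

    if (!state.isWakeWordEnabled) {
      setStatus('Wake Word: OFF');
      return;
    }

    setStatus('Wake Word: ON - listening for "Hey Calendar"');
    try {
      // In continuous mode recognition may still be running, in which
      // case start() throws; the error is ignored here
      recognition.start();
    } catch (error) {
      // Most likely already listening
    }
  };
}
```

The article keeps its flags in top-level let variables. With that setup, you can write the same body directly against isWakeWordEnabled and isProcessingCommand instead of a state object.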


Connecting to Existing Infrastructure

Wake word detection plugs into the existing system without any new server code; the extracted command goes to the same NLP endpoint built in Part 2:

async function processVoiceCommand(text) {
  try {
    const timezone = Intl.DateTimeFormat().resolvedOptions().timeZone;
    
    // Send to existing NLP endpoint from Part 2
    const response = await fetch('/api/text-to-event', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Timezone': timezone
      },
      body: JSON.stringify({ text })
    });
    
    const data = await response.json();
    
    if (data.success) {
      const eventData = data.eventData;
      speak(`Event created: ${eventData.summary}.`);
      // Display success message
    } else {
      speak('Sorry, I couldn\'t create that event.');
    }
    
  } catch (error) {
    speak('Sorry, I couldn\'t create that event.');
  }
  
  // Auto-reset to listening mode
  setTimeout(resetToListening, 2000);
}
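The browser side of this contract is small: POST { text } with an X-Timezone header, then read { success, eventData } from the response. Below is a testable sketch of the same call that returns the feedback sentence instead of speaking it. The name createEventFromText is hypothetical, and fetchImpl is injected (fetch.bind(window) in the page). The response.ok check is an addition not in the original:

```javascript
// Hypothetical sketch: call the Part 2 endpoint and return the sentence to speak.
// fetchImpl is injected so the function can be exercised without a browser.
async function createEventFromText(text, fetchImpl, timezone) {
  const failure = "Sorry, I couldn't create that event.";
  try {
    const response = await fetchImpl('/api/text-to-event', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Timezone': timezone,
      },
      body: JSON.stringify({ text }),
    });

    // Treat non-2xx responses as a failure
    if (!response.ok) return failure;

    const data = await response.json();
    return data.success ? `Event created: ${data.eventData.summary}.` : failure;
  } catch (error) {
    // Network failure, invalid JSON, or a malformed payload
    return failure;
  }
}
```

In the page: speak(await createEventFromText(text, fetch.bind(window), Intl.DateTimeFormat().resolvedOptions().timeZone));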


Notice how we’re simply sending the recognized speech text to our existing /api/text-to-event endpoint from Part 2. This demonstrates the power of good architectural design — we can add new interface modes without recreating our core functionality.

Architecture Integration

This implementation requires zero backend changes:

  • Reuses /api/text-to-event endpoint from Part 2
  • Leverages Google Calendar integration from Part 1
  • Uses the same voice synthesis capabilities as Part 3
  • Works alongside existing interfaces without conflicts

The modular design allows users to choose their preferred interaction method while maintaining a consistent underlying system.

Visual and Voice Feedback

The interface provides comprehensive feedback for all states:

Visual Status Indicators:

  • Wake Word: ON — Green dot with breathing animation, actively listening
  • Processing — Yellow dot with spinning animation, creating calendar event
  • Wake Word: OFF — Gray dot, system disabled
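The dot's appearance depends only on the isWakeWordEnabled and isProcessingCommand flags, so a lookup table can keep the indicator in sync with the state. In this sketch the CSS class names are hypothetical. They would need matching styles in the page, including the breathing and spinning animations:

```javascript
// Hypothetical mapping from app state to indicator label and CSS classes
const STATUS_INDICATORS = {
  listening: { label: 'Wake Word: ON', classes: ['dot-green', 'breathing'] },
  processing: { label: 'Processing', classes: ['dot-yellow', 'spinning'] },
  off: { label: 'Wake Word: OFF', classes: ['dot-gray'] },
};

// Pick the indicator from the two flags used in the article's code;
// processing takes priority over the enabled/disabled state
function indicatorFor(isWakeWordEnabled, isProcessingCommand) {
  if (isProcessingCommand) return STATUS_INDICATORS.processing;
  return isWakeWordEnabled ? STATUS_INDICATORS.listening : STATUS_INDICATORS.off;
}
```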

Voice Feedback:

  • “Wake word detection enabled” when starting
  • “Yes, processing your command” when wake word detected
  • “Event created: [event name]” when successful
  • Helpful error messages when things go wrong
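The speak() function comes from Part 3 and isn't shown here. The sketch below shows one way to build it on the browser's speechSynthesis and SpeechSynthesisUtterance. It takes an optional callback that fires once when speech ends or fails. Waiting for that callback before resuming recognition can keep the microphone from transcribing the assistant's own voice. The names createSpeaker and onDone are hypothetical. The synthesizer and utterance constructor are injected so the logic can be tested outside a browser:

```javascript
// Hypothetical wrapper around the Web Speech synthesis API.
// synth: window.speechSynthesis; Utterance: window.SpeechSynthesisUtterance
function createSpeaker(synth, Utterance) {
  return function speak(text, onDone) {
    synth.cancel(); // drop any queued feedback so messages don't pile up

    const utterance = new Utterance(text);
    utterance.lang = 'en-US';

    if (onDone) {
      // Run the callback once, whether speech finishes or errors out
      let finished = false;
      const finish = () => {
        if (!finished) {
          finished = true;
          onDone();
        }
      };
      utterance.onend = finish;
      utterance.onerror = finish;
    }

    synth.speak(utterance);
  };
}
```

In the page, create the function with const speak = createSpeaker(window.speechSynthesis, window.SpeechSynthesisUtterance). After that, speak(message, resetToListening) could stand in for the fixed setTimeout(resetToListening, 2000).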

Real-time Transcript Display:

  • Shows live “Hearing: [text]” as you speak
  • Confirms “Wake word detected!” when “Hey Calendar” is recognized
  • Displays the full command being processed
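The onresult handler shown earlier keeps only final results, so the live "Hearing: [text]" line has to come from interim ones. The sketch below splits one result event into both. It is written against plain arrays shaped like event.results (each result has isFinal and [0].transcript) so it runs outside the browser. The name splitTranscripts is a hypothetical helper:

```javascript
// Hypothetical helper: split the results delivered in one onresult event
// into interim text (still changing) and final text (settled).
// `results` mirrors event.results; `resultIndex` mirrors event.resultIndex.
function splitTranscripts(results, resultIndex) {
  let interimText = '';
  let finalText = '';
  for (let i = resultIndex; i < results.length; i++) {
    const text = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { interimText, finalText };
}
```

In onresult, start with const { interimText, finalText } = splitTranscripts(event.results, event.resultIndex). Then show interimText in the transcript area and pass finalText to processTranscript as before.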

Testing Your Wake Word Assistant

  1. Start the server: npm start
  2. Make sure Ollama is running with the llama3.2:latest model
  3. Open: http://localhost:3000/part-4-wake-word-detection-using-web-speech-api.html
  4. Click “Enable Hey Calendar”
  5. Say: “Hey Calendar, schedule a team meeting tomorrow at 2pm”
  6. Watch for visual indicators and listen for voice confirmation

Pro Tips:

  • Speak the complete command in one phrase for best results
  • Click example commands to hear them spoken aloud
  • Watch the indicator dot: green (listening), yellow (processing), gray (off)
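  • The system resets after each command, so you can issue several commands in a row (an exact repeat of the previous command is ignored as a duplicate)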

The Complete “Text to Action” Journey

With wake word detection implemented, our Calendar AI Assistant now offers four distinct interaction modes:

  1. Direct API Calls — Create events with structured JSON requests (Part 1)
  2. Natural Language Text — Type commands like “Schedule meeting tomorrow 3pm” (Part 2)
  3. Press-and-Hold Voice — Manual voice activation when you’re ready (Part 3)
  4. Always-On Wake Word — Hands-free operation with “Hey Calendar” (Part 4)

This progression demonstrates how modern web technologies can create increasingly sophisticated user experiences, each building on the foundation of the previous parts.

What Works vs. Limitations

Works Well:

  • Continuous listening with automatic restart
  • Clear visual and voice feedback
  • Seamless integration with existing calendar system
  • Reliable duplicate command prevention

Current Limitations:

  • Web Speech API accuracy varies by environment
  • Occasional false positives with similar-sounding words
  • Requires modern browser with Web Speech API support

These limitations highlight why custom wake word detection using machine learning is valuable, which we’ll explore in future parts.
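Short of a custom model, one inexpensive mitigation for false positives is to skip final results that the recognizer itself scored as uncertain. Each SpeechRecognitionAlternative exposes a confidence value between 0 and 1. The 0.6 threshold below is an arbitrary assumption to tune. How meaningful these scores are varies by browser, so this sketch lets a missing or zero score through rather than blocking it:

```javascript
// Hypothetical filter: accept a final alternative only if the recognizer's
// confidence meets a threshold. Missing or zero scores are treated as unknown.
function isConfidentEnough(alternative, minConfidence = 0.6) {
  const score = alternative.confidence;
  if (typeof score !== 'number' || score === 0) return true; // unknown: don't block
  return score >= minConfidence;
}
```

In onresult, the check would sit next to isFinal: if (event.results[i].isFinal && isConfidentEnough(event.results[i][0])) { ... }.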

Conclusion

We’ve successfully implemented wake word detection, creating a true hands-free calendar assistant that responds to “Hey Calendar” commands. While this Web Speech API approach has some limitations, it provides immediate functionality and demonstrates the core concepts of always-on voice interfaces.

The complete code is available on GitHub.

Next: In Part 5, we’ll implement custom wake word detection using machine learning for higher accuracy, personalized wake phrases, and better environmental noise handling.

Resources

  • Web Speech API Documentation
  • Google Calendar API
  • Ollama Documentation

Let me know in the comments what you’d like to see built next!


Published at DZone with permission of Vivek Vellaiyappan Surulimuthu. See the original article here.
