[Part-4] Text to Action: Wake Word Detection Speech to Calendar Event

Build a hands-free voice assistant with wake word detection that converts "Hey Calendar" commands into Google Calendar events using the Web Speech API and AI.

Jul. 24, 25 · Tutorial


Welcome to the fourth installment of our “Text to Action” series, where we’re building intelligent systems that transform natural language into real-world actions using AI.

In [Part-1] Text to Action: Build a Smart Calendar AI Assistant, we established our foundation by creating an Express.js backend that connects to Google Calendar’s API. This gave us the ability to programmatically create calendar events through an exposed API endpoint.

In [Part-2] Text to Action: Words to Calendar Events, we added natural language processing (NLP) capabilities, enabling users to type descriptions like “Schedule a team meeting tomorrow at 3pm” and have our system intelligently transform these words into calendar events.

In [Part-3] Text to Action: Adding Voice Control to Your Smart Calendar, we implemented voice commands with a press-and-hold interface, letting users schedule events by speaking directly to the system.

Today, we’re implementing wake word detection — our first attempt at truly hands-free calendar management. Simply say “Hey Calendar, schedule a meeting tomorrow at 3pm” without pressing any buttons.

What We’re Building

We’re adding wake word detection to our existing application that will:

Continuously listen for the wake phrase “Hey Calendar” using the Web Speech API
Process spoken commands automatically when the wake word is detected
Provide voice feedback and visual status indicators
Reset automatically after each command for repeated use

Important Note: This implementation uses the Web Speech API, which provides immediate functionality but has limitations in accuracy and consistency.

Consider this Version 1 of wake word detection — in Part 5, we’ll implement a much more reliable solution using custom machine learning models.

Demo flow: “Hey Calendar” Detection → Command Extraction → NLP Processing → Calendar Event Creation → Voice Confirmation

This creates a complete hands-free calendar assistant that responds naturally to voice commands.

The Wake Word Flow

Here’s what happens when you say “Hey Calendar, schedule a team meeting tomorrow at 2pm”:

Always Listening: System continuously monitors for “Hey Calendar”
Wake Word Detected: Full transcript is captured and wake word identified
Command Extraction: Everything after “Hey Calendar” becomes the command
Voice Confirmation: “Yes, processing your command” provides immediate feedback
NLP Processing: Command sent to existing /api/text-to-event endpoint (Part 2)
Calendar Creation: Event created using Google Calendar API (Part 1)
Success Feedback: Visual and voice confirmation of created event
Auto Reset: System returns to listening for next wake word
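The flow above can be sketched as a single async function. This is a hypothetical outline rather than the project's code: handleUtterance and the injected helpers (extractCommand, speak, createEvent, resetToListening) are illustrative names, with createEvent standing in for the POST to /api/text-to-event. Passing the stages in as functions keeps the control flow testable without a browser, microphone, or server.

```javascript
// Hypothetical orchestration of the wake word flow; every stage is injected.
//   deps.extractCommand(transcript) -> command text, '' (wake word but no command) or null (no wake word)
//   deps.speak(text)                -> voice feedback
//   deps.createEvent(command)       -> Promise<{ success, summary }>, e.g. a POST to /api/text-to-event
//   deps.resetToListening()         -> go back to waiting for the next wake word
async function handleUtterance(transcript, deps) {
  // Always Listening / Wake Word Detected / Command Extraction
  const command = deps.extractCommand(transcript);
  if (command === null) return 'ignored';

  if (command === '') {
    deps.speak('I heard Hey Calendar, but no command. Please try again.');
    deps.resetToListening();
    return 'empty';
  }

  // Voice Confirmation
  deps.speak('Yes, processing your command.');

  let outcome = 'failed';
  try {
    // NLP Processing and Calendar Creation happen behind createEvent
    const result = await deps.createEvent(command);
    if (result.success) {
      outcome = 'created';
      deps.speak(`Event created: ${result.summary}.`); // Success Feedback
    } else {
      deps.speak("Sorry, I couldn't create that event.");
    }
  } catch (error) {
    deps.speak("Sorry, I couldn't create that event.");
  }

  // Auto Reset
  deps.resetToListening();
  return outcome;
}
```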

The beauty is that this requires zero backend changes — we reuse all the infrastructure from Parts 1–3.

Core Implementation

Setting Up Continuous Listening

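// Check browser compatibility (Chrome, Edge, and Safari expose the prefixed webkitSpeechRecognition)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

if (!SpeechRecognition) {
  alert('Your browser does not support the Speech Recognition API. Please use Chrome, Edge, or Safari.');
  // A top-level `return` is a syntax error in a script, so stop here by throwing instead
  throw new Error('Speech Recognition API is not supported in this browser');
}

// Initialize speech recognition
const recognition = new SpeechRecognition();

// Configure for continuous listening
recognition.continuous = true;      // keep the session open across pauses in speech
recognition.interimResults = true;  // also deliver partial results while the user is speaking
recognition.lang = 'en-US';

// Simple state management
let isWakeWordEnabled = false;    // true while wake word mode is switched on
let isProcessingCommand = false;  // true while a command is being sent and handled
let lastProcessedCommand = '';    // last transcript handled, used to skip duplicates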
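The compatibility check and configuration above can also be wrapped in a function that takes the global object as a parameter and returns null when the API is missing, leaving the caller to decide how to report it. A sketch; getSpeechRecognitionCtor and createWakeWordRecognizer are hypothetical helper names. In the page it would be called as createWakeWordRecognizer(window), and a fake constructor can be passed in during tests.

```javascript
// Returns the SpeechRecognition constructor exposed by `globalObj`, or null if unsupported.
// The unprefixed name is checked first, then the webkit-prefixed one.
function getSpeechRecognitionCtor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Creates a recognizer configured for wake word listening, or null when unsupported.
function createWakeWordRecognizer(globalObj, lang = 'en-US') {
  const Ctor = getSpeechRecognitionCtor(globalObj);
  if (!Ctor) return null;

  const recognition = new Ctor();
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // deliver partial results for a live transcript
  recognition.lang = lang;
  return recognition;
}
```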

Wake Word Detection Logic

The heart of our system is the speech recognition processing that listens for “Hey Calendar”:

recognition.onresult = (event) => {
  if (isProcessingCommand) return; // Prevent processing while busy
  
  let finalTranscript = '';
  
  // Extract final transcript
  for (let i = event.resultIndex; i < event.results.length; i++) {
    if (event.results[i].isFinal) {
      finalTranscript += event.results[i][0].transcript;
    }
  }
  
  // Process final results only
  if (finalTranscript) {
    processTranscript(finalTranscript.toLowerCase().trim());
  }
};

function processTranscript(transcript) {
  // Prevent duplicate command processing
  if (transcript === lastProcessedCommand) {
    return;
  }
  
  // Simple but effective wake word detection
  if (transcript.includes('hey calendar')) {
    lastProcessedCommand = transcript;
    handleWakeWordCommand(transcript);
  }
}
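The includes('hey calendar') check is deliberately simple, and two gaps are easy to close: recognizers sometimes insert punctuation ("hey, calendar"), and a plain substring test also fires inside other words ("they calendar" contains "hey calendar"). A sketch of a normalized, word-boundary check that could stand in for the includes() test; normalizeTranscript and containsWakeWord are hypothetical names.

```javascript
// Lowercase, turn punctuation into spaces and collapse whitespace,
// so "Hey, Calendar!" and "hey calendar" compare equal.
function normalizeTranscript(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Word-boundary match: unlike includes(), "they calendar" does not trigger.
const WAKE_WORD_PATTERN = /\bhey calendar\b/;

function containsWakeWord(transcript) {
  return WAKE_WORD_PATTERN.test(normalizeTranscript(transcript));
}
```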

Command Processing

When the wake word is detected, we extract the command and process it:

function handleWakeWordCommand(transcript) {
  isProcessingCommand = true;
  
  // Visual feedback
  statusEl.textContent = 'Wake word detected! Processing command...';
  
  // Extract command after wake word
  const command = transcript.replace(/hey calendar,?/gi, '').trim();
  
  if (command) {
    speak('Yes, processing your command.');
    processVoiceCommand(command); // Reuses function from Part 3
  } else {
    speak('I heard Hey Calendar, but no command. Please try again.');
    resetToListening();
  }
}
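The replace() call above removes the wake phrase but keeps anything said before it, so "um, hey calendar, book lunch" becomes "um,  book lunch". The flow described earlier treats only the text after "Hey Calendar" as the command; a slice-based sketch of that behavior follows. extractCommand is a hypothetical helper, and using the last occurrence of the phrase is an assumption for transcripts that repeat it.

```javascript
// Matches "hey calendar", "Hey, Calendar", etc. (global flag is required by matchAll)
const WAKE_PHRASE = /\bhey[\s,]+calendar\b/gi;

// Returns the text after the last wake phrase with leading punctuation removed,
// '' if nothing follows it, or null if the transcript has no wake phrase.
function extractCommand(transcript) {
  let lastEnd = -1;
  for (const match of transcript.matchAll(WAKE_PHRASE)) {
    lastEnd = match.index + match[0].length;
  }
  if (lastEnd === -1) return null;
  return transcript.slice(lastEnd).replace(/^[\s,.!?]+/, '').trim();
}
```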

Auto-Restart Mechanism

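Browsers can end a recognition session on their own, for example after a stretch of silence or a network error, even when continuous is set to true. A critical feature is therefore restarting speech recognition automatically whenever it stops: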

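recognition.onend = () => {
  // Auto-restart if still enabled and not processing
  if (isWakeWordEnabled && !isProcessingCommand) {
    setTimeout(() => {
      if (isWakeWordEnabled && !isProcessingCommand) {
        try {
          recognition.start();
        } catch (error) {
          // start() throws (e.g. InvalidStateError) if recognition is already running;
          // retry once after a longer delay, guarding the retry as well
          setTimeout(() => {
            if (isWakeWordEnabled && !isProcessingCommand) {
              try {
                recognition.start();
              } catch (retryError) {
                console.warn('Could not restart speech recognition:', retryError);
              }
            }
          }, 1000);
        }
      }
    }, 100);
  }
};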
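The handler above retries once after a fixed one-second delay. If restarts keep failing, a common alternative is exponential backoff with a cap on attempts. This is a swapped-in technique rather than the article's code: createRestarter and its options are hypothetical, and schedule defaults to setTimeout so the timing can be faked in tests.

```javascript
// Hypothetical restart helper with exponential backoff.
//   start()     -> tries to start recognition, may throw
//   shouldRun() -> whether wake word mode is still enabled and idle
//   schedule    -> setTimeout by default; injectable for tests
function createRestarter({ start, shouldRun, baseDelayMs = 100, maxDelayMs = 5000, maxAttempts = 5, schedule = setTimeout }) {
  let attempts = 0;

  function attempt() {
    if (!shouldRun()) return;
    try {
      start();
    } catch (error) {
      attempts += 1;
      if (attempts >= maxAttempts) {
        console.warn('Giving up on restarting speech recognition:', error);
        return;
      }
      // 200 ms, 400 ms, 800 ms, ... capped at maxDelayMs
      const delay = Math.min(baseDelayMs * 2 ** attempts, maxDelayMs);
      schedule(attempt, delay);
    }
  }

  // Intended for recognition.onend: each new end event starts a fresh backoff sequence
  return function restartSoon() {
    attempts = 0;
    schedule(attempt, baseDelayMs);
  };
}
```

In the page script it could be wired up as recognition.onend = createRestarter({ start: () => recognition.start(), shouldRun: () => isWakeWordEnabled && !isProcessingCommand }).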

Connecting to Existing Infrastructure

Wake word detection plugs directly into our existing system. The extracted command is handed to the same processVoiceCommand function used in Part 3:

async function processVoiceCommand(text) {
  try {
    const timezone = Intl.DateTimeFormat().resolvedOptions().timeZone;
    
    // Send to existing NLP endpoint from Part 2
    const response = await fetch('/api/text-to-event', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Timezone': timezone
      },
      body: JSON.stringify({ text })
    });
    
    const data = await response.json();
    
    if (data.success) {
      const eventData = data.eventData;
      speak(`Event created: ${eventData.summary}.`);
      // Display success message
    } else {
      speak('Sorry, I couldn\'t create that event.');
    }
    
  } catch (error) {
    console.error('Failed to create event:', error); // network failure or non-JSON response
    speak('Sorry, I couldn\'t create that event.');
  }
  
  // Auto-reset to listening mode
  setTimeout(resetToListening, 2000);
}

Notice how we’re simply sending the recognized speech text to our existing /api/text-to-event endpoint from Part 2. This demonstrates the power of good architectural design — we can add new interface modes without recreating our core functionality.
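Because the request goes through a local LLM (Ollama, per the testing steps), it can take a while, and if it never returns, the assistant stays in the processing state because resetToListening only runs after the await completes. A sketch of the same request with a client-side timeout via AbortController; postTextToEvent, the 15-second default, and the { success: false, error } shape for failures are assumptions, not part of the Part 2 API.

```javascript
// Hypothetical wrapper around the POST to /api/text-to-event with a client-side timeout.
// fetchImpl defaults to the global fetch (browsers, Node 18+) and can be replaced in tests.
async function postTextToEvent(text, {
  timezone = Intl.DateTimeFormat().resolvedOptions().timeZone,
  timeoutMs = 15000,
  fetchImpl = fetch,
} = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl('/api/text-to-event', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'X-Timezone': timezone },
      body: JSON.stringify({ text }),
      signal: controller.signal, // aborting rejects the pending fetch
    });
    if (!response.ok) {
      return { success: false, error: `HTTP ${response.status}` };
    }
    return await response.json();
  } catch (error) {
    // AbortError when the timeout fires; other errors are network failures or bad JSON
    return { success: false, error: error.name === 'AbortError' ? 'timeout' : error.message };
  } finally {
    clearTimeout(timer);
  }
}
```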

Architecture Integration

This implementation requires zero backend changes:

Reuses /api/text-to-event endpoint from Part 2
Leverages Google Calendar integration from Part 1
Uses the same voice synthesis capabilities from Part 3
Works alongside existing interfaces without conflicts

The modular design allows users to choose their preferred interaction method while maintaining a consistent underlying system.
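The modes can coexist as long as two of them never handle the same utterance at once, for example the wake word listener firing while the Part 3 press-and-hold button is held. One simple guard, sketched here as an assumption rather than something the project includes, is a lock that lets a single input mode own speech handling at a time; createModeLock and the mode names are hypothetical.

```javascript
// Hypothetical coordinator: only one input mode (e.g. 'wake-word' or 'push-to-talk')
// may handle speech at a time.
function createModeLock() {
  let owner = null;
  return {
    // Returns true if `mode` now owns speech handling, false if another mode is active
    acquire(mode) {
      if (owner !== null && owner !== mode) return false;
      owner = mode;
      return true;
    },
    // Only the current owner can release the lock
    release(mode) {
      if (owner === mode) owner = null;
    },
    get owner() {
      return owner;
    },
  };
}
```

The wake word handler would call acquire('wake-word') before sending a command and release('wake-word') when it resets, while the press-and-hold button would skip starting whenever acquire('push-to-talk') returns false.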

Visual and Voice Feedback

The interface provides comprehensive feedback for all states:

Visual Status Indicators:

Wake Word: ON — Green dot with breathing animation, actively listening
Processing — Yellow dot with spinning animation, creating calendar event
Wake Word: OFF — Gray dot, system disabled
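Since the indicator is fully determined by the two flags from the setup code (isWakeWordEnabled and isProcessingCommand), it can be derived from them instead of being tracked separately. A sketch; STATUS_INDICATORS and indicatorFor are hypothetical names, and rendering the result (for example, setting a CSS class on the dot) is left to the page.

```javascript
// Hypothetical mapping from assistant state to the indicators listed above
const STATUS_INDICATORS = Object.freeze({
  listening: { label: 'Wake Word: ON', color: 'green', animation: 'breathing' },
  processing: { label: 'Processing', color: 'yellow', animation: 'spinning' },
  off: { label: 'Wake Word: OFF', color: 'gray', animation: null },
});

// Processing takes precedence, since it can only happen while wake word mode is on
function indicatorFor({ isWakeWordEnabled, isProcessingCommand }) {
  if (isProcessingCommand) return STATUS_INDICATORS.processing;
  return isWakeWordEnabled ? STATUS_INDICATORS.listening : STATUS_INDICATORS.off;
}
```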

Voice Feedback:

“Wake word detection enabled” when starting
“Yes, processing your command” when wake word detected
“Event created: [event name]” when successful
Helpful error messages when things go wrong

Real-time Transcript Display:

Shows live “Hearing: [text]” as you speak
Confirms “Wake word detected!” when “Hey Calendar” is recognized
Displays the full command being processed
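The onresult handler shown earlier keeps only final results, but interimResults = true also delivers partial ones, which is what a live "Hearing: [text]" line needs. A sketch of splitting both out of the event; splitResults and hearingLabel are hypothetical helpers, and in the page they would be called as splitResults(event.results, event.resultIndex).

```javascript
// Splits a SpeechRecognitionResultList-like object into final and interim text,
// starting at resultIndex as the onresult handler does. Each result is indexable
// ([0] is the top alternative) and carries an isFinal flag.
function splitResults(results, resultIndex = 0) {
  let finalText = '';
  let interimText = '';
  for (let i = resultIndex; i < results.length; i++) {
    const text = results[i][0].transcript;
    if (results[i].isFinal) finalText += text;
    else interimText += text;
  }
  return { finalText, interimText };
}

// Text for the live transcript line ('' when nothing has been heard yet)
function hearingLabel({ finalText, interimText }) {
  const heard = (finalText + interimText).trim();
  return heard ? `Hearing: ${heard}` : '';
}
```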

Testing Your Wake Word Assistant

Start the server: npm start
Make sure Ollama is running with the llama3.2:latest model
Open: http://localhost:3000/part-4-wake-word-detection-using-web-speech-api.html
Click “Enable Hey Calendar”
Say: “Hey Calendar, schedule a team meeting tomorrow at 2pm”
Watch for visual indicators and listen for voice confirmation

Pro Tips:

Speak the complete command in one phrase for best results
Click example commands to hear them spoken aloud
Watch the indicator dot: green (listening), yellow (processing), gray (off)
The system works reliably for repeated commands
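One caveat on repeated commands: the lastProcessedCommand check ignores an identical transcript for as long as it remains the last command handled, so asking for the exact same event twice in a row is silently dropped. A time-window filter keeps the protection against duplicates arriving in quick succession while still allowing deliberate repeats. A sketch; createDuplicateFilter and the 5-second window are assumptions.

```javascript
// Hypothetical time-windowed duplicate filter. The same command is treated as a
// duplicate only if it arrives again within `windowMs`; `now` is injectable for tests.
function createDuplicateFilter(windowMs = 5000, now = () => Date.now()) {
  let lastCommand = '';
  let lastTime = -Infinity;
  return function isDuplicate(command) {
    const t = now();
    if (command === lastCommand && t - lastTime < windowMs) return true;
    lastCommand = command;
    lastTime = t;
    return false;
  };
}
```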

The Complete “Text to Action” Journey

With wake word detection implemented, our Calendar AI Assistant now offers four distinct interaction modes:

Direct API Calls — Create events with structured JSON requests (Part 1)
Natural Language Text — Type commands like “Schedule meeting tomorrow 3pm” (Part 2)
Press-and-Hold Voice — Manual voice activation when you’re ready (Part 3)
Always-On Wake Word — Hands-free operation with “Hey Calendar” (Part 4)

This progression demonstrates how modern web technologies can create increasingly sophisticated user experiences, each building on the foundation of the previous parts.

What Works vs. Limitations

Works Well:

Continuous listening with automatic restart
Clear visual and voice feedback
Seamless integration with existing calendar system
Reliable duplicate command prevention

Current Limitations:

Web Speech API accuracy varies by environment
Occasional false positives with similar-sounding words
Requires modern browser with Web Speech API support

These limitations highlight why custom wake word detection using machine learning is valuable, which we’ll explore in future parts.
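Until then, one inexpensive filter for false positives is the confidence value that the Web Speech API attaches to each recognition alternative (event.results[i][0].confidence, between 0 and 1). A sketch, assuming a 0.6 threshold that would need tuning per microphone and room; passesConfidence is a hypothetical helper, and treating a reported 0 as "no estimate" is an assumption, since some engines may not supply a score.

```javascript
// Hypothetical confidence gate for wake word matches.
// Accepts the alternative if its confidence is at least minConfidence;
// a missing or zero confidence is treated as "unknown" and accepted (assumption).
function passesConfidence(alternative, minConfidence = 0.6) {
  const { confidence } = alternative;
  if (typeof confidence !== 'number' || confidence === 0) return true;
  return confidence >= minConfidence;
}
```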

Conclusion

We’ve successfully implemented wake word detection, creating a true hands-free calendar assistant that responds to “Hey Calendar” commands. While this Web Speech API approach has some limitations, it provides immediate functionality and demonstrates the core concepts of always-on voice interfaces.

The complete code is available on GitHub.

Next: In Part 5, we’ll implement custom wake word detection using machine learning for higher accuracy, personalized wake phrases, and better environmental noise handling.


Let me know in the comments what you’d like to see built next!


Published at DZone with permission of Vivek Vellaiyappan Surulimuthu. See the original article here.

Opinions expressed by DZone contributors are their own.
