[Part-4] Text to Action: Wake Word Detection Speech to Calendar Event
Build a hands-free voice assistant with wake word detection that converts "Hey Calendar" commands into Google Calendar events using Web Speech API and AI.
Welcome to the fourth installment of our “Text to Action” series, where we’re building intelligent systems that transform natural language into real-world actions using AI.
In [Part-1] Text to Action: Build a Smart Calendar AI Assistant, we established our foundation by creating an Express.js backend that connects to Google Calendar’s API. This gave us the ability to programmatically create calendar events through an exposed API endpoint.
In [Part-2] Text to Action: Words to Calendar Events, we added natural language processing (NLP) capabilities, enabling users to type descriptions like “Schedule a team meeting tomorrow at 3pm” and have our system intelligently transform these words into calendar events.
In [Part-3] Text to Action: Adding Voice Control to Your Smart Calendar, we implemented voice commands with a press-and-hold interface, letting users schedule events by speaking directly to the system.
Today, we’re implementing wake word detection — our first attempt at truly hands-free calendar management. Simply say “Hey Calendar, schedule a meeting tomorrow at 3pm” without pressing any buttons.
What We’re Building
We’re adding wake word detection to our existing application that will:
- Continuously listen for the wake phrase “Hey Calendar” using the Web Speech API
- Process spoken commands automatically when the wake word is detected
- Provide voice feedback and visual status indicators
- Reset automatically after each command for repeated use
Important Note: This implementation uses the Web Speech API, which provides immediate functionality but has limitations in accuracy and consistency.
Consider this Version 1 of wake word detection — in Part 5, we’ll implement a much more reliable solution using custom machine learning models.
Demo flow: “Hey Calendar” Detection → Command Extraction → NLP Processing → Calendar Event Creation → Voice Confirmation
This creates a complete hands-free calendar assistant that responds naturally to voice commands.
The Wake Word Flow
Here’s what happens when you say “Hey Calendar, schedule a team meeting tomorrow at 2pm”:
- Always Listening: System continuously monitors for “Hey Calendar”
- Wake Word Detected: Full transcript is captured and wake word identified
- Command Extraction: Everything after “Hey Calendar” becomes the command
- Voice Confirmation: “Yes, processing your command” provides immediate feedback
- NLP Processing: Command sent to existing /api/text-to-event endpoint (Part 2)
- Calendar Creation: Event created using Google Calendar API (Part 1)
- Success Feedback: Visual and voice confirmation of created event
- Auto Reset: System returns to listening for next wake word
The beauty is that this requires zero backend changes — we reuse all the infrastructure from Parts 1–3.
Core Implementation
Setting Up Continuous Listening
// Check browser compatibility
// (the early return assumes this code runs inside a function, such as a page-load handler)
if (!('webkitSpeechRecognition' in window) && !('SpeechRecognition' in window)) {
  alert('Your browser does not support the Speech Recognition API. Please use Chrome, Edge, or Safari.');
  return;
}

// Initialize speech recognition
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

// Configure for continuous listening
recognition.continuous = true;
recognition.interimResults = true;
recognition.lang = 'en-US';

// Simple state management
let isWakeWordEnabled = false;
let isProcessingCommand = false;
let lastProcessedCommand = '';
Wake Word Detection Logic
The heart of our system is the speech recognition processing that listens for “Hey Calendar”:
recognition.onresult = (event) => {
  if (isProcessingCommand) return; // Prevent processing while busy

  let finalTranscript = '';

  // Extract final transcript
  for (let i = event.resultIndex; i < event.results.length; i++) {
    if (event.results[i].isFinal) {
      finalTranscript += event.results[i][0].transcript;
    }
  }

  // Process final results only
  if (finalTranscript) {
    processTranscript(finalTranscript.toLowerCase().trim());
  }
};

function processTranscript(transcript) {
  // Prevent duplicate command processing
  if (transcript === lastProcessedCommand) {
    return;
  }

  // Simple but effective wake word detection
  if (transcript.includes('hey calendar')) {
    lastProcessedCommand = transcript;
    handleWakeWordCommand(transcript);
  }
}
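The substring check above only matches when the transcript contains exactly "hey calendar" with a single space. If the recognizer inserts punctuation or extra whitespace, that check would miss the phrase. A minimal sketch of a more tolerant matcher follows; `WAKE_WORD_PATTERN` and `containsWakeWord` are hypothetical names that are not part of the series' code, and the set of tolerated punctuation is an assumption.

```javascript
// Hypothetical helper, not from the article's repository.
// Matches "hey calendar" while tolerating punctuation and extra spaces
// between the two words. The word boundaries avoid matching "they calendar".
const WAKE_WORD_PATTERN = /\bhey[\s,.!]+calendar\b/i;

function containsWakeWord(transcript) {
  return WAKE_WORD_PATTERN.test(transcript);
}

console.log(containsWakeWord('hey calendar schedule lunch'));   // true
console.log(containsWakeWord('Hey, Calendar! add a reminder')); // true
console.log(containsWakeWord('they calendar everything'));      // false
```

Because the pattern is case-insensitive, it would work with or without the `toLowerCase()` call that precedes `processTranscript`.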
Command Processing
When the wake word is detected, we extract the command and process it:
function handleWakeWordCommand(transcript) {
  isProcessingCommand = true;

  // Visual feedback
  statusEl.textContent = 'Wake word detected! Processing command...';

  // Extract command after wake word
  const command = transcript.replace(/hey calendar,?/gi, '').trim();

  if (command) {
    speak('Yes, processing your command.');
    processVoiceCommand(command); // Reuses function from Part 3
  } else {
    speak('I heard Hey Calendar, but no command. Please try again.');
    resetToListening();
  }
}
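Note that the `replace` call removes the wake phrase but keeps any words spoken before it, so "ok so hey calendar, schedule lunch" would be sent along with the leading "ok so". If the intent is that only the text after "Hey Calendar" becomes the command, one option is to slice the transcript at the end of the match. The sketch below is an illustration under that assumption; `extractCommand` is a hypothetical name.

```javascript
// Hypothetical helper, not from the article's repository.
// Returns the text that follows the first "hey calendar", or null when
// the wake phrase is absent. Words spoken before the phrase are dropped.
function extractCommand(transcript) {
  const match = /\bhey[\s,.!]+calendar\b[\s,.!]*/i.exec(transcript);
  if (!match) return null;
  return transcript.slice(match.index + match[0].length).trim();
}

console.log(extractCommand('ok so hey calendar, schedule lunch at noon'));
// "schedule lunch at noon"
console.log(extractCommand('hey calendar')); // "" (wake word only)
console.log(extractCommand('schedule lunch')); // null
```

An empty string maps to the "no command" branch in `handleWakeWordCommand`, while `null` signals that the wake word was never heard.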
Auto-Restart Mechanism
A critical feature is automatically restarting speech recognition when it stops:
recognition.onend = () => {
  // Auto-restart if still enabled and not processing
  if (isWakeWordEnabled && !isProcessingCommand) {
    setTimeout(() => {
      if (isWakeWordEnabled && !isProcessingCommand) {
        try {
          recognition.start();
        } catch (error) {
          // Retry once after a delay if the restart fails
          setTimeout(() => {
            if (isWakeWordEnabled && !isProcessingCommand) {
              try {
                recognition.start();
              } catch (retryError) {
                console.error('Could not restart speech recognition:', retryError);
              }
            }
          }, 1000);
        }
      }
    }, 100);
  }
};
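The handler above uses two fixed delays (100 ms, then 1000 ms). If restarts keep failing, for example because the microphone is unavailable, a capped exponential backoff is one way to avoid retrying in a tight loop. This is a sketch of that alternative rather than what the article's code does; the constants and the `nextRestartDelay` name are assumptions chosen for illustration.

```javascript
// Hypothetical helper, not from the article's repository.
// Computes how long to wait before the next restart attempt:
// 100 ms, 200 ms, 400 ms, ... capped at 5 seconds.
const BASE_DELAY_MS = 100;
const MAX_DELAY_MS = 5000;

function nextRestartDelay(failedAttempts) {
  const delay = BASE_DELAY_MS * 2 ** failedAttempts;
  return Math.min(delay, MAX_DELAY_MS);
}

console.log(nextRestartDelay(0));  // 100
console.log(nextRestartDelay(3));  // 800
console.log(nextRestartDelay(10)); // 5000
```

An `onend` handler could pass this value to `setTimeout` and keep a failure counter that is reset in `recognition.onstart`, which fires once recognition is running again.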
Connecting to Existing Infrastructure
The beauty of our architecture is that wake word detection seamlessly integrates with our existing system:
async function processVoiceCommand(text) {
  try {
    const timezone = Intl.DateTimeFormat().resolvedOptions().timeZone;

    // Send to existing NLP endpoint from Part 2
    const response = await fetch('/api/text-to-event', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Timezone': timezone
      },
      body: JSON.stringify({ text })
    });

    const data = await response.json();

    if (data.success) {
      const eventData = data.eventData;
      speak(`Event created: ${eventData.summary}.`);
      // Display success message
    } else {
      speak('Sorry, I couldn\'t create that event.');
    }
  } catch (error) {
    speak('Sorry, I couldn\'t create that event.');
  }

  // Auto-reset to listening mode
  setTimeout(resetToListening, 2000);
}
Notice how we’re simply sending the recognized speech text to our existing /api/text-to-event endpoint from Part 2. This demonstrates the power of good architectural design — we can add new interface modes without recreating our core functionality.
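The snippets in this article call `resetToListening()` without showing its body. A plausible shape for it, offered as an assumption rather than the repository's actual code, is to clear the processing flag, update the status text, and restart recognition if wake word mode is still on. The sketch uses a `state` object and a factory function so it can run outside a browser; the article itself keeps these values in module-level `let` variables.

```javascript
// Hypothetical sketch, not from the article's repository.
// `state` mirrors the flags from the setup section; `recognition` and
// `statusEl` are whatever objects the page already uses.
function createResetToListening(state, recognition, statusEl) {
  return function resetToListening() {
    state.isProcessingCommand = false;
    // Assumption: clearing this lets the same command be spoken twice in a row.
    state.lastProcessedCommand = '';
    statusEl.textContent = 'Listening for "Hey Calendar"...';

    if (!state.isWakeWordEnabled) return;
    try {
      recognition.start();
    } catch (error) {
      // start() throws if recognition is already running; safe to ignore here.
    }
  };
}

// Usage with stand-in objects:
const demoState = { isWakeWordEnabled: true, isProcessingCommand: true, lastProcessedCommand: 'x' };
const demoRecognition = { start() { console.log('recognition restarted'); } };
const demoStatus = { textContent: '' };
createResetToListening(demoState, demoRecognition, demoStatus)();
console.log(demoState.isProcessingCommand); // false
```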
Architecture Integration
This implementation requires zero backend changes:
- Reuses /api/text-to-event endpoint from Part 2
- Leverages Google Calendar integration from Part 1
- Uses same voice synthesis capabilities from Part 3
- Works alongside existing interfaces without conflicts
The modular design allows users to choose their preferred interaction method while maintaining a consistent underlying system.
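The `speak()` helper used throughout comes from Part 3 and is not reproduced here. For readers following along without that part, a minimal version built on the browser's Speech Synthesis API might look like the sketch below. The function name `speakText`, its injectable parameters, and the choice to cancel queued speech first are assumptions made for illustration.

```javascript
// Hypothetical sketch, not the Part 3 implementation.
// `synth` and `Utterance` default to the browser globals and are
// injectable so the function can be exercised outside a browser.
function speakText(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  if (!synth || !Utterance) return false; // speech synthesis unavailable

  const utterance = new Utterance(text);
  utterance.lang = 'en-US';
  synth.cancel();        // drop anything still queued or speaking
  synth.speak(utterance);
  return true;
}
```

Returning a boolean lets the caller fall back to a visual message when synthesis is not available.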
Visual and Voice Feedback
The interface provides comprehensive feedback for all states:
Visual Status Indicators:
- Wake Word: ON — Green dot with breathing animation, actively listening
- Processing — Yellow dot with spinning animation, creating calendar event
- Wake Word: OFF — Gray dot, system disabled
Voice Feedback:
- “Wake word detection enabled” when starting
- “Yes, processing your command” when wake word detected
- “Event created: [event name]” when successful
- Helpful error messages when things go wrong
Real-time Transcript Display:
- Shows live “Hearing: [text]” as you speak
- Confirms “Wake word detected!” when “Hey Calendar” is recognized
- Displays the full command being processed
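The three indicator states listed above can be derived from the two flags introduced in the setup code. The sketch below shows one way to express that mapping as a pure function. The function name and the `animation` values are hypothetical; only the labels and colors come from the list above.

```javascript
// Hypothetical helper, not from the article's repository.
// Maps the assistant's flags to the indicator described in the article.
function indicatorFor({ isWakeWordEnabled, isProcessingCommand }) {
  if (isProcessingCommand) {
    return { label: 'Processing', color: 'yellow', animation: 'spin' };
  }
  if (isWakeWordEnabled) {
    return { label: 'Wake Word: ON', color: 'green', animation: 'breathe' };
  }
  return { label: 'Wake Word: OFF', color: 'gray', animation: 'none' };
}

console.log(indicatorFor({ isWakeWordEnabled: true, isProcessingCommand: false }).label);
// "Wake Word: ON"
```

Keeping this logic in one place means the dot, the status text, and any CSS class can all be updated from a single return value.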
Testing Your Wake Word Assistant
- Start the server: npm start
- Make sure Ollama is running with the llama3.2:latest model
- Open: http://localhost:3000/part-4-wake-word-detection-using-web-speech-api.html
- Click “Enable Hey Calendar”
- Say: “Hey Calendar, schedule a team meeting tomorrow at 2pm”
- Watch for visual indicators and listen for voice confirmation
Pro Tips:
- Speak the complete command in one phrase for best results
- Click example commands to hear them spoken aloud
- Watch the indicator dot: green (listening), yellow (processing), gray (off)
- The system works reliably for repeated commands
The Complete “Text to Action” Journey
With wake word detection implemented, our Calendar AI Assistant now offers four distinct interaction modes:
- Direct API Calls — Create events with structured JSON requests (Part 1)
- Natural Language Text — Type commands like “Schedule meeting tomorrow 3pm” (Part 2)
- Press-and-Hold Voice — Manual voice activation when you’re ready (Part 3)
- Always-On Wake Word — Hands-free operation with “Hey Calendar” (Part 4)
This progression demonstrates how modern web technologies can create increasingly sophisticated user experiences, each building on the foundation of the previous parts.
What Works vs. Limitations
Works Well:
- Continuous listening with automatic restart
- Clear visual and voice feedback
- Seamless integration with existing calendar system
- Reliable duplicate command prevention
Current Limitations:
- Web Speech API accuracy varies by environment
- Occasional false positives with similar-sounding words
- Requires modern browser with Web Speech API support
These limitations highlight why custom wake word detection using machine learning is valuable, which we’ll explore in Part 5.
Conclusion
We’ve successfully implemented wake word detection, creating a true hands-free calendar assistant that responds to “Hey Calendar” commands. While this Web Speech API approach has some limitations, it provides immediate functionality and demonstrates the core concepts of always-on voice interfaces.
The complete code is available on GitHub.
Next: In Part 5, we’ll implement custom wake word detection using machine learning for higher accuracy, personalized wake phrases, and better environmental noise handling.
Let me know in the comments what you’d like to see built next!
Published at DZone with permission of Vivek Vellaiyappan Surulimuthu. See the original article here.