
An Introduction to Speech Synthesis Markup Language


From the advent of Microsoft Sam, we've marveled at how computers have talked with us. Now it's time to teach them to talk to us better. SSML can make that happen.


Speech synthesis is not a new technology. Computers have been attempting to speak to us for decades, but with the recent rise of voice-activated appliances, speech synthesis is undergoing a renaissance. At more than one meetup, I heard Speech Synthesis Markup Language (SSML) mentioned for modeling computerized speech and thought it warranted further investigation.

The W3C introduced SSML in September 2004, basing it on the JSML and JSGF specifications, which are owned by Sun. It’s an XML-based markup language that defines passages of text and a voice to speak them with, and it allows for ‘prosody’: how the words sound when spoken, such as their pitch, rate, and volume.

The structure of a passage of spoken text consists of XML elements. A parent speak element declares the XML namespace, the SSML version, and a default language. Optional p (paragraph) and s (sentence) elements then let you define the structure of the text, and each of them can also change the language spoken.
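As a minimal sketch of that structure, the following Python builds a speak element containing one p and several s elements using the standard library. The function name build_document and its (language, text) input format are my own assumptions for illustration, not part of SSML.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# ElementTree spells the xml:lang attribute with the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_document(sentences, default_lang="en-US"):
    """Return SSML text: one <p> holding one <s> per (lang, text) pair.

    build_document is a hypothetical helper, not an SSML or library API.
    """
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak", {"version": "1.1", XML_LANG: default_lang}
    )
    paragraph = ET.SubElement(speak, f"{{{SSML_NS}}}p")
    for lang, text in sentences:
        # Each sentence may override the document's default language.
        sentence = ET.SubElement(paragraph, f"{{{SSML_NS}}}s", {XML_LANG: lang})
        sentence.text = text
    return ET.tostring(speak, encoding="unicode")


print(build_document([("en-GB", "Good day"), ("en-US", "Hey there")]))
```

Building the tree with a library instead of concatenating strings means every element is closed and every attribute is escaped for you.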

Inside these elements are voice elements, which select a predefined voice that affects the way the text is spoken. You can set the gender, age, variant, and language. All of these are optional, and the spec lists the values available. You can combine different languages in the structural and voice elements to create spoken text with an accent, for example English that sounds like it’s spoken by a Spanish person.
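To illustrate how the sentence language and the voice settings combine, here is a rough Python sketch that wraps text in an s element and a voice element. The helper name voiced_sentence is hypothetical, and I'm assuming the SSML 1.1 attribute name languages for the voice's language; check the spec and your synthesizer for what is actually supported.

```python
from xml.sax.saxutils import escape, quoteattr


def voiced_sentence(text, text_lang, **voice_attrs):
    """Wrap text in <s xml:lang=...><voice ...>, skipping unset attributes.

    voiced_sentence is a hypothetical helper. Every voice attribute is
    optional, so anything passed as None is left out.
    """
    attrs = "".join(
        f" {name}={quoteattr(str(value))}"
        for name, value in voice_attrs.items()
        if value is not None
    )
    return (
        f"<s xml:lang={quoteattr(text_lang)}>"
        f"<voice{attrs}>{escape(text)}</voice></s>"
    )


# English text, read by a voice that speaks Spanish (assumed attribute name).
print(voiced_sentence("Good day", "en-GB", gender="female", languages="es"))
```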

Within a voice element, you can add a variety of other elements to change the way certain passages are spoken. For example, emphasis elements add ‘stress’ to sections of text:

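<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en">
    <!-- dc:title needs the Dublin Core namespace and sits inside metadata -->
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title xml:lang="en">Hello readers</dc:title>
    </metadata>

    <s xml:lang="en-GB">
      <voice name="David" gender="male" age="25">
        Good day, is it <emphasis>tea time?</emphasis>
      </voice>
    </s>
    <s xml:lang="en-US">
      <voice name="David" gender="male" age="25">
        Hey there, want some <emphasis>pie</emphasis>?
      </voice>
    </s>
</speak>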

In addition to basic emphasis, you can use the prosody element plus parameters to control:

  • pitch
  • contour
  • pitch range
  • rate
  • duration
  • volume

For example:

<s xml:lang="en-US">
  <voice name="David" gender="male" age="25">
    Hey there, want some <prosody pitch="high" rate="slow">pie</prosody>?
  </voice>
</s>
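If you generate SSML from code, a small guard against misspelled prosody attributes can help. This is a sketch under my own assumptions: the helper name prosody is hypothetical, and I'm taking range to be the attribute name for pitch range.

```python
from xml.sax.saxutils import escape, quoteattr

# The six prosody controls; "range" is assumed to be the pitch range attribute.
PROSODY_ATTRIBUTES = ("pitch", "contour", "range", "rate", "duration", "volume")


def prosody(text, **settings):
    """Wrap text in a <prosody> element, rejecting unknown attribute names."""
    unknown = sorted(set(settings) - set(PROSODY_ATTRIBUTES))
    if unknown:
        raise ValueError(f"not a prosody attribute: {', '.join(unknown)}")
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in settings.items()
    )
    return f"<prosody{attrs}>{escape(text)}</prosody>"


print(prosody("pie", pitch="high", rate="slow"))
```

The helper only checks attribute names. Whether a value such as "high" or "slow" is accepted is still up to the spec and the synthesizer.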

Or add pauses for dramatic effect:

<s xml:lang="en-US">
  <voice name="David" gender="male" age="25">
    Hey there, want some <break time="3s" /> pie?
  </voice>
</s>
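Pauses add up quickly, so it can be handy to total them before listening. The following Python sketch sums the time attributes of every break element in a snippet. The function name total_pause_seconds is my own, and I'm assuming times are written as a number followed by s or ms; breaks that only use other attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Assumed time format: a non-negative number followed by "s" or "ms".
_TIME = re.compile(r"^(\d*\.?\d+)(ms|s)$")


def total_pause_seconds(ssml):
    """Sum the time attributes of all <break> elements, in seconds."""
    total = 0.0
    for elem in ET.fromstring(ssml).iter():
        # Drop any "{namespace}" prefix so namespaced documents work too.
        if elem.tag.rsplit("}", 1)[-1] != "break":
            continue
        match = _TIME.match(elem.get("time", ""))
        if match:
            value, unit = match.groups()
            total += float(value) / (1000 if unit == "ms" else 1)
    return total


snippet = '<s>Hey there, want some <break time="3s" /> pie? <break time="500ms" /></s>'
print(total_pause_seconds(snippet))
```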

There’s much more you can control with other elements and attributes. Read the full W3C specification to find out more.

Testing SSML

Great, now you know how to create SSML. As it’s XML-based, you can use a plethora of existing tools to validate the file, but this is audio, so you want to hear how it sounds.
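As one example of such tooling, Python's standard library can tell you whether a snippet is well-formed XML. This is only a sketch: check_well_formed is a name I made up, and it catches problems like unclosed tags but does not check the document against the SSML schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(ssml):
    """Return (True, "") if the text parses as XML, else (False, message)."""
    try:
        ET.fromstring(ssml)
    except ET.ParseError as error:
        # The parser message includes the line and column of the problem.
        return False, str(error)
    return True, ""


print(check_well_formed("<speak><s>Hello</s></speak>"))
print(check_well_formed("<speak><s>Hello</speak>"))
```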

Considering the pedigree of the standard, the options are limited unless you have access to hardware to perform real tests. I ended up using eSpeak, a CLI tool. There are GUIs available for it, but I couldn’t get them to work.
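A rough sketch of driving eSpeak from Python follows. It relies on the espeak options -m (interpret SSML markup), -f (read text from a file), and -w (write a WAV file instead of playing audio). The function names and file names are my own placeholders, and how faithfully eSpeak renders each SSML element is something you'd need to check by ear.

```python
import shutil
import subprocess


def espeak_command(ssml_path, wav_path=None):
    """Build the espeak argument list for an SSML file."""
    command = ["espeak", "-m", "-f", ssml_path]  # -m: treat input as SSML
    if wav_path is not None:
        command += ["-w", wav_path]  # -w: write audio to a WAV file
    return command


def speak_file(ssml_path, wav_path=None):
    """Run espeak on the file, failing early if it is not installed."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak was not found on PATH")
    subprocess.run(espeak_command(ssml_path, wav_path), check=True)


# "hello.ssml" and "hello.wav" are placeholder file names.
print(espeak_command("hello.ssml", "hello.wav"))
```

Calling speak_file("hello.ssml") would play the file through the default audio device, assuming eSpeak is installed.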

IBM Watson supports SSML via its speech API, so in theory you can test some SSML features on the demo page, but I couldn’t figure out which elements you are able to use.
