This is the first video I have published on YouTube, and I made it in English, so please be kind and indulgent when you rate it.
For a couple of days I have been looking for the best speech synthesiser that I could use in my home-made applications. I program mostly in Java, but I haven't found any solution that is complete and good enough to satisfy me. I found a free library, FreeTTS, which is written entirely in Java. I will talk about it some day. This is the link to the website of this library. If you look at the release notes page, you will read that development of this library stopped in 2005. FreeTTS speaks only English. On the left side you can see the links to the sound files. Try listening to how it sounds. It sounds like a robot. Would you like to have your e-book read aloud like this? I think you would be tired after listening to a couple of paragraphs.
I have found a good speech synthesiser called IVONA. It speaks over 18 languages, including Polish, English and German. On this website you can see all available IVONA voices. Some languages have several voices. For example, English includes the Amy, Brian and Emma voices. Let me present Brian’s voice to you. It is quite pleasant to hear something like this, knowing that it is software. As you can see, the Brian voice costs $45. You can download a 30-day trial version. I did, and I'm still testing these voices. It is important that the voices of this synthesiser support the SAPI 5 interface. SAPI 5 is the native Microsoft Speech API (SAPI). The documentation for SAPI is available on the MSDN Library website.
SAPI 5 is not directly accessible from the Java Virtual Machine. It has to be programmed in C++, C# or Visual Basic. My simple application is written in C++. If I want to use SAPI in an application written in Java, I can, for example, build a DLL (dynamic-link library) in C++ and call it through the Java Native Interface. That way I can use the C++ code from my Java application. There is also one more solution, a slightly dirty one. I can run the EXE application from Java with the proper command-line arguments. To do that, I use the ProcessBuilder class or the Runtime.exec method.
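The "dirty" approach could look roughly like the sketch below. It is only an illustration: the file name "tts.exe", the argument order (text first, then the voice description) and the class and method names are my assumptions, not the interface of the real application.

```java
import java.io.IOException;

/** Sketch: running a native text-to-speech program from Java. */
final class TtsLauncher {

    /**
     * Prepares the launcher without starting anything.
     * The argument order (text, then voice description) is an assumption.
     */
    static ProcessBuilder prepare(String exePath, String text, String voice) {
        // Each argument is a separate element; no shell is involved,
        // so the text does not need manual shell escaping.
        ProcessBuilder builder = new ProcessBuilder(exePath, text, voice);
        // Forward the child's standard output and error to this JVM's console.
        builder.inheritIO();
        return builder;
    }

    /** Starts the program, waits until it exits and returns its exit code. */
    static int speak(String exePath, String text, String voice)
            throws IOException, InterruptedException {
        Process process = prepare(exePath, text, voice).start();
        return process.waitFor();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // "tts.exe" is a hypothetical file name; use the real path instead.
            int exitCode = speak("tts.exe", "Hello from Java", "Language=809");
            System.out.println("Exit code: " + exitCode);
        } catch (IOException e) {
            System.err.println("Could not start the program: " + e.getMessage());
        }
    }
}
```

Waiting for the exit code keeps the Java side simple: the call returns only when the native program has finished, which I assume means the text has been read.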
Before I describe the "speak(...)" function, look at the "main(...)" function. When I call "speak(...)", I pass the text to be read aloud and a description of the voice to use. The function "SpFindBestToken" uses this description to select the speech synthesiser and the voice. How do you know what to write here? Open the Windows registry (the regedit command) and go to the node "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices". It lists all the voices you can use. You can select the voice you are interested in by passing attributes such as Age, Gender, Language, Name and Vendor.
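As far as I know, SAPI expects such a description as "Attribute=Value" pairs separated by semicolons. The sketch below builds a string of that form on the Java side, assuming it would later be handed to the native program. The class and method names are hypothetical and the values are only examples: 809 is the hexadecimal language code for British English, and the voices installed on your machine may use different values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Sketch: composing a voice description from attribute pairs. */
final class VoiceAttributes {

    /** Joins the attributes as "Name=Value" pairs separated by semicolons. */
    static String describe(Map<String, String> attributes) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in insertion order.
        Map<String, String> wanted = new LinkedHashMap<>();
        wanted.put("Gender", "Male");   // example value
        wanted.put("Language", "809");  // example value: British English
        System.out.println(describe(wanted)); // prints Gender=Male;Language=809
    }
}
```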
It's interesting that the Speech API can be used not only for speech synthesis but also for speech recognition. If I had installed a speech recognition application that supported SAPI 5, I could see it in the node "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers".
In my simple application I use three functions that come from the Component Object Model (COM) library. They are:
- CoInitialize - Initializes the COM library on the current thread. You need to initialize the COM library on a thread before you call any of the library functions.
- CoUninitialize - Closes the COM library on the current thread, unloads all DLLs loaded by the thread, frees any other resources that the thread maintains, and forces all RPC connections on the thread to close.
- CoCreateInstance - Creates a single uninitialized object of the class associated with a specified CLSID.
Here I pass "CLSID_SpVoice", the class identifier of the object that implements the "ISpVoice" interface, which is declared in the "sapi.h" header file. Here I have the identifiers of all the classes of the Speech API. Now you can use the "pVoice" pointer. Here I choose the voice, and here I tell the speech synthesiser to read my text. Finally, I release the resources. Of course, this chunk of code can be refactored, because I do the same things three times.
In my next presentation I will show you a way to integrate a Java application with this simple text-to-speech application. I will also describe other text-to-speech platforms and application programming interfaces (APIs).
Visit my GitHub profile.
- Microsoft Speech Platform SDK 11 Documentation, Application-Level Interfaces
- Text-to-speech tutorial from MSDN https://msdn.microsoft.com/en-us/library/ms720163(v=vs.85).aspx
- Polish http://harposoftware.com/en/13-polish
- English http://harposoftware.com/en/16-british-english
- German http://harposoftware.com/en/27-german
Environment and IDE
- Visual Studio Community https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx
- Microsoft Speech Platform SDK 11
Component Object Model (COM)
- CoCreateInstance https://msdn.microsoft.com/en-us/library/windows/desktop/ms686615(v=vs.85).aspx
- CoInitialize https://msdn.microsoft.com/en-us/library/windows/desktop/ms678543(v=vs.85).aspx
- CoUninitialize https://msdn.microsoft.com/en-us/library/windows/desktop/ms688715(v=vs.85).aspx
- Helper SpFindBestToken https://msdn.microsoft.com/en-us/library/ms717543(v=vs.85).aspx
- ISpObjectToken https://msdn.microsoft.com/en-us/library/ms718134(v=vs.85).aspx
- ISpVoice https://msdn.microsoft.com/en-us/library/ms719576(v=vs.85).aspx
- ISpVoice::Speak https://msdn.microsoft.com/en-us/library/ms719820(v=vs.85).aspx
- ISpVoice::SetVoice https://msdn.microsoft.com/en-us/library/ms719807(v=vs.85).aspx