Tuesday, March 8, 2016

Text to Speech Synthesis

Recently I tried out an eLearning project that needed voice-overs. Apart from using an actual voice-over from a real person, I also used a computer-generated voice. Remember how HAL sang "Daisy Daisy" in the "Open the pod bay doors, HAL" article? The technology has improved a lot since then. The robotic voices have been replaced by natural ones, and you no longer notice the robotic quality so much. I mean... not completely, but to an impressive extent.

MS Windows (7) comes with a default TTS voice called Anna. She's an American English voice. If you need more voices, you can download them for free from the Microsoft website: British English, US English, Indian English and so on. But all of them are female, as far as I can remember. :P Apart from English, you can get Spanish, French and many other languages too.

I tried out the default Anna and a few other English voices, such as Helen (US English), ZiraPro (US English) and Hazel (British English). But to tell you the truth, they are pretty ugly. They still sound robotic.

NOTE: If the installed voices do not appear in the Speech Properties dialog of the Control Panel, you need to do some registry editing and also open the Speech Properties dialog directly from a file. Search for the solution on the Internet. For me, it has been working perfectly since that hack.
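
Before touching the registry, it can help to check which voices Windows has actually registered. Here is a small Python sketch for that, not a fix for the problem itself. It assumes the voices are registered under the usual SAPI 5 key, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens. Voices installed for other speech platforms may sit under a different key and will not show up here. The function name list_sapi_voices is just my own naming.

```python
import sys

# Assumption: SAPI 5 desktop voices are registered under this key in
# HKEY_LOCAL_MACHINE. Voices for other speech platforms may live elsewhere.
TOKENS_KEY = r"SOFTWARE\Microsoft\Speech\Voices\Tokens"


def list_sapi_voices():
    """Return the display names of registered SAPI voices.

    Returns an empty list on non-Windows systems or if the key is missing.
    """
    if not sys.platform.startswith("win"):
        return []

    import winreg  # standard library, available only on Windows

    names = []
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TOKENS_KEY) as tokens:
            subkey_count = winreg.QueryInfoKey(tokens)[0]
            for index in range(subkey_count):
                token_id = winreg.EnumKey(tokens, index)
                with winreg.OpenKey(tokens, token_id) as token:
                    try:
                        # The default value of a token key holds the voice name.
                        display_name, _ = winreg.QueryValueEx(token, "")
                    except OSError:
                        # No default value set: fall back to the key name.
                        display_name = token_id
                    names.append(display_name)
    except OSError:
        return []
    return names


if __name__ == "__main__":
    for name in list_sapi_voices():
        print(name)
```

If a voice you installed is missing from this list, that is a hint it was registered somewhere else, which is most likely what the registry hack has to sort out.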

If you would like some third-party voices, I recommend the following.

In my opinion, the number one TTS voice provider is IVONA. I strongly prefer the UK English voice Brian. You can use the online demo or else download it from the underworld. (You know what I mean, don't you? :P)

Another recommendation is the voices of Acapela Group. I prefer not to use the downloadable voices available in the underworld, because they sound a bit too much like telephone audio. If you need natural-sounding audio files, you can use their online demo "Acapela-Box" and record it using the "Stereo Mix" option. Stereo Mix is a way of recording what you hear through the speakers. You can find it under the recording options of the sound properties in your Control Panel. My favorite voice there is the UK English voice Graham.

My third preference goes to Loquendo, but compared to the two above, the quality is a little lower. Both an online demo and an underworld download are available. I actually used it because of its editor, Loquendo TTS Director.

The benefit of using Acapela and Loquendo is that their voices include voice smileys: voice-acted expressions such as Hello!, Good Morning! and Shut Up! for greetings, regrets, compliments and so on. They also provide paralinguistic sounds such as crying, thinking and laughing. With Loquendo TTS Director you can easily select the voice smiley or paralinguistic sound you need. The catch is that L-TTS Director only supports Loquendo voices. With "Acapela-Box", you have to refer to their text documentation to see how to include a voice smiley or paralinguistic sound.

What I mentioned above are voice engines. For voice editors, apart from L-TTS Director, you can use NaturalReader and TextAloud. I prefer TextAloud, even though it doesn't support voice smileys or paralinguistic sounds even when the voice engine does. On the other hand, when you install TextAloud it provides plugins for MS Word, your PDF readers, your web browsers, etc. You may find that useful if you prefer listening to reading.

Here is a sample presentation delivered by the IVONA UK English voice Brian.