Wednesday, August 16, 2017

Improving text to speech output? Other tips and tricks?


In an app I am building, I plan to use text to speech extensively to play back a word when a button (with an image on it) is clicked.
The app is for autistic children so it is quite important the speech engine kicks out a good facsimile of the word as we would speak it.

For the most part the speech engine does a good job, but for some words and proper nouns the output is a bit odd. I have been looking at ways to improve this output.

I use GBR English.

A few examples:

Text box has the name "Rosie" in it. The speech engine knocks this out in a rushed fashion. I found that if I added another r ("Rrosie"), this improved things.

Text box has the phrase "Dogs at home" in it. Again it is a bit rushed. Adding a "." after Dogs, so "Dogs. at home", accentuates Dogs, inserts a short pause, then finishes. Much better.

Text box has the word "Vtron" in it. (Don't ask!). This comes out vTron very quickly as opposed to Veetron.  So I tried "Vee. tron" and this was better.
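The pause trick from the "Dogs at home" example could be applied automatically rather than typed by hand. This is only a rough sketch in Python, not something App Inventor runs directly; the function name `add_pause_after_first_word` is made up for illustration, and whether a full stop actually produces a pause depends on the speech engine in use.

```python
def add_pause_after_first_word(phrase: str) -> str:
    """Put a full stop after the first word so the engine may pause there."""
    first, sep, rest = phrase.strip().partition(" ")
    if not sep:
        # Single word: there is nothing to separate, so leave it alone.
        return first
    return f"{first}. {rest.lstrip()}"


print(add_pause_after_first_word("Dogs at home"))  # Dogs. at home
```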

I have not been able to find any documentation on this area of text to speech, so I wondered if anyone had any further tips on what options there are to modify things.

Also, can I change to a GBR English female voice?

--
For a GBR English female voice, the following might work: https://play.google.com/store/apps/developer?id=CereProc+Text-to-Speech&hl=en

Have you read this article? It tells how to change pitch, volume, etc. and how to change the voice: http://www.greenbot.com/article/2105862/how-to-get-started-with-google-text-to-speech.html

What you did about vTron and Veetron is a real solution. You would need to write code so that every time the word vTron is encountered, the engine says 'Veetron'. The Android text-to-speech engine is not as flexible as Microsoft's text to speech, which allows you to have an alternative vocabulary and pronunciation guide. Android's text to speech does not have this feature as far as I can tell.

AI2 can only control the following TTS properties:  http://ai2.appinventor.mit.edu/reference/components/media.html#TextToSpeech  

so you are going to have to be creative.


--
Thanks for the response. I had seen most of what you posted in the links. It is how the voice interprets the words that I am most interested in, which doesn't seem to get covered, so it looks as if my hit-and-miss approach will have to be the way. Changing the pitch sounds weird, but slowing things down a bit does help. If and when I come up with anything else I'll post it back here.

--
As long as you do not have 'special' pronunciations for lots of words, nesting several "replace all text" blocks one after another (text: your conversion text, segment: vTron, replacement: Vee Tron) might be your only solution. If you had the luxury of using the Microsoft system, you would do essentially the same thing but in their dictionary. It is all a matter of teaching the system how to pronounce words that are not in the Google dictionary.
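For anyone who prefers to see the idea as text rather than blocks, here is a rough Python sketch of the same chained replacement. It is not App Inventor code; the `PRONUNCIATIONS` table and the `to_spoken` name are made up for illustration, and the "sounds like" spellings are simply the ones tried earlier in this thread.

```python
import re

# Hypothetical table: written form -> "sounds like" form sent to the engine.
PRONUNCIATIONS = {
    "Vtron": "Vee. tron",
    "Rosie": "Rrosie",
}


def to_spoken(text: str) -> str:
    """Apply every replacement in turn, like nested replace-all blocks."""
    for written, spoken in PRONUNCIATIONS.items():
        # \b limits the match to whole words; IGNORECASE also catches "vTron".
        pattern = r"\b" + re.escape(written) + r"\b"
        # A function replacement inserts the spoken form literally.
        text = re.sub(pattern, lambda _m, s=spoken: s, text, flags=re.IGNORECASE)
    return text


print(to_spoken("vTron and Rosie"))  # Vee. tron and Rrosie
```

With only a handful of odd words the table stays small, which is the same situation where a short chain of replace blocks is manageable.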

Good luck with your worthwhile project. Let us know if this works for you.

--
Fortunately it is quite easy for me. The kids will just be tapping a button with a picture on it and the text to speech will issue back the name of the item in the picture. The trick will be to get the word to sound like what it is if it is a funny one. So I don't have to do any special on-the-fly programming. They/we operate on one word at a time, mostly without phrasing (too confusing!). Most of our kids do not speak, so this will be their voice. When it works for them it is very gratifying :)

Just ran the app on my Nexus 7 tablet and I have a female voice on there :) whilst on my HTC M8 it is male. Go figure... Ah, just had a search through the settings and I can select a female voice :) There is a high-quality version available as a 273 MB download; might try that out.

--
Good luck with the voice(s).

I worked with an electrical engineer doing something similar, but also with SpeechRecognition using Windows. We built the hardware and the software to allow a paraplegic to control an electronic device. It worked quite well 80 percent of the time, so the person was 'happy' with the result.

Since you are coding the words, you can just code the 'sounds like' version of the word. I believe there is a Google site where you can test TTS; that might be easier than experimenting on a phone (to get the synthesizer to say the word correctly). Yes, you will have to experiment. This technology has come a long way since I was working heavily in Windows, but it is FAR FROM perfect, so the developer has to intervene.

Good luck with your project and work with the youngsters.

--
I've found that simply spelling the words phonetically in global variables helps the 'sound' of a word immensely. These variables are not seen, only heard, so you don't have to worry about misspelled words. I put hyphens (-) between syllables to ensure proper pauses where needed in enunciation. It works great!
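Roughly, the idea is to keep the text shown on the button separate from the text handed to the speech engine. A minimal sketch in Python (not App Inventor blocks): the `SPOKEN_FORMS` table, the `text_to_speak` name, and the hyphenated spellings are all made-up examples, and the right spelling for each word still has to be found by ear.

```python
# Hypothetical phonetic spellings, keyed by the label shown on the button.
# These are heard, never displayed, so odd spellings do no harm.
SPOKEN_FORMS = {
    "Vtron": "Vee-tron",
    "Rosie": "Rro-sie",
}


def text_to_speak(label: str) -> str:
    """Return the phonetic spelling if one exists, otherwise the label itself."""
    return SPOKEN_FORMS.get(label, label)


print(text_to_speak("Vtron"))  # Vee-tron
print(text_to_speak("Dog"))    # Dog
```

Words the engine already pronounces well need no entry, so only the awkward ones have to be listed.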

--
