Tuesday, June 27, 2017

Google Releases Huge Text-To-Speech (TTS) For Android Update v3.0

I love Update Wednesdays, and today we've already seen pretty decent updates to several Google apps. As you've already seen, Google Play Games was updated to v1.5, but the one I'm excited about the most is, without a doubt, Google TTS v3.0, which made a jump today from v2.4.

What’s New
So, what's so cool about TTS 3.0?

High-quality voices

First and foremost, Text-to-Speech 3.0 adds support for high quality voices. Just how drastic the difference is in day-to-day usage in various apps remains to be seen, but as far as file sizes go, we're talking about a jump from 5-6MB to 200+MB. The female English UK voice is a whopping 276MB.

I shot a quick video to demonstrate the difference between English (US) and the new high-quality English (US) as well as the high-quality English UK. The new voices aren't as robotic and don't seem as high-pitched, but don't expect to be totally blown away here:

Update: I have tested the new high-quality languages in Google Search, and the results are mixed. Responses vary based on the questions asked - for example, "what is 3+2" seems to be streaming speech from the server, and while it does honor the language you choose in Search's settings, it doesn't use local TTS at all. I had male English UK selected, but the result came back in female English UK.
However, something like "what is my next appointment" does use the new voices, with one pretty big caveat - as it turns out, loading such large voice packs could take up to 10 seconds even on a Nexus 5, which means the voice output could be severely delayed the first time you run Search after a reboot. Google has a lot of room for optimizations here, that's for sure.
Oh, and for whatever reason, Google Maps ignores the TTS settings altogether, it seems.

English (UK) male voice
Did you notice in the screenshot above that English UK is no longer limited to the default female voice? You can now select regular or high-quality variants of both female and male voices.

New languages: Portuguese (Brazil) and Spanish (United States)
The list of supported languages is going up - v3.0 adds support for Portuguese (Brazil) and Spanish (US).

UI tweaks
Thanks to the addition of multiple voices per language, a toggle has been added to each language with 2 or more downloaded voice packs. This way you can switch between them on the fly.
The previous UI was much less flexible and only allowed downloading and deleting of 5 languages, as demonstrated below. As you can see, the Delete button has been replaced with a more elegant trash can.
As you probably already noticed, voice pack sizes are now visible, whereas the previous UI simply offered a Download button without any extra info.

 Left: Google TTS 2.4. Right: Google TTS 3.0

If you notice anything else I may have missed, don't hesitate to let me know. I also welcome any feedback regarding the quality of new high-quality voices in real-world applications. Do you hear a big difference? Are you psyched about the sexy male English UK voice? Let it all out.

The APK is signed by Google and upgrades your existing app. The cryptographic signature matches the one on the installed app, which indicates that the file was not tampered with after Google signed it. Rather than wait for Google to push this download to your devices, which can take days, download and install it just like any other APK.

File name: com.google.android.tts-

Version: (Android 4.0+).
MD5: d66ccbd9674affc62cebce5ffb201717.
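If you want to check a download against the MD5 listed above, a minimal sketch in Python could look like the following. The helper names are made up for illustration, and the APK path you pass in is whatever you saved the file as. Keep in mind that an MD5 match only tells you the file is the same one the checksum was computed from; it is not a substitute for the signature check Android performs at install time.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: Path, published: str) -> bool:
    """Compare a file's MD5 with a published value, ignoring case and surrounding whitespace."""
    return md5_of_file(path) == published.strip().lower()
```

For example, `matches_published_md5(Path("tts.apk"), "d66ccbd9674affc62cebce5ffb201717")` would return True only if the hypothetical `tts.apk` is byte-for-byte the file this post describes.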

The 7 Best Android Text-To-Speech Engines

One of the coolest things about Android is how you can plug and play different components. There are numerous web browsers, email clients, music players, and text-to-speech (TTS) engines. At iHear Network, we get an inside look at which TTS engines people are using. What we have seen is roughly what you would expect: people with better TTS engines listen to more content than people with the default or worse engines.

I have personally used a number of different engines and voices. I evaluate a voice by the following characteristics:

⦁ Sonic Quality - Does it have a high enough resolution sample to be believable?
⦁ Meter - Is the pace of speaking consistent with what you would expect?
⦁ Variance - Does the voice account for any difference based on the inflection of the word in context? This is generally pretty hard to do, so voices that excel here get a few bonus points, but it can be taken too far (see below).

To put a numeric value on the quality of a voice, we can compare the average number of plays per user for a given engine vs the average across all engines to obtain a relative playability score. Here is how they stack up:
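As an aside, here is a rough sketch of how a relative score like that could be computed. It assumes "the average across all engines" means total plays divided by total users, which the article does not spell out, and the engine names and usage numbers are invented purely for illustration. How the ratio is then mapped onto the 1-10 Playability scale is not described, so the sketch stops at the ratio.

```python
from dataclasses import dataclass


@dataclass
class EngineUsage:
    name: str
    users: int
    plays: int

    @property
    def plays_per_user(self) -> float:
        return self.plays / self.users


def relative_playability(usage: list[EngineUsage]) -> dict[str, float]:
    """Ratio of each engine's plays per user to the plays per user across all engines."""
    total_users = sum(engine.users for engine in usage)
    total_plays = sum(engine.plays for engine in usage)
    overall = total_plays / total_users  # assumption: pooled average, not a mean of means
    return {engine.name: engine.plays_per_user / overall for engine in usage}


# Invented sample data, not iHear Network's real numbers.
sample = [
    EngineUsage("Engine A", users=100, plays=1000),
    EngineUsage("Engine B", users=300, plays=600),
]
scores = relative_playability(sample)
```

A ratio above 1 means users of that engine listen to more than the typical user, and a ratio below 1 means they listen to less.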

[Update: The total for IVONA is now correct. Thanks for reporting that.]

IVONA Text-to-Speech HQ (FREE - ~250MB/voice)
Quality: 10  Meter: 8  Variance: 8  Playability: 7  Total: 33

Ivona voices are some of the best-sounding voices, in my opinion, especially considering that they are all currently free. They seem to have more variation in how they say certain phrases, which makes longer articles more digestible and less robotic. Ivona achieves this with a much larger data package, which we recommend you download over WiFi to avoid extra data charges. Some people complain that Ivona is slow to start talking or that they experience delays during speech. I suspect this relates to how big the data set is and how the engine synthesizes speech. iHear Network uses advanced speech caching to address this issue and deliver a very fluid experience with all engines, so your experience with IVONA may differ in other apps.

Classic Text To Speech Engine ($2.99 - ~10-20MB/voice)
Quality: 7  Meter: 8  Variance: 7  Playability: 10  Total: 32

SVOX Classic covers the most languages of any engine (40+) and has a few different voices for English. Victoria is my favorite. She has great meter and enough variance to keep it non-robotic. The sonic quality is decent, but not the best. One thing we noticed while working with SVOX voices is that it is actually very hard to switch between different variants for a particular language-country combo. iHear Network has deeply integrated with SVOX to provide the best TTS experience. SVOX users listened to over twice as many articles as IVONA users, and over 10 times as many as users of most other engines.

CereProc (~$1.60 - ~140-170MB/voice)
Quality: 9  Meter: 9  Variance: 8  Playability: N/A  Total: *26*

I've tried the Idyacy Dodo Glasgow voice, and it's nearly incomprehensible. They also have Dog and Pig Latin. I'm not joking, you can really have your phone bark at you. However, the premium voices for CereProc are based on years of development for desktop deployments and they now have some of those voices available for Android. The download for Caitlin is 141MB, but automatically downloads to the SD card (at least on 2.3+). Caitlin is about the same quality as SVOX Victoria UK and close to Ivona Amy UK. I also checked out Adam (171MB, $1.60), and it is the most natural sounding English US voice that I have heard yet. Ivona's Kendra US voice is decent, as is SVOX's Grace voice, but Adam seems the most human to me, both in terms of meter and variance, with top notch sound quality.
*NOTE* CereProc doesn't have enough users to have a very meaningful Playability score, but I would expect it to be in the 8-10 range, which would put its total up to the 34-36 range.
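The totals in this list are simply the four ratings added together, with a missing Playability score counted as zero. Here is a small sketch of that arithmetic, including the projected CereProc range from the note above. The function name is my own.

```python
from typing import Optional


def total_score(quality: int, meter: int, variance: int, playability: Optional[int]) -> int:
    """Sum the four ratings; a missing playability score (N/A) counts as zero."""
    return quality + meter + variance + (playability or 0)


# CereProc has no Playability score yet, so its total covers three ratings only.
cereproc_partial = total_score(9, 9, 8, None)

# Projected range if Playability lands between 8 and 10, as guessed in the note.
cereproc_projected = (cereproc_partial + 8, cereproc_partial + 10)
```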

Loquendo TTS Susan (~$5.36 - ~42MB)
Quality: 7  Meter: 6  Variance: 10  Playability: 4  Total: 27

Susan is an interesting voice. She is pretty emotional, which makes her a bit more human, but sometimes it goes over the top. Susan will convert certain emoticons into laughing, crying, and a couple of other sounds, which is unique to Loquendo. However, we've found that on longer text, she tends to get a little jumbled up. This might be due to some minor instability in the underlying engine. I'd be curious to see what more people think about this engine. Considering the high price of Loquendo, I would expect much more.

Google TTS (Free, limited availability)
Quality: 5  Meter: 6  Variance: 5  Playability: 6  Total: 22

This engine comes with some devices and, ironically, is not available on Google Play. I have a Samsung Tab 2 7.0 that came with some of these voices, and they seem to be the replacement for the Pico voices. The British voice is male, which is an improvement over the old Pico UK voice, but we were hoping for something better on ICS/Samsung.

SVOX Pico (Free, limited availability)
Quality: 5  Meter: 7  Variance: 5  Playability: 4  Total: 21

This is the default voice installed on most Android devices (78% of iHear Network's user base), and it has a few drawbacks. First, it might not have the TTS data installed when you first get your phone. This often leads to a case where the worst possible voice is used as the default and no international options are available. Second, the Pico voices, at least for English, sound pretty robotic, especially the US one. For this reason, we typically default to the GB voice on new installs. Note that Pico was developed by SVOX, after which they started building and selling premium voices branded as SVOX Classic.

Samsung TTS (FREE on Samsung devices)
Quality: 6  Meter: 6  Variance: 5  Playability: 4  Total: 21

Our tests with the Samsung voices show that, for some reason, the volume of the voice drops way down. Even if it weren't buggy, it still wouldn't sound any better than Pico or Google TTS.

ESpeak (Not one of the top 7)
Quality: 2  Meter: 3  Variance: 1  Playability: N/A  Total: 6

This is possibly the worst voice I've ever heard. Switching between variants has always been a problem with Android's TTS API, but I know a lot of tricks to get it to work, and even I had problems with this engine. ESpeak sounds worse than old school Stephen Hawking. I tried to download EasyTts, which supposedly is an upgrade of ESpeak, but it just sounded like Pico US.

In summary, it's really hard to beat Ivona, considering that it is currently free, unless you don't want to download the 250+ MB data packages. CereProc voices seem to be the best value for the money of all the paid voices, and it will be very interesting to see if they can catch up with SVOX in popularity. Considering that you get as good an experience or better for almost half the price, I can't see why not. I look forward to the day when all Android devices come with a nice premium voice as a default, but until then, there are some good options available on Google Play.

Don't forget to check out iHear Network for a great Social News Narrator that reads Twitter, Facebook, and Pocket to you out loud.

How to show a list of names & when one is clicked it shows an image

I just dabble with AI2 and have followed some tutorials. I'm wondering what the blocks would be to make a screen with a list of names on it, where clicking a name shows that person's photo. I set up a vertical arrangement (2 rows, 1 column) with 2 horizontal scrolls inside. In these I'm putting buttons; each one has a person's name (e.g. Joe, Bob, Tom).

Bestow your wisdom upon me, internet!

It might be more appropriate to use the ListView rather than try to build your own custom list mechanism. In the ListView.AfterSelection event you can then set the Image.Source property of an Image object to the right photo for the given selection.

Use a list of names with a Spinner, ListPicker, or ListView,
then after selecting a name, use that value to set the image in an Image component that corresponds to the person's name.

Either upload the images as assets to the app (a good way to start), or better have them online somewhere and call them in as needed

Ok, that makes sense. I was originally going to make a home screen with a list of names and then a screen for each name (but I guess having 40 screens is a bad idea). Can anyone maybe get me started on what the blocks would look like?

Ok, I've got those blocks and that gives me a list, with names, that I can choose from. Now how do I get selecting a name to bring up a specific image?

I've got this so far. The list itself works but I can't get it to show the image when I pick the name. Help?

Here is the easiest way to do it:

Name your images the same as your "Names" (Bob > Bob.png, Bill > Bill.png, etc.)
Upload images to your project as assets


Blocks like this
Aia attached

There are many many different ways to achieve the same thing!

Ok, Tim, you are awesome. I have a bunch of questions.

Why initialize global names to an empty list?
I was missing the top two blocks (the empty list one and the Screen initialize). Don't really know what they do for this app.
ListPicker before picking, I understand that. You want to call up the names.
ListPicker after picking. Yep, I want the label's text to be the name chosen from the list.

I don't understand setting the image's picture to join the list picker's selection. It looks like it means the list picker's selection is tied to that specific image that I put in. If I have an image for every name, how do I get those to show up too?

Also, I subbed in my own names and images (in .png) and when I ran the emulator the image didn't show.

I got it!!! The join takes whatever name is picked and sticks ".png" on the end, so the block just says .png. I was thinking I had to put in the specific image name and then make twenty other blocks or something. Ok, I got it to work. Amazing!

So what are some of the other ways to achieve the same result? I'm on a roll now.

Nope, confused again.
I have a space between the first and last names, so I did the same in the image names (i.e. "Joe Schmo.png"). I uploaded it and it removed the spaces. So to get the image to show with the name, I have to alter the name in the list so it doesn't have the space either.
How can I preserve the space? I don't really want to have a dash or underscore on the name.

OK. AI2 does that to any filenames with spaces (it is a good thing really!)

To overcome this in your app:

Names in list:

Bob Builder
Bill Flowerpot
Ben Flowerpot

Names of image files as uploaded to assets


Blocks magic:

Using the replace block you simply remove the space between the first name and last name as returned by the ListPicker. As indicated above, "segment" has a space in it (one press of the spacebar). AI2 will replace this space with "nothing".
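For anyone who thinks better in text than in blocks, the same logic can be sketched in Python. This is only an analogue of the join and replace blocks, not AI2 code, and the function name is made up for illustration.

```python
def image_asset_for(selection: str, extension: str = ".png") -> str:
    """Build the asset file name for a picked name: drop the spaces, then append the extension."""
    return selection.replace(" ", "") + extension


# The name shown in the list keeps its space; only the file name loses it.
picked = "Bob Builder"
asset = image_asset_for(picked)
```

The list can keep "Bob Builder" with the space for display, because the space is only removed at the moment the file name is built.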

Works like a charm! What if I wanted to have more text after the name, on the next line, for example. How would I tie a separate paragraph or phone numbers to each person?

In essence you have all the building blocks in place already to do this

Add another label below the "name" label
Create another list with your additional texts
Use the index feature of the lists to select the correct item to add to the new label, e.g. Bob's name = index 1 in the names list, Bob's description = index 1 of the additional texts list

Have a go!
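The idea of tying two lists together by position can be sketched in Python like this. It is an analogue rather than AI2 blocks (in AI2 the rough equivalents would be the "index in list" and "select list item" blocks), and the description texts are placeholders I invented.

```python
names = ["Bob Builder", "Bill Flowerpot", "Ben Flowerpot"]

# Placeholder texts; item N here belongs to name N in the list above.
details = ["Details for Bob", "Details for Bill", "Details for Ben"]


def details_for(selected: str, names: list[str], details: list[str]) -> str:
    """Find where the selected name sits in the names list and return the text at the same position."""
    position = names.index(selected)  # raises ValueError if the name is not in the list
    return details[position]


bill_text = details_for("Bill Flowerpot", names, details)
```

The two lists must stay the same length and in the same order, otherwise names and texts drift apart.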

In theory I understand. You use the index as a thing to tie two things in different lists together. I need help with blocks though. I see a bunch of options under lists involving indexes, and I'm not sure how I tie two lists together. I'm guessing it's not replace or remove but that's about it.

Also, I get this every time I try to start the emulator.

"Launch the aiStarter program on your computer and then try again."

No clue why. It worked a few days ago. I've tried launching aiStarter then connecting with the emulator. Tried resetting the connection, reopening aiStarter, etc. Pretty frustrating.

Never mind, killed adb.exe and it worked. Having a weird thing happen though. I made an app just like the first directory sort of thing, just with different names and photos. For some reason when I run the app and click the button to show the list, nothing comes up. The original app still works but I don't know why this one won't. Blocks are the same.

These blocks show how

Select list item: Attempt to get item number 13 of a list of length 0: ()
Note: You will not see another error reported for 5 seconds.
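That message says the list was empty at the moment item 13 was requested, which suggests (this is a guess from the message alone) that the list had not been filled yet when the lookup ran. A defensive lookup can be sketched in Python like this. It mimics 1-based positions as in AI2, and the function name is hypothetical.

```python
def select_list_item(items: list[str], index: int, default: str = "") -> str:
    """Look up a 1-based position, returning a default instead of failing when it is out of range."""
    if 1 <= index <= len(items):
        return items[index - 1]
    return default


# An empty list no longer causes an error; the caller gets the default back.
missing = select_list_item([], 13)
found = select_list_item(["first", "second"], 2)
```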

Alright, apps are good to go! I want to let several people download the app, but for privacy I'm not putting it on Google Play. I was planning on putting it on a website hosted through Wix. I downloaded my .apk but Wix doesn't support uploading .apk files into the media storage there. (If I wanted to do the same with a PDF I would just upload the file to the media storage and then add it to the page. Then people could view or download it.)

How can I get this in some shareable form on my Wix site? I can't use the QR code option because not everyone will be downloading the app within the 2 hour limit. Perhaps I could package it in a zip file? I'd rather not do this because everyone will have to unzip it on their phones.

Have you tried uploading the .apk file to your Google Drive account and marking it shared with anyone who has the URL?

Then send your friends the URL, and two hours later turn off the sharing.