Tuesday, June 27, 2017

The 7 Best Android Text-To-Speech Engines


One of the coolest things about Android is how you can plug and play different components. There are numerous web browsers, email clients, music players, and text-to-speech (TTS) engines to choose from. At iHear Network, we get an inside look at which TTS engines people are using. What we have seen is roughly what you would expect: people with better TTS engines listen to more content than people with the default or weaker engines.


I have personally used a number of different engines and voices. I evaluate a voice by the following characteristics:


⦁ Sonic Quality - Are the voice samples high-resolution enough to be believable?
⦁ Meter - Is the pace of speech what you would expect?
⦁ Variance - Does the voice adjust a word's inflection to its context? This is generally hard to do, so voices that excel here get a few bonus points, but it can be taken too far (see below).

To put a numeric value on the quality of a voice, we can compare the average number of plays per user for a given engine with the average across all engines to obtain a relative playability score. The total is the sum of the four scores. Here is how they stack up:

[Update: The total for IVONA is now correct. Thanks for reporting that.]

IVONA Text-to-Speech HQ (FREE - ~250MB/voice)
Quality: 10  Meter: 8  Variance: 8  Playability: 7  Total: 33

Ivona voices are some of the best-sounding voices, in my opinion, especially considering that they are all currently free. They seem to have more variations in how they say certain phrases, which makes longer articles more digestible and less robotic. Ivona achieves this with a much larger data package that you need to download, which we recommend doing over Wi-Fi to avoid extra data charges. Some people complain that Ivona is slow to start talking or that they experience delays during speech. I suspect this is related to the size of the data set and to how the engine synthesizes speech. iHear Network has advanced speech caching that addresses this issue and delivers a very fluid experience with all engines, so your experience with IVONA may vary in other apps.


SVOX Classic Text To Speech Engine ($2.99 - ~10-20MB/voice)
Quality: 7  Meter: 8  Variance: 7  Playability: 10  Total: 32


It covers the most languages of any engine (40+), and it offers a few different voices for English. Victoria is my favorite: she has great meter and enough variance to keep her from sounding robotic. The sonic quality is decent, but not the best. One thing we noticed while working with SVOX voices is that it is actually very hard to switch between different variants for a particular language-country combination. iHear Network has integrated deeply with SVOX to provide the best TTS experience. SVOX users listened to more than twice as many articles as IVONA users, and more than 10 times as many as users of most other engines.


CereProc (~$1.60 - ~140-170MB/voice)
Quality: 9  Meter: 9  Variance: 8  Playability: N/A  Total: *26*

I've tried the Idyacy Dodo Glasgow voice, and it's nearly incomprehensible. They also offer Dog and Pig Latin voices. I'm not joking: you really can have your phone bark at you. However, CereProc's premium voices are built on years of development for desktop deployments, and some of those voices are now available for Android. The Caitlin download is 141MB, but it goes to the SD card automatically (at least on Android 2.3+). Caitlin is about the same quality as SVOX Victoria UK and close to Ivona Amy UK. I also checked out Adam (171MB, $1.60), and it is the most natural-sounding US English voice I have heard yet. Ivona's Kendra US voice is decent, as is SVOX's Grace voice, but Adam seems the most human to me in terms of both meter and variance, with top-notch sound quality.
*NOTE* CereProc doesn't have enough users for a meaningful Playability score, but I would expect it to land in the 8-10 range, which would raise its total to 34-36.
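To make the Playability and Total columns concrete, here is a minimal Python sketch. It assumes that "the average across all engines" means total plays divided by total users, pooled over every engine; the engine names and play counts in the usage lines are hypothetical, not iHear Network data. The post doesn't say how the ratio is mapped onto the 10-point Playability scale, so the sketch stops at the ratio. The total simply sums the category scores and, as with CereProc, leaves Playability out when there isn't enough data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EngineStats:
    name: str
    users: int  # users listening with this engine (hypothetical figures)
    plays: int  # total plays by those users (hypothetical figures)


def plays_per_user(engine: EngineStats) -> float:
    return engine.plays / engine.users


def relative_playability(engine: EngineStats, all_engines: List[EngineStats]) -> float:
    """Engine's plays per user divided by the pooled plays per user of all engines.

    1.0 means an average engine; 2.0 means twice as many plays per user.
    Assumption: the baseline is pooled over all users, not a mean of the
    per-engine averages.
    """
    total_plays = sum(e.plays for e in all_engines)
    total_users = sum(e.users for e in all_engines)
    return plays_per_user(engine) / (total_plays / total_users)


def total_score(quality: int, meter: int, variance: int,
                playability: Optional[int] = None) -> int:
    """Sum of the category scores; playability=None stands for "N/A"."""
    scores = [quality, meter, variance]
    if playability is not None:
        scores.append(playability)
    return sum(scores)


if __name__ == "__main__":
    # Hypothetical usage numbers, for illustration only.
    engines = [
        EngineStats("engine_a", users=100, plays=2000),
        EngineStats("engine_b", users=300, plays=3000),
    ]
    for e in engines:
        print(e.name, round(relative_playability(e, engines), 2))

    print(total_score(10, 8, 8, 7))  # IVONA row -> 33
    print(total_score(9, 9, 8))      # CereProc, Playability N/A -> 26
    print(total_score(9, 9, 8, 8), total_score(9, 9, 8, 10))  # projected -> 34 36
```

A pooled baseline lets engines with many users pull the average toward themselves; taking the mean of per-engine averages would weight every engine equally instead. Either reading fits the description of the score.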

Loquendo TTS Susan (~$5.36 - ~42MB)
Quality: 7  Meter: 6  Variance: 10  Playability: 4  Total: 27

Susan is an interesting voice. She is pretty emotional, which makes her a bit more human, but sometimes she goes over the top. Susan converts certain emoticons into laughing, crying, and a couple of other sounds, a feature unique to Loquendo. However, we've found that on longer texts she tends to get a little jumbled. This might be due to some minor instability in the underlying engine. I'd be curious to hear what other people think of this engine. Considering Loquendo's high price, I would expect much more.

Google TTS (Free, limited availability)
Quality: 5  Meter: 6  Variance: 5  Playability: 6  Total: 22

This engine comes with some devices and, ironically, is not available on Google Play. I have a Samsung Galaxy Tab 2 7.0 that came with some of these voices, and they seem to be the new replacement for the Pico voices. The British voice is male, which is an improvement over the old Pico UK voice, but we were hoping for something better on Ice Cream Sandwich (ICS) and Samsung hardware.

SVOX Pico (Free, limited availability)
Quality: 5  Meter: 7  Variance: 5  Playability: 4  Total: 21

This is the default voice installed on most Android devices (78% of iHear Network's user base), and it has a few drawbacks. First, its TTS data might not be installed when you first get your phone. This often means that the worst possible voice is used as the default and no international options are available. Second, the Pico voices, at least for English, sound pretty robotic, especially the US one. For this reason, we typically default to the GB voice on new installs. Note that Pico was developed by SVOX, which later began building and selling premium voices branded as SVOX Classic.

Samsung TTS (FREE on Samsung devices)
Quality: 6  Meter: 6  Variance: 5  Playability: 4  Total: 21

In our tests with the Samsung voices, the volume drops way down for some reason. Even if it weren't buggy, it still wouldn't sound any better than Pico or Google TTS.



ESpeak (Not one of the top 7)
Quality: 2  Meter: 3  Variance: 1  Playability: N/A  Total: 6


This is possibly the worst voice I've ever heard. Switching between variants has always been a problem with Android's TTS API, and although I know a lot of tricks to make it work, even I had problems with this engine. ESpeak sounds worse than old-school Stephen Hawking. I tried EasyTts, which is supposedly an upgrade of ESpeak, but it just sounded like Pico US.

In summary, Ivona is really hard to beat, given that it is currently free, unless you don't want to download the 250+ MB data packages. CereProc voices seem to be the best value for the money among the paid voices, and it will be very interesting to see whether they can catch up with SVOX in popularity. Considering that you get an equal or better experience for almost half the price, I can't see why not. I look forward to the day when all Android devices come with a nice premium voice by default, but until then, there are some good options available on Google Play.

Don't forget to check out iHear Network, a great Social News Narrator that reads Twitter, Facebook, and Pocket aloud to you.

See Also:
http://www.addictivetips.com/mobile/ivona-is-high-quality-alternative-to-stock-android-text-to-speech-engine/
http://www.androidcentral.com/android-app-review-svox-classic-tts-engine
http://www.androidpolice.com/2011/11/14/loquendos-tts-engines-are-miles-ahead-of-others-including-picotts/

