leetharris | 5 hours ago
The Google ASR is one of the worst on the internet. We run benchmarks of the entire industry regularly, and the only hyperscaler with a good ASR is Azure. Microsoft acquired Nuance for $20b a while ago, and they have a solid lead in the cloud space. And to run it on a "free" product, Google probably uses a very tiny, heavily quantized version of its already weak ASR. There are lots and lots of better meeting bots if you don't mind paying, or if your usage is low enough to fit in a free tier. At Rev we give away something like 300 minutes a month.
jll29 | an hour ago
Interesting. Do you have any peer-reviewed scientific publications or technical reports on this work? We also compared Amazon, Google, and Microsoft Azure, as well as a bunch of smaller players (from Edinburgh and Cambridge), and, consistent with what you reported, we also found Google ranked worst. But that was a one-off study from 2019 (unpublished) on financial news. Word Error Rate (WER), the standard metric for the task, is not everything. For some applications, the ability to upload custom lexicons is paramount: word-based ASR systems (almost all of them), as opposed to phoneme-based ones, require each word to be defined before they can recognize it.
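For readers unfamiliar with WER: it is the word-level edit distance (substitutions + deletions + insertions) between a system's transcript and a reference transcript, divided by the number of words in the reference. A minimal Python sketch, assuming plain whitespace tokenization and no text normalization (real benchmarks typically lowercase and strip punctuation first; the name `word_error_rate` is just illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization with no casing/punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits to turn the first i-1 ref words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # turning ref[:i] into nothing = i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is one reason a single WER number doesn't capture everything (e.g. whether domain terms from a custom lexicon were recognized).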
baxtr | 5 hours ago
Very interesting. Thanks for sharing. Since you have experience in this, I'd like to hear your thoughts on a common assumption. It goes like this: don't build anything that would be a feature for a hyperscaler, because ultimately they win. I guess a lot of it is a question of timing?
| ||||||||||||||
aftbit | 3 hours ago
Are there any self-hosted options that are even remotely competitive? I have tried Whisper2 a fair bit, and it seems to work okay in very clean situations, like adding subtitles to movie dialog, but not so well when dealing with multiple speakers or poor audio quality.