kelvinjps 4 hours ago

Google should have the tech needed for good AI transcription, so why don't they integrate it into their auto-captioning instead of offering those crappy auto subtitles?

briga 4 hours ago | parent | next [-]

Are they crappy, though? Most of the time they get things right, even if they aren't as accurate as a human. And sure, Google probably has better techniques for this, but are they cost-effective to run at YouTube scale? I think the current solution is good enough for most purposes, even if it isn't perfect.

InsideOutSanta 3 hours ago | parent | next [-]

I'm watching YouTube videos with subtitles for my wife, who doesn't speak English. For videos on basic topics where people speak clear, unaccented English, they work fine (i.e. you usually get what people are saying). If the topic is in any way unusual, the recording quality is poor, or people have accents, the results very quickly turn into a garbled mess that is incomprehensible at best, and misleading (i.e. the subtitles seem coherent, but are wrong) at worst.

wahnfrieden 3 hours ago | parent | prev [-]

Japanese auto captions suck

summerlight an hour ago | parent | prev [-]

YT is using USM, which is supposed to be their SOTA ASR model. Gemini has much better linguistic knowledge, but it would likely be prohibitively expensive to run on all the YT videos uploaded every day. This "correction" approach does seem like a nice cost-effective way to apply an LLM, though.
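
A minimal sketch of what that two-stage pipeline could look like: the cheap ASR model runs on everything, and an LLM only post-corrects the transcript text in small batches while the original timestamps are kept. Everything here is hypothetical — `llm_correct` is a stand-in for a real model call (e.g. a Gemini prompt), and the cue format is an assumed `(start_ms, end_ms, text)` tuple, not YouTube's actual caption format.

```python
def llm_correct(text: str) -> str:
    """Placeholder for an LLM call that fixes ASR errors in a text chunk.

    In a real pipeline this would prompt a model with the raw chunk and
    ask for a corrected version; here it's a toy lookup table.
    """
    fixes = {"there services": "their services"}  # toy correction table
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text


def correct_captions(cues, chunk_size=5):
    """Post-correct ASR caption cues in batches to keep LLM calls cheap.

    `cues` is a list of (start_ms, end_ms, text) tuples; timestamps are
    preserved and only the text is rewritten.
    """
    corrected = []
    for i in range(0, len(cues), chunk_size):
        batch = cues[i:i + chunk_size]
        # Join the batch so the model sees enough context to fix homophones.
        joined = "\n".join(text for _, _, text in batch)
        fixed_lines = llm_correct(joined).split("\n")
        corrected += [(s, e, t) for (s, e, _), t in zip(batch, fixed_lines)]
    return corrected


cues = [(0, 1200, "welcome to there services"),
        (1200, 2400, "today we cover ASR")]
print(correct_captions(cues))
```

The design point is that the expensive model only ever sees text, never audio, which is presumably where most of the cost savings over running a large model end-to-end would come from.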