devmor 10 months ago

Those transcriptions are already done by LLMs in the first place; in fact, audio transcription was one of the very first large-scale commercial uses of the technology in its current iteration.

This is just like playing a game of Markov telephone, where the step in OP's solution likely has a higher compute cost than the step YT uses, because YT is interested in minimizing costs.

albertzeyer 10 months ago

Probably just "regular" LMs, not large LMs, I'd guess. Something with 10-100M params or so, which is cheap to run (and very standard for ASR).

devmor 10 months ago

Could be. I ran through some offline LMs for voice-assisted home automation a couple of years ago, and they were subpar compared to even the pathetic offering that YouTube provides, but Google of course has far more focused resources to fine-tune a small-dataset model.