devmor 6 hours ago

Those transcriptions are already done by LLMs in the first place - in fact, audio transcription was one of the very first large-scale commercial uses of the technology in its current iteration.

This is just like playing a game of Markov telephone, where the step in OP's solution likely has a higher compute cost than the step YT uses, because YT is interested in minimizing costs.

albertzeyer 3 hours ago

Probably just "regular" LMs, not large LMs. I assume some LM with 10-100M params or so, which is cheap to use (and very standard for ASR).