▲ | devmor 6 hours ago | |
Those transcriptions are already done by LLMs in the first place - in fact, audio transcription was one of the very first large scale commercial uses of the technology in its current iteration. This is just like playing a game of markov telephone where the step in OP's solution is likely higher compute cost than the step YT uses, because YT is interested in minimizing costs. | ||
▲ | albertzeyer 3 hours ago | parent [-] | |
Probably just "regular" LMs, not large LMs, I assume. I assume some LM with 10-100M params or so, which is cheap to use (and very standard for ASR). |