hunter2_ 3 months ago

With this context, it seems as though correction-by-LLM might be a net win among your Deaf/HoH friends even if it would be a net loss for you: you can correct on the fly better than an LLM probably would, while the opposite is more often true for them, due to differences in experience with phonetics?

Soundex [0] is a prevailing method of codifying phonetic similarity, but unfortunately it focuses exclusively on names. Any correction-by-LLM really ought to generate substitution probabilities weighted heavily toward something like that, I would think.

[0] https://en.wikipedia.org/wiki/Soundex
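For reference, American Soundex is simple enough to fit in a few lines. A sketch in Python, following the vowel and h/w rules described on that Wikipedia page (this is an illustrative implementation, not library code):

```python
def soundex(name: str) -> str:
    """American Soundex: first letter + three digits, e.g. 'Robert' -> 'R163'."""
    codes = {**{c: "1" for c in "bfpv"}, **{c: "2" for c in "cgjksxz"},
             **{c: "3" for c in "dt"}, "l": "4",
             **{c: "5" for c in "mn"}, "r": "6"}
    name = name.lower()
    first = name[0]
    digits = []
    prev = codes.get(first, "")  # the first letter's code still blocks duplicates
    for ch in name[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do NOT reset the previous code
        code = codes.get(ch)
        if code is None:
            prev = ""  # vowels (a, e, i, o, u, y) reset, so repeats re-encode
            continue
        if code != prev:
            digits.append(code)
        prev = code
    # Keep the first letter, pad with zeros, truncate to four characters total.
    return (first.upper() + "".join(digits) + "000")[:4]
```

Note how "Robert" and "Rupert" collapse to the same code, which is exactly the kind of phonetic bucketing a correction model could weight substitutions by.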

novok 3 months ago | parent | next [-]

You can also download just the audio with yt-dlp and then regenerate the subtitles with Whisper or whatever other model you want. In terms of GPU compute, that will probably cost less than asking an LLM to correct a garbled transcript.

ldenoue 3 months ago | parent | next [-]

The current Flash-8B model I use costs $1 per 500 hours of transcript.

andai 3 months ago | parent [-]

If I read OpenAI's pricing right, then Google's thing is 200 times cheaper?

HPsquared 3 months ago | parent | prev [-]

I suppose the gold standard would be a multimodal model that also looks at the screen (maybe only if the captions aren't making much sense).

schrodinger 3 months ago | parent | prev | next [-]

I'd assume Soundex is too basic and English-centric to be a practical solution for an international company like Google. I was taught it and implemented it in a freshman-level CS course in 2004; it can't be anywhere near state of the art!

shakna 3 months ago | parent | prev [-]

Soundex is fast but inaccurate. It only prevails because of the computational cost of alternatives like Levenshtein distance.
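To make that trade-off concrete: Soundex is a single linear pass over one string, while Levenshtein distance is an O(m·n) dynamic program over every pair of positions in two strings, and comparing one word against a whole dictionary multiplies that cost by the dictionary size. A minimal two-row sketch of the classic algorithm:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # Row i holds distances from a[:i] to every prefix of b; only two
    # rows are kept at a time, so memory is O(len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]
```

So a Soundex bucket lookup is one hash probe, whereas ranking candidates by edit distance costs a full quadratic table per candidate, which is why the cruder method survives at scale.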