novok 3 months ago

You can also download just the audio with yt-dlp and then regenerate the subtitles with Whisper or whatever other model you want. In terms of GPU compute, that will probably cost less than asking an LLM to correct a garbled transcript.
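
A rough sketch of that pipeline, in case it helps. It assumes the `yt-dlp` and `whisper` (openai-whisper) command-line tools are installed and on the PATH. The helper names, the `subs` output directory, and the `small` model choice are placeholders for illustration, not anything from the comment above.

```python
import subprocess
from pathlib import Path


def ytdlp_audio_cmd(url, out_dir, stem="audio"):
    """Build a yt-dlp command that extracts only the audio track as MP3."""
    out_template = str(Path(out_dir) / f"{stem}.%(ext)s")
    return ["yt-dlp", "-x", "--audio-format", "mp3", "-o", out_template, url]


def whisper_srt_cmd(audio_path, out_dir, model="small"):
    """Build a whisper CLI command that writes an .srt file into out_dir."""
    return [
        "whisper", str(audio_path),
        "--model", model,  # assumption: any installed Whisper model name works here
        "--output_format", "srt",
        "--output_dir", str(out_dir),
    ]


def resubtitle(url, out_dir="subs"):
    """Download the audio for url, transcribe it, and return the .srt path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(ytdlp_audio_cmd(url, out), check=True)
    audio = out / "audio.mp3"
    subprocess.run(whisper_srt_cmd(audio, out), check=True)
    # The whisper CLI names its output after the input file's stem.
    return out / "audio.srt"
```

Calling `resubtitle("<video URL>")` would run both tools in sequence. Nothing is executed until that call, so the two command builders can be inspected or adjusted first.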

ldenoue 3 months ago | parent | next [-]

The current Flash-8B model I use costs $1 per 500 hours of transcript.

andai 3 months ago | parent [-]

If I read OpenAI's pricing right, then Google's thing is 200 times cheaper?
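
A quick back-of-the-envelope check of that ratio. The Flash-8B figure is the one quoted in the parent comment; the OpenAI figure is an assumption (the Whisper API list price of $0.006 per minute of audio), since the comment does not say which OpenAI price it compares against.

```python
# Figure from the parent comment: $1 per 500 hours of transcript.
FLASH_USD_PER_HOUR = 1.0 / 500

# Assumption: OpenAI's Whisper API list price, billed per minute of audio.
WHISPER_API_USD_PER_MINUTE = 0.006
WHISPER_API_USD_PER_HOUR = WHISPER_API_USD_PER_MINUTE * 60

# How many times cheaper the quoted Flash-8B cost is, per hour.
ratio = WHISPER_API_USD_PER_HOUR / FLASH_USD_PER_HOUR

print(f"Flash-8B:    ${FLASH_USD_PER_HOUR:.4f} per hour")
print(f"Whisper API: ${WHISPER_API_USD_PER_HOUR:.2f} per hour")
print(f"Ratio:       about {ratio:.0f}x")
```

Under that assumption the ratio comes out to roughly 180x, which is in the same ballpark as the "200 times" estimate.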

HPsquared 3 months ago | parent | prev [-]

I suppose the gold standard would be a multimodal model that also looks at the screen (maybe only if the captions aren't making much sense).