Probably quite expensive over the whole catalog, but the Berkeley content would be cheap to do.

If it's, say, 5000 hours, then running it through the best model at AssemblyAI with no discounts would cost less than $2000. I know someone could do it more cheaply with Whisper, and there would likely be discounts at this volume, but even in the worst case it seems very doable, even for an individual.
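
A quick back-of-the-envelope check of that figure, taking the 5000-hour catalog size as given: for the total to stay under $2000, the flat per-hour rate has to be below $0.40. The rate computed here is that implied ceiling, not a quoted AssemblyAI price, and the function names are illustrative.

```python
def transcription_cost(hours: float, usd_per_hour: float) -> float:
    """Total cost of transcribing `hours` of audio at a flat per-hour rate."""
    return hours * usd_per_hour


def max_rate_for_budget(hours: float, budget_usd: float) -> float:
    """Highest flat per-hour rate that keeps the whole job within budget."""
    return budget_usd / hours


CATALOG_HOURS = 5000  # the "say, 5000 hours" estimate above
BUDGET_USD = 2000     # the "less than $2000" ceiling above

ceiling = max_rate_for_budget(CATALOG_HOURS, BUDGET_USD)
print(f"Implied ceiling: ${ceiling:.2f} per audio hour")  # prints: Implied ceiling: $0.40 per audio hour
```

Any provider whose headline rate is under that ceiling keeps the whole catalog within the $2000 estimate.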


ldenoue 7 months ago

My repo doesn't reprocess the audio track. Instead, it improves the raw ASR transcript by feeding the LLM additional information (the title and description) and asking it to fix errors.

It isn't perfect (it sometimes replaces words with a synonym), but it is much faster and cheaper.
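
The approach described above (keep the ASR output, give the LLM the title and description as context, and ask it to correct errors) comes down to building a prompt. Below is a minimal sketch of that step, plus a word-level diff using Python's `difflib` that flags every substitution so synonym swaps can be reviewed by hand. The prompt wording and function names are hypothetical, not taken from the repo, and no model API is called.

```python
import difflib


def build_correction_prompt(title: str, description: str, transcript: str) -> str:
    """Assemble a prompt asking an LLM to fix ASR errors using video metadata.

    The wording is an illustrative assumption, not the repo's actual prompt.
    """
    return (
        "You are correcting an automatic speech recognition (ASR) transcript.\n"
        f"Video title: {title}\n"
        f"Video description: {description}\n\n"
        "Fix misrecognized words, names and technical terms using the context above. "
        "Do not paraphrase or replace words with synonyms; "
        "return only the corrected transcript.\n\n"
        f"Transcript:\n{transcript}"
    )


def flag_substitutions(raw: str, corrected: str) -> list[tuple[str, str]]:
    """Return (original, replacement) word spans that the LLM changed."""
    raw_words, new_words = raw.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=raw_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # words swapped for different words
            changes.append((" ".join(raw_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


raw = "the lecture covers eigen vectors and the fourier transform"
fixed = "the talk covers eigenvectors and the Fourier transform"
print(flag_substitutions(raw, fixed))
# [('lecture', 'talk'), ('eigen vectors', 'eigenvectors'), ('fourier', 'Fourier')]
```

The second and third changes are genuine ASR fixes; the first ("lecture" to "talk") is the kind of synonym swap a reviewer would want to revert.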

Gemini 1.5 Flash-8B is cheap enough that it costs $1 per 500 hours of transcript.
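
Applied to the 5000-hour catalog estimate from the parent comment, that rate works out to about $10, roughly 200 times less than the under-$2000 re-transcription figure. A sketch of the arithmetic, taking the quoted $1 per 500 hours at face value (Gemini API pricing is actually per token, so this is the commenter's conversion, not a list price):

```python
USD_PER_500_TRANSCRIPT_HOURS = 1.0  # rate quoted in the comment above


def correction_cost(transcript_hours: float) -> float:
    """LLM clean-up cost at the quoted $1 per 500 hours of transcript."""
    return transcript_hours * USD_PER_500_TRANSCRIPT_HOURS / 500


print(correction_cost(5000))         # 10.0
print(2000 / correction_cost(5000))  # 200.0
```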


ei23 7 months ago

With an RTX 4090 and insanely-fast-whisper running whisper-large-v3-turbo (see Whisper-WebUI for easy testing), you can transcribe 5000 hours with timestamps on consumer hardware in about 50 hours. So, yeah. I also know someone.
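
Those figures imply a realtime factor of about 100x: each hour of GPU time transcribes roughly 100 hours of audio, so the whole catalog finishes in just over two days of continuous running. A sketch of that arithmetic using only the numbers in the comment (real throughput depends on batch size, precision and clip length, which are not given here):

```python
def realtime_factor(audio_hours: float, wall_clock_hours: float) -> float:
    """Hours of audio processed per hour of wall-clock compute."""
    return audio_hours / wall_clock_hours


def wall_clock_hours(audio_hours: float, speedup: float) -> float:
    """Wall-clock hours needed for a given amount of audio at a given speedup."""
    return audio_hours / speedup


speedup = realtime_factor(5000, 50)
print(speedup)  # 100.0
print(f"{wall_clock_hours(5000, speedup) / 24:.1f} days")  # 2.1 days
```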

IanCal 7 months ago
I can also run all of this locally. My point was more that, at worst, the most advanced model available right now (AFAIK; I'm not personally benchmarking), paid for at headline rates, costs such a reasonable amount for a huge content library that an individual could cover it. I've donated more than this to single charities; while it's not an insignificant sum, it's a "find one person who cares enough" level of problem. Grabbing the audio from thousands of hours of video, or even just getting the content out of wherever it's stored, is probably more of an issue than actually creating the transcripts. If anyone reading this has access to the original recordings, this is a pretty great time to get them transcribed.