IanCal 3 months ago
Probably quite expensive over the whole catalog, but the Berkeley content would be cheap to do. If it's, say, 5000 hours, then running it through the best model at assembly.ai with no discounts would cost less than $2000. I know someone could do it with Whisper for cheaper, and there would likely be discounts at this volume, but even in the worst case it seems very doable for an individual.
ldenoue 3 months ago
My repo doesn't reprocess the audio track. Instead, it improves the raw ASR transcript by giving the LLM additional info (title and description) and asking it to fix errors. It isn't perfect, and it sometimes replaces a word with a synonym, but it is much faster and cheaper: Gemini 1.5 Flash-8B costs $1 per 500 hours of transcript.
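To make the idea concrete, here is a minimal sketch of how such a correction prompt could be assembled. This is not the code from the repo: the function name `build_correction_prompt` and the prompt wording are assumptions for illustration, and the actual call to the LLM is left out.

```python
def build_correction_prompt(title: str, description: str, raw_transcript: str) -> str:
    """Assemble a prompt asking an LLM to fix ASR errors using video metadata.

    Hypothetical helper: the wording below is an assumption, not the repo's prompt.
    """
    return (
        "The following is a raw automatic speech recognition transcript.\n"
        "Fix recognition errors (misheard names, terms, punctuation) "
        "without rephrasing or replacing correct words with synonyms.\n\n"
        f"Video title: {title}\n"
        f"Video description: {description}\n\n"
        f"Transcript:\n{raw_transcript}\n"
    )


if __name__ == "__main__":
    # Example values are made up for illustration.
    prompt = build_correction_prompt(
        title="Lecture 3: Dynamic Programming",
        description="Introductory algorithms course, week 2.",
        raw_transcript="today we talk about dynamic programing and memo ization",
    )
    print(prompt)
```

The title and description give the model the vocabulary it needs to repair domain terms, and the explicit instruction not to rephrase is one way to push back on the synonym problem mentioned above.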
ei23 3 months ago
With an RTX 4090 and insanely-fast-whisper running whisper-large-v3-turbo (see Whisper-WebUI for easy testing), you can transcribe 5000 h on consumer hardware in about 50 h, with timestamps. So, yeah. I also know someone.
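For a quick sanity check, the figures quoted in this thread can be put side by side. This is only back-of-the-envelope arithmetic on the numbers as stated above (under $2000 for 5000 hours, $1 per 500 hours of transcript, 5000 h in about 50 h); the function names are made up for illustration and none of this reflects current vendor pricing.

```python
def implied_rate_per_hour(total_cost_usd: float, audio_hours: float) -> float:
    """Cost per hour of audio implied by a total cost for a given catalog size."""
    return total_cost_usd / audio_hours


def llm_cleanup_cost(audio_hours: float, usd_per_500_hours: float = 1.0) -> float:
    """Cost of LLM transcript cleanup at the quoted '$1 per 500 hours' rate."""
    return audio_hours / 500 * usd_per_500_hours


def realtime_speedup(audio_hours: float, wall_clock_hours: float) -> float:
    """How many hours of audio are transcribed per hour of wall-clock time."""
    return audio_hours / wall_clock_hours


if __name__ == "__main__":
    hours = 5000  # catalog size assumed in the thread

    # "less than $2000" for 5000 h is an upper bound, so this is a ceiling.
    print("Hosted API ceiling, $/h:", implied_rate_per_hour(2000, hours))

    # LLM cleanup of an existing ASR transcript at $1 per 500 h.
    print("LLM cleanup total, $:", llm_cleanup_cost(hours))

    # Local GPU: 5000 h of audio in about 50 h of compute.
    print("Local GPU speedup, x realtime:", realtime_speedup(hours, 50))
```

The three numbers are not directly comparable: the LLM cleanup assumes a raw transcript already exists, and the local run ignores hardware and electricity costs.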