Remix.run Logo
ei23 a year ago

With a RTX4090 and insanly-fast-whisper on whisper-large-v3-turbo (see Whisper-WebUI for easy testing) you can transscribe 5000h on consumer hardware in about 50h with timestamps. So, yeah. I also know someone.

IanCal a year ago | parent [-]

I can also run this all locally, my point was more that at the worst right now the most advanced model (afaik, I'm not personally benchmarking) paid for at the headline rates, for a huge content library, costs such a reasonable amount that an individual can do it. I've donated more to single charities than this would cost, while it's not an insignificant sum it's a "find one person who cares enough" level problem.

Grabbing the audio from thousands of hours of video, or even just managing getting the content from wherever it's stored, is probably more of an issue than actually creating the transcripts.

If anyone reading this has access to the original recordings, this is a pretty great time to get transcriptions.