scubbo 11 hours ago
How coincidental - I needed exactly this just a couple of days ago. I ended up vibecoding a script that feeds an individual URL into yt-dlp and pipes the downloaded audio through Whisper - not quite the same thing, since it generates its own transcription rather than downloading the _actual_ subtitles, but similar. I've only run it on a single video to test, but it seemed to work satisfactorily. I haven't upgraded to bulk processing yet, but I imagine I'd look for some API to get "all URLs for a channel" and then process them in parallel.
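(For reference, a minimal sketch of the pipeline described above, assuming the yt-dlp CLI and the openai-whisper Python package are installed; the output file name, model size, and URL placeholder are illustrative choices, not details from the post:)

    import subprocess
    import whisper  # pip install openai-whisper

    def transcribe(url: str) -> str:
        # Download audio only: -x extracts audio, --audio-format picks the
        # container, and -o fixes the name ("audio.m4a" after extraction).
        subprocess.run(
            ["yt-dlp", "-x", "--audio-format", "m4a",
             "-o", "audio.%(ext)s", url],
            check=True,
        )
        # "base" trades accuracy for speed; larger models handle
        # specialized terminology noticeably better.
        model = whisper.load_model("base")
        return model.transcribe("audio.m4a")["text"]

    print(transcribe("https://www.youtube.com/watch?v=VIDEO_ID"))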
Franklinjobs617 11 hours ago | parent
That is some fantastic validation, thank you! It's cool to hear you already vibecoded a solution for this. You've basically hit on the two main challenges:

Transcription quality vs. official subtitles: the Whisper approach is brilliant for videos without captions, but the downside is potential errors, especially with specialized terminology. YTVidHub's core differentiator is leveraging the official (manual or auto-generated) captions YouTube provides. When accuracy is crucial (as in research), getting that clean, time-synced file is essential.

The bulk challenge (channel/playlist harvesting): you're spot on. We were just discussing that getting a full list of URLs for a channel is the biggest hurdle given API limits. You actually mentioned the perfect workaround: we tap into that exact yt-dlp capability, passing the channel or playlist link to enumerate all the video IDs internally (roughly the sketch below). That's the most reliable way to build a large batch request. We then feed that list of IDs into our own parallel extraction system to pull only the subtitles. Keeping that pipeline stable against YouTube's front-end changes is tricky, but channel/playlist parsing is definitely the right architectural starting point for handling bulk requests gracefully.

Quick question for you: for your analysis, is the SRT timestamp structure important (e.g., for aligning data), or would a plain TXT file suffice? We're optimizing the output options now, and your use case is highly relevant.

Good luck with your script development! Let me know if you run into any other interesting architectural issues.
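(To make the bulk flow concrete, here is a hedged sketch of the two yt-dlp steps described above: enumerating a channel's video IDs with --flat-playlist, then fetching only the caption files with --skip-download. The flags are real yt-dlp options; the channel URL, the English-only language filter, and the SRT conversion - which requires ffmpeg - are illustrative assumptions:)

    import subprocess

    def channel_video_ids(channel_url: str) -> list[str]:
        # --flat-playlist lists entries without resolving each video;
        # --print id emits one video ID per line.
        out = subprocess.run(
            ["yt-dlp", "--flat-playlist", "--print", "id", channel_url],
            check=True, capture_output=True, text=True,
        )
        return out.stdout.split()

    def fetch_subtitles(video_id: str) -> None:
        # --skip-download pulls no media; --write-subs takes manual captions,
        # --write-auto-subs falls back to YouTube's auto-generated ones.
        subprocess.run(
            ["yt-dlp", "--skip-download", "--write-subs", "--write-auto-subs",
             "--sub-langs", "en", "--convert-subs", "srt",
             f"https://www.youtube.com/watch?v={video_id}"],
            check=True,
        )

    for vid in channel_video_ids("https://www.youtube.com/@CHANNEL/videos"):
        fetch_subtitles(vid)

(The sequential loop is the simple version; swapping it for concurrent.futures.ThreadPoolExecutor would give the parallel extraction described above, at the cost of hitting YouTube's rate limits sooner.)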