Franklinjobs617 11 hours ago
That is some fantastic validation, thank you! It's cool to hear you already vibe-coded a solution for this. You've basically hit on the two main challenges:

Transcription Quality vs. Official Subtitles: The Whisper approach is brilliant for videos without captions, but the downside is potential errors, especially with specialized terminology. YTVidHub's core differentiator is leveraging the official (manual or auto-generated) captions provided by YouTube. When accuracy is crucial (like for research), getting that clean, time-synced file is essential.

The Bulk Challenge (Channel/Playlist Harvesting): You're spot on. We were just discussing that getting a full list of URLs for a channel is the biggest hurdle against API limits. You actually mentioned the perfect workaround! We tap into that exact yt-dlp capability: passing the channel or playlist link to internally get all the video IDs. That's the most reliable way to create a large batch request. We then take that list of IDs and feed them into our own optimized, parallel extraction system to pull the subtitles only. It's tricky to keep that pipeline stable against YouTube's front-end changes, but using that list/channel parsing capability is definitely the right architectural starting point for handling bulk requests gracefully.

Quick question for you: For your analysis, is the SRT timestamp structure important (e.g., for aligning data), or would a plain TXT file suffice? We're optimizing the output options now and your use case is highly relevant.

Good luck with your script development! Let me know if you run into any other interesting architectural issues.
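For readers who want to picture the "list the IDs, then pull subtitles in parallel" flow described above, here is a rough Python sketch. It is not YTVidHub's actual code, which the comment does not show. It assumes the `yt-dlp` CLI is installed and uses its `--flat-playlist`, `--print`, `--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs` and `--convert-subs` options; the function names, the worker count and the output template are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def list_ids_command(url: str) -> List[str]:
    """Build a yt-dlp command that prints one video ID per line, without downloading."""
    return ["yt-dlp", "--flat-playlist", "--print", "id", url]


def parse_ids(stdout: str) -> List[str]:
    """Turn line-per-ID output into a de-duplicated list, keeping the original order."""
    seen = set()
    ids = []
    for line in stdout.splitlines():
        vid = line.strip()
        if vid and vid not in seen:
            seen.add(vid)
            ids.append(vid)
    return ids


def subtitle_command(video_id: str, lang: str = "en") -> List[str]:
    """Build a yt-dlp command that fetches only subtitles (manual or auto) as SRT."""
    return [
        "yt-dlp", "--skip-download",
        "--write-subs", "--write-auto-subs",
        "--sub-langs", lang,
        "--convert-subs", "srt",
        "-o", "%(id)s.%(ext)s",  # hypothetical naming scheme
        f"https://www.youtube.com/watch?v={video_id}",
    ]


def fetch_subtitles(video_id: str) -> bool:
    """Run yt-dlp for one video; True means the process exited cleanly."""
    result = subprocess.run(subtitle_command(video_id), capture_output=True, text=True)
    return result.returncode == 0


def fetch_all(
    ids: List[str],
    fetch: Callable[[str], bool] = fetch_subtitles,
    max_workers: int = 4,  # assumption: kept small to stay polite to the site
) -> Dict[str, bool]:
    """Fetch subtitles for many IDs in parallel; one failure does not stop the batch."""
    def safe(vid: str) -> bool:
        try:
            return bool(fetch(vid))
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(ids, pool.map(safe, ids)))
```

In use, the output of `subprocess.run(list_ids_command(url), capture_output=True, text=True).stdout` would go through `parse_ids`, and the resulting list into `fetch_all`. The fetch function is injectable so the batching logic can be exercised without touching the network.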
loveparade 10 hours ago
I've built something similar before for my own use cases, and one thing I'd push back on is official subtitles. Basically no video I care about has ever had "official" subtitles, and the auto-generated subtitles are significantly worse than what you get by piping content through an LLM. I used Gemini because it was the cheapest option and still did very well.

The biggest challenge with this approach is that you probably need to pass extra context to LLMs depending on the content. If you are researching a niche topic, there will be lots of mistakes if the audio isn't of high quality, because that knowledge isn't in the LLM weights.

Another challenge is that I often wanted to extract content from live streams, but they are very long with lots of pauses, so I needed to do some cutting and processing on the audio clips.

In the app I built, I would feed an RSS feed of video subscriptions in, and at the other end a fully built website with summaries, analysis, and transcriptions comes out, automatically updated based on the YouTube subscription RSS feed.
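The RSS-driven intake described above could look roughly like the sketch below. It is not the commenter's code. It assumes the feed is Atom and that each entry carries a `yt:videoId` element in the `http://www.youtube.com/xml/schemas/2015` namespace; the function names and the "seen IDs" set are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Namespace URIs are assumptions about the feed format, not taken from the thread.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def parse_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract video ID, title and publish date from each entry of an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", default="", namespaces=NS)
        if not video_id:
            continue  # skip entries that carry no video ID
        entries.append({
            "video_id": video_id,
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    return entries


def new_entries(entries: List[Dict[str, str]], seen: Set[str]) -> List[Dict[str, str]]:
    """Keep only the videos that have not been processed yet."""
    return [e for e in entries if e["video_id"] not in seen]
```

Each new entry would then go to the transcription and summarization steps, and its ID would be added to the seen set so that the next poll of the feed only picks up fresh uploads.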