Franklinjobs617 11 hours ago

I’m currently building YTVidHub—a tool that focuses on solving a very specific, repetitive workflow pain for researchers and content analysts.

The Pain Point: If you are analyzing a large YouTube channel (e.g., for language study, competitive analysis, or data modeling), you often need the subtitle files for 50, 100, or more videos. The current process is agonizing: copy-paste URL, click, download, repeat dozens of times. It's a massive time sink.

My Solution: YTVidHub is designed around bulk processing. The core feature is a clean interface where you can paste dozens of YouTube URLs at once, and the system intelligently extracts all available subtitles (including auto-generated ones) and packages them into a single, organized ZIP file for one-click download.
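
To give a sense of the mechanics, here's a rough sketch of the core flow (not our production code; the URLs and output paths are placeholders, and yt-dlp does the actual caption fetching):

```
# Rough sketch of the core flow, not the production pipeline: fetch only the
# caption tracks with yt-dlp, then bundle whatever was written into one ZIP.
import glob
import zipfile
import yt_dlp

urls = [
    "https://www.youtube.com/watch?v=XXXXXXXXXXX",  # placeholder IDs
    "https://www.youtube.com/watch?v=YYYYYYYYYYY",
]

ydl_opts = {
    "skip_download": True,       # no video/audio, subtitles only
    "writesubtitles": True,      # manual captions when they exist
    "writeautomaticsub": True,   # include auto-generated captions as well
    "subtitleslangs": ["en"],    # requested languages
    "outtmpl": "subs/%(id)s.%(ext)s",
    "quiet": True,
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(urls)

with zipfile.ZipFile("subtitles.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in glob.glob("subs/*"):
        zf.write(path)
```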

Target Users: Academic researchers needing data sets, content creators doing competitive keyword analysis, and language learners building large vocabulary corpora.

The architecture challenge right now is optimizing the backend queuing system for high-volume, concurrent requests to ensure we can handle large batches quickly and reliably without hitting rate limits.
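
At a hand-wavy level the queuing question is "bounded concurrency plus throttling"; a toy sketch of the shape (fetch_subtitles is a stand-in, not a real YTVidHub function):

```
# Toy sketch of the queuing idea: a bounded worker pool plus a crude per-job
# delay so large batches don't hammer upstream endpoints.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 4           # concurrency cap
DELAY_BETWEEN_JOBS = 1.0  # seconds; tuned against observed rate limits

def fetch_subtitles(video_id: str) -> str:
    time.sleep(0.1)                 # pretend work
    return f"subs/{video_id}.srt"   # pretend result

def worker(video_id: str) -> str:
    result = fetch_subtitles(video_id)
    time.sleep(DELAY_BETWEEN_JOBS)  # naive throttle per worker
    return result

video_ids = ["id1", "id2", "id3", "id4", "id5"]
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [pool.submit(worker, vid) for vid in video_ids]
    for fut in as_completed(futures):
        print(fut.result())
```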

It's still pre-launch, but I'd love any feedback on this specific problem space. Is this a pain point you've encountered? What's your current workaround?

scubbo 11 hours ago | parent | next [-]

How coincidental - I needed exactly this just a couple days ago. I ended up vibecoding a script to feed an individual URL into yt-dlp then pipe the downloaded audio through Whisper - not quite the same thing as it's not downloading the _actual_ subtitles but rather generating its own transcription, but similar. I've only run it on a single video to test, but it seemed to work satisfactorily.
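
Roughly the shape of it, reconstructed from memory, so treat this as a sketch rather than the exact script (needs yt-dlp, openai-whisper, and ffmpeg on the PATH):

```
# Grab the audio for one URL with yt-dlp, then transcribe it locally with Whisper.
import yt_dlp
import whisper

url = "https://www.youtube.com/watch?v=XXXXXXXXXXX"  # placeholder

ydl_opts = {
    "format": "bestaudio/best",
    "outtmpl": "audio.%(ext)s",
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])

model = whisper.load_model("base")     # larger models are more accurate, slower
result = model.transcribe("audio.mp3")
print(result["text"])
```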

I haven't upgraded to bulk processing yet, but I imagine I'd look for some API to get "all URLs for a channel" and then process them in parallel.

Franklinjobs617 11 hours ago | parent [-]

That is some fantastic validation, thank you! It’s cool to hear you already vibecoded a solution for this.

You've basically hit on the two main challenges:

Transcription Quality vs. Official Subtitles: The Whisper approach is brilliant for videos without captions, but the downside is potential errors, especially with specialized terminology. YTVidHub's core differentiator is leveraging the official (manual or auto-generated) captions provided by YouTube. When accuracy is crucial (like for research), getting that clean, time-synced file is essential.

The Bulk Challenge (Channel/Playlist Harvesting): You're spot on. We were just discussing how getting a full list of URLs for a channel without running into API limits is the biggest hurdle.

You actually mentioned the perfect workaround! We tap into that exact yt-dlp capability: passing the channel or playlist link in to get all the video IDs internally. That's the most reliable way to create a large batch request. We then take that list of IDs and feed them into our own optimized, parallel extraction system to pull only the subtitles.

It's tricky to keep that pipeline stable against YouTube’s front-end changes, but using that list/channel parsing capability is definitely the right architectural starting point for handling bulk requests gracefully.
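
For anyone curious, the harvesting step looks roughly like this (a sketch, not our exact code; the channel URL is a placeholder):

```
# Sketch of the ID-harvesting step: flat extraction lists a channel's or
# playlist's entries without resolving (or downloading) each video.
import yt_dlp

channel_url = "https://www.youtube.com/@SomeChannel/videos"  # placeholder

ydl_opts = {"extract_flat": "in_playlist", "quiet": True}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(channel_url, download=False)

video_ids = [entry["id"] for entry in info.get("entries", [])]
print(f"{len(video_ids)} videos found")
# video_ids then feeds the batch subtitle extractor.
```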

Quick question for you: For your analysis, is the SRT timestamp structure important (e.g., for aligning data), or would a plain TXT file suffice? We're optimizing the output options now and your use case is highly relevant.
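
For context, the plain-TXT option on our end would just be the SRT with cue numbers and timestamps stripped, roughly this kind of helper (a sketch, standard library only):

```
# Strip SRT structure down to plain text: drop cue indices, timestamp lines,
# and blank separators, keep only the caption text.
import re

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> ")

def srt_to_txt(srt_text: str) -> str:
    kept = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return "\n".join(kept)
```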

Good luck with your script development! Let me know if you run into any other interesting architectural issues.

loveparade 10 hours ago | parent [-]

I've built something similar before for my own use cases, and one thing I'd push back on is official subtitles. Basically no video I care about has ever had "official" subtitles, and the auto-generated subtitles are significantly worse than what you get by piping content through an LLM. I used Gemini because it was the cheapest option and still did very well.

The biggest challenge with this approach is that you probably need to pass extra context to the LLM depending on the content. If you are researching a niche topic, there will be lots of mistakes if the audio isn't of high quality, because that knowledge isn't in the LLM weights.

Another challenge is that I often wanted to extract content from live streams, but they are very long with lots of pauses, so I needed to do some cutting and processing on the audio clips.

In the app I built, I feed an RSS feed of video subscriptions in, and at the other end out comes a fully built website with summaries, analysis, and transcriptions that updates automatically as the YouTube subscription feed updates.
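
The ingestion end is the easy part, since YouTube exposes a per-channel RSS feed; the front of the pipeline is roughly this (a sketch; feedparser assumed, channel ID is a placeholder):

```
# Poll the channel RSS feed and collect video IDs we haven't processed yet.
import feedparser

CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"  # placeholder
FEED_URL = f"https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}"

seen = set()  # in the real app this lives in a database

def new_video_ids():
    feed = feedparser.parse(FEED_URL)
    fresh = []
    for entry in feed.entries:
        video_id = entry.link.split("v=")[-1]  # watch URL -> video ID
        if video_id not in seen:
            seen.add(video_id)
            fresh.append(video_id)
    return fresh
```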

Franklinjobs617 7 hours ago | parent [-]

This is amazing feedback, thanks for sharing your deep experience with this problem space. You've clearly pushed past the 'download' step into true content analysis.

You've raised two absolutely critical architectural points that we're wrestling with:

Official Subtitles vs. LLM Transcription: You are 100% correct about auto-generated subs being junk. We view official subtitles as the "trusted baseline" when available (especially for major educational channels), but your experience with Gemini confirms that an optimized LLM-based transcription module is non-negotiable for niche, high-value content. We're planning to introduce an optional, higher-accuracy LLM-powered transcription feature to handle those cases where the official subs don't exist, specifically addressing the need to inject custom context (e.g., topic keywords) to improve accuracy on technical jargon.
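
To make the context-injection idea concrete: with a Whisper-style local model it can be as simple as seeding the transcription with user-supplied domain terms (a sketch of the concept, not our final design; the keyword list and file name are placeholders):

```
# Seed the transcription with domain vocabulary so niche jargon is less
# likely to be mangled. initial_prompt is Whisper's built-in hook for this.
import whisper

domain_keywords = "RLHF, LoRA, quantization, KV cache"  # user-supplied terms

model = whisper.load_model("medium")
result = model.transcribe(
    "lecture_audio.mp3",
    initial_prompt=f"Technical talk. Terms that may appear: {domain_keywords}.",
)
print(result["text"])
```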

The Automation Pipeline (RSS/RAG): This is the future. Your RSS-to-website pipeline is exactly what turns a utility into a Research Engine, and we want YTVidHub to be the first mile of that process. The challenge you mentioned, pre-processing long live-stream audio, is exactly why our parallel processing architecture needs to be robust enough to handle the audio extraction and cleaning before the LLM call.

I'd be genuinely interested in learning more about your approach to pre-processing the live stream audio to remove pauses and dead air—that’s a huge performance bottleneck we’re trying to optimize. Any high-level insights you can share would be highly appreciated!

loveparade 3 hours ago | parent [-]

For the long videos I just relied on ffmpeg to remove silence. It has lots of options for this, but you may need to fiddle with the parameters to make it work. I ended up with something like:

```
stream = ffmpeg.filter(
    stream,
    'silenceremove',
    detection='rms',
    start_periods=1,
    start_duration=0,
    start_threshold='-40dB',
    stop_periods=-1,
    stop_duration=0.15,
    stop_threshold='-35dB',
    stop_silence=0.15,
)
```
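
For completeness, that filter sits between a normal input and output in ffmpeg-python, roughly like this (file names are placeholders):

```
# Same silenceremove filter in context: read the long recording, strip
# silence, write a trimmed file for the transcription step.
import ffmpeg

stream = ffmpeg.input('livestream_audio.m4a')
stream = ffmpeg.filter(
    stream, 'silenceremove',
    detection='rms',
    start_periods=1, start_duration=0, start_threshold='-40dB',
    stop_periods=-1, stop_duration=0.15, stop_threshold='-35dB',
    stop_silence=0.15,
)
stream = ffmpeg.output(stream, 'trimmed.m4a')
ffmpeg.run(stream, overwrite_output=True)
```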

langitbiru 10 hours ago | parent | prev [-]

I did consider building a tool like this before I pivoted to something else. I'm learning from Mandarin Chinese materials in a YouTube playlist. NotebookLM doesn't support Chinese yet, so make sure your app supports Mandarin Chinese so I can use it. :)

A way to find specific materials would be nice. Think of converting the whole playlist into something like a RAG corpus, so you can search for anything across the playlist.

Franklinjobs617 10 hours ago | parent [-]

Wow, thanks for this validation! Hearing from someone who almost built the solution themselves confirms we’re on the right track.

You hit the nail on the head regarding language support.

Mandarin/Multilingual Support: Absolutely, supporting a wide range of languages—especially Mandarin—is a top priority. Since we focus on extracting the official subtitles provided by YouTube, the language support is inherently tied to what the YouTube platform offers. We just need to ensure our system correctly parses and handles those specific Unicode character sets on the backend. We'll make sure CJK (Chinese, Japanese, Korean) languages are handled cleanly from Day 1.
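
Concretely, on our side that mostly means requesting the right tracks and never assuming ASCII anywhere; a rough sketch (the language codes are the ones YouTube tends to expose, and the URL is a placeholder):

```
# Request Chinese caption tracks explicitly and read everything as UTF-8.
import glob
import yt_dlp

ydl_opts = {
    "skip_download": True,
    "writesubtitles": True,
    "writeautomaticsub": True,
    "subtitleslangs": ["zh-Hans", "zh-Hant", "zh"],  # Simplified/Traditional variants
    "outtmpl": "subs/%(id)s.%(ext)s",
    "quiet": True,
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=XXXXXXXXXXX"])  # placeholder

for path in glob.glob("subs/*"):
    with open(path, encoding="utf-8") as f:
        text = f.read()  # be explicit about UTF-8 so CJK text isn't mangled anywhere
```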

The RAG/Semantic Search Idea: That is an excellent feature suggestion and exactly where I see the tool evolving! Instead of just giving the user a zip file of raw data, the true value is transforming that data into a searchable corpus. The idea of using RAG to search across an entire playlist/channel transcript is something we're actively exploring as a roadmap feature, turning the tool from a downloader into a Research Assistant.
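
A very rough sketch of where that could go (sentence-transformers assumed; the model name is a common default, not a committed choice):

```
# Chunk each transcript, embed the chunks, and answer queries by cosine
# similarity. A placeholder corpus stands in for real playlist transcripts.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text, size=500):
    return [text[i:i + size] for i in range(0, len(text), size)]

transcripts = {"video_id_1": "...transcript text..."}  # placeholder corpus

chunks, sources = [], []
for vid, text in transcripts.items():
    for piece in chunk(text):
        chunks.append(piece)
        sources.append(vid)

embeddings = model.encode(chunks, normalize_embeddings=True)

def search(query, k=3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = embeddings @ q              # cosine similarity (normalized embeddings)
    top = np.argsort(scores)[::-1][:k]
    return [(sources[i], chunks[i], float(scores[i])) for i in top]
```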

Thanks for the use case and the specific requirements! It helps us prioritize the architecture.

langitbiru 7 hours ago | parent [-]

> Since we focus on extracting the official subtitles provided by YouTube, the language support is inherently tied to what the YouTube platform offers.

You can use the video understanding in Gemini models to extract subtitles even if the video doesn't have official subtitles. That's expensive, for sure, but you should provide this option to willing users, I think.
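
Something along these lines, I think (a rough sketch from memory of the Gemini File API via the google-generativeai client; model name and prompt are placeholders, and I haven't run this exact code):

```
# Upload the video, wait for the File API to finish processing, then ask the
# model to transcribe it. Long videos get expensive quickly.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

video = genai.upload_file("downloaded_video.mp4")
while video.state.name == "PROCESSING":   # video files are processed asynchronously
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    [video, "Transcribe the spoken audio, with approximate timestamps."]
)
print(response.text)
```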

Franklinjobs617 6 hours ago | parent [-]

That is a fantastic point, and you've perfectly articulated the core trade-off we're facing: Accuracy vs. Cost.

You are 100% right. For the serious user (researcher, data analyst, etc.), a video with no official subtitles can't be a dead end, and relying solely on official captions severely limits the available corpus.

The suggestion to use powerful models like Gemini for high-accuracy, custom transcription is excellent, but as you noted, the costs can spiral quickly, especially with bulk processing of long videos.

Here is where we are leaning for the business model:

We are committed to keeping the Bulk Download of all YouTube-provided subtitles free, but we must implement a fair-use limit on the number of requests per user to manage the substantial bandwidth and processing costs.

We plan to introduce a "Pro Transcription" tier for those high-value, high-volume use cases. This premium tier would cover:

Unlimited/High-Volume Bulk Requests.

LLM-Powered Transcription: Access to the high-accuracy models (like the ones you mentioned) with custom context injection, bypassing the "no official subs" problem entirely—and covering the heavy processing costs.

We are currently doing market research on fair pricing for the Pro tier. Your input helps us frame the value proposition immensely. Thank you for pushing us on this critical commercial decision!