| ▲ | Show HN: AutoShorts – Local, GPU-accelerated AI video pipeline for creators (github.com) |
| 43 points by divyaprakash 6 hours ago | 15 comments |
| |
|
| ▲ | Yash16 2 hours ago | parent | next [-] |
Can I use this for other use cases besides game videos?
I want to create film-style scenes, cinematic elements, and smooth motion effects.
I’m also thinking of deploying it as a SaaS and using it for video creation features in my app: https://picxstudio.com/ |
| |
| ▲ | divyaprakash 2 hours ago | parent [-] | | Definitely. The architecture is modular: just swap the LLM prompts for 'cinematic' styles. It's headless and dockerized, so it fits well as a SaaS backend worker. (Rough sketch of a style swap below.)
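Not from the repo, just a hypothetical illustration of what a style swap could look like; every name here is made up:

```python
# Hypothetical illustration only; none of these names come from AutoShorts.
# The idea: keep style-specific LLM prompts in a table and pick one per job.
STYLE_PROMPTS = {
    "gaming": "Find high-action moments: kills, clutch plays, chat spikes.",
    "cinematic": "Find emotional beats: slow pans, dialogue peaks, scene transitions.",
}

def build_analysis_prompt(style: str, transcript: str) -> str:
    """Assemble the highlight-detection prompt for the chosen style."""
    return f"{STYLE_PROMPTS[style]}\n\nTranscript:\n{transcript}"
```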
|
|
| ▲ | mpaepper an hour ago | parent | prev | next [-] |
How much memory do you need locally? Is an RTX 3090 with 24 GB enough? |
| |
|
| ▲ | HeartofCPU 4 hours ago | parent | prev | next [-] |
It looks like it’s written by an LLM |
| |
| ▲ | divyaprakash 2 hours ago | parent [-] | | Guilty as charged. I used Antigravity to handle the refactoring and docs so I could stay focused on the CUDA and VRAM orchestration. |
|
|
| ▲ | myky22 4 hours ago | parent | prev | next [-] |
Wow, great job. I did something similar 4 years ago with Ultralytics YOLO. Back then I used chat-message spikes as one of several variables to detect highlight and fail moments. It needed a lot of human validation but was so fun. Keep going |
| |
| ▲ | divyaprakash 2 hours ago | parent [-] | | Great idea. Integrating YOLO for 'Action Following' is high on the roadmap; I'd love a PR for that if you're interested! (Rough sketch of the idea below.) |
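Not the project's code, but a rough sketch of what YOLO-based action following could look like, assuming Ultralytics and OpenCV; the helper and its parameters are illustrative:

```python
# Hypothetical sketch of YOLO-based "action following" for a 9:16 crop.
# Not AutoShorts code; assumes `pip install ultralytics opencv-python`.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained detector

def crop_centers(video_path: str, alpha: float = 0.2):
    """Yield a smoothed horizontal crop center (in pixels) for each frame."""
    cap = cv2.VideoCapture(video_path)
    cx_smooth = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        boxes = model(frame, verbose=False)[0].boxes.xyxy.tolist()
        if boxes:
            # Follow the largest detection, assuming it is the main subject.
            x1, y1, x2, y2 = max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
            cx = (x1 + x2) / 2
        else:
            cx = frame.shape[1] / 2  # no detection: fall back to frame center
        # Exponential smoothing so the crop does not jitter frame to frame.
        cx_smooth = cx if cx_smooth is None else alpha * cx + (1 - alpha) * cx_smooth
        yield cx_smooth
    cap.release()
```

A 9:16 crop window would then be frame_height * 9 / 16 pixels wide, centered on the yielded x and clamped to the frame edges.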
|
|
| ▲ | divyaprakash 6 hours ago | parent | prev | next [-] |
I built this because I was tired of "AI tools" that were just wrappers around expensive third-party APIs with high latency. As a developer who lives in the terminal (Arch/Nushell), I wanted something that felt like a CLI tool and respected my hardware.
The Tech:
GPU Heavy: It uses decord and PyTorch for scene analysis. I'm calculating action density and spectral flux locally to find hooks before hitting an LLM. (Rough sketch of this pass at the end of this comment.)
Local Audio: I'm using ChatterBox locally for TTS to avoid recurring costs and privacy leaks.
Rendering: Final assembly is offloaded to NVENC.
Looking for Collaborators: I'm currently looking for PRs specifically around:
Intelligent Auto-Zoom: Using YOLO/RT-DETR to follow the action in a 9:16 crop.
Voice Engine Upgrades: Moving toward ChatterBoxTurbo or NVIDIA's latest TTS.
It's fully dockerized and also has a Makefile. Would love some feedback on the pipeline architecture! |
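Not the repo's actual code, but a minimal sketch of the analysis pass described above, assuming decord, PyTorch/torchaudio, and an ffmpeg build with NVENC; function names and defaults are illustrative:

```python
# Minimal sketch (not the repo's actual code) of the analysis pass described
# above: action density from frame differences via decord, spectral flux from
# the audio track via torchaudio, and a clip encode handed off to NVENC.
import subprocess

import decord
import torch
import torchaudio
from decord import VideoReader, cpu

decord.bridge.set_bridge("torch")  # get_batch returns torch tensors directly

def action_density(video_path: str, step: int = 5) -> torch.Tensor:
    """Mean absolute pixel change between sampled frames; a rough motion proxy."""
    vr = VideoReader(video_path, ctx=cpu(0))  # gpu(0) if decord was built with CUDA
    frames = vr.get_batch(list(range(0, len(vr), step))).float()  # (N, H, W, C)
    return (frames[1:] - frames[:-1]).abs().mean(dim=(1, 2, 3))

def spectral_flux(audio_path: str, n_fft: int = 2048, hop: int = 512) -> torch.Tensor:
    """Positive spectral change per STFT frame; peaks suggest onsets/'hooks'."""
    wav, _sr = torchaudio.load(audio_path)
    mag = torch.stft(wav.mean(dim=0), n_fft=n_fft, hop_length=hop,
                     window=torch.hann_window(n_fft), return_complex=True).abs()
    return (mag[:, 1:] - mag[:, :-1]).clamp(min=0).sum(dim=0)

def encode_clip_nvenc(src: str, dst: str, start: float, dur: float) -> None:
    """Cut a clip and let NVENC do the encode (h264_nvenc is a stock ffmpeg encoder)."""
    subprocess.run([
        "ffmpeg", "-y", "-ss", str(start), "-t", str(dur), "-i", src,
        "-c:v", "h264_nvenc", "-c:a", "aac", dst,
    ], check=True)
```

Peaks in the two signals can then be normalized and combined to rank candidate hook timestamps before anything is sent to the LLM.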
| |
| ▲ | ramon156 3 hours ago | parent [-] | | I don't get this reasoning. You were tired of LLM wrappers, but what is your tool? These two requirements (feeling like a CLI tool and respecting your hardware) don't line up with that complaint. Still a cool tool, though! Although it seems partly AI-generated. | | |
| ▲ | fouc 2 hours ago | parent | next [-] | | Seems like the post you're replying to has since been edited to clarify that he's referring to wrappers that rely on third-party AI APIs over the internet, rather than tools that run locally. | |
| ▲ | rustyhancock 3 hours ago | parent | prev [-] | | I've started including a statement of AI usage in my docs. HN is a niche audience, but it seems like it's the first question everyone has when opening a repo. Which is odd, because the first question we should have is: does it work? Personally, I can't see myself ever writing the bulk of a README again; life's too short. | | |
| ▲ | divyaprakash 2 hours ago | parent | next [-] | | Fair points all around. To be transparent: yes, I used an AI coding assistant (Antigravity) to help with the heavy lifting of refactoring the original legacy code and drafting the README. I'm with @rustyhancock on this: I'd rather focus my brainpower on the pipeline logic and hardware integration than on writing boilerplate and Markdown. However, orchestrating things like decord with CUDA kernels, managing VRAM across parallel processes, and getting audio sync right with local TTS requires a deep understanding of the stack. An LLM can help write a function, but it won't solve the architectural 'glue' needed to make it a reliable CLI tool. (A rough sketch of that kind of glue follows below.) The project is open-source precisely because it's a work in progress. It needs the human touch for things like the RT-DETR auto-zoom and more nuanced video-editing logic. PRs are more than welcome; I'd love to see where the community can push this beyond its current state. | |
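For the curious, a hypothetical example of that kind of glue: gating parallel workers on free VRAM with torch.cuda.mem_get_info (the helper itself is made up):

```python
# Hypothetical sketch (not the repo's code) of VRAM-aware worker gating.
import time

import torch

def wait_for_vram(required_gb: float, device: int = 0, poll_s: float = 2.0) -> None:
    """Block until the GPU has at least `required_gb` GiB free."""
    required = int(required_gb * 1024**3)
    # mem_get_info returns (free_bytes, total_bytes) for the device.
    while torch.cuda.mem_get_info(device)[0] < required:
        time.sleep(poll_s)
```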
| ▲ | Hamuko 2 hours ago | parent | prev [-] | | I think my life's too short to ever read your READMEs. |
|
|
|
|
| ▲ | Huston1992 2 hours ago | parent | prev [-] |
Big fan of the 'respects my hardware' philosophy. I feel like 90% of AI tools right now are just expensive middleware for OpenAI, so seeing something that actually leverages local compute (and doesn't leak data) is refreshing |