Remix clone Hacker News


	goodroot 7 hours ago
		Ah yeah, long-form is interesting. Not sure how you're running it (via whichever "app thing"), but on resource-limited machines, "Continuous recording" mode emits output whenever silence is detected, based on a configurable threshold. That way it outputs as you speak, in more reasonable chunks; in aggregate it's "the same output", just chunked efficiently. Maybe you can try hackin' that up?
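A minimal sketch of silence-threshold chunking, in Python. It assumes mono float samples in [-1, 1] and uses per-frame RMS energy as the silence test. The function name `chunk_on_silence` and its parameters (`frame_size`, `threshold`, `min_silence_frames`) are hypothetical illustrations, not the app's actual "Continuous recording" implementation.

```python
import math
from typing import Iterator, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def chunk_on_silence(
    samples: Sequence[float],
    frame_size: int = 480,         # hypothetical: 30 ms at 16 kHz
    threshold: float = 0.01,       # hypothetical: RMS below this counts as silence
    min_silence_frames: int = 10,  # hypothetical: silent frames needed to close a chunk
) -> Iterator[List[float]]:
    """Yield speech chunks, closing each one after a sustained run of silence."""
    chunk: List[float] = []
    silent_run = 0
    heard_speech = False
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        silent = rms(frame) < threshold
        if silent and not heard_speech:
            continue  # drop leading silence so it never reaches the model
        chunk.extend(frame)
        if silent:
            silent_run += 1
        else:
            silent_run = 0
            heard_speech = True
        # Speech followed by enough silence: emit this chunk and start a new one.
        if silent_run >= min_silence_frames:
            yield chunk
            chunk, silent_run, heard_speech = [], 0, False
    # Emit trailing speech that never reached a silence boundary.
    if heard_speech:
        yield chunk
```

Each yielded chunk can be transcribed as soon as it closes, so output arrives while you are still speaking, and concatenating the per-chunk transcripts approximates one long-form pass.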
	LuxBennu 7 hours ago
		Yeah, that makes sense; chunking on silence would sidestep the latency issue pretty cleanly. I've been running it through a basic FastAPI wrapper, so it just takes whatever audio blob gets thrown at it, with no chunking logic on the server side. Might be worth adding a VAD (voice activity detection) pass before sending audio to Whisper, though; that would cut down on processing dead air too.
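A rough sketch of a pre-Whisper dead-air pass, using only the standard library. It assumes the uploaded blob is a 16-bit mono PCM WAV and uses a simple per-frame energy threshold as a stand-in for real VAD; a dedicated detector such as the `webrtcvad` package (`Vad.is_speech`) would cope better with background noise. The names `strip_dead_air`, `frame_ms`, and `threshold` are hypothetical.

```python
import array
import io
import math
import sys
import wave


def _frame_rms(pcm: bytes) -> float:
    """RMS of 16-bit signed little-endian PCM, normalised to [0, 1]."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0


def strip_dead_air(wav_blob: bytes, frame_ms: int = 30, threshold: float = 0.01) -> bytes:
    """Return a new WAV containing only frames whose energy meets the threshold."""
    with wave.open(io.BytesIO(wav_blob), "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM WAV")
        rate = src.getframerate()
        pcm = src.readframes(src.getnframes())
    frame_bytes = rate * frame_ms // 1000 * 2  # 2 bytes per 16-bit sample
    voiced = bytearray()
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if _frame_rms(frame) >= threshold:
            voiced += frame  # keep frames that look like speech
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(bytes(voiced))
    return out.getvalue()
```

In a FastAPI wrapper like the one described, this would run on the uploaded bytes before they are handed to Whisper. Dropping silent frames outright butts speech segments together, so keeping a few frames of padding around each voiced region may help word boundaries.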