

	rahimnathwani 7 hours ago
		For this use case, why not use Whisper to transcribe the audio, and then an LLM to do a second step (summarization or answering questions or whatever)? If you need diarization, you can use something like https://github.com/m-bain/whisperX
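A minimal sketch of that two-step pipeline. It assumes the open-source `openai-whisper` package (`whisper.load_model`, `model.transcribe`) for step one. The LLM step is a hypothetical `ask_llm` callable standing in for whatever model or API you use, and `transcribe_then_ask` is an illustrative name, not an existing library function.

```python
from typing import Callable


def whisper_transcribe(audio_path: str, model_name: str = "base") -> str:
    """Step 1: speech-to-text with the openai-whisper package (pip install openai-whisper)."""
    import whisper  # imported lazily so the rest of the pipeline runs without it installed

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)  # returns a dict; "text" holds the full transcript
    return result["text"].strip()


def transcribe_then_ask(
    audio_path: str,
    task: str,
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],  # hypothetical: any function that sends a prompt to an LLM
) -> str:
    """Step 2: hand the plain-text transcript to an LLM together with the task."""
    transcript = transcribe(audio_path)
    prompt = f"Transcript:\n{transcript}\n\nTask: {task}"
    return ask_llm(prompt)
```

Usage would look like `transcribe_then_ask("meeting.mp3", "Summarize the key decisions.", whisper_transcribe, my_llm_call)`, where `my_llm_call` is whatever LLM client you have. Keeping the transcriber as a parameter means a diarizing front end such as whisperX could be swapped in for step one without touching step two (its API is not shown here).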
	pants2 7 hours ago
		Whisper simply isn't very good compared to LLM-based audio transcription such as gpt-4o-transcribe. If Gemini 3 is even better, it's a game-changer.
	crazysim 7 hours ago
		Since Gemini seems to struggle with timestamps, perhaps Whisper could help ground them, with its timestamped transcript supplied as an additional input alongside the audio.
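One way to read that suggestion, as a sketch: turn Whisper's per-segment timestamps into text lines and put them in the prompt next to the audio, so the model has explicit times to anchor its answers to. This assumes segments shaped like the `segments` list that `openai-whisper`'s `transcribe` returns (dicts with `start` and `end` in seconds plus `text`); the helper names and the prompt wording are hypothetical, and whether this actually improves Gemini's timestamps is untested here.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)  # integer milliseconds, avoids float drift in the output
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def grounding_block(segments: Iterable[Mapping]) -> str:
    """One line per Whisper segment: [start - end] text."""
    lines = [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"  # Whisper segment text usually starts with a space
        for seg in segments
    ]
    return "\n".join(lines)


def build_prompt(segments: Iterable[Mapping], question: str) -> str:
    """Hypothetical prompt that pairs the timestamped transcript with a question."""
    return (
        "Reference transcript with timestamps (from Whisper); "
        "use these times when citing moments in the audio:\n"
        f"{grounding_block(segments)}\n\nQuestion: {question}"
    )
```

The resulting text would be sent together with the audio file to whichever multimodal API you use; the audio still carries tone and speaker cues, while the Whisper lines supply times the model can copy rather than estimate.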