Workaccount2 7 hours ago

My assumption is that Gemini has no insight into the timestamps, and is instead ballparking them based on how much context has been analyzed up to that point.

I wonder whether, if you put the audio into a video that is nothing but a black screen with a running timer, it would be able to timestamp correctly.

simonw 7 hours ago

The Gemini documentation specifically mentions timestamp awareness here: https://ai.google.dev/gemini-api/docs/audio
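
That page describes prompting with MM:SS timestamps against uploaded audio. A minimal sketch of a timestamped-transcript request, assuming the google-generativeai Python SDK and a made-up local file name:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")

    # Upload the audio file (file name is hypothetical).
    audio = genai.upload_file("interview.mp3")

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        [audio, "Transcribe this audio, prefixing each speaker turn with an MM:SS timestamp."]
    )
    print(response.text)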

minimaxir 7 hours ago

Per the docs, Gemini represents each second of audio as 32 tokens. Since that rate is constant, as long as the model has been trained to understand the relationship between timestamps and token counts (which, per Simon's link, it has), it should be able to infer the correct number of seconds.
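
A back-of-the-envelope sketch of that arithmetic (the 32-tokens-per-second rate is from the docs; the helper itself is purely illustrative):

    # Per the Gemini docs, each second of audio is represented as 32 tokens.
    TOKENS_PER_SECOND = 32

    def token_offset_to_timestamp(token_offset: int) -> str:
        """Map an audio token offset back to an MM:SS timestamp."""
        seconds = token_offset // TOKENS_PER_SECOND
        return f"{seconds // 60:02d}:{seconds % 60:02d}"

    # The 4,800th audio token corresponds to 4800 / 32 = 150 s, i.e. "02:30".
    print(token_offset_to_timestamp(4800))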