Remix clone Hacker News

	▲	minimaxir 5 hours ago
		The actual token calculation for video inputs to Gemini 3 Pro is... confusing. https://ai.google.dev/gemini-api/docs/media-resolution
	▲	pseudosavant 3 hours ago
		That's because non-text input is never actually turned into tokens. Text is tokenized, and each token maps to a specific embedding vector. For other media, they've trained encoders that analyze the input and produce a set of vectors in the same "format" as token embeddings, but there is never an actual token behind them. Most companies have rules for how many tokens a piece of media should "cost", but those numbers usually aren't exact.
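
To make the distinction concrete, here is a toy sketch (not how Gemini or any real model does it). Everything in it is made up for illustration: the 8-wide embeddings, the three-word vocabulary, the averaging "encoder", and the one-token-per-vector billing rule are all assumptions. The point is only that the text path goes through token ids while the media path produces same-width vectors directly.

```python
import random

EMBED_DIM = 8  # hypothetical embedding width; real models use far larger values

# Toy vocabulary: each text token id maps to one fixed embedding vector.
random.seed(0)
VOCAB = {"the": 0, "cat": 1, "sat": 2}
EMBEDDING_TABLE = [[random.random() for _ in range(EMBED_DIM)] for _ in VOCAB]


def embed_text(words):
    """Text path: map each word to a token id, then look up its vector."""
    return [EMBEDDING_TABLE[VOCAB[w]] for w in words]


def encode_image(pixels, patch_size=4):
    """Media path (hypothetical encoder): cut the pixel list into patches and
    turn each patch into an EMBED_DIM-wide vector. No token ids are involved."""
    patches = [pixels[i:i + patch_size] for i in range(0, len(pixels), patch_size)]
    return [[sum(p) / len(p)] * EMBED_DIM for p in patches]


def billed_tokens(vectors):
    """Assumed billing rule: charge one "token" per vector fed to the model."""
    return len(vectors)


text_vecs = embed_text(["the", "cat", "sat"])
image_vecs = encode_image([0.1] * 16)

# The model sees one sequence of same-width vectors, whatever their origin.
sequence = text_vecs + image_vecs
print(len(sequence), "vectors,", billed_tokens(image_vecs), "billed for the image")
```

A real provider's rule for what media "costs" can be a fixed number per image or per frame rather than a per-vector count, which is why the billed figure need not match what the encoder actually emits.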