simianwords 3 days ago

Thanks, but it's still not clear what they actually did. Some inference-time hacks?

FergusArgyll 3 days ago

That would imply the model always had a 1M-token context but they limited it in the API and app? That's strange, because they could just charge more for every token past 250k (as Google does, I believe).

But if not, wouldn't it have to be a completely retrained model? It's clearly not that. Good question!
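(For reference: a full retrain isn't the only option. One published technique for stretching a context window with only light fine-tuning is RoPE position interpolation. A minimal sketch of the idea, purely illustrative and not a claim about what Anthropic actually did:)

    # RoPE position interpolation: divide positions by a scale factor so
    # long-context positions map onto angles the model already saw in training.
    import numpy as np

    def rope_angles(positions, dim, base=10000.0, scale=1.0):
        # Standard RoPE frequencies: theta_i = base^(-2i/dim);
        # each position m contributes angles m * theta_i.
        inv_freq = base ** (-np.arange(0, dim, 2) / dim)
        return np.outer(np.asarray(positions) / scale, inv_freq)

    # With scale=4, position 100k gets the same angles as position 25k did
    # originally, so the model never sees out-of-distribution rotations.
    assert np.allclose(rope_angles([100_000], 64, scale=4.0),
                       rope_angles([25_000], 64, scale=1.0))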

otabdeveloper4 3 days ago

Most likely it's still 32k tokens under the hood, with some context slicing/averaging hacks so inference doesn't error out on arbitrarily long input.

(That's what I do locally with llama.cpp)
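(A rough sketch of the "context shift" trick this describes: when the prompt exceeds the window, keep the first n_keep tokens, typically the system prompt, plus the most recent tokens, and drop the middle. Names here are illustrative, not llama.cpp's actual internals:)

    def shift_context(tokens, n_ctx=32_768, n_keep=256):
        # If the history fits, do nothing; otherwise keep the head (system
        # prompt) and as many of the newest tokens as the window allows.
        if len(tokens) <= n_ctx:
            return tokens
        n_recent = n_ctx - n_keep          # budget left for the newest tokens
        return tokens[:n_keep] + tokens[-n_recent:]

    history = list(range(50_000))          # stand-in for 50k prompt tokens
    trimmed = shift_context(history)
    assert len(trimmed) == 32_768          # fits the window; the middle was dropped

(In llama.cpp this corresponds roughly to the -c context-size and --keep options on the CLI, with context shifting doing the drop automatically; exact flag names vary between versions.)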

Aeolun 3 days ago

They already had a 0.5M-token context window on the enterprise version.