NitpickLawyer a day ago

Before committing to purchasing two of these, look at the true speeds, which few people post, not just reports that "it works". We're at a point where we can run these very large models "at home", and that's great! But real usage now involves very large contexts, in both prompt processing and token generation. Whatever speeds these models get at "0" context are very different from what they get at "useful" context, especially for coding and the like.
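The reason decode speed falls off with context can be sketched with a toy cost model: for dense attention, each generated token has to read the whole KV cache, so per-token latency grows roughly linearly with context length. All constants below are made-up illustrative numbers, not measurements of any particular hardware or model:

```python
# Rough model of decode throughput vs. context length for a
# dense-attention transformer. base_ms stands in for the fixed cost of
# streaming the model weights; kv_ms_per_1k for the extra cost of
# scanning each 1k tokens of KV cache. Both values are assumptions.

def decode_tok_per_s(context_len, base_ms=20.0, kv_ms_per_1k=1.5):
    """Per-token latency = fixed weight-read cost + KV-cache scan cost,
    which grows linearly with context for dense attention."""
    latency_ms = base_ms + kv_ms_per_1k * (context_len / 1000)
    return 1000.0 / latency_ms

for ctx in (0, 8_000, 32_000, 128_000):
    print(f"{ctx:>7} tokens of context: {decode_tok_per_s(ctx):5.1f} tok/s")
```

With these placeholder constants the model drops from 50 tok/s at empty context to under 5 tok/s at 128k, which is why a "0 context" benchmark tells you little about coding workloads.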

solarkraft a day ago | parent | next [-]

Are there benchmarks that effectively measure this? This is essential information when speccing out an inference system/model size/quantization type.

cubefox a day ago | parent | prev [-]

DeepSeek-v3.2 should be better for long context because it uses (near-linear) sparse attention.
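The difference can be sketched as a toy cost comparison: dense attention reads the entire KV cache for every generated token, while a sparse scheme that attends only to a selected top-k subset of cached tokens has near-constant per-token cost once the context exceeds k. The value of k and the cost units here are illustrative assumptions, not DeepSeek's actual configuration:

```python
# Illustrative per-token attention cost (in "KV entries read"),
# comparing dense attention with a sparse scheme that attends to a
# fixed top-k subset of cached tokens. k=2048 is an assumed value.

def dense_cost(context_len):
    return context_len           # O(L): reads the whole KV cache

def sparse_cost(context_len, k=2048):
    return min(context_len, k)   # near-constant once L exceeds k

for ctx in (1_000, 32_000, 128_000):
    print(f"{ctx:>7}: dense={dense_cost(ctx):>7}  sparse={sparse_cost(ctx):>5}")
```

At 128k context the dense path reads 64x more KV entries per token than the sparse one in this toy model, which is the intuition behind the long-context speedup.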