otterley 7 hours ago
I checked the fine print on the product website: by “up to 4x faster LLM prompt processing,” they’re specifically referring to time to first token. So it’s not about token generation rate (tokens per second).
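To make the distinction concrete, here's a minimal sketch of how the two metrics differ when timing a streaming LLM. The `fake_llm_stream` generator is a hypothetical stand-in (not any real API): it models the prefill (prompt-processing) delay before the first token, then a per-token decode delay. Time to first token captures the prefill; tokens/sec describes only the decode phase.

```python
import time

def fake_llm_stream(prompt, n_tokens=5, prefill_s=0.2, per_token_s=0.05):
    # Hypothetical stand-in for a streaming LLM API: prompt processing
    # (prefill) happens entirely before the first token is yielded.
    time.sleep(prefill_s)            # prefill / prompt processing
    for i in range(n_tokens):
        time.sleep(per_token_s)      # decode phase, one token at a time
        yield f"tok{i}"

def measure(stream):
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start   # time to first token
        count += 1
    total = time.perf_counter() - start
    # tokens/sec for the decode phase only (excludes prefill time)
    decode_tps = (count - 1) / (total - ttft) if count > 1 else 0.0
    return ttft, decode_tps

ttft, tps = measure(fake_llm_stream("hello"))
```

Speeding up prefill (what the "4x" claim refers to) shrinks `ttft` but leaves `decode_tps` unchanged.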
aurareturn 6 hours ago | parent | next
Yes, this is known. They added neural accelerators (the equivalent of Tensor cores) to the GPU. This should make prompt processing competitive with similar-class GPUs.
jasonjmcghee 7 hours ago | parent | prev
It would probably be worth finding a friendlier way to market this, but it's a reasonable and accurate way to put it. Prompt processing sped up, not output generation. The M4 was notoriously slow at this compared to the DGX etc.