Remix clone Hacker News

reactordev 3 hours ago
Many are aware; they just can't offload it onto their own hardware. The 8B models are easier to run on an RTX card, which makes them a good reference point for local inference. If llama does 40 t/s on an RTX 5080, Furiosa should do 40,000 t/s or whatever… It's an easy way to get a flat comparison across all the different hardware llama.cpp runs on.
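A rough sketch of the flat comparison described above, assuming both devices run the same 8B model on the same prompt and you time how many tokens each generates. The helper names `tokens_per_second` and `speedup` are hypothetical, and the 40 / 40,000 t/s figures are just the loose illustrative numbers from the comment, not measurements.

```python
def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Throughput of one timed generation run."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / seconds


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """How many times faster the candidate is than the baseline (e.g. an RTX card)."""
    return candidate_tps / baseline_tps


# Illustrative numbers only: 10-second runs of the same (assumed) 8B model.
baseline = tokens_per_second(400, 10.0)       # RTX 5080 baseline -> 40.0 t/s
candidate = tokens_per_second(400_000, 10.0)  # hypothetical accelerator -> 40000.0 t/s
print(f"{speedup(candidate, baseline):.0f}x")  # prints "1000x"
```

Normalizing everything to one baseline card keeps the comparison meaningful only as long as the model, quantization, and prompt stay fixed across devices.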