Remix clone Hacker News

new | show | ask | jobs Github

	▲	NitpickLawyer 3 hours ago
		You are all over this thread, but you have no idea how inference works, and it's obvious. Your napkin math is off because you don't know what to add up, you lack the necessary background. And yet you persist and reply all over this thread. I don't get it. Serving models on dedicated hardware is not the same as your at home 150t/s thing. Inference is measured in thousands of tokens / s in aggregate (i.e. for all the sessions in parallel). That's how they make money.