using it 24/7 brings the average cost down, not up.

the less you use a local LLM, the less sense it makes, since you paid a lot for hardware you aren't using
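
A rough sketch of that amortization argument, in Python. Every number is a made-up assumption for illustration (a $1,500 box, a 3-year lifetime, 20 tokens/s, 50 W under load, $0.15/kWh), and `cost_per_million_tokens` is a hypothetical helper, not anything measured. It also ignores idle power draw, which would make low utilization look even worse.

```python
def cost_per_million_tokens(hardware_usd, lifetime_years, utilization,
                            tokens_per_second, watts, usd_per_kwh):
    """Amortized cost in USD per million generated tokens.

    utilization is the fraction of wall-clock time spent on inference, in (0, 1].
    Simplification: the machine is assumed to draw no power while idle.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Seconds actually spent generating tokens over the hardware's lifetime.
    busy_seconds = lifetime_years * 365 * 24 * 3600 * utilization
    tokens = busy_seconds * tokens_per_second
    # Energy used while busy, converted from watt-seconds to kWh.
    energy_kwh = (watts / 1000) * (busy_seconds / 3600)
    total_usd = hardware_usd + energy_kwh * usd_per_kwh
    return total_usd / tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs; swap in your own numbers.
    for utilization in (0.05, 0.25, 1.0):
        cost = cost_per_million_tokens(
            hardware_usd=1500, lifetime_years=3, utilization=utilization,
            tokens_per_second=20, watts=50, usd_per_kwh=0.15,
        )
        print(f"utilization {utilization:.0%}: ${cost:.2f} per million tokens")
```

The fixed hardware price gets divided over more tokens as utilization goes up, while the electricity cost per token stays constant, so the average only falls.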

bastawhiz 5 hours ago | parent | next [-]

That's the point: why would you buy a device that's specifically not optimized to be used for 24/7 inference? It's expensive hardware that's not designed to be used in that situation! The power use for inference isn't especially good and you're not getting even a fraction of the benefit from the hardware that you're paying for.

	apf6 2 hours ago | parent | next [-]
		Good question but people are doing it anyway. It's a fact that right now tons of people are buying Mac Minis specifically for this use case, to treat them as their personal data center for agents. The concept of "power use for inference" is foreign. Those people are the ones that motivated this blog post I think.
	dist-epoch 2 hours ago | parent | prev [-]
		> why would you buy a device that's specifically not optimized to be used for 24/7 inference

		Because it costs $1k-$2k instead of $10k-$30k+ for optimized devices.

groundzeros2015 5 hours ago | parent | prev [-]

The hardware has multiple uses for the same cost. The pay-per-use server does not.