> We still need good value hardware to run Kimi/GLM in-house

If you stream weights in from SSD storage and freely use swap to extend your KV cache, it will be really slow (multiple seconds per token!) but it will run on basically anything. That's still really good for work that can be computed overnight, perhaps even by batching many requests simultaneously. It gets progressively better as you add more compute, of course.
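To put rough numbers on "multiple seconds per token", here is a back-of-envelope sketch. It assumes decoding is limited purely by SSD read bandwidth and that every active weight is read once per token. The parameter count, quantization width, and SSD bandwidth are illustrative placeholders, not measurements of any particular model or drive.

```python
def seconds_per_token(active_params: float, bytes_per_param: float,
                      ssd_bytes_per_s: float) -> float:
    """Lower bound on decode time per token when every active weight
    is streamed from SSD once per token (compute and KV traffic ignored)."""
    return active_params * bytes_per_param / ssd_bytes_per_s


def batched_tokens_per_second(batch_size: int, active_params: float,
                              bytes_per_param: float,
                              ssd_bytes_per_s: float) -> float:
    """Aggregate throughput if one pass over the weights serves a whole batch.
    Assumption: all requests reuse the same weight read. With mixture-of-experts
    routing, different requests hit different experts, so the real gain is smaller."""
    return batch_size / seconds_per_token(active_params, bytes_per_param,
                                          ssd_bytes_per_s)


# Illustrative placeholders (assumptions, not benchmarks):
ACTIVE_PARAMS = 32e9      # weights touched per token
BYTES_PER_PARAM = 0.5     # roughly 4-bit quantization
SSD_BYTES_PER_S = 4e9     # sequential read bandwidth of the drive

print(seconds_per_token(ACTIVE_PARAMS, BYTES_PER_PARAM, SSD_BYTES_PER_S))
print(batched_tokens_per_second(16, ACTIVE_PARAMS, BYTES_PER_PARAM,
                                SSD_BYTES_PER_S))
```

With those placeholder inputs a single request comes out at about 4 seconds per token, while a batch of 16 amortizes the same weight read across 16 tokens. That is the sense in which batching helps overnight jobs.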

HPsquared 3 hours ago | parent [-]

At a certain point the energy starts to cost more than renting some GPUs.
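A quick way to see where that point lies is to compare the electricity cost per million tokens against a rented price. This is only a sketch: the wattage, electricity price, and rented price per million tokens are hypothetical placeholders, and it ignores hardware cost entirely.

```python
def energy_cost_per_million_tokens(watts: float, tokens_per_second: float,
                                   price_per_kwh: float) -> float:
    """Electricity cost of generating one million tokens locally."""
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000   # watt-seconds -> kilowatt-hours
    return kwh * price_per_kwh


def break_even_tokens_per_second(watts: float, price_per_kwh: float,
                                 rented_price_per_million: float) -> float:
    """Local throughput at which energy cost equals the rented price.
    Below this rate, the electricity alone costs more than renting."""
    return watts * price_per_kwh / (3.6 * rented_price_per_million)


# Hypothetical inputs (assumptions, not quotes from any provider or utility):
WATTS = 300.0               # wall power of the local box under load
PRICE_PER_KWH = 0.15        # electricity price
RENTED_PER_MILLION = 2.0    # rented cost per million tokens

print(energy_cost_per_million_tokens(WATTS, 0.25, PRICE_PER_KWH))
print(break_even_tokens_per_second(WATTS, PRICE_PER_KWH, RENTED_PER_MILLION))
```

With these made-up inputs, a machine producing one token every 4 seconds spends about 50 in electricity per million tokens, and it would need to sustain about 6.25 tokens per second for the energy bill to match the rented price.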

vardalab 23 minutes ago | parent [-]

Yeah, that is hard to argue with, because I just go to OpenRouter and play around with a lot of models before I decide which ones I like. But there's something special about running it locally in your basement.