jonah 5 hours ago

The general public will self-host once it's built into your next phone or laptop, straight out of the box or maybe from the App Store.

delecti 4 hours ago | parent | next [-]

I agree that that's what it would take, but compute would need to get very cheap for it to be feasible to keep models running locally. That's an awful lot of memory to have sitting occupied just to keep the model loaded.
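To put a rough number on "an awful lot of memory": a quick sketch of the weight footprint of a resident model. The model size and quantization level here are illustrative assumptions, not figures from the thread, and the estimate ignores KV cache and activations.

```python
# Back-of-envelope weight memory for keeping a model resident in RAM.
# Parameter count and quantization are illustrative assumptions.
def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (weights only; no KV cache/activations)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# e.g. a hypothetical 70B-parameter model at 4-bit quantization:
print(model_memory_gb(70, 4))  # ~35 GB of weights, before any context memory
```

Even aggressively quantized, that memory is tied up for as long as the model stays loaded, which is the cost being pointed at here.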

winrid 5 hours ago | parent | prev [-]

True. I was thinking more of power users. Do you think Opus-level capabilities will run on your average laptop within a year? I think that's pretty far away, if it ever happens.

zozbot234 5 hours ago | parent [-]

You can demonstrate "running" the latest open Kimi or GLM model on a top-of-the-line laptop today at very low throughput (Kimi at 2 tok/s, which feels even slower once you account for thinking time), courtesy of Flash-MoE with SSD weight offload. That's not Opus-like, it's not an "average" laptop, and it's not really usable for non-niche purposes given the low throughput. But it's impressive in a way, and it gives a nice idea of what might be feasible down the line.
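To make the "slow when you account for thinking time" point concrete, here's a back-of-envelope latency calculation. The thinking-token and answer-token counts are illustrative assumptions; only the 2 tok/s figure comes from the comment above.

```python
# Wall-clock time for one reply at a given decode throughput.
# Thinking tokens are generated (and paid for) but never shown to the user.
def response_seconds(thinking_tokens: int, answer_tokens: int, tok_per_s: float) -> float:
    """Total decode time in seconds for a reasoning-model response."""
    return (thinking_tokens + answer_tokens) / tok_per_s

# Assumed: 1,000 hidden thinking tokens plus a 500-token answer at 2 tok/s:
minutes = response_seconds(1000, 500, 2.0) / 60
print(minutes)  # -> 12.5 minutes for a single reply
```

At that rate a single answer takes over ten minutes, which is why the setup is a demo rather than something usable day to day.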