Odd take. I'm running them locally at my desk (DGX Spark and 128GB MBP). They work fine for 90% of what most folks do. Admittedly, they do run slower on my hw than in the cloud.

pants2 5 hours ago

Running them locally is cool and has privacy/autonomy benefits, but you can't really make a value case for it. I guarantee that if you run the math, you will never run enough inference to pay off your hardware versus buying tokens. Last time I ran the math on my MBP, I'd have had to run inference 24 hours a day for 5+ years to pay off its cost, and that's not accounting for electricity.

iooi 5 hours ago

Is this because of the tok/s? I ask because it's pretty easy to run up a $5k bill in API usage for Claude/ChatGPT in a month.

pants2 5 hours ago

Yes, because of the limits on tok/s, and you have to compare apples to apples, not Gemma 27B to Opus 4.7.

hedora 4 hours ago

Assuming the local models get the job done (e.g., you adjust your workflow so that you can run the local machine 100% of the time, or whatever), the time to payback isn't very long. MSRP for a 128GB AMD box was $1400 at launch. That's 7 months of a Claude Code subscription. If you assume a 5-year depreciation cycle, you can buy a cluster of 8 such machines and still come out ahead. (Peak power is a few hundred watts per machine, so maybe 7 machines if you include electricity.) Of course, I'm assuming non-bubble numbers; those boxes are like $3K now. Still, a normal person would probably not buy 8 of them at once. Instead, they'd space out their purchases, buying a machine every few years as the technology improves.

For me, things are getting better faster than my ability to review / trust the resulting code, so tok/sec isn't a bottleneck anymore. Instead, the quality of the tokens is the bottleneck. That's why I want a 1TB DRAM iGPU, once those are available at pre-bubble RAM pricing.
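
A quick check of the subscription arithmetic above, as a minimal Python sketch. The $200/month price is an assumption (it is what "$1400 is 7 months of subscription" implies), the 60 months come from the 5-year depreciation cycle, and electricity is an optional parameter left at zero because the comment only gives a rough wattage. The function names are hypothetical.

```python
def subscription_spend(months: int, monthly_price: float) -> float:
    """Total spent on a subscription over the given number of months."""
    return months * monthly_price


def machines_affordable(months: int, monthly_price: float,
                        machine_price: float,
                        energy_cost_per_machine: float = 0.0) -> int:
    """How many local machines the same money buys over the period.

    energy_cost_per_machine is one machine's electricity bill for the whole
    period; it defaults to 0, i.e. electricity is ignored.
    """
    budget = subscription_spend(months, monthly_price)
    per_machine = machine_price + energy_cost_per_machine
    return int(budget // per_machine)


MONTHLY_PRICE = 200.0  # assumption: implied by "$1400 ... is 7 months"

print(1400 / MONTHLY_PRICE)                          # 7.0 months per machine
print(machines_affordable(60, MONTHLY_PRICE, 1400))  # 8 machines at launch MSRP
print(machines_affordable(60, MONTHLY_PRICE, 3000))  # 4 machines at ~$3K
```

Plugging in your own electricity estimate through `energy_cost_per_machine` is how you would get from 8 down to the "maybe 7" figure; the answer depends heavily on duty cycle and local power prices.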

pants2 4 hours ago

You're comparing the highest-tier Claude subscription to something like Qwen3.5-122B-A10B running locally. That's apples to oranges.

If you compare to a smarter US model like Grok 4.3, $1400 will pay for 560M output tokens. At ~25 t/s locally, running nonstop for 8 hours a day, it would take about two years to generate that many tokens and pay back the hardware. That's not accounting for bubble prices or electricity.
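
The same back-of-envelope math as code. This is a sketch assuming a flat API price of $2.50 per million output tokens (what "$1400 buys 560M tokens" implies, not checked against any published price list) and, like the comment, ignoring electricity and price changes. The helper names are made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def tokens_bought(budget_usd: float, usd_per_million_tokens: float) -> float:
    """Output tokens the budget buys from an API at a flat per-token price."""
    return budget_usd / usd_per_million_tokens * 1_000_000


def payback_days(budget_usd: float, usd_per_million_tokens: float,
                 local_tokens_per_sec: float, hours_per_day: float) -> float:
    """Days of local generation needed to match the tokens the budget buys."""
    tokens = tokens_bought(budget_usd, usd_per_million_tokens)
    tokens_per_day = local_tokens_per_sec * SECONDS_PER_HOUR * hours_per_day
    return tokens / tokens_per_day


# $2.50/M is an assumption derived from "$1400 -> 560M tokens".
days = payback_days(1400, 2.50, local_tokens_per_sec=25, hours_per_day=8)
print(f"{days:.0f} days, about {days / 365:.1f} years")  # 778 days, about 2.1 years
```

Running the same numbers at 24 hours a day instead of 8 brings it down to roughly 259 days, which is why the duty-cycle assumption matters as much as the hardware price.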

__mharrison__ 3 hours ago

Is the goal maximum t/s? According to OpenRouter, Opus 4.8 does 128 t/s, so it's 10x faster than my antirez/ds4.

slopinthebag 3 hours ago

The value of not relying on a third-party company, not needing an internet connection, and having total privacy: ∞

fragmede 3 hours ago

You just have to put some numbers on privacy and autonomy. What's the fine to my company if I get hacked and leak all my customers' PII? What's the cost in lost productivity if OpenAI/Anthropic/Google decides to suspend my account for an unknown reason?
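
One way to put numbers on it is a plain expected-loss estimate: for each risk, multiply its yearly probability by its cost, then sum. This is a minimal sketch; the `Risk` type and the input values below are hypothetical placeholders, not estimates of real fines or suspension rates.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A hypothetical bad outcome: how likely it is per year and what it would cost."""
    name: str
    annual_probability: float  # between 0.0 and 1.0
    cost_usd: float            # cost if it happens once


def expected_annual_cost(risks: list[Risk]) -> float:
    """Expected yearly loss: sum of probability x cost across all risks."""
    return sum(r.annual_probability * r.cost_usd for r in risks)


# Placeholder inputs only; substitute your own estimates.
risks = [
    Risk("PII leak fine", annual_probability=0.01, cost_usd=500_000),
    Risk("account suspended", annual_probability=0.05, cost_usd=20_000),
]
print(expected_annual_cost(risks))
```

The resulting figure can then be added to the API side of the token-versus-hardware comparison, which is one way to turn "privacy and autonomy" into a number you can weigh.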