| ▲ | Gigachad 2 days ago |
| Currently it costs much more to self-host an open model than to subscribe to a much better hosted model, which suggests the hosted models are still being massively subsidised. |
|
| ▲ | finaard 2 days ago | parent | next [-] |
| For a lot of tasks, smaller models work fine, though. Nowadays the problem is less model quality/speed and more that it's a bit annoying to mix local and hosted models in one workflow with easy switching. I'm currently making an effort to switch to local for stuff that can be local - initially standalone tasks, longer term a nice harness for mixing.
|
| One example is OCR/image description - I have hooks from dired that throw an image to a local translategemma 27b, which extracts the text, translates it to English as necessary, adds a picture description, and - if it feels like it - extra context. Works perfectly fine on my MacBook.
|
| Another example is generating documentation - a local qwen3 coder with a 256k context window does a great job of going through a codebase, checking what is and isn't documented, and preparing a draft. I still replace pretty much all of the text - but it's good at collecting the technical details. |
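A pipeline like finaard's image hook could be sketched outside Emacs too - a minimal Python sketch, assuming the model is served behind a local OpenAI-compatible chat endpoint (the URL, model tag, and prompt below are placeholder assumptions, not the commenter's actual setup):

```python
import base64
import json
import urllib.request

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumed local server
MODEL = "translategemma:27b"  # placeholder model tag

def build_request(image_bytes: bytes) -> dict:
    """Build an OpenAI-style vision request: extract, translate, describe."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract any text in this image, translate it to "
                         "English if needed, and add a short description."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

def describe_image(path: str) -> str:
    """Send the image to the local model and return its reply text."""
    with open(path, "rb") as f:
        payload = json.dumps(build_request(f.read())).encode()
    req = urllib.request.Request(
        ENDPOINT, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

A dired hook would then just shell out to a script like this with the file at point.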
| |
| ▲ | pbronez 2 days ago | parent [-] |
| I haven’t tried it yet, but Rapid MLX has a neat feature for automatic model switching. It runs a local model using Apple’s MLX framework, then “falls forward” to the cloud dynamically based on usage patterns:
|
| > Smart Cloud Routing
| >
| > Large-context requests auto-route to a cloud LLM (GPT-5, Claude, etc.) when local prefill would be slow. Routing based on new tokens after cache hit. --cloud-model openai/gpt-5 --cloud-threshold 20000
|
| https://github.com/raullenchai/Rapid-MLX |
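The threshold rule in that quote is simple enough to sketch - this is my reading of the README, not Rapid MLX's actual code, and the function name is made up:

```python
def choose_backend(prompt_tokens: int, cached_tokens: int,
                   cloud_threshold: int = 20000) -> str:
    """Route on the tokens that still need local prefill after a cache hit.

    If the uncached portion of the prompt exceeds the threshold, local
    prefill would be slow, so the request goes to the cloud model.
    """
    new_tokens = prompt_tokens - cached_tokens
    return "cloud" if new_tokens > cloud_threshold else "local"
```

So a 30k-token prompt with 25k tokens already in the cache stays local (only 5k new tokens), while the same prompt cold routes to the cloud.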
|
|
| ▲ | stingraycharles 2 days ago | parent | prev | next [-] |
| You can use open models through OpenRouter, but if you want good open models, they get pretty expensive fairly quickly as well. |
| |
| ▲ | layoric 2 days ago | parent [-] |
| I've found MiniMax 2.7 pretty decent, even pay-as-you-go on OpenRouter: at $0.30/M tokens in and $1.20/M tokens out, you can get some pretty heavy usage for $5-$10. Their token subscription is heavily subsidized, but even if it goes up or goes away, it's still pretty decent. I'm pretty hopeful these open-weight models become affordable at good-enough performance. | | |
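At those rates, the cost arithmetic is easy to sanity-check (a sketch; the token counts below are illustrative, not layoric's actual usage):

```python
def openrouter_cost(in_tokens: int, out_tokens: int,
                    in_rate: float = 0.30, out_rate: float = 1.20) -> float:
    """Cost in USD given per-million-token input/output rates."""
    return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

# e.g. 20M input tokens + 3M output tokens:
# 20 * 0.30 + 3 * 1.20 = 9.60 USD - inside the $5-$10 range mentioned
```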
| ▲ | stingraycharles 2 days ago | parent [-] |
| It’s okay, but if you compare it to e.g. Sonnet, it’s just too far off the mark too often for me to use it. |
|
|
|
| ▲ | ericd 2 days ago | parent | prev | next [-] |
| Efficiency goes way up with concurrent requests, so it's not necessarily a subsidy - it could just be economies of scale. |
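The point is that one GPU serving a batch of concurrent requests amortizes its hourly cost across all of them, which a single local user can't do. A toy model of that effect (all numbers and the efficiency factor are hypothetical):

```python
def cost_per_request(gpu_cost_per_hour: float,
                     requests_per_hour_single: int,
                     batch_size: int,
                     batch_efficiency: float = 0.8) -> float:
    """Per-request cost when batch_size requests share one GPU.

    batch_efficiency < 1 models throughput lost to batching overhead;
    even so, large batches divide the fixed GPU cost many ways.
    """
    throughput = requests_per_hour_single * batch_size * batch_efficiency
    return gpu_cost_per_hour / throughput
```

At a hypothetical $2/hour GPU doing 100 requests/hour alone, a lone user pays $0.02/request, while a provider batching 32 requests at 80% efficiency pays about $0.0008/request - a ~25x gap with no subsidy involved.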
|
| ▲ | JumpCrisscross 2 days ago | parent | prev [-] |
| If I drop $10k on a souped-up Mac Studio, can that run a competent open-source model for OpenClaw? |
| |
| ▲ | pbronez 2 days ago | parent | next [-] |
| The Rapid MLX team has done some interesting benchmarking that suggests Qwopus 27B is pretty solid. Their tool includes benchmarking features, so you can evaluate your own setup. They have a metric called the Model-Harness Index:
|
| > MHI = 0.50 × ToolCalling + 0.30 × HumanEval + 0.20 × MMLU (scale 0-100)
|
| https://github.com/raullenchai/Rapid-MLX | | |
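The weighted metric quoted above is trivial to compute yourself (assuming, per the quote, that each component score is already on a 0-100 scale):

```python
def model_harness_index(tool_calling: float, humaneval: float,
                        mmlu: float) -> float:
    """Model-Harness Index: weighted blend of three 0-100 benchmark scores."""
    return 0.50 * tool_calling + 0.30 * humaneval + 0.20 * mmlu

# e.g. scores of 80 (tool calling), 70 (HumanEval), 75 (MMLU):
# 0.50*80 + 0.30*70 + 0.20*75 = 40 + 21 + 15 = 76
```

The weighting implies tool calling matters as much as the other two benchmarks combined, which fits a harness-focused use case.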
| ▲ | JumpCrisscross 2 days ago | parent [-] |
| Pardon the silly question, but why do I need this tool versus running the model directly (and SSH’ing in when I’m away from home)? |
| |
| ▲ | Atotalnoob 2 days ago | parent | prev [-] |
| Qwen is probably your best bet…
|
| Edit: I’d also consider waiting for WWDC - they’re supposed to be launching a new Mac Studio, and even if you don’t get it, you might be able to snag older models for cheaper | | |
| ▲ | JumpCrisscross 2 days ago | parent | next [-] |
| > consider waiting for WWDC
|
| 100% agree. I’m just looking forward to setting something up in my electronics closet that I can remote into instead of having everything tracked. | |
| ▲ | storus 2 days ago | parent | prev [-] |
| Latest rumors are no new Mac Studio until at least October. |
|
|