Gotta say, I've lost all interest in cloud-based AI products. Too many cool features and workflows that I was once excited about that I can't or don't use anymore for a variety of reasons (price hikes, subjectively nerfed, disappeared altogether, replaced,...) for me to even remember. It's tiring.

I've set up a small rig, mostly settled on Qwen3.6 and I'm slowly adding features myself. It probably can't compete with Claude. I don't even know, I've stopped checking. It's providing a ton of value to me as is, and it only keeps getting better. All it takes is to realize that it doesn't actually matter if the grass is (maybe even objectively) greener somewhere else. Feels so good to know that it won't change under my feet. I've got this amazing, highly extensible tool, and it's mine.

▲ unleaded 2 hours ago | parent | next [-]

Qwen3.6-35B-A3B-UD-Q4_K_M runs at about 11 tokens/second on my poor old 1060. Absolutely nuts how far we've come

▲ broodbucket 2 hours ago | parent [-]

Mind sharing your llama.cpp settings for that?

	▲	unleaded 2 hours ago \| parent [-]
		`.\llama-server.exe -m ..\Qwen3.6-35B-A3B-UD-Q4_K_M.gguf -ngl 999 --n-cpu-moe 41 -c 262144 --port 8081 --flash-attn on --cache-type-k turbo4 --cache-type-v turbo3 --no-mmap --mlock --host 0.0.0.0 -t 8 -tb 8 -np 1` Using this llama.cpp fork https://github.com/TheTom/llama-cpp-turboquant and mostly copying from this video https://www.youtube.com/watch?v=8F_5pdcD3HY Haven't had much time to test it other than asking a few questions & changing some HTML in cline so it might be thick as a brick for all I know, but still worth trying

▲ JSR_FDED 3 hours ago | parent | prev | next [-]

This sounds very appealing. What size Mac mini would I need for that?

▲

jadbox 2 hours ago | parent | next [-]

A PC with an nvidia card with 16gb vram works just fine for Qwen MoE models, and these have worked great as a daily driver for me.

▲

mathgeek 2 hours ago | parent | prev | next [-]

Good summary blog: https://maloyan.xyz/blog/running-qwen-locally-mac-mini-m4

▲

blensor 3 hours ago | parent | prev [-]

I am curious if you implicitly assumed they are Macs or if that's what you are looking for specifically?

▲

JSR_FDED 2 hours ago | parent [-]

I assumed the 27B dense model would be preferable to a MoE model, and that it wouldn’t fit into a consumer graphics card, which leaves the Macs.

Then I assumed for cost and battery/heat reasons that a Mini would be better than a laptop.

	▲	blensor an hour ago \| parent [-]
		The reason why I was curious is that I am running my stuff on a Strix Halo and I get the feeling that this class of devices ( gmktek, minisforum, lenovo, etc. ) seem to becoming a pretty good alternative

▲ hathym 3 hours ago | parent | prev | next [-]

Same here, I’ve removed my credit card from Copilot and won’t be renewing

▲ cyanydeez 3 hours ago | parent | prev | next [-]

I never got into any of the AI models because it was clear local first was going to be more valueable, if they were to replace coding tasks.

I tried out a few models and ended up going with either Qwen3-Coder-Next (no think, just do) and Qwen3.6-35B (thinking, w/llamacpp token budget). Created a customized prompt that works fairly well to around ~60k tokens and then is a toss up on whether it's poisoned itself or I've directly steered it into the wrong. When it's clear that's happened, if it's important to continue, ask it to write a doc then start fresh.

I don't kno whow any one cold have witnessed the last 2 decades of American VC funded tech startups and tell themselves, "you know, this will be a reliable technolgy with no hidden problems".

Even a sober technical evaluation is just two steps:

1. You're proposing to build a app on a non-deterministic model.

2. That model is hosted behind a non-deterministic system (model alignment, model guardrails, system context subterfuge, cost/token pricing)

---

So you want to build your app and you think you're going to kep up with both #1 and #2?

▲

ACCount37 2 hours ago | parent | next [-]

We live in a non-deterministic world. Anything "deterministic" in it is a castle built on quicksand.

LLMs are, as far as the nastiness of the Real World goes, really fucking benign. Future models outperform past models, both in open weight land and at the big frontier labs. Performance per $ only ever goes up. That's just nice.

	▲	windexh8er 11 minutes ago \| parent \| next [-]
		> We live in a non-deterministic world. Anything "deterministic" in it is a castle built on quicksand. Except the Enterprise, and a lot of what people want compute for, is built on deterministic systems or processes. I'm not saying the non-deterministic nature of LLMs isn't useful. However I've worked with a lot of organizations on SOAR projects, for example. When you can weave the deterministic and non-deterministic together you get a relatively efficient system. A workflow that will stay on the rails and will come to a conclusion as expected. And the "as expected" part is critical in these types of systems. The reality of, using SOAR as an example, is also that most enterprise would be much better served by fast SLMs. Parse an email and validate if it's SPAM / Phishing or read a chunk of firewall logs and look for outliers / indications for escalation - those things can get messy in a deterministic system because of potentially unstructured data. I don't believe it's either / or. And I believe that LLMs just aren't efficient, fast or reliable in the sense that deterministic are. It seems, at least to me, a better together story.
	▲	cyanydeez an hour ago \| parent \| prev [-]
		YES, but you seem to not understand that having two non-deterministic layers is incompatible. #1 is fine: it has random issue and you build around those random issues; those issues don't change unless you change them. #2 is not fine; that non-determinism you do not control, have no insight into, etc. I'm saying sure, give me #1 if it means I can build a harness around it and smooth over the edges. But I'm not taking #1 and #2. There's zero reasonable way to manae two non-deterministic systems.

▲

maykthewessen 2 hours ago | parent | prev [-]

Qwen is the Alibaba distilled Anthropic Claude model

So piracy on an by piracy trained ai model..

▲

cogman10 2 hours ago | parent [-]

Piracy? Lol.

Alibaba didn't steal Opus weights, they used opus output to train their model.

If this is piracy, then so is reverse engineering efforts powering a bunch of Linux drivers.

	▲	cyanydeez an hour ago \| parent [-]
		If that's piracy, I'm going to the library and arresting everyone there! Also, yeah, they already stole their copyrighted works, so a thief from a thief is still...theives?

▲ anon373839 3 hours ago | parent | prev | next [-]

What features/workflows have you added?

▲ 2 hours ago | parent | prev [-]

[deleted]