Qwen3.7-Max: The Agent Frontier

▲ Qwen3.7-Max: The Agent Frontier(qwen.ai)

359 points by kevinsimper 6 hours ago | 128 comments

▲ goldenarm 3 hours ago | parent | next [-]

The non-hallucination rate in AA-omniscience is SOTA, better than Opus 4.7, Gemini 3.1 Pro and GPT5.5! Congrats to the team

▲

gslepak an hour ago | parent | next [-]

> The non-hallucination rate in AA-omniscience is SOTA

Note that a perfect "non-hallucination rate" is rather meaningless as such tests can contain human hallucinations.

It means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test.

▲ rlt 40 minutes ago | parent [-]

Well, yes, garbage in garbage out. That's a given and not what's meant by "hallucination" in this context.

▲

throawayonthe 2 hours ago | parent | prev | next [-]

referencing this:

https://artificialanalysis.ai/evaluations/omniscience?models...

(had to add it to the chart, wasn't displayed by default. is it the lowest rate in the dataset or no?)

▲

sheepscreek 2 hours ago | parent | prev | next [-]

Truly incredible! Very impressed by their progress. I wonder how many of their own chips they used for training.

▲

baq an hour ago | parent | prev [-]

wonder at which level there's a capability state transition? 5%? 1%?

▲ briga an hour ago | parent | prev | next [-]

I was getting dangerously close to my weekly Claude Code limit last night so I had Claude set up Qwen3.6 with llama.cpp and OpenCode. Honestly it's a great (free!) alternative to Claude Code--certainly more than good enough for a lot of smaller less complex tasks. I'm excited to try this new version. The fact that open-source models are so close to the frontier is very impressive.

▲

plufz an hour ago | parent | next [-]

Which exact model are you using? And with which parameters and quant? And on what hardware? Are you using any specific MCPs or other tools to optimize performance like context-mode or dynamic context pruning? I’ve used local models a reasonable amount before but I’m just starting out with opencode. Haven’t had great results yet but really want this to work for simpler tasks. My newly installed opencode also has iTerm at 100% CPU when idle. :/

▲ briga an hour ago | parent [-]

I'm running Qwen3.6:27b Q4_K_M on a 4090 and similarly fast CPU and I think 32GB of RAM. Make sure the context window is set to be big enough otherwise the conversation will keep compacting. No special MCP tools set up yet. Qwen is able to do web search out-of-the-box although I think it is getting blocked by anti-bot firewalls--I still need to figure out if I can fix that.

▲

ecshafer 8 minutes ago | parent | prev | next [-]

Qwen3.6 with claude code works great. I get a lot better results with that than opencode and qwen3.6. Claude Code is a great harness, and good harness/tool integration makes a big difference. You just have a settings.json with your ollama setup and the qwen model and you can use it.

▲

leonidasv an hour ago | parent | prev [-]

Qwen Max models are usually closed, unfortunately.

▲ tekacs 4 hours ago | parent | prev | next [-]

As they start to release more proprietary models, I so wish that they partnered with one of the major US hyperscalers to allow using these models through something US-domiciled.

Totally understand why it may not be reasonable or in their best interest (and that the US is _absolutely_ not doing the same reflexively). But it would be lovely to be able to try these out on production workloads in earnest.

▲ embedding-shape 4 hours ago | parent | next [-]

Unless US hyperscalers do the same in reverse, I hope the status quo stays as it is. Either people are happy to share, and the sharing should happen both ways, or US hyperscalers can keep isolating themselves as they've done so far.

▲ adjejmxbdjdn 4 hours ago | parent [-]

I do hope the U.S. hyperscalers do the same as well.

In an ideal world U.S. residents would use Chinese AI models and Chinese residents would use U.S. AI models.

Governments in both countries are collecting data for nefarious reasons. But the Chinese government has far less influence on a U.S. resident and vice versa.

We are all better off if our data is collected by a government halfway across the world instead of our own governments which hold incredible amounts of power over us.

▲ giancarlostoro 3 hours ago | parent | next [-]

It would have been the world we live in if China wasn't involved in so much corporate espionage. I don't even feel comfortable using their open weight models on anything my employer makes, the only time I use Qwen is for greenfield "how good is this?" type of projects, but otherwise, how do I trust that it won't mysteriously hallucinate phoning home?

On the other hand, there are other models where the source is 100% open, the training data is known, and people have reproduced the same model from scratch, so while those trail behind, there's definitely an effort to make models more open and capable.

▲ deaux 2 hours ago | parent | next [-]

The US has for decades been engaged in mass dumping of their products to establish monopolies all over the world, and punishing anyone who dares try to do anything about it. This isn't better than corporate espionage.

▲ eloisant 3 hours ago | parent | prev [-]

I agree, but the same goes for the US. Remember Echelon.

▲ stickfigure 3 hours ago | parent [-]

It's highly improbable that the US government has a secret team inside Anthropic and OpenAI manipulating their training regimen. For better or worse, these companies are filled with ideologues and something that invasive would trigger an army of whistleblowers (despite legal consequences).

▲ booty 2 hours ago | parent | next [-]

    It's highly improbable that the US government has a secret team inside Anthropic and OpenAI manipulating their training regimen.

Two thoughts.

One: it would be relatively technically trivial for $GOVERNMENT_AGENCY to just monitor all the prompts + context we send over the wire to OpenAI/Anthropic/etc. That's a goldmine of sensitive personal and corporate data, no secret team needed (although, the LLM providers obviously would need to cooperate)

Two: Rather than secret infiltration teams influencing model training I think what's more likely on the training side of things is simply self-censoring by the LLM providers, so that they don't risk angering the government.

I highly doubt that China has government interlopers, secret or otherwise, inside Qwen's training team. Nonetheless, "sensitive" issues like Tiananmen Square are censored. I would imagine that much/most such censorship in China is self-censorship that doesn't leave a legal/paper trail. That's what we're in danger of seeing (more of) in America IMO.

▲ Barbing an hour ago | parent [-]

> relatively technically trivial for $GOVERNMENT_AGENCY to just monitor all the prompts + context we send

I take this for granted given Room 641A https://en.wikipedia.org/wiki/Room_641A

Thus, I’ve pondered whether anything they’ve learned has changed the world / had a big impact (like on their understanding of human psychology, perhaps per region). They’ve heard phone calls, they’ve read emails, diaries get brought to court… but these are systems that would be used like diaries but also prompt users for more and more.

▲ throwaw12 an hour ago | parent | prev | next [-]

> secret team inside Anthropic and OpenAI manipulating their training regimen

You don't need a secret team to manipulate what's coming from them: https://responsiblestatecraft.org/israel-chatgpt/

▲ Planktonne 2 hours ago | parent | prev | next [-]

> these companies are filled with ideologues

Are they? They don't behave like it.

▲ gmerc 3 hours ago | parent | prev [-]

It's very hard to be so naive.

▲

SR2Z 2 hours ago | parent [-]

I think you are being ridiculous. Tampering with an LLM's pretraining is a difficult undertaking. There is plenty of evidence that training a model to walk the party line leaves it less capable than if it weren't.

It's not very subtle manipulation either; ask Qwen if Taiwan is a part of China in German and in English and only the English answer will be party-approved.

▲ embedding-shape an hour ago | parent [-]

Compared to what we have proof the US government has engaged in before? Do people not remember PRISM anymore? It was virtually impossible to imagine the scope before it was leaked, and you'd have been marked as a conspiracy theorist for believing it was happening before it was concretely confirmed. I think it's borderline naive to assume various agencies haven't infiltrated OpenAI, Anthropic and others. Essentially the entire world was wiretapped by the NSA in the past; to assume they don't have an employee or two at these companies does seem a bit naive to me.

▲ adrianN 2 hours ago | parent | prev | next [-]

In an ideal world everybody runs open models on hardware they control.

▲ LeifCarrotson 2 hours ago | parent [-]

I'm running Qwen 3.6 via https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8 and it's pretty great. I'll update to the 3.7 equivalent when that's ready. It's not nearly worth it to me to get an incremental improvement in performance if it means I have to move to hosted environments with Qwen 3.7 (or Claude or Gemini or whatever).

▲ nickdothutton 3 hours ago | parent | prev | next [-]

China is much more interested in waging a campaign against companies that represent the material of the future growth in productivity, exports, and prosperity of the US and her people, than learning about you as an individual. Unless of course you are a Chinese dissident living in the US.

▲

giancarlostoro 3 hours ago | parent | next [-]

Which is basically the current primary use for AI: programming more than anything. You hear about AI in programming more than in any other field.

▲ saghm 2 hours ago | parent [-]

There are also a lot more novels about writing than making movies and a lot more songs about music than plays. It's not clear that this is because it's actually the primary use-case or if it's just because people who work with computers will inevitably talk quite a lot about computer things. For the past several years, pretty much everyone I meet who isn't in software but finds out I am (doctors, people who sit next to me on a plane, etc.) will ask me my thoughts about AI because it's so widely discussed in general, and they're curious about my perspective on it as someone in software, but most of the time they're most curious about understanding more about how it might affect their own lives, not mine.

▲

WarmWash 3 hours ago | parent | prev [-]

China definitely wants information on all Americans. This comment is so far off the mark it's on par with "Billionaires aren't interested in taking your money"

As Americans go through life, some of them will become people with power. When you need to leverage that power, having the right knowledge about them can effectively transfer that power to you.

TikTok was a goldmine, because every 20-something on their way to a future position of power was uploading every single facet of their digital life to CCP servers every day.

▲ boomskats 3 hours ago | parent | prev | next [-]

Yeah, about that. https://en.wikipedia.org/wiki/UKUSA_Agreement#Controversy

▲ CodingJeebus 3 hours ago | parent | prev [-]

> We are all better off if our data is collected by a government halfway across the world instead of our own governments which hold incredible amounts of power over us.

Sure, that is until each government's dataset is interesting enough to the other to facilitate a data-sharing agreement.

There's gotta be an internet "law" that says something like "The data you volunteer to a benign 3rd party eventually winds up being used against you by someone". This is short-term thinking at its finest.

▲ tmoravec 33 minutes ago | parent | prev | next [-]

Qwen3.6-Plus is available from Fireworks.

▲ dchftcs 2 hours ago | parent | prev | next [-]

fireworks hosts Qwen 3.6 Plus, they might also get Qwen 3.7 Plus.

▲ epolanski 3 hours ago | parent | prev | next [-]

US hyperscalers, all of them, are financially invested in the US AI labs and have the incentives to keep the status quo.

▲ 0xbadcafebee 3 hours ago | parent | prev | next [-]

I'm more interested in hearing specific reasons why one wouldn't use a Chinese company. Unless you're thinking Alibaba is going to ship chat logs to some government ministry that will then dole out proprietary information to new competitors (which doesn't seem logistically feasible), or you run a human rights organization, it feels a bit like FUD.

▲

vessenes 3 hours ago | parent | next [-]

All this data is accessible to national security agencies; this is true in every country in the world.

China has more integration between intelligence and industry than many western countries, and it does present a higher risk of unwanted “tech transfer” to industry than running on oracle or Google or ms or Amazon does in the US.

DHS has long staffed full time agents in California to deal with foreign IP exfiltration - using qwen is like fast/easy mode for IP exfiltration: why make anyone get a job in your Palo Alto office when you can just send it to them in Hangzhou?

Upshot - If you have something proprietary you’re working on I would generally advise not to just direct send it to Alibaba.

▲ culi 5 minutes ago | parent [-]

I highly doubt China has a more sophisticated integration of their intelligence ministries than the USA. The world in which that was true would look very different from our own.

▲

bachmeier 2 hours ago | parent | prev | next [-]

> Unless you're thinking Alibaba is going to ship chat logs to some government ministry

This made me think of a Seinfeld episode: "I didn't know it was possible not to know that."

▲

noelsusman 2 hours ago | parent | prev | next [-]

>Unless you're thinking Alibaba is going to ship chat logs to some government ministry that will then dole out proprietary information to new competitors (which doesn't seem logistically feasible)

That's exactly the fear, and why would it not be logistically feasible? The threat is definitely a bit overhyped, but China has a longstanding track record of aggressive corporate espionage.

▲

tekacs 3 hours ago | parent | prev | next [-]

… building and selling a product to US companies that sends company-internal data to Chinese AI providers is not a particularly good way to get people to buy it.

Even if they weren’t individually worried about their proprietary data being shared with Chinese domestic competitors or with government… their audit / security programs likely wouldn’t allow it for a _huge_ range of types of data.

▲

dpoloncsak 3 hours ago | parent | prev [-]

Because my CEO thinks China scary big hacker guys over there

▲ motiw 3 hours ago | parent | prev [-]

ChatLLM supports Qwen; do you consider this US-safe?

▲ flakiness an hour ago | parent | prev | next [-]

I'm using pi agent and would love to try qwen models (hosted). What are the good options? The official provider doesn't include Alibaba. Is OpenRouter etc. fast enough?

(As a reference, DeepSeek v4 is severely throttled on these proxy services.)

▲ atilimcetin 6 minutes ago | parent [-]

I use pi + openrouter (with qwen3.6-max-preview) a lot. I haven't hit any stability or performance problems yet.

▲ ndom91 2 hours ago | parent | prev | next [-]

Is this one of those ones where they'll drop the huggingface release a week later? Or do we know for sure that this is staying proprietary?

▲

Davidzheng 2 hours ago | parent [-]

someone correct me if I'm wrong, but I think the max models are usually non-open

▲

sroussey 2 hours ago | parent [-]

The plus and max models have never been open as far as I know.

▲ zackangelo 2 hours ago | parent [-]

With the 3.5 release, the Plus model was just a rebrand of the open weight 397B. But I suspect that will change going forward. They haven’t released the weights for 3.6 but they did make it available through a few US providers.

▲ tarruda 4 hours ago | parent | prev | next [-]

Looking forward to more open weight releases from Qwen, especially 122B and 397B.

▲

smcleod 4 hours ago | parent | next [-]

Yeah that ~60-150b range is such a sweet spot for current 'prosumer' hardware, I'd love to see something like a 120b-a14b or thereabouts.

▲

tarruda 4 hours ago | parent | next [-]

I have a 128G mac studio and even 397B was a happy surprise to me due to its high quantization resilience.

I've created a 2.54BPW quant that fit on my hardware with 128k context, 20 tps tg and 200tps pp, while maintaining high scores on many benchmarks: https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discus...
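A rough sanity check on why a 2.54 BPW quant of a 397B-parameter model can fit on a 128G machine: weight size is approximately parameters × bits per weight / 8. This is only a back-of-the-envelope sketch. It assumes the 2.54 figure is an average over all weights, it ignores KV cache and runtime overhead, and `quant_weight_gb` is a made-up helper name, not part of any tool mentioned here.

```python
def quant_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in decimal GB for a quantized model.

    Assumes bits_per_weight is an average across all tensors and
    ignores KV cache, activations, and runtime overhead.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


size_gb = quant_weight_gb(397, 2.54)
print(f"~{size_gb:.0f} GB of weights")
```

Under those assumptions the weights alone come to roughly 126 GB, which suggests the fit is tight and depends on how much memory the context actually needs.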

▲

chrisweekly 4 hours ago | parent | next [-]

Apple store's current options for mac studio seem to max out at 96GB. I'm questioning ROI, esp. given it's not upgradeable. Curious about others' takes on new mac hardware.

▲

tarruda 3 hours ago | parent | next [-]

> I'm questioning ROI

If by ROI you mean saving more money than using paid APIs, then I don't think it is worth it. All you gain is full sovereignty over your AI usage.

▲

drob518 3 hours ago | parent | prev | next [-]

Currently, Apple is letting some of its models go out of stock in preparation for new models coming in a few weeks. I would expect at least 128 GB models at that time. That said, the memory crunch is hitting everyone.

▲

the_lucifer 2 hours ago | parent [-]

Yep, even with their supply chain prowess, they're being hit now given some longer term contracts vis-à-vis their memory are nearing renewals.

▲ drob518 2 hours ago | parent [-]

Yep. Something needs to break soon. Or rather, something WILL break soon, one way or another. Was talking to a friend last night who works planning infrastructure rollout and he said costs for equipment have roughly doubled in the last six months. Soon, these projects aren’t going to be viable.

▲

ramses0 an hour ago | parent | prev [-]

I'd held off from buying a new personal laptop for quite a few years and felt that the M5-128gb was justifiable once I started really seeing payoffs from using AI at work.

Running w/ Cursor and doing some "nights and weekends" type coding / conversations, I was hitting $100-200 of usage within a few weeks. I know there's probably better ways to manage costs, but I was getting enough value out of it to keep bumping my spend limit from $20 => $40 => $80 => $120 (and then I stopped spending! :-)

Messing around with local-llm, I've settled on `omlx` and `gemma` for "conversational", and I think it's `qwen-120b-a3b-6bit` or something for the "heavy hitter". Gemma "gets it" a lot more, whereas that particular `qwen` tends to fall into the "MuSt WrItE CoOooDeee!" behaviour in a lot of cases instead of holding a conversation, and does an awesome job of randomly spitting out ascii-art diagrams or including full-blown bash shell scripts to illustrate different cases.

My POV is: "Local for slightly slower/casual usage", the ~1% of battery usage per minute of LLM is shockingly accurate (eg: 30 minutes == 30% drop!). "Gemma for discussion and emitting DESIGN-... docs", and "Qwen for converting DESIGN-... to PLAN-...", (as well as implementation, but generally from a fresh context loading the relevant PLAN-... or supporting docs)

...then supplement that with direct Cursor usage in case I screw up some setting on being able to get the local LLM working, or if I need to include literal web-research or really having access to some SOTA model. Using the pi-coder harness locally, web pages are kindof a difficult conundrum as they can be kindof gigantic and are really worthy of special casing, some sort of sub-harness, etc... but the more "stuff" you put into the agent, the less context window (and memory!) you have available, so it's a real balancing act.

The other biggest problem is that you're limited (locally) to ~20-80tps and in some cases you have to chew on or "swallow" the whole prompt up to that point if you end up with some sort of cache miss (TTFT). The `omlx` server does a pretty good job (after you tweak some settings and stuff) of allowing MANY prompt continuations to nearly immediately start generating tokens, but sometimes if I have two agents going (eg: Gemma talking shit about Qwen's output or vice versa) in a longer context window, then you'll take that hit.

"Other people's compute" is definitely more freeing, but even looking at $200/mo usage that's $2400 vs. the ~$6k for a maxed out MBP. Call it $2500 vs. $7500 and you'd say that "local AI gives you a 3-year amortization window for a slower, worse experience" ... but if you're strategic about your usage, the ability to "talk for free" and occasionally "burst" to an online provider or having some hugging-face tokens to try out different models that you can't quite run locally is really nice. Talking to the AI (locally) to even just do non-coding planning without worrying about data leakage or privacy issues is phenomenal, and you end up owning a really nice laptop!
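The break-even arithmetic in the paragraph above can be sketched as follows. It assumes the $2400 figure is a yearly total ($200/mo × 12) and compares it only against the one-time hardware price, ignoring electricity, resale value, and any quality difference; `breakeven_months` is a hypothetical helper name.

```python
def breakeven_months(hardware_cost: float, yearly_api_spend: float) -> float:
    """Months of hosted-API spending needed to equal a one-time hardware cost."""
    return hardware_cost / yearly_api_spend * 12


# Figures quoted in the comment: ~$6k laptop vs. $200/mo of hosted usage.
print(breakeven_months(6000, 2400))  # 30.0

# The rounded version ("call it $2500 vs. $7500").
print(breakeven_months(7500, 2500))  # 36.0
```

The rounded figures give the 3-year window mentioned in the comment; the unrounded ones land closer to two and a half years.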

In some ways, seeing the "advantage" of having the local 128gb capacity for LLM, I'm semi-wishing I'd have gotten a mac mini instead, but then I can't quite do the 100% offline stuff (eg: coffee-shop) that the maxed out laptop allows.

If it were a mini running locally, I'd feel more comfortable calling it the always-on "AI brain" to process my emails, run crontab summaries, whatever kindof "open-claw-ish" stuff that you could do w/o relying on having to "keep the laptop lid open all the time". I'm sure there's ways to repurpose things, but longer-term, call it even 3-5 years from now... any sort of 128gb machine will be more than capable where you'd want to have one "doing stuff" locally within your home network (IMHO).

▲ chrisweekly 30 minutes ago | parent [-]

Thank you! That was a generous and helpful response, I really appreciate it. Food for thought...

> "...if you're strategic about your usage, the ability to "talk for free" and occasionally "burst" to an online provider or having some hugging-face tokens to try out different models that you can't quite run locally is really nice. Talking to the AI (locally) to even just do non-coding planning without worrying about data leakage or privacy issues is phenomenal, and you end up owning a really nice laptop!"

^ this resonates, loudly.

▲

ttoinou 4 hours ago | parent | prev [-]

better than antirez ds4 ?

▲ tarruda 4 hours ago | parent [-]

I only tried a very early version of that when it was just a llama.cpp fork and Qwen was certainly better in my tests. But I was not super impressed with deepseek 4 flash using it from the official API either, so it doesn't seem to be quantization's fault. It is a good model, but nothing out of the ordinary in the few benchmarks I ran on it (with full awareness that benchmarks are biased).

▲

gcr 4 hours ago | parent | prev [-]

What’s the price point for getting into that sweet spot?

I’m on an M1 Max with 32GB VRAM, so I’m looking forward to the 27B or 35B-A3B models. Is dropping $5k for an RTX 6000 or a DGX Spark really the best option?

▲

tempoponet 4 hours ago | parent | next [-]

Expect to pay $4k-10k

- Your RTX 6000 is closer to $10k now

- Sparks are creeping into the $4-5k range

- AMD Strix are ~3.5k

- Apple depends on chipset and memory. Sweet spot would be 128gb M3 Ultra, probably $6-8k but admittedly haven't been tracking closely. New M5 might come in the fall. You can get a new 128gb M5 Max laptop for ~5-6k today.

- a 4x3090 rig would take $5-6k

Every platform has tradeoffs, but it's mostly ecosystem, memory bandwidth, and power consumption. They're all slow. The best option is likely to rent hardware on Runpod. The ROI on self-hosting is very low unless you have a specific need or you're ok treating it as a hobby.

▲

anonym29 4 hours ago | parent | next [-]

Bosgame M5 (Strix Halo) w/ 128 GB still goes for $2800 right now. SH systems have surged in price dramatically but quite unevenly.

>The best option is likely to rent hardware on Runpod.

Vast.ai is much cheaper, but the broader point here is contestable. The only dimension in which cloud GPU rentals win is cost. You lose the confidentiality, integrity, and availability benefits of local deployments.

▲ ai_fry_ur_brain 3 hours ago | parent [-]

Rentals are priced to pay themselves off in 1-1.5 years (when renting them out per hour, not selling tokens). It's never a better option to rent. Not that I'd encourage anyone to throw large amounts of money to have access to LLMs, but you're definitely going to be better off buying something that you can amortize over multiple years with a multi year warranty.

▲

ai_fry_ur_brain 3 hours ago | parent | prev [-]

And for what? Spend 10-15k for the sloppiest of slop code, non-deterministic automations, and the ability to spawn an AI gf?

This whole thing is really starting to remind me of the crypto hype phases of 2016-2018 when everyone thought their investment in GPUs was going to make them rich.

▲ organsnyder 3 hours ago | parent | next [-]

It is possible to get real work done with LLMs. There are plenty of ethical concerns, and they're definitely over-hyped, but they are exceptionally useful tools when used well.

▲ gamander2 an hour ago | parent | prev | next [-]

These models contain a wealth of knowledge that is being censored, not just deliberately, but by training data bias. Fine-tuning and steering can produce unexpected new insights. For example, a model that is trained to believe so-called "conspiracy theories", which many believe to be the ground truth.

▲ dvfjsdhgfv an hour ago | parent | prev [-]

I upvoted your comment even though I disagree with you. Yes, LLMs are sloppy, and local models usually more so (but things change fast). But the local ones have one big advantage: they are private. So you can safely feed them the collection of your private documents and things you wouldn't trust people like sama with. The fact that some people do not care is one of the failures of our educational system.

▲

tarruda 4 hours ago | parent | prev | next [-]

> What’s the price point for getting into that sweet spot?

In October/2024 I got my Mac studio M1 ultra with 128G, IIRC it was ~$2500. With the recent price explosion, it has certainly gotten more expensive. https://frame.work/ is selling 128G strix halo mainboard for $2700, but you have to add storage and case.

▲

ttoinou 4 hours ago | parent | prev | next [-]

M5 Max 64GB (sweet spot) or 128GB (only 1000 USD more, better to keep it for the future) are the best quality/price ratio: future-proof, reliable, resellable, and flexible across workloads. Being harder to use as a server might be the only drawback

▲

throwaw12 4 hours ago | parent | next [-]

What do you recommend for non-Mac setup? I am a Mac user, but it's getting expensive, and I'm not seeing a reason to jump to the latest M5

▲

barbacoa an hour ago | parent | next [-]

Try looking into Ryzen AI Max 395. AMD made a CPU/GPU SoC with unified memory specifically for AI inference. Can buy mini PCs with up to 128gb ram.

▲

simple10 40 minutes ago | parent [-]

The Ryzen AI Max 395 128gb is super cool, but not fast for inference. Order of magnitude slower than dedicated GPU but at half the cost. You can run larger models on it but it's slow. Great for local async work. Not great for daily chat or code agent driver.

▲ throwa356262 a few seconds ago | parent [-]

The latest NPUs are pretty fast, I think what is missing is more optimised software support.

▲

varispeed 3 hours ago | parent | prev [-]

Probably a comparable non-Mac setup will be Threadripper, but it will become much more expensive. My view is that actually Apple products are the cheapest on the market when it comes to performance.

▲

roger_ 4 hours ago | parent | prev [-]

M5 Max 128GB for $1k?

▲ tempoponet 4 hours ago | parent | next [-]

The memory upgrade is $1k on a Macbook Pro. The laptop is ~$5500.

▲ smallerize 4 hours ago | parent | prev [-]

I think they mean the upgrade to 128GB is +$1k.

▲

embedding-shape 4 hours ago | parent | prev | next [-]

If I could find a RTX Pro 6000 for $5K I'd definitely grab it, I'm running RedHatAI/Qwen3.6-35B-A3B-NVFP4 on one (I had to pay closer to $10K for it though) with 260K context and it's a blast! ds4 by antirez also works well, even IQ2XXS seems to work relatively well but Qwen3.6-35B-A3B-NVFP4 is both faster and gives higher quality responses (at least for coding and translations which I use them mostly for).

▲

anonym29 4 hours ago | parent | prev [-]

Strix Halo at $2k with similar TG and about half the PP of DGX Spark was a pretty good deal IMO, especially considering it's also a full x86 system... 16c/32t Zen 5, 40 CU RDNA 3.5, 128 GB unified memory at ~220 GB/s real-world speeds (256 GB/s theoretical) - that runs full tilt at 140W in performance mode and idles at ~10W.

Unfortunately, the prices rose on these a lot, but unevenly. Beelink GTR 9 Pro is $4400, Framework Desktop is ~$3500, for what is basically the exact same mainboard as a Bosgame M5 for $2800.

Apple's M5 Max is another attractive option. Apple silicon traditionally had great MBW and was good at TG, but struggled with PP, but the new neural engines in those GPU cores have made a big difference in a good way here.

Gorgon Halo is rumored for June announcement with Q4'26 release with basically +100 MHz clocks on Strix Halo, LPDDR5X-8533 instead of LPDDR5X-8000, but more importantly, 192 GB max instead of 128 GB.

I'd say it's better to wait for Gorgon Halo than to grab Strix Halo now. However, Medusa Halo, rumored for H2'27, is slated to have up to 26c Zen 6 (heterogeneous cores - kinda funny that AMD is heading towards these as Intel retreats from them), 48 CU of RDNA 5 instead of 40 CU RDNA 3.5, and a 384 bit bus w/ LPDDR6, which should make 256 GB at more like ~490-600 GB/s MBW, which will really make Strix and Gorgon Halo obsolete.
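The theoretical bandwidth figures in this comment follow from transfer rate × bus width / 8. A minimal sketch, assuming Strix Halo uses a 256-bit memory bus (not stated in the comment) and that the rumored Gorgon Halo keeps the same bus width; the Medusa Halo lines just back-solve the data rate that the quoted ~490-600 GB/s would imply on a 384-bit bus using the same naive formula. Function names are made up for illustration.

```python
def peak_bandwidth_gbs(mt_per_s: float, bus_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal)."""
    return mt_per_s * 1e6 * bus_bits / 8 / 1e9


def implied_mt_per_s(bandwidth_gbs: float, bus_bits: int) -> float:
    """Transfer rate (MT/s) that a given peak bandwidth implies on a bus."""
    return bandwidth_gbs * 1e9 * 8 / bus_bits / 1e6


# Strix Halo: LPDDR5X-8000, assumed 256-bit bus -> the 256 GB/s quoted above.
print(peak_bandwidth_gbs(8000, 256))

# Gorgon Halo (rumored): LPDDR5X-8533, bus width assumed unchanged.
print(round(peak_bandwidth_gbs(8533, 256), 1))

# Medusa Halo (rumored): data rates implied by ~490-600 GB/s on 384 bits.
print(round(implied_mt_per_s(490, 384)), round(implied_mt_per_s(600, 384)))
```

If the bus-width assumption holds, the Gorgon Halo memory bump is worth only about 7% more peak bandwidth, which is consistent with the comment treating the larger 192 GB capacity as the more important change.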

Also worth keeping an eye out for Serpent Lake (intel CPU + nvidia iGPU on a single board with unified memory, rumored for 2028-2029 iirc), and on the 160 GB Crescent Island Intel dGPU.

▲

mixtureoftakes 4 hours ago | parent | prev | next [-]

I'm more excited for qwen3.7 9b and 72b, these are usually so good for their size

▲

guitcastro 4 hours ago | parent | prev | next [-]

I am still waiting for qwen image-edit 2.0 open weight

▲

Pxtl 2 hours ago | parent | prev [-]

Ouch. I'm just getting into tinkering with these things - mine is running on a vanilla gaming desktop with a 12gb 3060 and 32gb of ram. Even going above Qwen 9B risks completely locking up the machine.

▲ goyozi 6 hours ago | parent | prev | next [-]

These are very good numbers. I still don’t get why they don’t compare against latest competitor versions in these posts, it’s not like we’re all not going to notice.

▲

NiloCK 4 hours ago | parent | next [-]

I find it forgivable if it's within a minor version bump. (NB that x.5 is now a de facto major-version bump for LLMs for whatever reason).

Even with LLMs, posts like this don't just fall out of a coconut tree. If you have a set of target benchmarks for your own model, then keeping "the set" of side-by-side comparable models is its own maintenance headache.

▲

Aurornis 4 hours ago | parent | prev | next [-]

I think the argument is that they're trying to suggest that they’re about N months from SOTA.

Realistically I assume they hope readers don’t notice the fine details.

The Qwen models are great for open weights but for every past release they haven’t performed as well as the benchmarks in my experience. They’re optimizing for benchmark numbers because they know it works.

▲

epolanski 3 hours ago | parent [-]

> Realistically I assume they hope readers don’t notice the fine details.

The pool of people reading such articles while ignoring such details can't be big.

▲ Aurornis 3 hours ago | parent [-]

I disagree. Most people skim articles, not read them deeply. On Hacker News I wonder if most people even opened the article at all most times.

▲

htrp 4 hours ago | parent | prev | next [-]

I think it's part of the expectation setting (with a side of "we did our distillation/eval harness on a specific model").

if they say it's 4.7 comparable, it anchors that into your head as the model to evaluate against.

▲

hmokiguess 5 hours ago | parent | prev | next [-]

this puzzles me too, I want to know

▲

beydogan 4 hours ago | parent | prev | next [-]

Honestly, the initial version of Opus-4.6 was much better than whatever we are being served right now as 4.7. If it performs at that level, I'm totally willing to switch.

▲

hypercube33 3 hours ago | parent [-]

4.6 was an awful experience the month I used it right after launch: it didn't ask anything, just made assumptions and went on its merry way. 4.5 and 4.7 don't do that for me, but 4.7 eats my quota for breakfast, so I've been avoiding it because I like to have it for more than an hour a day.

	▲	goyozi 2 hours ago \| parent \| next [-]
		I feel like I had the best and worst ~month experience on 4.6. Initially when it came out, it seemed to ask good questions and genuinely do well on complex tasks. From about mid-March it was absolutely abysmal, it seemed to assume the stupidest answer/angle for everything and make weird mistakes. 4.7 seems decent so far but usage hurts - at some point my company switched me to standard seat and I used up 80% of my session usage in 1 prompt. I got my premium seat back since but I think pro/standard plan + opus 4.7 is unusable for daily driving.
	▲	verdverm an hour ago \| parent \| prev [-]
		That experience is also likely tied to the claude harness around the model, and not being as tuned right after model release. They iterate on this and different models need different words (unfortunately...).

▲

maelito 4 hours ago | parent | prev [-]

Marketing.

▲ eddyaipt 3 hours ago | parent | prev | next [-]

The pattern I trust most is adding a small verification artifact after every external action. Agents usually fail from silent state drift faster than from lack of reasoning depth.
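One possible reading of that pattern, as a minimal sketch rather than anything the comment specifies: after an external action (here, a file write), re-read the real state and keep a small record of expected versus observed. The names `Receipt` and `write_file_verified` are hypothetical, and a content hash is just one choice of artifact.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Receipt:
    """Hypothetical verification artifact recorded after one external action."""
    action: str
    expected_sha256: str
    observed_sha256: str

    @property
    def ok(self) -> bool:
        # The action is considered verified only if the observed state
        # matches what the agent believes it wrote.
        return self.expected_sha256 == self.observed_sha256


def write_file_verified(path: str, content: str) -> Receipt:
    """Write a file, then re-read it from disk and record what was observed."""
    data = content.encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    with open(path, "rb") as f:
        observed = f.read()
    return Receipt(
        action=f"write {path}",
        expected_sha256=hashlib.sha256(data).hexdigest(),
        observed_sha256=hashlib.sha256(observed).hexdigest(),
    )


def demo() -> Receipt:
    # Runs the verified write against a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        return write_file_verified(os.path.join(tmp, "notes.txt"), "hello")
```

An agent loop could refuse to take the next step whenever `receipt.ok` is false, which surfaces drift at the action that caused it instead of several steps later.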

	▲	_boffin_ 2 hours ago \| parent [-]
		Can you go into more depth about this

▲ aliljet 27 minutes ago | parent | prev | next [-]

Where can a user reasonably host this in an affordable way to access the local LLM revolution?

▲ jdw64 2 hours ago | parent | prev | next [-]

Qwen really hits the sweet spot: it's cheap, fast, and actually good.

▲ bsenftner 4 hours ago | parent | prev | next [-]

Any reports from people using their coding agent(s)?

▲

rayboy1995 3 hours ago | parent | next [-]

I'm running the Qwen 3.6 27B Q5_K_M GGUF on a Tesla P40 with koboldcpp, using pi.dev as the harness, and I gotta say I am impressed. It took some setup and configuring, but I already have some code it has made committed and pushed. It can be slow on my hardware at >50k tokens, but given that I bought this one P40 for like $150 back when the LLM trend started, I can't complain. (I have a second one too but I couldn't physically fit the card in my server unfortunately.)

The setup I had to do was important, and I had to compile koboldcpp with a few special params for my hardware; I mostly just had Claude figure it out. I don't remember everything I did now, but initially it was very slow and would often stop mid-task; it seems that was mostly a parsing issue. It made the model seem broken/dumb, but once I had all that settled I was actually able to use this how I use Claude Code. Disclaimer: I am pretty explicit with requirements. I imagine this fails more when you leave it to figure out things on its own, but for my flow it's pretty rad.

I'm currently setting it up as an automated agent to pull Trello cards, create PRs for them, and move the card to be reviewed.

Command I am using to run:

python koboldcpp.py \
  --port 61514 --quiet --multiuser --gpulayers 999 --contextsize 262144 --quantkv 2 \
  --usecublas normal --threads 4 --jinja --jinja_tools --jinja_kwargs '{"enable_thinking":true, "preserve_thinking":false}' \
  --skiplauncher --model /data/models/Qwen3.6-27B-Q5_K_M.gguf --smartcache 5

	▲	lostmsu an hour ago \| parent [-]
		Qwen recommends preserve_thinking: true for agentic/coding workloads.
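Applied to the command above, that suggestion would only change the value passed to `--jinja_kwargs`. A small sketch that builds the shell-quoted flag, assuming the same two keys used in that command and nothing else about koboldcpp's options:

```python
import json
import shlex

# Same keys as the --jinja_kwargs value in the command above,
# with preserve_thinking flipped to true as suggested in the reply.
kwargs = {"enable_thinking": True, "preserve_thinking": True}

# shlex.quote wraps the JSON so a POSIX shell passes it as one argument.
flag = "--jinja_kwargs " + shlex.quote(json.dumps(kwargs))
print(flag)
```

Generating the JSON instead of hand-editing it avoids quoting mistakes, which matter here because the value sits inside a long shell command.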

▲

vibe42 3 hours ago | parent | prev [-]

I'm using the pi-mono coding agent (open source, free) without any extensions and very simple prompts. The 3.6 27B model (BF16, 250k context) uses 67GB VRAM on an RTX PRO 9000.

It's very capable on almost any coding task I've thrown at it, and very good for easy-to-medium-difficulty scripts and new code bases.

It struggles on some complex tasks in larger code bases, e.g. when using it to debug and fix bugs in llama.cpp, it gets close to working code but often introduces errors. For such tasks it's still very useful as a search/explore tool and for drafting fixes.
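A rough sanity check on the 67GB figure quoted above: BF16 stores 2 bytes per parameter, so the weights alone for 27B parameters come to about 54 GB. The sketch below does that arithmetic; attributing the remainder to KV cache for the long context plus runtime overhead is an assumption, and it treats 1 GB as 1e9 bytes, which may not match how the reported number was measured.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB, taking 1 GB = 1e9 bytes."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


bf16 = weights_gb(27, 2)   # BF16 uses 2 bytes per parameter
remainder = 67 - bf16      # what is left of the reported 67 GB
print(bf16, remainder)     # 54 13
```

The same function gives a quick first estimate for other sizes and precisions before downloading anything, as long as the context-dependent remainder is budgeted separately.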

▲ cft 17 minutes ago | parent | prev | next [-]

Downloading this and cancelling Google Antigravity Pro at the same time:

I had a Google Pro account that I inherited from buying a Pixel 9 XL - it's free for a year after a flagship Pixel phone purchase. After a year they started charging for it, and I tolerated it, because Flash was usable in Antigravity for dumb auxiliary tasks that I did not want to waste GPT/Opus on. It had a separate generous quota from Gemini 3.1 Pro. Now with Flash 3.5 they combined the quotas with Pro, such that on a Google Pro account you can work 4-5 hours per week in Flash. And by the way, 3.1 Pro is useless for programming compared to Codex/Opus.

▲ indigodaddy 41 minutes ago | parent | prev | next [-]

Is it multimodal/vision?

▲ XCSme 4 hours ago | parent | prev | next [-]

Any info on pricing and latency?

	▲	mchusma 23 minutes ago \| parent [-]
		I've looked in like a dozen places, I don't see anything. :(

▲ bratao 5 hours ago | parent | prev | next [-]

It is super strange that in all of the last (3?) releases they keep comparing against older models such as Opus-4.6.

▲

vessenes 4 hours ago | parent | next [-]

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor 5.5 from oAI are listed on the benchmarks. So there’s also just the time from others to get benchmarking done and published.

▲

varispeed 3 hours ago | parent | prev | next [-]

Opus-4.6 was probably the best model so far before it got nerfed. 4.7 is nowhere near the experience I had. In fact I stopped using it completely because more often than not its output is just dumber than local models.

	▲	leonidasv an hour ago \| parent [-]
		Same here. Can't stand 4.7.

▲

dyauspitr 2 hours ago | parent | prev [-]

Because these can’t compete with the SoTA but they’re close.

▲ xiaoluolyg 37 minutes ago | parent | prev | next [-]

Congrats to the Qwen team, remarkable.

▲ hmaddipatla 2 hours ago | parent | prev | next [-]

The tokenomics and value for capability, context and latency look like they could deliver a super competitive offer - what would it take for you to switch?

▲ esafak 3 hours ago | parent | prev | next [-]

Does anyone have experience with the Alibaba Cloud Model Studio that serves these qwen models?

▲ howmayiannoyyou 4 hours ago | parent | prev | next [-]

I can't bring myself to use any model that trains or sends telemetry back to my country's primary competitor/adversary. I don't care how much money is saved.

▲

Mashimo 4 hours ago | parent | next [-]

That is understandable. Just don't do it. No need to announce it.

▲

InsideOutSanta 4 hours ago | parent | prev [-]

As somebody in Europe, uh, that doesn't leave many options.

▲

avazhi 3 hours ago | parent [-]

This is the current European modus operandi: virtue signal and cry about tech that other countries produce, pass local laws that limit its use in their countries even though they have no viable local alternatives, brag amongst themselves about decoupling from US and Chinese tech, and then look on wistfully as the rest of the world moves on without a single fuck given.

Europe's sense of superiority and actual global importance/relevance is assbackwards.

▲

deaux 2 hours ago | parent [-]

> as the rest of the world moves on without a single fuck given.

Hilarious thing to say when half this comment section is Americans giving so much of a fuck that they consider China-adjacent hosted models unusable due to the supposed risks. If what you were saying was true then those pragmatic Americans would just use whatever is most effective.

	▲	avazhi 2 hours ago \| parent [-]
		Americans have their own frontier models, that's the point. Europeans have quite literally nothing native, so they are forced to choose between the Americans or Chinese, and they dislike both and trust neither. The Americans can cry about Chinese censorship and turn around and use Claude or Opus or Gemma or whatever, but the Europeans just throw a fit and then have to use one of the two anyway. And that whole crying about something while being completely helpless vis-a-vis doing anything about it is the definition of Europe so far this century. Globally irrelevant outside Germany.

▲ dfansteel 4 hours ago | parent | prev [-]

Can anyone check its knowledge base for me? I’m honestly not able to run it and the Qwen models I can run censor information critical of the Chinese government.

Tiananmen Square is the first place to start.

▲

Mashimo 4 hours ago | parent [-]

> I’m honestly not able to run it

What do you mean? This is not self hosted, it's closed source. And any website that targets China or is hosted in China will probably censor Tiananmen Square.

▲

polski-g 2 hours ago | parent [-]

There is no reason why they couldn't license the model to Friendli/Fireworks/etc and have it hosted in the US to alleviate this concern.

	▲	Mashimo 2 hours ago \| parent \| next [-]
		I don't know about this model specifically, but other Chinese models did not have this limitation. It was purely on the hosted end, tacked on as a self check while the text was generating. Did that change?
	▲	SR2Z 2 hours ago \| parent \| prev [-]
		The reason is to create domestic demand for Chinese AI chips so they can eventually be free of NVIDIA.