I may have genuinely new data for you.

Qwen3.5-35B-A3B is reported to perform slightly better than the model you mentioned.

It runs fine, if suboptimally, on a single 3090 even with 131072 tokens of context, and due to the hybrid attention architecture, memory usage and compute scale rather less drastically than ctx^2. I've had friends with smaller cards still getting work out of it. Generation is at around 20 tokens/sec on that 3090 (without doing anything special yet). You'll need enough DRAM to hold the bits of the model that don't fit. Nothing to write home about, but genuinely usable in a pinch or for tasks that don't need immediate interactivity.
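To make the scaling point concrete, here is a rough back-of-the-envelope sketch of KV-cache size. It assumes that only a fraction of the layers in a hybrid model use full attention (and so keep a per-token K/V cache), while the remaining layers keep a fixed-size state that is ignored here. The layer count, head count, head size and the 25% fraction are all hypothetical placeholders, not the real Qwen3.5-35B-A3B configuration.

```python
def kv_cache_bytes(ctx_tokens, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, full_attention_fraction=1.0):
    """Rough KV-cache size in bytes.

    Counts one K and one V vector per token for every full-attention layer.
    Layers outside full_attention_fraction are assumed to hold only a
    fixed-size state, which is left out of this estimate.
    """
    full_layers = round(n_layers * full_attention_fraction)
    per_token = 2 * full_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * ctx_tokens


# Hypothetical model shape, NOT the actual Qwen3.5-35B-A3B config.
SHAPE = dict(n_layers=40, n_kv_heads=4, head_dim=128)

if __name__ == "__main__":
    for ctx in (32_768, 131_072):
        dense = kv_cache_bytes(ctx, **SHAPE)
        hybrid = kv_cache_bytes(ctx, **SHAPE, full_attention_fraction=0.25)
        print(f"{ctx:>7} tokens: dense {dense / 2**30:.2f} GiB, "
              f"hybrid {hybrid / 2**30:.2f} GiB")
```

With these made-up numbers the cache at 131072 tokens comes to 10 GiB for the all-full-attention case versus 2.5 GiB for the hybrid case. Plug in the real values from the model's config file before drawing any conclusions about what fits on a given card.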

It's the first local model that passes my personal kimbench usability benchmark, at least. Just be aware that it is extremely verbose in thinking mode. Seems to be a Qwen thing.

(edit: On rechecking my numbers, I now realize I can possibly optimize this a lot better.)

Someone1234 3 days ago | parent | next [-]

With respect, this isn't "new data"; it is an anecdote. And it kind of represents exactly the problem I was talking about above:

- Qwen is near Sonnet 4.5!

- How do I run that?

- [Starts talking about something inferior that isn't near Sonnet 4.5].

It is this strange bait-and-switch discussion that happens over and over. Not least because Sonnet has a 200K context window, and most of these anecdotes aren't for anywhere near that context size.

Kim_Bruning 3 days ago | parent [-]

You're not wrong, but... imho it's closer to Sonnet 4.0 [1] on my personal benchmark [2]. And I HAVE run it at just over 200K tokens of context; it works, it's just a bit slow at that size. It's not great, but... usable to me? I used Sonnet 4.0 over the API for half a year or so before, after all.

Only way to know if your own criteria are now matched (or not yet) is to test it yourself with your own benchmark or what have you. And it does show a promising direction going forward: usable (to some) local models becoming efficient enough to run on consumer hardware.

[1] released mid-2025

[2] take with salt: only tests personal usability

Note that some benchmarks do show Qwen3.5-35B-A3B matching Sonnet 4.5 (released later last year), but I treat those with the same skepticism you do, clearly ;)

yencabulator a day ago | parent | prev [-]

One sure would expect Qwen3.5-35B-A3B to "perform slightly better" than Qwen3-235B-A22B!