▲ sdrinf | 2 hours ago
Just want to echo the recommendation for qwen3.5:9b. This is a smol, thinking, agentic, tool-using, text-image multimodal creature with very good internal chains of thought. The CoT can sometimes be excessive, but it leads to a very stable decision-making process, even across very large contexts - something we haven't seen from models of this size before. What's also new here is the VRAM/context-size trade-off: 25% of its attention layers use the regular KV cache for global coherency, but the other 75% use a new KV cache with linear(!!!!) memory growth in context length - e.g. ~100K tokens -> ~1.5 GB of VRAM - meaning that for the first time you can do extremely long conversations / document processing on, say, a 3060. Strong, strong recommend.
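To see why the numbers are plausible, here's a back-of-envelope sketch of KV-cache memory for a hybrid model where only ~25% of layers keep a full per-token cache. The layer count, GQA head count, and head dimension below are illustrative assumptions, not the model's published config, and the non-global layers are assumed to contribute negligible cache:

```python
# Back-of-envelope KV-cache estimate for a hybrid-attention model.
# LAYERS, KV_HEADS, and HEAD_DIM are assumed values for illustration,
# not the actual qwen3.5:9b configuration.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_val=2):
    # 2x for K and V; fp16/bf16 stores 2 bytes per value.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_val

LAYERS = 32      # assumed total transformer layers
KV_HEADS = 4     # assumed KV heads under grouped-query attention
HEAD_DIM = 128   # assumed per-head dimension
TOKENS = 100_000

# Every layer keeps a full per-token KV cache.
full = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM)

# Hybrid: only ~25% of layers keep the per-token cache; the rest
# hold a small fixed-size state, negligible by comparison.
hybrid = kv_cache_bytes(TOKENS, LAYERS // 4, KV_HEADS, HEAD_DIM)

print(f"full:   {full / 2**30:.1f} GiB")    # ~6.1 GiB
print(f"hybrid: {hybrid / 2**30:.1f} GiB")  # ~1.5 GiB
```

With those (assumed) dimensions, keeping the cache in only a quarter of the layers lands right around the ~1.5 GB figure quoted above, which is what makes 100K-token contexts workable on a 12 GB card.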
▲ steve_adams_86 | 2 hours ago
I've been building a harness for qwen3.5:9b lately (to better understand how to create agentic tools, and to have fun). I'm not going to use it instead of Opus 4.6 for my day job, but it's remarkably useful for small tasks, and more than snappy enough on my equipment. It's a fun model to experiment with. I was previously using an old model from Meta, and the contrast in capability is pretty crazy. I like the idea of finding practical uses for it, but so far I haven't managed to be creative enough - I'm so accustomed to using these things for programming.
▲ threecheese | 12 minutes ago
You can really see the limitations of qwen3.5:9b in its reasoning traces - it's fascinating. When a question "goes bad", sometimes the thinking tokens are WILD - it's like watching Poirot after a head injury. Example: "what is the airspeed velocity of a swallow?" - qwen knew it was a Monty Python gag, but couldn't and didn't figure out which one.
▲ ggsp | an hour ago
How much degradation are you seeing between the standard and Q4 versions, and is it constant across tasks or more noticeable in some than others?
▲ dsr_ | 27 minutes ago
Correction: not thinking, and not a creature. If it were a creature, I would feel some sorrow when I killed it. If you feel sorrow when you reboot a machine running an LLM, get to a psychiatrist ASAP.
▲ kingo55 | an hour ago
How does it compare in quality with the larger models in the same series, e.g. the 122B?