| ▲ | d4rkp4ttern a day ago | |||||||
For token-generation speed, a challenging test is to see how it performs in a code-agent harness like Claude Code, which has anywhere between 15-40K tokens from the system prompt itself (+ tools/skills etc). Here the 26B-A4B variant is head and shoulders above recent open-weight models, at least on my trusty M1 Max 64GB MacBook. I set up Claude Code to use this variant via llama-server, with 37K tokens initial context, and it performs very well: ~40 tokens/sec, far better than Qwen3.5-35B-A3B, though I don't know yet about the intelligence or tool-calling consistency. Prompt processing speed is comparable to the Qwen variant at ~400 tok/s. My informal tests, all with roughly 30K-37K tokens initial context:
Full instructions for running this and other open-weight models with Claude Code are here:https://pchalasani.github.io/claude-code-tools/integrations/... | ||||||||
| ▲ | JoshPurtell a day ago | parent [-] | |||||||
gpt oss 20b is not dense | ||||||||
| ||||||||