nu11ptr 6 hours ago
What hardware do you have it running on? Do you feel you could replace the frontier models with it for everyday coding? Would/will you? | ||||||||
sosodev 2 hours ago
Around 20ish tokens a second with a 6-bit quant at very long context lengths on my AMD AI Max 395+. I'm trying to use local models whenever possible, but I still need to lean on the frontier models sometimes.
politelemon 5 hours ago
60 to 70 on a 5080, but only tinkering for now. The smaller models seem exceptionally good for what they are, and some can even do OCR reliably.
bigyabai 5 hours ago
I'm getting ~30 tok/s on the A3B model with my 3070 Ti and 32k context.

> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?

Probably not yet, but it's really good at composing shell commands and one-liners for scripting. The web development skills are markedly better than Qwen's prior models in this parameter range, too.
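For a sense of scale, the throughput numbers quoted in this thread translate into wait times roughly like this (a back-of-the-envelope sketch; the 500-token reply size and the 65 tok/s midpoint for the 5080 are assumptions, not figures from the thread):

```python
# Rough wait-time estimate: seconds = tokens_generated / throughput.
# The tok/s figures are the ones reported above; response length is assumed.
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    return num_tokens / tokens_per_second

# A hypothetical ~500-token reply at the reported speeds:
for label, tps in [("AI Max 395+", 20), ("3070 Ti", 30), ("5080 (midpoint)", 65)]:
    print(f"{label}: {generation_seconds(500, tps):.1f} s")
```

So even the slowest setup here finishes a short reply in well under a minute, which is why these speeds feel usable for shell one-liners even if not for heavy everyday coding.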