|
| ▲ | entrope 14 hours ago | parent | next [-] |
| I let Qwen3.6-27B chew on a bug all last night. It choked at some point and stopped responding (probably a context overflow before pi-coding-agent could compact it). Claude Sonnet 4.6 found and fixed the bug in under 10 minutes. Qwen3.6 is pretty amazing for a 27B model, but it's not hard to run into its limits. With a Radeon R9700 and Unsloth's 6-bit quantization, I get ~20 TPS and 110k context, so it can do a fair bit quickly. |
| |
| ▲ | 2ndorderthought 14 hours ago | parent [-] | | You definitely need to watch it more than a model 100 times larger. But the fact that it runs on 1 GPU and does what it does is insane. Imagine what a 30B model will look like in 6 months or a year. |
|
|
| ▲ | datadrivenangel 14 hours ago | parent | prev | next [-] |
| Inference speed is still slow in a meaningfully different way. The models are good, but not great, and much slower; for coding that means a 2-3 minute task with Claude Code and Opus takes an hour and has a higher chance of being wrong. |
| |
| ▲ | 2ndorderthought 13 hours ago | parent [-] | | It's only slow if you can't afford to run it properly. A lot of people are getting 70-100 tokens per second on 1 GPU. Not sure what Claude Opus or Sonnet run at. I know when it goes offline it's 0 tokens per second. |
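| For scale, the TPS figures quoted in this thread translate directly into wait time per response. A minimal back-of-envelope sketch (the 20 / 70-100 TPS numbers come from the thread; the response length is an assumed illustration): |
|
| ```python
| # Wall-clock time to decode a response at a given speed.
| # TPS values are the ones quoted in-thread; the response
| # length of 2,000 tokens is an assumed illustration.
| def generation_time_s(tokens: int, tps: float) -> float:
|     """Seconds to decode `tokens` at `tps` tokens per second."""
|     return tokens / tps
|
| response_tokens = 2_000
| for tps in (20, 70, 100):
|     print(f"{tps:>3} TPS -> {generation_time_s(response_tokens, tps):5.1f} s")
| # 20 TPS is ~100 s per reply; 100 TPS brings it down to ~20 s.
| ```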
|
|
| ▲ | ekjhgkejhgk 15 hours ago | parent | prev [-] |
| We're in the same boat. I would rather have NO LLM than an LLM that collects my data (which you should assume is all of them, unless you've been asleep for the last 20 years). Fortunately, I don't have to pick one or the other - instead I run Qwen 3.6 35B A3B. It's a bit slow with my 8 GB GPU (I'm in the process of getting a bigger one), but again, to me the choice isn't "what's the best I can get", it's "what's the best local model I can get". |