| ▲ | Twirrim 6 hours ago |
| I've been finding it very practical to run the 35B-A3B model on an 8GB RTX 3050; it's pretty responsive and does a good job of the coding tasks I've thrown at it. I need to grab the freshly updated models: the older one seems to occasionally get stuck in a loop with tool use, which they suggest they've fixed. |
|
| ▲ | fy20 3 hours ago | parent | next [-] |
| I guess you are offloading to system RAM? What tokens per second do you get? I've got an old gaming laptop with an RTX 3060; sounds like it could work well as a local inference server. |
| |
| ▲ | manmal 2 hours ago | parent [-] |
| In the article, they claim up to 25 t/s for the LARGEST model with a 24GB VRAM card. You need a lot of RAM, obviously. |
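(The offload question above can be sanity-checked with rough arithmetic: a ~4-bit quantization stores about half a byte per parameter, so weight size scales directly with total parameter count, even for MoE models where only a few billion parameters are active per token. A minimal sketch; the 35B size and the 8GB/24GB cards come from the thread, while the ~4.5 bits/param figure is an illustrative assumption for Q4-family quants:)

```python
def approx_weights_gb(params_billions: float, bits_per_param: float = 4.5) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes).

    Assumption: ~4.5 bits per parameter, a rough figure for 4-bit
    quantizations once scales/metadata overhead is included.
    """
    return params_billions * (bits_per_param / 8)

# 35B total parameters: the full weights must live somewhere (VRAM or
# system RAM), even though only ~3B are active per token in an A3B MoE.
gb = approx_weights_gb(35)
for vram in (8, 24):
    verdict = "fits" if gb <= vram else "needs system-RAM offload"
    print(f"35B @ ~4.5 bits/param ≈ {gb:.0f} GB vs {vram} GB VRAM: {verdict}")
```

On these assumptions a 35B model at ~4-bit is about 20 GB of weights, which is why it squeezes onto a 24GB card but has to spill to system RAM on an 8GB 3050, with the KV cache and activations eating further into the budget.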
|
|
| ▲ | ufish235 6 hours ago | parent | prev | next [-] |
| Can you give an example of some coding tasks? I had no idea local was that good. |
| |
| ▲ | hooch 4 hours ago | parent [-] |
| Changed into a directory recently, fired up the qwen code CLI, and gave it two prompts. First: "so what's this then?", to which it gave a good summary across the stack and product. Then: "think you can find something to do in the TODO?", and while I was busy in Claude Code on another project, it neatly finished three HTML & CSS tasks that I had been procrastinating on for weeks. This was the qwen3-coder-next 35B model on an M4 Max with 64GB; ollama reports the model as 51GB. Have not yet tried the variants from TFA. |
| ▲ | manmal 2 hours ago | parent [-] |
| 3.5 seems to be better at coding than 3-coder-next; I'd check it out. |
|
|
|
| ▲ | fragmede 6 hours ago | parent | prev [-] |
| Which models would that be? |