| ▲ | _davide_ 6 hours ago | |||||||
i used to mix remote and local minimax 2.7(q3) on my strix halo, it run at 30 tg and 220 tokens pp... it was a bit painful slow, but it was a good feeling i could stay offline. unfortunately m3 which is at opus .8 levels is 460b parameters and doesn't even fit in 128gb of memory, let alone a big context. strix halo feels like a toy for ai purposes. https://kyuz0.github.io/amd-strix-halo-toolboxes/ | ||||||||
| ▲ | sosodev 5 hours ago | parent [-] | |||||||
My strix halo board is feeling more useful and less toylike with the recent performance gains combined from MTP, better quantization, and generalized performance improvements across the stack. For example, I can run Unsloth's Gemma4-31B 4-bit QAT model with around 30tg and 200pp. I don't find that to be too slow at all. Particularly because it's nearly full accuracy and good enough for a lot of different stuff I throw at it. I think it also helps that I'm using my machine to do home server stuff. It excels at all of the traditional workloads. Then I can lean on the AI to help with automation here and there. I find it deeply satisfying. | ||||||||
| ||||||||