| ▲ | mikewarot 18 hours ago | |
I updated Ollama (again) and changed my Windows swap file settings to use up to 200 GB of C: (an SSD). On the largest model (gemma4:31b), I seem to be getting about 5 tokens per second. This is amazing to me, because I'm using a $100 computer without any fancy GPU. I love watching it "think". Consider that this is thousands of times faster than any written conversation in the past. Those involved pieces of paper being transported, read, considered, replies written, then transported back. If it'll write code that doesn't completely suck, I think even this is good enough. What do you consider the lowest acceptable rate of generated tokens/second?
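For anyone who wants to measure their own rate rather than eyeball it: Ollama's `/api/generate` endpoint reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating) in its final response, so tokens/sec is just the ratio. A minimal stdlib-only sketch, assuming a local Ollama server on the default port and whatever model name you've pulled:

```python
# Sketch: compute tokens/sec from Ollama's eval counters.
import json
import urllib.request

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's eval_count / eval_duration (ns) into tokens/sec."""
    return eval_count / (eval_duration_ns / 1e9)

def measure(model: str, prompt: str,
            host: str = "http://localhost:11434") -> float:
    """POST a prompt to a local Ollama server and return the measured rate."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return tokens_per_second(data["eval_count"], data["eval_duration"])

# Pure-arithmetic check: 100 tokens in 20 s of eval time is 5 tok/s.
print(f"{tokens_per_second(100, 20_000_000_000):.1f} tok/s")  # → 5.0 tok/s
```

Note the distinction: `eval_duration` covers only generation, so this rate excludes prompt processing, which on a swap-bound machine can dominate the wall-clock wait.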
| ▲ | mudkipdev 18 hours ago | parent [-] | |
Under 15 is too slow for conversation, personally. I guess 5 tokens per second is nice if you're one of the people who likes letting coding agents run overnight.