fsuts 2 hours ago
6 tokens per second? Can you put up with that? That seems very slow. I aim for 40 t/s on a laptop and choose models that deliver that speed over larger, slower ones.
segmondy 2 hours ago
I have been putting up with it forever. We are spoiled by mixture-of-experts models. Folks were delighted to run llama3-70B at such speed. We were happy with 15-20 tk/sec with 8B models, and if you could run llama3-405B at 1 tk/sec you were a god. To each their own. I can live with 6 high-quality tokens per second. If I could get a Fable-equivalent model, I'd gladly take 2 tk/sec if that's what it took to run it locally.