| ▲ | segmondy 2 hours ago | |||||||||||||
I have been putting up with it forever. We are spoiled by MixtureOfExperts. Folks were delighted to run llama3-70B at such speed. We were happy with 15-20tk/sec with 8b models, and if you could run llama3-405B at 1tk/sec you were a god. To each their own. I can live with 6 high-quality tk/sec. If I could get a Fable-equivalent model, I'd gladly take 2tk/sec if that's what it took to run it locally. | ||||||||||||||
| ▲ | manmal 2 hours ago | parent | next [-] | |||||||||||||
But what is it doing for you that you couldn’t do yourself at that speed? I’m really curious and on the fence about partly going local. | ||||||||||||||
| ▲ | froh 2 hours ago | parent | prev [-] | |||||||||||||
do you use caveman or similar? | ||||||||||||||