Oras 4 hours ago
as not custom chips like Groq and Cerebras. Did you expect a single GPU chip to reach 3k tps?
embedding-shape 4 hours ago
I think many would assume "not enterprise" or "not datacenter grade" when someone says "Standard GPUs", but maybe that specific phrase has a specific meaning I'm not familiar with.

Edit: I just tried a 4B model on an RTX Pro 6000 and got ~500 tok/s with llama.cpp on default settings, without trying to optimize or change anything. I'm sure vLLM would be a lot faster already, even before manually tuning configs. I wouldn't call that card a "Standard GPU" either FWIW, but it makes the claimed performance numbers feel less exciting, especially given the hardware they were using.
bcjdjsndon an hour ago
> Did you expect a single GPU chip to reach 3k tps?

Did the article headline not say Standard GPU?
WithinReason 3 hours ago
So what would be the above-standard GPUs that they are excluding, then? Cerebras is not a GPU.