nl 4 days ago

Sorry, I meant Groq's custom hardware, not Grok!

I don't see any latency comparisons in the link.

danpalmer 4 days ago

The link is just to the book; the details are scattered throughout. That said, the page on GPUs specifically covers some of the hardware differences, why TPUs are more efficient for inference, and some of the differences that would lead to lower latency.

https://jax-ml.github.io/scaling-book/gpus/#gpus-vs-tpus-at-...

Re: Groq, that's a good point; I had forgotten about them. You're right, they too are building a TPU-style systolic array processor for lower latency.
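For anyone unfamiliar with the term, here is a toy software model of a generic output-stationary systolic array doing a matrix multiply. It is only a sketch of the textbook idea, not a model of how TPUs or Groq's chips are actually built; the function name `systolic_matmul` and the skewed-input scheme are illustrative assumptions. Each processing element (PE) keeps one output accumulator, multiplies the A value arriving from the left by the B value arriving from above, then passes A right and B down on the next cycle. Operands move only between neighbouring PEs, so no shared memory is touched inside the array.

```python
def systolic_matmul(A, B):
    """Toy cycle-by-cycle model of an output-stationary systolic array (illustrative only)."""
    n, k, m = len(A), len(A[0]), len(B[0])  # A is n x k, B is k x m
    acc = [[0] * m for _ in range(n)]    # one accumulator per PE (i, j)
    a_reg = [[0] * m for _ in range(n)]  # A value currently held by each PE
    b_reg = [[0] * m for _ in range(n)]  # B value currently held by each PE

    # PE (i, j) sees A[i][s] and B[s][j] at cycle t = s + i + j,
    # so the last useful cycle is (k - 1) + (n - 1) + (m - 1).
    for t in range(k + n + m - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Row i of A enters from the left edge, delayed (skewed) by i cycles.
                    s = t - i
                    new_a[i][j] = A[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # take A from the left neighbour
                if i == 0:
                    # Column j of B enters from the top edge, delayed by j cycles.
                    s = t - j
                    new_b[i][j] = B[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # take B from the neighbour above
        a_reg, b_reg = new_a, new_b
        # Every PE does one multiply-accumulate per cycle, all in parallel in hardware.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The appeal for latency is that the whole schedule is fixed at design time: every operand's arrival cycle is known in advance, so there are no cache misses or scheduling decisions to wait on.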