danpalmer 4 days ago
I thought it was generally accepted that inference was faster on TPUs. This was one of my takeaways from the LLM scaling book: https://jax-ml.github.io/scaling-book/ – TPUs just do less work, and data needs to move around less for the same amount of processing compared to GPUs. As far as I understand it, that would lead to lower latency. The citation link you provided takes me to a sales form, not an FAQ, so I can't see any further detail there.

> Both Cerebras and Grok have custom AI-processing hardware (not CPUs).

I'm aware of Cerebras' custom hardware. I agree with the other commenter here that I haven't heard of Grok having any. My point about knowledge grounding was simply that Grok may be achieving its latency with guardrail/knowledge/safety trade-offs instead of custom hardware.
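To make the "data moves around less" point concrete, here's a back-of-envelope roofline sketch of the kind the scaling book uses: for batch-1 decoding you mostly just stream the weights, so memory bandwidth rather than peak FLOPs tends to set per-token latency. The chip specs, model size, and the step_latency helper below are made-up placeholders, not real GPU/TPU numbers.

    # Rough roofline-style latency estimate (in the spirit of the scaling book).
    # All chip specs and model numbers below are made-up placeholders.

    def step_latency(flops, bytes_moved, peak_flops, mem_bw):
        # A decode step can't finish faster than the slower of its
        # compute time and its memory-transfer time.
        return max(flops / peak_flops, bytes_moved / mem_bw)

    # Hypothetical batch-1 decode for a 70B-parameter bf16 model:
    # ~2*params FLOPs and ~2*params bytes of weights read per token.
    params = 70e9
    flops = 2 * params
    bytes_moved = 2 * params

    # Placeholder accelerator specs (not real GPU/TPU numbers).
    chips = {
        "chip_a": dict(peak_flops=1.0e15, mem_bw=3.0e12),
        "chip_b": dict(peak_flops=0.9e15, mem_bw=3.3e12),
    }

    for name, spec in chips.items():
        t = step_latency(flops, bytes_moved, **spec)
        print(f"{name}: ~{t * 1e3:.1f} ms per decoded token")

With numbers like these, both chips come out memory-bound, so whichever one moves bytes faster wins on latency regardless of its peak compute.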
nl 4 days ago | parent
Sorry, I meant Groq's custom hardware, not Grok! I don't see any latency comparisons in the link.