simonw 7 days ago

There's something very interesting about being able to serve strong LLMs at much higher token speeds.

Mistral previously partnered with Cerebras on Le Chat: https://www.cerebras.ai/blog/mistral-le-chat

I'm quite surprised that neither OpenAI nor Anthropic appear to have done a similar deal. Their inference is slow in comparison - roughly 5-10x slower than what Cerebras can achieve.
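To make the gap concrete, here's a back-of-the-envelope sketch of what a 5-10x difference in decode throughput means for end-to-end response time. The token rates and response length below are hypothetical round numbers for illustration, not measured figures for any specific provider:

```python
# Hypothetical illustration of what a 5-10x token-throughput gap means
# for wall-clock latency. The rates below are assumptions, not benchmarks.

def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second

baseline_rate = 100.0   # assumed tokens/s for a typical GPU-served model
fast_rate = 1000.0      # assumed tokens/s for a wafer-scale deployment

response_tokens = 2000  # a long "thinking"-style response

slow = generation_time(response_tokens, baseline_rate)  # 20.0 s
fast = generation_time(response_tokens, fast_rate)      # 2.0 s

print(f"{slow:.0f}s vs {fast:.0f}s ({slow / fast:.0f}x speedup)")
```

At these assumed rates, a long reasoning response drops from ~20 seconds to ~2, which is the difference between "waiting" and "interactive".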

Google have their own TPUs, which seem to be giving them a performance edge. Google's AI mode is lightning fast compared to GPT-5 Thinking search, for result quality that looks to be in the same ballpark.

... that said, on reading the linked press release there's actually no mention of model performance at all:

> a long-term collaboration agreement to explore the use of AI models across ASML’s product portfolio as well as research, development and operations, to benefit ASML customers with faster time to market and higher performance holistic lithography systems.

tempusalaria 7 days ago | parent [-]

Cerebras has very limited scale. Mistral has very few users, so they can use Cerebras for inference, whereas OpenAI and Anthropic cannot. If Mistral grows a lot, they will stop using Cerebras.