MallocVoidstar 6 hours ago

I'm going to assume this is a predominantly AI-written article, since for some reason it talks about GroqCloud serving Llama 2, which it doesn't.

It claims they serve Llama 2 7B @ 750 tokens/s with 2K context, but over on OpenRouter Groq is listed as serving Llama 3.1 8B @ 1300 tokens/s with 128K context. (And the official GroqCloud site says 840 tokens/s.)