	MallocVoidstar 6 hours ago
		I'm going to assume this is a predominantly AI-written article, since for some reason it talks about GroqCloud serving Llama 2, which Groq doesn't do. It claims they serve Llama 2 7B at 750 tokens/s with 2K context, but over on OpenRouter, Groq is listed as serving Llama 3.1 8B at 1300 tokens/s with 128K context. (And the official GroqCloud site says 840 tokens/s.)