Remix clone Hacker News

	▲	brikym 2 hours ago
		I wonder what latency and tok/s this model could reach on Groq or Cerebras. I have a couple of LLM-driven games [1][2] where speed is really important to the experience. Currently the best performance I can get is from the gpt-oss models on Groq or Cerebras, but they need quite a bit of extra context and tools to correct for mistakes. I'm betting I'll be able to get the same performance much cheaper in the next few months. [1] https://sleuththetruth.com [2] https://lextension.net/