jjcm 4 hours ago

A lot of naysayers in the comments, but there are so many uses for non-frontier models. The proof of this is in the openrouter activity graph for llama 3.1: https://openrouter.ai/meta-llama/llama-3.1-8b-instruct/activ...

10B daily tokens, growing at an average of 22% per week.
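For scale: 22% weekly compounding doubles roughly every three and a half weeks. A quick back-of-envelope check in TypeScript, using the figures from the activity graph above:

    // Back-of-envelope: how fast does 22%/week compounding grow?
    const daily = 10e9;          // ~10B tokens/day today
    const weeklyGrowth = 1.22;   // +22% per week
    const doublingWeeks = Math.log(2) / Math.log(weeklyGrowth);
    console.log(doublingWeeks.toFixed(1));            // ~3.5 weeks to double
    console.log((daily * weeklyGrowth ** 12) / 1e9);  // ~109B tokens/day after 12 weeks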

There are plenty of times I look to Groq for narrow-domain responses - these smaller models are fantastic for that, and there's often no need for something heavier. Getting response latency down means you can use LLM-assisted processing in a standard webpage load, not just in async processes. I'm really impressed by this, especially if this is its first showing.
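A minimal sketch of what that looks like, assuming an OpenAI-compatible endpoint such as Groq's (the model name, 300ms budget, and intent-classification prompt are all illustrative, not prescriptive):

    // Sketch: fold a fast LLM call into a page-load path with a hard latency budget.
    async function classifyOnLoad(userQuery: string): Promise<string | null> {
      const controller = new AbortController();
      const timeout = setTimeout(() => controller.abort(), 300); // latency budget
      try {
        const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
          },
          body: JSON.stringify({
            model: "llama-3.1-8b-instant", // small model: latency over depth
            messages: [{ role: "user", content: `Classify intent: ${userQuery}` }],
            max_tokens: 8,
          }),
          signal: controller.signal,
        });
        const data = await res.json();
        return data.choices[0].message.content;
      } catch {
        return null; // degrade gracefully: render the page without the LLM hint
      } finally {
        clearTimeout(timeout);
      }
    }

The key design point is the abort: if the model can't answer inside the budget, the page renders without it, so the LLM call never blocks the load.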

spot5010 2 hours ago

These seem ideal for robotics applications, where there's a low-latency, narrow use case that these chips can serve, maybe even locally.

freakynit 4 hours ago

Exactly. One easily relatable use case is structured content extraction and/or conversion of web page data to markdown. I used to use Groq for the same (the gpt-oss-20b model), but even that felt slow when doing this task at scale.

LLMs have opened up a natural-language interface to machines. This chip makes it realtime, and that opens up a lot of use cases.
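A rough sketch of that extraction pattern against an OpenAI-compatible API (the endpoint, model id, and the {title, author, markdown} schema are assumptions for illustration, not what was actually run):

    // Sketch: structured extraction from raw page HTML with a small model.
    interface PageData {
      title: string;
      author: string | null;
      markdown: string; // main content converted to markdown
    }

    async function extractPage(html: string): Promise<PageData> {
      const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
        },
        body: JSON.stringify({
          model: "openai/gpt-oss-20b", // small model; speed matters at scale
          response_format: { type: "json_object" }, // ask for parseable output
          messages: [
            {
              role: "system",
              content:
                "Extract {title, author, markdown} from the HTML as JSON. " +
                "markdown is the main content converted to markdown.",
            },
            { role: "user", content: html },
          ],
        }),
      });
      const data = await res.json();
      return JSON.parse(data.choices[0].message.content) as PageData;
    }

At scale, each call is independent, so per-request latency is what dominates throughput; that's where faster inference hardware pays off directly.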

redman25 2 hours ago

Many older models are still better at "creative" tasks because newer models have been optimized for code and reasoning benchmarks. Pre-training is what gives a model its creativity, and layering SFT and RL on top tends to remove some of it in exchange for instruction following.