| ▲ | jjcm 4 hours ago |
A lot of naysayers in the comments, but there are so many uses for non-frontier models. The proof is in the OpenRouter activity graph for Llama 3.1: https://openrouter.ai/meta-llama/llama-3.1-8b-instruct/activ... 10B daily tokens, growing at an average of 22% every week. There are plenty of times I look to Groq for narrow-domain responses; these smaller models are fantastic for that, and there's often no need for something heavier. Getting response latency down means you can use LLM-assisted processing in a standard webpage load, not just in async processes. I'm really impressed by this, especially if this is its first showing.
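As a concrete example, the kind of in-request call I mean (a minimal sketch; the model id, endpoint, and prompt here are illustrative, not a specific recommendation):

    # Sketch: a small, fast model called inside a request handler,
    # via an OpenAI-compatible endpoint (Groq's shown here as one option).
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )

    def one_line_summary(text: str) -> str:
        # Small model + tight max_tokens keeps latency low enough
        # to run during a normal page load rather than async.
        resp = client.chat.completions.create(
            model="llama-3.1-8b-instant",  # assumed model id
            messages=[{"role": "user",
                       "content": f"Summarize in one sentence: {text}"}],
            max_tokens=60,
            temperature=0.2,
        )
        return resp.choices[0].message.content

With a frontier model that call is a multi-second background job; with a small model on fast inference hardware it can sit on the request path.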
| ▲ | spot5010 2 hours ago | parent | next [-] |
These seem ideal for robotics applications, where there's a narrow, low-latency use case these chips can serve, possibly even running locally.
| ▲ | freakynit 4 hours ago | parent | prev | next [-] |
Exactly. One easily relatable use case is structured content extraction and/or conversion of web page data to markdown. I used to use Groq for this (the gpt-oss-20b model), but even that felt slow when doing the task at scale. LLMs have opened up a natural-language interface to machines. This chip makes it realtime, and that opens up a lot of use cases.
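Roughly what that call looks like (a sketch; the model id and prompt are assumptions, not my exact setup):

    # Sketch: raw HTML in, clean markdown out, via Groq's
    # OpenAI-compatible endpoint. Model id is an assumption.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )

    def html_to_markdown(html: str) -> str:
        resp = client.chat.completions.create(
            model="openai/gpt-oss-20b",
            messages=[
                {"role": "system",
                 "content": "Extract the main content and return it "
                            "as clean markdown. No commentary."},
                {"role": "user", "content": html},
            ],
            temperature=0,  # deterministic-ish output for extraction
        )
        return resp.choices[0].message.content

Each call like this takes on the order of seconds for a large page; cut that by an order of magnitude and it stops being a batch job.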
| ▲ | redman25 2 hours ago | parent | prev [-] |
Many older models are still better at "creative" tasks, because newer models are tuned for code and reasoning benchmarks. Pre-training is what gives a model its creativity; layering SFT and RL on top tends to remove some of it in exchange for instruction following.