jjcm 4 hours ago

A lot of naysayers in the comments, but there are so many uses for non-frontier models. The proof of this is in the openrouter activity graph for llama 3.1: https://openrouter.ai/meta-llama/llama-3.1-8b-instruct/activ...

10B daily tokens, growing at an average of 22% per week.
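For scale: 22% weekly compounding doubles roughly every three and a half weeks. A quick back-of-envelope check in TypeScript, using the figures from the activity graph above:

    // Back-of-envelope: how fast does 22%/week compounding grow?
    const daily = 10e9;          // ~10B tokens/day today
    const weeklyGrowth = 1.22;   // +22% per week
    const doublingWeeks = Math.log(2) / Math.log(weeklyGrowth);
    console.log(doublingWeeks.toFixed(1));            // ~3.5 weeks to double
    console.log((daily * weeklyGrowth ** 12) / 1e9);  // ~109B tokens/day after 12 weeks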

There are plenty of times I look to Groq for narrow-domain responses - these smaller models are fantastic for that, and there's often no need for something heavier. Getting response latency down means you can use LLM-assisted processing in a standard webpage load, not just in async processes. I'm really impressed by this, especially if this is its first showing.
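A minimal sketch of what that looks like, assuming an OpenAI-compatible endpoint such as Groq's (the model name, 300ms budget, and intent-classification prompt are all illustrative, not prescriptive):

    // Sketch: fold a fast LLM call into a page-load path with a hard latency budget.
    async function classifyOnLoad(userQuery: string): Promise<string | null> {
      const controller = new AbortController();
      const timeout = setTimeout(() => controller.abort(), 300); // latency budget
      try {
        const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
          },
          body: JSON.stringify({
            model: "llama-3.1-8b-instant", // small model: latency over depth
            messages: [{ role: "user", content: `Classify intent: ${userQuery}` }],
            max_tokens: 8,
          }),
          signal: controller.signal,
        });
        const data = await res.json();
        return data.choices[0].message.content;
      } catch {
        return null; // degrade gracefully: render the page without the LLM hint
      } finally {
        clearTimeout(timeout);
      }
    }

The key design point is the abort: if the model can't answer inside the budget, the page renders without it, so the LLM call never blocks the load.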

spot5010 2 hours ago

These seem ideal for robotics applications, where there's a low-latency, narrow use case that these chips can serve, maybe even locally.

freakynit 4 hours ago

Exactly. One easily relatable use case is structured content extraction and/or conversion of web page data to markdown. I used to use Groq for the same (the gpt-oss-20b model), but even that felt slow when doing this task at scale.

LLMs have opened up a natural-language interface to machines. This chip makes it realtime, and that opens up a lot of use cases.
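A rough sketch of that extraction pattern against an OpenAI-compatible API (the endpoint, model id, and the {title, author, markdown} schema are assumptions for illustration, not what was actually run):

    // Sketch: structured extraction from raw page HTML with a small model.
    interface PageData {
      title: string;
      author: string | null;
      markdown: string; // main content converted to markdown
    }

    async function extractPage(html: string): Promise<PageData> {
      const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
        },
        body: JSON.stringify({
          model: "openai/gpt-oss-20b", // small model; speed matters at scale
          response_format: { type: "json_object" }, // ask for parseable output
          messages: [
            {
              role: "system",
              content:
                "Extract {title, author, markdown} from the HTML as JSON. " +
                "markdown is the main content converted to markdown.",
            },
            { role: "user", content: html },
          ],
        }),
      });
      const data = await res.json();
      return JSON.parse(data.choices[0].message.content) as PageData;
    }

At scale, each call is independent, so per-request latency is what dominates throughput; that's where faster inference hardware pays off directly.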

redman25 2 hours ago

Many older models are still better at "creative" tasks because newer models have been optimized for code and reasoning benchmarks. Pre-training is what gives a model its creativity, and layering SFT and RL on top tends to remove some of it in exchange for instruction following.