general_reveal 7 hours ago
It’s not necessarily doubling down on local. The reality is your LLM should be running inference on every tick … the same way your brain thinks every. Fucking. Nano. Second. So yes, the LLM should be running inference on your prompt, but it should also be running inference on 25,000 other things … in parallel. That’s the scale of the compute need. We just need compute everywhere, as fast as possible.
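For concreteness, here is a minimal sketch of that “inference on every tick, fanned out in parallel” idea in Python with asyncio. Everything here is hypothetical: `infer` stands in for a real local or remote model call, and the counts are scaled way down from the comment’s 25,000 so the demo actually runs.

```python
import asyncio

# Hypothetical stand-in for a real model call; in practice this would hit
# a local or remote inference endpoint.
async def infer(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stands in for real inference latency
    return f"response to: {prompt!r}"

async def tick(contexts: list[str]) -> list[str]:
    # One inference per context, fanned out concurrently within a single tick.
    return await asyncio.gather(*(infer(c) for c in contexts))

async def main() -> None:
    # 100 background contexts here; the comment imagines 25,000.
    contexts = [f"signal {i}" for i in range(100)]
    for t in range(3):  # a few ticks for the demo; the envisioned loop never stops
        results = await tick(contexts)
        print(f"tick {t}: {len(results)} inferences completed")

if __name__ == "__main__":
    asyncio.run(main())
```

The point of the sketch is the shape, not the numbers: one clock, and on every beat a full fan-out of inference calls over everything being tracked. That’s why the compute requirement scales with the number of parallel contexts, not just with the user’s single prompt.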