general_reveal 7 hours ago
It’s not necessarily doubling down on local. The reality is your LLM should be running inference on every tick … the same way your brain thinks every. Fucking. Nano. Second. So yes, the LLM should be running inference on your prompt, but it should also be running inference on 25,000 other things … in parallel. That’s the scale of the compute need. We just need compute everywhere, as fast as possible.
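For concreteness, here is a minimal sketch of that “inference on every tick, fanned out in parallel” idea in Python with asyncio. Everything here is hypothetical: `infer` stands in for a real local or remote model call, and the counts are scaled way down from the comment’s 25,000 so the demo actually runs.

```python
import asyncio

# Hypothetical stand-in for a real model call; in practice this would hit
# a local or remote inference endpoint.
async def infer(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stands in for real inference latency
    return f"response to: {prompt!r}"

async def tick(contexts: list[str]) -> list[str]:
    # One inference per context, fanned out concurrently within a single tick.
    return await asyncio.gather(*(infer(c) for c in contexts))

async def main() -> None:
    # 100 background contexts here; the comment imagines 25,000.
    contexts = [f"signal {i}" for i in range(100)]
    for t in range(3):  # a few ticks for the demo; the envisioned loop never stops
        results = await tick(contexts)
        print(f"tick {t}: {len(results)} inferences completed")

if __name__ == "__main__":
    asyncio.run(main())
```

The point of the sketch is the shape, not the numbers: one clock, and on every beat a full fan-out of inference calls over everything being tracked. That’s why the compute requirement scales with the number of parallel contexts, not just with the user’s single prompt.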