Remix clone Hacker News

	▲	cyanydeez 6 hours ago
		It'd probably be helpful, both for power users and for transparency, to actually show how the cache is being used. If you run local models with llama.cpp's server, you can watch the cache slots fill up with every turn; when subagents spawn, you see another process id spin up and take a cache slot; and the model starts slowing down as the context grows (around 80-90k on an AMD 395+), because the cache loads get bigger with all that context. So yeah, it doesn't take much to surface to the user that the speed/value of their session is ephemeral: keeping all that cache active is computationally expensive because ... You're still just running text through an extremely complex process and adding to that text, and to avoid recalculating the entire chain, you need the cache.
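
To make the point about avoiding recalculation concrete, here is a rough toy sketch in Python. It only counts how many tokens would need fresh processing on each turn, with and without a prefix cache. The names `CacheSlot`, `common_prefix_len`, and `simulate`, and the turn sizes, are made up for illustration; this is not how llama.cpp's server implements its slots.

```python
# Toy model: count tokens that need fresh computation per turn,
# with and without a prefix cache. Illustrative only; all names
# here are hypothetical and not taken from any real server.

def common_prefix_len(a, b):
    """Length of the shared leading run of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class CacheSlot:
    """Hypothetical cache slot: remembers the tokens it last processed."""

    def __init__(self):
        self.tokens = []

    def process(self, prompt):
        """Return how many tokens of `prompt` are not covered by the cache."""
        reused = common_prefix_len(self.tokens, prompt)
        self.tokens = list(prompt)
        return len(prompt) - reused


def simulate(turn_sizes, use_cache):
    """Grow one conversation turn by turn and record the work per turn."""
    slot = CacheSlot()
    context = []
    work = []
    for turn, size in enumerate(turn_sizes):
        # Each turn appends `size` new, unique tokens to the context.
        context = context + [(turn, i) for i in range(size)]
        if not use_cache:
            slot = CacheSlot()  # cache dropped: everything is recomputed
        work.append(slot.process(context))
    return work


turns = [1000, 500, 500, 2000]
print(simulate(turns, use_cache=True))   # [1000, 500, 500, 2000]
print(simulate(turns, use_cache=False))  # [1000, 1500, 2000, 4000]
```

With the cache kept warm, each turn only pays for its new tokens; once the cache is gone, every turn pays for the whole context again. The sketch leaves out the fact that each new token still attends over everything cached, which is one reason generation slows as the context grows even when the cache is intact.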