joelthelion 8 hours ago

8GB is really low.

That said, perhaps there is a niche for slow LLM inference for non-interactive use.

For example, if you use LLMs to triage your emails in the background, you don't care about latency. You just need the throughput to be high enough to handle the load.
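A quick back-of-the-envelope check makes the point concrete. All the numbers below (inbox volume, tokens per email, tokens per second) are illustrative assumptions, not benchmarks, but they show that even painfully slow inference can clear a daily inbox as long as the total compute fits in 24 hours:

```python
# Assumed workload for a background email-triage job.
emails_per_day = 200        # assumed inbox volume
tokens_per_email = 1000     # assumed prompt + classification output
tokens_per_second = 5.0     # assumed slow, memory-constrained inference rate

# Total compute needed per day, in hours.
seconds_needed = emails_per_day * tokens_per_email / tokens_per_second
hours_needed = seconds_needed / 3600
print(f"{hours_needed:.1f} h of compute per day")  # prints "11.1 h of compute per day"

# Throughput suffices whenever hours_needed < 24; the per-email latency
# (several minutes each) never matters because nobody is waiting on it.
assert hours_needed < 24
```

Under these assumptions a model running at 5 tokens/second is fine, even though it would be unusable for interactive chat.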