Remix clone Hacker News

	▲	rjh29 2 hours ago
		I wonder how many years it'll take for the API token cost to exceed the money spent on RAM.
	▲	zozbot234 9 minutes ago
		The DS4 folks are unofficially testing ways to run the model, with lower performance, on lower-RAM machines. Similar efforts are going on with llama.cpp. The results are a bit of a challenge: prefill time tends to explode, which is a limitation if you care about agentic workflows.