zengid 3 hours ago

any tips for running it locally within an agent harness? maybe using pi or opencode?

stratos123 2 hours ago | parent [-]

It pretty much just works: run the unsloth quant in llama.cpp and hook it up to pi. There are a few minor annoyances, like no support for thinking effort. It also defaults to "interleaved thinking" (thinking blocks get stripped from context); set `"chat_template_kwargs": {"preserve_thinking": true}` if you interrupt the model often and don't want it to forget what it was thinking.
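For anyone wiring this up by hand: a minimal sketch of what the request to llama.cpp's OpenAI-compatible server looks like with that kwarg set. The port, model name, and prompt are placeholder assumptions; `chat_template_kwargs` is forwarded by llama-server to the chat template.

```python
# Sketch: chat request to a local llama-server with preserve_thinking set,
# so prior thinking blocks stay in context instead of being stripped.
# Port 8080 and the model/prompt values here are assumptions for illustration.
import json
import urllib.request

payload = {
    "model": "local",  # llama-server typically ignores the model name
    "messages": [
        {"role": "user", "content": "Fix the failing test in utils.py"},
    ],
    # Passed through to the chat template; keeps thinking blocks in context:
    "chat_template_kwargs": {"preserve_thinking": True},
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # assumed default port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = json.load(urllib.request.urlopen(req))  # needs a running server
```

The same `chat_template_kwargs` object can usually go straight into a harness's provider config instead of per-request, if it lets you pass extra request-body fields.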