zengid 3 hours ago

any tips for running it locally within an agent harness? maybe using pi or opencode?

stratos123 2 hours ago | parent [-]

It pretty much just works: run the unsloth quant in llama.cpp and hook it up to pi. There are a few minor annoyances, like no support for thinking effort. It also defaults to "interleaved thinking" (thinking blocks get stripped from context); set `"chat_template_kwargs": {"preserve_thinking": true}` if you interrupt the model often and don't want it to forget what it was thinking.
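For anyone wiring this up by hand: a minimal sketch of what the request to llama.cpp's OpenAI-compatible server looks like with that kwarg set. The port, model name, and prompt are placeholder assumptions; `chat_template_kwargs` is forwarded by llama-server to the chat template.

```python
# Sketch: chat request to a local llama-server with preserve_thinking set,
# so prior thinking blocks stay in context instead of being stripped.
# Port 8080 and the model/prompt values here are assumptions for illustration.
import json
import urllib.request

payload = {
    "model": "local",  # llama-server typically ignores the model name
    "messages": [
        {"role": "user", "content": "Fix the failing test in utils.py"},
    ],
    # Passed through to the chat template; keeps thinking blocks in context:
    "chat_template_kwargs": {"preserve_thinking": True},
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # assumed default port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = json.load(urllib.request.urlopen(req))  # needs a running server
```

The same `chat_template_kwargs` object can usually go straight into a harness's provider config instead of per-request, if it lets you pass extra request-body fields.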