Remix clone Hacker News

	▲	aesthesia 3 hours ago
		Thinking shouldn't be too hard to deal with: just let the model generate freely until it hits a </think> token, then do constrained decoding, right?
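
A minimal sketch of the two-phase idea in the comment above, assuming a greedy decoder and a toy stand-in for the model. Nothing here is llama.cpp's API: `two_phase_decode`, `toy_logits`, `yes_no_only`, the vocabulary, and the `<eos>` token are all hypothetical names made up for illustration. The only point is the control flow: sample without any mask until `</think>` appears, then mask the logits with whatever constraint applies to the answer.

```python
import math
from typing import Callable, List, Sequence, Set

THINK_END = "</think>"
EOS = "<eos>"  # hypothetical end-of-sequence token for this toy setup


def two_phase_decode(
    next_logits: Callable[[List[str]], Sequence[float]],
    vocab: Sequence[str],
    allowed_after_think: Callable[[List[str]], Set[str]],
    max_tokens: int = 32,
) -> List[str]:
    """Greedy decode: unconstrained until THINK_END, constrained afterwards."""
    out: List[str] = []
    answer: List[str] = []  # tokens emitted after THINK_END
    constrained = False
    for _ in range(max_tokens):
        logits = list(next_logits(out))
        if constrained:
            # Mask every token the constraint does not allow at this step.
            allowed = allowed_after_think(answer)
            logits = [
                score if tok in allowed else -math.inf
                for tok, score in zip(vocab, logits)
            ]
        best = max(range(len(vocab)), key=lambda i: logits[i])
        if logits[best] == -math.inf:
            break  # the constraint allows nothing; stop rather than guess
        tok = vocab[best]
        out.append(tok)
        if constrained:
            if tok == EOS:
                break
            answer.append(tok)
        elif tok == THINK_END:
            constrained = True  # switch phases once thinking is closed
    return out


# Toy stand-in for a model: it "wants" to say `maybe` as its answer.
VOCAB = ["hmm", "maybe", "yes", "no", THINK_END, EOS]
SCRIPT = ["hmm", "maybe", THINK_END, "maybe", EOS]


def toy_logits(prefix: List[str]) -> List[float]:
    preferred = SCRIPT[min(len(prefix), len(SCRIPT) - 1)]
    return [5.0 if t == preferred else (1.0 if t == "yes" else 0.0) for t in VOCAB]


def yes_no_only(answer: List[str]) -> Set[str]:
    """Hypothetical constraint: exactly one of yes/no, then end of sequence."""
    return {"yes", "no"} if not answer else {EOS}


result = two_phase_decode(toy_logits, VOCAB, yes_no_only)
print(result)  # ['hmm', 'maybe', '</think>', 'yes', '<eos>']
```

In the thinking phase the toy model emits `maybe` freely; after `</think>` the same preference is masked out and the decoder falls back to the best allowed token. A real implementation would replace `yes_no_only` with a grammar or schema matcher and would need to handle a model that never emits `</think>`.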
	▲	stymaar 13 minutes ago
		Sure, but does llama-cpp support that?