written-beyond 7 hours ago:

That reduces the quality of the response though.
debugnik 6 hours ago:

As opposed to emitting non-JSON tokens and having to throw away the answer?
| ||||||||||||||
Der_Einzige 5 hours ago:

THIS IS LIES: https://blog.dottxt.ai/say-what-you-mean.html

I will die on this hill, and I have a bunch of other arXiv links from better peer-reviewed sources than yours to back my claim up (i.e. NeurIPS-caliber papers, with more citations than the ones claiming it harms the outputs).

Any actual impact of structured/constrained generation on the outputs is a SAMPLER problem, and you can fix what little impact may exist with things like https://arxiv.org/abs/2410.01103

Decoding is intentionally nerfed/kept to top_k/top_p by model providers because of a conspiracy against high-temperature sampling: https://gist.github.com/Hellisotherpeople/71ba712f9f899adcb0...
| ||||||||||||||