Interestingly, that gives a different response distribution than simply regenerating until the output matches the schema.
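A toy sketch of why the two approaches differ. Everything here is made up for illustration: the two-token "model" (next_token_probs), the "schema" (the VALID set), and the function names. The code computes both distributions exactly rather than sampling. Regenerating until valid is the same as conditioning the model's joint distribution on validity. Masking tokens step by step instead renormalizes locally at each step.

```python
# Hypothetical toy "language model": next-token probabilities given the prefix so far.
def next_token_probs(prefix: str) -> dict[str, float]:
    if prefix == "":
        return {"a": 0.9, "b": 0.1}
    if prefix == "a":
        return {"a": 0.1, "b": 0.9}
    return {"a": 0.5, "b": 0.5}  # after "b"

LENGTH = 2
VALID = {"aa", "ba", "bb"}  # hypothetical schema: every 2-token output except "ab"

def joint_prob(seq: str) -> float:
    # Chain rule: multiply the probability of each token given its prefix.
    p = 1.0
    for i, tok in enumerate(seq):
        p *= next_token_probs(seq[:i])[tok]
    return p

def rejection_distribution() -> dict[str, float]:
    # "Regenerate until valid" == the joint distribution conditioned on validity.
    joint = {s: joint_prob(s) for s in VALID}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

def can_complete(prefix: str) -> bool:
    return any(s.startswith(prefix) for s in VALID)

def masked_distribution() -> dict[str, float]:
    # Constrained decoding: at each step, drop tokens that cannot lead to a valid
    # output and renormalize the remaining probabilities.
    dist: dict[str, float] = {}

    def walk(prefix: str, p: float) -> None:
        if len(prefix) == LENGTH:
            dist[prefix] = p
            return
        allowed = {t: q for t, q in next_token_probs(prefix).items() if can_complete(prefix + t)}
        z = sum(allowed.values())
        for t, q in allowed.items():
            walk(prefix + t, p * q / z)

    walk("", 1.0)
    return dist

if __name__ == "__main__":
    rej, masked = rejection_distribution(), masked_distribution()
    for s in sorted(VALID):
        print(s, round(rej[s], 3), round(masked[s], 3))
```

With these invented numbers, rejection sampling gives "aa" about 47% of the probability mass. Per-step masking gives it 90%, because once the model commits to "a" the masked step forces "aa" no matter how unlikely the model found it.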

Rudybega 5 days ago

This is true, but there are methods that greatly reduce this effect and produce results that match, or even improve, overall output accuracy:

e.g. DOMINO https://arxiv.org/html/2403.06988v1

joshred 5 days ago

It sounds like they are describing a regex filter applied to the model's beam search. An LLM produces one token at a time, but with beam search the decoder tracks several candidate phrases at once and ranks them by their combined probability. That lets it self-correct when a high-probability word leads to a low-probability phrase.
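Here is a minimal illustration of that self-correction, using a hypothetical toy model (the MODEL table and its probabilities are invented). This shows beam search in general and makes no claim about how any particular LLM or API decodes. Greedy decoding commits to the single most probable token at each step. Beam search keeps the top width partial sequences, scored by summed log-probability.

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the tokens generated so far.
MODEL = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("a",): {"cat": 0.9, "dog": 0.1},
}

def greedy(steps: int) -> tuple[tuple[str, ...], float]:
    # Always take the most probable next token; never revisit earlier choices.
    seq: tuple[str, ...] = ()
    logp = 0.0
    for _ in range(steps):
        tok, p = max(MODEL[seq].items(), key=lambda kv: kv[1])
        seq += (tok,)
        logp += math.log(p)
    return seq, logp

def beam_search(steps: int, width: int) -> list[tuple[tuple[str, ...], float]]:
    # Keep the `width` best partial sequences by total log-probability.
    beams: list[tuple[tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(steps):
        candidates = [
            (seq + (tok,), logp + math.log(p))
            for seq, logp in beams
            for tok, p in MODEL[seq].items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    return beams

if __name__ == "__main__":
    print(greedy(2)[0], beam_search(2, 2)[0][0])
```

Greedy decoding grabs "the" (0.6) and ends at "the x" with probability 0.24. A beam of width 2 keeps "a" alive and finds "a cat" at 0.36.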

I think they are saying that if the highest-probability phrase fails the regex, the LLM is able to substitute the next most likely candidate.

stavros 5 days ago

You're actually applying a grammar at the token level. If you're outputting JSON, for example, the grammar tells you which characters are valid next, so you just filter out the tokens that don't fit it.
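A stripped-down sketch of the filtering step. The "grammar" here is just "the output must be one of the JSON literals true, false, or null," checked by string prefix. ALLOWED, is_valid_prefix, mask_logits, and the token logits are all hypothetical. Tokens whose addition can't lead to a valid output get a logit of -inf, so after softmax they have zero probability and the remaining tokens are renormalized.

```python
import math

# Hypothetical stand-in for a grammar: the whole output must be one of these JSON literals.
ALLOWED = ("true", "false", "null")

def is_valid_prefix(text: str) -> bool:
    return any(lit.startswith(text) for lit in ALLOWED)

def mask_logits(prefix: str, logits: dict[str, float]) -> dict[str, float]:
    # Keep a token's score only if prefix + token can still grow into a valid output.
    return {
        tok: (score if is_valid_prefix(prefix + tok) else -math.inf)
        for tok, score in logits.items()
    }

def softmax(logits: dict[str, float]) -> dict[str, float]:
    # Assumes at least one token survived masking; exp(-inf) is 0.0.
    m = max(logits.values())
    exps = {t: math.exp(s - m) for t, s in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

if __name__ == "__main__":
    # The model would prefer "Sure" or "{", but neither can start true/false/null.
    step1 = {"Sure": 3.0, "{": 2.0, "tr": 1.0, "fa": 0.5, "nu": 0.2}
    print(softmax(mask_logits("", step1)))
    # After "tr", only "ue" keeps the output on track.
    print(softmax(mask_logits("tr", {"ue": 1.0, "ee": 2.0, "ue,": 0.5})))
```

Libraries that do this in practice generally use more efficient machinery than a prefix scan over every token, but the effect on the distribution is the same: invalid tokens get zero probability at each step.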