ryukoposting a day ago

A footnote in the GPT-5 announcement was that you can now give OpenAI's API a context-free grammar that the LLM must follow. One way of thinking about this feature is that it's a user-defined world model. You could tell the model "the sky is" => "blue" for example.

Obviously you can't actually use this feature as a true world model. There's just too much stuff you have to codify, and basing such a system on tokens is inherently limiting.

The basic principle sounds like what we're looking for, though: a strict automaton or rule set that steers the model's output reliably and provably. Perhaps a similar kind of thing that operates on neurons, rather than tokens? Hmm.

spindump8930 17 hours ago | parent | next [-]

It's good to have this support in APIs, but grammar-constrained decoding has been around for quite a while, even before the contemporary LLM era (e.g. [1] is similar in spirit). Local vs global planning is a huge issue here though - if you enforce local constraints during decoding time, an LLM might be forced to make suboptimal token decisions. This could result in a "global" (i.e. all tokens) miss, where the probability of the constrained output is far lower than the probability of the optimal response (which may also conform to the grammar). Algorithms like beam search can alleviate this, but it's still difficult. This is one of the reasons that XML tags work better than JSON outputs - fewer constraints on "weird" tokens.

[1] https://aclanthology.org/P17-2012/
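The local-vs-global miss can be shown with a toy two-token model (all numbers invented):

```python
# Toy two-step model: step-1 probabilities and step-2 conditionals.
P1 = {"A": 0.6, "B": 0.4}
P2 = {"A": {"x": 0.9, "y": 0.1},
      "B": {"x": 0.8, "y": 0.2}}

# Grammar: only "A y" and "B x" are valid two-token outputs.
VALID = {("A", "y"), ("B", "x")}

def greedy_constrained():
    # Greedy picks A first (0.6 > 0.4), then is forced onto y,
    # the only grammar-valid continuation after A.
    t1 = max(P1, key=P1.get)
    allowed = {t2 for (a, t2) in VALID if a == t1}
    t2 = max(allowed, key=lambda t: P2[t1][t])
    return (t1, t2), P1[t1] * P2[t1][t2]

def best_valid():
    # Exhaustive search over valid sequences (what beam search approximates).
    return max(((seq, P1[seq[0]] * P2[seq[0]][seq[1]]) for seq in VALID),
               key=lambda pair: pair[1])

g_seq, g_p = greedy_constrained()   # ('A', 'y'), probability 0.06
b_seq, b_p = best_valid()           # ('B', 'x'), probability 0.32
```

The greedy constrained decode commits to the locally best token A and ends up with a sequence more than five times less likely than the best grammar-conforming output - exactly the "global miss" described above.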

ijk 16 hours ago | parent | prev | next [-]

Oh, OpenAI finally added it? Structured generation has been available in things like llama.cpp and Instructor for a while, so I was wondering if they were going to get around to adding it.

In the examples I've seen, it's not something you can define an entire world model in, but you can sure constrain the immediate action space so the model does something sensible.
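Constraining the action space amounts to masking the model's logits to a legal set before sampling. A toy sketch (the action names and logits are invented, not from any particular framework):

```python
import math

# Legal actions for the current game state - the "immediate action space".
ACTIONS = {"move_north", "move_south", "take_item", "look"}

def pick_action(logits):
    """Mask logits to the legal action set, softmax, return the argmax."""
    legal = {tok: z for tok, z in logits.items() if tok in ACTIONS}
    m = max(legal.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(z - m) for tok, z in legal.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return max(probs, key=probs.get)

# The model may strongly prefer an off-task output, but it never escapes
# the legal set.
logits = {"move_north": 1.2, "explain_plot": 3.5, "take_item": 0.7,
          "look": -0.1, "apologize_profusely": 2.9}
print(pick_action(logits))  # move_north
```

No world model involved - the constraint just guarantees the output is one of the things the surrounding program can act on.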

nxobject a day ago | parent | prev | next [-]

> There's just too much stuff you have to codify, and basing such a system on tokens is inherently limiting.

As a complete amateur who works in embedded: I imagine the restriction to a linear, ordered input stream is fundamentally limiting as well, even with the use of attention layers.

gavmor a day ago | parent | prev [-]

I suspect something more akin to a LoRA and/or circuit tracing will help us keep track of the truth.