spindump8930 | 17 hours ago
It's good to have this support in APIs, but grammar-constrained decoding has been around for quite a while, even before the contemporary LLM era (e.g. [1] is similar in spirit). Local vs. global planning is a huge issue here, though: if you enforce constraints locally at decoding time, the LLM can be forced into suboptimal token decisions. This can result in a "global" (i.e. over all tokens) miss, where the probability of the constrained output is far lower than the probability of the optimal response (which may also conform to the grammar). Algorithms like beam search can alleviate this, but it's still difficult. This is one of the reasons that XML tags work better than JSON outputs: fewer constraints on "weird" tokens.
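
To make the local vs. global point concrete, here is a toy sketch (not any real API). The two-step "model", its probabilities, and the two-string grammar are all made up for illustration. Greedy constrained decoding takes the locally best valid token and ends up on a low-probability valid string, while a wider beam recovers the best valid one.

```python
# Hypothetical toy model: next-token probabilities for each prefix.
MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.9, "y": 0.1},
    ("B",): {"z": 0.9, "w": 0.1},
}
# Hypothetical grammar: the only valid complete outputs.
GRAMMAR = {("A", "y"), ("B", "z")}


def allowed(prefix):
    """A prefix is allowed if some valid output starts with it."""
    n = len(prefix)
    return any(s[:n] == prefix for s in GRAMMAR)


def seq_prob(seq):
    """Unconstrained model probability of a (partial) sequence."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= MODEL[seq[:i]][tok]
    return p


def greedy_constrained(length=2):
    """Mask invalid tokens at each step, then take the local argmax."""
    seq = ()
    for _ in range(length):
        cands = {t: p for t, p in MODEL[seq].items() if allowed(seq + (t,))}
        seq += (max(cands, key=cands.get),)
    return seq


def beam_constrained(width, length=2):
    """Same masking, but keep the `width` best prefixes at each step."""
    beams = [()]
    for _ in range(length):
        expanded = [b + (t,) for b in beams for t in MODEL[b] if allowed(b + (t,))]
        beams = sorted(expanded, key=seq_prob, reverse=True)[:width]
    return beams[0]


greedy = greedy_constrained()
beam = beam_constrained(width=2)
print(greedy, round(seq_prob(greedy), 2))  # ('A', 'y') 0.06
print(beam, round(seq_prob(beam), 2))      # ('B', 'z') 0.36
```

Both outputs satisfy the grammar, but the greedy one commits to "A" before it can see that the only valid continuation is unlikely. With a real vocabulary and a real grammar the search space is far too large for a small beam to reliably fix this, which is the "still difficult" part.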