rybosome 10 hours ago
I have heard this argument before, but never actually seen concrete evals. The argument goes that because we are intentionally constraining the model - I believe OAI's method is a softmax (I think, rusty on my ML math) to get tokens sorted by probability, then taking the first that aligns with the current state machine - we get less creativity. Maybe, but a one-off vibes example is hardly proof. I still use structured output regularly.

Oh, and tool calling is almost certainly implemented atop structured output. After all, it's forcing the model to respond with JSON matching a schema that represents the tool arguments. I struggle to believe that this is adequate for tool calling but inadequate for general-purpose use.
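A minimal sketch of the mechanism described above (not OAI's actual implementation): rank candidate tokens by probability and emit the first one the grammar's state machine accepts. The `probs` dict and `is_valid` check are hypothetical stand-ins.

```python
def pick_token(probs, is_valid):
    """Greedy structured decoding, as described in the comment:
    walk the candidate tokens from most to least probable and
    return the first one the current state-machine state accepts.

    probs    -- dict mapping token -> probability (toy example)
    is_valid -- hypothetical predicate for the grammar's current state
    """
    for token in sorted(probs, key=probs.get, reverse=True):
        if is_valid(token):
            return token
    raise ValueError("no candidate token satisfies the grammar")


# Toy usage: the model prefers '"' but the grammar only allows '{' or 'x',
# so the highest-probability *valid* token ('x') is chosen.
chosen = pick_token({'"': 0.5, 'x': 0.3, '{': 0.2},
                    lambda t: t in {'{', 'x'})
```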
crystal_revenge 10 hours ago
> but never actually seen concrete evals.

The team behind the Outlines library has produced several sets of evals and repeatedly shown the opposite: that constrained decoding improves model performance (including examples of "CoT", which the post claims isn't possible) [0,1]. There was a paper that claimed constrained decoding hurt performance, but it had some fundamental errors, which they also wrote about [2].

People get weirdly superstitious when it comes to constrained decoding, as though it's somehow "limiting the model" when it's as simple as applying a conditional probability distribution to the logits. I also suspect this post is largely meant to justify the fact that BAML parses the results (since the post is written by them).

0. https://blog.dottxt.ai/performance-gsm8k.html
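A toy illustration of the "conditional probability distribution" point: masking the disallowed logits and renormalizing is literally conditioning the model's distribution on the allowed set. The token ids and scores here are made up; `allowed_ids` stands in for whatever the grammar permits at this step.

```python
import math

def constrained_distribution(logits, allowed_ids):
    """Constrained decoding as conditioning: drop every token the
    grammar disallows, then softmax over the survivors. The result
    is P(token | token in allowed_ids) -- a conditional distribution,
    not a different model.

    logits      -- dict mapping token id -> raw score (toy values)
    allowed_ids -- set of token ids the grammar currently permits
    """
    masked = {t: s for t, s in logits.items() if t in allowed_ids}
    z = sum(math.exp(s) for s in masked.values())
    probs = {t: math.exp(s) / z for t, s in masked.items()}
    # Greedy pick: the most likely token among the allowed ones.
    return max(probs, key=probs.get), probs


# Toy usage: token 1 has the highest raw score, but it's disallowed,
# so the conditional argmax falls to token 0.
best, probs = constrained_distribution({0: 2.0, 1: 3.0, 2: 0.5}, {0, 2})
```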