| ▲ | NitpickLawyer 9 hours ago | |
> Would instead of the RL step a constrained decoding say via something like xgrammar fix syntax generation issue ? It can, but you have to consider two things here: a) constrained decoding ensures adherence to syntax, not semantics. Say you're editing a field in an enum in rust. You can write syntactically correct rust code that doesn't address the new field further in the code (say in a switch). You'd get correctly syntactic code, but the compiler will scream at you. RL works on both. b) if your goal is to further train the model, so it works on many tasks, RL helps with exploring new paths and training the model further. Constrained grammars help with inference, but the model doesn't "learn" anything. With RL you can also have many reward functions at the same time. Say one that rewards good syntax, one that rewards "closing" all the functions so tree-sitter doesn't complain, and one that rewards 0 errors from the compiler. The model gets to train on all 3 at the same time. | ||