omneity 2 days ago

You might actually get that desired behavior through reasoning, or if the model was reinforced on coding workflows involving COM, or at least on enough stack diversity for it to encounter the need to develop this capability.

In the case of LLMs with reasoning, they might pull this off because reasoning is in effect a search for extra considerations that improve the model's performance on the task. A verifier scores this during reasoning training, and the LLM learns to emulate that search at inference time, hence the improved performance.
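To make the "reasoning as verifier-guided search" point concrete, here is a toy, runnable sketch (everything in it is illustrative, not any lab's actual training code): the "model" blindly proposes answers to 13 * 17, a verifier checks them, and more search means a higher hit rate, which is exactly the signal RL training then teaches the model to internalize.

    import random

    def propose():                      # stand-in for model sampling
        return random.randint(200, 240)

    def verifier(answer):               # stand-in for the outcome check
        return answer == 13 * 17

    def solve(n_candidates):
        # "Reasoning" here is just drawing more candidates and letting
        # the verifier pick out a correct one.
        return any(verifier(propose()) for _ in range(n_candidates))

    # More search -> higher success rate, the quantity the verifier
    # measures during reasoning training.
    for n in (1, 10, 100):
        wins = sum(solve(n) for _ in range(1000))
        print(n, wins / 1000)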

As for RL coding training, the line can be slightly blurry since reasoning is also trained with RL, but coding models specifically also discover additional considerations, or even recipes, through self-play against a code execution environment. If that environment includes COM and the training data has COM-related tasks, then the process has a chance to discover the behavior you described and reinforce it during training, increasing its likelihood during actual coding.
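And a similarly hedged toy of the execution-environment side (the function name "solution" and the test format are hypothetical, and real pipelines sandbox execution rather than calling exec directly): a candidate program is run against tests, and the pass rate becomes the reward that RL uses to reinforce whatever recipe produced it.

    def execution_reward(candidate_src, tests):
        env = {}
        try:
            exec(candidate_src, env)    # the "code execution environment"
        except Exception:
            return 0.0                  # code that doesn't run earns nothing
        passed = 0
        for args, expected in tests:
            try:
                if env["solution"](*args) == expected:
                    passed += 1
            except Exception:
                pass
        return passed / len(tests)      # pass rate is the RL reward

    tests = [((2, 3), 5), ((0, 0), 0)]
    print(execution_reward("def solution(a, b): return a + b", tests))  # 1.0
    print(execution_reward("def solution(a, b): return a - b", tests))  # 0.5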

LLMs are not really just autocomplete engines. Perhaps the first few layers, or base models, can be seen as such, but as you introduce instruction and reinforcement tuning, LLMs build progressively higher levels of conceptual abstraction, from words to sentences to tasks, much like CNNs learn basic geometric features and then compose them into face parts and so on.