Remix.run Logo
erichocean 2 days ago

The tip of the sphere in agentic code harnesses today is to RL train them as dedicated conductor/orchestrator models.

Not 200 lines of Python.

aszen 2 days ago | parent [-]

Can you elaborate on this?

8note 2 days ago | parent | next [-]

as a comparison, the gemini cli agent with gemini 2 half the time writes its own tool call parameters incorrectly. it didnt quite know when to make a tool call, which tool result was the most recent(it always assumed the first one was the one to use, rather than the last one, when multiple reads of the same file were in context) etc.

gemini 3 has pretty clearly been trained for this workflow of text output, since it can actually get the right calls in the first shot most of the time, and pays attention to the end of the context and not just the start.

gemini 3 is sitting within a format of text that it has been trained to be in, where for gemini 2, it only had the prompt to tell it how to work within the tool

erichocean 2 days ago | parent | prev [-]

Here you go: https://research.nvidia.com/labs/lpr/ToolOrchestra/

Big models (like Claude Opus 4.5) can (and do) just RL-train this into the main model.