UltraSane 2 hours ago

It is more like restricting the mechanic to commercially available tools and not allowing them to create CUSTOM tools.

fc417fc802 an hour ago | parent [-]

No, that would be analogous to disallowing customized harnesses, i.e. tooling specially crafted by someone else for the specific task at hand. Insisting that an LLM solve something without the ability to make use of any external tooling whatsoever is almost perfectly analogous to insisting that a human mechanic work on a car with nothing but his own bare hands.

The wrench is to the mechanic as the stock Python REPL is to the LLM.

UltraSane an hour ago | parent [-]

They want the LLM that does ARC-AGI-3 to be the same LLM that everyone uses.

fc417fc802 an hour ago | parent [-]

Rephrase that in terms of the human mechanic and hopefully you can see the error of that reasoning. LLMs that perform tasks (as opposed to merely holding conversations) use tools just like we do. That's literally how we design them to operate.

In fact, the LLMs that everyone uses today typically have access to specialized, task-specific tooling. Obviously specialized tools aren't appropriate for a test that measures the ability to generalize, but generic tools are par for the course. Writing a bot to play a game for you would certainly serve to demonstrate an understanding of the task.