LarsDu88 3 hours ago

I disagree with this. Training with reinforcement learning with verifiable rewards (RLVR) is actually the secret sauce that is letting Claude and GPT automate software engineering tasks.

All the easily verifiable domains, such as mathematics, coding, and anything that can be run inside a reasonable simulation, are falling very, very fast.

By next year, if not sooner, LLMs will wildly outpace mathematicians at reasoning.

Alex_L_Wood an hour ago

Coding is anything but “easily” verifiable.

LarsDu88 24 minutes ago

It's extremely verifiable. The reinforcement fine-tuning strategy I'm referring to involves an LLM creating coding tasks with an expected output and implementing the code, and then a compiler (or an interpreter, in the case of languages like Python) either succeeding or failing to run that code. The program's output is then compared to the expected output. The whole verification process (run the interpreter, then run the test) takes seconds. One can generate millions of training examples like this essentially for free, and there is extensive research showing that, with the right policy, an agent can learn to reason: first as well as a human, and in many cases better than one.
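To make the "run interpreter + run test" step concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not any lab's actual pipeline: the function name `verify`, the 5-second timeout, and the binary 1.0/0.0 reward are all hypothetical choices.

```python
import subprocess
import sys


def verify(candidate_source: str, expected_stdout: str, timeout_s: float = 5.0) -> float:
    """Hypothetical reward function: 1.0 if the candidate program runs
    cleanly and prints the expected output, otherwise 0.0."""
    try:
        # Run the candidate in a fresh interpreter process.
        result = subprocess.run(
            [sys.executable, "-c", candidate_source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Code that hangs earns no reward.
        return 0.0

    # A non-zero exit code covers syntax errors and uncaught exceptions.
    if result.returncode != 0:
        return 0.0

    # Compare output to the expected output, ignoring surrounding whitespace.
    return 1.0 if result.stdout.strip() == expected_stdout.strip() else 0.0


if __name__ == "__main__":
    print(verify("print(2 + 3)", "5"))  # correct program
    print(verify("print(2 + 2)", "5"))  # runs, but wrong output
    print(verify("def f(:", ""))        # fails to parse
```

Anything that executes model-written code for real would also need a sandbox; this sketch runs the candidate with the caller's own permissions.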
