The most successful applications, like coding, are not the result of pure LLM/generative modeling. They come from closing the loop with an agentic harness. The generate, test, and selectively refine loop is the core modality of scientific work. An LLM, plus RL with Verifiable Rewards, plus feedback from compiler and terminal runs, mimics this process to a great extent.

This is the Fisher/Box feedback loop (https://www-sop.inria.fr/members/Ian.Jermyn/philosophy/writi...) implemented on a modern computational system, and the LLM is just one component of it. I wish Sutton had commented on this fuller picture of what we have now instead of only on the LLM/backprop side of things. I am honestly curious whether such a loop can at least partially automate discovery.
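A toy sketch of that generate, test, and selectively refine loop, to make the division of labor concrete. Everything here is a hypothetical illustration, not how any particular harness works: the "spec" is a made-up set of input/output pairs, `propose` is a stand-in for the LLM that just emits small edits of the current candidate, and `verify` stands in for the compiler/test run by compiling the candidate source and returning a graded reward. It is greedy selection, not RL training; the point is only that the verifier, not the generator, decides what survives.

```python
from typing import Callable

# Hypothetical spec, given only as input/output pairs (the "tests"); the target is f(x) = 2*x + 1.
TESTS = [(0, 1), (1, 3), (2, 5), (5, 11)]


def to_source(a: int, b: int) -> str:
    """Render a candidate as source text, the way a model would emit code."""
    return f"lambda x: {a} * x + {b}"


def verify(source: str) -> float:
    """Stand-in for the compiler/test harness: higher is better, 0.0 means every test passes."""
    try:
        f: Callable[[int], int] = eval(compile(source, "<candidate>", "eval"))
        return -float(sum(abs(f(x) - want) for x, want in TESTS))
    except Exception:
        return float("-inf")  # code that fails to compile or crashes gets the worst reward


def propose(a: int, b: int) -> list[tuple[int, int]]:
    """Hypothetical stand-in for the LLM: emit small edits of the current candidate."""
    return [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]


def generate_test_refine(max_rounds: int = 100) -> tuple[str, bool]:
    a, b = 0, 0  # initial guess: f(x) = 0
    best = verify(to_source(a, b))
    for _ in range(max_rounds):
        if best == 0.0:
            break  # all tests pass
        # Generate candidates, test each one, and keep only the best-scoring edit.
        scored = [(verify(to_source(*c)), c) for c in propose(a, b)]
        score, (na, nb) = max(scored)
        if score <= best:
            break  # no candidate improves on the current one: the loop is stuck
        a, b, best = na, nb, score
    return to_source(a, b), best == 0.0
```

Calling `generate_test_refine()` walks from `0 * x + 0` through `1 * x + 0` and `2 * x + 0` to `2 * x + 1` and returns `('lambda x: 2 * x + 1', True)`. Accepting only strict improvements works for this easy spec but can stall on a harder one, where you would need broader sampling or richer feedback than a single score.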

There are more elements to discovery, though. It is still not clear where the initial working model or hypothesis comes from, or how the updates are selected (unless it is just parameter induction). I recently read about Hanson's Patterns of Discovery, which aims in that direction. I have not read it yet, but I am curious whether it offers any mechanistic clues.


flir 2 hours ago

Completely agree on the importance of the harness.

The problem I see is the same one Evolutionary Algorithms had: you can generate candidate solutions until you run out of cash, but you still need to evaluate them. You need a fitness function, and that means you need to know at least the general shape of the solution. If anyone knows of any work towards more open-ended fitness functions, I'd love to read it.
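To make the fitness-function point concrete, here is a minimal evolutionary loop on the toy OneMax problem (maximize the number of 1 bits). OneMax, the population size, the mutation rate, and the function names are illustrative choices, not anything from this thread. Note that `onemax` can only be written because we already know the answer is "all ones"; the generator side (`mutate`) is trivial, and selection is the only step where any information about the goal enters.

```python
import random
from typing import Callable

Genome = list[int]


def onemax(genome: Genome) -> int:
    """Example fitness function: count of 1 bits. Writing it requires knowing what a good answer looks like."""
    return sum(genome)


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(
    fitness: Callable[[Genome], int],
    length: int = 20,
    pop_size: int = 20,
    generations: int = 200,
    seed: int = 0,
) -> tuple[Genome, list[int]]:
    """(mu + lambda)-style EA: parents and mutated children compete, the fittest pop_size survive."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Generation step: blind variation, no knowledge of the goal.
        children = [mutate(rng.choice(pop), 1.0 / length, rng) for _ in range(pop_size)]
        # Selection step: the only place the fitness function enters.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        history.append(fitness(pop[0]))
    return pop[0], history
```

Because parents compete with their children, the best fitness in `history` never decreases. Swap `onemax` for a function that cannot tell good genomes from bad ones and the same loop just burns evaluations, which is the open-endedness problem in miniature.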

piker 2 hours ago

It seems to a layperson like me that in math they're using Lean and in programming contexts they're using compilers, so the models themselves tend to embed that determinism "intuitively".