musebox35 4 hours ago
The most successful applications, like coding, are not the result of pure LLM/generative modeling. They come from closing the loop with an agentic harness. The generate, test, selectively refine loop is the core modality of scientific work. An LLM + RL with Verifiable Rewards + feedback from compiler/terminal runs mimics this process to a great extent. This is the Fisher/Box feedback loop (https://www-sop.inria.fr/members/Ian.Jermyn/philosophy/writi...) implemented on a modern computational system. The LLM is just a component. I wish Sutton had commented on this fuller picture of what we have now instead of commenting just on the LLM/backprop side of things. I am honestly curious whether such a loop can at least partially automate discovery. There are more elements to discovery, though. It is still not clear where the initial working model/hypothesis comes from or how the updates are selected (unless it is just parameter induction). I recently read about Hanson's Patterns of Discovery, which aims in that direction. I have not read it yet, but I am curious whether it has any mechanistic clues.
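For anyone who wants the loop spelled out, here is a minimal sketch of the generate, test, refine cycle described above. Everything in it is an illustrative assumption rather than any real harness: `run_tests` stands in for the compiler/terminal feedback, `closed_loop` for the agentic harness, and `make_toy_proposer` for the LLM (it simply walks through a fixed list of drafts, whereas a real model would condition on the failure messages).

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int], int]
Case = Tuple[int, int]  # (input, expected output)


def run_tests(candidate: Candidate, cases: List[Case]) -> List[str]:
    """Verifier: return failure messages; an empty list means every case passed."""
    failures = []
    for x, expected in cases:
        try:
            got = candidate(x)
        except Exception as exc:  # a crash is also feedback
            failures.append(f"f({x}) raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"f({x}) = {got}, expected {expected}")
    return failures


def closed_loop(
    propose: Callable[[List[str]], Candidate],
    cases: List[Case],
    max_rounds: int = 10,
) -> Tuple[Optional[Candidate], int]:
    """Generate a candidate, test it, feed the failures back, repeat."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)
        feedback = run_tests(candidate, cases)
        if not feedback:
            return candidate, round_no
    return None, max_rounds


def make_toy_proposer(drafts: List[Candidate]) -> Callable[[List[str]], Candidate]:
    """Hypothetical stand-in for the model: hands out the next draft each round."""
    remaining = iter(drafts)

    def propose(feedback: List[str]) -> Candidate:
        # A real model would read `feedback`; this stub ignores its content.
        return next(remaining)

    return propose


# Target behaviour: squaring. The first two drafts are wrong on purpose.
cases = [(2, 4), (3, 9), (-4, 16)]
drafts = [lambda x: x + x, lambda x: x * x * x, lambda x: x * x]
solution, rounds = closed_loop(make_toy_proposer(drafts), cases)
print(rounds)
```

Note that the verifier is the only part that knows what "correct" means. The proposer can be arbitrarily dumb or arbitrarily clever and the loop still only accepts what passes the tests.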
flir 2 hours ago
Completely agree on the importance of the harness. The problem I see is the same problem Evolutionary Algorithms had: you can generate potential solutions until you run out of cash, but you still need to evaluate those solutions. You need a fitness function, and that means you need to at least know the general shape of the solution. If anyone knows of any work towards more open-ended fitness functions, I'd love to read it.
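To make the fitness-function point concrete, here is a toy evolutionary algorithm, offered as a sketch under stated assumptions and not as anyone's actual method. The names `onemax` and `evolve`, the bit-string encoding, and all parameter values are made up for illustration. The search itself is generic; the only place the "shape of the solution" enters is the fitness function passed in.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def onemax(bits: Genome) -> int:
    """Fitness: number of ones. This already encodes what a good answer looks like."""
    return sum(bits)


def evolve(
    fitness: Callable[[Genome], int],
    length: int = 20,
    pop_size: int = 30,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Genome, List[int]]:
    """Truncation selection with one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # kept unchanged, so the best never gets worse
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = mum[:cut] + dad[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve(onemax)
print(history[0], history[-1])
```

Swap `onemax` for a function you cannot write down and the whole thing stalls, which is exactly the open-ended fitness problem.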