| ▲ | foobar10000 2 hours ago | |
It needs a closed loop. Strategy -> [ Plan -> [Execute -> FastVerify -> SlowVerify] -> Benchmark -> Learn lessons] -> back to strategy for next big step. Claude teams and a Ralph wiggum loop can do it - or really any reasonable agent. But usually it all falls apart on either brittle Verify or Benchmark steps. What is important is to learn positive lessons into a store that survives git resets, machine blowups, etc… Any telegram bot channel will do :) The entire setup is usually a pain to set up - docker for verification, docker for benchmark, etc… Ability to run the thing quickly, ability for the loop itself to add things , ability to do this in worktree simultaneously for faster exploration - and got help you if you need hardware to do this - for example, such a loop is used to tune and custom-fuse CUDA kernels - which means a model evaluator, big box, etc…. | ||
| ▲ | wahnfrieden 2 hours ago | parent [-] | |
I do it easily just by asking Codex | ||