wahnfrieden 4 hours ago

It worked for me several times.

It's easy to say that these increasingly popular tools are only able to produce useless junk. But you haven't tried, or you haven't "closed the loop" so that the agent can evaluate its own progress toward acceptance criteria, or you are following feeds from other users who use them incompetently.

nikkwong 3 hours ago | parent [-]

I'm definitely bullish on LLMs for coding. It sounds to me as though getting one to run on its own for hours and produce something usable requires more careful thought and setup than just throwing a prompt at it and hoping for the best, but I haven't seen many examples in the wild yet.

foobar10000 2 hours ago | parent | next [-]

It needs a closed loop.

Strategy -> [Plan -> [Execute -> FastVerify -> SlowVerify] -> Benchmark -> Learn lessons] -> back to Strategy for the next big step.

Claude teams and a Ralph Wiggum loop can do it, or really any reasonable agent. But usually it all falls apart on a brittle Verify or Benchmark step. What's important is to write the positive lessons into a store that survives git resets, machine blowups, etc. Any Telegram bot channel will do :)
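Sketched out, the inner loop is something like this (a rough Go sketch; the `agent` command, the lessons path, and the image tag are stand-ins for whatever you actually run):

    // loop.go - rough sketch of Execute -> FastVerify -> SlowVerify ->
    // Benchmark -> Learn lessons. Command names and paths are placeholders.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "time"
    )

    func run(name string, args ...string) error {
        cmd := exec.Command(name, args...)
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        return cmd.Run()
    }

    // recordLesson appends to a store outside the worktree so it survives
    // git resets and machine blowups (a Telegram channel works just as well).
    func recordLesson(lesson string) {
        f, err := os.OpenFile("/var/agent/lessons.log",
            os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
        if err != nil {
            return
        }
        defer f.Close()
        fmt.Fprintf(f, "%s  %s\n", time.Now().Format(time.RFC3339), lesson)
    }

    func main() {
        wd, _ := os.Getwd()
        for step := 1; ; step++ { // runs until you kill it
            // Execute: one agent step against the current plan (placeholder command).
            if err := run("agent", "step", "--plan", "PLAN.md"); err != nil {
                recordLesson(fmt.Sprintf("step %d: agent step failed: %v", step, err))
                continue
            }
            // FastVerify: cheap checks first.
            if err := run("go", "vet", "./..."); err != nil {
                recordLesson(fmt.Sprintf("step %d: fast verify failed", step))
                continue
            }
            // SlowVerify: full test suite inside a container.
            if err := run("docker", "run", "--rm", "-v", wd+":/src", "-w", "/src",
                "golang:1.22", "go", "test", "./..."); err != nil {
                recordLesson(fmt.Sprintf("step %d: slow verify failed", step))
                continue
            }
            // Benchmark: run the suite so numbers can be compared out of band.
            if err := run("go", "test", "-run=^$", "-bench=.", "./..."); err != nil {
                recordLesson(fmt.Sprintf("step %d: benchmark run failed", step))
                continue
            }
            recordLesson(fmt.Sprintf("step %d: accepted", step))
        }
    }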

The whole thing is usually a pain to set up: Docker for verification, Docker for benchmarking, etc. You need the ability to run the loop quickly, the ability for the loop itself to add things, and the ability to do this across multiple worktrees simultaneously for faster exploration, and god help you if you need hardware for it. For example, one such loop is used to tune and custom-fuse CUDA kernels, which means a model evaluator, a big box, etc.
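The worktree half is just git worktree plus one container per checkout; a rough sketch along the same lines (branch names, paths, and the image tag are made up):

    // explore.go - sketch of running the containerised verify step in several
    // git worktrees at once so the agent can explore branches in parallel.
    package main

    import (
        "fmt"
        "os/exec"
        "sync"
    )

    // verifyIn runs the test suite in a container bound to one worktree.
    func verifyIn(dir string) error {
        cmd := exec.Command("docker", "run", "--rm",
            "-v", dir+":/src", "-w", "/src", "golang:1.22", "go", "test", "./...")
        return cmd.Run()
    }

    func main() {
        branches := []string{"attempt-1", "attempt-2", "attempt-3"}
        var wg sync.WaitGroup
        for _, b := range branches {
            dir := "/tmp/worktrees/" + b
            // One worktree per candidate branch, all sharing the same repo.
            if err := exec.Command("git", "worktree", "add", "-b", b, dir).Run(); err != nil {
                fmt.Printf("%s: worktree add failed: %v\n", b, err)
                continue
            }
            wg.Add(1)
            go func(b, dir string) {
                defer wg.Done()
                if err := verifyIn(dir); err != nil {
                    fmt.Printf("%s: verify failed: %v\n", b, err)
                    return
                }
                fmt.Printf("%s: verify passed\n", b)
            }(b, dir)
        }
        wg.Wait()
    }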

wahnfrieden 2 hours ago | parent [-]

I do it easily just by asking Codex

rcarmo 2 hours ago | parent | prev [-]

Well, you can start with https://github.com/rcarmo/go-textile, https://github.com/rcarmo/go-rdp, https://github.com/rcarmo/go-ooxml, and https://github.com/rcarmo/go-busybox (still a WIP). All of these are essentially SPEC- and test-driven, and they are all working for me (save a couple of bugs in go-rdp I need to fix myself, and some gaps in the ECMA specs for go-ooxml that require me to provide actual manually created documents for further testing).

I am currently porting pyte to Go using a similar approach (feeding the LLM a core SPEC and two VT100/VT220 test suites). It's chugging along quite nicely.
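The loop there is basically a table-driven test over recorded fixtures, roughly like this (type names like NewScreen/Feed/Display and the testdata layout are placeholders; the fixtures would come from the VT100/VT220 suites):

    // terminal_test.go - sketch of a harness that lets the agent verify a
    // pyte-style port against recorded escape-sequence fixtures.
    package terminal

    import (
        "os"
        "path/filepath"
        "strings"
        "testing"
    )

    func TestVT100Fixtures(t *testing.T) {
        cases, _ := filepath.Glob("testdata/vt100/*.input")
        for _, input := range cases {
            name := strings.TrimSuffix(filepath.Base(input), ".input")
            t.Run(name, func(t *testing.T) {
                raw, err := os.ReadFile(input)
                if err != nil {
                    t.Fatal(err)
                }
                want, err := os.ReadFile(filepath.Join("testdata/vt100", name+".screen"))
                if err != nil {
                    t.Fatal(err)
                }
                // Feed the recorded byte stream to the emulator and compare
                // the rendered screen to the expected dump.
                scr := NewScreen(80, 24)
                scr.Feed(raw)
                if got := scr.Display(); got != string(want) {
                    t.Errorf("screen mismatch:\ngot:\n%s\nwant:\n%s", got, want)
                }
            })
        }
    }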