beefnugs 8 days ago

"At one point we tried “improving” the prompt with Claude’s help. It ballooned to 1,500 words. The agent immediately got slower and dumber. We went back to 103 words and it was back on track."

Isn't this the exact opposite of every other piece of advice we have gotten in a year?

Another piece of general feedback, from just recently: someone said we need to generate 10 times, because one out of those will be "worth reviewing".

How can anyone be doing real engineering in such a workflow: picking the exact needle out of a constantly churning chaos-simulation engine based on whichever output crashes least, lands closest to what you wanted, is human readable, or is just a random guess?
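Mechanically, that "generate 10 and pick one" advice is just best-of-N sampling with a hand-wavy scoring function, and the scoring function is exactly the fuzzy part. A rough sketch of the workflow being described (the Candidate fields and scoring weights are made up for illustration):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Candidate:
    patch: str        # the generated code/diff
    builds: bool      # did it compile?
    tests_pass: bool  # did the suite pass?

def score(c: Candidate) -> int:
    # Crude proxy for "crashes least, closest to desire":
    # compiling beats not compiling, passing tests beats failing.
    return (2 if c.builds else 0) + (1 if c.tests_pass else 0)

def best_of_n(generate: Callable[[], Candidate], n: int = 10) -> Candidate:
    """Run the generator n times and keep the highest-scoring output."""
    return max((generate() for _ in range(n)), key=score)
```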

joshka 8 days ago

One of the big things I think a lot of tooling misses, which Geoffrey touches on, is the automated feedback loops built into the tooling. I expect you could incorporate generation time and token cost to automatically self-tune this over time, perhaps discovering which prompts and models are best for which tasks automatically instead of choosing them manually.
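A minimal sketch of what that self-tuning could look like: an epsilon-greedy bandit over (prompt, model) pairs. The arm names and the reward formula are assumptions, not something any current tool ships:

```python
import random
from collections import defaultdict

# Hypothetical self-tuner: arms are (prompt_variant, model) pairs, and
# reward might be task success minus penalties for tokens and wall time.
class EpsilonGreedyTuner:
    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def pick(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)  # explore a random arm
        return max(self.arms, key=lambda a: self.means[a])  # exploit the best

    def record(self, arm, reward):
        self.counts[arm] += 1
        # incremental running mean
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

tuner = EpsilonGreedyTuner([("short_prompt", "haiku"), ("long_prompt", "sonnet")])
arm = tuner.pick()
# ... run the agent with that prompt/model, measure the outcome ...
tuner.record(arm, reward=1.0 - 0.0001 * 4200)  # success minus token cost
```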

You want to go meta-meta? Get ralph to spawn subagents that analyze how the process of feedback and experimentation with techniques works. Perhaps allocate 10% of the time and effort to identifying what's missing that would make the loops more effective (better context, better tooling, better feedback mechanisms, better prompts, ...). Have the tooling help produce actionable ideas for how humans in the loop can effectively help the tooling. Have the tooling produce information and guidelines for how to review the generated code.
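The 10% allocation could be as blunt as every tenth iteration running a retrospective prompt instead of a work prompt. A sketch, assuming a headless agent CLI invoked as `claude -p` (the prompts and file names are invented):

```python
import subprocess

WORK_PROMPT = "Pick the next unchecked item in PLAN.md and implement it."
RETRO_PROMPT = ("Review the recent iterations: what missing context, tooling, "
                "or feedback made them less effective? Append actionable "
                "suggestions, including what a human reviewer should check, "
                "to RETRO.md.")

def loop(iterations: int = 100, retro_every: int = 10) -> None:
    for i in range(1, iterations + 1):
        # Spend roughly 10% of iterations analyzing the process itself.
        prompt = RETRO_PROMPT if i % retro_every == 0 else WORK_PROMPT
        subprocess.run(["claude", "-p", prompt], check=False)
```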

I think one of the big things missing in many of the tools currently available is tracking metrics through the entire software development loop. How long does it take to implement a feature? How many mistakes were made? How many errors were caught by tests? How many tokens does it take? And then using this information to automatically self-tune.
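For the record-keeping half, even a flat JSONL log per loop iteration would be enough to tune against later. The field names here are just one guess at what's worth tracking:

```python
import json, time
from pathlib import Path

METRICS = Path("agent_metrics.jsonl")

def log_iteration(feature: str, started_at: float, tokens: int,
                  test_failures: int, mistakes_reverted: int) -> None:
    """Append one iteration's stats; a self-tuning step reads these later."""
    record = {
        "feature": feature,
        "seconds": round(time.time() - started_at, 1),
        "tokens": tokens,
        "test_failures": test_failures,
        "mistakes_reverted": mistakes_reverted,
    }
    with METRICS.open("a") as f:
        f.write(json.dumps(record) + "\n")
```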

xboxnolifes 7 days ago

It's not the exact opposite of what I've been reading. Basically every person I've read claiming to have success with LLM coding has said that too long a prompt leads to too much context, which leads to the LLM diverging from working on the problem as desired.

mistrial9 8 days ago

The core might be the difference between an LLM's context window and an agent's orders given as text. The LLM itself is a core engine, running in an environment of some kind (instruct-tuned vs. others?). Agents, on the other hand, are descendants of the old Marvin Minsky stuff in a way: they have objectives and capacities, at a glance. LLMs are connected to modern agents because input text is read to start the agent; the inner loops are intermediate outputs of the LLM, in language. There is no "internal code" to this kind of agent; it speaks in code and text to the next part of the internal process.

There are probably big oversights or errors in that short explanation. The LLM engine, the runner of the engine, and the specifics of a given environment overlap a lot, and all of it is quite complicated.
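In code, the shape I'm gesturing at is roughly this toy loop (every name in it is invented for illustration):

```python
# Toy shape of "the agent is a text loop around the LLM engine".
# `llm` stands in for any completion API and `run_tool` for shell/file
# access; the DONE: convention is invented for this example.
def agent(llm, run_tool, objective: str, max_steps: int = 20) -> str:
    transcript = f"Objective: {objective}\n"
    for _ in range(max_steps):
        reply = llm(transcript)        # an intermediate output, in language
        if reply.startswith("DONE:"):
            return reply
        # No hidden "internal code": the model's text *is* the instruction
        # passed to the next part of the process (here, a tool call).
        transcript += reply + "\n" + run_tool(reply) + "\n"
    return transcript
```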

hth

Rastonbury 8 days ago

For the work they are doing, porting and building off a spec, there is already good context in the existing code and the spec, compared with net-new features in a greenfield project.

dhorthy 8 days ago

Hmm what sorts of advice in the last year are you referring to? Like the “run it ten times and pick the best one” thing? Or something else?

I kind of agree that picking from 10 poorly-prompted projects is dumb.

The engineering is in setting up the engine and verification so one agent can get it right (or 90% right) on a single run (of the infinite-ish loop).
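In other words, the loop only accepts a run once deterministic checks pass. A minimal sketch (the gate commands are placeholders, not anything from the article):

```python
import subprocess

# The "verification" half: hard gates a run must pass before it counts.
# Real projects would substitute their own build/test/lint invocations.
GATES = [
    ["make", "build"],
    ["make", "test"],
    ["make", "lint"],
]

def run_is_right() -> bool:
    """A single agent run only 'got it right' if every gate passes."""
    return all(
        subprocess.run(cmd, capture_output=True).returncode == 0
        for cmd in GATES
    )
```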

jjani 8 days ago

> Hmm what sorts of advice in the last year are you referring to?

They're almost certainly referring to first creating a fleshed-out spec and then having it implement that, rather than just 100 words.