I like these examples that predictably show the weaknesses of current models.

This reminds me of that example where someone asked an agent to improve a codebase in a loop overnight and woke up to 100,000 lines of garbage [0]. Similarly, you see people post side-by-side comparisons of their own implementation and what an AI produced, which can quite effectively show how AI makes poor architecture decisions.

This is why I think “plan modes” and spec-driven development are so effective for agents: they help avoid one of their main weaknesses.

[0] https://gricha.dev/blog/the-highest-quality-codebase


pugworthy 2 days ago

To me, this doesn't show the weakness of current models; it shows how variable prompts are and how much they influence responses. Without seeing the prompt, it's hard to tell what influenced the outcome.

I had a long discussion today with a co-worker about the merits of detailed queries backed by lots of guidance .md documents versus fairly open-ended questions: spelling out in great detail what you want, versus describing the outcomes you want in general terms and working from there.

His approach was to write lots of agent files spelling out everything from code formatting style to well-defined personas. Meanwhile, I ask vague questions like, "I'm thinking of splitting off parts of this code base into a separate service, what do you think in general? Are there parts that might benefit from this?"


sothatsit 2 days ago

It is definitely a weakness of current models. The fact that people find ways around those weaknesses does not mean the weaknesses do not exist.

Your approach is also very similar to spec-driven development. Your spec is just a conversation instead of a planning document. Both approaches get ideas from your brain into the context window.


OccamsMirror 2 days ago

So which approach worked better?

pugworthy a day ago

It's challenging to answer, because we're at different levels of programming. I'm a senior/architect type with many years of programming experience, and he's an ME using code to help him with data processing and analysis. I have a hunch that if you guessed which approach each of us took based on background, you'd think I was the one using the detailed prompts and he was the one asking vague questions.