9cb14c1ec0 7 days ago

> what they cannot do is maintain clear mental models

The more I use Claude Code, the more frustrated I get with this aspect. I'm not sure that a generic text-based LLM can properly solve this.

dlivingston 7 days ago | parent | next [-]

Reminds me of how Google's Genie 3 can only run for about a minute before losing its internal state [0].

My gut feeling is that this problem won't be solved until some new architecture is invented, on the scale of the transformer, which allows for short-term context, long-term context, and self-modulation of model weights (to mimic "learning"). (Disclaimer: hobbyist with no formal training in machine learning.)

[0]: https://news.ycombinator.com/item?id=44798166

skydhash 7 days ago | parent [-]

It’s the nature of formal systems. Someone needs to actually do the work of defining those rules, or have a smaller set of rules that can generate the larger set. But any time you invent a rule, a few things that are possible can’t be represented in the system. You’re mostly hoping that those things aren’t meaningful.

LLM techniques allow us to extract rules from text and other data. But that data is not representative of a coherent system. The result itself is incoherent and lacks anything that wasn’t part of the data. And that’s normal.

It’s the same as having a mathematical function. Every point it maps to is meaningful; everything else may as well not exist.

elephanlemon 7 days ago | parent | prev | next [-]

I’ve been thinking about this recently… maybe a more workable solution at the moment is to run a hierarchy of agents, with the top level one maintaining the general mental model (and not filling its context with anything much more than “next agent down said this task was complete”). Definitely seems like anytime you try to have one Code agent run everything it just goes off the rails sooner or later, ignoring important details from your original instructions, failing to make sure it’s adhering to CLAUDE.md, etc. I think you can do this now with Code’s agent feature? Anyone have strategies to share?
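The hierarchy idea can be sketched in a few lines. Everything here is hypothetical (`Orchestrator`, `WorkerAgent`, `run_task` are made-up names, not any real Claude Code API): the point is just that the top-level agent keeps only one-line summaries in its context while the workers' bulky transcripts are discarded.

```python
# Hypothetical sketch of a hierarchy of agents: the orchestrator never
# sees the workers' full transcripts, only one-line summaries, so its
# own "mental model" context stays small. Workers are stubbed out; a
# real implementation would call an LLM inside run_task.
from dataclasses import dataclass, field

@dataclass
class WorkerAgent:
    name: str

    def run_task(self, task: str) -> tuple[str, str]:
        # A real worker would accumulate a long transcript; we fake one.
        # Returns (full_transcript, one_line_summary).
        transcript = f"[{self.name}] detailed steps for: {task}\n" * 50
        summary = f"{self.name}: task '{task}' complete"
        return transcript, summary

@dataclass
class Orchestrator:
    plan: list[str]
    context: list[str] = field(default_factory=list)  # stays tiny

    def execute(self) -> list[str]:
        for i, task in enumerate(self.plan):
            worker = WorkerAgent(name=f"worker-{i}")
            _transcript, summary = worker.run_task(task)
            # Only the summary enters the orchestrator's context; the
            # bulky transcript is dropped (or logged somewhere else).
            self.context.append(summary)
        return self.context

boss = Orchestrator(plan=["write parser", "add tests", "update docs"])
print(boss.execute())
```

The trade-off skydhash raises below is real, though: the orchestrator is only as good as the summaries it receives, so lossy summaries are exactly the telephone-game failure mode.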

skydhash 7 days ago | parent [-]

The telephone game doesn’t work that well. That’s how an emperor can be isolated in his palace while every edict becomes harmful. It’s why the architect/developer split didn’t work. You need to be aware of all the context to make sure you’ve done a good job.

cmrdporcupine 7 days ago | parent | prev | next [-]

Honestly it forces you -- rightfully -- to step back and be the one doing the planning.

You can let it do the grunt coding, and a lot of the low level analysis and testing, but you absolutely need to be the one in charge on the design.

It frankly gives me more time to think about the bigger picture within the amount of time I have to work on a task, and I like that side of things.

There's definitely room for a massive amount of improvement in how the tool presents changes and suggestions to the user. It needs to be far more interactive.

mock-possum 7 days ago | parent | next [-]

That’s my experience as well. I’m the one with the mental model; my responsibility is using text to communicate that model to the LLM, in language it will recognize from its training data, so it generates code to follow suit.

My experience with prompting LLMs for codegen is really not much different from my experience with querying search engines - you have to understand how to ‘speak the language’ of the corpus being searched, in order to find the results you’re looking for.

micromacrofoot 7 days ago | parent | prev [-]

Yes this is exactly it, you need to talk to Claude about code on a design/architecture level... just telling it what you want the code to output will get you stuck in failure loops.

I keep saying it and no one really listens: AI really is advanced autocomplete. It's not reasoning or thinking. You will use the tool better if you understand what it can't do. It can write individual functions pretty well, stringing a bunch of them together? not so much.

It's a good tool when you use it within its limitations.

edaemon 7 days ago | parent | prev | next [-]

Same here. I have used this tool which helps a bit: https://github.com/rizethereum/claude-code-requirements-buil...

That and other tricks have only made me slightly less frustrated, though.

SoftTalker 7 days ago | parent | prev [-]

Is this really that different from the "average" programmer, especially a more junior one?

> LLMs get endlessly confused: they assume the code they wrote actually works; when tests fail, they are left guessing as to whether to fix the code or the tests; and when it gets frustrating, they just delete the whole lot and start over.

I see this constantly with mediocre developers. Flailing, trying different things, copy-pasting from StackOverflow without understanding, ultimately deciding the compiler must have a bug, or cosmic rays are flipping bits.

layer8 7 days ago | parent | next [-]

The article explicitly calls out that that’s what they are looking for in a competent software engineer. That incompetent developers exist, and that junior developers tend not to be very competent yet, doesn’t change anything about that. The problem with LLMs is that they’re already the final product of training/learning, not the starting point. The (in)ability of an LLM to form stable mental models is fixed by its architecture, and isn’t something you can teach it.

SoftTalker 7 days ago | parent [-]

I just (re) read the article and the word "competent" doesn't appear in it. It doesn't discuss human developer competency at all, except in comparison to LLMs.

layer8 7 days ago | parent [-]

Yes, I replaced “effective” with “competent” in my response, because I found that word slightly preferable in this context.

Xss3 7 days ago | parent | prev | next [-]

I feel like something is wrong where you are: maybe your juniors don't feel incentivized or encouraged to learn, code reviews might not be strict enough, quality may not be valued enough, or immense pressure to move tickets is put on people. Or all of the above, in various doses.

I feel this way because at my company our interns, on a gap year from their comp sci degree, don't blame the compiler or cosmic-ray bit flips, or blindly copy from Stack Overflow.

They're incentivized and encouraged to learn and absolutely choose to do so. The same goes for seniors.

If you say 'I've been learning about X for ticket Y' in the standup, people basically applaud it; managers like us training ourselves to be better.

Sure managers may want to see a brief summary or a write-up applicable to our department if you aren't putting code down for a few days, but that's the only friction.

hahn-kev 6 days ago | parent | prev [-]

I find it impressive that LLMs can so closely mimic the behaviour of a junior dev. Even if that's not a desirable outcome it's still impressive and interesting.