MoreQARespect 2 hours ago

Humans have the ability to retrospect, push back on a faulty spec, push back on an unclarified spec, do experiments, make judgement calls and build tools and processes to account for their own foibles.

wizzwizz4 an hour ago | parent | next [-]

Humans also have the ability to introspect. Ultimately, (nearly) every software project is intended to provide a service to humans, and most humans are similar in most ways: "what would I want it to do?" is a surprisingly-reliable heuristic for dealing with ambiguity, especially if you know where you should and shouldn't expect it to be valid.

The best LLMs can manage is "what's statistically-plausible behaviour for descriptions of humans in the corpus", which is not the same thing at all. Sometimes, I imagine, that might be more useful; but for programming (where, assuming you're not reinventing wheels or scrimping on your research, you're often encountering situations that nobody has encountered before), an alien mind's extrapolation of statistically-plausible human behaviour observations is not useful. (I'm using "alien mind" metaphorically, since LLMs do not appear particularly mind-like to me.)

bluGill 31 minutes ago | parent [-]

Most companies I've worked for have had 'know the customer' events so that developers learn what the customers really do; in turn, even if we are not in their domain, we have a good idea what they care about.

pablobaz an hour ago | parent | prev [-]

Which bits of this do you think LLM-based agents can't do?

interstice an hour ago | parent | next [-]

Not get stuck on an incorrect train of thought; not ignore core instructions in favour of training data, like breaking naming conventions across sessions or long contexts; not confidently state "I completely understand the problem and this will definitely work this time" for the 5th time without actually checking. I could go on.

ModernMech an hour ago | parent | prev [-]

The main thing they cannot do is be held accountable for any decisions, which makes them not trustworthy.

vbezhenar an hour ago | parent [-]

This is not correct. They can say "sorry", which makes them as accountable as an ordinary developer.

interstice an hour ago | parent | next [-]

I've found recent versions of Claude and Codex to be reluctant in this regard. They will recognise the problem they created a few minutes ago, but often behave as if someone else did it. In many ways that's true, though, I suppose.

bluefirebrand 20 minutes ago | parent | prev [-]

That's not what accountability is