bad_username 3 hours ago

> There is no world where you input a document lacking clarity and detail and get a coding agent to reliably fill in that missing clarity and detail

That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in. Furthermore, LLMs are the ultimate detail fillers, because they are language interpolation/extrapolation machines. And their popularity is precisely because they are usually very good at filling in details: LLMs use their vast knowledge to guess what detail to generate, so the result usually makes sense.

This doesn't detract much from the main point of the article though. Sometimes the interpolated detail is wrong (and nondeterministic), so, if a reliable result is to be achieved, the important details have to be constrained, and for that they have to be specified. And whereas we have decades of tools and culture for coding, we largely don't have that for extremely detailed specs (except maybe at NASA or similar places). We could figure it out in the future, but we haven't yet.

Someone 3 hours ago | parent | next [-]

> That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.

LLMs can generate (relatively small amounts of) working code from relatively terse descriptions, but I don’t think they can do so _reliably_.

They’re more reliable the shorter the code fragment and the more common the code, but they do break down for complex descriptions. For example, try tweaking the description of a widely-known algorithm just a little bit and see how well the generated code follows the spec.
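To make that concrete, here is one hypothetical tweak of the kind I mean: binary search, but the spec asks for the index of the _last_ occurrence of the target rather than any occurrence. A correct implementation differs from the textbook version in small but behavior-changing ways:

```python
def last_occurrence(xs, target):
    """Binary search a sorted list for the LAST index of target, or -1.

    The tweak from the textbook version: on an exact match we keep
    searching to the right instead of returning immediately.
    """
    lo, hi = 0, len(xs) - 1
    ans = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] <= target:
            if xs[mid] == target:
                ans = mid      # remember the match...
            lo = mid + 1       # ...but keep looking further right
        else:
            hi = mid - 1
    return ans
```

A model that pattern-matches on "binary search" will often emit the return-on-first-match version, which satisfies the terse description but not the tweaked spec.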

> Sometimes the interpolated detail is wrong (and nondeterministic), so, if a reliable result is to be achieved

Seems you agree they _cannot_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.

mike_hearn 2 hours ago | parent [-]

Neither can humans, but the industry has decades of experience with how to instruct and guide human developer teams using specs.

dxdm an hour ago | parent | next [-]

For good results, you usually don't want your developers to be coding monkeys. You need the human developer in the loop to help define the spec, maybe contributing ideas, but at the very least asking questions like "what happens when..." and "have you thought about...".

In fact, this is a huge chunk of the value a developer brings to the table.

MoreQARespect 28 minutes ago | parent | prev [-]

Humans have the ability to reflect, push back on a faulty spec, push back on an unclear spec, run experiments, make judgement calls, and build tools and processes to account for their own foibles.

lmm 3 hours ago | parent | prev | next [-]

> LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in.

They can generate boilerplate, sure. Or they can expand out a known/named algorithm implementation, like pulling in a library. But neither of those is generating detail that wasn't there in the original (at most it pulls in the detail from somewhere in the training set).
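To illustrate the "named algorithm" case: a one-word description like "quicksort" is terse, but it carries no missing detail, because the expansion is effectively memorized from the training set. A textbook sketch (Python here purely as an illustration):

```python
def quicksort(xs):
    # Textbook quicksort: partition around the first element, recurse.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)
```

Every detail here (the pivot choice, the partition rule, the base case) comes from the canonical description of the algorithm, not from the one-word prompt.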

tibbe 2 hours ago | parent [-]

They do more than that. If you ask for a UI with a button, that button won't be upside down even if you didn't specify its orientation. A lot of the detail can be inferred from general human preferences, which are present in the LLMs' training data. This extends way beyond CS topics like the details of algorithm implementations.

zabzonk an hour ago | parent | next [-]

Isn't "not being upside down" just one of the default properties of a button in whatever GUI toolkit you are using? I'd be worried if an LLM _did_ start setting all the possible button properties.

MoreQARespect 16 minutes ago | parent [-]

Putting LLMs on a pedestal is very much in vogue these days.

skywhopper 2 hours ago | parent | prev [-]

That’s exactly what they said. Details “elsewhere in its training set”.

skywhopper 2 hours ago | parent | prev | next [-]

“LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions”

Only with well-known patterns that represent shared knowledge specified elsewhere. If the details they “fill in” each time differ in ways that change behavior, then the spec is deficient.

If we “figure out” how to write such detailed specs in the future, as you suggest, then that becomes the “code”.

roysting an hour ago | parent | prev [-]

I get the sense that what you are responding to, and even many of the replies to you, are expressing a kind of coping with the current dynamic, only exacerbated by the rather elitist and egoistic mentality that people in tech have had for a very long time now; i.e., they are falling…being pushed from Mt Olympus, and there is A LOT of anxious rationalization going on.

A mere 5 years ago, tech people were chortling down their upturned noses at people complaining that their jobs were being “taken”, and now that the turns have tabled, there is a bunch of denial, anger, and grief going on, maybe even some depression, as many of the recently unemployed realize the current state of things.

It’s all too easy to deride the inferiority of AI when you’re employed in a job doing things as you have been all your career, thinking you cannot be replaced… until you find yourself on the other side of the turn that has tabled.

otikik an hour ago | parent | next [-]

I use AI for my work every single day - and during some weekends too. Claude Code, with Opus. It is far from being able to reliably produce the code we need for production. It produces code that looks OK most of the time, but I have seen it lose track of key details, misinterpret requirements, and even ignore them sometimes - "on purpose", as in writing something like "let's not do that requirement, it's not necessary".

This kind of thing happens at least once per day to me, maybe more.

I am not denying that it is useful, let me be clear. It is extremely convenient, especially for mechanical tasks. It has other advantages like quick exploration of other people's code, for example. If my employer didn't provide a corporate account for me, I would pay one from my own pocket.

That said, I agree with OP and the author that it is not reliable when producing code from specs. It gets things right often, I would say. That might be good enough for some fields/people. It's good enough for me, too. I do, however, review every line it produces, because I've seen it miss often as well.

rdevilla an hour ago | parent | prev [-]

I can't help but imagine that this is how some people felt about doctors once WebMD came out.

It's some nice rhetoric, but you're not actually saying much.