mcqueenjordan 8 hours ago

As usual with Oxide's RFDs, I found myself vigorously head-nodding while reading. Somewhat rarely, I hit a part I disagreed with:

> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.

Don't the same arguments against using LLMs to write one's prose also apply to code? Was the structure of the code, and the ideas within it, the engineer's? Or the LLM's? And so on.

Before I'm misunderstood as an LLM minimalist, I want to say that I think they're incredibly good at solving blank-page syndrome -- just getting a starting point on the page is useful. But the code you actually want to ship is so far from what LLMs write that I think of them more as a crutch for blank-page syndrome than as "good at writing code de novo".

I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.

averynicepen 7 hours ago | parent | next [-]

Writing is an expression of an individual, while code is a tool used to solve a problem or achieve a purpose.

The more examples of different types of problems being solved in similar ways present in an LLM's dataset, the better it gets at solving problems. Generally speaking, if it's a solution that works well, it gets used a lot, so "good solutions" become well represented in the dataset.

Human expression, however, is diverse by definition. The expression of the human experience is the expression of a data point on a statistical field with standard deviations the size of chasms. An expression of the mean (which is what an LLM does) goes against why we care about human expression in the first place. "Interesting" is a value closely paired with "different".

We value diversity of thought in expression, but we value efficiency of problem solving for code.

There is definitely an argument to be made that LLM usage fundamentally restrains an individual from solving unsolved problems. It also leaves open the question of where we get more data from.

>the code you actually want to ship is so far from what LLMs write

I think this is a fairly common consensus, and my understanding is that the reason for it is the limited context window.

twodave 7 hours ago | parent [-]

I argue that the intent of an engineer is contained coherently across the code of a project. I have yet to get an LLM to pick up on the deeper idioms present in a codebase that help constrain the overall solution towards these more particular patterns. I’m not talking about syntax or style, either. I’m talking about e.g. semantic connections within an object graph, understanding what sort of things belong in the data layer based on how it is intended to be read/written, etc. Even when I point it at a file and say, “Use the patterns you see there, with these small differences and a different target type,” I find that LLMs struggle. Until they can clear that hurdle without requiring me to restructure my entire engineering org they will remain as fancy code completion suggestions, hobby project accelerators, and not much else.

lukasb 8 hours ago | parent | prev | next [-]

One difference is that clichéd prose is bad and clichéd code is generally good.

joshka 8 hours ago | parent [-]

Depends on what your prose is for. If it's documentation, then prose that matches the expected tone and form of other similar docs is clichéd in a good way. I think this is a really good use of LLMs -- making docs consistent across a large library / codebase.

minimaxir 8 hours ago | parent | next [-]

I have been testing agentic coding with Claude Opus 4.5, and the problem is that it's too good at documentation and test cases. It's thorough to the point of going out of scope, so I have to edit its output down to improve the signal-to-noise ratio.

girvo 7 hours ago | parent | next [-]

The “change capture”/straitjacket-style tests LLMs like to output drive me nuts. But humans write those all the time too, so I shouldn’t be that surprised!
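For the avoidance of doubt, here's a minimal hypothetical sketch of the pattern (all names invented for illustration): the test pins down the exact sequence of internal calls rather than observable behavior, so any refactor breaks it even when the result is identical.

```python
from unittest.mock import MagicMock, call

def process_order(db, order):
    # Hypothetical function under test.
    db.validate(order)
    db.reserve_stock(order)
    return db.commit(order)

def test_process_order_pins_implementation():
    db = MagicMock()
    db.commit.return_value = "ok"
    assert process_order(db, {"id": 1}) == "ok"
    # The straitjacket: asserting the exact internal call sequence means
    # reordering or merging these calls fails the test, even though
    # callers only care about the returned status.
    assert db.mock_calls == [
        call.validate({"id": 1}),
        call.reserve_stock({"id": 1}),
        call.commit({"id": 1}),
    ]

test_process_order_pins_implementation()
```

A behavior-focused version would assert only on the return value and the committed state, leaving the implementation free to change.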

mulmboy 2 hours ago | parent [-]

What do these look like?

diamond559 4 hours ago | parent | prev [-]

If the goal is to document the code and it gets sidetracked and focuses on only certain parts, then it failed the test. It just further proves that LLMs are incapable of grasping meaning and context.

danenania 8 hours ago | parent | prev | next [-]

A problem I’ve found with LLMs for docs is that they are like ten times too wordy. They want to document every path and edge case rather than focusing on what really matters.

It can be addressed with prompting, but you have to fight this constantly.

bigiain 7 hours ago | parent [-]

I think probably my most common prompt is "Make it shorter. No more than ($x) (words|sentences|paragraphs)."

dcre 8 hours ago | parent | prev [-]

Docs also often don’t have anyone’s name on them, in which case they’re already attributed to an unknown composite author.

mcqueenjordan 5 hours ago | parent | prev | next [-]

I guess to follow up slightly more:

- I think the "if you use another model" rebuttal is becoming like the No True Scotsman of the LLM world. We can get concrete and discuss a specific model if need be.

- If the use case is "generate this function body for me", I agree that's a pretty good use case. I've specifically seen problematic behavior in the other ways I see it OFTEN used, which are "write this feature for me" or trying to one-shot too much functionality, where the LLM gets to touch data structures, abstractions, interface boundaries, etc.

- To analogize it to writing: They shouldn't/cannot write the whole book, they shouldn't/cannot write the table of contents, they cannot write a chapter, IMO even a paragraph is too much -- but if you write the first sentence and the last sentence of a paragraph, I think the interpolation can be a pretty reasonable starting point. Bringing it back to code for me means: function bodies are OK. Everything else gets questionable fast IME.
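To make the function-body case concrete (a hypothetical sketch; the name and spec here are invented): the engineer writes the "first and last sentence" -- the signature, docstring, and an assertion pinning the expected behavior -- and only the interpolated body is the kind of thing worth delegating.

```python
# The engineer fixes the interface; only the body is delegated.
def dedupe_preserving_order(items: list[str]) -> list[str]:
    """Return items with duplicates removed, keeping first occurrences."""
    # -- body below is the scale of code generation that tends to work --
    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# The "last sentence": behavior the engineer pinned down in advance.
assert dedupe_preserving_order(["a", "b", "a", "c", "b"]) == ["a", "b", "c"]
```

Everything above the body (the contract) stays in the engineer's hands; everything above the function (data structures, interfaces) is where the one-shot approach gets questionable.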

IgorPartola 4 hours ago | parent | prev | next [-]

My suspicion is that this is a form of the Gell-Mann amnesia effect: you can recognize that news reporting is wrong when it covers a subject in which you are an expert, but then you move on to the next article on a different subject and your trust resumes.

Basically if you are a software engineer you can very easily judge quality of code. But if you aren’t a writer then maybe it is hard for you to judge the quality of a piece of prose.

themk 6 hours ago | parent | prev | next [-]

I recently published an internal memo which covered the same point, but I included code. I feel like you still have a "voice" in code, and it provides important cues to the reviewer. I also consider review to be an important learning and collaboration moment, which becomes difficult with LLM code.

AlexCoventry 6 hours ago | parent | prev | next [-]

> I think that the code you actually want to ship is so far from what LLMs write

It depends on the LLM, I think. A lot of people have a bad impression of them as a result of using cheap or outdated LLMs.

dcre 8 hours ago | parent | prev | next [-]

In my experience, LLMs have been quite capable of producing code I am satisfied with (though of course it depends on the context — I have much lower standards for one-off tools than long-lived apps). They are able to follow conventions already present in a codebase and produce something passable. Whereas with writing prose, I am almost never happy with the feel of what an LLM produces (worth noting that Sonnet and Opus 4.5’s prose may be moving up from disgusting to tolerable). I think of it as prose being higher-dimensional — for a given goal, often the way to express it in code is pretty obvious, and many developers would do essentially the same thing. Not so for prose.

make_it_sure 2 hours ago | parent | prev [-]

Try Opus 4.5; you'll be surprised. That might have been true for past versions of LLMs, but they've advanced a lot.