roadside_picnic 10 hours ago

> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.

I've had plenty of junior devs justify massive code bases of random scripts and 100+ line functions with the same logic. There's a reason senior devs almost always push back on this when it's encountered.

Everything hinges on that "if". But you're baking a tautology into your reasoning: "if LLMs can do everything we need them to, we can use LLMs for everything we need".

The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.

So "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.

This is clearly not the case with simplistic LLM usage today. "Ah! But you need agents and memory and context management, etc!" But all of these are abstractions. This is what I believe the parent comment is really pointing out.

If AI could do what we originally hoped it could (follow simple instructions to solve complex tasks), we'd be set, and I would agree with your argument. But we are very clearly not in that world. Especially since Karpathy can't even keep up with the sophisticated machinery necessary to properly orchestrate these tools. All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.

simonw 9 hours ago | parent | next [-]

I'm not arguing for using LLMs as an abstraction.

I'm saying that a key component of the dependency calculation has changed.

It used to be that one of the most influential facts affecting your decision to add a new library was the cost of writing the subset of code that you needed yourself. If writing that code and the accompanying tests represented more than an hour of work, a library was usually a better investment.

If the code and tests take a few minutes those calculations can look very different.

Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.

The code we are producing remains the same. The difference is that a senior developer may have written that function plus tests over several hours, at a cost of thousands of dollars. Now that same senior developer can produce exactly the same code at a cost of less than $100.
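To make that trade-off concrete with a hypothetical example (the helper name and the lodash comparison are mine, not from the thread): the "subset of a library you actually need" is often a few lines that an agent can write and test in one pass.

```javascript
// Hypothetical sketch: instead of adding lodash as a dependency for a
// single helper, write (or have the agent write) just the subset you need.
function pick(obj, keys) {
  const out = {};
  for (const key of keys) {
    if (key in obj) out[key] = obj[key];
  }
  return out;
}
```

A couple of direct tests over exactly this behaviour stand in for the maintenance guarantees the dependency would otherwise have provided.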

all_factz 8 hours ago | parent | next [-]

React is hundreds of thousands of lines of code (or millions; I haven't looked in a while). Sure, you can start by having the LLM create a simple way to sync state across components, but in a serious project you're going to run into edge cases that keep growing the complexity of your LLM-built library. There may come a point at which the complexity grows beyond what the LLM itself can maintain effectively. I think the same rough argument applies to MomentJS.
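For illustration only (this is my sketch, not code from either side of the thread): the "simple way to sync state across components" usually starts as a tiny observable store like the one below, and the edge cases (batching, derived state, stale listeners, concurrent rendering) are exactly where the complexity then accumulates.

```javascript
// Minimal hand-rolled state sync: a store that components can subscribe to.
// Illustrative names; not an API from React or any existing library.
function createStore(initialState) {
  let state = initialState;
  const listeners = new Set();
  return {
    getState: () => state,
    setState(partial) {
      state = { ...state, ...partial };
      listeners.forEach((listener) => listener(state));
    },
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener); // unsubscribe handle
    },
  };
}
```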

simonw 7 hours ago | parent | next [-]

If the complexity grows beyond what it makes sense to do without React I'll have the LLM rewrite it all in React!

I did that with an HTML generation project to switch from Python strings to Jinja templates just the other day: https://github.com/simonw/claude-code-transcripts/pull/2

DrammBA 6 hours ago | parent | next [-]

Simon, you're starting to sound super disconnected from reality, this "I hit everything that looks like a nail with my LLM hammer" vibe is new.

simonw 6 hours ago | parent | next [-]

My habits have changed quite a bit with Opus 4.5 in the past month. I need to write about it.

godelski 5 hours ago | parent | next [-]

What's concerning to many of us is that you (and others) have said this same thing before, just with s/Opus 4.5/some other model/.

That feels more like chasing than a clear line of improvement. It reads very differently from something like "my habits have changed quite a bit since reading The Art of Computer Programming". They're categorically different.

mkozlows an hour ago | parent | next [-]

It's because the models keep getting better! What you could do with GPT-4 was more impressive than what you could do with GPT 3.5. What you could do with Sonnet 3.5 was more impressive yet, and Sonnet 4, and Sonnet 4.5.

Some of these improvements have been minor, some of them have been big enough to feel like step changes. Sonnet 3.7 + Claude Code (they came out at the same time) was a big step change; Opus 4.5 similarly feels like a big step change.

(If you don't trust vibes, METR's task completion benchmark shows huge improvements, too.)

If you're sincerely trying these models out with the intention of seeing if you can make them work for you, and doing all the things you should do in those cases, then even if you're getting negative results somehow, you need to keep trying, because there will come a point where the negative turns positive for you.

If you're someone who's been using them productively for a while now, you need to keep changing how you use them, because what used to work is no longer optimal.

pertymcpert 4 hours ago | parent | prev [-]

Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.

Why do you use the word "chasing" to describe this? I don't understand. Maybe you should try it and compare it to earlier models to see what people mean.

godelski 2 hours ago | parent [-]

> Why do you use the word "chasing" to describe this?

I think you'll find the answer if you reread my comment and your response, and notice why yours didn't address mine.

Btw, I have tried it. It's annoying that people think the problem is not trying. It was getting old when GPT 3.5 came out. Let's update the argument...

v64 6 hours ago | parent | prev | next [-]

Looking forward to hearing about how you're using Opus 4.5. From my experience and what I've heard from others, it's been able to overcome many obstacles that previous iterations stumbled on.

indigodaddy 5 hours ago | parent | prev | next [-]

Can you expound on Opus 4.5 a little? Is it so good that it's basically a superpower now? How does it differ from your previous LLM usage?

pertymcpert 4 hours ago | parent [-]

To repeat my other comment:

> Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.

remich 5 hours ago | parent | prev [-]

Please do. I'm trying to help other devs in my company get more out of agentic coding, and I've noticed that not everyone is defaulting to Opus 4.5 or even Codex 5.2, and I'm not always able to give good examples to them for why they should. It would be great to have a blog post to point to…

dimitri-vs 6 hours ago | parent | prev [-]

The reality is that we went from LLMs as chatbots editing a couple of files per request with decent results, to running multiple coding agents in parallel to implement major features based on a spec document and some clarifying questions, in a year.

Even IF LLMs don't get any better, there is a mountain of lemons left to squeeze in their current state.

zdragnar 7 hours ago | parent | prev [-]

That would go over on any decently sized team like a lead balloon.

simonw 6 hours ago | parent [-]

As it should, normally, because "we'll rewrite it in React later" used to represent weeks if not months of massively disruptive work. I've seen migration projects like that push on for more than a year!

The new normal isn't like that. Rewriting an existing, cleanly implemented Vanilla JavaScript project (with tests) in React is the kind of rote task you can throw at a coding agent like Claude Code, then come back the next morning and expect most (and occasionally all) of the work to be done.

zdragnar 5 hours ago | parent | next [-]

And everyone else's work has to be completely put on hold or thrown away because you did the whole thing all at once on your own.

That's definitely not something that goes over well on anything other than an incredibly trivial project.

pertymcpert 4 hours ago | parent [-]

Why did you jump to the assumption that this:

> The new normal isn't like that. Rewriting an existing, cleanly implemented Vanilla JavaScript project (with tests) in React is the kind of rote task you can throw at a coding agent like Claude Code, then come back the next morning and expect most (and occasionally all) of the work to be done.

... meant that person would do it in a clandestine fashion rather than this be an agreed upon task prior? Is this how you operate?

zdragnar 3 hours ago | parent | next [-]

My very first sentence:

> And everyone else's work has to be completely put on hold

On a big enough team, getting everyone to a stopping point where they can wait for you to do your big-bang refactor of the entire code base (even if it is only a day later) is still really disruptive.

The last time I went through something like this, we did it really carefully, migrating a page at a time from a multi page application to a SPA. Even that required ensuring that whichever page transitioned didn't have other people working on it, let alone the whole code base.

Again, I simply don't buy that you're going to be able to AI your way through such a radical transition on anything other than a trivial application with a small or tiny team.

zeroonetwothree 3 hours ago | parent | prev [-]

If you have hundreds of devs working on the project, it's not possible to do a full rewrite in one go. So it's not about it being clandestine, but rather that there's just no way to get it done regardless of how much AI superpower you bring to bear.

reactordev 4 hours ago | parent | prev | next [-]

I'm going to add my perspective here, as they all seem to be ganging up on you, Simon.

He is right. The game has changed. We can now refactor using an agent and have it done by morning. The cost of architectural mistakes is minimal and if it gets out of hand, you refactor and take a nap anyway.

What's interesting is that it's now about intent: the prompts and specs you write, and the documents you keep that outline your intended solution. Then you let the agent go. You do research; the agent does code. I've seen this at scale.

Teever 3 hours ago | parent | prev [-]

Let's say I'm mildly convinced by your argument. I've read your blog post that was popular on HN a week or so ago and I've made similar little toy programs with AI that scratch a particular niche.

Do you care to make any concrete predictions on when most developers will embrace this new normal as part of their day to day routine? One year? Five?

And how much of this is just another iteration in the wheel of reincarnation[0]? Maybe we're looking at a future where we return to the library-dense monoculture supply chain we use today, but the libraries are made by swarms of AI agents, and the programmer/user is responsible for guiding other AI agents to create business logic?

[0] https://www.computerhope.com/jargon/w/wor.htm

simonw 2 hours ago | parent | next [-]

It's really hard to predict how other developers are going to work, especially given how resistant a lot of developers are to fully exploring the new tools.

I do think there's been a bit of a shift in the last two months, with GPT 5.1 and 5.2 Codex and Opus 4.5.

We have models that can reliably follow complex instructions over multi-hour projects now; that's completely new. Those of us at the cutting edge are still coming to terms with the consequences of this (as illustrated by this Karpathy tweet).

I don't trust my predictions myself, but I think the next few months are going to see some big changes in terms of what mainstream developers understand these tools as being capable of.

mkozlows an hour ago | parent | prev [-]

"The future is already here, it's just unevenly distributed."

At some companies, most developers already are using it in their day to day. IME, the more senior the developer is, the more likely they are to be heavily using LLMs to write all/most of their code these days. Talking to friends and former coworkers at startups and Big Tech (and my own coworkers, and of course my own experience), this isn't a "someday" thing.

People who work at more conservative companies, the kind that don't already have enterprise Cursor/Anthropic/OpenAI agreements, and are maybe still cautiously evaluating Copilot... maybe not so much.

chairmansteve 2 hours ago | parent | prev | next [-]

"React is hundreds of thousands of lines of code".

Most of which are irrelevant to my project. It's easier to maintain a few hundred lines of self-written code than to carry the react-kitchen-sink around for all eternity.

wanderlust123 5 hours ago | parent | prev [-]

Not all UIs converge to a React like requirement. For a lot of use cases React is over-engineering but the profession just lacks the balls to use something simpler, like htmx for example.

zeroonetwothree 3 hours ago | parent | next [-]

Core react is fairly simple, I would have no problem using it for almost everything. The overengineering usually comes at a layer on top.

all_factz 4 hours ago | parent | prev [-]

Sure, and for those cases I’d rather tell the agent to use htmx instead of something hand-rolled.

brians 8 hours ago | parent | prev | next [-]

A major difference shows up when we have to read and understand the code because of a bug. Perhaps the LLM can help us find it! But abstraction provides a mental scaffold.

godelski 5 hours ago | parent [-]

I feel like "abstraction" is overloaded in many conversations.

Personally I love abstraction when it means "generalize these routines to a simple and elegant version". Even if it's harder to understand than a single instance it is worth the investment and gives far better understanding of the code and what it's doing.

But "abstraction" can also mean making things less understandable or more complex, and I think LLMs operate this way. Their code takes a long time to understand, not because any single line is harder to read, but because the lines need to be understood in context.

I think part of this is people misunderstanding elegance. It doesn't mean aesthetically pleasing; it means doing something in a simple and efficient way. Yes, write it rough the first round, but we should also strive for elegance. It seems more like we are just trying to get the first rough draft out and move on to the next thing.

qazxcvbnmlp 5 hours ago | parent | prev | next [-]

Without commenting on whether the parent is right or wrong (I suspect it is correct):

If it's true, the market will soon reward it. Being able to competently write good code more cheaply will be rewarded. People don't employ programmers because they care about them; they are employed to produce output. If someone can use LLMs to produce more output for less $$, they will quickly make the people who don't understand the technology less competitive in the workplace.

zx8080 4 hours ago | parent [-]

> more output for less $$

That's a trap: for those without experience in both business and engineering, it's not obvious how to estimate (or later measure) that $$. The trap is in the cost of changes and the fix budget when things break. And things will break, often. The requirements will also change often; that's normal, our world is not static. So the cost has a tendency to change (guess which direction). The thoughtless copy-paste and rewrite-everything approach is nice, but its cost climbs steeply over time. Those who don't know this will be trapped and lose their business.

tbrownaw 3 hours ago | parent [-]

Predicting costs may be tricky, but measuring them after the fact is a fair bit easier.

squigz 7 hours ago | parent | prev [-]

> Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.

They're not being disrupted. This is exactly why some people don't trust LLMs to re-invent wheels. It doesn't matter if it can one-shot some code and tests - what matters is that some problems require experience to know what exactly is needed to solve that problem. Libraries enable this experience and knowledge to centralize.

When considering whether inventing something in-house is a good idea vs using a library, "up front dev cost" factors relatively little to me.

joquarky 7 hours ago | parent [-]

Don't forget to include supply chain attacks in your risk assessment.

cameronh90 4 hours ago | parent | prev | next [-]

Rather, the problem I more often see with junior devs is pulling in a dozen dependencies when writing a single function would have done the job.

Indeed, part of becoming a senior developer is learning why you should avoid left-pad but accept date-fns.
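The left-pad half of that trade-off is easy to make concrete; this sketch (my example, not from the comment) assumes a runtime with `String.prototype.padStart`, which is ES2017+:

```javascript
// left-pad's entire useful surface is now one standard-library call,
// which is why inlining it beats depending on it.
function leftPad(value, length, padChar = ' ') {
  return String(value).padStart(length, padChar);
}
```

date-fns, by contrast, encodes calendar and time-zone knowledge you do not want to rediscover yourself, which is why it stays a dependency.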

We're still in the early stages of operationalising LLMs. This is like mobile apps in 2010 or SPA web dev in 2014. People are throwing a lot of stuff at the wall, and there's going to be a ton of churn and chaos before we figure out how to use it and it settles down a bit. I used to joke that I didn't like taking vacations because the entire front-end stack would have been chucked out and replaced with something new by the time I got back, but it's pretty stable now.

Also I find it odd you'd characterise the current LLM progress as somehow being below where we hoped it would be. A few years back, people would have said you were absolutely nuts if you'd predicted how good these models would become. Very few people (apart from those trying to sell you something) were exclaiming that we'd imminently enter a world where you enter an idea and out comes a complex solution without any further guidance or refining. When the AI can do that, we can just tell it to improve itself in a loop and AGI is just some GPU cycles away. Most people still expect, and hope, that's a little way off yet.

That doesn’t mean the relative cost of abstracting and inlining hasn’t changed dramatically or that these tools aren’t incredibly useful when you figure out how to hold them.

Or you could just do what most people always do and wait for the trailblazers to either get burnt or figure out what works, and then jump on the bandwagon when it stabilises - but accept that when it does stabilise, you’ll be a few years behind those who have been picking shrapnel out of their hands for the last few years.

whstl 9 hours ago | parent | prev | next [-]

> The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.

Hyperbole. It's also very often a "world of pain" with a lot of senior code.

mannanj 9 hours ago | parent | prev | next [-]

> things will break and when they do, it will incur a world of pain

How much of this is still true, and how much exaggerated, in today's environment where the cost of making things is near zero?

I think "evolution" would say that because the cost of producing is near zero, the possibility of creating what we want is high. The cost of trying again is low, so mistakes and pain aren't super costly. For really high-stakes situations (which most situations are not), bring an expert human into the loop until the expert better than that human is an AI.

bdangubic 9 hours ago | parent | prev | next [-]

> All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.

the people are telling you "you are not doing it right!" - that's it, there is nothing to interpret in addition to this basic sentence

neoromantique 9 hours ago | parent | prev | next [-]

I'm sorry, but I don't agree.

Consider the dependency hell that is modern development: how wide the openings are for supply chain attacks, and how seemingly every other week we get a new RCE.

I'd rather have 100 loosely coupled scripts peer-reviewed by half a dozen LLM agents.

pca006132 8 hours ago | parent [-]

But this doesn't solve dependency hell. If the functionality is loosely coupled, you can already vendor the code in and review it manually. If it isn't (say it's a database), you still have to depend on it, no?

Or maybe you can use AI to vendor dependencies, review existing dependencies and updates. Never tried that, maybe that is better than the current approach, which is just trusting the upstream most of the time until something breaks.

joquarky 6 hours ago | parent [-]

Are you really going to manually review all of moment.js just to format a date?

pca006132 6 hours ago | parent [-]

By vendoring the code in, in this case I mean copying the related code into the project. You don't review everything. It is a bad way to deal with dependencies, but it feels similar to how people are using LLMs now for utility functions.
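As a hedged sketch of the alternative (my example, not from the thread): if all a project needs from moment.js is date formatting, the built-in `Intl.DateTimeFormat` API covers many cases with zero dependencies.

```javascript
// Format a date without moment.js. timeZone is pinned to UTC here so the
// output is deterministic; a real project would pass the zone it needs.
function formatDate(date, locale = 'en-US') {
  return new Intl.DateTimeFormat(locale, {
    year: 'numeric',
    month: 'long',
    day: 'numeric',
    timeZone: 'UTC',
  }).format(date);
}
```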

baq 9 hours ago | parent | prev [-]

> "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.

Ignoring for a second that they actually already are, it doesn't matter, because the cost of rewriting the mess drops by an order of magnitude with each frontier model release. You won't need good code because you'll be throwing everything away all the time.

bspinner 9 hours ago | parent [-]

I've yet to understand this argument. If you replace a brown turd with a yellowish turd, it'll still be a turd.

PaulHoule 8 hours ago | parent [-]

In everyday life I am a plodding and practical programmer who has learned the hard way that any working code base has numerous “fences” in the Chesterton sense.

I think, though, that for small systems and small parts of systems LLMs do move the repair-replace line in the replace direction, especially if the tests are good.