simonw 15 hours ago

This is a pretty common position: "I don't worry about getting left behind - it will only take a few weeks to catch up again".

I don't think that's true.

I'm really good at getting great results out of coding agents and LLMs. I've also been using LLMs for code on an almost daily basis since ChatGPT's release on November 30th 2022. That's more than three years ago now.

Meanwhile I see a constant flow of complaints from other developers who can't get anything useful out of these machines, or find that the gains they get are minimal at best.

Using this stuff well is a deep topic. These things can be applied in so many different ways, and to so many different projects. The best asset you can develop is an intuition for what works and what doesn't, and getting that intuition requires months if not years of personal experimentation.

I don't think you can just catch up in a few weeks, and I do think that the risk of falling behind isn't being taken seriously enough by much of the developer population.

I'm glad to see people like antirez ringing the alarm bell about this - it's not going to be a popular position but it needs to be said!

Humorist2290 5 hours ago | parent | next [-]

It needs to be said that your opinion on this is well understood by the community, respected, but also far from impartial. You have a clear vested interest in the success of _these_ tools.

There's a learning curve to any toolset, and it may be that using coding agents effectively is more than a few weeks of upskilling. It may be, and likely will be, that people make their whole careers about being experts on this topic.

But it's still a statistical text prediction model, wrapped in fancy gimmicks, sold at a loss by mostly bad faith actors, and very far from its final form. People waiting to get on the bandwagon could well be waiting to pick up the pieces once it collapses.

mattmanser 4 hours ago | parent [-]

I have a lot of respect for Simon and read a lot of his articles.

But I'm still seeing clear evidence it IS a statistical text prediction model. You ask it the right niche thing and it can only pump out a few variations of the same code, code that's clearly someone else's, stolen almost verbatim.

And I just use it 2 or 3 times a day.

How are SimonW and AntiRez not seeing the same thing?

How are they not seeing the propensity for both Claude + ChatGPT to spit out tons of completely pointless error handling code, making what should be a 5 line function a 50 line one?
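
To make that concrete, here's a made-up illustration (my own sketch, not real model output):

  import json

  # The five-line function you wanted:
  def read_config(path):
      with open(path) as f:
          return json.load(f)

  # What you get back instead: the same two operations buried under
  # separate try/excepts for FileNotFoundError, PermissionError and
  # json.JSONDecodeError, per-branch logging, a hand-rolled default
  # config fallback, and type checks on arguments you control - ten
  # times the lines, none of which you asked for.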

How are they not seeing that you constantly have to nag it to use modern syntax. Typescript, C#, Python, doesn't matter what you're writing in, it will regularly spit out code patterns that are 10 years out of date. And woe betide you if you're using a library that was updated in the last 2 years. It will revert to the old syntax over and over and over again.

I've also had to deal with a few of my colleagues using AI code on codebases they don't really understand. Wrong sort, id instead of timestamp. Wrong limit. Wrong json encoding, missing key converters. Wrong timezone on dates. A ton of subtle, non-obvious bugs that you'd only catch if you intimately know the code, but that you'd have looked up if you were writing it yourself.

And that's not even including the bit where the AI obviously decided to edit the wrong search function in a totally different part of the codebase that had nothing to do with what my colleague was doing. It didn't break anything or trigger any tests, because it was wrapped in an impossible-to-hit if clause. And it created a bunch of extra classes to support this phantom code: hundreds of new lines just lurking there, doing nothing, but looking like they do something. If I hadn't caught it, everyone would have assumed they did.

simonw an hour ago | parent | next [-]

It's mostly a statistical text model, although the RL "reasoning" stuff added in the past 12 months makes that a slightly less true statement - it now has extra tricks to bias the bits of code it statistically predicts toward ones that are more likely to work.

The real unlock though is the coding agent harnesses. It doesn't matter any more if it statistically predicts junk code that doesn't compile, because it will see the compiler error and fix it. If you tell it "use red/green TDD" it will write the tests first, then spot when the code fails to pass them and fix that too.
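
A minimal sketch of that loop, with a hypothetical ask_model() standing in for the LLM call:

  import subprocess

  def harness_loop(ask_model, max_rounds=5):
      # Toy illustration of the agent loop, not Claude Code itself.
      # ask_model(feedback) is a hypothetical call that returns
      # candidate source code as a string.
      feedback = "Make the tests in tests/ pass."
      for _ in range(max_rounds):
          with open("src/feature.py", "w") as f:
              f.write(ask_model(feedback))
          result = subprocess.run(
              ["python", "-m", "pytest", "tests/"],
              capture_output=True, text=True,
          )
          if result.returncode == 0:
              return "tests pass"
          # The model sees the real failure output and tries again.
          feedback = result.stdout + result.stderr
      return "still failing after %d rounds" % max_rounds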

> How are they not seeing the propensity for both Claude + ChatGPT to spit out tons of completely pointless error handling code, making what should be a 5 line function a 50 line one?

TDD helps there a lot - it makes it less likely the model will spit out lines of code that are never executed.

> How are they not seeing that you constantly have to nag it to use modern syntax. Typescript, C#, Python, doesn't matter what you're writing in, it will regularly spit out code patterns that are 10 years out of date.

I find that if I use it in a codebase with modern syntax it will stick to that syntax. A prompting trick I use a lot is "git clone org/repo into /tmp and look at that for inspiration" - that way even a fresh codebase will be able to follow some good conventions from the start.

Plus the moment I see it write code in a style I don't like I tell it what I like instead.

> And that's not even including the bit where the AI obviously decided to edit the wrong search function in a totally different part of the codebase that had nothing to do with what my colleague was doing.

I usually tell it which part of the codebase to edit - or if it decides itself, I spot that and tell it that it did the wrong thing, or discard the session entirely and start again with a better prompt.

MaybiusStrip an hour ago | parent | prev | next [-]

You can literally go look at some of antirez's PRs described here in this article. They're not seeing it because it's not there?

Honestly, what you're describing sounds like the older models. If you are getting these sorts of results with Opus 4.5 or 5.2-codex on high I would be very curious to see your prompts/workflow.

jimmaswell 3 hours ago | parent | prev [-]

> You ask it the right niche thing and it can only pump out a few variations of the same code, code that's clearly someone else's, stolen almost verbatim.

There are only so many ways to express the same idea. Even clean room engineers write incidentally identical code to the source sometimes.

tymscar 2 hours ago | parent | prev | next [-]

I think you are right in saying that there is some deep intuition about current models that takes months, if not years, to hone. However, the intuition of someone who did nothing but talk about and use LLMs nonstop two years ago would be just as good today as that of someone who started from scratch, if not worse, because of antipatterns that no longer apply, such as always starting a new chat, or never using a CLI because of context drift.

Also, Simon, with all due respect, and I mean it, I genuinely look in awe at the amount of posts you have on your blog and your dedication, but it's clear to anyone that the projects you created and launched before 2022 far exceed anything you've done since. And I will be the first to say that I don't think that's because LLMs aren't able to help you. But I do think it's because you've been slowly but surely replacing what makes you really, really good at engineering with LLMs, more and more by the month.

If I look at Django, I can clearly see your intelligence, passion, and expertise there. Do you feel that any of the projects you’ve written since LLMs are the main thing you focus on are similar?

Think about it this way: 100% of you wins against 100% of me any day. 100% of Claude running on your computer is the same as 100% of Claude running on mine. 95% of Claude and 5% of you, while still better than me (and your average Joe), is nowhere near the same jump from 95% Claude and 5% me.

I do worry when I see great programmers like you diluting their work.

jcheng an hour ago | parent | next [-]

> 95% of Claude and 5% of you, while still better than me (and your average Joe), is nowhere near the same jump from 95% Claude and 5% me.

I see what you're saying, but I'm not sure it is true. Take simonw and tymscar, put them each in charge of a team of 19 engineers (of identical capabilities). Is the result "nowhere near the same jump" as simonw vs. tymscar alone? I think it's potentially a much bigger jump, if there are differences in who has better ideas and not just who can code the fastest.

tymscar 27 minutes ago | parent [-]

I agree. However, there you're not comparing technical knowledge alone; you're also comparing managerial skills.

With LLMs it's admittedly a bit closer to doing it yourself, because the feedback loop is much tighter.

simonw an hour ago | parent | prev [-]

My great regret from the past few years is that experimenting with LLMs has been such a huge distraction from my other work! My https://llm.datasette.io/ tool is from that era though, and it's pretty cool.

tymscar an hour ago | parent [-]

I do think your Datasette work is fantastic and I genuinely hope you take my previous message the right way. I'm not saying you're doing something bad, quite the opposite: I feel like we need more of you, and I'm afraid that because of LLMs we get less of you.

csmpltn 4 hours ago | parent | prev | next [-]

So where’s all of this cutting edge amazing and flawless stuff you’ve built in a weekend that everybody else couldn’t because they were too dumb or slow or clueless?

wild_egg 4 hours ago | parent | next [-]

This is such a tired response at this point.

People are under zero obligation to release their work to the public. Simon actually publishes and writes about a remarkable number of the side projects he builds with AI.

The rest of us just build tons of cool stuff for personal use or for $JOB. Releasing stuff to the public is, in general, a massive amount of extra work for very little benefit. There are loads of FOSS maintainers trapped spending as much time managing their communities as they do their actual projects and many of us just don't have time for that.

rgoulter an hour ago | parent | next [-]

> The rest of us just build tons of cool stuff for personal use or for $JOB. Releasing stuff to the public is, in general, a massive amount of extra work for very little benefit. There are loads of FOSS maintainers trapped spending as much time managing their communities as they do their actual projects and many of us just don't have time for that.

I wouldn't worry about this.

There are many examples of people sharing a project they've used LLMs to help write, and the result was not a huge amount of attention & expectation of burden.

Perhaps "I don't share it because I'm worried people will love it too much" even suggests the opposite: you can concretely demonstrate the kinds of things you've been able to build using LLMs.

> This is such a tired response at this point.

Lack of specificity & concrete examples frequently means all that's left for discussion is the emotion of hype and anti-hype, though.

In this thread, the discussion was:

  pro: use LLMs or get left behind

  conserve: okay, I'll start using LLMs when they're good

  pro: no no they won't be that good, it takes effort to get to use them

  conserve: do you have any examples?

  pro: why should we have to share examples?

I like LLMs. But making big claims while being reticent about concrete claims and demonstrations is irksome.

trollbridge 4 minutes ago | parent [-]

I’m waiting to see a huge burst of high quality open source code, which should be happening, right?

Anamon 2 hours ago | parent | prev | next [-]

The response may be tired when asked in this personal way, but in general, it's a fair question. Nobody is forced to share their work. But with all the high praise, we'd expect to see at least some uptick in the software world. But there is no surge in open source projects. No surge in app store entries. And the bigger companies claiming heavy GenAI use aren't iterating faster or building more. They are continually removing features, and their software is getting worse, slower, less robust, and less secure.

Software has been on a steep downward curve as far as quality and capabilities are concerned, for years before LLM coding had its breakthrough. For all the promises, I'd have expected, three years later, to at least notice that downward trajectory easing off. But it hasn't been happening.

grayhatter an hour ago | parent | prev [-]

All I took from your reply was

> I could if I wanted to, but I just don't feel like it.

What am I missing that would let me understand that's not what you meant?

simonw 2 hours ago | parent | prev | next [-]

I wouldn't call these flawless but here you go:

- https://github.com/simonw/denobox is a new Python library that gives you the ability to run arbitrary JavaScript and WASM in a sandbox provided by Deno, because it turns out a Python library can depend on deno these days. I built that on my phone in bed yesterday morning.

- https://github.com/simonw/pwasm is a WebAssembly runtime written in pure Python with no dependencies, built by feeding Claude Code the official WASM specification along with its conformance test suite and having it hack away at that (again via my phone) to get as many of the tests to pass as possible. It's pretty slow and not really useful yet but it's certainly interesting.

- https://github.com/datasette/datasette-transactions is a Datasette plugin which provides a JSON API for starting a SQLite transaction, running multiple queries within it and then executing or rolling back that transaction. I built that one on my phone on a BART (SF Bay Area metro) trip.

- https://github.com/simonw/micro-javascript is a pure Python, no dependency JavaScript interpreter which started as a port of MicroQuickJS. Here's a demo of that one running in a browser https://simonw.github.io/micro-javascript/playground.html - that's my JavaScript interpreter running inside Python running in Pyodide in WebAssembly in your browser of choice, which I find inherently amusing.

All of those are from the past three weeks. Most of them were built on my phone while I was doing other things.

Cyph0n 24 minutes ago | parent | next [-]

I am not at all an AI sceptic, but I'm probably less impressed by what LLMs are capable of.

Looking at these projects, I have a few questions:

1. These seem to be fairly self-contained and well specified problems, which is the best case scenario for “vibe coding”. Do you have any examples of projects where the solution was somewhat vague and open-ended? If not, how do you think Claude Code or similar would perform?

2. Did you feel excited or energized by having an LLM implement these projects end-to-end? Personally, I find LLMs useful as a closely guided assistant, particularly to interactively explore the space of solutions. I also don’t feel energized at all by having it implement anything non-trivial end to end, outside of writing tests (and even then, not all types of tests!).

3. Do you think others would find these projects useful? In particular, if you vibe coded them, why couldn’t someone else do the same thing? And once these projects are picked up by future model training runs, they’ll probably be even easier to one shot, reducing the value even further.

Let me provide an example of what I mean by (2), at least in the context of hobbyist dev. I could have Claude Code vibe code a Gameboy emulator and it would probably do a fine job given that it’s a well specified problem that is likely well represented in its training data. But the process would neither be exciting nor energizing. I would rather spend hours gradually getting more and more working and experience the fruits of my labor (I did this already btw).

At $DAYJOB, I simply do not have confidence in an LLM doing anything non-trivial end to end. Besides, the complexity remains in defining the requirements and constraints, designing the solution, gaining consensus, and devising a plan for implementation. The goal would be for the LLM to pick up discrete, well defined chunks of work.

CjHuber 2 hours ago | parent | prev [-]

Based on those, it seems you are not actually using them to create big codebases from scratch, but rather for problems that would normally take quite a while, not because they are inherently difficult to implement, but because you would normally have to spend considerable time on the finicky implementation details.

I think that's the reason why LLMs work so well for some, like you, and generate slop for others. If you leave them alone with projects that require opinionated code and actual decision making, they most often don't grasp the user's intention well, or worse, misinterpret it so confidently that you end up with all the wrong opinions and decisions compounding path-dependently into the strangest and most useless slop.

simonw an hour ago | parent | next [-]

"for problems that would normally take quite a while, not because they are inherently difficult to implement, but because you would normally have to spend considerable time on the finicky implementation details"

Yes, exactly! How amazing is it that we have technology now that lets us quickly build projects where we would normally have to spend considerable time on the finicky implementation details?

peteforde an hour ago | parent | prev [-]

Another lens is that many people either have terrible written communication skills, do not intuitively grasp how to describe a complex system design, or both. And yet, since everyone is a genius with 100% comprehensibility in their own mind, they simply aren't aware that the problem starts with them.

CjHuber an hour ago | parent [-]

Well, I think it also has to do with communication with LLMs being different from communication with humans. If you told a developer "don't do busywork", they surely wouldn't say "Oh, the repo looks like a trash dump, but no busywork, so I'm not going to clean it up; I'll quickly document that as the canonical structure, then continue."

jstummbillig 4 hours ago | parent | prev | next [-]

I find it increasingly confusing that some people seem to believe that other people not subjecting themselves to this continued interrogation gives any credence to their position.

People seem to believe that there is a burden of proof. There is not. What do I care if you are on board?

I don't know what could change your mind, but of course the answer is "nothing" as long as you are not open to it. Just look around. There is so much stuff, from so many credible people, in all domains. If you can't find anything that is convincing or at least interesting to you, you are simply not looking.

pavlus 3 hours ago | parent [-]

> What do I care if you are on board?

Without enough adoption, expect some of the companies you are a client of to raise prices further, or shut down entirely down the road, due to insufficient cash inflow.

So, you would care, if you want to continue to use these tools and see them evolve, instead of seeing the bubble pop.

williamcotton 2 hours ago | parent | prev | next [-]

Over the last few days I made this ggplot2-looking plotting DSL as a CLI tool and a Rust library.

https://github.com/williamcotton/gramgraph

The motivation? I needed a declarative plotting language for another DSL I'm working on called Web Pipe:

  GET /weather.svg
    |> fetch: `https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m`
    |> jq: `
      .data.response.hourly as $h |
      [$h.time, $h.temperature_2m] | transpose | map({time: .[0], temp: .[1]})
    `
    |> gg({ "type": "svg", "width": 800, "height": 400} ): `
      aes(x: time, y: temp) 
        | line()
        | point()
    `
"Web Pipe is an experimental DSL and Rust runtime for building web apps via composable JSON pipelines, featuring native integration of GraphQL, SQL, and jq, an embedded BDD testing framework, and a sophisticated Language Server."

https://github.com/williamcotton/webpipe

https://github.com/williamcotton/webpipe-lsp

https://williamcotton.com/articles/basic-introduction-to-web...

I've been working at quite a clip for a solo developer who is building a new language with a full-featured set of tooling.

I'd like to think that the approach of building the BDD-testing framework directly into the language itself and having the test runner use the production request handlers is at least somewhat novel!

  GET /hello/:world
    |> jq: `{ world: .params.world }`
    |> handlebars: `<p>hello, {{world}}</p>`

  describe "hello, world"
    it "calls the route"
      let world = "world"
      
      when calling GET /hello/{{world}}
      then status is 200
      and selector `p` text equals "hello, {{world}}"

I'm married with two young kids and I have a full-time job. Before these tools there was no way I could build all of these experiments with such limited resources.
user34283 4 hours ago | parent | prev | next [-]

Where is all the amazing, much better stuff you implemented manually meanwhile?

llmslave3 4 hours ago | parent | prev [-]

He's built lots of cool stuff with AI. Here are four random ones pulled from https://tools.simonwillison.net

- https://tools.simonwillison.net/bullish-bearish

- https://tools.simonwillison.net/user-agent

- https://tools.simonwillison.net/gemini-chat

- https://tools.simonwillison.net/token-usage

m4nu3l 3 hours ago | parent | next [-]

All of the linked apps look trivial to me. Also, in the first one, the UI gives no feedback once you click an answer (plus some questions don't really make sense, as they contain the answer in them). There is more on the website, so there could be something interesting, but I'm having trouble finding it among all the noise. Not saying simple apps have no value. Even simple throwaway UIs can have value, especially if you develop them quickly.

simonw 2 hours ago | parent [-]

How about these ones, are these trivial too? https://news.ycombinator.com/item?id=46582192

CamelCaseName 4 hours ago | parent | prev | next [-]

This is not really cool or impressive at all?

sesm 2 hours ago | parent | prev | next [-]

A page that outputs your user agent as an example of 'cool stuff built with AI'?

simonw 2 hours ago | parent [-]

See my comment here - I suspect that those were deliberately picked by llmslave3 to NOT be impressive: https://news.ycombinator.com/item?id=46582209

For more impressive examples see https://simonwillison.net/2025/Dec/10/html-tools/ and https://news.ycombinator.com/item?id=46574276#46582192

simonw 2 hours ago | parent | prev [-]

llmslave3 appears to have deliberately picked the least interesting from my HTML+JavaScript tools collection here. This post describes a bunch of much more interesting ones: https://simonwillison.net/2025/Dec/10/html-tools/

llmslave3 2 hours ago | parent [-]

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

simonw 2 hours ago | parent [-]

Did you genuinely select those examples in good faith?

If you're here to converse in good faith, what's your opinion of the examples I shared in this post over here? https://news.ycombinator.com/item?id=46574276#46582192

systemf_omega 15 hours ago | parent | prev | next [-]

> Using this stuff well is a deep topic.

Just like the stuff LLMs are being used for today. Why wouldn't "using LLMs well" be just one of the many things LLMs will simplify too?

Or do you believe your type of knowledge is somehow special and is resistant to being vastly simplified or even made obsolete by AI?

simonw 12 hours ago | parent | next [-]

An interesting trend over the past year is that LLMs have learned how to prompt each other.

Back in ~2024 a lot of people were excited about having "LLMs write the prompt!" but I found the results to be really disappointing - they were full of things like "You are the world's best expert in marketing" which was superstitious junk.

As of 2025 I'm finding they actually do know how to prompt, which makes sense because there's a ton more information about good prompting approaches in the training data than there was a couple of years ago. This has unlocked some very interesting patterns, such as Claude Code prompting sub-agents to help it explore codebases without polluting the top-level token window.
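
A toy sketch of that sub-agent shape, where llm() is a stand-in for any prompt-in, text-out call:

  def explore_codebase(question, llm):
      # The sub-agent gets a fresh context to do the expensive
      # reading and searching in; llm(prompt) -> str is hypothetical.
      notes = llm(
          "Explore this repository and answer concisely, "
          "citing file paths: " + question
      )
      # Only a bounded summary flows back up, so the top-level
      # agent's token window stays clean.
      return notes[:2000]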

But learning to prompt is not the key skill in getting good results out of LLMs. The thing that matters most is having a robust model of what they can and cannot do. Asking an LLM "can you do X" is still the kind of thing I wouldn't trust them to answer in a useful way, because they're always constrained by training data that was only aware of their predecessors.

leonidasv 11 hours ago | parent | prev [-]

Unless we figure out how to make 1-billion-plus-token multimodal context windows (in a commercially viable way) and connect them to Google Docs/Slack/Notion/Zoom meetings/etc, I don't think it will simplify that much. Most of the work is adjusting your mental model to the fact that the agent is a stateless machine that starts from scratch every single time and has little-to-no knowledge besides what's in the code, so you have to be very specific about the context of the task in some ways.

It's different from assigning a task to a co-worker who already knows the business rules and cross-implications of the code in the real world. The agent can't see the broader picture of the stuff it's making: if you don't add that to your prompt/context material, it can go from ignoring edge cases that are obvious (to a human who was present in the last planning meeting) to coding defensively against hundreds of edge cases that will never occur.

grayhatter an hour ago | parent | prev | next [-]

Show me what you've made with AI?

What's the impressive thing that can convince me it's equivalent to, or better than, anything created before these tools, or without them?

I understand you've produced a lot of things, and that your clout (which depends on the AI fervor) is based largely on how refined a workflow you've invented. But I want to see the product, rather than the hype.

Make me say: "I wish I was good enough to create this!"

Without that, all I can see is the cost, or the negative impact.

edit: I've read some of your other posts, and for my question, I'd like to encourage you to pick only one. Don't use the scattershot approach that LLMs love, giving plenty of examples and hoping I'll ignore the noise for the one that sounds interesting.

Pick only one. What project have you created that you're truly proud of?

I'll go first (even though it's unfinished): Verse

coffeemug 11 hours ago | parent | prev | next [-]

Strongly disagree. Claude Code is the most intuitive technology I've ever used, way easier than learning to use even VS Code, for example. It doesn't even take weeks. Maybe a day or two to get the hang of it and you're off to the races.

johnsmith1840 5 hours ago | parent | next [-]

The difference is AI tooling lies to you. On day 0 you think it's perfect, but the more you use AI tools the more you realize that using them wrong can give you gnarly bugs.

It's intuitive to use but hard to master.

coffeemug 30 minutes ago | parent [-]

It took me a couple of days to find the right level of detail to prompt it at. Too high level, and the codebase gets away from me or the tooling goes off the rails. Too low level, and I may as well do it myself. I also had to learn the sorts of things Claude Code isn't good at yet. But once I got in the groove it was very easy from there. I think the whole process took 2-3 days.

simonw 10 hours ago | parent | prev [-]

Don't underestimate the number of developers who aren't comfortable with tools that live in the terminal.

coffeemug 32 minutes ago | parent | next [-]

I actually don't use it in the terminal, I use the VS Code extension. It's a better experience (bringing up the file being edited, nicer diffs, etc.). But both are trivial to pick up.

HDThoreaun 6 hours ago | parent | prev [-]

Well these people are left behind either way. Competent devs can easily learn to use coding assistants in a day or two

camel-cdr 4 hours ago | parent | prev | next [-]

How many things you learned working with LLMs in 2022 are relevant today? How many things you learn now will be relevant in the future?

y1n0 3 hours ago | parent [-]

This question misses the point. Everything you learn today informs how you learn in the future.

rubslopes 11 hours ago | parent | prev | next [-]

I don't disagree, knowing how to use the tools is important. But I wanted to add that great prompting skills are nowadays far, far less necessary with top-tier models than they were years ago. If I'm clear about what I want and how I want it to behave, Claude Opus 4.5 almost always nails it first time. The "extra" that I do often, that maybe newcomers don't, is to set up a system where the LLM can easily check the results of its changes (verbose logs in the terminal and, on the web, verbose logs in the console plus Playwright).

furyofantares 11 hours ago | parent | prev | next [-]

I think I'm also very good at getting great results out of coding agents and LLMs, and I disagree pretty heavily with you.

It is just way easier for someone to get up to speed today than it was a year ago. Partly because capabilities have gotten better and much of what was learned 6+ months ago no longer needs to be learned. But also partly because there is just much more information out there about how to get good results, you might have coworkers or friends you can talk to who have gotten good results, you can read comments on HN or blog posts from people who have gotten good results, etc.

I mean, ok, I don't think someone can fully catch up in a few weeks. I'll grant that for sure. But I think they can get up to speed much faster than they could have a year ago.

Of course, they will have to put in the effort at that time. And people who have been putting it off may be less likely to ever do that. So I think people will get left behind. But I think the alarm to raise is more, "hey, it's a deep topic and you're going to have to put in the effort" rather than "you better start now or else it's gonna be too late".

jeroenhd 14 hours ago | parent | prev | next [-]

So far every new AI product and even model update has required me to relearn how to get decent results out of it. I'm honestly kind of sick of having to adjust my workflow every time.

The intuition just doesn't hold. The LLM gets trained and retrained by other LLM users so what works for me suddenly changes when the LLM models refresh.

LLMs have only gotten easier to learn and catch up on over the years. In fact, most LLM companies seem to optimise for getting started quickly over getting good results consistently. There may come a moment when the foundations solidify and not bothering with LLMs may put you behind the curve, but we're not there yet, and with the literally impossible funding and resources OpenAI is claiming they need, it may never come.

christophilus 2 hours ago | parent [-]

Really? Claude Code upgrades have been pretty seamless for me: basically better quality output, given the same prompts, with no discernible downsides.

mmcnl 13 hours ago | parent | prev | next [-]

Why can't both be true at the same time? Maybe their problems are more complex than yours. Why do you assume it's a skill issue and ignore the contextual variables?

simonw 12 hours ago | parent [-]

On the rare occasions that I can convince them to share the details of the problems they are tackling and the exact prompts they are using it becomes very clear that they haven't learned how to use the tools yet.

UncleEntity 10 hours ago | parent [-]

I'm kind of curious about the things you're seeing, since I find the best way is to have them come up with a plan for the work they're about to do, and then make sure they actually finish it, because they like to skip stuff if it requires too much effort.

I mean, I just think of them like a dog that'll get distracted and go off doing some other random thing if you don't supervise them enough and you certainly don't want to trust them to guard your sandwich.

noosphr 3 hours ago | parent | prev | next [-]

I've been building AI apps since GPT-3, so 5 years now.

The pro-AI people don't understand what quadratic attention means, and the anti-AI people don't understand how much information can be contained in a TB of weights.

At the end of the day both will be hugely disappointed.

>The best asset you can develop is an intuition for what works and what doesn't, and getting that intuition requires months if not years of personal experimentation.

Intuition does not translate between models. Whatever you thought dense LLMs were good at, DeepSeek completely upended in an afternoon. The difference between major revisions of model families is substantial enough that intuition is a drawback, not an asset.

simonw 3 hours ago | parent [-]

What does quadratic attention mean?

I've so far found that intuition travels between models of a similar generation remarkably well. The conformance suite trick (find an existing 9,200-test conformance suite and tell an agent to build a fresh implementation that passes all those tests) that I first found with GPT-5.2 turned out to work exactly as well against Claude Opus 4.5, for example.

noosphr 3 hours ago | parent [-]

https://arxiv.org/abs/2209.04881

simonw 2 hours ago | parent [-]

To save anyone else the click, this is the paper "On The Computational Complexity of Self-Attention" from September 2022, with authors from NYU and Microsoft.

It argues that the self-attention mechanism in transformers works by having every token "attend to" every other token in a sequence, which is quadratic - n^2 against input - which should limit the total context length available to models.
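
In toy numpy form, the quadratic part of that mechanism looks like this:

  import numpy as np

  n, d = 1000, 64                # sequence length, head dimension
  Q = np.random.randn(n, d)      # one query row per token
  K = np.random.randn(n, d)      # one key row per token
  scores = Q @ K.T / np.sqrt(d)  # shape (n, n): every token vs every token
  weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
  weights = weights / weights.sum(axis=-1, keepdims=True)  # row softmax
  # Doubling n quadruples the memory and compute spent on scores,
  # which is the n^2 growth the paper formalizes.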

This would explain why the top models have been stuck at 1 million tokens since Gemini 1.5 in February 2024 (there has been a 2 million token Gemini but it's not in wide use, and Meta claimed their Llama 4 Scout could do 10 million but I don't know of anyone who's seen that actually work.)

My counter-argument here is that Claude Opus 4.5 has a comparatively tiny 200,000 token window which turns out to work incredibly well for the kinds of coding problems we're throwing at it, when accompanied by a cleverly designed harness such as Claude Code. So this limit from 2022 has been less "disappointing" than people may have expected.

sdenton4 2 hours ago | parent [-]

The quadratic attention problem seems to be largely solved by practical algorithmic improvements. (Iterations on flash attention, etc.)

What's practically limiting context size IME is that results seem to get "muddy" and go off track when you have a giant context. For a single-topic long session, I imagine you get a large number of places in the context that may be good matches for a given query, leading to ambiguous results.

I'm also not sure how much work is being put into reinforcement in extremely large context inference, as it's presumably quite expensive to do and hard to reliably test.

noosphr an hour ago | parent [-]

Indeed, filling the advertised context more than 1/4 full is a bad idea in general. 50k tokens is a fair bit, but at a rough 5 to 50 tokens per line it works out to between 1k and 10k lines of code.

Perfect for a demo or for work on a single self-contained file.

Disastrous for a large code base with logic scattered all throughout it.

tehnub 4 hours ago | parent | prev | next [-]

I guess this applies to the type of developer who needs years, not weeks, to become proficient in, say, Python?

biophysboy 10 hours ago | parent | prev | next [-]

What are your tips? Any resources you would recommend? I use Claude code and all the chat bots, but my background isn't programming, so I sometimes feel like I'm just swimming around.

hollowturtle 4 hours ago | parent | prev | next [-]

I can't buy it, because for many people like you it's always the other person who is using the tools wrong. With "you're not using it well" as the base of the discourse, it is simply impossible for skeptics who keep getting bad results from LLMs to prove the contrary. I don't even get why you need to praise yourself so much for being really good at using these tools, if not to build some tech influencer status around here... the same thing I believe antirez is trying to do (who knows why).

kevin42 4 hours ago | parent [-]

Have you considered that maybe you aren't using it well? It's something that can and should be learned. It's a tool, and you can't expect to get the most out of a tool without really learning how to use it.

I've had this conversation with a few people so far, and I've offered to personally walk through a project of their choosing with them. Everyone who has done this has changed their perspective. You may not be convinced it will change the world, but if you approach it with an open mind and take the time to learn how to best use it, I'm 100% sure you will see that it has so much potential.

There are tons of youtube videos and online tutorials if you really want to learn.

hollowturtle 4 hours ago | parent [-]

> Have you considered that maybe you aren't using it well?

Here we go: as I said, again and again and again, it's always our fault for not using it well. It is impossible to counter-argue. Btw, to reply to your question: yes, many times, and it proved useful in very small specialized tasks and a couple of migrations. I really like how LLMs are helping me in my day to day, but that's still so far away from all this astroturfing.

Mawr 14 hours ago | parent | prev | next [-]

I don't see how your position is compatible with the constant hype about the ever-growing capabilities of LLMs. Either they are improving rapidly, and your intuition keeps getting less and less valuable, or they aren't improving.

simonw 14 hours ago | parent [-]

They're improving rapidly, which means your intuition needs to be constantly updated.

Things that they couldn't do six months ago might now be things that they can do - and knowing they couldn't do X six months ago is useful because it helps systematize your explorations.

A key skill here is to know what they can do, what they can't do and what the current incantations are that unlock interesting capabilities.

A couple I've learned in the past week:

1. Don't give Claude Code a URL to some code and tell it to use that, because by default it will use its WebFetch tool, and that runs an extra summarization layer (as a prompt injection defense) which loses details. Telling it to use curl sometimes works, but a guaranteed trick is to have it git clone the relevant repo to /tmp and look at the code there instead.

2. Telling Claude Code "use red/green TDD" is a quick-to-type shortcut that will cause it to write tests first, run them and watch them fail, then implement the feature and run the tests again. This is a wildly effective technique for getting code that works properly while avoiding untested junk code that isn't needed.
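
Combined into a single hypothetical prompt (the repo URL here is a placeholder), those two tricks look something like this:

  Clone https://github.com/owner/repo into /tmp with git and read the
  source there - don't WebFetch it. Then use red/green TDD: write a
  failing test for the new feature first, run it and watch it fail,
  then implement until the test passes.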

Now multiply those learnings by three years. Sure, the stuff I figured out in 2023 mostly doesn't apply today - but the skills I developed in learning how to test and iterate on my intuitions from then still count and still keep compounding.

The idea that you don't need to learn these things because they'll get better to the point that they can just perfectly figure out what you need is AGI science fiction. I think it's safe to ignore.

mmcnl 13 hours ago | parent | next [-]

Personally I think this is an extreme waste of time. Every week you're learning something new that is already outdated the next week. You're telling me AI can write complex code but isn't able to figure out how to properly guide the user into writing usable prompts?

A somewhat intelligent junior will dive deep for one week and be at the knowledge level it took you roughly 3 years to reach.

simonw 13 hours ago | parent | next [-]

No matter how good AI gets, we will never be in a situation where a person with poor communication skills will be able to use it as effectively as someone whose communication skills are razor sharp.

q3k 12 hours ago | parent [-]

But the examples you've posted have nothing to do with communication skills; they're just hacks to get particular tools to work better for you, and those will change whenever the next model/service decides to do things differently.

zahlman 10 hours ago | parent | next [-]

I'm generally skeptical of Simon's specific line of argument here, but I'm inclined to agree with the point about communication skill.

In particular, the idea of saying something like "use red/green TDD" is an expression of communication skill (and also, of course, awareness of software methodology jargon).

habinero 10 hours ago | parent [-]

Ehhh, I don't know. "Communication" is for sapients. I'd call that "knowing the right keywords".

And if the hype is right, why would you need to know any of them? I've seen people unironically suggest telling the LLM to "write good code", which seems even easier.

zahlman 10 hours ago | parent [-]

I sympathize with your view on a philosophical level, but the consequence is really a meaningless semantic argument. The point is that prompting the AI with words that you'd actually use when asking a human to perform the task, generally works better than trying to "guess the password" that will magically get optimum performance out of the AI.

Telling an intern to care about code quality might actually cause an intern who hasn't been caring about code quality to care a little bit more. But it isn't going to help the intern understand the intended purpose of the software.

simonw 12 hours ago | parent | prev [-]

I'm going to resist the temptation to spend more time coming up with more examples. I'm sorry those weren't to your liking!

danielmarkbruce 4 hours ago | parent [-]

Why do you bother with all this discussion? Like, I get it the first x times for some low x, it's fun to have the discussion. But after a while, aren't you just tired of the people who keep pushing back? You are right, they are wrong. It's obvious to anyone who has put the effort in.

peteforde an hour ago | parent [-]

Trying to have a discussion with people who aren't actually interested in being convinced is exhausting. Simon has a lot more patience than I do.

crakhamster01 13 hours ago | parent | prev [-]

I feel like both of these examples are insights that won't be relevant in a year.

I agree that CC becoming omniscient is science fiction, but the goal of these interfaces is to make LLM-based coding more accessible. Any strategies we adopt to mitigate bad outcomes are destined to become part of the platform, no?

I've been coding with LLMs for maybe 3 years now. Obviously a dev who's experienced with the tools will be more adept than one who's not, but if someone started using CC today, I don't think it would take them anywhere near that time to get to a similar level of competency.

simonw 13 hours ago | parent [-]

I base part of my skepticism about that on the huge number of people who seem to be unable to get good results out of LLMs for code, and who appear to think that's a commentary on the quality of the LLMs themselves as opposed to their own abilities to use them.

pavlus 2 hours ago | parent | next [-]

> huge number of people who seem to be unable to get good results out of LLMs for code

Could it be that they use a different definition of "good"?

svara 12 hours ago | parent | prev [-]

I suspect that's neither a skill issue nor a technical issue.

Being "a person who can code" carries some prestige and signals intelligence. For some, it has become an important part of their identity.

The fact that this can now be said of a machine is a grave insult if you feel that way.

It's quite sad in a way, since the tech really makes your skills even more valuable.
