Scaling long-running autonomous coding (simonwillison.net)
44 points by srameshc 4 hours ago | 11 comments

Related: Scaling long-running autonomous coding - https://news.ycombinator.com/item?id=46624541 - Jan 2026 (187 comments)

simonw 2 hours ago | parent | next [-]

One of the big open questions for me right now concerns how library dependencies are used.

Most of the big ones are things like skia, harfbuzz, wgpu - all totally reasonable IMO.

The two that stand out for me as more notable are html5ever for parsing HTML and taffy for handling CSS grids and flexbox - that's vendored with an explanation of some minor changes here: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...

Taffy is a solid library choice, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a "from scratch" rendering engine.

I don't think it detracts much if at all from FastRender as an example of what an army of coding agents can help a single engineer achieve in a few weeks of work.

sealeck 2 hours ago | parent | next [-]

I think the other question is how far away this is from a "working" browser. It isn't impossible to render a meaningful subset of HTML (especially when you use external libraries to handle a lot of this). The real difficulty is doing this (a) quickly, (b) correctly and (c) securely. All of those are very hard problems, and also quite tricky to verify.

I think this kind of approach is interesting, but it's a bit sad that Cursor didn't discuss how they close the feedback loop: testing/verification. As generating code becomes cheaper, I think effort will shift to how we can more cheaply and reliably determine whether an arbitrary piece of code meets a desired specification. For example did they use https://web-platform-tests.org/, fuzz testing (e.g. feed in random webpages and inform the LLM when the fuzzer finds crashes), etc? I would imagine truly scaling long-running autonomous coding would have an emphasis on this.
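A rough sketch of what that fuzz-and-report loop could look like, in Python. Everything here is assumed for illustration: `toy_render` is a made-up, deliberately buggy stand-in for the engine under test (not FastRender's API), the generator is naive, and the crash list is just the kind of record one might hand back to an agent.

```python
import random

TAGS = ["div", "span", "p", "table", "ul", "li"]


def random_html(rng, depth=0, max_depth=4):
    """Build a small random HTML fragment, sometimes malformed on purpose."""
    if depth >= max_depth or rng.random() < 0.3:
        # Leaf: plain text, an entity, a stray "<", or nothing at all.
        return rng.choice(["text", "&amp;", "<", ""])
    tag = rng.choice(TAGS)
    children = "".join(
        random_html(rng, depth + 1, max_depth) for _ in range(rng.randint(0, 3))
    )
    # Occasionally drop the closing tag to exercise error recovery.
    close = f"</{tag}>" if rng.random() < 0.8 else ""
    return f"<{tag}>{children}{close}"


def toy_render(html):
    """Hypothetical engine under test; it raises on unbalanced angle brackets."""
    if html.count("<") != html.count(">"):
        raise ValueError("unbalanced angle brackets")
    return len(html)


def fuzz(render, iterations=200, seed=0):
    """Feed random pages to `render` and collect every input that raises."""
    rng = random.Random(seed)  # seeded so any crash can be reproduced
    crashes = []
    for _ in range(iterations):
        page = random_html(rng)
        try:
            render(page)
        except Exception as exc:
            crashes.append({"input": page, "error": repr(exc)})
    return crashes


if __name__ == "__main__":
    for crash in fuzz(toy_render)[:5]:
        print(crash["error"], "<-", repr(crash["input"]))
```

A real setup would swap `toy_render` for a call into the actual engine (probably in a subprocess, so that hard crashes and hangs are caught too) and would shrink each failing input before reporting it.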

Of course Cursor may well have done this, but it wasn't super deeply discussed in their blog post.

I really enjoy reading your blog and it would be super cool to see you look at approaches people have to ensuring that LLM-produced code is reliable/correct.

simonw an hour ago | parent [-]

Yeah, I'm hoping they publish a lot more about this project! It deserves way more than the few sentences they've shared about it so far.

janoelze an hour ago | parent | prev [-]

Any views on the nature of "maintainability" shifting now? If a fleet of agents demonstrated the ability to bootstrap a project like that, would that be enough indication to you that orchestration would be able to carry the code base forward? I've seen fully LLM'd codebases hit a certain critical weight where agents struggled to maintain coherent feature development and keep patterns aligned, and ended up spiralling into quick fixes.

simonw an hour ago | parent | next [-]

Almost no idea at all. Coding agents are messing with all 25+ years of my existing intuitions about what features cost to build and maintain.

Features that I'd normally never have considered building because they weren't worth the added time and complexity are now just a few well-structured prompts away.

But how much will it cost to maintain those features in the future? So far the answer appears to be a whole lot less than I would previously budget for, but I don't have any code more than a few months old that was built ~100% by coding agents, so it's way too early to judge how maintenance is going to work over a longer time period.

brianjeong 21 minutes ago | parent | prev [-]

I think there's a somewhat valid perspective that the (N+1)th model can simply clean up the previous model's mess.

Essentially a bet that the rate of model improvement is going to be faster than the rate of decay from bad coding.

Now, this hurts me personally to see, as someone who actually enjoys having quality code, but I don't see why it doesn't have a decent chance of holding.

halfcat 27 minutes ago | parent | prev | next [-]

So AI makes it cheaper to remix anything already-seen, or anything with a stable pattern, if you’re willing to throw enough resources at it.

AI makes it cheap (eventually almost free) to traverse the already-discovered and reach the edge of uncharted territory. If we think of a sphere, where we start at the center, and the surface is the edge of uncharted territory, then AI lets you move instantly to the surface.

If anything solved becomes cheap to re-instantiate, does R&D reach a point where it can’t ever pay off? Why would one pay for the long-researched thing when they can get it for free tomorrow? There will be some value in having it today, just like having knowledge about a stock today is more valuable than the same knowledge learned tomorrow. But does value itself go away for anything digital, and only remain for anything non-copyable?

The volume of a sphere grows faster than the surface area. But if traversing the interior is instant and frictionless, what does that imply?
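For reference, the geometry behind that claim, taking r as the sphere's radius (the symbols are just notation added here):

```latex
V(r) = \frac{4}{3}\pi r^{3}, \qquad S(r) = 4\pi r^{2}, \qquad \frac{V(r)}{S(r)} = \frac{r}{3}
```

So the interior outgrows the surface by a factor that increases linearly with r.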

tinyhouse an hour ago | parent | prev | next [-]

Well, software is measured over time. The devil is always in the details.

anilgulecha 2 hours ago | parent | prev | next [-]

That's a wild idea: a browser from scratch! And Ladybird has been moving at a snail's pace for a long time.

I think good abstraction design and a good test suite will make or break the success of future coding projects.

vivzkestrel an hour ago | parent | prev [-]

I am waiting for that guy or team that uses LLMs to write the most optimal version of Windows in existence, something that surpasses even what Microsoft has done over the years. Honestly, looking at the current state of Windows 11, it really feels like it shouldn't even be that hard to make something more user-friendly.

kimixa 14 minutes ago | parent [-]

Considering Microsoft's significant (and vocal) investment in LLMs, I fear the current state of Windows 11 is related to a team trying to do exactly that.