moduspol 8 days ago

"Deal with" and "autonomously" are doing a lot of heavy lifting there. Cursor already does a pretty good job indexing all the files in a code base in a way that lets it ask questions and get answers pretty quickly. It's just a matter of where you set the goalposts.

yosefk 8 days ago | parent | next [-]

Cursor fails miserably for me even just trying to replace function calls with method calls consistently, as I said in the post. That, I would hope, is fixable. By "dealing with it autonomously" I mean you don't need a programmer - a PM talks to an LLM and that's how the code base is maintained, and this happens a lot (rather than in one or two famous cases that are well known to be special and different from most work).
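
(Concretely, the kind of mechanical rewrite I mean - Swift here, but the language doesn't matter, and the names are made up purely to illustrate - is turning free-function call sites into method call sites, consistently, everywhere:

    // Hypothetical example of the refactor: a free function becomes a method.
    struct Shape {
        var width: Double
        var height: Double

        // After the refactor the computation lives here...
        func area() -> Double { width * height }
    }

    // ...instead of in a free function taking the value as its first argument.
    func area(_ s: Shape) -> Double { s.width * s.height }

    let s = Shape(width: 3, height: 4)
    print(area(s))   // old call site
    print(s.area())  // new call site - every such call has to be updated the same way

Nothing clever is required - just the same edit at every call site, without missing or mangling any.)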

By "large" I mean 300K lines (strong prediction), or 10 times the context window (weaker prediction)

I don't shy away from looking stupid in the future - you've got to give me that much

adastra22 8 days ago | parent [-]

I'm pretty sure you can do that right now in Claude Code with the right subagent definitions.

(For what it's worth, I respect and greatly appreciate your willingness to put out a prediction based on real evidence and your own reasoning. But I think you must be lacking experience with the latest tools & best practices.)

yosefk 8 days ago | parent | next [-]

If you're right, there will soon be a flood of software teams with no programmers on them - either across all domains, or in some domains where this works well. We shall see.

Indeed I have no experience with Claude Code, but I use Claude via chat, and it fails all the time on things not remotely as hard as orienting itself in a large code base. Claude Code is the same thing with the ability to run tools. Of course tools help ground its iterations in reality, but I don't think they're a panacea absent a consistent ability to model the reality you observe through your use of tools. Let's see...

boxed 7 days ago | parent | next [-]

I was very skeptical of Claude Code but was finally convinced to try it, and it does feel very different to use. I made three hobby projects in a weekend that I had put off for years due to "it's too much hassle to get started" inertia. Two of the projects it did very well with; the third I had to fight with, and it's still subtly wrong (SwiftUI animations and Claude Code are seemingly not a good mix!)

That being said, I think your analysis is 100% correct. LLMs are fundamentally stupid beyond belief :P

Terretta 7 days ago | parent [-]

> SwiftUI animations and Claude Code are seemingly not a good mix

Where is the corpus of SwiftUI animations to train Claude on what probable soup you probably want regurgitated?

Hypothesis: iOS devs don't share their work openly for reasons associated with how the App Store ecosystem (mis)behaves.

Relatedly, the models don't know about Swift 6 except maybe from the mid-2024 WWDC announcements. It's worth feeding them your own context. If you are on Swift 5.10, great. If you want to ship iOS 26 changes, wait till 2026 or, again, roll your own context.

boxed 7 days ago | parent [-]

In my case the big issue seems to be that if you hide a component in SwiftUI, it's animated with a fade by default. This isn't shown in the API surface area at all.
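
A minimal sketch of what I mean (the view names are made up): nothing at the call site mentions a fade, but toggling the condition inside withAnimation fades the view in and out by default, unless you know to override the transition:

    import SwiftUI

    struct ContentView: View {
        @State private var showDetail = true

        var body: some View {
            VStack {
                Button("Toggle") {
                    withAnimation { showDetail.toggle() }  // animates insertion/removal
                }
                if showDetail {
                    Text("Detail")
                    // Nothing above says "fade", but the default transition for an
                    // animated insertion/removal is .opacity, so the view fades.
                    // Opting out means knowing to add something like:
                    //     .transition(.identity)
                }
            }
        }
    }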

Vegenoid 7 days ago | parent | prev | next [-]

I am more skeptical of the rate of AI progress than many here, but Claude Code is a huge step. There were a few "holy shit" moments when I started using it. Since then, after much more experimentation, I see its limits and faults, and use it less now. But I think it's worth giving it a try if you want to be informed about the current state of LLM-assisted programming.

adastra22 7 days ago | parent | prev [-]

> Indeed I have no experience with Claude Code, but I use Claude via chat...

These are not even remotely similar, despite the name. Things are moving very fast, and the sort of chat-based interface that you describe in your article is already obsolete.

Claude is the LLM. Claude Code is a combination of internal tools for the agent to track its goals, current state, priorities, etc., and a looped mechanism for keeping it on track and focused, and for debugging its own actions. With the proper subagents it can keep its context from being poisoned by false starts, and its built-in todo system keeps it on task.
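
To make the shape of that concrete - this is only a sketch with invented types, not Claude Code's actual internals - the scaffolding is roughly a loop that owns the todo list and the tool output, with the model only choosing the next step:

    // Invented types, purely to illustrate the shape of the scaffolding.
    struct Todo { let title: String; var done: Bool = false }

    enum Action {
        case runTool(name: String, input: String)  // e.g. read a file, run the tests
        case completeTodo(index: Int)
        case finish(summary: String)
    }

    protocol Model {  // stand-in for the actual LLM call
        func nextAction(goal: String, todos: [Todo], lastToolOutput: String?) -> Action
    }

    func runAgent(goal: String, model: Model, runTool: (String, String) -> String) {
        // The todo list and the latest tool output live out here, not in the model's "memory".
        var todos = [Todo(title: "plan the change"),
                     Todo(title: "edit the code"),
                     Todo(title: "run the tests")]
        var lastOutput: String? = nil

        while todos.contains(where: { !$0.done }) {
            switch model.nextAction(goal: goal, todos: todos, lastToolOutput: lastOutput) {
            case .runTool(let name, let input):
                lastOutput = runTool(name, input)  // ground the next step in real output
            case .completeTodo(let index):
                todos[index].done = true           // progress tracked outside the model
            case .finish(let summary):
                print(summary)
                return
            }
        }
    }

Subagents are the same idea one level down: each subtask gets a fresh context, so a false start in one doesn't poison the rest.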

Really, try it out and see for yourself. It doesn't work magic out of the box, and absolutely needs some hand-holding to get it to work well, but that's only because it is so new. The next generation of tooling will have these subagent definitions auto-selected and included in context so you can hit the ground running.

We are already starting to see a flood of software coming out with very few active coders on the team, as you can see on the HN front page. I say "very few active coders" not "no programmers" because using Claude Code effectively still requires domain expertise as we work out the bugs in agent orchestration. But once that is done, there aren't any obvious remaining stumbling blocks to a PM running a no-coder, all-AI product team.

TheOtherHobbes 7 days ago | parent [-]

Claude Code isn't an LLM. It's a hybrid architecture where an LLM provides the interface and some of the reasoning, embedded inside a broader set of more or less deterministic tools.

It's obvious LLMs can't do the job without these external tools, so the claim above - that LLMs can't do this job - is on firm ground.

But it's also obvious these hybrid systems will become more and more complex and capable over time, and there's a possibility they will be able to replace humans at every level of the stack, from junior to CEO.

If that happens, it's inevitable these domain-specific systems will be networked into a kind of interhybrid AGI, where you can ask for specific outputs, and if the domain has been automated you'll be guided to what you want.

It's still a hybrid architecture though. LLMs on their own aren't going to make this work.

It's also short of AGI, never mind ASI, because AGI requires a system that would create high quality domain-specific systems from scratch given a domain to automate.

adastra22 7 days ago | parent [-]

If you want to be pedantic about word definitions, it absolutely is AGI: artificial general intelligence.

Whether you draw the system boundary of an LLM to include the tools it calls or not is a rather arbitrary distinction, and not very interesting.

nomel 7 days ago | parent | next [-]

Nearly every definition I’ve seen that involves AGI (there are many) includes the ability to self-learn and create “novel ideas”. The LLM behind it isn’t capable of this, and I don’t think the addition of the current set of tools enables this either.

adastra22 7 days ago | parent [-]

Artificial general intelligence was a phrase invented to draw a distinction from “narrow intelligence” - algorithms that can only be applied to specific problem domains. E.g., Deep Blue was amazing at playing chess but couldn’t play Go, much less prioritize a grocery list. Any artificial program that can be applied to arbitrary tasks it wasn’t pre-trained on is AGI. ChatGPT, and especially the more recent agentic models, are absolutely and unquestionably AGI under the original definition of the term.

Goalposts are moving though. Through the efforts of various people in the rationalist-connected space, the word has since morphed to be implicitly synonymous with the notion of superintelligence and self-improvement, hence the vague and conflicting definitions people now ascribe to it.

Also, fwiw, the training process behind the generation of an LLM is absolutely able to discover new and novel ideas, in the same sense that Kepler’s laws of planetary motion were new and novel if all you had were Tycho Brahe’s astronomical observations. Inference can tease out these novel discoveries, if nothing else. But I suspect that your definition of creative and novel would also exclude human creativity if it were rigorously applied - our brains, after all, are merely remixing our own experiences too.

Vegenoid 7 days ago | parent | prev [-]

> If you want to be pedantic about word definitions, it absolutely is AGI: artificial general intelligence.

This isn't being pedantic; it's deliberately misinterpreting a commonly used term by taking every word literally for effect. Terms, like words, can take on a meaning that is distinct from what you get by looking at each constituent part and assembling your own literal definition from those parts.

adastra22 7 days ago | parent [-]

I didn't invent this interpretation. It's how the word was originally defined, and used for many, many decades, by the founders of the field. See for example:

https://www-formal.stanford.edu/jmc/generality.pdf

Or look at the old / early AGI conference series:

https://agi-conference.org

Or read any old, pre-2009 (ImageNet) AI textbook. It will talk about "narrow intelligence" vs. "general intelligence," a dichotomy that exists more in GOFAI than in the deep learning approaches.

Maybe I'm a curmudgeon and this is entering get-off-my-lawn territory, but I find it immensely annoying when existing clear terminology (AGI vs. ASI, strong vs. weak, narrow vs. general) is superseded by a confused mix of popular meanings that lack any clear definition.

scoopdewoop 5 hours ago | parent | next [-]

I'm a week late, but I do appreciate you pointing out this real phenomenon of moving the goalposts. Language is really general, and multimodal models even more so. The idea that AGI should be way more anthropomorphic and omnipotent is really recent. New definitions almost disregard the possibility of stupid general intelligence, despite proof-by-existence living all around us.

Vegenoid 7 days ago | parent | prev [-]

The McCarthy paper doesn't use the term "artificial general intelligence" anywhere. It does use the word "general" a lot in relation to artificial intelligence.

I looked at the AGI conference page for 2009: https://agi-conference.org/2009/

When it uses the term "artificial general intelligence", it hyperlinks to this page: http://www.agiri.org/wiki/index.php?title=Artificial_General...

Which seems unavailable, so here is an archive from 2007: https://web.archive.org/web/20070106033535/http://www.agiri....

And that page says "In Nov. 1997, the term Artificial General Intelligence was first coined by Mark Avrum Gubrud in the abstract for his paper Nanotechnology and International Security". And here is that paper: https://web.archive.org/web/20070205153112/http://www.foresi...

That paper says: "By advanced artificial general intelligence, I mean AI systems that rival or surpass the human brain in complexity and speed, that can acquire, manipulate and reason with general knowledge, and that are usable in essentially any phase of industrial or military operations where a human intelligence would otherwise be needed."

I think insisting that AGI means something different from what everyone else means when they say it is not useful, and will only lead to people getting confused and disagreeing with you. I agree that it's not a great term.

alfalfasprout 7 days ago | parent | prev | next [-]

FWIW I do work with the latest tools/practices and completely agree with OP. It's also important to contextualize what "large" and "complex" codebases really mean.

Monorepos are large, but the projects inside may, individually, not be that complex. So there are ways of making LLMs work well with monorepos (e.g., providing a top-level index of what's inside, how to find projects, and how the repo is set up). Complexity within an individual project is something current-gen SOTA LLMs (I'm counting Sonnet 4, Opus 4.1, Gemini 2.5 Pro, and GPT-5 here) really suck at handling.

Sure, you can assign discrete little tasks here and there. But bigger efforts that require not only understanding how the codebase is designed but also why it's designed that way fall short. Even more so if you need them to make good architectural decisions on something that's not "cookie cutter".

Fundamentally, I've noticed the chasm between those who are hyper-confident LLMs will "get there soon" and those who are experienced but doubtful depends on the type of development you do. "Ticket pulling" type work is generally scoped well enough that an LLM might seem near-autonomous. More abstract/complex backend/infra/research work, not so much. Still value there, sure. But hardly autonomous.

adastra22 7 days ago | parent [-]

Could, e.g., a custom-made 100k-token summary of the architecture and relevant parts of the giant repo, plus a base index of where to find more info, be sufficient to allow Opus to take a large task and split it into small enough subprojects that are farmed out to Sonnet instances with sufficient context?

This seems quite doable with even a small amount of tooling around Claude Code, even though I agree it doesn't have this capability out of the box. I think a large part of this gulf is "it doesn't work out of the box" vs "it can be made to work with a little customization."

bootsmann 7 days ago | parent | prev [-]

I feel like refutations like this ("you aren't using the tool right" | "you should try this other tool") pop up often but are fundamentally worthless, because as long as you're not showing code you might as well be making it up. The blog post gives examples of clear failures that anyone can reproduce for themselves; I think it's time vibe-code defenders are held to the same standard.

adastra22 7 days ago | parent [-]

The very first example is that LLMs lose their mental model of chess when playing a game. OK, so instead ask Claude Code to design an MCP for tracking chess moves, and vibe-code it. That’s the very first thing that comes to mind, and I expect it would work well enough.
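
As a rough sketch of what such a tool would own (invented code, not a real MCP server - a real one would wrap a proper chess library and expose these operations as MCP tools, and this one doesn't even check legality), the point is simply that the board lives outside the model:

    // Sketch only: the state a chess-tracking tool owns so the model doesn't have to.
    struct ChessTracker {
        // board[rank][file]; rank 0 is rank 1; "." means an empty square
        private var board: [[Character]] = {
            var b = Array(repeating: Array(repeating: Character("."), count: 8), count: 8)
            let back = Array("RNBQKBNR")
            for f in 0..<8 {
                b[0][f] = back[f]                          // white back rank
                b[1][f] = "P"                              // white pawns
                b[6][f] = "p"                              // black pawns
                b[7][f] = Character(back[f].lowercased())  // black back rank
            }
            return b
        }()
        private(set) var moves: [String] = []

        init() {}

        // Applies a move in coordinate form, e.g. "e2e4". Returns false if malformed.
        mutating func applyMove(_ move: String) -> Bool {
            let files = Array("abcdefgh")
            let chars = Array(move.lowercased())
            guard chars.count == 4,
                  let fromFile = files.firstIndex(of: chars[0]),
                  let toFile = files.firstIndex(of: chars[2]),
                  let fromRank = chars[1].wholeNumberValue,
                  let toRank = chars[3].wholeNumberValue,
                  (1...8).contains(fromRank), (1...8).contains(toRank) else { return false }
            board[toRank - 1][toFile] = board[fromRank - 1][fromFile]
            board[fromRank - 1][fromFile] = "."
            moves.append(move)
            return true
        }

        func render() -> String {
            (0..<8).reversed().map { String(board[$0]) }.joined(separator: "\n")
        }
    }

    var game = ChessTracker()
    _ = game.applyMove("e2e4")
    _ = game.applyMove("e7e5")
    print(game.render())  // ground truth the model can re-read at any point in the game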

jononor 8 days ago | parent | prev | next [-]

"LLM" as well, because coding agents are already more than just an LLM. There is very useful context management around it, and tool calling, and ability to run tests/programs, etc. Though they are LLM-based systems, they are not LLMs.

smnrchrds 8 days ago | parent | next [-]

Indeed. If the LLM called a chess engine tool behind the scenes, it would be able to play excellent chess as well.

cavisne 8 days ago | parent [-]

The author would still be wrong in the tool-calling scenario. There are already perfect (or at least superhuman) chess engines; there is no perfect "coding engine". LLMs + tools being able to reliably work on large codebases would be a new thing.

yosefk 8 days ago | parent [-]

Correct - as long as the tools the LLM uses are non-ML-based algorithms existing today, and it operates on a large code base with no programmers in the loop, I would be wrong. If the LLM uses a chess engine, then it adds nothing on top of the engine; similarly, if an LLM uses another system while adding no value on top, I would not be wrong. If the LLM uses something based on a novel ML approach, I would not be wrong either - that would be my "ML breakthrough" scenario. If the LLM uses classical algorithms or an ML algo known today, adds value on top of them, and operates autonomously on a large code base - no programmer needed on the team - then I am wrong.

interstice 8 days ago | parent | prev [-]

This rapidly gets philosophical. If I use tools, am I not handling the codebase? Are we classing the LLM as the tool or the user in this scenario?

ameliaquining 8 days ago | parent | prev [-]

True, there'd be a need to operationalize these things a bit more than is done in the post to have a good advance prediction.