simonw 4 hours ago

A bit odd that this talks about AutoGPT and declares it a failure. Gary quotes himself describing it like this:

> With direct access to the Internet, the ability to write source code and increased powers of automation, this may well have drastic and difficult to predict security consequences.

AutoGPT was a failure, but Claude Code / Codex CLI / the whole category of coding agents fit the above description almost exactly and are effectively AutoGPT done right, and they've been a huge success over the past 12 months.

AutoGPT was way too early - the models weren't ready for it.

lbrito 4 hours ago | parent | next [-]

> they've been a huge success over the past 12 months

They lose billions of dollars annually.

In what universe is that a business success?

simonw 3 hours ago | parent [-]

Coding agents are successful products which generate billions of dollars of revenue from millions of paying customers.

The organizations that provide them lose money because of the R&D costs involved in staying competitive in the model training arms race.

lbrito 3 hours ago | parent [-]

Revenue isn't profit.

Checking whether Claude Code by itself is profitable or not is probably impossible. It doesn't make a lot of sense divorcing R&D from the product. And obviously the running costs are not insignificant.

The company as a whole loses money.

simonw 3 hours ago | parent | next [-]

The most important question is whether they make or lose money on each customer, independent of their fixed R&D costs.

If they make money on each customer they have a credible business - they could become profitable even with their existing R&D losses provided they can sign up enough new paying customers.

If they lose money on every customer - such that signing a $1m new enterprise account costs them $1.1m in server costs - then their entire "business" is a sham.
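As a rough illustration of the per-customer test above, here is a minimal sketch. The function names and every figure other than the $1m / $1.1m example are hypothetical, chosen only to make the arithmetic concrete; they are not real numbers for any company.

```python
import math


def contribution_margin(revenue: float, serving_cost: float) -> float:
    """Revenue from one customer minus the variable cost of serving them."""
    return revenue - serving_cost


def customers_to_break_even(fixed_costs: float, margin_per_customer: float):
    """Customers needed to cover fixed costs (e.g. R&D).

    Returns None when the per-customer margin is not positive, because
    no number of additional customers can then cover the fixed costs.
    """
    if margin_per_customer <= 0:
        return None
    return math.ceil(fixed_costs / margin_per_customer)


# The "sham" case from the comment: a $1m account costs $1.1m to serve.
losing_margin = contribution_margin(1_000_000, 1_100_000)

# A hypothetical healthy case: the same account costs $0.6m to serve,
# set against an assumed $2bn of fixed R&D spend.
healthy_margin = contribution_margin(1_000_000, 600_000)

print(losing_margin, customers_to_break_even(2_000_000_000, losing_margin))
print(healthy_margin, customers_to_break_even(2_000_000_000, healthy_margin))
```

With a negative margin, growth only deepens the loss; with a positive one, the fixed costs become a question of how many customers can be signed.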

I currently believe that Anthropic make money on almost every customer, such that their business is legit.

I guess we'll have to wait for the IPO paperwork to find out if I'm right about that.

kridsdale3 2 hours ago | parent | prev [-]

But humanity is gaining hugely productive (in financial terms) assets. It doesn't matter if the entity that created the asset, or its investors, go kaboom.

Most of the investors and companies that built the rail network went bust. The iron remained.

Most of the investors and companies that built the telecom network went bust. The fiber remained.

Most of the investors and companies that are building models will go bust. The files (open weight or transferred to new owners for pennies) will remain, and yield economic benefits for as long as we flow current through them.

anonymous908213 4 hours ago | parent | prev [-]

Have they actually been a huge success, though? You're one of the most active advocates here, so I want to ask you what you make of "the Codex app". More specifically, the fact that it's a shitty Electron app. Is this not a perfect use case for agents? Why can OpenAI, with unlimited agents, not let them loose on the codebase with instructions to replace Electron with an appropriate cross-platform native framework, or even a per-platform native GUI? They said they chose Electron for ease of portability for cross-platform delivery, but they could allocate 1, 10, or 1000 agents to develop a native Linux and native Windows port of the macOS codebase they started with. This is not even a particularly serious endeavour. I have coded a cross-platform chat application myself with more advanced features than what Codex offers, and chat GUIs are really among the most basic things you can be doing; practically every consumer-targeted GUI application finds a time when they shove a chat box into a significantly more complex framework.

The conclusion that seems readily apparent to me, as it has always been, is that these "agents" are completely incapable of creating production-grade software suitable for shipping, or even meaningfully modifying existing software for a task like a port. Like the one-shot game they demo'd, they can make impressive proof-of-concepts, but nothing any user would use, nor with a suitable foundation for developers to actually build upon.

simonw 3 hours ago | parent | next [-]

My experience is that coding agents as of November (GPT-5.2/Opus 4.5) produce high-quality, production-worthy code against both small and large projects.

I base this on my own experience with them plus conversations with many other peers who I respect.

You can argue that OpenAI Codex using Electron disproves this if you like. I think it demonstrates a team making the safer choice in a highly competitive race against Anthropic and Google.

If you're wondering why we aren't seeing seismic results from these new tools yet, I'll point out that November was just over 2 months ago and we had the December holiday period in the middle of that.

anonymous908213 3 hours ago | parent [-]

I'm not sure I buy the safer choice argument. How much of a risk is it to assign a team of "agents" to independently work on porting the code natively? If they fail, it costs a trivial amount of compute relative to OAI's resources. If they succeed, what a PR coup that would be! It seems like they would have nothing to lose by at least trying, but they either did not try, or they did and it failed, neither of which inspires confidence in their supposedly life-changing, world-changing product.

I will note that you specifically said the agents have shown huge success over "the past 12 months", so it feels like the goalposts are growing legs when you say "actually, only for the last two months with Opus 4.5" now.

simonw 3 hours ago | parent [-]

Claude Code was released in February; it just had its one-year birthday a few days ago.

OpenAI Codex CLI and Gemini CLI followed a few months afterwards.

It took a little while for the right set of coding agent features to be developed and for the models to get good enough to use those features effectively.

I think this stuff went from interesting to useful around Sonnet 4, and from useful to "let it write most of my code" with the upgrades in November.

bandrami 3 hours ago | parent | prev [-]

"Why isn't there better software available?" is the 900 pound gorilla in the LLM room, but I do think there are enough anecdotes now to hypothesize that what agents seem to be good at is writing software that

1. wasn't economical to write in the first place previously, and

2. doesn't need to be sold to anyone else or maintained over time

So, Brad in logistics previously had to collate scanned manifests with purchase requests once a month, but now he can tell Claw to do it for him.

Which is interesting given the talk of The End of Software Development or whatever, because "software that nobody was willing to pay for previously" kind of by definition isn't going to displace a lot of people who make software.

anonymous908213 3 hours ago | parent [-]

I do agree with this fully. I think LLMs have utility in making the creation of bad software extremely accessible. Bad software that happens to perfectly match some person's super specific need is by no means a bad thing to have in the world. A gap has been filled in creating niche software that previously was not worth paying anyone to create. But every single day we have multiple articles here proclaiming the end of software engineering, and I just don't get how the people hyping this up reconcile their hype with the lack of software being produced by agents that is good enough to replace any of the software people actually pay for.