colechristensen 11 hours ago

Eh.

Right now, Claude is good enough. If LLM development hit a magical wall and never got any better, Claude is good enough to be terrifically useful and there's diminishing returns on how much good we get out of it being at $benchmark.

Saying we're satisfied with that... well, how many years until efficiency gains from one side and consumer hardware from the other meet in the middle, so that "good enough for everybody" open models are available to anyone willing to pay for a $4000 MacBook (and after another couple of years a $1000 MacBook, and several more after that a fancy wristwatch)?

Point being, unless we get to a point where we start developing "models" that deserve civil rights and citizenship, the years in which we NEED cloud infrastructure and datacenters full of racks and racks of $x0,000 hardware are numbered.

I strongly believe the top end of the S curve is nigh, and with it we're going to see these trillion dollar ambitions crumble. Everybody is going to want a big-ass GPU and a ton of RAM but that's going to quickly become boring because open models are going to exist that eat everybody's lunch and the trillion dollar companies trying to beat them with a premium product aren't going to stack up outside of niche cases and much more ordinary cloud compute motivations.

ACCount37 11 hours ago | parent | next [-]

Good enough? There's no such thing.

People said "good enough" about GPT-4. Now you say that about Claude Opus 4.5. How long before the treadmill turns, and the very same Opus 4.5 becomes "the bare minimum" - the least capable AI you would actually consider using for simple and unimportant tasks?

We have miles and miles of AI advancements ahead of us. The end of that road isn't "good enough". It's "too powerful to be survivable".

cyanydeez 11 hours ago | parent | next [-]

Elon will boil the oceans if it means not having to deal with poor people.

cindyllm 8 hours ago | parent [-]

[dead]

colechristensen 10 hours ago | parent | prev [-]

I can build fully functional applications without writing a single line of code with Claude. In my free time. On a weekend. I'm going to release one of them pretty soon. A toddler being able to do this instead of an industry veteran isn't that compelling. Avoiding the few pitfalls of the LLM getting stuck and taking a while to get out isn't that valuable.

>Good enough? There's no such thing.

This is just wrong. Maybe you can't imagine good enough; I can. And I think "better" is going to start hitting diminishing returns: I expect the velocity of improvements to slow, and the improvements themselves to become less meaningful. The "cost" of an LLM making mistakes is already pretty low. Cutting it in half is better, sure, but it's so low already that I don't particularly care if mistakes get some multiple more rare.

buu700 9 hours ago | parent | prev [-]

Coding capability in and of itself may be "good enough" or close to it, but there's a long way to go before AI can build and operate a product end-to-end. In fairness, a lot of the gap may be tooling.

But the end state in my mind is telling an AI "build me XYZ", having it ask all the important questions over the course of a 30-minute chat while making reasonable decisions on all lower-level issues, then waking up the next morning to a live cloud-hosted test environment at a subdomain of the domain it said it would buy along with test builds of native apps for Android, iOS, Linux, macOS, and Windows, all with near-100% automated test coverage and passing tests. Coding agents feel like magic, but we're clearly not there yet.

And that's just coding. If someone wanted to generate a high-quality custom feature-length movie within the usage limits of a $20/mo AI plan, they'd be sorely disappointed.

sureglymop 32 minutes ago | parent | next [-]

Given that natural language is ambiguous, what if the LLM makes some mistakes though?

I'm wondering because it's not like it's a human that can then take accountability/responsibility for that...

colechristensen 5 hours ago | parent | prev [-]

>But the end state in my mind is telling an AI "build me XYZ", having it ask all the important questions over the course of a 30-minute chat while making reasonable decisions on all lower-level issues, then waking up the next morning to a live cloud-hosted test environment at a subdomain of the domain it said it would buy along with test builds of native apps for Android, iOS, Linux, macOS, and Windows, all with near-100% automated test coverage and passing tests. Coding agents feel like magic, but we're clearly not there yet.

I'm pretty sure we're there. I'm not sure how interested I am in completely closing that loop and completely removing the human from the loop. But I'm also pretty confident that I could do it with nothing but existing models and software built around them.

buu700 5 hours ago | parent [-]

I'm not aware that we are there, but would be very interested if you have information to the contrary. Even if we were there, the product/service that does it would have to be at a reasonable cost in order to be useful for most people.

As I said, a lot of the gap may be tooling. But I'm skeptical that even the models themselves are capable of that given sufficiently advanced tooling. I'm not saying we're not close (certainly much closer than we were at the start of the decade), but if we were actually there, you would have zero reservations about removing the human from the loop of an initial prototype.

colechristensen 3 hours ago | parent [-]

My information to the contrary is my experience in the last few weeks building things with LLMs, including tooling to help build things with LLMs. The experience is one of ... I'm a product manager and devsecops engineer bullying an LLM with the psychology of a toddler into building great software, which it can do very successfully. A single instance of a model with a single rolling context window and one set of prompts absolutely can't do what you want, but that's not what I've been doing.

Oneshotting applications isn't interesting to me because I do want to be involved. There are things I'll have opinions about that I won't know I have until we get there, and there are definitely times when I want to pivot a little or a lot in the middle of development based on experience: an actually agile development cycle.

In the same way I wouldn't want to hire a wedding planner or house builder to plan my wedding or build my home based entirely on a single short meeting before anything started, I don't want to one shot software.

There are all sorts of things where I want to get myself out of the loop because they're stupid problems, some of them I've fixed, others I'd rather fix later because doing the thing is more interesting than pausing and building the tools to make the thing.

There is, I think, an inverse relationship between the complexity of the tooling and the amount of human involvement. For me, I've reached or am quite near the level of human involvement where I'm much more excited about building stuff than about saving more of my attention.

I'm being a bit vague because I'm not sure I want to share all of my secrets just yet.

buu700 an hour ago | parent [-]

Just to be clear, what I was proposing was a single tool which would, on the basis of a single ~30-minute interaction, purchase a domain name, set up a cloud environment, build a full-stack application + cross-platform native apps + useful tests with near-100% coverage, deploy a live test environment, and compile each platform's native app — all entirely autonomously. Are you saying you've used or built something similar to that? That is super interesting if so, even if you're unable to share. A major subset of that could also still be incredibly useful, but the whole solution I described is a very high bar.

I've been very successful building with custom LLM workflows and automation myself, but that's beyond the capabilities of any tooling I've seen, and I wouldn't necessarily expect great results with current models even if current tooling were fully capable of what I described. Even with such tooling, the cost of inference is high enough to deter careless usage without much more rigorous work on the initial spec and/or micromanagement of the development process.

I'm not necessarily advocating for one-shotting in any given context. I'm simply pointing out that there would be huge advantages to LLMs and tooling sufficiently advanced to be fully capable of doing so end-to-end, especially at dramatically lower cost than current models and at superhuman quality. Such an AI could conceivably one-shot any possible project idea, in the same sense that a competent human dev team with nothing but a page of vague requirements and unlimited time could at least eventually produce something functional.

The value of such an AI is that we'd use it in ways that sound ridiculous today. Maybe a chat with some guy at a bar randomly inspires a neat idea, so you quickly whip out your phone and fire off some bullet point notes; by the time you get home, you have 10 different near-production-ready variations to choose from, each with documentation on the various decisions its agent made and why, and each one only cost $5 in account credit. None is quite perfect, but through the process you've learned a lot and substantially refined the idea; you give it a second round of notes and wake up to a new testable batch. One of those has the functional requirements just right, so you make the final decisions on non-functional requirements and let it roll one last time with strict attention to detail on code quality and a bunch of cycles thrown at security review.

That evening, you check back in and find a high-quality final implementation that meets all of your requirements with a performant and scalable architecture, with all infrastructure deployed and apps submitted to all stores/repositories. You subsequently allocate a sales and marketing budget to the AI, and eventually notice that you suddenly have a new source of income. Now imagine that instead of you, this was actually your friend who's never written a line of code and barely knows how to use a computer.

I still agree with you that current models have been "good enough" for some time, in the sense that if LLMs froze today we could spend the next decade collectively building on and with them and it would totally transform the economy. But at the same time, there's definitely latent demand for more and/or better inference. If LLMs were to become radically more efficient, we wouldn't start shuttering data centers; the economy would just become that much more productive.