All true if you one-shot the code.

If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Based on my setup as of today, I’d imagine by sometime next year that will be normal and then the conversation will be very different; mostly around cost control. I wouldn’t be surprised if there is a breakout popular agent control flow language by next year as well.

The net is that unsupervised AI engineering isn’t really cheaper, better, or faster than human engineering right now. Does that mean in two years it will be? Possibly.

There will be a lot of optimizations in the message traffic, token usage, foundational models, and also just the Moore’s law of the hardware and energy costs.

But really it’s the sophistication of the agent systems that controls quality more than anything. Simply following waterfall (I know, right? Yuck… but it worked) increased code quality tremendously.

I also gave it the SelfDocumentingCode pattern language that I wrote (on WikiWikiWeb) as a code review agent and quality improved tremendously again.
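
A minimal Python sketch of what "multiple forward and backward passes" with a code review agent could look like. This is an illustration under assumptions, not the actual setup described above and not any real agent framework: `run_passes`, `PassResult`, `toy_generate`, and `toy_review` are hypothetical names, and the toy functions stand in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PassResult:
    draft: str          # last draft produced
    issues: List[str]   # issues still open after the final review
    passes: int         # number of generate/review rounds that ran


def run_passes(
    task: str,
    generate: Callable[[str, List[str]], str],
    review: Callable[[str], List[str]],
    max_passes: int = 3,
) -> PassResult:
    """Alternate forward (generate) and backward (review) passes.

    Stops when the reviewer reports no issues or max_passes is reached.
    """
    issues: List[str] = []
    draft = ""
    for n in range(1, max_passes + 1):
        draft = generate(task, issues)  # forward pass: write or revise
        issues = review(draft)          # backward pass: critique the draft
        if not issues:
            return PassResult(draft, issues, n)
    return PassResult(draft, issues, max_passes)


# Toy stand-ins for model calls (hypothetical; a real system would call an LLM).
def toy_generate(task: str, issues: List[str]) -> str:
    if "missing docstring" in issues:
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b): return a + b"


def toy_review(draft: str) -> List[str]:
    # Stand-in for a review agent: flags code without a docstring.
    return [] if '"""' in draft else ["missing docstring"]


result = run_passes("write add()", toy_generate, toy_review)
print(result.passes, result.issues)
```

The `max_passes` cap is the cost-control knob: each extra round is another pair of model calls, so the loop trades tokens for a chance at fewer open issues.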

▲

theshrike79 2 days ago | parent | next [-]

> Based on my setup as of today, I’d imagine by sometime next year that will be normal and then the conversation will be very different; mostly around cost control. I wouldn’t be surprised if there is a breakout popular agent control flow language by next year as well.

Currently it's just VC funded. The $20 packages they're selling are in no way cost-effective (for them).

That's why I'm driving all available models like I stole them, building every tool I can think of before they start charging actual money again.

By then local models will most likely be at a "good enough" level, especially when combined with MCPs and tool use, so I won't need to pay per token for APIs except in special cases.

▲

tempoponet 2 days ago | parent [-]

Once local models are good enough, there will be a $20 cloud provider that can give you more context, parameters, and t/s than you could dream of at home. This is true today with services like Groq.

▲

theshrike79 a day ago | parent | next [-]

Anthropic used to have unlimited subscriptions, then people started running agents 24/7. Now they have 5-hour buckets of limited use.

Groq most likely stays afloat because they're a bit player, and propped up by VC money.

With a local system I can run it at full blast all the time. Nobody can suddenly make it stupid by reallocating resources to training their new model, and nobody can censor it or do stealth updates that make it perform worse.

▲

sunir 2 days ago | parent | prev | next [-]

Not exactly. Those models are based on intermittent usage. If you're running an AI engineer with a sophisticated agent flow, the usage is constant and continuous. That can price out to the equivalent of a dedicated cube at home over 2 years.

I had 3 projects running today. I hit my Claude Max Pro session limits twice today in about 90 minutes. I'm now keeping it down to 1 project, and I may interrupt it until the evening when I don't need Claude Web. If I could run it passively on my laptop, I would.

▲

hatefulmoron 2 days ago | parent | prev [-]

Groq and Cerebras definitely have the t/s, but their hardware is tremendously expensive, even compared to the standard data center GPUs. Worth keeping in mind if we're talking about a $20 subscription.

▲

zarzavat 2 days ago | parent | prev | next [-]

> If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Just an hour ago I asked Claude to find bugs in a function and it found 1 real bug and 6 hallucinated bugs.

One of the "bugs" it wanted to "fix" was to revert a change that I had made previously to fix a bug in code it had written.

I just don't understand how people burning tokens on sophisticated multi-agent systems are getting any value from that. These LLMs don't know when they are doing something wrong, and throwing more money at the problem won't make them any smarter. It's like trying to build Einstein by hiring more and more schoolkids.

Don't get me wrong, Claude is a fantastic productivity boost, but letting it run around unsupervised would slow me down rather than speed me up.

▲

oblio a day ago | parent | prev [-]

> and also just the Moore’s law of the hardware and energy costs.

What Moore's law?