robbrown451 5 hours ago

Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?

I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself: https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x... (cnc router that cuts plywood, and is made out of cnc-router cut plywood)

This is my own effort at an AI-assisted coding environment optimized for building itself: https://recursi.dev/ (just launching it, hope it's OK to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )

Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.

jrflo 5 hours ago | parent | next [-]

I think harnesses would count; AI != LLMs. Any piece of code that helps the computer reason for itself is AI, so the harnesses are AI in a sense.

fluoridation 3 hours ago | parent | next [-]

By that interpretation, neither the harness nor the LLM is the AI. The computer (or system of computers) taken as a whole is the AI. You can't remove any piece and still have an intelligent system.

Jtarii 3 hours ago | parent | prev [-]

People are specifically talking about the engine itself and not the tools used.

We wouldn't call humans creating a calculator "recursive self improvement".

robbrown451 an hour ago | parent [-]

I wouldn't call the harness an AI, but I might call a tool that plays a major role in creating another one like it "recursive self improvement." For instance in the industrial revolution a metal lathe and a milling machine were instrumental in creating the next generation of themselves. Same thing with a robot that is fabricated by similar (i.e. older model of the same) robots. All of them lead to exponential improvement.

lanthissa 5 hours ago | parent | prev | next [-]

yes? the future for any verifiable task is that the model attempts to verify an initial state and a goal, then decomposes its tasks into ever smaller verifiable subtasks, with /memory being the persistence between runs, and then /dreaming on the results of those memory files + run data to introduce new ideas.

i think that's the path to async agi these labs are imagining. The only limits are the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.

maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world, to the point that it can intuit things since it has these sub paths built.

my personal agi test is: can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done, without knocking?
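
The verify-then-decompose loop described above can be sketched roughly as follows. This is only an illustration, not anyone's actual system: `solve`, `attempt`, and `decompose` are hypothetical names, the "model" is stubbed out as plain callables, and a dict stands in for the /memory files (it could be dumped to disk with `json` between runs). The /dreaming step is not modeled.

```python
from typing import Callable, Dict, List


def solve(task: str,
          attempt: Callable[[str], bool],
          decompose: Callable[[str], List[str]],
          memory: Dict[str, bool]) -> bool:
    """Try a task; if the verified attempt fails, split it and recurse.

    `attempt` is assumed to do the work AND verify the result.
    `memory` persists verified outcomes so later runs skip known work.
    """
    if task in memory:
        return memory[task]
    ok = attempt(task)
    if not ok:
        subtasks = decompose(task)
        # An empty decomposition means the task cannot be split any further.
        ok = bool(subtasks) and all(
            solve(sub, attempt, decompose, memory) for sub in subtasks
        )
    memory[task] = ok
    return ok
```

Passing the same `memory` dict into a second run returns the stored result without calling `attempt` again, which is the "persistence between runs" part of the idea.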

marcosdumay 3 hours ago | parent | prev | next [-]

You need the AI eventually building another AI for the name to apply. This page is just bullshit. They vibe-code their harnesses, and yes, it shows.

Anyway, what does recursive self-improvement even mean for neural-network based AIs? It's not clear it's possible at all.

robbrown451 an hour ago | parent [-]

Where do you see evidence of vibe coding the harness? (and who are you talking about, Anthropic or the link I shared?)

It seems odd to complain about an AI coding tool being coded with AI. That's just eating your own dog food. In my opinion it makes it better, because the tool is very well tested.

cyanydeez 5 hours ago | parent | prev | next [-]

If you want to get out ahead of what's coming, it'll be small models that bootstrap the harness rather than anything else.

robbrown451 5 hours ago | parent [-]

I used to think that, but ended up going the other direction, partly because I don't have the wherewithal to build a model. But then I realized that, with existing models that can take more than a tiny amount of context, you can just let any model bootstrap itself with a good prompt sent by the system.

There's a ton of other tricks to it, but mostly it's keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (for instance I make extensive use of things like an abstract syntax tree library to help with surgical edits from the LLM)

That said, I would be very open to collaborating with someone who builds such small models. I don't think the system strictly needs it, but it could have some extra power if it had it.
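
For readers wondering what "surgical edits" via a syntax tree might look like, here is a minimal sketch using Python's standard `ast` module. It is an assumption about the general technique, not the code recursi.dev actually uses; `replace_function` is a hypothetical helper. The idea is that the LLM only returns one new function definition, and the harness uses the parsed tree to find the exact line span to swap out.

```python
import ast


def replace_function(source: str, name: str, new_def: str) -> str:
    """Replace the top-level function `name` in `source` with `new_def`,
    leaving every other line untouched."""
    tree = ast.parse(source)
    for node in tree.body:
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            lines = source.splitlines(keepends=True)
            # Decorators sit above node.lineno, so start at the first one.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            start = first - 1        # convert 1-based line to 0-based index
            end = node.end_lineno    # 1-based inclusive == exclusive slice bound
            replacement = new_def if new_def.endswith("\n") else new_def + "\n"
            ast.parse(replacement)   # refuse to splice in code that does not parse
            return "".join(lines[:start]) + replacement + "".join(lines[end:])
    raise KeyError(f"no top-level function named {name!r}")
```

Because the model never has to reproduce the rest of the file, it cannot accidentally mangle unrelated code, and a reply that does not parse is rejected before it touches the source.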

andai 5 hours ago | parent | next [-]

> mine also makes extensive use of things like abstract syntax tree library to help with surgical edits from the LLM

Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)

robbrown451 5 hours ago | parent [-]

Start off with my video!!! You can also try it with zero setup (you can code right there on the static web page; it will save your edits in the browser's IndexedDB and hotpatch them back into the code before it runs it.... also you can grant permission to the browser to read/write to a local directory)

recursi.dev

Seriously, I'm looking for collaborators.

There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.

cyanydeez 4 hours ago | parent | prev [-]

I'm aware we're not there yet, but think of something like https://chatjimmy.ai/ ; at some point, you're going to be able to dynamically build the harness so it creates the necessary consistency & dynamism at a speed unheard of.

But yes, I'm aware no one's got anywhere near there, mostly because most of the focus is on exploding the context and parameters. I'm saying that phase is done.

robbrown451 4 hours ago | parent [-]

I'm not sure what I am looking at with chatjimmy.... what is special about it? Speed?

I'm also not sure what you mean by "we aren't there yet." Where?

Sorry, not trying to be difficult or dense, I'm just not sure what you are referring to.

> mostly because most of the focus is on exploding the context and parameters.

Large context allows a surprising amount of "learning" to happen at inference time rather than training time. I think that is relatively unexplored. As long as the model itself has passed a certain threshold of smarts, and the context is large enough (Gemini and its million token context being WAY past that point) you are not really limited by the model, you are only limited by how good the stuff you feed into that context is.

That's what happened when, nearly a year ago, I saw a major leap in capabilities that happened entirely on my end.... not in the AI, but in code written by the AI. I found it genuinely frightening, to be honest. I think OpenClaw tapped into something similar, which seemed to surprise a lot of people. There were latent capabilities in the AI that were unknown until brought out by a clever harness.

cyanydeez 3 hours ago | parent [-]

imagine a streamlined model whose only job is to build then execute the harness at the speed you're seeing in chatjimmy.

robbrown451 an hour ago | parent [-]

Speed isn't really a big deal for me. I want good quality code. It's already able to generate code 10-100X as fast as I could code it myself.

Anyway, are you speaking of the harness? The harness on mine isn't AI, so speed just isn't an issue.

reddozen 4 hours ago | parent | prev [-]

> Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?

Shhh just let the marketing slop wash over you.