visarga 9 hours ago

The story is entertaining, but it rests on a big fallacy - progress is not a function of compute or model size alone. Treating it that way is almost magical thinking. What matters most is the training set.

During the GPT-3 era there was plenty of organic text to scale into, and compute seemed to be the bottleneck. But that text was quickly exhausted, and now we are trying other ideas - synthetic reasoning chains, or just plain synthetic text, for example. But you can't do that fully in silico.

What is necessary in order to create new and valuable text is exploration and validation. LLMs can ideate very well, so we are covered on that side. But we can only automate validation in math and code, not in other fields.

Real-world validation thus becomes the bottleneck for progress. The world jealously guards its secrets, and we need to spend exponentially more effort to pry them away, because the low-hanging fruit was picked long ago.

If I am right, this has implications for the speed of progress. The exponential friction of validation opposes the exponential scaling of compute. The story also says an AI could be created in secret, which runs against the validation principle - we validate faster together; nobody can secretly out-validate humanity. It's like blockchain: we depend on everyone else.
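
A toy way to picture that tension (completely made-up numbers and growth rates, just to show the shape of the argument, not a forecast):

    # Toy model: raw compute grows exponentially, but so does the
    # real-world cost of validating each new frontier result.
    compute = 1.0
    validation_cost = 1.0
    for year in range(1, 11):
        compute *= 2.0           # assumed: compute doubles every year
        validation_cost *= 2.0   # assumed: effort per validated insight also doubles
        validated_insights = compute / validation_cost
        print(year, validated_insights)
    # If the two exponents match, validated progress stays flat no matter
    # how fast raw compute grows; only the gap between the exponents matters.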

tomp 7 hours ago | parent | next [-]

Did we read the same article?

They clearly mention, take into account, and extrapolate this. LLMs first scaled via data; now it's test-time compute, and recent developments (R1) clearly show this is not exhausted yet (i.e. RL on synthetically, in-silico generated CoT), which implies scaling with compute. The authors then outline further potential (research) developments that could continue this dynamic - literally things that have already been discovered, just not yet incorporated into edge models.
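
To make the "RL on synthetic CoT" point concrete, here's a deliberately trivial generate-then-verify sketch for a verifiable domain (toy stand-ins throughout; obviously not the actual R1 pipeline):

    import random

    # Toy stand-in for a model proposing reasoning plus an answer to a checkable problem.
    def toy_model(a, b):
        reasoning = f"compute {a} + {b} step by step"
        answer = a + b + random.choice([0, 0, 0, 1])  # occasionally wrong
        return reasoning, answer

    def collect_verified_cot(n_problems=100, samples_per_problem=4):
        accepted = []
        for _ in range(n_problems):
            a, b = random.randint(1, 99), random.randint(1, 99)
            for _ in range(samples_per_problem):
                chain, answer = toy_model(a, b)
                if answer == a + b:                  # validation is fully in silico here
                    accepted.append((f"{a}+{b}", chain, answer))
                    break
        return accepted                              # becomes RL reward / fine-tuning data

    print(len(collect_verified_cot()))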

Real-world data confirms the authors' thesis. There have been a lot of sceptics about AI scaling, somewhat justified ("whoom", a.k.a. fast take-off, hasn't happened - yet), but the sceptics' fundamental thesis - "real-world data has been exhausted, the next algorithmic breakthroughs will be hard and unpredictable" - has been wrong. The reality is that, while data has been exhausted, incremental research efforts have resulted in better and better models (o1, R1, o3, and now Gemini 2.5, which is a huge jump! [1]). This is similar to how Moore's Law works - it's not a given that CPUs get better exponentially; it still requires effort, maybe with diminishing returns, but nevertheless the law holds...

If we ever get models that are able to usefully contribute to research, either on the implementation side or on the research-ideas side (which they CANNOT do yet, at least Gemini 2.5 Pro (the public SOTA) cannot, unless my prompting is REALLY bad), it's about to get super-exponential.

Edit: then, once you get to actual general intelligence (let alone super-intelligence), the real-world impact will quickly follow.

Jianghong94 7 hours ago | parent [-]

Well, based on what I'm reading, the OP's point is that not all validation (hence 'fully'), if not most of it, can be done in silico. I think we all agree on that, and that's the major bottleneck to making agents useful - you have to have a human in the loop to closely guardrail the whole process.

Of course you can get a lot of mileage out of synthetically generated CoT, but whether that leads to LLMs speeding up LLM development is a big IF.

tomp 7 hours ago | parent [-]

No, the entire point of this article is that when you get to self-improving AI, it will become generally intelligent, and then you can use that to solve robotics, medicine, etc. (just as a generally intelligent baby can (eventually) work out how to move boxes, assemble cars, do experiments in labs, etc. - nothing special about a human baby; it's just generally intelligent).

Jianghong94 6 hours ago | parent | next [-]

Not only does the article claim that when we get to self-improving AI it becomes generally intelligent, it also assumes that AI is pretty close to that right now:

> OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we’ll call “DeepCent”)16 and their US competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research.

> It’s good at this due to a combination of explicit focus to prioritize these skills, their own extensive codebases they can draw on as particularly relevant and high-quality training data, and coding being an easy domain for procedural feedback.

> OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R&D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.

> what do we mean by 50% faster algorithmic progress? We mean that OpenBrain makes as much AI research progress in 1 week with AI as they would in 1.5 weeks without AI usage.

To me, claiming that today's AI IS capable of such a thing is too hand-wavy. And I think that's the crux of the article.
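
To spell out the arithmetic in that last quote (just restating the quoted definition as numbers):

    # "50% faster algorithmic progress" is a rate claim:
    weeks_without_ai = 1.5      # time for some unit of research progress unaided
    weeks_with_ai = 1.0         # time for the same unit with AI assistance
    speedup = weeks_without_ai / weeks_with_ai
    print(speedup)              # 1.5x the rate, i.e. 50% more progress per week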

polynomial 6 hours ago | parent | prev [-]

You had me at "nothing special about a human baby"

nikisil80 8 hours ago | parent | prev | next [-]

Best reply in this entire thread, and I align with your thinking entirely. I also absolutely hate this idea amongst tech-oriented communities that because an AI can do some algebra and program an 8-bit video game quickly and without any mistakes, it's already overtaking humanity. Extrapolating from that idea, some future version of these models may be capable of solving grad-school-level physics problems and programming entire AAA video games, but again - that's not what _humanity_ is about. There is so much more to being human than fucking programming and science (and I'm saying this as an actual nuclear physicist). And so, just like you said, the AI arms race is about getting it good at _known_ science/engineering, fields in which 'correctness' is very easy to validate. But most of human interaction exists in a grey zone.

Thanks for this.

m11a 3 hours ago | parent | next [-]

> that's not what _humanity_ is about

I've not spent too long thinking on the following, so I'm prepared for someone to say I'm totally wrong, but:

I feel like the services economy can be broadly broken down into pleasure, progress, and chores. Pleasure being poetry/literature, movies, hospitality, etc.; progress being the examples you gave, like science/engineering and mathematics; and chores being things humans do to coordinate or to satisfy an obligation (accountants, lawyers, salesmen).

In this case, if we assume AI can deal with things not in the grey zone, then it can deal with 'progress' and many 'chores', which are massive chunks of human output. There's not much grey zone to them. (Well, there is, but there are many correct solutions: equivalent pieces of code that are all acceptable, multiple versions of a tax return, each claiming different deductions, that would all get past the IRS, etc.)

wruza 6 hours ago | parent | prev | next [-]

> programming entire AAA video games

Even this is questionable, because we're seeing them make forms and solve leetcode problems, but no LLM has yet created a new approach, reduced existing unnecessary complexity (of which we have created mountains), or made something truly new in general. All they seem to do is rehash millions of "mainstream" works, and AAA isn't mainstream. Cranking up the parameter count or the time spent beating around the bush (a.k.a. CoT) doesn't magically substitute for the lack of a knowledge graph with thick enough edges, so creating a next-gen AAA video game is far out of scope of LLMs' abilities. They are stuck in 2020 office jobs and weekend open-source tech, programming-wise.

JFingleton 2 hours ago | parent | next [-]

"They are stuck in 2020 office jobs and weekend open source tech, programming-wise."

You say that like it's nothing special! Honestly I'm still in awe at the ability of modern LLMs to do any kind of programming. It's weird how something that would have been science fiction 5 years ago is now normalised.

m11a 3 hours ago | parent | prev [-]

"stuck" is a bit strong of a term. 6 months ago I remember preferring to write even Python code myself because Copilot would get most things wrong. My most successful usage of Copilot was getting it to write CRUD and tests. These days, I can give Claude Sonnet in Cursor's agent mode a high-level Rust programming task (e.g. write a certain macro that would allow a user to define X) and it'll modify across my codebase, and generally the thing just works.

At the current rate of progress, I really do think that in another 6 months they'll be pretty good at tackling technical debt and overcomplication, at least in codebases that have good unit/integration test coverage or are written in very strongly typed languages with a type-friendly structure. (Of course, those usually aren't the codebases needing significant refactoring, but I think AIs are decent at writing unit tests against existing code too.)

boshalfoshal 36 minutes ago | parent | prev | next [-]

I don't necessarily think you're wrong, and in general I do agree with you to an extent that this seems like self-centered Computer Scientist/SWE hubris, to think that automating programming is ~AGI.

HOWEVER, there is a case to be made that software is an insanely powerful lever for many industries, especially AI. And if current AI gets good enough at software problems that it can improve its own infrastructure or even ideate new model architectures, then we would (in this hypothetical case) potentially reach an "intelligence explosion," which would (may) _actually_ yield a true, generalized intelligence.

So, as a cynic, while I think the intermediary goal of many of these so-called-AGI companies is just your usual SaaS automation slop, because that's the easiest industry to disrupt and extract money from (and the people at these companies only really know how software works, as opposed to having knowledge of other things like chemistry, biology, etc.), I also think that, in theory, being a very fast and low-cost programming agent is a bit more powerful than you think.

loandbehold 7 hours ago | parent | prev [-]

OK, but getting good at science/engineering is what matters, because that's what gives AI, and the people who wield it, power. Once AI is able to build chips and datacenters autonomously, that's when the singularity starts. AI doesn't need to understand humans or act human-like to do those things.

nfc 3 hours ago | parent | prev | next [-]

I agree with your point about the validation bottleneck becoming dominant over raw compute and simple model scaling. However, I wonder if we're underestimating the potential headroom for sheer efficiency breakthroughs at our levels of intelligence.

Von Neumann for example was incredibly brilliant, yet his brain presumably ran on roughly the same power budget as anyone else's. I mean, did he have to eat mountains of food to fuel those thoughts? ;)

So it looks like massive gains in intelligence or capability might not require proportionally massive increases in fundamental inputs, at least up to the highest levels of intelligence a human can reach. And if that's true for the human brain, why not for other architectures of intelligence?

P.S. It's funny, I was talking about something along the lines of what you said with a friend just a few minutes before reading your comment so when I saw it I felt that I had to comment :)

the8472 5 hours ago | parent | prev | next [-]

Many tasks are amenable to simulation training and synthetic data. Math proofs, virtual game environments, programming.

And we haven't run out of all data. High-quality text data may be exhausted, but we have many, many life-years' worth of video. Being able to predict visual imagery means building a physical world model. Combine this passive observation with active experimentation in simulated and real environments and you get millions of hours of navigating and steering a causal world. DeepMind has long been hooking its models up to real robots to let them actively explore and generate interesting training data. There's more to DL than LLMs.
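
As a toy illustration of how a simulated environment mints its own training data (a throwaway gridworld; nothing like the actual robotics setups, just the principle):

    import random

    # Every transition the agent experiences is a labelled example "for free":
    # the simulator itself provides the ground-truth next state.
    SIZE = 5
    ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def rollout(steps=50):
        x, y, data = 2, 2, []
        for _ in range(steps):
            action = random.choice(list(ACTIONS))
            dx, dy = ACTIONS[action]
            nx = min(SIZE - 1, max(0, x + dx))
            ny = min(SIZE - 1, max(0, y + dy))
            data.append(((x, y), action, (nx, ny)))  # (state, action, next_state)
            x, y = nx, ny
        return data

    print(len(rollout()))  # repeat cheaply for millions of hours of synthetic experience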
