Based on Tao’s description of how the proof came about - a human is taking results backwards and forwards between two separate AI tools and using an AI tool to fill in gaps the human found?

I don’t think it can really be said to have occurred autonomously then?

Looks more like a 50/50 partnership with a super expert human one the one side which makes this way more vague in my opinion - and in line with my own AI tests, ie. they are pretty stupid even OPUS 4.5 or whatever unless you're already an expert and is doing boilerplate.

EDIT: I can see the title has been fixed now from solved to "more or less solved" which is still think is a big stretch.

▲

D-Machine a day ago | parent | next [-]

You're understanding correctly, this is back and forth between Aristotle and ChatGPT and a (very smart) user.

▲

MyFirstSass a day ago | parent | next [-]

I'm not sure i understand the wild hype here in this thread then.

Seems exactly like the tests at my company where even frontier models are revealed to be very expensive rubber ducks, but completely fails with non experts or anything novel or math heavy.

Ie. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

▲

markusde a day ago | parent | next [-]

Yes, the contributions of the people promoting the AI should be considered, as well as the people who designed the Lean libraries used in-the-loop while the AI was writing the solution. Any talk of "AGI" is, as always, ridiculous.

But speaking as a specialist in theorem proving, this result is pretty impressive! It would have likely taken me a lot longer to formalize this result even if it was in my area of specialty.

▲

falcor84 a day ago | parent [-]

> Any talk of "AGI" is, as always, ridiculous.

How did you arrive at "ridiculous"? What we're seeing here is incredible progress over what we had a year ago. Even ARC-AGI-2 is now at over 50%. Given that this sort of process is also being applied to AI development itself, it's really not clear to me that humans would be a valuable component in knowledge work for much longer.

▲

DiscourseFan a day ago | parent | next [-]

It requires constant feedback, critical evaluation, and checks. This is not AGI, its cognitive augmentation. One that is collective, one that will accelerate human abilities far beyond what the academic establishment is currently capable of, but that is still fundamentally organic. I don't see a problem with this--AGI advocates treat machine intelligence like some sort of God that will smite non-believers and reward the faithful. This is what we tell children so that they won't shit their beds at night, otherwise they get a spanking. The real world is not composed of rewards and punishments.

▲

komali2 20 hours ago | parent | next [-]

It does seem that the venn diagram of "roko's basilisk" believers and "AGI is coming within our lifetimes" believers is nearly a circle. Would be nice if there were some less... religious... arguments for AGI's imminence.

	▲	DiscourseFan 20 hours ago \| parent [-]
		I think the “Roko’s Basilisk” thing is mostly a way for readers of Nick Land to explain part of his philosophical perspective without the need for, say, an actual background in philosphy. But the simplicity reduces his nuanced thought into a call for a sheeplike herd—they don’t even need a shepherd! Or perhaps there is, but he is always yet to come…best to stay in line anyway, he might be just around the corner.

▲

falcor84 17 hours ago | parent | prev | next [-]

> It requires constant feedback, critical evaluation, and checks. This is not AGI, its cognitive augmentation.

To me that doesn't sound qualitatively different from a PhD student. Are they just cognitive augmentation for their mentor?

In any case, I wasn't trying to argue that this system as-is is AGI, but just that it's no longer "ridiculous", and that this to me looks like a herald of AGI, as the portion being done by humans gets smaller and smaller

	▲	DiscourseFan 12 hours ago \| parent [-]
		People would say the same thing about a calculator, or computation in general. Just like any machine it must be constructed purposefully to be useful, and once we require something which exceeds that purpose it must be constructed once again. Only time will tell the limits of human intelligence, now that AI is integrating into society and industry.

▲

frozenseven 15 hours ago | parent | prev [-]

>AGI advocates treat machine intelligence like some sort of God that will smite non-believers and reward the faithful.

>The real world is not composed of rewards and punishments.

Most "AGI advocates" say that AGI is coming, sooner rather than later, and it will fundamentally reshape our world. On its own that's purely descriptive. In my experience, most of the alleged "smiting" comes from the skeptics simply being wrong about this. Rarely there's talk of explicit rewards and punishments.

▲

DiscourseFan 12 hours ago | parent [-]

You should look into “Roko’s Basilisic,” its a genuine belief that often goes alongside that of AGI.

	▲	frozenseven 3 hours ago \| parent [-]
		I should be the target audience for this stuff, but I honestly can't name a single person who believes in this "Roko's basilisk" thing. To my knowledge, even the original author abandoned it. There probably are a small handful out there, but I've never seen 'em myself.

▲

markusde a day ago | parent | prev | next [-]

> it's really not clear to me that humans would be a valuable component in knowledge work for much longer.

To me, this sounds like when we first went to the moon, and people were sure we'd be on Mars be the end of the 80's.

> Even ARC-AGI-2 is now at over 50%.

Any measure of "are we close to AGI" is as scientifically meaningful as "are we close to a warp drive" because all anyone has to go on at this point is pure speculation. In my opinion, we should all strive to be better scientists and think more carefully about what an observation is supposed to mean before we tout it as evidence. Despite the name, there is no evidence that ARC-AGI tests for AGI.

▲

ogogmad 15 hours ago | parent [-]

> To me, this sounds like when we first went to the moon, and people were sure we'd be on Mars be the end of the 80's.

Unlike space colonisation, there are immediate economic rewards from producing even modest improvements in AI models. As such, we should expect much faster progress in AI than space colonisation.

But it could still turn out the same way, for all we know. I just think that's unlikely.

	▲	zeroonetwothree 13 hours ago \| parent [-]
		The minerals in the asteroid belt are estimated to be worth in the $100s of quintillions. I would say that’s a decent economic incentive to develop space exploration (not necessarily colonization, but it may make it easier).

▲

jacquesm a day ago | parent | prev | next [-]

You either have a case of human augmented AI here or AI augmented human. Either by themself would not have made the step.

	▲	emil-lp 21 hours ago \| parent [-]
		If I were to place my money, it would be ok Terence Tao.

▲

feastingonslop a day ago | parent | prev | next [-]

Excellent! Humans can then spend their time on other activities, rather than get bogged down in the mundane.

	▲	navels a day ago \| parent \| next [-]
		Other activites such as the sublime pursuit of truth and beauty . . . aka mathematics ;-)
	▲	latexr 17 hours ago \| parent \| prev [-]
		Not going to happen as long as the society we live in has this big of a hard on for capitalism and working yourself to the bone is seen as a virtue. Every time there’s a productivity boost, the newly gained free time is immediately consumed by more work. It’s a sick version of Parkinson’s law where work is infinite. https://en.wikipedia.org/wiki/Parkinson%27s_law

▲

a day ago | parent | prev | next [-]

[deleted]

▲

catlifeonmars a day ago | parent | prev [-]

“Much longer” is doing a lot of heavy lifting there.

▲

falcor84 17 hours ago | parent | next [-]

Let me put it like this: I expect AI to replace much of human wage labor over the next 20 years and push many of us, and myself almost certainly included, into premature retirement. I'm personally concerned that in a few years, I'll find my software proficiency to be as useful as my chess proficiency today is useful to Stockfish. I am afraid of a massive social upheaval both for myself and my family, and for society at large.

▲

dehsge 10 hours ago | parent | next [-]

There are other bounds here at play that are often not talked about.

Ai runs on computers. Consider the undecidability of Rices theorem. Where compiled code of non trivial statements may or may not be error free. Even an ai can’t guarantee its compiled code is error free. Not because it wouldn’t write sufficient code that solves a problem, but the code it writes is bounded by other externalities. Undecidability in general makes the dream of generative ai considerably more challenging than how it’s being ‘sold.

▲

chongli 16 hours ago | parent | prev [-]

Here “much of” is doing the heavy lifting. Are you willing to commit to a percentage or a range?

I work at an insurance company and I can’t see AI replacing even 10% of the employees here. Too much of what we do is locked up in decades-old proprietary databases that cannot be replaced for legal reasons. We still rely on paper mail for a huge amount of communication with policyholders. The decisions we make on a daily basis can’t be trusted to AI for legal reasons. If AI caused even a 1% increase in false rejections of claims it would be an enormous liability issue.

▲

falcor84 15 hours ago | parent [-]

Yes, absolutely willing to commit. I can't find a single reliable source, but from what I gather, over 70% of people in the West do "pure knowledge work", which doesn't include any embodied actuvities. I am happy to put my money that these jobs will start being fully taken over by AI rapidly soon (if they aren't already), and that by 2035, less than 50% of us will have a job that doesn't require "being there".

And regarding your example of an insurance company, I'm not sure about that industry, but seeing the transformation of banking over the last decade to fully digital providers like Revolut, I would expect similar disruption there.

	▲	zeroonetwothree 13 hours ago \| parent [-]
		I would easily take the other side of this bet. It just reminds me when everyone was sure back in 2010 that we’d have self driving cars within 10 years and human drivers would be obsolete. Today replacing human drivers fully is still about 10 years away.

▲

markusde a day ago | parent | prev [-]

As is "even if it was in my area of specialty". I would not be able to do this proof, I can tell you that much.

▲

jacquesm a day ago | parent | prev | next [-]

This accurately mirrors my experience. It never - so far - has happened that the AI brought any novel insight at the level that I would see as an original idea. Presumably the case of TFA is different but the normal interaction is that that the solution to whatever you are trying to solve is a millimeter away from your understanding and the AI won't bridge that gap until you do it yourself and then it will usually prove to you that was obvious. If it was so obvious then it probably should have made the suggestion...

Recent case:

I have a bar with a number of weights supported on either end:

|---+-+-//-+-+---|

What order and/or arrangement or of removing the weights would cause the least shift in center-of-mass? There is a non-obvious trick that you can pull here to reduce the shift considerably and I was curious if the AI would spot it or not but even after lots of prompting it just circled around the obvious solutions rather than to make a leap outside of that box and come up with a solution that is better in every case.

I wonder what the cause of that kind of blindness is.

▲

jiggawatts 20 hours ago | parent | next [-]

That problem is not clearly stated, so if you’re pasting that into an AI verbatim you won’t get the answer you’re looking for.

My guess is: first move the weights to the middle, and only then remove them.

However “weights” and “bar” might confuse both machines and people into thinking that this is related to weight lifting, where there’s two stops on the bar preventing the weights from being moved to the middle.

▲

jacquesm 16 hours ago | parent [-]

The problem is stated clearly enough that humans that we ask the question of will sooner or later see that there is an optimum and that that optimum relies on understanding.

And no, the problem is not 'not clearly stated'. It is complete as it is and you are wrong about your guess.

And if machines and people think this is related to weight lifting then they're free to ask follow up questions. But even in the weight lifting case the answer is the same.

▲

red75prime 16 hours ago | parent [-]

Illusion of transparency. You are imagining yourself asking this question, while standing in the gym and looking at the bar (or something like this). I, for example, have no idea how the weights are attached and which removal actions are allowed.

Yeah, LLMs have a tendency to run with some interpretation of a question without asking follow-up questions. Probably, it's a consequence of RLHFing them in that way.

▲

jacquesm 15 hours ago | parent [-]

And none of those details matter to solve the problem correctly. I'm purposefully not putting any answers here because I want to see if future generations of these tools suddenly see the non-obvious solution. But you are right about the fact that the details matter, one detail is mentioned very explicitly that holds the key.

If you do solve it don't post the answer.

▲

Mawr 13 hours ago | parent [-]

Sure they, do, the problem makes no sense as stated. The solution to the stated problem is to remove all weights all at once, solved. Or even two at a time, opposite the centre of gravity. Solved, but not what you're asking I assume?

You didn't even label your ASCII art, so I've no clue what you mean, are the bars at the end the supports or weights? Can I only remove one weight at a time? Initially I assumed you mean a weightlifting bar the weights on which can only be removed from its ends. Is that the case or what? What's the double slash in the middle?

Also: "what order and/or arrangement or of removing the weights" this isn't even correct English. Arrangement of removing the weights? State the problem clearly, from first principles, like you were talking to a 5 year old.

The sibling comment is correct, you're clearly picturing something in your mind that you're failing to properly describe. It seems obvious to you, but it's not.

▲

jacquesm 12 hours ago | parent [-]

And yet, two people have solved it independently, so apparently it is adequately specified for some.

	▲	jiggawatts 6 hours ago \| parent [-]
		“Luck is not a strategy.” I can successfully interpret total gibberish sometimes, but that’s not a robust approach even with humans let alone machines. People have wildly different experiences utilising AI because of their own idiosyncrasies more than issues with the tools themselves. It was pointed out by multiple groups (such as Anthropic) that their tools do a lot better with well organised codebases that are liberally commented. I’ve worked on codebases where the AIs are just… lost. So are people! Sure, some people can navigate the spaghetti… sometimes… but the success rate of changes is much lower. Occasional success is not proof of correctness of approach. Consistent success is.

▲

TeodorDyakov 18 hours ago | parent | prev | next [-]

Tokenizationnnnnnn

▲

ogogmad 15 hours ago | parent | prev [-]

The problem is unclear. I think you have a labelled graph G=(V, E) with labels c:V->R, such that each node in V consists of a triple (L, R, S) where L is a sequence of weights are on the left, R is a sequence of weights that are on the right, and S is a set of weight that have been taken off. Define c(L, R, S) to be the centre of mass. Introduce an undirected edge e={(L, R, S), (L', R', S')} between (L, R, S) and (L', R', S') either if (i) (L', R', S') results from taking the first weight off L and adding it to S, or (ii) (L', R', S') results from taking the first weight off R and adding it to S, or (iii) (L', R', S') results from taking a weight from W and adding it to L, or (iv) (L', R', S') results from taking a weight from W and adding it to R.

There is a starting node (L_0, R_0, {}) and an ending node ({}, {}, W) , with the latter having L=R={}.

I think you're trying to find the path (L_n, R_n, S_n) from the starting node to the ending node that minimises the maximum absolute value of c(L_n, R_n, S_n).

I won't post a solution, as requested.

	▲	jacquesm 14 hours ago \| parent [-]
		You are overthinking it.

▲

encyclopedism 14 hours ago | parent | prev | next [-]

Lots of users seem to think LLM's think and reason so this sounds wonderful. A mechanical process isn't thinking, certainly it does NOT mirror human thinking. The processes being altogether different.

▲

krzat 21 hours ago | parent | prev | next [-]

In other words, LLMs work best when *you are absolutely right" and "this is a very insightful question" are actually true.

▲

EA-3167 9 hours ago | parent | prev | next [-]

Do you have any idea how many people here have paychecks that depend on the hype, or hope to be in that position? They were the same way for Crypto until it stopped being part of the get-rich-quick dream.

▲

Davidzheng a day ago | parent | prev | next [-]

The proof is ai generated?

▲

MyFirstSass a day ago | parent [-]

Eh? The text reads:

"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"

Not saying it's not an amazing setup, i just don't understand the word "AI" being used like this when it's the setup / system that's brilliant in conjunction with absolute experts.

	▲	kortex a day ago \| parent \| next [-]
		That's literally AI though. AI has been around formally since 1956. https://en.wikipedia.org/wiki/Dartmouth_workshop AI != AGI != neural networks != LLMs But Tao did mention ChatGPT so i believe LLMs were involved at least partially.
	▲	a day ago \| parent \| prev [-]
		[deleted]

▲

SecretDreams a day ago | parent | prev | next [-]

> Ie. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

This hits so true to home. Just today in my field a manager without expertise in a topic gave me an AI solution to something I am an expertise in. The AI was very plainly and painfully wrong, but it comes down to the user prompting really poorly. When I gave a el formulated prompt to the same topic, I got the correct answer on the first go.

▲

HDThoreaun a day ago | parent | prev | next [-]

"the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument." From the Tao thread. The ability to quickly iterate on research is a big change because "This is sharp contrast to existing practice where....large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors."

▲

anthem2025 a day ago | parent | prev [-]

[dead]

▲

adityaathalye 20 hours ago | parent | prev | next [-]

Exactly "The Geordi LaForge Paradox" of "AI" systems. The most sophisticated work requires the most sophisticated user, who can only become sophisticated the usual way --- long hard work, trial and error, full-contact kumite with reality, and a degree of devotion to the field.

▲

NooneAtAll3 a day ago | parent | prev [-]

https://www.erdosproblems.com/forum/thread/728#post-2808

> There seems to be some confusion on this so let me clear this up. No, after the model gave its original response, I then proceeded to ask it if it could solve the problem with C=k/logN arbitrarily large. It then identified for itself what both I and Tao noticed about it throwing away k!, and subsequently repaired its proof. I did not need to provide that observation.

so it was literally "yo, your proof is weak!" - "naah, watch this! [proceeds to give full proof all on its own]"

I'd say that counts

▲

jasonfarnon a day ago | parent | prev | next [-]

I had the impression Tao/community weren't even finding the gaps, since they mentioned using an automatic proof verifier. And that the main back and forth involved re-reading Erdos' paper to find out the right problem Erdos intended. So more like 90/10 LLM/human. Maybe I misread it.

	▲	NewsaHackO a day ago \| parent [-]
		This is what I got from Tao's post as well.

▲

dpacmittal a day ago | parent | prev | next [-]

There's a lot more detail in this reddit post from the author - https://www.reddit.com/r/OpenAI/comments/1q6yw5g/how_we_used...

▲

Tenobrus a day ago | parent | prev | next [-]

strongly think you should go read the thread to get a sense of the level of expertise and amount of effort put in by the humans involved: https://www.erdosproblems.com/forum/thread/728#post-2852

▲

mmphosis a day ago | parent | prev | next [-]

This website was made by Thomas Bloom, a mathematician who likes to think about the problems Erdős posed. Technical assistance with setting up the code for the website was provided by ChatGPT -from the FAQ

▲

a day ago | parent | prev | next [-]

[deleted]

▲

Davidzheng a day ago | parent | prev | next [-]

Do you need to be a super expert to find gaps in proofs? Debatable

▲

naasking 14 hours ago | parent | prev | next [-]

> EDIT: I can see the title has been fixed now from solved to "more or less solved" which is still think is a big stretch.

"solved more or less autonomously by AI" were Tao's exact words, so I think we can trust his judgment about how much work he or the AI did, and how this indicates a meaningful increase in capabilities.

▲

Yeask a day ago | parent | prev [-]

Is a good economic decision to hype a bit the importance of the LLM$.