D-Machine a day ago

This is great, there is still so much potential in AI once we move beyond LLMs to specialized approaches like this.

EDIT: Look at all the people below just reacting to the headline and clearly not reading the posts. Aristotle (https://arxiv.org/abs/2510.01346) is key here folks.

EDIT2: It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on Lean stuff), it isn't a "language" model.

NewsaHackO a day ago | parent | next [-]

> It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on Lean stuff), it isn't a "language" model.

I think it's because it comes across as if you're saying we should move off of GenAI, and a lot of people say LLM when they mean GenAI.

D-Machine 20 hours ago | parent [-]

Ugh, you're right. That was not my intent. Conflating LLMs with GenAI is a serious error, but it is obviously a far more common one than I realized. I should have said "move beyond solely LLMs" or "move beyond LLMs in isolation"; perhaps that would have avoided the confusion.

This is a really hopeful result for GenAI (fitting deep models tuned by gradient descent on large amounts of data), and IMO it is possible because of specific domain knowledge and techniques that aren't present in the usual LLM approaches.

XCSme a day ago | parent | prev | next [-]

> beyond LLMs to specialized approaches

Do you mean that in this case, it was not an LLM?

D-Machine a day ago | parent [-]

It could not be done without Aristotle (https://arxiv.org/pdf/2510.01346), as clearly described in Tao's posts.

Davidzheng a day ago | parent | next [-]

Never mind what Aristotle is; verifier LLMs are definitely strong enough to verify proofs using the elementary methods employed here.

TeMPOraL a day ago | parent | prev [-]

Aristotle is an LLM system.

D-Machine a day ago | parent [-]

"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"

It is far more than an LLM, and math != "language".

TeMPOraL a day ago | parent [-]

> Aristotle integrates three main components (...)

The second one being backed by a model.

> It is far more than an LLM

It's an LLM with a bunch of tools around it, and a slightly different runtime than ChatGPT. It's "only" that, but people - even here, of all places - keep underestimating just how much power there is in that.

> math != "language".

How so?

D-Machine a day ago | parent | next [-]

Transformer != LLM. See my edited top-level post. Just because Aristotle uses a transformer doesn't mean it is an LLM, just as Vision Transformers and AlphaFold use transformers but are not LLMs.

LLM = Large Language Model. "Large" refers both to the number of parameters (and, in practice, the depth) of the model and, implicitly, to the amount of data used for training; "language" means human (i.e. written, spoken) language. A Vision Transformer is not an LLM because it is trained on images, and AlphaFold is not an LLM because it is trained on molecular configurations.

Aristotle works heavily with formalized Lean statements and expressions. While you can certainly argue this is a language of sorts, it is not at all the same "language" as the "language" in LLMs. Calling Aristotle an "LLM" just because it has a transformer is more misleading than truthful, because every single other aspect of it is far more clever and involved.
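
To make the difference concrete, here is a toy formal statement of the kind such a system manipulates (my own illustration in plain Lean 4, not taken from the paper):

    -- English: "addition of natural numbers is commutative"
    -- The Lean version is structured syntax checked by a kernel, not prose:
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

Generating and checking terms like this against the Lean kernel is a very different game from predicting the next word of English text.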

TeMPOraL a day ago | parent [-]

The paper you keep linking literally says they're using a large language model (search for that very string in it).

D-Machine a day ago | parent [-]

Sigh. If I start with a pre-trained LLM architecture and then do extensive further training / fine-tuning with different data, loss functions, and custom similarity metrics, with specialized search and training procedures, and with feedback from other automated systems, we end up with far, far more than an LLM. That's the point. Calling something like this an LLM is as deeply misleading as calling AlphaFold an LLM. These tools go far beyond simple LLMs. The special losses and metrics are crucial here, and they are why these tools can be so game-changing.
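
As a rough sketch of what I mean by "different loss functions and feedback from other automated systems" (entirely my own toy illustration, not the paper's code; the policy, verifier, and reward here are stand-ins):

    import torch
    import torch.nn as nn

    # Toy stand-ins: in reality the policy is a large pretrained transformer
    # and the verifier is the Lean kernel checking candidate proof steps.
    policy = nn.Linear(16, 4)  # placeholder for a pretrained model
    optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

    def verifier_reward(actions):
        # Placeholder for formal feedback: 1.0 if Lean accepts the step, else 0.0.
        return (actions == 0).float()

    for step in range(100):
        states = torch.randn(32, 16)  # toy encodings of proof states
        logits = policy(states)       # scores over candidate tactics
        dist = torch.distributions.Categorical(logits=logits)
        actions = dist.sample()
        rewards = verifier_reward(actions)
        # REINFORCE-style objective: reinforce tactics the verifier accepted.
        loss = -(dist.log_prob(actions) * rewards).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The objective is no longer "predict the next token of human text"; it is shaped by an external formal checker. That is the sense in which these systems go far beyond an off-the-shelf LLM.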

XCSme a day ago | parent | prev [-]

I kind of agree: "math" can be a "language", the same way "images" can be a language. You can use anything as tokens.

raincole a day ago | parent | next [-]

In this context, we're not even talking about "math" (as a broad, abstract concept). We're strictly talking about converting English to Lean. Both are just languages. Lean isn't just something that can be a language. It's a language.

There is no reasonable framing in which you can say Aristotle isn't a language model.
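
For what it's worth, that conversion really is language-to-language translation. A toy example of the pairing (mine, assuming Lean 4 with Mathlib, not from the paper):

    import Mathlib

    -- English: "The sum of two even natural numbers is even."
    theorem even_add_even {a b : ℕ} (ha : Even a) (hb : Even b) :
        Even (a + b) :=
      ha.add hb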

TeMPOraL a day ago | parent | prev [-]

That's true, and a good fundamental point. But here it's much simpler than that: math is a language the same way code is, and if there's one thing LLMs excel at, it's reading and writing code and translating back and forth between code and natural language.

nextaccountic a day ago | parent | prev | next [-]

It was done by an LLM (ChatGPT).

D-Machine a day ago | parent [-]

It could not be done without Aristotle (https://arxiv.org/pdf/2510.01346); did you even read the links?

doormatt a day ago | parent [-]

And this is Aristotle - https://aristotle.harmonic.fun/

It's an LLM.

D-Machine a day ago | parent [-]

"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"

It is far more than an LLM, and math != "language".

nextaccountic a day ago | parent [-]

Apparently Aristotle is an LLM with tool calling? Sounds similar to most coding agents.

stonogo a day ago | parent | prev | next [-]

Every stage of this 3-stage pipeline is an LLM.

1. "The search algorithm is a highly parallel Monte Carlo Graph Search (MCGS) using a large transformer as its policy and value functon." ... "We use a generative policy to take progressively widened [7] samples from the large action space of Lean tactics, conditioning on the Lean proof state, proof history, and, if available, an informal proof. We use the same model and prompt (up to a task token) to compute the value function which guides the search."

See that 'large transformer' phrase? That's where the LLM is involved.
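
Roughly how that piece works, if it helps (my own sketch of AlphaZero-style PUCT search, with a stub in place of the large transformer; names are illustrative, not the paper's):

    import math, random

    def policy_value(state):
        # Stub for the large transformer: candidate tactics with prior
        # probabilities, plus a scalar value estimate for the state.
        tactics = [f"tactic_{i}" for i in range(3)]
        priors = [1 / 3] * 3
        return list(zip(tactics, priors)), random.random()

    class Node:
        def __init__(self, state):
            self.state = state
            self.children = {}  # tactic -> child Node
            self.priors = {}    # tactic -> prior probability
            self.N, self.W = 0, 0.0

    def select(node, c=1.5):
        # PUCT: balance observed value (Q) against the policy prior (U).
        def score(tactic, child):
            q = child.W / child.N if child.N else 0.0
            u = c * node.priors[tactic] * math.sqrt(node.N + 1) / (1 + child.N)
            return q + u
        return max(node.children.items(), key=lambda kv: score(*kv))[1]

    def expand_and_evaluate(node):
        proposals, value = policy_value(node.state)
        for tactic, prior in proposals:
            node.priors[tactic] = prior
            node.children[tactic] = Node(node.state + [tactic])
        return value

    def search(root, simulations=100):
        for _ in range(simulations):
            node, path = root, [root]
            while node.children:  # descend to a leaf
                node = select(node)
                path.append(node)
            value = expand_and_evaluate(node)
            for n in path:        # backpropagate the value estimate
                n.N += 1
                n.W += value

    search(Node([]))

The search scaffolding is classical; the transformer is what supplies the priors and values.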

2. "A lemma-based informal reasoning system which generates informal proofs of mathematical state-ments, breaks these proofs down into lemmas, formalizes each lemma into Lean, and iterates this process based on formal feedback" ... "First, the actions it generates consist of informal comments in addition to Lean tactics. Second, it uses a hidden chain of thought with a dynamically set thinking budget before predicting an action."

Unless you're proposing that this team solved AGI, "chain of thought" is a specific term of art in LLMs.
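
And the "dynamically set thinking budget" is easy to picture (again my own sketch; ProverModel and its methods are made up for illustration):

    import random

    class ProverModel:
        # Stub standing in for the model; real methods would run the transformer.
        def estimate_difficulty(self, state):
            return random.random()  # in [0, 1]
        def think(self, state, max_tokens):
            return f"<up to {max_tokens} hidden CoT tokens>"
        def predict_action(self, state, thoughts):
            return "simp"  # a Lean tactic

    def choose_action(model, proof_state, min_budget=64, max_budget=4096):
        # Scale the hidden chain-of-thought budget with estimated difficulty
        # before committing to an action.
        difficulty = model.estimate_difficulty(proof_state)
        budget = int(min_budget + difficulty * (max_budget - min_budget))
        thoughts = model.think(proof_state, max_tokens=budget)
        return model.predict_action(proof_state, thoughts)

    print(choose_action(ProverModel(), "⊢ a + b = b + a"))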

3. "A geometry solver which solves plane geometry problems outside of Lean using an approach based on AlphaGeometry [45]." ... following the reference: "AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. "

AlphaGeometry, like all of DeepMind's Alpha tools, is an LLM.

Instead of accusing people of not reading the paper, perhaps you should put some thought into what the things in the paper actually represent.

D-Machine a day ago | parent [-]

If you think "transformer" = LLM, you don't understand the basic terminology of the field. This is like calling AlphaFold an LLM because it uses a transformer.

stonogo a day ago | parent [-]

No, it isn't. They call out ExIt as an inspiration as well as AlphaZero, and the implementations of these things (available in many of their authors' papers) are almost indistinguishable from LLMs. The architecture isn't novel, which is why this paper is about the pipeline rather than any of the actual processing tools. Getting prickly about meaningless terminology differences is certainly your right, but for anyone who isn't trying to define a policy algorithm for a transformer network, the difference is immaterial to understanding the computation involved.

D-Machine a day ago | parent [-]

Equating LLMs and transformers is not a meaningless terminology difference at all. Aristotle is so different in training data, loss function, and training procedure from the things people call LLMs that equating them is a grievous error.

ctoth a day ago | parent | prev [-]

[flagged]

D-Machine a day ago | parent [-]

Did you not read the parts where Aristotle (https://arxiv.org/pdf/2510.01346) was an integral component of all this?