| ▲ | fsh 2 days ago |
| I believe it may be misguided to focus on compute that much; it would be more instructive to consider the effort that went into curating the training set. The easiest way of solving math problems with an LLM is to make sure that very similar problems are included in the training set. Many of the AI achievements would probably look a lot less miraculous if one could check the training data. The most crass example is OpenAI paying off the FrontierMath creators last year to get exclusive secret access to the problems before the evaluation [1]. Even without resorting to cheating, competition formats are vulnerable to this. It is extremely difficult to come up with truly original questions, so by spending significant resources on re-hashing all kinds of permutations of previous questions, one will probably end up very close to the actual competition set. The first rule I learned about training neural networks is to make damn sure there is no overlap between the training and validation sets. It is interesting that this rule has gone completely out of the window in the age of LLMs. [1] https://www.lesswrong.com/posts/8ZgLYwBmB3vLavjKE/some-lesso... |
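The train/validation hygiene described above can be checked mechanically. Here is a minimal sketch in Python; the n-gram size and toy documents are illustrative assumptions, and real contamination checks on LLM-scale corpora are far more involved:

```python
def ngrams(text, n=8):
    """Set of word-level n-grams, lowercased; a crude contamination fingerprint."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(train_docs, eval_doc, n=8):
    """True if any n-gram of the eval document appears verbatim in the training data."""
    eval_grams = ngrams(eval_doc, n)
    return any(eval_grams & ngrams(doc, n) for doc in train_docs)

# Toy example: the eval problem is a verbatim copy of a training document.
train = ["let x and y be positive integers such that x squared plus y squared equals z"]
assert contaminated(train, train[0], n=5)
assert not contaminated(train, "compute the derivative of sine of x at zero please thanks", n=5)
```

Exact n-gram matching only catches verbatim overlap; the re-hashed permutations the comment describes would need fuzzier matching (e.g. embedding similarity) to detect.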
|
| ▲ | OtherShrezzing 2 days ago | parent | next [-] |
| > The easiest way of solving math problems with an LLM is to make sure that very similar problems are included in the training set. Many of the AI achievements would probably look a lot less miraculous if one could check the training data I'm fairly certain this phenomenon is responsible for LLM capabilities on GeoGuessr-type games. Their performance is unreasonably good: for example, they can identify obscure locations from featureless, foggy pictures of a bench. GeoGuessr's entire dataset, including GPS metadata, is definitely included in all of the frontier model training datasets, so it should be unsurprising that they have excellent performance in that domain. |
| |
| ▲ | ACCount36 a day ago | parent | next [-] | | People tried VLMs on "closed set" GeoGuessr-type tasks - i.e. non-Street View photos in similar style, not published anywhere. They still kicked ass. It seems like those AIs just have an awful lot of location familiarity. They've seen enough tagged photos to be able to pick up on the patterns, and generalize that to kicking ass at GeoGuessr. | |
| ▲ | YetAnotherNick 2 days ago | parent | prev [-] | | > GeoGuessr's entire dataset No, it is not included; however, there must be quite a lot of pictures on the internet for most cities. GeoGuessr's data is the same as Google's Street View data, which probably contains billions of 360-degree photos. | | |
| ▲ | suddenlybananas 2 days ago | parent | next [-] | | Why do you say it's not included? Why wouldn't they include it? | | |
| ▲ | sebzim4500 a day ago | parent [-] | | If every photo in streetview was included in the training data of a multimodal LLM it would be like 99.9999% of the training data/resource costs. It just isn't plausible that anyone has actually done that. I'm sure some people include a small sample of them, though. | | |
| ▲ | bluefirebrand a day ago | parent | next [-] | | Why would every photo in streetview be required in order to have Geoguessr's dataset in the training data? | | |
| ▲ | bee_rider a day ago | parent [-] | | I’m pretty sure they are saying that Geoguessr just pulls directly from Google Street View. There isn’t a separate Geoguessr dataset; it just pulls from Google’s API (at least, that’s what Wikipedia says). | | |
| ▲ | bluefirebrand a day ago | parent [-] | | I suspect that Geoguessr's dataset is a subset of Google Streetview, but maybe it really is just pulling everything directly | | |
| ▲ | bee_rider a day ago | parent [-] | | My guess would be that they pull directly from street-view, maybe with some extra filtering for interesting locations. Why bother to create a copy, if it can be avoided, right? |
|
|
| |
| ▲ | clbrmbr 21 hours ago | parent | prev [-] | | Yet. This is a good rebuttal when someone quips that we “are about to run out of data”. There’s oh so much more, just not in the form of books and blogs. |
|
| |
| ▲ | ivape 2 days ago | parent | prev [-] | | I just saw a video on Reddit where a woman still managed to take a selfie while being literally face to face with a black bear. There’s definitely way too much video training data out there for everything. | | |
| ▲ | lutusp a day ago | parent [-] | | > I just saw a video on Reddit where a woman still managed to take a selfie while being literally face to face with a black bear. This is not uncommon. Bears aren't always tearing people apart, that's a movie trope with little connection to reality. Black bears in particular are smart and social enough to befriend their food sources. But a hungry bear, or a bear with cubs, that's a different story. Even then bears may surprise you. Once in Alaska, a mama bear got me to babysit her cubs while she went fishing -- link: https://arachnoid.com/alaska2018/bears.html . |
|
|
|
|
| ▲ | eru 19 hours ago | parent | prev | next [-] |
| > It is extremely difficult to come up with truly original questions, [...] No, that's actually really easy. What's hard is coming up with original questions of a specific level of difficulty, and that's what you need for a competition. To elaborate: it's really easy to find lots and lots of elementary, unsolved questions. But it's not clear whether you can actually solve them, or how hard solving them is, so it's hard to judge the performance of LLMs on them. > It is interesting that this rule has gone completely out of the window in the age of LLMs. No, it hasn't. |
|
| ▲ | astrange 2 days ago | parent | prev | next [-] |
| > The easiest way of solving math problems with an LLM is to make sure that very similar problems are included in the training set. An irony here is that math blogs like Tao's might not be in LLM training data, for the same reason they aren't accessible to screen readers - they're full of math, and the math is rendered as images, so it's nonsense if you can't read the images. (The images on his blog do have alt text, but it's just the LaTeX code, which isn't much better.) |
| |
| ▲ | alansammarone 2 days ago | parent | next [-] | | As others have pointed out, LLMs have no trouble with LaTeX. I can see why one might think they do - in fact, I made the same assumption myself some time ago. LLMs, via transformers, are exceptionally good at _any_ sequence of one-dimensional data. One very interesting (to me, anyway) example is base64 - pick some not-huge sentence (say, 10 words), base64-encode it, and just paste it into any LLM you want, and it will be able to understand it. The same works with hex, ASCII representation, or binary. Here's a sample if you want to try:
aWYgYWxsIEEncyBhcmUgQidzLCBidXQgb25seSBzb21lIEIncyBhcmUgQydzLCBhcmUgYWxsIEEncyBDJ3M/IEFuc3dlciBpbiBiYXNlNjQu I remember running this experiment some time ago in a context where I was certain there was no possibility of tool use to encode/decode. Nowadays it can be hard to be certain whether there is any tool use, though in some cases, such as Mistral, the response is quick enough to make tool use unlikely. | | |
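The experiment is easy to reproduce. A minimal sketch in Python (standard library only; the prompt sentence is just an illustrative choice, any short sentence works) that produces this kind of encoded input:

```python
import base64

# A hypothetical prompt; paste only the encoded string into an LLM
# (with tool use disabled, if possible) and see whether it can answer
# the question hidden inside.
prompt = "if all A's are B's, but only some B's are C's, are all A's C's?"
encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")
print(encoded)

# Sanity check: base64 is lossless, so decoding recovers the prompt exactly.
assert base64.b64decode(encoded).decode("utf-8") == prompt
```

Since base64 maps each 3-byte group to a fixed 4-character group, common words produce recurring substrings, which is plausibly part of what lets a model pattern-match the encoding without an explicit decode step.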
| ▲ | throwanem a day ago | parent [-] | | I've just tried it, in the form of your base64 prompt and no other context, with a local Qwen-3 30b instance that I'm entirely certain is not actually performing tool use. It produced a correct answer ("Tm8="), which in a moment of accidental comedy it spontaneously formatted with LaTeX. But it did talk about invoking an online decoder, just before the first appearance of the (nearly) complete decoded string in its CoT. It "left out" the A in its decode and still correctly answered the proposition, either out of reflexive familiarity with the form or via metasyntactic reasoning over an implicit anaphor; I believe I recall this to be a formulation of one of the elementary axioms of set theory, though you will excuse me for omitting its name before coffee, which makes the pattern matching possibility seem somewhat more feasible. ('Seem' may work a little too hard there. But a minimally more novel challenge I think would be needed to really see more.) There's lots of text in lots of languages about using an online base64 decoder, and nearly none at all about decoding the representation "in your head," which for humans would be a party trick akin to that one fellow who could see a city from a helicopter for 30 seconds and then perfectly reproduce it on paper from memory. It makes sense to me that a model trained on the Internet would "invent" the "metaphor" of an online decoder here, I think. What in its "experience" serves better as a description? | | |
| |
| ▲ | prein 2 days ago | parent | prev | next [-] | | What would be a better alternative than LaTeX for the alt text? I can't think of a solution that makes more sense: it provides an unambiguous representation of what's depicted. I wouldn't think an LLM would have any issue with that. I can see how a screen reader might, but it seems like the same problem a screen reader faces with any piece of code, not just LaTeX. | |
| ▲ | mbowcut2 a day ago | parent | prev | next [-] | | LLMs are better at LaTeX than humans. ChatGPT often writes LaTeX responses. | | |
| ▲ | neutronicus a day ago | parent [-] | | Yeah, it's honestly one of the things they're best at! I've been working on implementing some E&M simulations with Claude Code and it's so-so on the C++ and TERRIBLE at the actual math (multiplying a couple 6x6 matrix differential operators is beyond it). But I can dash off some notes and tell Claude to TeXify and the output is great. |
| |
| ▲ | QuesnayJr 2 days ago | parent | prev | next [-] | | LLMs understand LaTeX extraordinarily well. | |
| ▲ | constantcrying a day ago | parent | prev | next [-] | | >(The images on his blog do have alt text, but it's just the LaTeX code, which isn't much better.) LLMs are extremely good at outputting LaTeX; ChatGPT will output LaTeX, which the website renders as such. Why do you think LLMs have trouble understanding it? | | |
| ▲ | astrange a day ago | parent [-] | | I don't think LLMs will have trouble understanding it. I think people using screen readers will. …oh I see, I accidentally deleted the part of the comment about that. But the people writing the web page extraction pipelines also have to handle the alt text properly. |
| |
| ▲ | MengerSponge 2 days ago | parent | prev [-] | | LLMs are decent with LaTeX! It's just markup code after all. I've heard from some colleagues that they can do decent image to code conversion for a picture of an equation or even some handwritten ones. |
|
|
| ▲ | disruptbro 2 days ago | parent | prev [-] |
| Language modeling is compression: whittle the graph down to remove duplication and data with little relationship: https://arxiv.org/abs/2309.10668 Let's say everyone agrees to refer to one hosted copy of the token "cat", and instead generates a unique vector to represent their reference to "cat". Blam: endless unique vectors, nice and precise for parsing, and no endless copies of arbitrary text like "cat". Now make that your globally distributed database for bootstrapping AI chips: the data-driven programming dream, where machines already on the network feed the bootstrap of new machines. The American tech industry is IBM now: stuck on the recent success of web SaaS and way behind on its plans for AI. |