wavemode 3 days ago

You can write scripts that correct bad math, too. In fact, most of the time ChatGPT will just call out to a calculator function. That's a smart solution, and very useful for end users! But we still shouldn't use it to claim that LLMs have a good understanding of math.
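
A minimal sketch of that delegation pattern in Python (the detection logic and "calculator" here are illustrative stand-ins, not ChatGPT's actual tool-calling machinery):

    import ast
    import operator

    # Illustrative stand-in for the delegation pattern: detect plain
    # arithmetic and hand it to a deterministic evaluator instead of
    # predicting the answer token by token.

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def calculator(expr: str) -> float:
        """The 'calculator function': exact evaluation via the AST."""
        def walk(node):
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            raise ValueError("not plain arithmetic")
        return walk(ast.parse(expr, mode="eval").body)

    def answer(query: str) -> str:
        try:
            return str(calculator(query))      # math detected: exact path
        except (ValueError, SyntaxError):
            return "(fall back to the model)"  # placeholder for the LLM path

    print(answer("3 * (17 + 5)"))  # 66

The point stands either way: the correct arithmetic comes from the calculator path, not from the model's understanding of math.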

afro88 3 days ago | parent | next [-]

If a script were applied that corrected "bad math" and now the LLM could solve complex math problems that you can't one-shot throw at a calculator, what would you call it?

sixfiveotwo 3 days ago | parent | next [-]

It's a good point.

But this math analogy is not quite appropriate: there's abstract math, and there's arithmetic. A good math practitioner (LLM or human) can be bad at arithmetic yet good at abstract reasoning. The latter doesn't (necessarily) require the former.

In chess, I don't think you can build a good strategy if it relies on illegal moves, because tactics and strategy are tied together.

danparsonson 3 days ago | parent | prev | next [-]

If I had wings, I'd be a bird.

Applying a corrective script to weed out bad answers is also not "one-shot" solving anything, so I would call your example an elaborate guessing machine. That doesn't mean it's not useful, but it's not how a human being does maths when they understand what they're doing - in fact, you can readily program a computer to solve general maths problems correctly the first time. This is also exactly the problem with saying that LLMs can write software: a series of elaborate guesses is undeniably useful and impressive, but without a corrective guiding hand it's ultimately useless, and it doesn't demonstrate generalised understanding of the problem space. The dream of AI is surely that the corrective hand becomes unnecessary?
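
For contrast, here's the "correct the first time" kind of solving, using sympy as one concrete example of a deterministic solver (no guessing loop, no filter):

    from sympy import symbols, Eq, solve

    # A computer solving a general math problem correctly on the first
    # attempt: exact symbolic solving, no corrective hand needed.
    x = symbols("x")
    roots = solve(Eq(x**2 - 5*x + 6, 0), x)
    print(roots)  # [2, 3] - exact and deterministic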

at_a_remove 3 days ago | parent | prev [-]

Then you could replace the LLM with a much cheaper RNG and let it guess until the "bad math filter" let something through.
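
The reductio, sketched in Python (check() is a stand-in for whatever the hypothetical filter validates; here it simply knows the answer):

    import random

    def check(candidate):
        # Stand-in for the "bad math filter": it simply knows that
        # the right answer to 17 + 25 is 42.
        return candidate == 17 + 25

    def rng_solver(seed=0, max_tries=100_000):
        rng = random.Random(seed)
        for tries in range(1, max_tries + 1):
            guess = rng.randint(0, 100)
            if check(guess):
                return guess, tries
        return None, max_tries

    print(rng_solver())  # (42, n) for some small n

All the correctness lives in the filter; the generator only supplies candidates.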

I was once asked by one of the Clueless Admin types if we couldn't just "fix" various sites such that people couldn't input anything wrong. Same principle.

vunderba 3 days ago | parent | prev | next [-]

Agreed. It's not the same thing and we should strive for precision (LLMs are already opaque enough as it is).

An LLM that recognizes an input as "math" and calls out to a non-LLM tool to solve the problem, and an LLM that recognizes an input as "math" and uses next-token prediction to produce an accurate response, ARE DIFFERENT things.

henryfjordan 3 days ago | parent | prev [-]

At what point does "knows how to use a calculator" equate to knowing how to do math? Feels pretty close to me...

Tepix 3 days ago | parent [-]

Well, LLMs are bad at math but they're ok at detecting math and delegating it to a calculator program.

It's kind of like humans.