sigmar 3 days ago

I don't think that analogy works unless you could write a script that automatically removes incorrect medical advice, because then you would indeed have an LLM-with-a-script that was an expert doctor (which you can do for illegal chess moves, but obviously not for evaluating medical advice)
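
For chess that filter really is a small script. A minimal sketch using the python-chess library, where the hard-coded candidate moves stand in for whatever an LLM might suggest:

    # Reject illegal chess moves mechanically with python-chess.
    import chess

    def legal_or_none(board, uci):
        """Return the move if it's legal in this position, else None."""
        try:
            move = chess.Move.from_uci(uci)
        except ValueError:  # not even syntactically a move, e.g. "Ke9"
            return None
        return move if move in board.legal_moves else None

    board = chess.Board()
    for uci in ["e2e4", "e7e5", "e1g1"]:  # "e1g1" (castling) is illegal here
        move = legal_or_none(board, uci)
        if move is None:
            print("rejected illegal move:", uci)
        else:
            board.push(move)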

wavemode 3 days ago

You can write scripts that correct bad math, too. In fact, most of the time ChatGPT will just call out to a calculator function. That's a smart solution, and very useful for end users! But we still shouldn't use it to claim that LLMs have a good understanding of math.
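
The shape of that delegation, as a hedged sketch (the tool-call dict is hypothetical glue, not any particular vendor's API):

    # The model decides THAT a calculation is needed and emits the
    # expression; a deterministic evaluator produces the actual number.

    def calculator(expression):
        """Plain arithmetic; no token prediction involved."""
        if not set(expression) <= set("0123456789+-*/(). "):
            raise ValueError("unsupported expression: %r" % expression)
        return eval(expression, {"__builtins__": {}}, {})

    # A hypothetical model turn, expressed as a tool call rather than
    # as predicted digits:
    tool_call = {"tool": "calculator", "input": "1234 * 5678"}
    print(calculator(tool_call["input"]))  # 7006652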

afro88 3 days ago

If a script were applied that corrected "bad math" and now the LLM could solve complex math problems that you can't one-shot throw at a calculator, what would you call it?
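
For the sake of argument, here's what that script-plus-LLM loop could look like; llm_solve is a hypothetical stand-in for the model, and sympy plays the correcting script:

    # "Corrected bad math" as a generate-and-verify loop.
    import sympy

    x = sympy.symbols("x")
    equation = sympy.Eq(x**2 - 5*x + 6, 0)

    def llm_solve(eq):
        """Stand-in for the model's guessed solutions (may be wrong)."""
        return [2, 4]  # one right, one wrong

    verified = [c for c in llm_solve(equation) if equation.subs(x, c)]
    print(verified)  # [2] -- the wrong guess was filtered out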

sixfiveotwo 3 days ago

It's a good point.

But this math analogy is not quite appropriate: there's abstract math and there's arithmetic. A good math practitioner (LLM or human) can be bad at arithmetic yet good at abstract reasoning; the latter doesn't (necessarily) require the former.

In chess, I don't think you can build a good strategy if it relies on illegal moves, because tactics and strategy are tied together.

danparsonson 3 days ago

If I had wings, I'd be a bird.

Applying a corrective script to weed out bad answers is also not "one-shot" solving anything, so I would call your example an elaborate guessing machine. That doesn't mean it's not useful, but it's not how a human being does maths when they understand what they're doing; in fact, you can readily program a computer to solve general maths problems correctly the first time. This is also exactly the problem with saying that LLMs can write software: a series of elaborate guesses is undeniably useful and impressive, but without a corrective guiding hand it is ultimately useless, and it does not demonstrate generalised understanding of the problem space. The dream of AI is surely that the corrective hand becomes unnecessary?
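
For illustration, the "correctly the first time" route via a computer algebra system such as sympy, which derives answers instead of guessing:

    # A computer algebra system solves symbolically: no guess-and-check.
    import sympy

    x = sympy.symbols("x")
    print(sympy.solve(x**2 - 2, x))                           # [-sqrt(2), sqrt(2)]
    print(sympy.diff(x * sympy.sin(x), x))                    # x*cos(x) + sin(x)
    print(sympy.integrate(sympy.exp(-x), (x, 0, sympy.oo)))   # 1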

at_a_remove 3 days ago

Then you could replace the LLM with a much cheaper RNG and let it guess until the "bad math filter" lets something through.
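
Taken to its toy extreme (and assuming the filter is a perfect oracle, which is exactly the part that doesn't exist):

    # "RNG plus a perfect bad-math filter" as rejection sampling.
    import random

    def bad_math_filter(question, answer):
        """The posited oracle: accepts only the correct answer."""
        return answer == eval(question)  # toy stand-in for 'just knows'

    def rng_mathematician(question):
        attempts = 0
        while True:
            attempts += 1
            guess = random.randint(0, 1000)
            if bad_math_filter(question, guess):
                return guess, attempts

    print(rng_mathematician("12 * 34"))  # (408, ~1000 attempts on average)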

I was once asked by one of the Clueless Admin types if we couldn't just "fix" various sites such that people couldn't input anything wrong. Same principle.

vunderba 3 days ago

Agreed. It's not the same thing and we should strive for precision (LLMs are already opaque enough as it is).

An LLM that recognizes an input as "math" and calls out to a non-LLM to solve the problem, and an LLM that recognizes an input as "math" and also uses next-token prediction to produce an accurate response, ARE DIFFERENT.

henryfjordan 3 days ago

At what point does "knows how to use a calculator" equate to knowing how to do math? Feels pretty close to me...

Tepix 3 days ago

Well, LLMs are bad at math but they're ok at detecting math and delegating it to a calculator program.

It's kind of like humans.

kcbanner 3 days ago

It would be possible to employ an expert doctor, instead of writing a script.

ben_w 3 days ago

Which is cheaper:

1. having a human expert create every answer

or

2. having an expert check 10 answers, each of which has a 90% chance of being right, and then manually redoing the one that was wrong (back-of-envelope arithmetic below)

Now add the complications that:

• option 1 also isn't 100% correct

• nobody knows which errors in option 2 are correlated with each other, or whether they're correlated with human errors, so we might be systematically unable to even recognise them

• even if we could, humans not only get lazy without practice but also get bored if the work is too easy, so a short-term study of efficiency changes doesn't tell you things like "after 2 years you get mass resignations from the competent doctors, while the incompetent ones just say 'LGTM' to all the AI answers"
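
Back-of-envelope, with made-up costs (only the 90% figure comes from option 2 above; the minute counts are assumptions):

    # Toy cost comparison for options 1 and 2; all numbers illustrative.
    WRITE_COST = 10.0  # expert-minutes to author an answer (assumed)
    CHECK_COST = 2.0   # expert-minutes to verify an AI answer (assumed)
    P_CORRECT = 0.9    # per-answer chance the AI is right (from option 2)
    N = 10

    option1 = N * WRITE_COST                                     # author all
    option2 = N * CHECK_COST + N * (1 - P_CORRECT) * WRITE_COST  # check + redo
    print(option1, option2)  # 100.0 vs 30.0 expert-minutes

    # The complications above are why this flatters option 2: correlated
    # errors lower the effective P_CORRECT, and bored checkers raise the
    # real CHECK_COST over time.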