| ▲ | sigmar 3 days ago |
I don't think that analogy works unless you could write a script that automatically removes incorrect medical advice, because then you would indeed have an LLM-with-a-script that was an expert doctor. You can do that for illegal chess moves, but obviously not for evaluating medical advice.
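The chess half of that comparison really is scriptable, because move legality is mechanically decidable. A minimal sketch using the python-chess library (the helper function is illustrative, not from the thread):

```python
# Legality is mechanically checkable in chess, so a script can
# filter out illegal LLM moves. Uses the python-chess library.
import chess

def filter_illegal_moves(board: chess.Board, candidate_uci_moves: list[str]) -> list[str]:
    """Keep only the candidate moves that are legal in the current position."""
    legal = []
    for uci in candidate_uci_moves:
        try:
            move = chess.Move.from_uci(uci)
        except ValueError:
            continue  # not even syntactically valid UCI
        if move in board.legal_moves:
            legal.append(uci)
    return legal

board = chess.Board()  # starting position
print(filter_illegal_moves(board, ["e2e4", "e2e5", "banana"]))  # ['e2e4']
```

No equivalent `filter_incorrect_medical_advice` can be written, which is the asymmetry the comment is pointing at.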
|
| ▲ | wavemode 3 days ago | parent | next [-] |
You can write scripts that correct bad math, too. In fact, most of the time ChatGPT will just call out to a calculator function. This is a smart solution, and very useful for end users! But we still shouldn't use that to claim that LLMs have a good understanding of math.
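The delegate-to-a-calculator pattern is easy to sketch. This is a toy illustration, not how ChatGPT actually routes tool calls; `call_llm` is a hypothetical stand-in for a real model API:

```python
# Toy sketch of the "call out to a calculator" pattern: arithmetic
# is routed around the model, not through it.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expr: str) -> float:
    """Safely evaluate a pure-arithmetic expression like '12 * (3 + 4)'."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("not plain arithmetic")
    return ev(ast.parse(expr, mode="eval").body)

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return f"[LLM answer to: {prompt}]"

def answer(prompt: str) -> str:
    try:
        return str(calculator(prompt))   # arithmetic: bypass the LLM entirely
    except (ValueError, SyntaxError, ZeroDivisionError):
        return call_llm(prompt)          # everything else: let the model guess

print(answer("12 * (3 + 4)"))  # 84 - computed, not predicted token by token
```

The correct answers come out of `calculator`, not out of the model, which is the point being made.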
| |
| ▲ | afro88 3 days ago | parent | next [-] |
If a script were applied that corrected "bad math" and now the LLM could solve complex math problems that you can't one-shot throw at a calculator, what would you call it?
| ▲ | sixfiveotwo 3 days ago | parent | next [-] |
It's a good point, but this math analogy is not quite appropriate: there's abstract math and there's arithmetic. A good math practitioner (LLM or human) can be bad at arithmetic yet good at abstract reasoning; the latter doesn't (necessarily) require the former. In chess, I don't think you can build a good strategy if it relies on illegal moves, because tactics and strategy are tied together.
| ▲ | danparsonson 3 days ago | parent | prev | next [-] |
If I had wings, I'd be a bird. Applying a corrective script to weed out bad answers is also not "one-shot" solving anything, so I would call your example an elaborate guessing machine. That doesn't mean it's not useful, but that's not how a human being does maths when they understand what they're doing - in fact you can readily program a computer to solve general maths problems correctly the first time. This is also exactly the problem with saying that LLMs can write software: a series of elaborate guesses is undeniably useful and impressive, but without a corrective guiding hand ultimately useless, and not demonstrating generalised understanding of the problem space. The dream of AI is surely that the corrective hand is unnecessary?
| ▲ | at_a_remove 3 days ago | parent | prev [-] |
Then you could replace the LLM with a much cheaper RNG and let it guess until the "bad math filter" let something through. I was once asked by one of the Clueless Admin types if we couldn't just "fix" various sites such that people couldn't input anything wrong. Same principle.
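That generate-and-test point can be made concrete. A toy sketch (the problem and filter are invented for illustration): once the filter alone decides correctness, even a pure RNG "solves" the task, it just burns more guesses.

```python
# If correctness comes entirely from the filter, a random generator
# eventually passes it too - the generator's quality only changes
# how many guesses you burn, not the final answer.
import random

def bad_math_filter(problem, candidate) -> bool:
    """The verifier: checks a candidate answer against the problem."""
    a, b = problem
    return candidate == a + b

def rng_solver(problem, max_tries=1_000_000):
    for tries in range(1, max_tries + 1):
        guess = random.randint(-1000, 1000)
        if bad_math_filter(problem, guess):
            return guess, tries
    return None, max_tries

print(rng_solver((123, 456)))  # e.g. (579, 1083) - right answer, many guesses
```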
| |
| ▲ | vunderba 3 days ago | parent | prev | next [-] |
Agreed. It's not the same thing and we should strive for precision (LLMs are already opaque enough as it is). An LLM that recognizes an input as "math" and calls out to a NON-LLM to solve the problem vs an LLM that recognizes an input as "math" and also uses next-token prediction to produce an accurate response ARE DIFFERENT.
| ▲ | henryfjordan 3 days ago | parent | prev [-] |
At what point does "knows how to use a calculator" equate to knowing how to do math? Feels pretty close to me...
| ▲ | Tepix 3 days ago | parent [-] |
Well, LLMs are bad at math but they're ok at detecting math and delegating it to a calculator program. It's kind of like humans.
|
| ▲ | kcbanner 3 days ago | parent | prev [-] |
| It would be possible to employ an expert doctor, instead of writing a script. |
| |
| ▲ | ben_w 3 days ago | parent [-] |
Which is cheaper:

1. having a human expert create every answer, or

2. having an expert check 10 answers, each of which has a 90% chance of being right, and then manually redoing the one which was wrong?

Now add the complications that:

• option 1 also isn't 100% correct

• nobody knows which errors in option 2 are or aren't correlated with each other, or with human errors, so we might be systematically unable to even recognise them

• even if we could, humans not only get lazy without practice but also get bored if the work is too easy, so a short-term study of efficiency changes doesn't tell you things like "after 2 years you get mass resignations by the competent doctors, while the incompetent just say 'LGTM' to all the AI answers"
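That trade-off can be put in rough numbers. A back-of-the-envelope sketch with hypothetical unit costs (all invented for illustration; nothing here comes from the thread):

```python
# Back-of-the-envelope for the checking-vs-writing comparison,
# with made-up unit costs. Assumptions (all hypothetical): writing
# an answer from scratch costs 1.0 expert-hours, checking an AI
# answer costs 0.2, and 90% of AI answers are correct.
WRITE_COST = 1.0   # expert writes an answer from scratch
CHECK_COST = 0.2   # expert reviews one AI answer
P_CORRECT = 0.9    # chance a given AI answer is right
N = 10             # answers per batch

option_1 = N * WRITE_COST                                     # 10.0
option_2 = N * CHECK_COST + N * (1 - P_CORRECT) * WRITE_COST  # 2.0 + 1.0 = 3.0

print(option_1, option_2)  # 10.0 3.0
```

The catch is the silent assumption that the checker catches every wrong answer, which is exactly what the complications above call into question.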
|