simianwords 2 days ago
“When researchers tested the same performance on a new set of benchmark questions, they noticed that models experienced ‘significant performance drops.’” This is very misleading, because the generalisation ability of LLMs is very high. They don’t just memorise problems; that’s nonsense. At high school level maths you genuinely can’t get gpt-5 thinking to make a single mistake. Not possible at all. Unless you give it some convoluted, ambiguous prompt that no human can understand. If you assume I’m correct, how does gpt memorise, then? In fact, even undergraduate level mathematics is quite simple for gpt-5 thinking. IMO gold was won... by what? Memorising solutions? I challenge people to find ONE example that gpt-5 thinking gets wrong in high school or undergrad level maths. I could not find one. You must allow all tools though.

YeGoblynQueenne a day ago
The best performance on GSM8K is currently at 0.973, so less than perfect [1]. Given that GSM8K is a grade school math question data set, and the leading LLMs still don't get all of its answers correct, it's safe to assume that they won't get all high school questions correct either, since those are going to be harder than grade school questions. This means there has got to be at least one example that GPT-5, as well as every other LLM, fails on [2]. If you don't think that's the case, I think it's up to you to show that it's not.
___________________
[1] GSM8K leaderboard: https://llm-stats.com/benchmarks/gsm8k
[2] This is regardless of what GSM8K or any other benchmark is measuring.

geoduck14 a day ago
> At high school level maths you genuinely can’t get gpt-5 thinking to make a single mistake. Not possible at all.

If you give an LLM an incomplete question, it will guess at an answer. They don't know what they don't know, and they are trained to guess.

autop0ietic a day ago
I would think GPT5 is great at high school level math, but what high school level math problems are not in the training data? I think the problem is that GPT5 is not "memorising", but conversely that doesn't automatically mean it is "reasoning". These are human attributes that we are trying to equate to machines, and it just causes confusion.

callmesnek a day ago
"You must allow all tools though"