lemonish97 6 hours ago

What is your evidence for this claim?

fooker 6 hours ago

They say they're hill climbing:

https://microsoft.ai/news/building-a-hillclimbing-machine-la...

Unless they specifically clarify that the testing and training benchmarks are completely separate, we have to assume they test on the same 'hill' the model climbs.

artemisart 5 hours ago

Hill climbing doesn't mean much, and it certainly doesn't imply they cheat on benchmarks. They have more details here: https://microsoft.ai/news/introducing-mai-thinking-1/ It seems to be "RL on everything".
