NitpickLawyer 7 hours ago
> gpt-oss that games the benchmarks just for PR.

gpt-oss is killing the ongoing AIMO3 competition on Kaggle. They're using a hidden, new set of IMO-level problems, handcrafted to be "AI hardened". Submissions built on gpt-oss are at ~33/50 right now, two weeks into the competition. The benchmarks (at least for math) were not gamed at all. The models are really good at math.
lostmsu 6 hours ago
Are they ahead of all other recent open models? Is there a leaderboard?