tripzilch 3 days ago

> I don't see the plateauing in capabilities. LLMs are plateauing only in benchmarks

Don't you mean the opposite? Like, it beat the IMO, which is a benchmark, but it's nowhere near having even the basic mathematical capabilities you'd expect from someone who beat the IMO.

Like being unable to deal with negations ... or getting confused when a question is written in something other than its native alphabet ...