Remix.run Logo
energy123 4 days ago

Basic prose is a saturated bench. You can't go above 100% so by definition progress will stall on such benchmarks.

mattw1810 4 days ago | parent [-]

All the same they choose to highlight basic prose (and internal knowledge, for that matter) in their marketing material.

They’ve achieved a lot to make recent models more reliable as a building block & more capable of things like math, but for LLMs, saturating prose is to a degree equivalent to saturating usefulness.