lukev 10 hours ago

Or else it was trained on (or overfit to) the benchmarks. We won't really know until people have a chance to use it for real-world tasks.

Also, models are already pretty good, but product/market fit (in terms of demonstrated economic value delivered) remains elusive outside of a couple of domains. Does a model that's (say) 30% better reach an inflection point that changes that narrative, or is a more qualitative change required?