I don’t see definitive evidence of any kind of Moore’s law for model improvement, though. Just because this year’s model performs better than last year’s doesn’t mean next year’s model will be another leap. Most of the big improvements this year seem to be in tooling; I still see Opus 4.6 (my daily driver at work) making lots of mistakes.

▲

nl 15 hours ago | parent [-]

Things like the METR benchmark aren't sufficient?

I mean, Moore's law is just a rule of thumb, but the curve fits METR just as well.

	▲	gzread 10 hours ago | parent [-]
		Was that the benchmark that showed developers think they're 20% faster with AI, but are 20% slower?