How is that different from saying that today’s models are actually usable for non-trivial things and more capable than yesterday’s, and that tomorrow’s models will probably be more capable than today’s?

For example, I dismissed AI three years ago because it couldn’t do anything I needed it to. Today I use it for certain things and it’s not quite capable of other things. Tomorrow it might be capable of a lot more.

Yes, priors have to be updated when the ground truth changes, and the capabilities of AI change rapidly. Chess went the same way: engines on supercomputers became competitive in the 90s, then hybrid systems became the competitive leading edge, and then machines took over for good and never looked back.

Eggpants 4 days ago

It’s not that the LLMs are better; it’s that the internal tools/functions being called, which do the actual work, are better. They didn’t spend millions retraining a model to statistically output the number of r’s in strawberry; they just offloaded that trivial question to a function call.

So I would say the overall service provided is better than it was, thanks to functions being built based on user queries, but the actual LLMs themselves are not.
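
For concreteness, here is a minimal sketch of the kind of offloading described above. Everything in it is an assumption made for illustration: the `TOOLS` registry, the `dispatch` helper, and the JSON shape of the call are hypothetical and not any vendor's actual API. It only shows the general idea of a model emitting a structured call while ordinary code does the counting.

```python
import json


def count_letter(word: str, letter: str) -> int:
    """Deterministically count occurrences of a letter, ignoring case."""
    return word.lower().count(letter.lower())


# Hypothetical registry mapping tool names to plain Python functions.
TOOLS = {"count_letter": count_letter}


def dispatch(tool_call_json: str) -> int:
    """Run the tool named in a JSON tool call (format assumed for illustration)."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])


# Assumed shape of what a model might emit instead of answering directly.
model_output = (
    '{"name": "count_letter", '
    '"arguments": {"word": "strawberry", "letter": "r"}}'
)
result = dispatch(model_output)
print(result)
```

Note that even in this sketch the model still has to decide to call the tool and fill in the arguments correctly, so it does not settle how much of the improvement comes from the tool versus the model.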

	vlovich123 4 days ago
		LLMs are definitely better at codegen today than 3 years ago. There are quantitative benchmarks, and, given the gaming of benchmarks that companies engage in, there is also my personal qualitative experience. It is also true that the tooling and context management have gotten more sophisticated (often using models, by the way). That doesn’t negate that the models themselves have gotten better at reliable tool calling, so that the LLM is driving more of the show rather than relying on purpose-built coordination around it, and that the codegen quality is higher than it used to be.
	int_19h 3 days ago
		This is a good example of making statements that are clearly not based in fact. Anyone who works with those models knows full well what a massive gap there is between e.g. GPT-3.5 and Opus 4.1, and that gap has nothing to do with the ability to use tools.