Eggpants 4 days ago
It’s not that the LLMs are better; it’s that the internal tools/functions being called to do the actual work are better. They didn’t spend millions retraining a model to statistically output the number of r’s in strawberry; they just offloaded that trivial question to a function call. So I would say the overall service provided is better than it was, thanks to functions being built based on user queries, but the LLMs themselves are not.
vlovich123 4 days ago
LLMs are definitely better at codegen today than they were 3 years ago: there are quantitative benchmarks and, given the benchmark gaming that companies engage in, there is also my own qualitative experience. It is also true that the tooling and context management have gotten more sophisticated (often by using models, by the way). That doesn’t negate that the models themselves have gotten better at reliable tool calling, so that the LLM is driving more of the show rather than relying on coordination purpose-built around it, and that the codegen quality is higher than it used to be.
int_19h 3 days ago
This is a good example of making statements that are clearly not based in fact. Anyone who works with those models knows full well what a massive gap there is between, e.g., GPT 3.5 and Opus 4.1, a gap that has nothing to do with the ability to use tools.