mvdwoord | 4 days ago
Progress is hard to keep track of in this fast-paced environment, but aren't there already models that can attach external tools and simply offload parts of the reasoning to them? Maybe over MCP or some other mechanism, so the model can offload e.g. calculations, test code in a sandbox, or even write code to answer part of a question, execute it somewhere, and take the result back into the rest of the inference process as context. Or is there a more subtle issue that prevents this or makes it hard? Is there something fundamentally impossible about having a model recognize that counting the Rs in 'strawberry' is a string-search operation and execute something like this in a sandbox:

    % echo "strawberry" | tr -dc "r" | wc -c
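A minimal sketch of that loop, so the dispatch pattern is visible end to end. fake_model and run_shell here are hypothetical stand-ins, not any real provider's API; only json and subprocess are real stdlib, and a real harness would use actual sandboxing rather than a bare shell:

    import json
    import subprocess

    def fake_model(messages):
        """Stub LLM: decides the question is a string-search task and
        emits a tool call instead of answering from token statistics."""
        if messages[-1]["role"] == "tool":
            # Tool result is in context; compose the final answer from it.
            return {"role": "assistant",
                    "content": f"There are {messages[-1]['content']} r's in 'strawberry'."}
        return {"role": "assistant",
                "tool_call": {"name": "run_shell",
                              "arguments": json.dumps(
                                  {"cmd": "echo strawberry | tr -dc r | wc -c"})}}

    def run_shell(cmd):
        """Execute the command in a shell (hypothetically sandboxed)
        and return its stdout."""
        out = subprocess.run(["sh", "-c", cmd],
                             capture_output=True, text=True, timeout=5)
        return out.stdout.strip()

    messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
    while True:
        reply = fake_model(messages)
        messages.append(reply)
        if "tool_call" not in reply:
            print(reply["content"])  # final answer, grounded in the tool result
            break
        args = json.loads(reply["tool_call"]["arguments"])
        # Feed the sandbox output back into context as a tool message.
        messages.append({"role": "tool", "content": run_shell(args["cmd"])})

In real systems the same shape shows up as OpenAI-style function calling or an MCP tool server; the plumbing is straightforward, and the harder part is getting the model to reliably recognize which questions should be routed out.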
Agentic setups seem to do this already, but regular GPT-style chat environments seem to lack it.
yunohn | 4 days ago
My observation of AI progress over the past two years is that LLM companies focus mainly on raw model capability rather than on optimised, usable tooling. I'm not sure when this will change, but that's why your example isn't the industry standard yet.