This has been my dream ever since. Instead of encoding "all the knowledge" into those parameters, how about just making a model of the same size where all (or rather most) of what it does is reasoning? Just give it the ability to browse the net (e.g. language specifications, documentation and best practices) and have it do its thing. Why does my coding agent need to know the population of New York, a cheesecake recipe or the typical lifespan of an ostrich? Just give it the bare minimum of knowledge to think and reason with, and let it figure out the rest.

Sadly that's not how LLMs work, since all they do is "token prediction". At least the models we have today ...

dandaka 3 hours ago

I think this is a well-known concept which we can't deliver yet. LLMs/transformers give us a reasoning engine as a byproduct of their design, but it is quite inefficient. If we can distill reasoning, if reasoning can be achieved without general knowledge, it will be a very efficient machine.

Some amount of knowledge is required for reasoning. Maybe such a model can dynamically load knowledge domains, organized by some taxonomy. For example, a model can't effectively reason about a development task if it has no knowledge of development best practices. But the population of New York or recipes can definitely be loaded at run time with tools.

	XCSme 10 minutes ago
		Yup, you still need knowledge. Even if you have access to all the data and tools, you still need to know what to search for, what tools to use, and how to understand what the user is asking. Our computers can already do everything and have access to all the tools and information, yet they still need a human/intelligence to use them and apply them to specific problems. Even defining the problem requires knowledge. As for the tools, if the model has access to 1000 tools, how would it know which one to use if it doesn't have any knowledge itself? What if I ask about "table tennis spin" and it has a "Magnus effect calculator"? How would it know to make the connection between the two?
	sigmoid10 an hour ago
		> Some amount of knowledge is required for reasoning.
		This is the root of the problem. If you think about STEM universities, they don't really teach you the things you need in the real world. They teach you what you need to know in order to go out there and accumulate the necessary information, which can then be used to solve problems. Giving a person access to the internet or a super powerful calculator (like Mathematica) doesn't mean that they can do anything useful. They need tons of experience to use these tools in an effective way. That experience is basically all the implicit adjacent knowledge that we pick up along the way while getting our degrees. And LLMs pick that up during pre-training. Drop this part and the outcome will be worthless.

athrowaway3z 5 hours ago

This is me vibe-splaining something I don't know a lot about, but I doubt there is such a thing.

If "all the knowledge" is what our models do now, what exactly would the most extreme "none of the knowledge + search" look like?

> language specifications.

It would load in all the knowledge to figure out what "language" means, then it would continue by trying to decode what "specifications" means.

That might sound absurd, but to figure out the population of New York, it's either just going to google it or derive it from primary sources.

But how is it ever going to interpret the primary sources? It needs to understand the question, how complex the question is, how complete an answer is, and how things relate. That's just _too_ much language.

There might be a way to compact this down into an LLM-native language, such that a request like `the population of New York` or `use best practices` is encoded without our messy human language for a reasoning model to work with, but the encoding itself has to be done by the "all the knowledge" LLM. Now it seems we've just rebuilt something related to MoE with extra steps, afaict.

3eb7988a1663 7 hours ago

It would also reduce training costs to nothing. Current methodology requires continual retraining to scoop up new facts. If you can do a one time "this is how to think" - that could conceptually work forever, just plug in a new database layer that can be queried as required.

	fjsoxjdnwk an hour ago
		But isn’t that what “training” is anyway? They train LLMs like that today, and the database becomes the parameters. You can post-train on a smaller corpus for purpose-built stuff.

tomaskafka 5 hours ago

Education had this sad 15-year period where it thought “competences” were all you need.

Turns out that without world knowledge as a base of facts, they are not.

	gmac 2 hours ago
		Basically: you can't teach people to think without giving them some facts and ideas to think with. It's like trying to teach woodworking without giving the students any wood.

dminik 4 hours ago

I mean, this really doesn't sound useful even if LLMs worked that way.

First, if you know nothing you don't even know what you're missing or what to search for.

Then, without unlimited context, you have to do research for every task all over again every time.

regularfry 4 hours ago

> First, if you know nothing you don't even know what you're missing or what to search for.

RAG on the initial prompt would be the first thing to try.

> Then, without unlimited context, you have to do research for every task all over again every time.

Thing is, we're really really good at building very fast search engines. Doing research all over again every time shouldn't be a problem.
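
As a rough illustration of what "RAG on the initial prompt" could mean, here is a minimal sketch. Everything in it is an assumption made for the example: the `DOCS` store, the `retrieve` and `build_context` helpers, and the word-overlap scoring, which stands in for a real search engine or embedding index.

```python
import re
from collections import Counter

# Hypothetical stand-in for an external knowledge store (not a real corpus).
DOCS = {
    "python-style": "Python best practices: follow PEP 8 naming and keep functions small.",
    "rust-ownership": "Rust language specification notes: ownership, borrowing and lifetimes.",
    "nyc-population": "New York City population figures from the most recent census.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(prompt, docs, k=1):
    """Return up to k doc ids ranked by word overlap with the prompt."""
    query = Counter(tokenize(prompt))
    scored = []
    for doc_id, text in docs.items():
        # Multiset intersection counts the words shared by prompt and doc.
        overlap = sum((query & Counter(tokenize(text))).values())
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by doc id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]


def build_context(prompt, docs, k=1):
    """Prepend the retrieved snippets to the prompt before any reasoning step."""
    hits = retrieve(prompt, docs, k)
    snippets = "\n".join(docs[h] for h in hits)
    return f"Context:\n{snippets}\n\nTask:\n{prompt}"


if __name__ == "__main__":
    print(build_context("What is the population of New York?", DOCS))
    print(retrieve("table tennis spin", DOCS))
```

With these toy documents, the New York prompt pulls in the census snippet, while "table tennis spin" retrieves nothing because no stored text shares a word with it. That second case is the quoted objection in miniature: purely lexical retrieval only helps when the prompt already contains the right words.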

vitro 3 hours ago

Couldn't you build some internal knowledge that would stay, and teach a model this way? A very fast local memory of some sort. You could also specialize a model this way so it is very skilled in your domain. The more you use it, the smarter it gets. I guess the problem is for the model to decide whether the information stored in memory is sufficient or not.

	regularfry 2 hours ago
		You could, but it's driving in the wrong direction to try to build that knowledge into the model weights because you'll always run into a capacity limit sooner with a small model than with a larger one. The thing the model is specialised for is linguistic understanding and the reasoning process itself, and you max that out at the expense of domain-specific knowledge. If you take "as few weights as possible" as a given, I think the interesting question is how small you can make the model with externalised memory. The openclaw and hermes people are all over this sort of memory problem: using the local filesystem or a local database of some sort is exactly a "very fast local memory" where the more you use it, the more knowledge it gathers. Whether that translates to it being "smarter" is a deeper question than it looks.
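
To make the "local filesystem as very fast local memory" idea concrete, here is a minimal sketch. It is not how any particular project implements memory; the `FileMemory` class, its method names, and the JSON file layout are all hypothetical, chosen only to show knowledge accumulating outside the model weights.

```python
import json
from pathlib import Path


class FileMemory:
    """Hypothetical key-value memory persisted as a JSON file on local disk."""

    def __init__(self, path):
        self.path = Path(path)
        # Load earlier facts if the file exists, otherwise start empty.
        if self.path.exists():
            self.facts = json.loads(self.path.read_text())
        else:
            self.facts = {}

    def remember(self, key, value):
        """Store a fact and write the whole memory back to disk."""
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts, indent=2))

    def recall(self, key):
        """Return a stored fact, or None if nothing is known about the key."""
        return self.facts.get(key)


if __name__ == "__main__":
    memory = FileMemory("agent_memory.json")
    memory.remember("build_cmd", "make test")
    # A later session reads the same file and starts with the fact available.
    print(FileMemory("agent_memory.json").recall("build_cmd"))
```

A `recall` that returns `None` is the only signal that research is needed. Deciding whether what is stored is sufficient is left unsolved here, which is the problem raised in the parent comment.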

scotty79 3 hours ago

The model they built knows a fair bit apparently. You can't get 94.3 on AIME26 knowing nothing.

LoganDark 2 hours ago

Reasoning alone can’t always predict all the bits of knowledge you’d need to solve a problem properly, the ones you would otherwise research when planning.