XCSme 9 days ago

I know there was a downloadable version of Wikipedia (not that large). Maybe soon we'll have a lot of data stored locally and expose it via MCP, then the AIs can do "web search" locally.

I think 99% of web searches lead to the same 100-1k websites. I assume a local copy of those would only be a few GBs, though this raises copyright concerns.
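A minimal sketch of what "web search done locally" could look like, using SQLite's built-in FTS5 full-text index over cached page text (no network needed). The pages here are placeholders standing in for a local Wikipedia/site dump, not real data:

```python
# Toy local "web search": index cached pages with SQLite FTS5, query with BM25.
# The page contents below are illustrative placeholders, not a real dump.
import sqlite3

def build_index(pages):
    """Index a {title: text} dict into an in-memory FTS5 table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
    db.executemany("INSERT INTO docs VALUES (?, ?)", pages.items())
    return db

def local_search(db, query, k=3):
    """Return the k best-matching page titles, ranked by FTS5's BM25 rank."""
    rows = db.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    )
    return [title for (title,) in rows]

pages = {
    "HTTP": "HTTP is an application-layer protocol for the web.",
    "DNS": "DNS translates domain names into IP addresses.",
    "TLS": "TLS encrypts traffic between clients and servers.",
}
db = build_index(pages)
print(local_search(db, "domain names"))  # should surface the DNS page
```

An MCP server exposing this as a "search" tool would just wrap `local_search` behind the protocol; the index itself fits comfortably on disk for a few GBs of pages.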

Aurornis 8 days ago | parent

The mostly static knowledge content from sites like Wikipedia is already well represented in LLMs.

LLMs call out to external websites when something isn’t commonly represented in training data, like specific project documentation or news events.

XCSme 8 days ago | parent

That's true, but the data is only approximately represented in the weights.

Maybe it's better to have the AI only "reason", and fetch precise data on demand.

stirfish 5 days ago | parent | next

Is this Retrieval Augmented Generation, or something different?

XCSme 5 days ago | parent

Yes, RAG, but with the model specifically optimized for RAG.
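For context, the RAG loop being discussed looks roughly like this: retrieve exact passages from a local store, then hand them to the model so facts come from documents rather than from the weights. This is a hedged sketch; `call_model` is a stand-in, not a real API, and the retriever is a toy word-overlap ranker:

```python
# Sketch of retrieval-augmented generation: retrieve, then generate.
# `call_model` is a placeholder for any LLM call; the store is toy data.
def retrieve(store, question, k=2):
    """Toy lexical retriever: rank passages by word overlap with the question."""
    q = set(question.lower().split())
    scored = sorted(store, key=lambda p: -len(q & set(p.lower().split())))
    return scored[:k]

def answer(store, question, call_model):
    passages = retrieve(store, question)
    prompt = "Answer using only these passages:\n"
    prompt += "\n".join(f"- {p}" for p in passages)
    prompt += f"\nQuestion: {question}"
    return call_model(prompt)

store = [
    "Paris is the capital of France.",
    "The Danube flows through Vienna.",
    "Mount Everest is the highest mountain.",
]
# Stub "model" that echoes the top retrieved passage, to show the data flow.
print(answer(store, "What is the capital of France?",
             lambda prompt: prompt.splitlines()[1]))
```

A model "optimized for RAG" would be trained to lean on the passages in the prompt rather than on memorized approximations in its weights.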

adsharma 8 days ago | parent | prev

What use cases will gain from this architecture?

XCSme 8 days ago | parent

Data processing, tool calling, agentic use. Those are also the main use cases outside "chatting".