XCSme 9 days ago
I know there was a downloadable version of Wikipedia (not that large). Maybe soon we'll have a lot of data stored locally and expose it via MCP, then the AIs can do "web search" locally. I think 99% of web searches lead to the same 100-1k websites. I assume a local copy of those would only be a few GBs, though it raises copyright concerns.
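(A minimal sketch of what that local-search-over-MCP idea could look like, assuming the official Python `mcp` SDK and a hypothetical SQLite FTS5 index at pages.db with columns url, title, body standing in for the local copy of those sites; the schema, filename, and tool name are all illustrative, not anything specified here:)

    # Local "web search" exposed as an MCP tool.
    # Assumes: pip install mcp, and a pre-built FTS5 table
    # pages(url, title, body) in pages.db (hypothetical).
    import sqlite3
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local-search")

    @mcp.tool()
    def search(query: str, limit: int = 5) -> list[dict]:
        """Full-text search over the locally stored page corpus."""
        con = sqlite3.connect("pages.db")
        rows = con.execute(
            # snippet() returns a short excerpt around the match
            # from column 2 (body), capped at 20 tokens.
            "SELECT url, title, snippet(pages, 2, '[', ']', '…', 20) "
            "FROM pages WHERE pages MATCH ? LIMIT ?",
            (query, limit),
        ).fetchall()
        con.close()
        return [{"url": u, "title": t, "snippet": s} for u, t, s in rows]

    if __name__ == "__main__":
        mcp.run()  # serves over stdio; point an MCP client at this script

Any MCP-capable client could then call the search tool instead of hitting a live search engine.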
Aurornis 8 days ago
The mostly static knowledge content from sites like Wikipedia is already well represented in LLMs. LLMs call out to external websites when something isn't commonly represented in training data, like specific project documentation or news events.