Remix.run Logo
da25 2 days ago

AI systems still struggle with hallucination, especially when your intent of the query is to obtain the latest information. These models have become very good at handling stored information, but their ability to stay updated remains slow. There is a large volume of queries whose responses and underlying concepts don't change much, and the current AI systems, through innovation and user-friendly interfaces, has managed to cast a wide net over such pre-learned responses.

However, unless these AI base model businesses create strong enough incentives for information owners to provide updates and engage in information pipelines that provide fresh data, the current moat of AI might not extend far enough to cover the full spectrum of queries.

The “fossil fuel” of LLMs-static public internet data-is running out.

Current efforts in RL help systems answer queries beyond their pre-learned knowledge by expanding on the user’s prompt, wherein the system ventures into unknown data territories through agents and self-talk, resulting in action-result memories. These, in turn, serve as a large enough context-rich prompt to have all the needles-in-hay-stack that form the final answer or response. This is made possible by large context windows.

For live internet queries, RL can work by expanding context with latest results fetched from the public web using a crawler. However, this is often done without the explicit consent from information providers and offers them little incentive beyond a link in the AI’s response summary. As a result, some providers have started withholding data, and many services now offer tools to block AI crawlers. Meanwhile, multimodal AI systems-capable of understanding text, visuals, and audio-are developing agents that can access content through simulated browser use, effectively bypassing traditional crawler firewalls.

This reality highlights the need for a good incentive system for information providers, one that encourages them to share dense, efficiently and ai-structured data. Certain domains have already begun embracing this and sharing their information in ai-native formats, since they have no moat in that information and rather see positive incentives - for example, certain documentation websites for tools and frameworks now provide formatted versions of their docs at /LLMs.txt links.

If the information is the resource exchanged on these internet pathways, businesses fundamentally operate either by generating this resource or by retrieving it once it exists, and the other businesses enable this whole endeavour. Ultimately, individuals and organizations will, seek, share and exchange information in ways that enables them to efficiently take decisions and their next actions. Therefore, the incentive to access the most up-to-date information becomes critical when those actions depend on accuracy and timeliness.