pjsousa79 5 hours ago

One thing that seems to be missing in most discussions about "context" is infrastructure.

The dream system for AI agents is probably something like a curated data hub: a place where datasets are continuously ingested, cleaned, structured and documented, so agents can query it to obtain reliable context.

Right now most agents spend a lot of effort stitching context together from random APIs, web scraping, PDFs, etc. The result is brittle and inconsistent.

If models become interchangeable, the real leverage might come from shared context layers that many agents can query.

sorobahn 5 hours ago | parent | next [-]

I'm currently working on building this layer. It's an even more interesting problem when you remove AI agents from the picture; I feel a context layer can be just as useful for humans and for deterministic programs. I view it as a data structure sitting on top of your entire domain, and this data structure's query interface plus some basic tools should be enough to bootstrap non-trivial agents, imo. I think the data structure best suited to this problem is a graph, with the different types of data each represented as graphs.
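A minimal sketch of what that might look like, assuming nothing about any real product (all class, node, and relation names below are illustrative): typed nodes, typed edges, and a single query interface that a human script, a deterministic program, or an agent tool could all call.

```python
from collections import defaultdict

class ContextGraph:
    """Toy graph-shaped context layer: typed nodes plus labeled edges."""

    def __init__(self):
        self.nodes = {}                 # node_id -> {"type": ..., **attrs}
        self.edges = defaultdict(list)  # node_id -> [(relation, node_id)]

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbors(self, node_id, relation=None):
        """The query interface: follow edges, optionally filtered by relation."""
        return [dst for rel, dst in self.edges[node_id]
                if relation is None or rel == relation]

g = ContextGraph()
g.add_node("cust:1", "Customer", name="Acme")
g.add_node("inv:7", "Invoice", total=120.0)
g.add_node("doc:3", "Contract")
g.add_edge("cust:1", "HAS_INVOICE", "inv:7")
g.add_edge("cust:1", "SIGNED", "doc:3")

print(g.neighbors("cust:1", "HAS_INVOICE"))  # ['inv:7']
```

The point of the single `neighbors`-style interface is that the same traversal primitive serves both a deterministic report generator and an agent's tool call.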

Stitching API calls together is analogous to representing relationships between entities, and that's ultimately why I think graph databases have a chance in this space. As any domain grows, the relationships usually grow at a higher rate than the nodes, so you want a query language that is optimized for traversing relationships between things. This is where the pattern-matching approach of ISO GQL (inspired by Cypher) is more token-efficient than SQL. The problem is that our foundation models have seen way, way more SQL, so there is a training gap, but I would bet that if the training data were equally abundant we'd see better performance on Cypher vs SQL.
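A rough illustration of the token-efficiency claim, using a hypothetical schema (users, follows, posts) and whitespace-splitting as a crude stand-in for an LLM tokenizer: the same two-hop question, "titles of posts written by people a given user follows", as a Cypher/GQL pattern match versus SQL joins.

```python
# Hypothetical two-hop query in both languages; the schema is invented
# for illustration and str.split() is only a crude proxy for tokenization.
CYPHER = """
MATCH (u:User {id: $id})-[:FOLLOWS]->(:User)-[:WROTE]->(p:Post)
RETURN p.title
"""

SQL = """
SELECT p.title
FROM users u
JOIN follows f ON f.follower_id = u.id
JOIN users u2 ON u2.id = f.followee_id
JOIN posts p ON p.author_id = u2.id
WHERE u.id = :id
"""

# The Cypher pattern encodes each relationship hop inline, so the
# whitespace token count is far lower than the equivalent join chain.
print(len(CYPHER.split()), len(SQL.split()))
```

The gap widens with each additional hop: Cypher adds one `-[:REL]->` segment per hop, while SQL adds a full `JOIN ... ON ...` clause.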

I know there are GraphRAG and hybrid approaches involving vector embeddings and graph embeddings, but maybe we also need to reduce API calls down to semantic graph queries on their respective domains, so we just have one giant graph we can scavenge for context.
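A sketch of that "one giant graph" idea under invented assumptions (the domain names, relations, and IDs below are all hypothetical): instead of calling a CRM API and then a billing API and stitching the results, both domains are loaded as edge lists into a single graph, and the answer falls out of one two-hop traversal.

```python
from collections import defaultdict

# Edge lists standing in for two separate API domains.
crm_edges = [("cust:1", "OWNS_ACCOUNT", "acct:9")]    # would come from a CRM API
billing_edges = [("acct:9", "HAS_INVOICE", "inv:7")]  # would come from a billing API

# Merge both domains into one adjacency map.
graph = defaultdict(list)
for src, rel, dst in crm_edges + billing_edges:
    graph[src].append((rel, dst))

def two_hop(start, rel1, rel2):
    """One semantic graph query replacing two sequential API calls."""
    return [d2 for r1, d1 in graph[start] if r1 == rel1
               for r2, d2 in graph[d1] if r2 == rel2]

print(two_hop("cust:1", "OWNS_ACCOUNT", "HAS_INVOICE"))  # ['inv:7']
```

The cross-domain hop (`cust:1` → `acct:9` → `inv:7`) is exactly the kind of relationship that API stitching reconstructs by hand on every request.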

dworks 2 hours ago | parent | prev [-]

Data should not be ingested. Data should originate from the same environment that you want to activate it in. That means you need to build a system from the ground up for your searches, your document creation, etc., so that this data is native to your system and can then be easily referenced in your commands to the LLM interface.

The best examples of this are probably CrewAI and Alibaba CoPaw. CoPaw has a demo up.