cs702 3 days ago

The paper's title is too clickbaitish for my taste, but its subject is important:

How should we rethink query interfaces, query processing techniques, and long- and short-term data stores so they can handle the much greater volume of agentic queries we will likely see in the coming years, whether we want it or not, if people and organizations continue to adopt AI systems for more and more tasks?

The authors study the characteristics of the agentic queries they identify (scale, heterogeneity, redundancy, and steerability) and outline several research opportunities for an agent-first data systems architecture, ranging from new query interfaces to new query processing techniques to new agentic memory stores.

andai 3 days ago

The issue we have is that websites (including small websites) are getting hammered by bots. Apparently ChatGPT makes 2000 HTTP requests per web search.

I think the real problem here is answering the user's question: there's currently no way to intelligently get information out of the internet. (I assume Google is building one, but apparently it hasn't shipped it yet, and even if it had, it's not what OpenAI would use.)

Hammering every WordPress site with endless queries every time someone asks a question seems like the wrong solution to the problem. I'm not sure what the right solution looks like.

I got an 80% solution in about ten lines of Python by doing "just Google it, then look at the top 10 search results" (i.e., dump them into GPT). That works surprisingly well, although the top results are increasingly AI generated.
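
Something like this, as a minimal sketch of that approach (the SerpAPI search endpoint, the gpt-4o-mini model name, and the crude 4000-character truncation are assumptions for illustration, not necessarily what I used):

    import requests
    from openai import OpenAI

    def answer(question: str, serpapi_key: str) -> str:
        # 1. Web-search the question and keep the top 10 organic results.
        search = requests.get(
            "https://serpapi.com/search",
            params={"engine": "google", "q": question, "api_key": serpapi_key, "num": 10},
            timeout=30,
        ).json()
        links = [r["link"] for r in search.get("organic_results", [])[:10]]

        # 2. Fetch each page; crude truncation instead of real HTML parsing.
        pages = []
        for url in links:
            try:
                pages.append(requests.get(url, timeout=10).text[:4000])
            except requests.RequestException:
                continue

        # 3. Dump everything into the model and ask for an answer grounded in the pages.
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        prompt = (
            "Answer the question using only the sources below.\n\n"
            f"Question: {question}\n\nSources:\n" + "\n---\n".join(pages)
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content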

I had a funny experience when Bard (Gemini's original name) first came out. I asked it a question and it gave me the precise opposite of the truth (the truth, but negated). It even cited sources. Both sources were AI blogspam. That still makes me laugh.

yunohn 3 days ago

> Apparently ChatGPT makes 2000 HTTP requests per web search.

Can you source that claim? It sounds absolutely ridiculous and costly/wasteful. It would be nigh impossible to ingest thousands of webpages into a single chat.

andai 3 days ago

It turned out I remembered the number incorrectly. It was actually 5000 HTTP requests!

https://news.ycombinator.com/item?id=42726827

However, upon further investigation, this turned out to be a special case triggered by a security researcher, not the normal mode of operation.

yunohn 2 days ago

If one reads the security advisory, the researcher's claim is that a particular API endpoint would accept a list of URLs without deduplicating them, so they were able to send 5000 URLs to it. Nothing more sophisticated than that.
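
In other words, the amplification comes from the endpoint fetching every submitted URL as-is. A toy sketch of the bug and the obvious fix (illustrative only, not OpenAI's actual code):

    def fetch_all(urls, dedupe=True):
        # dict.fromkeys drops repeats while preserving order, so 5000 copies of
        # one URL collapse into a single outbound request; without it, the
        # crawler hits the target once per copy.
        targets = list(dict.fromkeys(urls)) if dedupe else urls
        for url in targets:
            pass  # the actual HTTP fetch of each url would go here
        return len(targets)

    payload = ["https://victim.example/page"] * 5000  # attacker repeats one URL
    assert fetch_all(payload, dedupe=False) == 5000   # no dedup: 5000 requests
    assert fetch_all(payload, dedupe=True) == 1       # dedup: one request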

croes 3 days ago

Isn’t it bad to tailor the data to a specific type of AI?

That could hinder other, and maybe better, approaches.

cs702 3 days ago

That's why my comment was conditional (emphasizing the "if" here, for clarity): "... if people and organizations continue to adopt AI systems for more and more tasks".

If people and organizations don't do that, the research evidently becomes pointless.

lyu07282 3 days ago

It sounded to me like that's not what they're doing; it's more about making your existing data accessible via *hand-waving* "agentic" architectures (= an unimaginably inefficient burning of tokens per second). It's all nonsense, if you ask me.