manmal 2 days ago

ChatGPT lists clickable sources for a lot of nontrivial queries. Those sites don’t even need to pay OpenAI for the traffic (yet). If you ask “what’s happening in the world today”, you might get 20 links. How is this worse, exactly?

croes 2 days ago | parent [-]

How many people click the links? What happens to LLMs if people don’t provide training data anymore because nobody visits their sites?

esnard 2 days ago | parent | next [-]

Cloudflare publishes a "crawl-to-refer" ratio, which can be used to estimate the traffic from LLMs:

https://radar.cloudflare.com/ai-insights#crawl-to-refer-rati...
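As a rough illustration of how such a ratio reads, here is a minimal sketch. It assumes the ratio is simply crawler page requests divided by visits referred back from the same platform. The function name and the counts are hypothetical, not Cloudflare's API or data.

```python
def crawl_to_refer_ratio(crawl_requests: int, referred_visits: int) -> float:
    """Pages crawled per visit referred back (assumed definition).

    A higher value means the platform takes more content per visitor it sends.
    """
    if referred_visits == 0:
        # No referrals at all: treat the ratio as unbounded.
        return float("inf")
    return crawl_requests / referred_visits


# Hypothetical counts for illustration only, not real measurements.
ratio = crawl_to_refer_ratio(crawl_requests=50_000, referred_visits=25)
print(f"{ratio:.0f}:1")
```

A site owner could compute this per crawler from their own access logs to see which platforms send the least traffic relative to what they fetch.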

robryan 2 days ago | parent | prev [-]

They will either pay for it to be generated or get good enough at producing synthetic data that actually improves LLM quality.

croes 2 days ago | parent [-]

So either even higher costs, or hoping that a big problem of LLMs gets solved somehow.

Given how much data they need, that will be pretty expensive, I mean really, really expensive. How many people can write good training data, and how much can each produce per day?

Doesn’t sound sustainable.