Gigachad 2 hours ago

At work the conversation is that simultaneously everyone is using LLMs now, yet we receive virtually no traffic through them. The LLMs scrape our data, provide an answer to the user, and we see nothing from it.

Barbing an hour ago | parent | next [-]

How often are they scraping?

Also generally wondering… Do labs view scraping as legally safer than trying to cache the Internet? I figure it’s easy to mark certain content as all but evergreen (can do a quick secondary check for possible new news).

Maybe caching everything is too expensive?

jrmg 2 hours ago | parent | prev [-]

I have the same worry about LLMs in general. I know ‘model collapse’ seems to be an unfashionable idea, but when the internet’s just full of garbage (soon?), what are we going to train these things on?