Timon3 16 hours ago
> Reddit and Twitter didn't restrict their API use because of LLMs. Meta haven't locked down Instagram because of LLMs. They do it because they need people locked into their ecosystem.

Yet the recent wave of API & public site lockdowns was mostly kicked off when Musk took over Twitter, and he publicly stated that a big reason was the use of the data for AI training. Similarly, platforms like Reddit have started selling access to that data for the same purpose.

> LLMs are just the latest way to scrape data, but the practice isn't new. Search engines did it before.

LLMs aren't used to scrape data, they're trained on that scraped data. When search engines did it, it was useful for the sites, since it led people to them. With LLMs, people no longer have to visit the sites, which is why the platforms want to monetize their data directly.

> You're misremembering. Literally the only reason I have a Facebook account is because I needed to check someone's profile and couldn't without signing up. This was back in the early to mid 00s (I can't recall exactly when, but it was long before Facebook was a household name. Back when MySpace was still cool and before Twitter was launched)

It's a bit ridiculous to tell me I'm misremembering when you're talking about a different feature. Yes, to look at most profile data you needed (need?) to be logged in. But you could view public posts without logging in as long as you had the link. I used to do that for various types of communities, explicitly after I'd deleted my Facebook account.

> Giving people free and anonymous access isn't profitable. It wasn't before and it still isn't now. AI hasn't changed that.

Literally most of the web is open, for free and anonymously, and is profitable due to ads & selling visitor data. This is changing because 1) people are no longer visiting the pages, they're instead asking LLM clients, and 2) free and anonymous access is getting harder due to sites getting hammered by crawlers for LLM training purposes. This has been in the news a lot over the last few months.
hnlmorg 15 hours ago
> Yet the recent wave of API & public site lockdowns was mostly kicked off when Musk took over Twitter, and he publicly stated that a big reason was the use of the data for AI training. Similarly, platforms like Reddit have started selling access to that data for the same purpose.

Exactly. LLMs aren't the cause of that change.

> LLMs aren't used to scrape data, they're trained on that scraped data.

Clearly I know that. My point wasn't that LLMs are literally scraping the sites. I was differentiating between the scraping that happened before LLMs and the scraping that happened after.

> When search engines did it, it was useful for the sites, since it led people to them. With LLMs, people no longer have to visit the sites, which is why the platforms want to monetize their data directly.

Actually, that's not always true. Search engines have included snippets from sites for years, and that's also been a well-discussed point of contention.

Then there's also Google's attempt to switch people to AMP to further lock people into Google's walled garden. I accept this isn't quite the same thing, but it's still an example of how search engines fight to prevent people from leaving their ecosystem.

Some sites, like MSN, literally host news articles from other sites on their own site too. I'm sure Microsoft has an agreement to do this, but it's yet another example of how companies try to lock visitors into their own site.

I accept the AMP and MSN examples are tangential, but they do still illustrate the point I'm making: it's not a new thing for platforms to use dark patterns to keep people from navigating away from their platform. This isn't something new that's happened in the last couple of years.

> It's a bit ridiculous to tell me I'm misremembering when you're talking about a different feature

Would you rather I just said you were citing falsehoods, like you accused me of? Also, I'm not talking about a different feature. I'm talking about the exact same stuff I was talking about in my original comment in this thread.

> Yes, to look at most profile data you needed (need?) to be logged in. But you could view public posts without logging in as long as you had the link. I used to do that for various types of communities, explicitly after I'd deleted my Facebook account.

So you agree that platforms have locked content down and that this isn't a recent phenomenon then ;)

Making the distinction between profile data and public comments is a little strained when it's clear that Facebook has invested heavily in their walled garden and the vast majority of content on Facebook has always been hidden behind it.

> Literally most of the web is open, for free and anonymously, and is profitable due to ads & selling visitor data.

Smaller sites make money from ads. But we were talking about big platforms like Facebook, Twitter and Instagram. Sites that make money from ads are just making small change compared to platforms.

> This is changing because 1) people are no longer visiting the pages, they're instead asking LLM clients, and 2) free and anonymous access is getting harder due to sites getting hammered by crawlers for LLM training purposes. This has been in the news a lot over the last few months.

This I do agree with. But that wasn't the statement that was originally made.

Those sites will either remain open or shut down entirely. They're not going to go private à la Twitter and Instagram. Their business model is entirely different, and they are often intentionally not run as a business in the first place. Some are just passion projects with no ads and/or run at a loss.

The part I was disagreeing with was the claim that the dark patterns seen in Instagram et al. are a result of the rise of LLMs. That simply isn't true.