hnlmorg 2 days ago

People already knew the value of data long before LLMs were popularised and web scraping has been a thing since the very beginnings of the web.

What you’re describing isn’t a recent phenomenon. Not even remotely.

Facebook has never allowed people read only views to their platform. And Expert Stack Overflow like Quora used the same dark patterns you described too.

hnlmorg 2 days ago | parent [-]

Getting downvoted for stating a fact. Just goes to show how short some people’s memories are.

Timon3 a day ago | parent [-]

You're getting downvoted for stating falsehoods.

> What you’re describing isn’t a recent phenomenon. Not even remotely.

The big platforms were accessible without login a few years ago, now they're not. That is literally a recent phenomenon.

> Facebook has never allowed people read only views to their platform.

In the past, I've often looked at Facebook posts without logging in.

hnlmorg 17 hours ago | parent [-]

> You're getting downvoted for stating falsehoods.

I'm getting downvoted because people are either too young to remember the web in the 00s, or just misremembering what the web was like.

> The big platforms were accessible without login a few years ago, now they're not. That is literally a recent phenomenon.

I gave examples of big platforms that weren't accessible without a login. And modern platforms were also heading this way long before LLMs existed.

Reddit and Twitter didn't restrict their API use because of LLMs. Meta haven't locked down Instagram because of LLMs. They do it because they need people locked into their ecosystem. LLMs are just the latest way to scrape data, but the practice isn't new. Search engines did it before. And before then, it was just people leeching off other people's work. This is a tale as old as the web. And I remember it well, having been both a web developer and user of the web since 1994.

Let's also not forget all the attempts that Microsoft made to try and control the internet, and how AOL had their own walled gardens too. Yahoo had a plethora of cool features, most of which weren't available without a Yahoo account. And so on and so forth.

Walled gardens are not a recent phenomenon.

> In the past, I've often looked at Facebook posts without logging in.

You're misremembering. Literally the only reason I have a Facebook account is because I needed to check someone's profile and couldn't without signing up. This was back in the early to mid 00s (I can't recall exactly when, but it was long before Facebook was a household name. Back when MySpace was still cool and before Twitter was launched).

For example, see this archived page from Facebook. Notice how there's no way to advance without signing up? https://web.archive.org/web/20070630190243/https://register....

---

I know people want to blame AI for everything that goes wrong these days, but that simply isn't the reason that platforms lock down. They do it because that's how you make money. You either:

1. lock down and charge people for access

or

2. lock down and sell your user data

(or, depressingly too often, both)

Giving people free and anonymous access isn't profitable. It wasn't before and it still isn't now. AI hasn't changed that.

What AI has changed is the increase in invasive bot detection on sites that don't monetise anonymous access.

Timon3 16 hours ago | parent [-]

> Reddit and Twitter didn't restrict their API use because of LLMs. Meta haven't locked down Instagram because of LLMs. They do it because they need people locked into their ecosystem.

Yet the recent wave of API & public site lockdowns was mostly kicked off when Musk took over Twitter, and he publicly stated that a big reason was using the data for AI training. Similarly, platforms like Reddit have started selling access to that data for the same purpose.

> LLMs are just the latest way to scrape data, but the practice isn't new. Search engines did it before.

LLMs aren't used to scrape data, they're trained on that scraped data. When search engines did it, it was useful for the sites, since it led people to them. With LLMs, people no longer have to visit the sites, which is why the platforms want to monetize their data directly.

> You're misremembering. Literally the only reason I have a Facebook account is because I needed to check someone's profile and couldn't without signing up. This was back in the early to mid 00s (I can't recall exactly when, but it was long before Facebook was a household name. Back when MySpace was still cool and before Twitter was launched).

It's a bit ridiculous to tell me I'm misremembering when you're talking about a different feature. Yes, to look at most profile data you needed (need?) to be logged in. But you could view public posts without logging in as long as you had the link. I used to do that for various types of communities, specifically after I'd deleted my Facebook account.

> Giving people free and anonymous access isn't profitable. It wasn't before and it still isn't now. AI hasn't changed that.

Literally most of the web is open, for free and anonymously, and is profitable due to ads & selling visitor data. This is changing because 1) people are no longer visiting the pages, they're instead asking LLM clients, and 2) free and anonymous access is getting harder due to sites getting hammered by crawlers for LLM training purposes. This has been in the news a lot over the last few months.

hnlmorg 15 hours ago | parent [-]

> Yet the recent wave of API & public site lockdowns was mostly kicked off when Musk took over Twitter, and he publicly stated that a big reason was using the data for AI training. Similarly, platforms like Reddit have started selling access to that data for the same purpose.

Exactly. LLMs aren't the cause of that change.

> LLMs aren't used to scrape data, they're trained on that scraped data.

Clearly I know that. My point wasn't that LLMs are literally scraping the sites; I was distinguishing between the scraping that happened before LLMs and the scraping that has happened since.

> When search engines did it, it was useful for the sites, since it led people to them. With LLMs, people no longer have to visit the sites, which is why the platforms want to monetize their data directly.

Actually, that's not always true. Search engines have included snippets from sites for years and that's also been a well-discussed point of contention.

Then there's also Google's attempt to switch people to AMP to further lock people into Google's walled garden. I accept this isn't quite the same thing but it's still an example of how search engines fight to prevent people from leaving their ecosystem.

Some sites, like MSN, literally host news articles from other sites on their own site too. I'm sure Microsoft has an agreement to do this, but it's yet another example of how companies try to lock visitors into their own site.

I accept the AMP and MSN examples are tangential, but they do still illustrate the same point I'm making about how it's not a new thing for platforms to use dark patterns to keep people from navigating away from their platform. This isn't something new that's happened in the last couple of years.

> It's a bit ridiculous to tell me I'm misremembering when you're talking about a different feature

Would you rather I just said you were citing falsehoods like you accused me of?

Also I'm not talking about a different feature. I'm talking about the exact same stuff I was talking about from my original comment in this thread.

> Yes, to look at most profile data you needed (need?) to be logged in. But you could view public posts without logging in as long as you had the link, I used to do that for various types of communities explicitly after I'd deleted my Facebook account.

So you agree that platforms have locked content down and this isn't a recent phenomenon then ;)

Making the distinction between profile data and public comments is a little strained when it's clear that Facebook has invested heavily into their walled garden and the vast majority of content on Facebook has always been hidden behind that walled garden.

> Literally most of the web is open, for free and anonymously, and is profitable due to ads & selling visitor data.

Smaller sites make money from ads. But we were talking about big platforms like Facebook, Twitter and Instagram. Sites that make money from ads are just making small change compared to platforms.

> This is changing because 1) people are no longer visiting the pages, they're instead asking LLM clients, and 2) free and anonymous access is getting harder due to sites getting hammered by crawlers for LLM training purposes. This has been in the news a lot over the last few months.

This I do agree with. But that wasn't the statement that was originally made. Those sites will remain open or shut down entirely. They're not going to go private à la Twitter and Instagram. Their business model is entirely different -- often they're intentionally not run as a business in the first place. Sometimes they're just passion projects with no ads and/or run at a loss.

The part I was disagreeing with was that the dark patterns seen in Instagram et al are a result of the rise of LLMs. That simply isn't true.

hnlmorg 7 hours ago | parent [-]

Also, Facebook feeds weren’t even a feature back before Twitter was around. Zuckerberg added them to compete with Twitter. So you couldn’t even access “public feeds” in the mid-00s because no such thing existed.