m_ke 3 days ago

I used to work on video generation models and was shocked at how hard it was to find any videos online that were not hosted on YouTube, and YouTube has made it impossibly hard to download more than a few videos at a time.

raincole 3 days ago

> YouTube has made it impossibly hard to download more than a few videos at a time

I wonder why. Perhaps because people use bots to mass-crawl content from YouTube to train their AI, and YouTube prioritizes normal users, who watch at most a few videos at a time, over those crawling bots.

Who knows?

m_ke 3 days ago

I wonder how Google built their empire. Who knows? I’m sure they didn’t scrape every page and piece of media on the internet and train models on it.

My point was that the large players have a monopoly hold on large swaths of the internet and are using it to further advantage themselves over the competition. See Veo 3 as an example: YouTube creators didn't upload their work to help Google train a model to compete with them, but Google did it anyway, and creators didn't have a choice because all eyeballs are on YouTube.

raincole 3 days ago

> how Google built their empire. Who knows

By scraping every page and directing the traffic back to the site owners. That was how Google built their empire.

Are they abusing the empire's power now? In multiple ways, such as the AI overview stuff. But don't pretend that crawling YouTube and training video generation models is the same as what Google (once) brought to the internet. And it's ridiculous to expect YouTube to make it easy for crawlers.

fibers 3 days ago

You have to feed it multiple arguments for rate limiting and long wait times. I'm not sure if there have been recent updates other than the JS interpreter, but I've had to spin up a Docker instance of a browser to feed it session cookies as well.

m_ke 3 days ago

Yeah, we had to rotate through a bunch of proxy servers on top of all the other tricks you mentioned to download reliably at a decent pace.

trenchpilgrim 3 days ago

What are your thoughts on the load scrapers are putting on website operators?

immibis 2 days ago

What are your thoughts on the load website operators are putting on themselves to block scrapers?

aihell 3 days ago

[flagged]

CaptainOfCoit 3 days ago

Unusually well-argued post, hard to disagree with...

What exactly is the problem? That they worked on video generation models? That they only used YouTube? That they downloaded videos from YouTube? That they downloaded multiple videos from YouTube?