| ▲ | nateb2022 10 hours ago |
| I have yet to be convinced the broader population has an appetite for AI-produced cinematography or videos. Dependence on Nvidia is no more of a liability than dependence on electricity rates; it's not as if it's in Nvidia's interest to see one of its large customers fail. And pretty much any of the other Mag7 companies is capable of developing in-house TPUs + is already independently profitable, so Google isn't alone here. |
|
| ▲ | ralph84 9 hours ago | parent | next [-] |
| The value of YouTube for AI isn't making AI videos; it's that YouTube is an incredibly rich source of humanity's current knowledge in one place. All of the tutorials, lectures, news reports, etc. are great for training models. |
| |
| ▲ | Nextgrid 9 hours ago | parent [-] | | Is that actually a moat? Seems like all model providers managed to scrape the entire textual internet just fine. If video is the next big thing I don’t see why they won’t scrape that too. | | |
| ▲ | jmb99 5 hours ago | parent | next [-] | | Scraping text across the entire internet is orders of magnitude easier than scraping YouTube. Even ignoring the sheer volume of data (exabytes), you simply will get blocked at an IP and account level before you make a reasonable dent. Even if you controlled the entire IPv4 space, I’m not sure you could scrape all of YouTube without getting every single address banned. IPv6 makes address bans harder, true, but then you’re still left with the problem of actually transferring and then storing that much data (see the back-of-envelope sketch below). | | |
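A back-of-envelope sketch of the transfer problem alone, using assumed figures (roughly 1 EB of video and a sustained 100 Gbit/s link, neither taken from the comment above), illustrates the scale:

```python
# Back-of-envelope: how long moving ~1 EB of video takes at a fixed link speed.
# All figures here are assumptions for illustration, not measured numbers.

exabyte_bytes = 1e18                 # assume roughly 1 EB of video to fetch
link_bytes_per_s = 100e9 / 8         # assume a sustained 100 Gbit/s link

seconds = exabyte_bytes / link_bytes_per_s
days = seconds / 86_400

print(f"~{days:,.0f} days (~{days / 365:.1f} years) of continuous transfer")
# -> ~926 days (~2.5 years), before bans, throttling, or storage enter the picture
```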
| ▲ | earthnail 3 hours ago | parent | next [-] | | For now, you actually get pretty far with Tor. Just reset your connection when you hit an IP ban by sending SIGHUP to the Tor daemon. I did that when I was retraining Stable Audio for fun, and it really turned out to be trivial enough to pull off as a little evening side project (roughly the loop sketched below). | |
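A minimal sketch of that loop, assuming a local Tor SOCKS proxy on the default port; the pid-file path is hypothetical, and the claim that a SIGHUP is enough to get past an IP ban comes from the parent comment and is not verified here:

```python
# Sketch of the "reset on ban" loop from the comment above. The pid-file path
# is hypothetical, and whether a SIGHUP alone rotates you onto a fresh exit is
# the parent commenter's claim, not something verified here.
import os
import signal
import time

import requests  # third-party; needs SOCKS support: pip install "requests[socks]"

TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
               "https": "socks5h://127.0.0.1:9050"}
TOR_PIDFILE = "/run/tor/tor.pid"  # hypothetical location of the Tor daemon's pid file


def fetch(url: str, retries: int = 5) -> bytes:
    for _ in range(retries):
        resp = requests.get(url, proxies=TOR_PROXIES, timeout=30)
        if resp.status_code not in (403, 429):  # treat these as "this exit is banned"
            return resp.content
        # Signal the Tor daemon as described above, then wait before retrying.
        with open(TOR_PIDFILE) as f:
            os.kill(int(f.read().strip()), signal.SIGHUP)
        time.sleep(10)
    raise RuntimeError(f"still blocked after {retries} attempts: {url}")
```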
| ▲ | tucnak an hour ago | parent | prev [-] | | IPv6 doesn't make it "harder," as they would typically ban whole /48 prefixes. |
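For illustration, prefix-level ban bookkeeping like that takes only a few lines with Python's standard ipaddress module; this is a sketch of the idea, not YouTube's actual mechanism:

```python
# Sketch: key bans on the covering /48 prefix instead of the full IPv6 address,
# so rotating addresses inside one allocation doesn't dodge the ban list.
from ipaddress import ip_address, ip_network

banned_prefixes: set[str] = set()


def prefix_of(addr: str) -> str:
    """Collapse an address to its covering /48 (IPv6) or /32 (IPv4)."""
    length = 48 if ip_address(addr).version == 6 else 32
    return str(ip_network(f"{addr}/{length}", strict=False))


def is_banned(addr: str) -> bool:
    return prefix_of(addr) in banned_prefixes


banned_prefixes.add(prefix_of("2001:db8:1234::1"))
assert is_banned("2001:db8:1234:ffff::42")  # same /48, still banned
```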
| |
| ▲ | monocasa 9 hours ago | parent | prev | next [-] | | And we're probably already starting to see that, given the semi-recent escalations in the game of cat and mouse between YouTube and the likes of youtube-dl. Reminds me of Reddit's cracking down on API access after realizing that their data was useful. But I'd expect YouTube both to be quicker on the draw, knowing about AI data collection, and to have more time because of the orders-of-magnitude greater bandwidth required to scrape video. | | | |
| ▲ | awesome_dude 8 hours ago | parent | prev [-] | | > Seems like all model providers managed to scrape the entire textual internet just fine Google, though, has been doing it for literal decades. That could mean that they have something nobody else (except archive.org) has - a history of how the internet/knowledge has evolved. |
|
|
|
| ▲ | fooblaster 10 hours ago | parent | prev | next [-] |
| If you think they are going to catch up with Google's software and hardware ecosystem on their first chip, you may be underestimating how hard this is. Google is on TPU v7. Meta has already tried with MTIA v1 and v2; those haven't been deployed at scale for inference. |
| |
| ▲ | nateb2022 9 hours ago | parent [-] | | I don't think many of them will want to, though. As long as Nvidia/AMD/other hardware providers offer inference hardware at prices decent enough not to justify building a chip in-house, most companies won't. Some of them will probably experiment, although that will look more like a small team of researchers + a moderate budget than a burn-the-ships, we're-only-using-our-own-hardware approach. | | |
| ▲ | fooblaster 9 hours ago | parent [-] | | Well, Anthropic just purchased a million TPUs from Google because, even with Google taking a healthy margin, it's far more cost-effective given Nvidia's insane markup. That speaks for itself. Nvidia will not drop their margin because it would tank their stock price. It's half the reason for all this circular financing - lowering their effective margin without lowering it on paper. | | |
|
|
|
| ▲ | margalabargala 9 hours ago | parent | prev | next [-] |
| It's in Nvidia's interest to charge the absolute maximum they can without their customers failing. Every dollar of Nvidia's margin is your own lost margin. Utilities don't do that. Nvidia is objectively a way bigger liability than electricity rates. |
| |
| ▲ | bdangubic 9 hours ago | parent [-] | | it is in every business’s best interest to charge the maximum… | | |
| ▲ | wrs 7 hours ago | parent [-] | | Utilities and insurance companies are two examples of businesses regulated not to charge the maximum, for public policy reasons. | |
|
|
|
| ▲ | Ekaros 6 hours ago | parent | prev | next [-] |
| I think it will be accepted by the broader population. But if generation is easy and cheap, I wonder if there is demand - and I mean total demand in the segment. Will there be enough impressions to go around to actually profit from the content, especially once storage costs are also considered? |
|
| ▲ | Seattle3503 9 hours ago | parent | prev | next [-] |
| The video data is probably good for training models, including text models. |
|
| ▲ | why-o-why 6 hours ago | parent | prev [-] |
| Given that Apple and Coke both rushed to produce AI slop, and the agreements with Disney, we are going to see a metric fuck-ton of AI-generated cinema in the next decade. The broader population's tastes are absolute garbage when it comes to cinema, so I don't see why you need convincing. 40+ superhero films should be evidence enough. |