Where do they get the video training data?

From the paper:

> Datasets. We construct a diverse and high-quality collection of video datasets to train STARFlow-V. Specifically, we leverage the high-quality subset of Panda (Chen et al., 2024b) mixed with an in-house stock video dataset, with a total number of 70M text-video pairs.

▲

justinclift 5 hours ago | parent [-]

> in-house stock video dataset

Wonder if "iCloud backups" would be counted as "stock video" there? ;)

▲

anon7000 5 hours ago | parent | next [-]

I have to delete as many videos as humanly possible before backing up to avoid blowing through my iCloud storage quota, so I guess I’m safe.

▲

fragmede 5 hours ago | parent | prev [-]

Turn on advanced data protection so they don't train on yours.

▲

givinguflac 2 hours ago | parent [-]

That has nothing to do with it, and Apple wouldn’t train on user content; they’re not Google. If they ever did, it would be opt-in at best. There’s a reason they’re walking and observing, not running and trying to be the forefront cloud AI leader like some others.