supermatt 9 hours ago

This entire article reads like hand-wavy nonsense, throwing pretty much every cutting-edge AI buzzword around to solve a problem that doesn't exist.

All the top models are moving towards synthetic data - not because they want more data but because they want quality data that is structured to train for utility.

Having zettabytes of “invisible” data is effectively pointless. You can’t train on it because there is so much of it; it’s way more expensive to train per byte because of homomorphic magic (if it’s even possible); and most importantly - it’s not quality training data!

williamtrask 9 hours ago | parent [-]

This article is meant for a policy audience, which does keep the technical depth pretty thin. It's rooted in more rigorous deep learning work. Happy to send it your way if you're interested.

supermatt 9 hours ago | parent [-]

Posting info on that “rigorous deep learning work” here would be more beneficial to all than just sending to me.

williamtrask 8 hours ago | parent [-]

I'm relatively close to publishing my PhD thesis, which is broadly a survey paper of what you're describing. Will share (almost done with revisions).