Manabu-eo 5 hours ago

How likely is it that this problem is already in the training set by now?

simonw 5 hours ago | parent | next [-]

If anyone trains a model on https://simonwillison.net/tags/pelican-riding-a-bicycle/ they're going to get some VERY weird looking pelicans.

suddenlybananas 4 hours ago | parent [-]

Why would they train on that? Why not just hire someone to make a few examples?

simonw 4 hours ago | parent [-]

I look forward to them trying. I'll know when the pelican riding a bicycle is good but the ocelot riding a skateboard sucks.

suddenlybananas 4 hours ago | parent [-]

But they could just train on an assortment of animals and vehicles. It's the kind of relatively narrow domain where NNs could reasonably interpolate.

simonw 4 hours ago | parent [-]

The idea that an AI lab would pay a small army of human artists to create training data for $animal on $transport just to cheat on my stupid benchmark delights me.

suddenlybananas 4 hours ago | parent [-]

When you're spending trillions on capex, paying a couple of people to make some doodles in SVGs would not be a big expense.

simonw 3 hours ago | parent [-]

The embarrassment of getting caught doing that would be expensive.

throwup238 5 hours ago | parent | prev | next [-]

For every combination of animal and vehicle? Very unlikely.

The beauty of this benchmark is that it takes all of two seconds to come up with your own unique one. A seahorse on a unicycle. A platypus flying a glider. A man’o’war piloting a Portuguese man of war. Whatever you want.

recursive 5 hours ago | parent [-]

No, not every combination. The question is about the specific combination of a pelican on a bicycle. It might be easy to come up with another test, but we're looking at the results from a particular one here.

svara 5 hours ago | parent [-]

More likely you would just train the model to emit SVG for a description of a scene, and create the training data from raster images.

zarzavat 5 hours ago | parent | prev | next [-]

You can always ask for a tyrannosaurus driving a tank.

verdverm 5 hours ago | parent | prev | next [-]

I've heard it posited that the reason the frontier companies are frontier is that they have custom data and evals. This is what I would do too.
