| ▲ | Manabu-eo 5 hours ago |
| How likely is it that this problem is already in the training set by now? |
|
| ▲ | simonw 5 hours ago | parent | next [-] |
| If anyone trains a model on https://simonwillison.net/tags/pelican-riding-a-bicycle/ they're going to get some VERY weird looking pelicans. |
| |
| ▲ | suddenlybananas 4 hours ago | parent [-] | | Why would they train on that? Why not just hire someone to make a few examples? | | |
| ▲ | simonw 4 hours ago | parent [-] | | I look forward to them trying. I'll know when the pelican riding a bicycle is good but the ocelot riding a skateboard sucks. | | |
| ▲ | suddenlybananas 4 hours ago | parent [-] | | But they could just train on an assortment of animals and vehicles. It's the kind of relatively narrow domain where NNs could reasonably interpolate. | | |
| ▲ | simonw 4 hours ago | parent [-] | | The idea that an AI lab would pay a small army of human artists to create training data for $animal on $transport just to cheat on my stupid benchmark delights me. | | |
| ▲ | suddenlybananas 4 hours ago | parent [-] | | When you're spending trillions on capex, paying a couple of people to make some doodles in SVGs would not be a big expense. | | |
| ▲ | simonw 3 hours ago | parent [-] | | The embarrassment of getting caught doing that would be expensive. |
|
| ▲ | throwup238 5 hours ago | parent | prev | next [-] |
| For every combination of animal and vehicle? Very unlikely. The beauty of this benchmark is that it takes all of two seconds to come up with your own unique one. A seahorse on a unicycle. A platypus flying a glider. A man’o’war piloting a Portuguese man of war. Whatever you want. |
| |
| ▲ | recursive 5 hours ago | parent [-] | | No, not every combination. The question is about the specific combination of a pelican on a bicycle. It might be easy to come up with another test, but we're looking at the results from a particular one here. | | |
| ▲ | svara 5 hours ago | parent [-] | | More likely you would just train for emitting svg for some description of a scene and create training data from raster images. |
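A minimal sketch of throwup238's point that a fresh variant of the benchmark takes seconds to invent. The word lists, the `novel_prompts` helper, and the exact prompt wording are all assumptions made for illustration; none of them come from the thread or from any official benchmark harness.

```python
import itertools
import random

# Hypothetical word lists; extend them with anything you like.
# Articles are stored with the nouns so "an ocelot" and "a unicycle" read correctly.
ANIMALS = ["a pelican", "an ocelot", "a seahorse", "a platypus", "a tyrannosaurus"]
VEHICLES = ["a bicycle", "a skateboard", "a unicycle", "a glider", "a tank"]

# The one combination most likely to have leaked into training data.
WELL_KNOWN = {("a pelican", "a bicycle")}


def novel_prompts(n, seed=None, exclude=WELL_KNOWN):
    """Return up to n distinct prompts, skipping the excluded pairs."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    pairs = [p for p in itertools.product(ANIMALS, VEHICLES) if p not in exclude]
    chosen = rng.sample(pairs, k=min(n, len(pairs)))
    # Assumed prompt template; swap in whatever wording you actually use.
    return [f"Generate an SVG of {animal} riding {vehicle}" for animal, vehicle in chosen]


if __name__ == "__main__":
    for prompt in novel_prompts(3):
        print(prompt)
```

With five animals and five vehicles this yields 24 usable combinations after excluding the original; a model that draws a good pelican on a bicycle but a poor ocelot on a skateboard would be the tell simonw describes.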
|
| ▲ | zarzavat 5 hours ago | parent | prev | next [-] |
| You can always ask for a tyrannosaurus driving a tank. |
|
| ▲ | verdverm 5 hours ago | parent | prev | next [-] |
| I've heard it posited that the frontier companies are at the frontier because they have custom data and evals. This is what I would do too. |
|