sempron64 2 hours ago

The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.

tripleee 2 hours ago

I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.

yreg 42 minutes ago

I really don't understand what's interesting about this test and why it is always on top.

depr 15 minutes ago

Same reason you would always see the same top comments on reddit during a certain era.

simonw 41 minutes ago

It's funny.

scrollaway an hour ago

Do you seriously have a dedicated “bad takes on AI” HN account?

tripleee 18 minutes ago

Yeah, although I do combine it with "replies to snarky questions" for efficiency.

jurgenaut23 an hour ago

True that

h4ny 15 minutes ago

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

fwipsy 11 minutes ago

SVG generation is a good test because it's extremely easy to assess subjectively with visual reasoning, where humans are strong. However, the pelican on a bike specifically may be overused at this point.

quantumwoke an hour ago

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.