jubilanti 5 hours ago

I wonder when pelican riding a bicycle will be useless as an evaluation task. The point was that it was something weird nobody had ever really thought about before, not in the benchmarks or even something a team would run internally. But now I'd bet internally this is one of the new Shirley Cards.

abustamam 4 hours ago

Simon has an article on this

https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

amelius 3 hours ago

Yeah, try it with something else, or, e.g., add a tiger to the back seat.

SwellJoe an hour ago

Pelicanmaxxing

rafaelmn 4 hours ago

I mean, look at the result where he asked for a unicycle: the model couldn't even keep the spokes inside the wheel. That would be trivial if it had "learned" what it means to draw a bicycle wheel and could transfer that to a unicycle.

duzer65657 3 hours ago

It's the frame that's surprisingly, and consistently, wrong. You'd think two triangles would be pretty easy to repro; once you get that, the rest is easy. It's not like he's asking to "draw a pelican on a four-bar linkage suspension mountain bike..."

Reddit_MLP2 3 hours ago

This is older, but even humans don't have a great concept of how a bicycle works... https://twistedsifter.com/2016/04/artist-asks-people-to-draw...

yndoendo 2 hours ago

Wouldn't this be more about being capable of mentally remembering how a bicycle looks versus how it works?

This reminds me of Pictionary. [0] Some people are good and some are really bad.

I am really bad at remembering how items look in my head, and I fail at drawing in Pictionary. My drawing skills are tied to being able to copy what I see.

[0] https://en.wikipedia.org/wiki/Pictionary

quinnjh an hour ago

Is it possible to have greater success with more specificity? I don't think I ever drew a bike frame properly as a kid, despite riding bikes and understanding the concept of spokes and wheels...

MagicMoonlight 4 hours ago

They’ll hardcode it in 4.8, just like they do when they need to “fix” other issues