Stevvo 4 days ago

The variance is way too high for this test to have any value at all. I ran it 10 times, and every pelican on a bicycle was a better rendition than that one; about half of them you could call perfect.

golly_ned 4 days ago

Compared to the other benchmarks, which are much more gameable, I trust PelicanBikeEval way more.

getnormality 4 days ago

Well, the variance is itself interesting.