VibeBench: Measuring 1k Engineers' Opinions of New Models (vibebench.standardagents.ai)
12 points by jpschroeder 2 days ago | 4 comments
mhi3 2 days ago

"Published benchmarks are gamed, optimized, and overfit, and no longer yield a useful signal."

Is this true?

But I love this concept!

jpschroeder 2 days ago

Oh very true. Benchmaxxing itself is basically gaming them.

ramon156 a day ago

Love the idea!

Page is incredibly slow on mobile, probably the avatars

memoryleakgame 2 days ago

800 commits in a year...