verve_rat 2 hours ago

My theory is we will end up in a similar spot to hiring people. You can look at a CV (benchmarks) but you won't know for sure until you've worked with them for six months.

We as an industry cannot determine if one software engineer is objectively better than another, on practically any dimension, so why do we think we can come to an objective ranking of models?

roymain an hour ago

The CV-to-six-months analogy is exactly right, and it's also why benchmarks for hiring people stopped being useful. The signal that holds up is what you see when something breaks, which is hard to compress into a number.

zelphirkalt 2 hours ago

Few things are broken in as many ways as hiring is these days. I hope we do not end up there.

pishpash 2 hours ago

You do not interview a candidate for 1000 rounds on problems you're actually solving. If you did, hiring would be fine. Minus the social fit aspect, which isn't as relevant for a model.